diff --git a/migration/MANIFEST.md b/migration/MANIFEST.md new file mode 100644 index 0000000..ed485c9 --- /dev/null +++ b/migration/MANIFEST.md @@ -0,0 +1,69 @@ +# Migration MANIFEST + +Generated 2026-05-10 from master `ac2f89a` for cross-machine carrier. + +## Per-file mapping + +| Source path in `migration/` | Bytes | Target path on new machine | +|---|---|---| +| `claude-memory/MEMORY.md` | * | `$HOME/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/MEMORY.md` | +| `claude-memory/project_*.md` (~102 files) | ~1.1 MB total | `$HOME/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/` | +| `project-root/dot-claude/settings.json` | small | `/.claude/settings.json` | +| `project-root/ppc-manual/{alu,branch,categories,control,forms,fpu,generator,memory,vmx,vmx128}/` + `index.json` + `README.md` | ~3.7 MB | `/ppc-manual/` | +| `project-root/run-canary.sh` | 199 B | `/run-canary.sh` (chmod +x) | +| `README.md` | — | (this file, stays in repo) | +| `setup.sh` | — | (this file, stays in repo) | +| `MANIFEST.md` | — | (this file, stays in repo) | + +Where `` is the parent directory of the `xenia-rs` clone. + +## Pinned references + +- xenia-rs HEAD at snapshot time: **`ac2f89a`** (master, on this branch the tip is past it with 3 backfill commits) +- xenia-canary HEAD: **`6de80dffe261b368ecefee36c9b2b337335228c0`** — `setup.sh` checks this out automatically +- xenia-canary remote: `https://git.mc02.dev/fabi/Xenia-Canary.git` +- xenia-rs remote: `https://git.mc02.dev/fabi/xenia-rs.git` + +## Last completed audit chain + +``` +AUDIT-050 (reframe) ───► AUDIT-051 (sub_8245B078 divergence) + │ + ▼ + AUDIT-052 (struct dump → cache miss) + │ + ▼ + AUDIT-053 (persistent cache test) + │ + ▼ +LANDED 2a8ff95 ◄────────── AUDIT-054 (VFS layout fix + opt-in persist) + │ + ▼ + AUDIT-055 (sub_8245B078 body parity) + │ + ▼ + AUDIT-056 (LR distribution, 3.21× throughput gap) + │ + ▼ + AUDIT-057 (13 missing threads, top sub_825070F0) + │ + ▼ + AUDIT-058 (activation ladder, AUDIT-049 wedge upstream) + │ + ▼ + [PAUSED] AUDIT-059 recommended (γ-wedge pivot on 0x12A4) +``` + +Each audit's findings are in `audit-runs/audit-NNN-.../` (committed) and +its memory file `project_xenia_rs_audit_NNN_*.md` (in `migration/claude-memory/`). + +## Not in bundle (external) + +| File | Why | How to restore | +|---|---|---| +| Sylpheed ISO (~7.8 GB) | Copyright + size | Manual copy from original machine | +| `sylpheed.db` (~395 MB) | Regenerable; permanent git bloat | Run analyzer after build | +| `target/` | Build artifacts | `cargo build --release` | +| `audit-runs/**/{*.log,*.stdout,*.stderr}` (~11 GB) | Probe firehoses | Rerun audit if needed | +| `audit-runs/**/*.bin` (~4.5 GB) | Memory dumps | Rerun audit-026/027/029 if needed | +| `xenia-canary/` checkout | Separate repo | `setup.sh` reclones automatically | diff --git a/migration/README.md b/migration/README.md new file mode 100644 index 0000000..ec99fc0 --- /dev/null +++ b/migration/README.md @@ -0,0 +1,116 @@ +# Cross-machine migration snapshot + +This directory bundles the parts of the working state that live OUTSIDE +the `xenia-rs` git repo, so a fresh clone on another machine can be brought +up to the exact same configuration without manual file-shuffling. + +It is paired with branch **`chore/portable-snapshot`**. If you're reading +this on a machine other than the original, you are on that branch. + +## TL;DR — how to set up a new machine + +```bash +# 1. Clone into the canonical path (matches embedded paths in memory). +mkdir -p ~/'RE Project Sylpheed' +cd ~/'RE Project Sylpheed' +git clone https://git.mc02.dev/fabi/xenia-rs.git +cd xenia-rs +git checkout chore/portable-snapshot + +# 2. Run the installer (idempotent; safe to re-run). +bash migration/setup.sh + +# 3. Manual steps the script will remind you about: +# - Copy the Sylpheed ISO into the project root. +# - Regenerate sylpheed.db once (analysis tooling pulls XEX from the ISO). +# - Build canary Debug if you intend to run cross-runtime probes. +# - Switch back to master and continue from HEAD ac2f89a (or merge this +# branch into master if you want to keep the audit-runs/findings.md +# history; the branch is purely additive). +``` + +## What gets installed where + +| Source under `migration/` | Target on new machine | +|---|---| +| `claude-memory/` (1.1 MB, 103 files) | `~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/` | +| `project-root/dot-claude/settings.json` | `/.claude/settings.json` | +| `project-root/ppc-manual/` (3.7 MB) | `/ppc-manual/` | +| `project-root/run-canary.sh` | `/run-canary.sh` | + +Where `` is the parent directory of this xenia-rs clone +(i.e. `~/'RE Project Sylpheed'/` if you followed the TL;DR layout). + +## What is NOT bundled and why + +| Thing | Why excluded | How to restore on new machine | +|---|---|---| +| Sylpheed ISO (~7.8 GB) | Size + copyright; cannot ship via git | Copy manually from the original machine to `/Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso` | +| `sylpheed.db` (~395 MB) | Reproducible from XEX + analysis tooling; permanent git bloat | After cargo build, regenerate. See *Regenerating sylpheed.db* below | +| `xenia-canary` repo | Separate git project with own remote | `cd && git clone https://git.mc02.dev/fabi/Xenia-Canary.git xenia-canary && cd xenia-canary && git checkout 6de80dffe`. (`setup.sh` does this automatically if the directory is missing.) | +| `target/` build artifacts | Reproducible via `cargo build` | `cargo build --release` | +| Probe `.log`/`.stdout`/`.stderr` raw dumps (~11 GB) | Already gitignored; only summaries committed | Not needed; rerun the relevant audit if you want fresh logs | +| Memory-dump `.bin` files (~4.5 GB) | Captured by audits 026/027/029; gitignored | Re-run those audits if needed | + +## Regenerating sylpheed.db + +After `cargo build --release` succeeds and the ISO is in place: + +```bash +cd ~/'RE Project Sylpheed/xenia-rs' +# The analyzer binary scans the XEX inside the ISO and writes sylpheed.db +# next to it. (Exact subcommand to be confirmed against current main.rs: +# look near `--analyze` or `analyze` subcommand.) +cargo run --release --bin xenia-rs -- analyze \ + "../Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso" +``` + +The output is `sylpheed.db` (~395 MB) and is gitignored. + +If the analyzer subcommand has changed, the source of truth is the +post-M1-M12-overhaul analysis crates under +`crates/xenia-analysis/` + `crates/xenia-app/src/main.rs`. Check `--help` +for the current invocation. + +## Picking up where the previous session left off + +The most recent audit chain (050-058) is summarized in +`~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/MEMORY.md` +(restored by `setup.sh`). Specifically: + +- Master HEAD `ac2f89a` — post-AUDIT-054 VFS layout fix. +- Plateau: `swaps=1 / draws=0`. +- Last unfinished audit: **AUDIT-059**. Recommended next step is in + `project_xenia_rs_audit_058_sub825070F0_activation_2026_05_10.md` — + pivot to unblocking the AUDIT-049 main-thread wedge (handle 0x12A4), + not chasing the static caller ladder of sub_825070F0 further. + +To continue: instruct the agent on the new machine to "resume the +autonomous audit loop from AUDIT-059 per the memory file's +recommendations." The agent should read MEMORY.md first to load +context, then dispatch the next audit. + +## Branch policy + +`chore/portable-snapshot` is purely additive over master: +- Commit 1: `audit-findings.md` backfill (1943 lines of audit history) +- Commit 2: `audit-runs/` summary artifacts (~284 files, ~52 MB) +- Commit 3: this `migration/` directory + `.gitignore` `*.bin` exclusion + +None of the commits touch crate source code. Merging into master is safe +once you're satisfied the new machine works. The branch can also be kept +as a stable snapshot anchor if you prefer keeping master purely "code". + +## Verifying integrity post-setup + +```bash +# After setup.sh: +ls "$HOME/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/" | wc -l # should be ~103 +cat "$HOME/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/MEMORY.md" | head -5 +cat "/.claude/settings.json" | head -10 +ls "/ppc-manual/" # alu/ branch/ fpu/ etc. +ls "/xenia-canary" # should exist after setup.sh +git -C "/xenia-canary" rev-parse HEAD # 6de80dffe +``` + +If any of those checks fail, rerun `setup.sh` from inside `migration/`. diff --git a/migration/claude-memory/MEMORY.md b/migration/claude-memory/MEMORY.md new file mode 100644 index 0000000..e11c0a0 --- /dev/null +++ b/migration/claude-memory/MEMORY.md @@ -0,0 +1,105 @@ +# Memory Index + +- [audit_058_sub825070F0_activation](project_xenia_rs_audit_058_sub825070F0_activation_2026_05_10.md) — Canary fires sub_825070F0 1× after `DiscImageDevice::ResolvePath(\\dat\\movie)`. Static caller ladder (6 levels: sub_824F7800 ← sub_824F7CD0 ← sub_824F8398 ← sub_821B55D8 ← sub_821B6DF4 [top]). ALL 6 fire 0× in ours. Break is NOT CRT-fnptr-array — it's the AUDIT-049 main-thread wedge (handle 0x12A4 wait at 0x824ac578) blocking the entire post-intro phase. Vtables 0x8200A208/0x8200A928 have **zero vptr_writes in DB** — writer ctor is computed-store-only OR in unreachability island. Confirms AUDIT-050 half-bootstrap: vtable-writer subset dead. AUDIT-059: pivot to unblocking the wedge (049+057 unified γ-investigation), not chasing caller-ladder further. +- [audit_057_thread_gap](project_xenia_rs_audit_057_thread_gap_2026_05_10.md) — Canary 23 thread-spawns / ours 10 = **13 missing**. 8 distinct missing-thread spawners. **Top: sub_825070F0** (4 missing, initializes 4 workers w/ shared ctx 0xBCE25340, entries 0x82506528/58/88/B8). **11 of 13** from spawners that don't fire at all in ours — same audit-050 CRT-fnptr-array unreachability. sub_821C4EB0 fires (1×) but early-returns (audit-056); sub_821746B0 fires 1× / should spawn 2. AUDIT-058: probe sub_825070F0 activation chain in canary, find fnptr-array entry ours doesn't enumerate. Cascade D 10-20% (multiple independent gaps). +- [audit_056_producer_trace](project_xenia_rs_audit_056_producer_trace_2026_05_10.md) — LR distribution at sub_82452DC0: canary 45/60s vs ours 14/26s (3.21× ratio). **Two CANARY-ONLY divergence introducers**: (a) sub_821C4EB0 calls sub_821CEDF8 5× in canary, 0× in ours despite IDENTICAL caller-LR — early-exit somewhere in +0x44..+0xE0; (b) sub_824AFF88 thread-trampoline fires 5× canary / 0× ours — **12 vs 30 XThread count gap** (18 missing threads). 3.0× thread-count ratio matches 3.21× throughput ratio. AUDIT-057: probe sub_82174828 + post-bl PCs inside sub_821C4EB0 (`0x821C4F2C/0x821C5014/0x821C5048`); fallback = audit which 18 thread-creates fail to land. Wine canary not yet justified. +- [audit_055_subB078_internal](project_xenia_rs_audit_055_subB078_internal_2026_05_10.md) — Probed sub_8245B078's body with cache override: BODY EXECUTES CORRECTLY (internal call parity ~98%, sub_8217FA08 2449/2411). Divergence is UPSTREAM — sub_82452DC0 fires 5.6× less in ours. Sharpest specific divergence: sub_8217FA08 from LR=0x82455E60 (=sub_82455DF0+0x70), canary 20 / ours 0. Bug class refined: **δ-throughput** (producer of work items doesn't fire at canary rate). AUDIT-056 recommendation: probe sub_82452DC0 entry, aggregate LR distribution upward. +- [audit_054_vfs_layout_landed](project_xenia_rs_audit_054_vfs_layout_landed_2026_05_10.md) — **TRACK A LANDED**: commits `2a8ff95` (74 LOC) + `ac2f89a` (golden re-baseline). FILE_DIRECTORY_FILE bit threaded through nt_create_file → open_cache_file. `cache:\` with disp=2 opts=0x4021 → mkdir; leaf paths with opts=0x4020 → file. Closes ζ-class layout aliasing. Track B persistent cache OPT-IN via `XENIA_CACHE_PERSIST=1`; default keeps AUDIT-038 wipe for lockstep. Cascade A/B confirmed via AUDIT-053 Phase 1 (canary cache override); **cascade C/D STILL FAIL** — cache is necessary but not sufficient. Warm-start regression (cxx_throw=10) when persistent enabled — our cold-start halts at swaps=1 producing half-baked .tmp journals. Next: probe sub_8245B078 internal body to find next gate. +- [audit_053_cache_layout_bug](project_xenia_rs_audit_053_cache_layout_bug_2026_05_10.md) — Phase 1 confirms AUDIT-052 gate hypothesis (PCs `0x82452E30`/`0x8245B078`/`0x8245AD94` go 0→6 with canary cache override; cascade A/B PASS) BUT cascade C/D FAIL (NtSetEvent 68→63, VdSwap=1 unchanged). Phase 2 permanent fix REVERTED — warm-start regression from VFS layout aliasing: `open_cache_file` treats all `NtCreateFile` as files, but `cache:\d4ea4615 disp=CREATE` is meant as a DIRECTORY. 0-byte file blocks hierarchical creates. AUDIT-038 wipe masked this for 14 audits. **15th reading-error class ζ**: VFS layout aliasing. AUDIT-054 = honor `FILE_DIRECTORY_FILE` bit + retry persistent cache. +- [audit_052_cache_root_cause](project_xenia_rs_audit_052_cache_root_cause_2026_05_10.md) — **🔥 ROOT CAUSE FOUND**: AUDIT-051 struct hypothesis REFUTED (struct bit-identical canary/ours). `[r3+0]`/`[r3+4]` are halves of a hash key formatted into `cache:\\\` paths. Real bug: `NtQueryFullAttributesFile` returns -1 for `cache:\*` in ours because **AUDIT-038's per-boot tmpdir cache is WIPED every startup**. Canary's `~/.local/share/Xenia/cache/` is persistent + pre-populated (d4ea4615/e/46ee8ca etc — game-built shader/PSO/material caches). **Reading-error class #15**: AUDIT-038 "missing-or-stale ≡ fresh" assumption invalidated. Fix = persistent cache (`crates/xenia-kernel/src/state.rs:387-398`). Cascade A/B/C HIGH; D 30-40% (cache may not be only gate). +- [audit_051_work_submitter_trace](project_xenia_rs_audit_051_work_submitter_trace_2026_05_10.md) — **CONCRETE DIVERGENCE FOUND**: `sub_8245B078` fires 18× canary / 0× ours. Gate at `sub_82452DC0+0x78` (PC `0x82452E2C beq cr6, 0x82452E88`) controlled by `sub_8245B000` returning 1 iff `[r3+0]≠0 AND [r3+4]≠0` for 80-byte stack-local struct at `r31+96`. Ours has one of those fields NULL — missing-population. Struct upstream-written by `sub_8245AE50`/`sub_82452068`/`sub_82452200` (fire in both, ours doesn't write right fields). **Single root cause** for AUDIT-047's 4 wedges (0x10A0/0x10A4/0x1530/0x1534) via `sub_8245AD00` + plausibly tid=13's stall on 0x1288. Direct-bl divergence (NOT vtable) — at this level. AUDIT-052: dump struct at PC `0x82452E1C` in both engines, find NULL field, identify missing writer. Cascade D 20-30% (5 vtable dispatches downstream might re-hit island). +- [audit_050_reframe](project_xenia_rs_audit_050_reframe_2026_05_10.md) — **🔥 METHODOLOGICAL REFRAME (2026-05-10)**: 11 prior audits' "audit-009 cluster unreachable" claim is STATIC-ANALYZER ARTIFACT. `--ctor-probe` shows CRT driver `sub_824ACB38` (called from entry_point) iterates 0x82870xxx fnptr arrays (557 slots, 82 non-NULL); `sub_8280E148` (RegisterToFactory) fires at cycle 1.4M; tid=13 chain (sub_821C4EB0/sub_821CC3F8/sub_821CBA08/sub_821CB030) fires; **work-submitter sub_82452DC0 fires on tid=13 at cycle 8127**. Real bug = γ missing-signaler inside sub_82452DC0's descendant tree. **14th reading-error class**: BFS-only insufficient when binary uses CRT-driven fnptr arrays. **Angle B (-n 5B) DEFINITIVELY FALSIFIED** (bit-identical to 500M). Cluster is HALF-bootstrapped (worker-thread subset live, vtable-writer subset dead) — virtual dispatches hit garbage. Wine canary no longer only route. Top angle: probe sub_82452DC0's 9 direct targets + 1 ind_call (`0x8245AE50,0x82452068,0x82452200,0x8245B000,0x8245B078,0x82454A40,0x82452AB8,0x82454918,0x82452EC4`) in both engines, find divergent branch. Cascade D=draws>0 YES POSSIBLE 30-50%. +- [audit_049_tid1_stall_0x1280](project_xenia_rs_audit_049_tid1_stall_2026_05_10.md) — Post-AUDIT-032 wedge: handle 0x1280 = **Thread handle** for tid=13 (not event); tid=1 is doing thread-join. Real sub-stall: tid=13 waits INFINITE on event 0x1288 (created at `sub_821CB030+0x128`, in audit-009 cluster front-end UI). 5 NtSetEvent callers in worker cluster `0x82450000-0x8245C000`: only `sub_82452DC0` (work-submitter) statically reachable; rest unreachable. tid=13 create-chain: `sub_821748F0`→`sub_821C4EB0(UImpl@GamePart_Title@silph)`→`sub_821CC3F8(AVGamePart_Title)`→`sub_821CBA08`→`sub_821CB030`. All in audit-009 island. Discipline 3/5 PASS, D=draws>0 NO. **5th consecutive audit (044/045/046/047/049) converging on same unreachability island.** Path 1 (Wine canary) now justified per user's "meaningful progress" framing. +- [audit_048_audio_host_pump_fix](project_xenia_rs_audit_048_audio_host_pump_fix_2026_05_10.md) — **PATH 2 LANDED in working tree (not committed)**. AUDIT-032 audio fix: dedicated audio worker thread per client, 75 net LOC across 4 files. Cascade A/B/D ✅ — tid=9/10 unblock from `0x828a3254`/`0x828a3230`, KeReleaseSemaphore 0→1, xaudio.callback.delivered=1. swaps 2→1 (degenerate splash lost). draws 0→0 (expected — audio≠renderer per AUDIT-032). **Boot phase advanced**: NtWaitForSingleObjectEx 1.4M→30, NEW StfsCreateDevice/XamContent*/ExCreateThread×10/XeCryptSha/etc. New stall: tid=1 main Blocked on handle 0x1280 at pc=0x824ac578. Tests pass (kernel 127/127, app 5/5 non-ignored). Lockstep goldens deferred (will drift). +- [autonomous_run_synthesis](project_xenia_rs_autonomous_run_synthesis_2026_05_10.md) — **AUTONOMOUS-MODE SYNTHESIS (2026-05-10, sessions 5-8 of 10 used; 9-10 RESERVED)**: 4 audits (044-047) refuted 4 hypotheses; no fix landed; no draws cascade; methodological floor reached within Linux Debug + READ-ONLY discipline. Cluster activation past Linux Debug horizon. Best near-reachable signaler `sub_8245AD00` covers 4 wedges but its callers sit in audit-009 unreachability island. KeReleaseSemaphore 0 ours / 73,914 canary = AUDIT-032 audio host-pump gap (already named). Three paths forward (user pick): (1) Wine/Lutris canary build for new oracle [HIGH effort/reward], (2) audit-032 audio host-pump fix [MEDIUM effort, D≠draws], (3) guest-thread injection [HIGH risk]. Sessions 9-10 held in reserve. +- [audit_047_gamma_wedges](project_xenia_rs_audit_047_gamma_wedges_2026_05_10.md) — 10 NO_SIGNALS handles inventoried + XAudio. `sub_8245AD00` reachable, covers 4 wedges (0x10A0/0x10A4/0x1530/0x1534), but callers in audit-009 island. Of 125 signal-source fns, only 2 statically-reachable + near-wedge: sub_8245AD00, sub_82450218. KeReleaseSemaphore 0/73914 canary divergence = known AUDIT-032 audio host-pump (PC 0x824D229C, r3=0x828A3230). Discipline 3/5 PASS, D=draws>0 NO. γ-wedge convergence: all hit audit-009 unreachability island. +- [audit_046_loop_exit](project_xenia_rs_audit_046_loop_exit_2026_05_10.md) — REFUTES audit-035 slot-pointer-region divergence as causal AND audit-034 "canary 3.75/5 vs ours 5/5" iter divergence. Both engines run 5/5 iters at sub_82450720+0x160..+0x1F4, fall through to no-match exit. Slot-table region split (canary 0xBC3xxxxx vs ours 0x4024xxxx) is REAL but BEHAVIORALLY INERT — predicate compares within each engine's own heap region. **Drop sub_82450720 chain as critical-path target.** Reading-error #13 (mid-block PC unprobeable in ours) reconfirmed; workaround via post-bne block-entry PC `0x82450908`. Recommended AUDIT-047: γ-cluster handle wedges per audit-042 (handles `0x10A0+0x10A4`, `0x12AC`, `0x1004`, etc). +- [audit_045_cluster_ctor_probe](project_xenia_rs_audit_045_cluster_ctor_probe_2026_05_10.md) — REFUTES audit-044 ctor hypothesis: `0x8228FAC8` (vptr write) + `0x8228F858` (ctor entry) fire **0× in CANARY too** at 50s. Cluster genuinely not activated at this horizon either engine — RECONCILE-B (Linux Vulkan/XCB) blocks before front-end-UI activation. T6 (audit-033 LR `0x82172BF8`) fires in both — gateway chain `entry→sub_8216EA68→sub_822F1AA8→sub_82172BA0` runs; cluster-ctor branch off it doesn't. **13th reading-error class**: probe-firing-granularity — ours `--pc-probe` fires only at basic-block entry, canary `--log_lr_on_pc` per-instruction; mid-block PCs systematically give ours=0. Recommended AUDIT-046: probe loop-exit predicate `0x82450904` (audit-034 5/5 vs 3.75/5 divergence). DB caveat: `v_call_graph` uses `xrefs.source`; prefer `xrefs.source_func` for caller-set queries. +- [audit_044_m55_cluster_survey](project_xenia_rs_audit_044_m55_cluster_survey_2026_05_09.md) — M5.5 lifts cluster `0x82285000-0x82294000` from 0/321 static-reach to 41/321 indirect (12.8%); only 4 vtable methods are entries. Cluster's 6 vptr-writer constructors all dead. Top probe: `sub_8228F858` writer at `0x8228FAC8` (vtable `0x820a9c28` ctor). Audit-033 chain ends genuinely at `sub_821CECF0`/`sub_821C4988` (7-hop BFS finds no ancestor). M5.5 `cand_count≤10` actionable; `=203` noise. +- [project_xenia_rs_audit_043_record_zero_offset_2026_05_09.md](project_xenia_rs_audit_043_record_zero_offset_2026_05_09.md) — **🎯 KRNBUG-AUDIT-043 (2026-05-09, READ-ONLY, master `d8766c6` tests 645)**: identifies writer of +0x00 at "records" 0x40542300/40/400/4C0. Mem-watch (-n 500M) → writer = memcpy `pc=0x825F1080 lr=0x825ED608` called from `std::basic_string::reserve_then_assign sub_8216E138+0xC8`. **Records are not records** — they are 64-byte slots in Sylpheed's pool allocator (`sub_821505D8` allocs 58MB via `sub_824A8858`; `sub_82152728` chains 64-byte free-list over 1.25MB; bucket sizes 4/16/32/64/96/128/160/192/256). Audit-030 patch reapplied; canary probe at `pc=0x825F1080`: 94,945 fires/25s, **zero hit `0x40542xxx`**, top destinations `0x705Dxxxx`/`0xBC36xxxx` (canary's pool base = `0xBC32C880` per pool-init probe). **Audit-039's "0xF80000B8 vs game" comparison is a VA-equality fallacy**: same guest VA backs different live data because allocator returns different host VAs in canary vs ours. Bug class **ε (host-allocator address-space divergence)** — NOT a guest-write bug, NO missing/wrong write at +0x00 in our impl. **Reading-error ledger 12th entry**: VA-equality fallacy across emulators — comparing memory at identical guest VAs assumes both allocators return same VA for same logical alloc; Sylpheed's pool factory makes this assumption false in general. **Recommended audit-044**: drop "record at 0x40542300+" line entirely; pivot to audit-042's actually-stalled-handle plan (0x10A0+0x10A4 worker pair, 0x12AC sema, 0x1004 event/manual). Bug class for next = γ (missing signaler). Trace `audit-runs/audit-043-record-zero-offset/{mem-watch.log, mem-watch.stdout, canary-825f1080-traces.txt.gz, audit-043-canary-poolinit.log}`. Discipline 5/5 PASS; canary patch reverted clean; xenia-rs source unmodified. +- [project_xenia_rs_audit_042_handle_lifecycle_2026_05_09.md](project_xenia_rs_audit_042_handle_lifecycle_2026_05_09.md) — **🎯 KRNBUG-AUDIT-042 (2026-05-09, READ-ONLY, master `d8766c6` tests 645)**: disambiguates audit-041 root cause for handle 0x1454 missed wakeup. Re-applied audit-030 canary patch (30 LOC, reverted clean canary `6de80dffe`). Method: ours `--trace-handles-focus=0x1454` (existing audit.rs); canary `--log_lr_on_pc=0x8284DF1C` (NtCreateEvent ord 209) + cross-ref audit-041 log. **STRUCTURAL FINDING**: `KernelState::alloc_handle` (state.rs:588) is **monotonic atomic counter** `fetch_add(4)` from 0x1000 — **bump-only, NO recycling, structurally impossible**. `nt_close` removes object but never returns ID to pool. **Lifecycle of 0x1454 in ours** (2 reruns identical): created tid=13 lr=0x824a9f6c (NtCreateEvent, kind=Event/Manual), waited tid=13 lr=0x824ac578 (do_wait_single), signaled tid=5 lr=0x824aa304 (NtSetEvent), wake fired (wake_eligible_waiters/auto). Final: waiters=0 signaled=true wakes=1 — **fully consumed, NOT stuck**. Created stack frames sub_822DFC74←sub_822E0344←sub_822D2CA4←sub_822DE768←sub_821C4B1C. **Lifecycle of canary F80000CC family**: 5+ Added/Removed/Added cycles in 30s (XObject→XEvent→XEvent→...). **Slab/free-list allocator**: F8000098 reused 130×, F80000D0 95×, F80000DC 71× per 30s window. **VERDICT: ROOT CAUSE NOT (A) HANDLE-RECYCLING** — recycling structurally impossible in ours; signal lands on same KEVENT object the wait registered for. **Audit-041's premise provisionally falsified for 0x1454**: audit-041 inferred stall from `--lr-trace` "7 pre-bl, 6 post-bl" but ran `--quiet` so end-of-run dump was suppressed; post-bl miss explained by KeWaitForSingleObject's wake-side context-restore bypassing post-bl PC. **Real wedges** (this run's stalled handles): `0x1004` (tid=11), `0x1020` (tid=3), `0x1040` (tid=5), `0x1544` (tid=17), `0x1578` (tid=19), `0x12ac` (tid=14,15), `0x10a0+0x10a4` (tid=6) — all ``. **Bug class**: δ-namespace + δ-wakeup BOTH RULED OUT for 0x1454; wedge migrates to **γ (missing-signaler)** on different handle set. Sharp 4-dim cascade prediction (audit-043 fix on real handle): A=signal_attempts 0→≥1, B=stalled tid Blocked→Ready, C=`` count drops ≥2, D=swaps>2 OR draws>0 (low probability — γ-cluster plateau). **Recommended audit-043**: pivot to (1) `0x10a0+0x10a4` Event+Sema worker pair on tid=6, (2) `0x12ac` Sema with 2 waiters, (3) `0x1004` Event/Manual on tid=11. Trace `audit-runs/audit-042-handle-lifecycle/{probe.log, probe-run2.log, canary-create-0x8284DF1C.log}` (~11.5 MB). Discipline gate 5/5 PASS. xenia-rs source unmodified, no commit. +- [project_xenia_rs_audit_041_wait_site_2026_05_09.md](project_xenia_rs_audit_041_wait_site_2026_05_09.md) — **🎯 KRNBUG-AUDIT-041 (2026-05-09, READ-ONLY, master `d8766c6` tests 645)**: wait-site signaler determination. Re-applied audit-030 patch (30 LOC, reverted clean canary `6de80dffe`). Wait at PC `0x822DFC34 bl 0x824AA330` (KeWaitForSingleObject INFINITE) inside sub_822DFBC8 — direct caller of audit-040's NtCreateEvent. **Wait completion (canary 30s vs ours 500M-instr)**: canary 9 bl + 9 post-bl = **100%**; ours 7 pre-bl (probe at `bl` elided in HIR; pre-bl `0x822DFC30 addi r4,r0,-1` is fair) + 6 post-bl = **6/7 = 85%**. **7th wait stalls on handle `0x00001454`** (audit-040 family) at cycle 48,849. **Outcome (i) confirmed**: handle-namespace divergence is **load-bearing**. **Signaler = NtSetEvent** (xboxkrnl ord 246, thunk 0x8284DF5C): canary 9245 fires, **2 on F80000CC/C0** with LR=0x824AA304 (wrapper sub_824AA2F0, 89 callers). KeSetEvent 20588 fires, 0 on those handles (takes KEVENT*). **Cross-check**: ours's NtSetEvent fires **1× on r3=0x1454** at cycle 3,519,453 (AFTER stall) — **signaler IS firing in ours, but waiter not woken**. Bug class refined to **δ-namespace + δ-wakeup composite**: signal-before-wait race ruled out (signal is later); candidate causes = (a) handle slot 0x1454 recycled between create-epochs so signal hits different KEVENT than wait registered for, OR (b) KeSetEvent/wait-queue plumbing has missed-wake. **Recommended audit-042** (two-track): (1) probe sub_824AA2F0 entry filtering r3=0x1454 to name signaler caller chain; (2) dump our handle table state for slot 0x1454 at cycle 48,849 (wait) vs 3,519,453 (signal) — if different KEVENT pointers → handle aliasing in `xenia_kernel::handle_table`; if same → wait-queue bug. Both fixes ≤60 LOC. xenia-rs HEAD `d8766c6` unchanged. Trace `audit-runs/audit-041-wait-site/`. +- [project_xenia_rs_audit_040_record_ctor_inputs_2026_05_09.md](project_xenia_rs_audit_040_record_ctor_inputs_2026_05_09.md) — **🎯 KRNBUG-AUDIT-040 (2026-05-09, READ-ONLY, master `d8766c6` tests 645)**: identified divergent INPUT to sub_8244FC90 record ctor. Re-applied audit-030 patch + extended TrapLogLR (+56 LOC, reverted clean). Canary 33 fires at 30s; ours 8 fires via `--lr-trace`. Calling conv: r3=dest, **r4=28-byte source struct ptr (memcpy'd to dest+0x3C)**, r5=2nd this, r6/r7=scalars. LR=`0x82450440` (=sub_824503A0+0xA0) IDENTICAL in both. **Divergent dword**: `*r4+0` = canary `0xF80000DC` vs ours `0x00001454` — **NtCreateEvent OUT handle** (xboxkrnl ord 209 thunk `0x8284DF1C`). Upstream chain: sub_8244FC90 ← sub_824503A0 ← sub_824528A8 ← sub_822DFBC8 ← **sub_822DFC74** (which calls sub_824A9F18→NtCreateEvent at +0xC8C, stores result at `[r31+44]` then dispatches via vtable[7]). Both runtimes call NtCreateEvent 395× successfully — divergence is **handle-namespace cosmetics** (canary `0xF8000xxx` XObject-kernel-region vs ours `0x10xx-0x14xx` KernelState-handle-table-id). **Bug class δ-namespace** (handle representation; benign unless downstream interprets bits semantically). **AUDIT-037 framing partial-correction**: the inline filename text lives at dest record's `+0x40+` (written by sibling callees `sub_822F8A70`/`sub_82150030` AFTER sub_8244FC90 returns), NOT in the 28-byte memcpy region. **Recommended audit-041**: probe `sub_822DFC34` (`bl 0x824AA330` waitsite) in BOTH runtimes — if canary's wait completes but ours doesn't = signaler-missing bug. If both stall = handle namespace finding is benign and pivot to RDX search-criteria producer. Trace `audit-runs/audit-040-record-ctor-inputs/{canary-0x8244FC90.log, ours-lrtrace.jsonl, ours-dump.log, canary-patch.diff}`. xenia-canary HEAD `6de80dffe` clean; xenia-rs HEAD `d8766c6` unchanged. +- [project_xenia_rs_audit_039_track_2_extended_canary_2026_05_09.md](project_xenia_rs_audit_039_track_2_extended_canary_2026_05_09.md) — **🎯 AUDIT-039 TRACK 2 (2026-05-09, READ-ONLY, canary `6de80dffe`, xenia-rs `d8766c6`)**: extended-horizon canary trace for cluster activation. Re-applied audit-030 `--log_lr_on_pc` patch (30 LOC, 4 files); reverted clean. Probed 3 Tier-2 PCs serially (audit-031 single-PC constraint), 15-min wallclock each: **0x82172524 = 0 fires / 22 min**, **0x82175810 = 0 fires / 15 min**, **0x8217EB78 = 0 fires / 15 min**. ~52 min canary CPU total. Steady-state mix 240k KeReleaseSemaphore + 75k VdRetrainEDRAM/XamInputGetCapabilities loop in all 3. Per task brief Step 3 outcome **(ii)**: cluster activation past Linux Debug's reach in 15 min — skipped Tier-1 (3 PCs) + L1 (6 PCs) per compressed plan (consequences of Tier-2). Confirms+extends audit-034 Phase B (5 min, 0×) and VERIFY-A (35 sec, 0/12). RECONCILE-B host-presenter caveat dominates: Vulkan/XCB Linux fails to display intro video, front-end-UI state machine never advances past post-intro. **3 horizons (35 sec/5 min/15 min) all stop in same idle loop.** Sister Track 1's cascade-A verdict FAIL combined with this OUTCOME (ii): transformation-step (`RtlInitAnsiString`-driven filename externalization) IS missing AND cluster activation IS past Linux Debug reach — independent gates. **Recommended pivot B (static-only, M5.5 alias-aware vtable dispatch resolution)** first; pivot A (Lutris Windows canary instrumentation) as fallback. Trace `audit-runs/audit-039-track-2-extended-canary/canary-0x{82172524,82175810,8217EB78}.{log,err}`. Canary patch reverted, HEAD `6de80dffe` unchanged; xenia-rs HEAD `d8766c6` untouched, tests 645. +- [project_xenia_rs_audit_039_track_1_verify_2026_05_09.md](project_xenia_rs_audit_039_track_1_verify_2026_05_09.md) — **🎯 AUDIT-039 TRACK 1 (2026-05-09, READ-ONLY, master `d8766c6`)**: cascade dimension A verification post audit-038 cache fix. Probe sub_8228E498 = 0 fires (silenced by fix). Fallback `--dump-addr` 0x40542300/40/400/4C0 + extended 0x40542100..800. **VERDICT: FAIL** — 0x40542300 IDENTICAL to audit-037 pre-fix (inline "game:\\hidden\\Resource3D\\Common.xpr…", +0x20=0x7072005C "pr\\0\\\\" text bytes); 0x405424c0 has descriptor-shape pointers at +0x20=0x40541ED8 but **filename still inlined at +0x44** ("ptc_pack.xpr"). Canary records hold pointers (0xF80000B8 handle@+0, BC65xxx/BC36xxx sub-pointers); strings live on separate `RtlInitAnsiString`-allocated heap. Cache fix is correct hygiene (silenced sub_82459D18/sub_8245D230/0x82450904) but DID NOT externalize filenames into ANSI-string heap. **Bug class η record-layout divergence (audit-036) PERSISTS** — record-population transformation step is upstream/sibling of cache machinery, untouched by audit-038. Lockstep instr=500000019 stable, swaps=2. **Recommended next**: (A) trace `RtlInitAnsiString` callers vs canary to find missing `game:/dat:/cache:` prefix populator, (B) mem-watch +0x20 of 0x40542320 to capture writer PC+LR, (C) wait for sister Track 2's extended-horizon canary trace before declaring transformation-step missing, (D) KRNBUG entry on `RtlInitAnsiString` prefix branching. Trace `audit-runs/audit-039-track-1-verify/{probe-element,dump-extended}.{out,log}`. xenia-rs HEAD `d8766c6` unchanged. +- [project_xenia_rs_audit_036_vptr_deref_2026_05_09.md](project_xenia_rs_audit_036_vptr_deref_2026_05_09.md) — **🎯 KRNBUG-AUDIT-036 (2026-05-09, READ-ONLY, master `9028021`)**: hypothesis test of audit-035 narrative — captured `[[r3+0]+32]` at sub_8228E498/sub_82451E20+0x58 in BOTH runtimes. Canary patch 49 LOC (audit-030 base + 19-LOC TrapLogLR ext to deref `[r3+0]` + 64-byte struct dump); reverted clean. **Disasm correction**: sub_8228E498 is a deque iterator deref returning element_address (NOT vtable dispatcher); the `[+32]` deref happens in CALLER sub_82451E20 at PC 0x82451E78 reading the returned element's `[+0]` (key) then `[key+32]`. **Canary value `[[r3+0]+32]` = `0xBC65D018/D2D8/CFD8/D118/D198/D398`** — phys-heap pointers. **Ours value `[[r3+0]+32]` = `0x7072005C`** — mid-FILENAME-STRING text bytes ("pr\\0\\\\" from "...Common.xpr\\0\\..."). **VERDICT: REFUTED-AS-STATED** — ours's value is text not a heap pointer. **STRONGER finding**: records held by container have FUNDAMENTALLY DIFFERENT LAYOUTS. Canary's `[r3+0]=0xBC65D1C0` is a 16-dword pointer-bearing struct (handle@+0=0xF80000B8, sub-pointers@+32/+36/+44). Ours's `[r3+0]=0x40542300` is a struct STARTING WITH inline filename "game:\\hidden\\Resource3D\\Common.xpr\\0..." — offset 32 falls inside string text. Predicate `r28 == [[r3+0]+32]` compares stack pointers vs string bytes in ours (impossible match). Bug class **η — record-layout divergence (NEW class)**, distinct from audit-035's heap-region axis. **Recommendation: DO NOT proceed with physical-heap separation as audit-037** — even after heap-split, ours's records would still hold inline strings; predicate would still fail. **Audit-037 = identify the record populator** that builds container elements (mem-watch on `0x40542300+0x20` → writer PC + LR, walk caller chain, compare to canary's resource-loader). xenia-rs HEAD `9028021` unchanged. Tests 640. Trace `audit-runs/audit-036-vptr-deref/{canary.log, canary-callsite.log, ours.log, ours-exit.log, ours-final.log}`. +- [project_xenia_rs_audit_035_slot_table_2026_05_08.md](project_xenia_rs_audit_035_slot_table_2026_05_08.md) — **🎯 KRNBUG-AUDIT-035 (2026-05-09, READ-ONLY, master `9028021`)**: re-applied audit-030 patch + extended TrapLogLR (+19 LOC, total 49) to dump 5×20-byte slot table from r3+108 at sub_82450720 entry. Disasm verified r26+108 / 5 slots / 20 stride. r3=r26=0x828F3B68, base=0x828F3BD4 in BOTH runtimes. Canary 22 entries / 30s wall; ours --dump-addr at 50M (==500M, identical). **Diff**: slots 1/2/4 same shape (zeros + ptr + size 8) but ptrs in **canary 0xBC3xxxxx (physical heap)** vs **ours 0x4024xxxx (v40 bump heap)**; slot 3 canary `(2,5)` push counters vs ours `(0,0)` (ours over-cycles 0..0xB). Slot 0 zero in both. **Bug class ε — heap-region cross-reference mismatch**: predicate at 0x82450904 compares sub_82451E20's vptr-table-derived sum vs slot's local sum; the table walked via sub_8228E498's `[r3+0][32]` holds canary-physical addresses, but slot writers push v40 addresses on ours — per-element inconsistency causes predicate to never match early in ours. 1066 mem-watch hits on slot 3 ours (writers at sub_82450bc4 chain + 0x822f8b20/0x82323364/0x8231eee8). **Falsifies audit-034**'s "different positions" — slots match in shape, mismatch is in pointer value. **Sharp cascade**: A=land physical-heap separation (CPPBUG-AUDIT-001); B=sub_8228E498 vtable + slot writers same heap region; C=predicate matches iter 1-2; D=`draws>0` UNKNOWN. Pointed-to objects: 0x4024A240=vtable-headed (vptr=0x40111860), valid. Connects directly to audit-027/029. Canary patch reverted; xenia-rs HEAD `9028021` unchanged. Tests 640. Trace `audit-runs/audit-035-slot-table/{canary-0x82450720-fix.log, ours-lrtrace.jsonl, ours-dump-stdout.log, ours-memwatch-slot3.log}`. +- [project_xenia_rs_audit_034_frame_chain_divergence_2026_05_08.md](project_xenia_rs_audit_034_frame_chain_divergence_2026_05_08.md) — **🎯 KRNBUG-AUDIT-034 (2026-05-09, READ-ONLY, master `9028021`)**: re-applied audit-030 patch; probed audit-033's full 8-PC ours-side chain in canary 50s + ours -n 500M. **Firing matrix** L0..L5 uniform **6.3× divergence** (sub_821C4988=1/1, sub_821CECF0=2/2, sub_821CBEA8=7/7, sub_821CD458=7/7, sub_821CB968=14/14, sub_82450638=14/14 — same shape, ours iterates 6.3× more often per second). L6 sub_82450720 = 24 canary / 16 ours = 4.2×; L7 sub_82451E20 = 90 canary / 80 ours = 5.5×. **Loop-exit-divergence located**: sub_82450720+0x160..+0x1F4 (PC 0x82450880..0x82450914, 5-iter loop bounded by `r25<5`). Ours runs 5/5 iters (80/16=5.00); canary avg 3.75/5 (90/24=3.75) — exits via 0x82450904 `bne` on sub_82451E20's success-predicate match. **Exit predicate**: `[sub_82451E20_out+0]==r30-12 AND [+4]==[r30+0]+[r30+4]` where r30=r26+108+iter*20; data source = 5×20-byte slot table at r26+108..207 (r26=container struct arg1). Predicate fed by sub_82451E20's inner-loop, which dereferences Tier-1 cluster sub_8228E498's `[r3+0][32]`. Bug class **β-data-divergence + γ-deep entry** (sub_821C4988=0 static call xrefs → vtable). **Phase B (300s canary) Tier 2/3 horizon** — ALL 5 PCs (0x82172524, 0x82175810, 0x8217EB78, 0x821A6CF0, 0x821A8578) = **0 fires at 300s**. Cluster activation gated deeper than 5-min Linux Debug horizon (consistent with RECONCILE-A: Linux trace reaches frame 42/186, Lutris Windows reaches 72/186). Tests 640, lockstep instr=100000003. Canary patch reverted; master `9028021` unchanged. **Recommended next**: AUDIT-035 mem-watch r26+108..207 to identify slot-table writer ours misses; OR M5.5 to name sub_821C4988 trigger; OR extended pc-probe of sub_8228E498 capturing `[r3+0][32]` to name predicate compare-target. +- [project_xenia_rs_audit_033_ui_entry_chain_2026_05_08.md](project_xenia_rs_audit_033_ui_entry_chain_2026_05_08.md) — **🎯 KRNBUG-AUDIT-033 (2026-05-08, READ-ONLY, master `9028021`)**: re-applied audit-030 `--log_lr_on_pc` patch (30 LOC, build via `ninja -f build-Debug.ninja xenia_canary`; Checked variant has code-cache alloc collision). Probed 8 PCs (Tier1 cluster externals 0x8228A628/0x8228E138/0x8228E498; Tier2 callers 0x82172524/0x82175810/0x8217EB78; Tier3 CMessageBridge sites 0x821A6CF0/0x821A8578) in BOTH canary (50s wall) and ours (-n 500M). **Convergence**: both fire 0x8228E138 (canary 2× LR=0x82172BF8 in sub_82172BA0, ours 1× same LR) AND 0x8228E498 (canary 28× LR=0x82451E78 in sub_82451E20, ours 62× same LR). **Falsifications**: 0x8228A628 + all Tier 2 + all Tier 3 = 0 fires in canary at 50s — cluster's full activation isn't triggered in canary either at this boot horizon. **Frequency divergence**: ours 62× / 8s guest vs canary 28× / 50s wall on sub_82451E20 — busy-loop in array-ctor dispatch; loop-exit gate is the divergence target. CTOR-PROBE captures full call chain sub_82451E20←sub_82450720←sub_82450638←sub_821CB968←sub_821CD458←sub_821CBEA8←sub_821CECF0←sub_821C4988. Bug class **γ (vtable-driven dispatch)** — both reach Tier 1 entries via same LR; M5.5 (this-flow vptr) prerequisite for deeper top-down probes. Canary patch reverted. Trace `audit-runs/audit-033-ui-entry-chain/{canary-0x*.log,ours.log,ours.err}`. xenia-rs HEAD `9028021` unchanged. **Recommended next**: M5.5 milestone OR pivot to probing sub_82450720/sub_82450638/sub_821CB968 to find loop-exit gate (62 vs 28 fires divergence) OR longer canary trace via Lutris Windows build for post-intro Tier 2+3 activation. +- **🚨 CLUSTER IDENTITY REFRAME (2026-05-08)** — The `0x82285000-0x82294000` cluster (audit-009/016/017/020/021/026/027/029) is **NOT the renderer plateau**. It is the **front-end UI / save-game / mission-select / HUD subsystem** per RAPID-SURVEY-Q4 (93 string refs: BASE_INFO, LOAD_BASES, SAVE_MENUS, MISSION_SELECT, NOW_LOADING, FlightTime, etc.). Past sessions calling this "renderer cluster" or "renderer plateau" misidentified the subsystem. The cluster doesn't fire because the front-end UI flow never *activates*, not because the renderer is broken. The actual renderer (which produces the 2 splash swaps we DO get) lives elsewhere. The `swaps>2 / draws>0` gate is the **front-end loader** — what should activate after intro video → main menu → mission select. **Future sessions: do NOT label this cluster "renderer"**. Add this to running-error ledger as the 11th entry (subsystem-mislabeling class, distinct from the 10 function-boundary entries). + +- [project_xenia_rs_overhaul_rapid_survey_2026_05_08.md](project_xenia_rs_overhaul_rapid_survey_2026_05_08.md) — **🎯 RAPID-SURVEY (2026-05-08, READ-ONLY, master `9028021`)**: post-overhaul DB survey of audit-009 cluster `0x82285000-0x82294000`. **Q1 zero**: 6 L1 PCs in NEITHER `methods` NOR `function_pointer_array_entries` (pure `this->vptr` per audit-031/032). **Q2 LEAD**: 13 static arrays at `0x820A9B98-0x820AA024` point INTO cluster; ctor candidates `sub_8228F858, sub_82293EC8, sub_82294898, sub_82284590, sub_822A0860, sub_822A0E90` (all themselves trapped in cluster or adjacent 0x822A0xxx). **Q3 BIG**: cluster has **309 pdata-validated fns, 309 unreachable** via static-call BFS AND indirect-reach view (M5 added 0 edges, ind_call=0 globally). Audit-009's "42 unreached" was 7x undercount. **Q4 PIVOT**: 93 string refs from cluster name **save-game/mission-select/UI subsystem**: BASE_INFO, LOAD_BASES, SAVE_MENUS, AUTO_SAVED, MISSION_SELECT, NOW_LOADING, FlightTime, ClearTimes, Disk free space, Content request — NOT raw renderer. SilpheedSCS::CMessageBridge strings live OUTSIDE cluster (caller `sub_821A6CF0+0xE6C`, `sub_821A8578+0xE0`). **Q5**: 68 cluster fns have `has_eh=true` (heavy C++ EH around save I/O). **Q6**: 0 mis-merge candidates in 0x82200000-0x822F0000 — past audits stand. **Q7**: audit-031 boundary fix verified (sub_824D23B0/29F0/2BD8/2C08 separate). **External entries** sub_8228A628/sub_8228E138/sub_8228E498 ARE called from outside (`sub_82172524, sub_82175810, sub_8217EB78` etc) but THEMSELVES still unreachable in BFS — gate is even higher. **Verdict**: audit-009 framing should pivot from "renderer plateau" to "front-end-UI/save-game cluster never instantiated". M5.5 NOT mandatory before next probe — Q4 already names subsystem; M5.5 would name the trigger ctor. **Recommended next**: `--lr-trace` canary-diff on cluster external-entry PCs + their parents (sub_82172524, sub_82175810, sub_8217EB78); cross-check SilpheedSCS::CMessageBridge::Load/CreateDeviceObjects; schedule M5.5 as next analyzer milestone. +- **✅ ANALYSIS-OVERHAUL FULL CLOSE-OUT (M1-M12 + 5 closers) LANDED (2026-05-08+10)** — [analysis_overhaul_M1_M12](project_xenia_rs_analysis_overhaul_2026_05_08.md). 9 no-ff merges off `e061e21` → master `7bc9e3a`. **All 12 milestones + all 5 deferred closers done.** M1 pdata (12156→25481 fns); M2 demangler; M3 722 vtables/499 anon classes/5571 methods; M4 Class::* probes; M5+M5.5 ind_call (687,963 edges, 97 single-cand + 6,745 multi-cand, **2,746 newly reachable functions** via vptr-write inference — M5.5 surfaces audit-009 cluster); M6+VMX `addr_mode` + 442 x_form_indexed + 40 atomic + 110 stvx; M7+SJIS/UTF-8 strings (6,311 ASCII + 790 SJIS + 39 UTF-8); M8/M11/M11.5 1,110 funcptr arrays (388 dispatch + 0 static_init); M9 has_eh (2,975 fns); M9.5 EH scope-tables (2,588 FuncInfo / 10,019 unwind / 315 try-blocks); M10 .tls infra (Sylpheed none); M12 `--lr-trace` JSONL canary-diff harness. Tests 605→655. Lockstep deterministic at instr=2000005. SCHEMA.md documents all 17 layers + remaining future work (M9.6 EH→function linkage; M11.6 non-canonical static-init drivers; full SJIS→UTF-8 decode; VMX128). +- **🚨 METHODOLOGY CORRECTION (2026-05-08)** — [audit_032_audio_host_pump](project_xenia_rs_audit_032_audio_host_pump_2026_05_08.md) revises audit-025's central claim "audio gate IS renderer gate". They are SEPARATE STALLS. 7+ prior sessions (KRNBUG-018, KE-001, AUDIT-024A/025/026/030/031) chased an audio gate believing it was the renderer gate. The fixes that landed addressed real divergences but did NOT approach `draws > 0`. **Future sessions must NOT re-conflate.** The renderer plateau (audit-009 cluster L1 reachability) is INDEPENDENT and remains the actual `swaps>2 / draws>0` gate. Audio fix is hygiene cleanup; renderer hunt is critical-path. All static analysis avenues for the renderer cluster are exhausted (audit-020/021/026/027/029); next probes need new tooling — analysis-toolset overhaul motivated. +- [project_xenia_rs_methodology_verification_2026_05_08.md](project_xenia_rs_methodology_verification_2026_05_08.md) — **🔍 META-AUDIT (2026-05-08)**: 4 parallel verifications + reconciliations. **VERIFY-A**: 0/12 cluster L1 PCs fire in canary → static reachability BFS is SOUND. **VERIFY-B**: 12/12 PowerPC store classes hooked by mem-watch → coverage COMPREHENSIVE. **RECONCILE-A**: Linux Debug canary kernel-call trajectory IDENTICAL to Lutris Windows; Linux log was simply terminated earlier (frame 42/186 vs 72/186). **RECONCILE-B**: user's "Linux black window vs Windows intro video" is HOST-PRESENTER divergence (Vulkan/XCB-only on Linux; Wayland TODO; H1 swapchain failure most likely; user confirmed Weston also shows black). **Combined verdict**: methodology is sound at kernel-call level. Reading-error ledger (10 entries, mostly function-boundary mis-attribution) is the real motivating gap. Past audit findings stand; no re-grading. Master HEAD `e061e21`, tests 605. +- [project_xenia_rs_audit_032_dispatcher_lr_2026_05_08.md](project_xenia_rs_audit_032_dispatcher_lr_2026_05_08.md) — **🎯 KRNBUG-AUDIT-032 (2026-05-08, READ-ONLY, master `e061e21`)**: re-applied audit-030 `--log_lr_on_pc` canary patch (30 LOC, 4 files), reverted at session close. **7,875 fires** of `pc=0x824D6640` in canary; ALL from single host-flagged kernel thread named **"Audio Worker"** (handle=`0100001C`, native=`467FC6C0`); **LR invariant `0xBCBCBCBC`** = host stack-fill canary, NOT a guest PC. r3=`0x30063000` driver-ctx, r4=0|1 init-vs-tick, r5=`0x1800` frame-size, r6=`0xBDFBA600` callback_arg. Canary log: `XAudioRegisterRenderDriverClient(701CF210(824D6640), BDFBA658(00000000))`. **Mechanism: canary `xe::apu::AudioSystem` (`apu/audio_system.cc:84-159`) spawns host XHostThread "Audio Worker" that loops `WaitAny(client_semaphores_) → processor_->Execute(callback_pc, args)`** — invokes thunk DIRECTLY via host emulator, no guest call site, hence LR=stack-canary. **Falsifies AUDIT-031 vtable[7] hypothesis**: `addi r4, r10, 26176` at sub_824D2C08+0x374 loads PC 0x824D6640 as the callback_ptr ARGUMENT to XAudioRegisterRenderDriverClient (caller-side parameter), not vtable registration. **Outcome δ+α composite**: the "caller PC" we sought is canary's HOST C++, not guest. **Our impl gap**: `XAudioRegisterRenderDriverClient` (exports.rs:2705-2745) registers (callback_pc=0x824D6640, arg=0x41E9DD5C) correctly but **does NOT spawn a host worker thread to pump the callback**, no semaphore-release loop. Probes (`--pc-probe` and `--branch-probe` × 0x824D6640/0x824D29F0) at -n 500M: **0 fires both**. tid=9 parks `pc=0x824D28D0` waiting `0x828A3254`; tid=10 parks `pc=0x824D2990` waiting `0x828A3230` (count=0/limit=6). Bug class **δ-α composite — host-side AudioSystem worker thread missing**. Sharp cascade: A=tid 9 unparks on first sub_824D29F0:KeSetEvent(0x828A3254); B=tid 10 unparks on next sema release; C=XAudioSubmitRenderDriverFrame >0; D=KeReleaseSemaphore non-zero. **Audio gate is NOT renderer gate** (revising audit-025) — separate stalls sharing "host pump missing" symptom only. Tid 10's limit=6 sema = audio-frame queue depth (canary's `queued_frames_=6`), isolated from renderer. Recommended next: implement host-side audio worker (60-120 LOC) per `apu/audio_system.cc`. Won't flip swaps=2→draws>0 plateau alone — audit-025 strategic pivot to audit-009 renderer cluster L1 callers REMAINS priority. Trace `audit-runs/audit-032-dispatcher-lr/{canary-patch.diff, probe.{log,err}, probe-sanity.{log,err}, branchprobe.{log,err}}` + `/tmp/audit-032-canary.log` (7,875 LR fires). Tests 605, lockstep instr=100000003. Master HEAD `e061e21` unchanged. +- [project_xenia_rs_verify_A_canary_pc_trace_2026_05_08.md](project_xenia_rs_verify_A_canary_pc_trace_2026_05_08.md) — **🎯 VERIFY-A (2026-05-08, READ-ONLY, master `e061e21`)**: re-applied audit-030 patch; probed 12 PCs (6 narrow L1 + 6 broader cluster) from audit-009 unreachable cluster `0x82285000-0x82294000` in canary. **0/12 fires** across 35-sec windows while audio loop runs hot (5,600-5,800 KeReleaseSemaphore/window). Sanity-check `0x824D28D0` = 5683 fires confirms trace mechanism. **Outcome (i): static reachability claim is SOUND** — xrefs.kind='call' BFS conclusion is corroborated by canary runtime trace; indirect-dispatch reachability is NOT being missed for this cluster. Audit-009/-016/-017/-020/-021/-029 framing of the cluster as unreachable holds. AUDIT-031's γ-deep dispatcher hypothesis reinforced (the dispatcher exists, but is NOT in this cluster — different code region). 95% upper bound on cluster reach-rate ~22% from sample size; full 112-PC sweep would harden to <5% in ~75 min. Reading-error ledger UNCHANGED (this claim was not on it). Recommended next session: AUDIT-031 sharp-prediction step 1 (probe `0x824D6640`); analysis-toolset overhaul remains motivated by other 10 errors not by this verification. Canary patch reverted. Trace `audit-runs/verify-A-static-reachability/probe-*.log` (13 files). +- [project_xenia_rs_audit_031_audio_wait_site_2026_05_08.md](project_xenia_rs_audit_031_audio_wait_site_2026_05_08.md) — **🎯 KRNBUG-AUDIT-031 (2026-05-08, READ-ONLY, master `e061e21`)**: re-applied audit-030's `--log_lr_on_pc` canary patch (30 LOC, 4 files); reverted at session close. **Outcome (a) — canary EXECUTES PC 0x824D28D0**: 54128 fires in ~5min, audio worker hot-loops through wait→release. Wake source captured via probe of KeSetEvent thunk `0x8284DDDC`: `tid=0100001C lr=0x824D2A44 r3=0x828A3254` = `KeSetEvent(0x828A3254,1,0)` from PC `0x824D2A40`. **AUDIT-025/-030 mis-attribution corrected**: IDA-DB `sub_824D23B0` (claimed range `0x824D23B0..0x824D2878`) actually contains a SECOND function prologue at `0x824D29F0` — that's the real wake-source, NOT sub_824D23B0. sub_824D23B0 entry probe = 0 fires confirms. Static reachability of sub_824D29F0: tail-jump from thunk at `0x824D6640` (which loads `r3=[0x828A3264]`); `0x824D6640` is data-referenced at `sub_824D2C08+0x374` (PC `0x824D2F7C: addi r4, r10, 26176`); next instructions deref `[r31][68]`, load vtable[7] at `[[r3]+28]`, `bcctrl 20,lt` to register the thunk. **Our impl tid=9 state matches AUDIT-025**: `Blocked(WaitAny [0x828A3254]) pc=0x824d28d0`. Bug class **γ-deep, vtable-driven** (refines AUDIT-025 with named downstream witness `sub_824D29F0`). The dispatcher loop that should periodically invoke `0x824D6640` is the unreached gate — likely in `0x82287000-0x82294000` cluster (AUDIT-009). Discipline gate: 5/5 PASS. **Recommended AUDIT-032**: probe `0x824D6640` directly in canary (names dispatcher PC) + probe `0x824D2F90 bcctrl` to capture r3 (audio-engine "this") + vtable[7] address; walk dispatcher's caller chain in our DB; cross-check audit-009 cluster overlap. NO source mods, NO commit. Trace `audit-runs/audit-031-wait-site/{canary-0x824D2878.log, canary-0x824D28D0.log, canary-KeSetEvent.log, canary-sub23B0.log}`. +- [project_xenia_rs_audit_029_physical_mem_diff_2026_05_08.md](project_xenia_rs_audit_029_physical_mem_diff_2026_05_08.md) — **🎯 KRNBUG-AUDIT-029 (2026-05-08, READ-ONLY, master `e061e21`)**: physical-heap diff — LAST guest-memory surface. Architectural finding: our impl has NO separate physical heap (MmAllocatePhysicalMemoryEx folds into v40 bump allocator at 0x40000000+); 0xA0/0xE0/flat-0x00 dumps all `0 committed pages`. Canary physical extracted: 58458 commits / 228 MiB / 24.5 MB nonzero / 28851 0x82xxxxxx PC dwords across 4467 4K pages / 536 64K-regions. **L1 hits: narrow 0/6, broad 2/116 (sub_8228CC18, sub_8228A220 — both scalar, no tables); audit-017 chain 0/8.** **CONFIRMS audit-027 misplacement**: our v40 table at 0x40211900 (18 PCs, 0x20 stride) appears verbatim on canary physical at 0x1c32c910 — same 18 PCs same 0x20 stride same trailing dup. Largest PC run on canary physical: 232 dwords at 0x1e568f38 (XAM/UI 0x824b0xxx-0x824b2xxx family, ~220 PCs); 4 smaller runs ≤9. Top bucket 0x82026000 × 12655 (per-instance vtable in stride-0x38 object array at 0x144x0000). **Outcome ζ — ALL FOUR HEAPS ELIMINATED.** All vtable/dispatch-table hypotheses across audits 010/011/012/015/016/017/026/027/029 refuted. Cluster L1 functions invoked exclusively via static `bl` in unreached parents — gate is upstream of any heap data structure (control-flow gate, not data-population gate). **Strategic pivot mandatory.** Recommended AUDIT-030: Option A (preferred) comparative-execution divergence trace (canary patch periodic tid:pc:lr sample, diff vs ours to find first divergent guest instruction); Option B targeted canary trace logging LR on every entry to sub_824D23B0 (sole KeSetEvent(0x828A3254) caller, vtable-only invoked) to name the per-frame renderer caller; Option C CPPBUG-AUDIT-001 backlog. Discipline gate fails 1. NO source mods, NO commit. Tests 605, lockstep instr=100000003 preserved. Master HEAD `e061e21` unchanged. Trace `audit-runs/audit-029-physical-mem-diff/`. +- [project_xenia_rs_audit_027_v40_mem_diff_2026_05_08.md](project_xenia_rs_audit_027_v40_mem_diff_2026_05_08.md) — **🎯 KRNBUG-AUDIT-027 (2026-05-08, READ-ONLY, master `e061e21`)**: v40 heap byte-level dword diff vs canary's audit-024A 248.6 MiB dump. Captured ours via `--dump-section=0x40000000:0x3F000000` (60119 commits, 1056964608 bytes); extracted canary v40 = 90 commits / 1008 MiB. A-list (canary 0x82xxxxxx, ours differs) = **536**; B-list inverse = 31947. **Cluster L1 (0x82285000-0x82294000) hits = 0/0** broad-and-narrow — v40 ELIMINATED as dispatch-table source (after audit-026 v80 elimination). Top A-list buckets: `0x828f3xxx`(90, .data dispatcher), `0x8284dxxx`(78, .text), `0x8284cxxx`(64, .text near .text-end), `0x82150xxx`(30), `0x828f4xxx`(23), `0x82882xxx`(20). Three vtable-runs detected: `0x40000770` len=32, `0x400015a0` len=110 (same shape, two instances of 110-method class), `0x40000d90` len=20 — all target `.text` heap-allocator handler thunks NOT renderer cluster. Listener anchor `0x40BA9A80` is canary-uncommitted in this dump; ours has audit-016 listener content (`+0x2C=0x4024AC00`, `+0x3C=0x4024B3E0`) — heap-pointer divergent, not missing-write. B-list discovery: `0x40211900..0x40211B50` in OURS has 23 fn-entries spaced 0x20 apart (`0x82183ae8, 0x82187e38, 0x8218cf10, ...`) = a function-table our impl builds in v40 that canary builds elsewhere (likely physical heap). **Strategic pivot mandatory** per task brief outcome (iii). Recommended **AUDIT-029 = extract canary physical heap (0x20000000 span, 58458 commits = 228 MiB)** with same script targeting `physical`. Alt: vtable-write-tap instrumentation. Alt: CPPBUG-AUDIT-001 backlog (`nt_allocate_virtual_memory` silent-success / `mm_allocate_physical_memory_ex` ignores alignment). Trace `audit-runs/audit-027-v40-mem-diff/`. NO source mods; NO commit. Master HEAD `e061e21` unchanged. Sister 028 untouched. +- [project_xenia_rs_audit_028_steady_state_notify_2026_05_06.md](project_xenia_rs_audit_028_steady_state_notify_2026_05_06.md) — **🎯 KRNBUG-AUDIT-028 (2026-05-08, READ-ONLY, canary log + source analysis only)**: XNotify steady-state publisher audit. Canary log (17245 lines) shows ONLY `XamNotifyCreateListener(0x2F)` @1347 + `XNotifyPositionUI(0x0A)` @2018 — NO further notification API calls. `XNotifyGetNext` is `kHighFrequency` (xam_notify.cc:96), per-call logging suppressed. 34 `BroadcastNotification` publisher sites across 11 files in canary; ALL event-driven (host UI, profile change, XMP play, SMC tray, controller hotplug edge) — NONE periodic. Controller hotplug log message absent → no `kXNotificationSystemInputDevicesChanged` fired. `VdSwap` count = 1 (TOC entry only) → ZERO actual swaps in canary; our impl swaps=2 is AHEAD. Audio-sema released 2224× in canary tail. **Outcome β: XNotify queue is NOT the gate**. Our impl matches canary's notification timeline byte-for-byte. The 1.49M `XNotifyGetNext` polls are dutiful idle polling, not missing-publisher symptom. **Strategic pivot: audio/render gate is still the γ-cluster from AUDIT-009/016/017/025** (`sub_824D23B0` via vtable on audio_system `0x82006CF4`, renderer cluster `0x82287000-0x82294000` unreached). AUDIT-029 = static-grep canary for what populates the `0x82006CF4` audio_system vtable at runtime + diff against ours. Provisional cascade A: cluster L1 PC fires; B: KeReleaseSemaphore(0x828A3230) 0→many; C: XAudioSubmitRenderDriverFrame 0→many; D: VdSwap climbs. NO source mods, NO commit, master HEAD `e061e21` unchanged. Trace `audit-runs/audit-028-steady-state-notify/`. +- [project_xenia_rs_audit_025_audio_thread_start_2026_05_06.md](project_xenia_rs_audit_025_audio_thread_start_2026_05_06.md) — **🎯 KRNBUG-AUDIT-025 (2026-05-07, READ-ONLY, master `de5a15e` post-Path-2)**: audio thread-start gate identified as **γ-DEEP, vtable-driven**. XAudioRegisterRenderDriverClient (exports.rs:2705) ≈ canary `xboxkrnl_audio.cc:56-82` semantically; `0x41550000|index` return matches. Audio init `sub_824D2C08` runs to completion in our impl (KeInitializeSemaphore=1 on 0x828A3230 limit=6, ExRegisterTitleTerminateNotification=3, ExCreateThread spawns tids 9 entry=0x824D2878 + 10 entry=0x824D2940, KeResumeThread=2). DISPATCHER_HEADERs correctly populated with Path 2's "XEN\0"+ptr stamp. **Worker correctly parks on `KeWaitForSingleObject(0x828A3254)` waiting for job-submit signal** — but `sub_824D23B0` (the ONLY KeSetEvent(0x828A3254) wake-source, at +0x54/+0x4FC/+0x688) is never reached. Probe set 12 PCs × -n 500M: only `0x824D2DF8` (ExRegTitleTerm in sub_824D2C08) fires; `0x824D23B0`/`0x824D2404` zero fires. **`sub_824D23B0` body at 0x824D2BD8 has ZERO static call-xrefs** — invoked only via vtable on audio_system object (`[r31+0]=0x82006CF4`). Caller would be a per-frame audio update from renderer/scenegraph = the **same `0x82287000-0x82294000` cluster identified by AUDIT-009 as unreached**. Audio gate IS the renderer gate — no new bug class, same γ-cluster blocker. tid 9 state: `Blocked(WaitAny [0x828A3254])` pc=0x824D28D0, signal_attempts=0. Discipline gate fails 1+3. **Recommended next: strategic pivot back to AUDIT-009/016/017's renderer cluster L1 callers + listener vtable population** — what kernel call materializes the listener-dispatch table so renderer can route per-frame audio. Audio worker host-thread emulation is option C (regresses swaps via xaudio-tick path). Trace `audit-runs/audit-025-audio-thread-start/probe.{log,err}`. Master HEAD `de5a15e` unchanged. +- [project_xenia_rs_kernel_stashhandle_2026_05_06.md](project_xenia_rs_kernel_stashhandle_2026_05_06.md) — **🎯 KRNBUG-α-006 (2026-05-07, LANDED branch `xobj-stashhandle/p0-canary-mirror` → master `de5a15e`)**: `ensure_dispatcher_object` (exports.rs:~3097) now writes `+0x08=0x58454E00` (`'X','E','N','\0'` kXObjSignature) and `+0x0C=ptr` (stash handle) per canary `XObject::StashHandle` (xobject.h:253-256). 7 LOC impl + 27 LOC tests. Tests 604→605. Lockstep `instructions=100000003 imports=987516` ×2 reruns (identical to pre-fix d9e40d3 — host-side write). Cascade @ -n 500M: NIL ripple — workers=20, KeReleaseSemaphore=0, XAudioSubmitRenderDriverFrame=0, NtSetEvent=3334, VdSwap=2 all match post-ke-resume baseline. **0x828F4838+0x08 still zeros** at -n 500M because guest never invokes Ke* with that ptr (canary uses `SetNativePointer` lifecycle there, which we don't traverse via `ensure_dispatcher_object`). Audit-024A's hypothesis that this stamp gates audio init is **observationally falsified post-fix**. Lands as canary-correctness restoration (sister of XamUserGetSigninState pattern); no sharp cascade prediction per task brief. Trace `audit-runs/post-stashhandle/`. Next: audio-thread-start gate (post-XAudioRegisterRenderDriverClient) — coordinate with sister AUDIT-025. +- [project_xenia_rs_audit_024a_canary_delayed_trigger_2026_05_06.md](project_xenia_rs_audit_024a_canary_delayed_trigger_2026_05_06.md) — **🎯 KRNBUG-AUDIT-024A (2026-05-07, READ-ONLY, canary patch+rebuild+revert)**: re-applied audit-023's mem-dump pattern with delayed trigger on first `XAudioSubmitRenderDriverFrame_entry` (39 LOC: cpu_flags hunk + xboxkrnl_audio.cc hook). Linux Debug build success after `/home/fabi/xenia-canary` symlink + `--disable_instruction_infocache=true`. Captured 248.6 MiB dump (260,659,200 bytes) at deep boot — canary log shows `KeReleaseSemaphore(0x828A3230,...)` firing repeatedly, VdSwap, VdRetrainEDRAM, texture loads. **AUDIT-017 β-class hypothesis FALSIFIED**: `[0x828F40B0]` (=0x828F4070+64) is **ALL ZEROS in canary** at this post-populator moment, while ours has `-1` sentinel from sub_821701c8. The `[+64]==-1` blocker is not the gate — canary admits `[+64]==0` or takes a different path entirely. **`0x828F4838+0x08` "XEN\0 + 0xF8000034" divergence stable** across audit-023 (early) and 024A (late) triggers — populator wrote it during early init. Heap pointers at `0x828F4838+0x20..0x60` populated in BOTH (canary `0xBC36xxxx`, ours `0x4024xxxx`). `0x828A3230` audio sema has full canary state (`05000000`, "XEN\0+F8000070", count=1, chain to F8000080/F800007C, ts BE628EDC1FCA7000 at +0x38) — KeReleaseSemaphore=0 in ours. Bug class refined β→**γ-deep**: the audio-thread that calls XAudioSubmitRenderDriverFrame is never started in ours despite `XAudioRegisterRenderDriverClient=1` and `KeInitializeSemaphore=1`. Patch reverted (canary `git status` clean). xenia-rs HEAD `d9e40d3` unchanged. Sister AUDIT-024B running in parallel for "XEN\0" writer source-grep. Next: cross-reference 024B's writer with our canary-only export queue (ExTerminateThread/KeReleaseSemaphore) OR α/δ probe of audio-thread-start gate post-XAudioRegisterRenderDriverClient. Trace `audit-runs/audit-024a-canary-diff/`. +- [project_xenia_rs_audit_023_canary_diff_2026_05_06.md](project_xenia_rs_audit_023_canary_diff_2026_05_06.md) — **🎯 KRNBUG-AUDIT-023 (2026-05-06, READ-ONLY, Path B canary patch+rebuild+revert)**: temp 44-LOC canary patch (cpu_flags + xam_notify) added `--memory_dump_path` flag, dumped 216 MB on first XamNotifyCreateListener (mask=0x2F). Linux Debug build success after `--disable_instruction_infocache=true` workaround for canary's pre-existing XexInfoCache SIGBUS. PageEntry `state` bitfield empirically at qword bits 60-61 (NOT declaration-order 48-49). Canary's first-listener dump shows v80=146 committed pages but 0x828F4070 area ALL ZEROS (too-early trigger — populator hadn't run). Diff vs ours @-n 50M reveals: (1) `0x828E1F08` ours has listener pointer 0x40111890, canary=0 (mechanism difference, ours stuffs guest-mem, canary uses host-side notify_listeners_ vector); (2) **`0x828F4838+0x08` canary has ASCII `"XEN\0"` + handle `0xF8000034`, ours has zeros — NEW POPULATOR-EFFECT LEAD inside audit-016/017 cluster**; (3) `0x82124xxx` audit-009 cluster L1 PCs visible as data — REFUTED as populator target, this is static .pdata exception table, byte-identical in ours; (4) various `0xBC...` host-physical aliases vs ours `0x40...` virtual aliases (handle-namespace difference, not bug). **Audit-017 β-class hypothesis NEITHER confirmed NOR refuted** (canary trigger too early). Patch reverted (`git status` clean). Trace `audit-runs/audit-023-canary-diff/canary-memory.dump,canary-patch.diff,parse_dump.py,diff.txt`. Master HEAD `d9e40d3` unchanged. Next: AUDIT-024 either re-apply with later trigger (Nth listener / first XAudioSubmitRenderDriverFrame / first NtSetEvent on specific event) for fair like-for-like, OR static-search canary's xboxkrnl source for "XEN\\0" writer at 0x828F4840 (names populator's CODE). +- [project_xenia_rs_kernel_ke_resume_thread_2026_05_06.md](project_xenia_rs_kernel_ke_resume_thread_2026_05_06.md) — **🎯 KRNBUG-KE-001 (2026-05-06, LANDED branch `ke-resume-thread/p0-canary-mirror`)**: real `KeResumeThread` per canary `xboxkrnl_threading.cc:216-227` (mirrors nt_resume_thread plumbing). 600→601 tests; lockstep `instructions=100000003 imports=987516` ×2. **Cascade A PASS**: tids 9 (entry=0x824D2878) / 10 (entry=0x824D2940) leave Suspended → run prologue → park on audio buffer-completion semaphores 0x828A3254 / 0x828A3230. **B PARTIAL FAIL**: NtSetEvent 667→3334; KeReleaseSemaphore=0; XAudioSubmitRenderDriverFrame=0. **C FAIL** (predicted 2→1, actual 2→2): both ExTerminateThread + KeReleaseSemaphore still canary-only. **D FAIL**: γ-cluster blocker unchanged — `--pc-probe=0x82184318,0x82184374` no fires; `--dump-addr=0x828F4070` no DUMP; signal_attempts on 0x1004/0x100c/0x1020/0x15e4 still 0. swaps=2 draws=0 plateau intact. Goldens re-baselined `n50m: instructions 50000003→50000011, imports 407255→407247`. **Necessary-but-not-sufficient fix**: workers unsuspend but park on a downstream gate that's part of the audit-009/-016/-017 γ-cluster (`[0x828F4070+64]==-1`). Trace `audit-runs/post-ke-resume/`. Next: AUDIT-019 memory-watch on `[0x828F4070+64]` (audit-017 Option B). +- [project_xenia_rs_audit_018_canary_diff_2026_05_06.md](project_xenia_rs_audit_018_canary_diff_2026_05_06.md) — **🎯 KRNBUG-AUDIT-018 (2026-05-06, READ-ONLY)**: canary-log diff identifies α-class load-bearing stub. Function-name set-diff: only 2 calls in canary not in ours — `ExTerminateThread`, `KeReleaseSemaphore`. The latter is hammered by canary tid `F800006C` (audio render-frame ticker, entry=0x824D2878, ctx=0, flags=0x10000001) which canary unsuspends via `KeResumeThread(KTHREAD_ptr)`. **Our `ke_resume_thread` (exports.rs:3658-3664) is a no-op stub that ignores r3 and sets r3=0** — comment claims `nt_resume_thread` covers it, but `KeResumeThread` is a separate export. Canary `xboxkrnl_threading.cc:216-227` calls `thread->Resume()`. **Result: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940) are `Blocked(Suspended)` at -n 500M end-of-run** despite our `KeResumeThread=2` counter matching canary. Bug class **α (load-bearing stub_success)**. **All 5 discipline boxes pass — first time since IO-004**. Fix is 5 LOC (mirror nt_resume_thread pattern). Sharp 4-dim cascade prediction: A=tids 9/10 leave Suspended; B=KeReleaseSemaphore non-zero; C=2→1 canary-only; D=open hypothesis on `[0x828F4070+64]` becoming non-(-1) if β-blocker is downstream of audio init. Trace `audit-runs/audit-018-canary-diff/ours.{log,stdout.log}`. Master HEAD `7ed6192` unchanged. Next: KRNBUG-α-005, branch `ke-resume-thread/p0-canary-mirror`. +- **XamUserGetSigninState landed (2026-05-06, master `7ed6192`)** — small canary-mirror fix at xam.rs:48 (returns 1 for user 0 per `xam_user.cc:90-101`). Tests 599→600. Lockstep `instructions=100000006` deterministic ×2 reruns (was 100000012). Cascade ripple: `XamUserReadProfileSettings` now fires 2× (was canary-only). Canary-only kernel exports: 3→2 (still missing: ExTerminateThread, KeReleaseSemaphore). β-class blocker `[0x828F4070+64]==-1` unmoved per audit-017. swaps=2 draws=0 plateau intact. +- [project_xenia_rs_audit_017_state_bits_writer_2026_05_06.md](project_xenia_rs_audit_017_state_bits_writer_2026_05_06.md) — **🎯 KRNBUG-AUDIT-017 (2026-05-06, READ-ONLY)**: 5 static bit-14/15 setters of `[listener+4]` found; case-0xA `0x82173e04` sets bit-15 once at cycle 9183060, sub_821737F0 work-path enters at 9183561, but bit-14 setter at 0x82173950 NEVER fires — gated at 0x821738E0 by `[r30+64]==-1` where r30=`[0x828F48B0+0]=0x828F4070` (singleton sub-object). `[+64]` initialized to -1 by sub_821701c8; only non-(-1) writer is sub_82184318:0x82184374 (`bl 0x82456B58 (kernel handle); stw r3, 64(r30)`); chain bottoms in audit-009 cluster (`sub_82187dd0 ← sub_82183ca8 ← sub_822919c8`). bit-28 of `[singleton+60]` set at cycle 9224352 by sub_821c4988 — too late, AND is a NEGATIVE gate. Bug class **β-dominant + α-tail** (`XamUserGetSigninState=stub_return_zero` at xam.rs:48 breaks 2 separate guest paths but won't fire cascade alone). Discipline gate fails 1+3. No fix. Trace `audit-runs/audit-017-state-bits-writer/probe{1..5}.log`. Master HEAD `d736a1d` unchanged. Next: AUDIT-018 either probe-confirm sub_82184318 chain (3rd γ-cluster confirmation → strategic pivot) or canary-log-diff for missing kernel writer of `[0x828F4070+64]`. +- [project_xenia_rs_audit_016_submitter_callers_2026_05_06.md](project_xenia_rs_audit_016_submitter_callers_2026_05_06.md) — **🎯 KRNBUG-AUDIT-016 (2026-05-06, READ-ONLY)**: 0/16 submitter-chain PCs fire at -n 500M across 4 levels of caller walk-up (workitem chain `sub_822AE1F0/sub_822F55F0 → sub_822C8B50 → sub_822C6808` + parents `sub_822ADD70/sub_821A9920/sub_822ACAB8/sub_821A8578` + grandparents `sub_82299250/sub_822A4460/sub_821A82A0`). Both static caller chains bottom-out in audit-009 unreached renderer cluster (`0x82294xxx` / `0x821A6xxx`). Listener-struct dump at `0x40ba9a80`: vtable populated, callback-table A `[+0x2C]=0x4024AC00` POPULATED (audit-015's "==0" claim was WRONG), callback-table B `[+0x3C]=0x4024B3E0` POPULATED, but `[+0x04]` dispatch-state-bits=0 — real gate is `sub_821737F0`'s bit-14/15 read of [+4]. Bug class refined δ→**γ (deeper indirection)**: chicken-and-egg vtable-registry-not-populated. Discipline gate fails 1+3. Probe-machinery anomaly: `sub_82174040` entry never fires despite mid-body PC executing — verify in AUDIT-017. Next: probe `sub_822F1AA8` frame-poll + writers of `[0x40ba9a80+4]` + `sub_82181D48` predicate. Trace `audit-runs/audit-016-submitter-callers/probe{,2}.{log,err}`. Master HEAD `d736a1d` unchanged. +- [project_xenia_rs_audit_015_l1_propagation_2026_05_06.md](project_xenia_rs_audit_015_l1_propagation_2026_05_06.md) — **🎯 KRNBUG-AUDIT-015 (2026-05-06, READ-ONLY, FORK B)**: 28/112 PCs fired at -n 500M post-IO-004. sub_82173DC8 dispatches all 4 startup notifications then idles via early-exit at 0x82173ed8 (`[r31+44]==0` callback-table NULL). Worker 0x822c6870 (tids 14, 15) parks on Semaphore handle 0x1308 (`signals=0 waits=2`); producer chain `sub_822AE1F0/sub_822F55F0 → sub_822C8B50 → sub_822C6808 → 0x824AB158 (NtReleaseSemaphore)` is unreached. Worker sub_824563E0 (tid=16) is healthy XAM inactivity-poll loop; not the gate. Worker sub_823DDB50 (tid=19) parks at entry on 0x160C Event/Auto. All 21 audit-009 baseline PCs still UNFIRED. Bug class **δ (pure-guest renderer)** — no kernel boundary stub. Discipline gate fails 1+3, no fix. Next session: probe submitter chain entries + `--dump-addr=0x40ba9a80` listener struct. Trace `audit-runs/audit-015-l1-propagation/`. Master HEAD `d736a1d` unchanged. +- [project_xenia_rs_audit_014_0x15e0_wake_2026_05_06.md](project_xenia_rs_audit_014_0x15e0_wake_2026_05_06.md) — **🎯 KRNBUG-AUDIT-014 (2026-05-06, READ-ONLY)**: 0x15e0 wake-eligibility hypothesis FALSIFIED. 0x15e0 is a Semaphore (creator `lr=0x824ab110`), `signal_attempts=1 waits=1 wakes=1`, healthy handshake (tid=1 wait → tid=16 NtReleaseSemaphore wake). tid=17 actually parks on **0x15e4** (Event/Manual, signals=0/waits=1/wakes=0, creator `lr=0x824a9f6c`) — same producer-missing class as 0x1004/0x100c/0x1020. Long-standing transcription error: AUDIT-002/008/009/IO-004 label "0x15e0 worker" should be "0x15e4 worker". Bug classes α-ζ all N/A. Discipline gate fails box 1. No fix. Master HEAD `d736a1d` unchanged. Trace `audit-runs/audit-014-0x15e0-wake/probe.{log,err}`. Next: Fork B branch-probe data on `sub_82173DC8 / 0x822c6870 / 0x824563e0 / 0x823ddb50` for the actual producer. +- [project_xenia_rs_io_004_xnotify_listener_2026_05_06.md](project_xenia_rs_io_004_xnotify_listener_2026_05_06.md) — **🎯 KRNBUG-IO-004 (2026-05-06, LANDED branch `xnotify-listener/p0-startup-enqueue`)**: real `XamNotifyCreateListener` + `XNotifyGetNext` per canary `kernel_state.cc:1013-1033` + `xam_notify.cc:22-96`. NotifyListener variant + 4 startup notifications on first listener (mask 0x2F): SystemUI/SystemSignInChanged on kXNotifySystem; LiveConnectionChanged(0x001510F1)/LiveLinkStateChanged on kXNotifyLive. 594→599 tests; lockstep `instructions=100000012` deterministic ×2 reruns. Phase 1.5 sanity probe confirmed CTR=0x82175338 (audit-012-predicted dispatch target unchanged). **Cascade: dispatch arm 0x822f1be8 fires; sub_82173DC8 entered repeatedly; 3/21 renderer-cluster L1 PCs newly reached (0x822c6870 from 2 workers, 0x824563e0, 0x823ddb50)**; canary-only 7→3 (KeResetEvent/ObCreateSymbolicLink/XamTaskCloseHandle/XamTaskSchedule re-classified to fired; still missing: ExTerminateThread/KeReleaseSemaphore/XamUserReadProfileSettings); worker count 18→20; signal_attempts on 0x15e0=1 (primary=1, was 0); draws=0 still expected at this step. LOC 119 ≤ 120. Trace `audit-runs/audit-013-io-004-phase1.5/`. +- [project_xenia_rs_cpp_runtime_audit_2026_05_06.md](project_xenia_rs_cpp_runtime_audit_2026_05_06.md) — **🔍 CPPBUG-AUDIT-001 (2026-05-06, READ-ONLY, BACKGROUND-BACKLOG)**: 0x825ED990 = CRT abort dispatcher (NOT _purecall — corrects audit-010); Sylpheed CRT is statically linked. Top-3 vtable=0 candidates REFUTED by audit-012. Real gaps for later: nt_allocate_virtual_memory silent-success-on-error (exports.rs:622-625) + heap.rs:465 silent-unmapped-write drop (combined = "phantom allocation"); mm_allocate_physical_memory_ex ignores alignment/range/protect; sync/eieio interpreter no-ops; RtlRaiseException stub doesn't fatal-stop on MSVC throws. Track for after draws>0. +- [project_xenia_rs_audit_012_vtable_zero_2026_05_06.md](project_xenia_rs_audit_012_vtable_zero_2026_05_06.md) — **🎯 KRNBUG-AUDIT-012 (2026-05-06, READ-ONLY)**: ALL FIVE bug-class hypotheses for vtable=0 REFUTED. Vtable IS correctly initialized: mem[0x40111890+0] transitions monotonic 0→0x820AD894→0x820A183C, stays. AUDIT-011's "vtable=0" was a misread (captured PC 0x8284E45C XAM thunk, treated lwz address as live PC). AUDIT-010's "vtable[1]=0x825ED990 abort" was the inner ctor's transient vtable, overwritten 3 instructions later. Real runtime vtable[1] = thunk 0x82175338 → sub_82173DC8. Discipline gate now PASSES for AUDIT-011's listener fix. Next: KRNBUG-IO-004 (real xnotify queue per kernel_state.cc:1013-1033 + xam_notify.cc) with Phase 1.5 sanity probe at 0x822f1c00 (expect CTR=0x82175338) before commit. +- [project_xenia_rs_audit_010_xnotify_diff_2026_05_05.md](project_xenia_rs_audit_010_xnotify_diff_2026_05_05.md) — **🎯 KRNBUG-AUDIT-010 (2026-05-05, READ-ONLY)**: branch (α) — xnotify_get_next + xam_notify_create_listener are stubs; canary auto-enqueues 4 startup notifications on listener registration (SystemUI/SystemSignInChanged/LiveConnectionChanged/LiveLinkStateChanged). Discipline gate FAILS box 3: vtable[1] of dispatcher (mem[0x828E1F08]) statically=0x825ED990 abort handler — needs runtime --pc-probe before fix. Provisional cascade: XamUserReadProfileSettings fires next. Trace `audit-runs/audit-010/findings.md`. Master HEAD `50a4887` unchanged. +- [project_xenia_rs_audit_009_renderer_unreached_2026_05_05.md](project_xenia_rs_audit_009_renderer_unreached_2026_05_05.md) — **🎯 KRNBUG-AUDIT-009 (2026-05-05, READ-ONLY DIAGNOSTIC)**: 0/21 PCs fired at -n 500M (12 audit-008-recommended renderer-cluster parents+shims+dispatcher + 9 audit-005 producer-callsites). Stop condition 1 triggered. The 0x82287000-0x82294000 cluster is structurally above its observed call boundary — likely reached via vtable/function-pointer that's never populated (sylpheed.db: zero non-call xrefs to its level-1 roots `sub_82293448`/`sub_822919C8`). Main parks in `sub_822F1AA8` frame-poll loop forever (XNotifyGetNext=1.49M, NtWaitForSingleObjectEx=1.49M, RtlEnter/LeaveCS=889k each). 18 workers spawned incl. 0x100c (tid=3, ctx=0x828F3D08), 0x1004 (tid=11, ctx=0x828F3EC0), 0x15e0 (tid=17, ctx=0x828F4070) — all parked, signal_attempts=0. canary-only exports unchanged: ExTerminateThread/KeReleaseSemaphore/XamUserReadProfileSettings. Discipline gate fails boxes 1+3. No fix. Next probe set: cluster L1 roots (sub_82293448/sub_822919C8/sub_82288028/sub_82292d80/sub_822851e0/sub_82286bc8) + new thread entries (0x822c6870/0x824563e0/0x823dde30/0x823ddb50) + main frame-poll callees + main's post-poll continuation (sub_822F1638/sub_8216F088/sub_82173360 etc). Trace at `audit-runs/audit-009/probe-500m.{log,err}` (branch-probe.trace EMPTY). +- [project_xenia_rs_audit_008_branch_probe_2026_05_05.md](project_xenia_rs_audit_008_branch_probe_2026_05_05.md) — **🎯 KRNBUG-AUDIT-008 (2026-05-05, READ-ONLY DIAGNOSTIC)**: Model reset on IO-003 cascade. 0x100c worker IS spawned post-IO-003 (tid=3, ctx=0x828F3D08, entry=0x82181830, parked on event 0x1020). Same for 0x1004 (tid=11), 0x15e0 (tid=6). Real next gate is β-class: 5 non-create-chain callers of `sub_821800D8` (shims `bl getter; lwz r3,80(r3); bl sub_824AA1D8` at 0x821802D8/06E0/0B28/0EA0/1254) are never called; parents live in 0x82287000-0x82292FFF (renderer/scene-graph). **AUDIT-009 falsified the audit-008 hypothesis: those parents are themselves not entered — gate is one level higher still.** Discipline gate failed boxes 1+4. Trace at `audit-runs/audit-008/`. +- [project_xenia_rs_io_003_ioctl_2026_05_04.md](project_xenia_rs_io_003_ioctl_2026_05_04.md) — **🎯 KRNBUG-IO-003 (2026-05-04, LANDED branch `xboxkrnl-ioctl/p0-fsctl-mountinfo`)**: `nt_device_io_control_file` real impl per canary `NullDevice::IoControl` for FsCtlCodes 0x70000 + 0x74004. **Cascade fired**: priv-11 query runs, XamTaskSchedule fires, canary-only exports 7→3, AND 0x100c worker (tid=3, ctx=0x828F3D08) + 0x1004 worker (tid=11, ctx=0x828F3EC0) + 0x15e0 worker (tid=17, ctx=0x828F4070) all spawn (the original IO-003 prediction-scorecard's "0x100c UNCREATED / spawn count unchanged" marks were wrong per AUDIT-008 — workers were always there post-IO-003, just unlinked from dispatcher addresses in the audit). 592→594 tests; lockstep deterministic. Stack args 9-10 land at `[sp+0x54]` / `[sp+0x5C]` (Xbox 360 PPC EABI param save area = sp+0x14 + 64). `sylpheed_n50m` re-baselined `50000004→50000003`, `imports 407362→407255`. Still canary-only: `ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings`. **All 3 workers parked, signal_attempts=0** — the producer-side cascade is downstream of where IO-003 reaches; AUDIT-008 + AUDIT-009 trace it to the unreached 0x82287-0x82294 renderer cluster. +- [project_xenia_rs_audit_007_branch_probe_2026_05_04.md](project_xenia_rs_audit_007_branch_probe_2026_05_04.md) — **🎯 KRNBUG-AUDIT-007 (2026-05-04, READ-ONLY branch `investigate-sub-824a9710/p0-branch-probe`)**: `--branch-probe` instrumentation landed; runtime trace decisively identifies the priv-11 gate. **Exit branch: `0x824a9944` (post-bl sub_824ABD88 first call, r3=0xC0000034)**. Root cause: `NtDeviceIoControlFile` is `stub_success` at `exports.rs:90` — game-side `sub_824ABD88:0x824abe9c-eb0` reads `[out_buf+8]` of the FsCtlCode=0x74004 IOCTL response, finds zero (stub doesn't write OUT), assigns hardcoded `r3=0xC0000034` (STATUS_OBJECT_NAME_NOT_FOUND) at 0x824abea8-ac, propagates to caller, gates priv-11 site at `0x824a99a0` indefinitely. 592→592 tests, lockstep deterministic. **Next session = KRNBUG-IO-003**: implement `nt_device_io_control_file` per canary `NullDevice::IoControl` for FsCtlCodes 0x70000 + 0x74004. Predicted cascade: priv-11 fires + XamTaskSchedule fires + 0x100c worker spawn + 7→≤3 canary-only exports. +- [project_xenia_rs_io_002_volallocunit_2026_05_04.md](project_xenia_rs_io_002_volallocunit_2026_05_04.md) — **🎯 KRNBUG-IO-002 (2026-05-04, LANDED branch `xboxkrnl-vol-allocunit/p0-65536-cluster`)**: vol-info class-3 fixed from 2048→65536 alloc unit (canary NullDevice byte-identical). 591→592 tests, lockstep deterministic. **Audit-006's predicted 7→0 cascade FALSIFIED** (7→7 unchanged): all 16 NtQueryVolumeInformationFile calls originate from a single LR `0x82611f38` and complete successfully — vol-info is NOT the priv-11 gate. Stop condition triggered, no second fix attempted. Next session: `--pc-probe` on `sub_824A9710` entry to find the actual gate (priv-11 site has never fired in any session). +- [project_xenia_rs_audit_006_export_queue_2026_05_04.md](project_xenia_rs_audit_006_export_queue_2026_05_04.md) — **🎯 KRNBUG-AUDIT-006 (2026-05-04, READ-ONLY)**: 7/7 canary-only exports all REAL_BUT_UNREACHED → next session is **KRNBUG-IO-002** (block-size 2048→65536 in `exports.rs:1241-1269`, ≤4 LOC). Queue: `xenia-rs/audit-runs/audit-006/canary_export_queue.md`. **Cascade prediction FALSIFIED post-IO-002 — see io_002 memory.** +- [project_xenia_rs_io_nullfile_2026_05_04.md](project_xenia_rs_io_nullfile_2026_05_04.md) — **🎯 KRNBUG-IO-001 (2026-05-04, LANDED master `556a8c3`)**: `nt_read_file` on synth-empty files returns SUCCESS+0 instead of EOF, mirroring canary `NullFile::ReadSync`. AUDIT-005's attribution to `sub_824ABA98` was wrong — runtime trace decisively located the failure at the `NtReadFile(\Device\Harddisk0\partition0, off=2048, len=1024)` call inside `sub_824A9710` at PC `0x824a9810`. **`sub_824ABA98 = VerifyDirBlockSize(path, expected_alloc_unit_bytes)`**, **`sub_824ABD88 = MaybeMountAndIoctl`** (NtOpenFile WindowsPartition + NtDeviceIoControlFile 0x70000+0x4004). 590→591 tests. Lockstep BIT-IDENTICAL × 3 reruns. **Cascade walked massively: canary-only exports 10 → 7** (XeCryptSha + XeKeysConsolePrivateKeySign + NtDeviceIoControlFile now run; cache-recreate path executes through to NtWriteFile). Worker threads **6 → 19** at -n 500M (tid=10 `0x82178950` and tid=16 `0x82170430` — the original `0x1004`/`0x15e0` workers — now spawn). `n50m` golden re-baselined `50000004→50000000`, `imports 407416→407362`. **Next blockers**: (1) XamTaskSchedule cluster + ExTerminateThread/KeReleaseSemaphore/KeResetEvent/ObCreateSymbolicLink/XamTaskCloseHandle/XamUserReadProfileSettings — the 7 still-canary-only exports; (2) **block-size mismatch**: `nt_query_volume_information_file` returns `SectorsPerAllocUnit=1, BytesPerSector=2048` (=2048) but Sylpheed expects `0x10000=65536` from `main(1, 0x10000, 0xFF000)` → `sub_824A9710 r27=0x10000`; sub_824ABA98 will return `0xC000014F` when the recreate path eventually reaches it. New parked sites: 0x12fc/0x1600/0x1040/0x10b8/0x15e8/0x1014/0x101c/0x10bc/0x1044 + 0x42450b5c. `swaps=2 draws=0` plateau persists. +- [project_xenia_rs_xam_avpack_hdmi_2026_05_04.md](project_xenia_rs_xam_avpack_hdmi_2026_05_04.md) — **🎯 KRNBUG-XAM-001 (2026-05-04, LANDED)**: `XGetAVPack` `0x16` → `8` (HDMI). One-line at `xenia-kernel/src/xam.rs:383` mirroring canary `xam_info.cc:35` (`DEFINE_int32(avpack, 8)`); Sylpheed accepts `{3,4,6,8}` only (`xam_info.cc:250-251`). 589→590 tests. Lockstep deterministic across 3 reruns: `instructions=100000010, import_calls=987686 (+2.4×), VdSwap=2`; `n50m` golden re-baselined `50000005→50000004`. **Canary diff: 11 → 10 missing exports** (XGetAVPack matched). Cascade went exactly **one step**. **NEW telemetry signal**: `RtlNtStatusToDosError(c0000011)` from `lr=0x824a97e4` post-fix — `sub_824A9710` IS being entered now but priv-11 query never fires (a precondition exits early). **Next blocker: `sub_824ABA98` returning negative NTSTATUS** (per AUDIT-005 disasm); if cleared, `XeCryptSha`/`XeKeysConsolePrivateKeySign`/`NtDeviceIoControlFile`/etc. should follow. Parked handles 0x1004/0x100c/0x15e0 still `signal_attempts=0`; 9-PC producer probe still 0×; `swaps=2 draws=0` plateau persists. +- [project_xenia_rs_xex_priv_fix_2026_05_04.md](project_xenia_rs_xex_priv_fix_2026_05_04.md) — **🎯 KRNBUG-XEX-001 (2026-05-04, LANDED)**: real `XexCheckExecutablePrivilege` reading XEX `SYSTEM_FLAGS=0x00030000` bitmap (Sylpheed=`0x00000400`, bit 10 set). 588→589 tests. Lockstep deterministic at new value (`instructions=50000005, imports=407417, swaps=2, draws=0` × 3 reruns). Goldens re-baselined. **Priv-10 gate FLIPPED** — `XGetAVPack: 0→1`. Other 10 canary-only exports + 9 producer PCs + 3 parked handles still unchanged: priv-11 site at `sub_824A9710` is downstream and not reached because the AV/crypto block aborts after `XGetAVPack`. **Next blocker: `XGetAVPack` returns `0x16`** — canary returns `cvars::avpack` (default 8 = HDMI), and Sylpheed accepts only 3/4/6/8 (xenia-canary `xam_info.cc:250-251`). One-line follow-up at `xenia-kernel/src/xam.rs:383`. +- [project_xenia_rs_audit_005_priv_stub_2026_05_04.md](project_xenia_rs_audit_005_priv_stub_2026_05_04.md) — **🎯 KRNBUG-AUDIT-005 (2026-05-04, LANDED master `451b3b2`)**: --pc-probe extension + canary kernel-call diff. 9 producer PCs unreached at -n 500M (failure mode α). **Root cause: `XexCheckExecutablePrivilege` is `stub_return_zero`** — gates XGetAVPack (priv=10) and XamTaskSchedule (priv=11) via opposite polarities, so guest walks wrong arm of every priv-gated branch and skips the entire init flow that populates dispatcher fields. 11 exports canary calls and we don't (XGetAVPack/XeCryptSha/XamTaskSchedule/...). Next: implement real priv-bit lookup from XEX header. +- [project_xenia_rs_audit_004_ctor_probe_2026_05_04.md](project_xenia_rs_audit_004_ctor_probe_2026_05_04.md) — **🎯 KRNBUG-AUDIT-004 (2026-05-04, LANDED master `6a070be`)**: read-only `--ctor-probe=PC` + `--dump-addr=ADDR` diagnostics; 586→588 tests; lockstep `instructions=100000002` preserved. **DECISIVE: "8-instance pool" hypothesis FALSE** — handle 0x1004 has a SINGLE dispatcher at `0x828F3EC0`; inner per-instance ctors `[0x821783D8,0x82181750,0x821701C8]` each fire EXACTLY ONCE. The "called 8 times" claim from AUDIT-002/003 came from miscounting OUTER getter `sub_8217C850` entries — itself a Meyers singleton-getter. **Producer indirection layer IDENTIFIED**: outer getters `sub_821800D8` (0x100c) and `sub_8216F618` (0x15e0) have 5+4 non-create-chain callers using canonical pattern `bl outer_getter; lwz r3, OFFSET(r3); bl 0x824AA1D8` (OFFSET=80 for 0x100c, =36 for 0x15e0). Static byte-scan of .rdata/.data shows 0 hits → no registry table; indirection is via the singleton-getter return value. **Interpretation (2) confirmed.** 9 producer-callsite PCs ready for next-session probe to discriminate failure mode A (producer never reached) from B (producer fires but reads zero from dispatcher field). Files: `crates/xenia-kernel/src/state.rs` (`fire_ctor_probe_if_match`, `ctor_probe_pcs`, `dump_addrs`), `crates/xenia-app/src/main.rs` (CLI wiring + 128-byte struct dumper). Trace at `audit-runs/audit-004/`. +- [project_xenia_rs_audit_003_class_probe_2026_05_03.md](project_xenia_rs_audit_003_class_probe_2026_05_03.md) — **🎯 KRNBUG-AUDIT-003 (2026-05-03, LANDED master `48eed25`)**: vtable/RTTI class-readout helper + create-time + wait-time per-frame class probes. 581→586 tests; lockstep `instructions=100000002` preserved. **Identified dispatcher addresses**: handle 0x100c → `0x828F3D08` (verified by `[this+0]=-1` POD struct + sub_82181750 disasm + xref table); handle 0x15e0 → `0x828F4070` (xref table). RTTI is **stripped**; dispatchers are hand-rolled job queues, NOT C++ polymorphic classes (so no class names — `[this+0]=-1` sentinel, not a vtable). **Producer hunt deliverable**: `xrefs` table audit shows EVERY reference to 0x828F3D08 / 0x828F4070 is in a ctor or the CRT — NO submitter code references either dispatcher in static analysis. Confirms unreachable-producer hypothesis. Handle 0x1004's 8-instance pool member addresses still need offline analysis (saved-r31 in MSVC ctors didn't preserve `this`; need to hook sub_8217C850 to capture each pool element's r3). 0x42450b5c remains a separate bug class (heap-allocated, AUDIT_BLIND). Files: `crates/xenia-kernel/src/state.rs` (`read_class_at_this`, `probe_create_stack_classes`), `crates/xenia-app/src/main.rs` (WAIT-side dump). Trace at `audit-runs/audit-003/run-500m-v4.txt`. +- [project_xenia_rs_producer_stack_trace_2026_05_03.md](project_xenia_rs_producer_stack_trace_2026_05_03.md) — **🔍 KRNBUG-AUDIT-002 (2026-05-03, LANDED master `6440261`)**: multi-frame back-chain capture at `NtCreateEvent`/`NtCreateSemaphore`/`NtCreateTimer`/`XamTaskSchedule` gated on `--trace-handles-focus`; 576→581 tests; lockstep `sylpheed_n50m` BIT-IDENTICAL. **Subsystems identified**: 0x1004 = static-ctor 8-instance pool (sub_821783D8 + sub_8217C850 chain → static ctor 0x8280F810 calls bridge 8×); 0x100c = singleton built inside main() (sub_8216EA68 = main); 0x15e0 = singleton in distinct cluster (sub_82172BA0 chain). All 3 ctors share identical 4-callee shape (Rtl InitCS + silph::Event ctor + silph internals); all 3 workers do `silph::Thread::SetProcessor(CURRENT,5)` first thing. **Corrections to prior memory**: (1) third handle is **0x15e0**, not 0x15e4 (transcription typo); (2) **0x42450b5c is not a kernel handle** — it's a guest-heap pointer (0x4xxxxxxx), tid=6 parks via a non-`do_wait_single` path (` `) — separate bug class. Walker is in `state.rs::walk_guest_back_chain` (PPC EABI back-chain, gated, read-only). +- [project_xenia_rs_xaudio_register_driver_2026_05_03.md](project_xenia_rs_xaudio_register_driver_2026_05_03.md) — **🎯 APUBUG-PRODUCER-001 (2026-05-03)**: XAudio register stub replaced with canary-faithful registration + dual-mode ticker (`XAUDIO_INSTR_PERIOD=48k` / `XAUDIO_PERIOD=5.333ms`) + `try_inject_audio_callback` reusing SavedCallbackCtx; 562→576 tests. Ticker gated **default-off** behind `--xaudio-tick`/`XENIA_XAUDIO_TICK=1` so lockstep `sylpheed_n*m.json` goldens stay green. **Producer hypothesis FALSIFIED for handles 0x1004/0x100c/0x15e4** — at `-n 500M --xaudio-tick` all 3 still show `signal_attempts=0`. Side-effect: under the flag the audio callback fires once, hijacks a guest HW thread on a `KeWaitForSingleObject` infinite loop (4M waits, swaps regress 2→1). Next candidate: **Timer DPC** (`KeSetTimer` / `KeInsertQueueDpc`). Master HEAD `9d45efe`. +- [project_xenia_rs_xam_task_schedule_2026_05_03.md](project_xenia_rs_xam_task_schedule_2026_05_03.md) — **🎯 XAMBUG-PRODUCER-001 (2026-05-03)**: XamTaskSchedule stub replaced with canary-faithful real spawn; 561→562 tests; lockstep `instructions=100000002` preserved. **Producer hypothesis FALSIFIED for handles 0x1004/0x100c/0x15e4** — counter `kernel.calls{XamTaskSchedule}` never appears at -n 500M (call site `0x824a9a10` unreached). Boot stalls before XamTaskSchedule. Next candidate: `XAudioRegisterRenderDriverClient` (counter=1, currently stub). Master HEAD `38f78c8`. +- [project_xenia_rs_audit_2026_05_followup_session.md](project_xenia_rs_audit_2026_05_followup_session.md) — **🎯 FOLLOW-UP SESSION COMPLETE (2026-05-03)**: 3 audit IDs landed (GPUBUG-DRAIN-001 vd_swap fallback warning silenced + new `drain_until_wptr`; KRNBUG-AUDIT-001 ghost-trail diagnostic with `--trace-handles-focus`; KRNBUG-D08 wall-clock vsync under `--parallel`). Tests 556→561. Lockstep BIT-IDENTICAL. **DECISIVE FINDING**: parked-waiter handles 0x1004/0x100c/0x15e4 show `signal_attempts=0 (primary=0, ghost=0)` after 500M instructions — producer is genuinely missing, **NOT a wake-eligibility bug or BST-paradox**. 3 share creator `lr=0x824a9f6c` + wait-wrapper `lr=0x824ac578`. Next session: producer hunt (file I/O completion / XAM async / XAudio buffer-complete / Timer DPC). Master HEAD `b54aa48`. +- [project_xenia_rs_fix_session_2026_05_03.md](project_xenia_rs_fix_session_2026_05_03.md) — **🛠️ AUDIT FIX SPRINT (2026-05-03)**: applied 11 commits closing 12 audit IDs across 4 of 8 planned phases. **swaps 1→2** confirmed (Phase A SWAPBUG-001). VdSwap PM4 ring path live (Phase C). Shader operand decode fixed (D1/D2/D3). 8 register addresses + index_size bit corrected (E). Kf-spinlock real impl (F1). 2 P1s (G1 GPUBUG-006 mmio ordering, G2 XMODBUG-002 write_bulk page bumps). **`draws=0` persists at -n 100M lockstep** — renderer plateau is multi-causal, parked-waiter handles still unresolved. Next session: trace producers for handles 0x1004/0x100c/0x15e4/0x42450b5c. Tests 551→556. Plan: `we-just-finished-a-shiny-conway.md`. **Engineering gotchas saved**: VdSwap buffer_ptr is NOT in primary ring; D1's c-vs-temp selector is at w0 bits 29-31 not bit 7; canary's addic actually does full 64-bit add (Plan agent was wrong, G3 deferred); `--stable-digest` flag added to xenia-rs check for byte-exact lockstep determinism. +- [project_xenia_rs_audit_2026_05_02.md](project_xenia_rs_audit_2026_05_02.md) — **🎯 COMPREHENSIVE AUDIT COMPLETE (2026-05-02)**: 13-milestone read-only audit of all modules vs canary. **197 finding IDs (15 P0, 40 P1) across 9 prefixes**. **SWAP REGRESSION SOLVED**: SWAPBUG-001 = PPCBUG-001 (addi 32-bit truncation in `bf8208e` at `interpreter.rs:114-118`) — single revert restores swaps=2. **Renderer plateau explained (multi-causal)**: VdSwap PM4 ring bypass + 5 P0 GPU shader/draw bugs (operand modifiers, constant-reg selector, vertex endian, 8 register addresses). Memory write-visibility NOT broken. Parked-waiter handles still unexplained. Final report: `xenia-rs/audit-2026-05-final.md`. +- [project_xenia_rs_ppc_audit_2026_04_29.md](project_xenia_rs_ppc_audit_2026_04_29.md) — **🔍 PPC AUDIT COMPLETE (2026-04-29)**: 253 PPCBUG IDs (~55 HIGH, ~75 MEDIUM, 5 retracted). Audit-only, no code changes. **Triaged fix-order plan at `xenia-rs/audit-report-2026-04-29.md`** — start there for fix session. Detailed per-bug entries at `xenia-rs/audit-findings.md`. **Headline finds**: PPCBUG-107 cascade (50+ stores missing `invalidate_for_write` → cross-thread atomics broken, likely Sylpheed renderer cause); 8 decoder/field-extraction bugs collapse into 6 missing accessors + 1 wrong sh64 + 1 missing decode_op6 entry (Phase 2 sweep); PPCBUG-046 (`clrldi r3, r4, 32` no-op); PPCBUG-053+054 (broken `bdnz` after `negx`); PPCBUG-510 (stvewx128 corrupts 12 bytes); PPCBUG-424/425 (vmaddfp128/vmaddcfp128 operand swap — every D3D FMA wrong). 14 must-land-together coupling pairs documented. Audit verified mechanically: every tracker entry referenced in the report. +- [project_xenia_rs_addis_signext_root_cause_2026_04_29.md](project_xenia_rs_addis_signext_root_cause_2026_04_29.md) — **🎯 ROOT CAUSE FIX (2026-04-29)**: addis was sign-extending simm16 to 64 bits per PPC ISA, but Xbox 360 user code runs in 32-bit ABI. When sign-extended addis result mixed with zero-extended lwz value, the 64-bit unsigned subfc compare yielded wrong CA, breaking BST traversals. Fix: truncate addis result to 32 bits (`result as u32 as u64`). Throw at sub_82175F10→sub_82454770 fully silenced WITHOUT the r31=14 hack (now removed). All 506+ tests pass. -n 4B runs clean. Renderer plateau at swaps=2 persists — not caused by the addis bug. Lookup other simm16-immediate instructions (`addi rD,r0,...`, `addic`, `subfic`) for similar bugs if more issues surface. +- [project_xenia_rs_sylpheed_event_chain_2026_04_29.md](project_xenia_rs_sylpheed_event_chain_2026_04_29.md) — **Stage 3 Path A traced + DECISIVE FINDING (2026-04-29)**: The BST callback-walker hypothesis is RULED OUT (BST module has no walker; only 2 indirect calls in the module are byte-walkers for string transforms). HOWEVER traced upstream and **found that 0x828F3F68 IS registered in the BST by sub_82175E68 at instruction 0x82179134, eight instructions before the validation site sub_82175F10 at 0x82179144 — same function, same thread, sequential execution**. This means the PPC validator's failure to find 0x828F3F68 in the just-populated BST is the PRIMARY bug. **Our throw fix masks it but doesn't fix it.** The likely same memory-coherence issue prevents event 0x1004's signal from being visible to its waiter. Next session: trace specific guest-memory addresses (e.g. `0x40249F68`) at the emulator level, log every write+read with PC, find the visibility bug. This is the unresolved paradox from [project_xenia_rs_sylpheed_throw_2026_04_28.md](project_xenia_rs_sylpheed_throw_2026_04_28.md) — now confirmed as load-bearing. +- [project_xenia_rs_sylpheed_stage3_2026_04_29.md](project_xenia_rs_sylpheed_stage3_2026_04_29.md) — **Stage 3 thread-state map (2026-04-29)**: post-throw-fix run at -n 4B confirms deadlock isn't slow-init. 10 worker threads parked, 4 of them on `mr=true` events with `sig=false`: handle 0x1004 (tid=10, sub_82178950), 0x100c (tid=2, sub_82181830), 0x15e4 (tid=16, sub_82170430), 0x42450b5c (tid=6, sub_824CD458). tid=1 main is in a healthy frame-poll loop (PC=0x822F1E00 inside sub_822F1AA8). The throw fix is necessary but not sufficient — Sylpheed renderer cascade has additional breaks. Next session candidates: (A) trace producer for event 0x1004, (B) per-handle NtSetEvent telemetry, (C) Canary diff. +- [project_xenia_rs_sylpheed_throw_fix_2026_04_29.md](project_xenia_rs_sylpheed_throw_fix_2026_04_29.md) — **Sylpheed throw silenced (2026-04-29)**: `rtl_leave_critical_section` HLE detects the failing BST validation (cs=0x828F3DA8, lr=0x824546C8, our Rust CEIL finds the node, but PPC computed r31=32) and overrides ctx.gpr[31]=14 → sub_82454600 returns valid → no throw. Game advances to loading renderer resources (ptc_pack.xpr) + spawning all 18 worker threads. **But draws=0 plateau persists** — Stage 2 gate NOT met. The PPC-vs-Rust traversal paradox remains unexplained. Workers park on unsignaled events (Stage 3 territory). +- [project_xenia_rs_sylpheed_throw_2026_04_28.md](project_xenia_rs_sylpheed_throw_2026_04_28.md) — **Sylpheed VdSwap=2 plateau diagnosed (2026-04-28)**: rtl_raise_exception rewritten with correct EXCEPTION_RECORD layout + 6-level PPC stack walk + runtime_error decoder (one-shot via new `KernelState::cxx_throw_logged`). Single throw on tid=1 at ~1.2s: `std::runtime_error("lhs is not valid instance")` at PC `0x824547e4` in `sub_82454770` (a generic intrusive-list validator with 29 callers, called from a chain inside `silph::Silph::Impl::OnInit`'s config-tree walker). Canary's RtlRaiseException is also a stub — so the divergence is upstream. Memory file lists next-session candidates (trace registry, or implement minimal SEH). +- [project_xenia_rs_hle_import_fixes_2026_04_27.md](project_xenia_rs_hle_import_fixes_2026_04_27.md) — **HLE import fixes (2026-04-27)**: KeInitializeSemaphore now seeds count/limit (was zero-fill), XexGet{Module,Procedure}Address use distinct `HMODULE_XBOXKRNL`/`HMODULE_XAM` pseudo-handles + reverse `(ModuleId,ordinal)→thunk_addr` map populated from main.rs Phase 1. 76 kernel tests pass; -n 30M --parallel still reaches VdSwap=2 with unimpl=0. +- [project_xenia_rs_disasm_unify_phase4.md](project_xenia_rs_disasm_unify_phase4.md) — **Disassembler unification Phase 4 complete (2026-04-27)**: assert-based JSON-fixture goldens for base/extended/VMX128 mnemonics + 7 VMX128 accessor unit tests + analysis-shim parity test + DB schema golden (PRAGMA table_info per-table, 5 SQL views). Old println-only audits deleted. All 4 phases complete; constraints honored end-to-end. +- [project_xenia_rs_disasm_unify_phase3.md](project_xenia_rs_disasm_unify_phase3.md) — **Disassembler unification Phase 3 complete (2026-04-27)**: db.rs split into `ingest_instructions` + `write_analysis_results`; new `target_hex` column on instructions; `sql_views.rs` defines 5 additive views; new `--analyze=rust|sql|both` flag (default rust). Cross-check confirms Rust and SQL agree on 299,615 branch xrefs; reachability: 7,557/12,156 functions (62%) reachable from entry. Two bugs found+fixed: kind-tag mismatch (xrefs.kind uses short `br`/`j`/`call`, not long names) and reachability seed-collapse. +- [project_xenia_rs_disasm_unify_phase2.md](project_xenia_rs_disasm_unify_phase2.md) — **Disassembler unification Phase 2 complete (2026-04-27)**: `iter_disasm` iterator in xenia-cpu yields `DisasmItem`s; `enrich_section` adds analysis context; 3 sinks (text/JSON/DuckDB) consume `RichDisasmItem`. New `xenia dis --json` flag. db.rs and formatter.rs both drive through the iterator. End-to-end smoke verified: 1.87M rows match between DB and JSONL. +- [project_xenia_rs_disasm_unify_phase1.md](project_xenia_rs_disasm_unify_phase1.md) — **Disassembler unification Phase 1 complete (2026-04-27)**: single source of truth in `xenia-cpu/disasm.rs` (`format` returns `DisasmText`); analysis `ppc.rs` collapsed 1374→30 LOC shim; `DecodedInstr` unchanged at 8 bytes; silent VMX128 bit-position bug fixed. Phases 2-4 (iterator+sinks, ingest/analyze split + SQL views, fixture goldens) pending. +- [project_xenia_rs_m3_realpar_step_08.md](project_xenia_rs_m3_realpar_step_08.md) — **M3 real-par Step 08 / SESSION COMPLETE (2026-04-27)**: real per-HW-thread parallelism landed. N=6 workers + coord + 7-party phaser. 430 tests; 4 lockstep combos match golden; --parallel boots sylpheed to VdSwap=2 in 57s; 20× stress passed. **Perf gate NOT met** — --parallel ~24× slower than lockstep (target was 1.5× faster); deferred parking (Step 05) is the next session's first task. +- [project_xenia_rs_m3_realpar_step_06_07.md](project_xenia_rs_m3_realpar_step_06_07.md) — **M3 real-par Step 06+07 (2026-04-27)**: stress harness (parallel_stress.rs) — 20×@5M passed; perf gate measured — 24× slowdown vs lockstep. parallel_stress_long (100×@50M) wired but #[ignore]-gated (impractical at current perf). +- [project_xenia_rs_m3_realpar_step_05.md](project_xenia_rs_m3_realpar_step_05.md) — **M3 real-par Step 05 (2026-04-27)**: slot-wake parking attempted but DEFERRED. TOCTOU race between coord's mask publish and worker's mask read across round boundaries — the phaser counter wrapped, B2 timed out. Reverted to Step 04 design (workers always arrive at B1). Documented 3 race-free alternatives for follow-up. +- [project_xenia_rs_m3_realpar_step_04.md](project_xenia_rs_m3_realpar_step_04.md) — **M3 real-par Step 04 (2026-04-27)**: real N=6 workers + main-thread coordinator + 7-party phaser via thread::scope. 5 lockstep combos match golden; --parallel digest diverges ~7 instr at -n 2M (expected); -n 30M --parallel reaches VdSwap=2 with halts==0. ~18x slower than lockstep (Step 05+07 will address). +- [project_xenia_rs_m3_realpar_step_03.md](project_xenia_rs_m3_realpar_step_03.md) — **M3 real-par Step 03 (2026-04-26)**: run_execution_parallel with per-round drop/reacquire around step_block; --parallel branch routes through it. Single worker; 430 tests; 6 golden combos match; sylpheed -n 30M --parallel reaches VdSwap=2 (3866ms). +- [project_xenia_rs_m3_realpar_step_02.md](project_xenia_rs_m3_realpar_step_02.md) — **M3 real-par Step 02 (2026-04-26)**: per-slot body split into worker_prologue + worker_epilogue, WorkerCtx owns per-HW-slot block+decode cache. Lockstep bit-identical; 430 tests; 6 golden combos match. +- [project_xenia_rs_m3_realpar_step_01.md](project_xenia_rs_m3_realpar_step_01.md) — **M3 real-par Step 01 (2026-04-26)**: coord_pre_round/idle_advance/post_round + RoundCtl carved out of run_execution. Pure motion refactor; 430 tests pass; all 6 golden combos match. +- [project_xenia_rs_m3_followup_real_parallelism_plan.md](project_xenia_rs_m3_followup_real_parallelism_plan.md) — **HAND-OFF (2026-04-26)**: precise design for the N=6 spawn follow-up session. Includes worker-loop pseudocode, coordinator-thread responsibilities, 9 specific concurrency hazards to handle, ~250-350 line size estimate, and the verification matrix the session must pass. Read this first before starting M3-real-parallelism work. +- [project_xenia_rs_m3_step_08_verification.md](project_xenia_rs_m3_step_08_verification.md) — **M3 session complete (2026-04-26)**: phaser, per-thread block caches, --parallel spawn (N=1 substrate), reservation table activation, full verification. 411 tests pass; all 6 flag combos golden-match; sylpheed -n 30M --parallel reaches VdSwap=2 with halts==0. Per-step memory files: project_xenia_rs_m3_step_03_04_kernel_wrap_spawn.md, project_xenia_rs_m3_step_07_reservation_activation.md, project_xenia_rs_m3_step_08_verification.md. N=6 actual parallelism deferred per the followup memo. +- [project_xenia_rs_concurrency_m3_progress.md](project_xenia_rs_concurrency_m3_progress.md) — earlier (superseded) M3 status doc; kept for context but the step-files are authoritative. +- [project_xenia_rs_concurrency_m2_progress.md](project_xenia_rs_concurrency_m2_progress.md) — **M2 substantively complete** (2026-04-26): ReservationTable, ThreadRef gen-packing, atomic bump allocators, per-slot pending_local_irq, --reservations-table flag. M2.6/M2.7 (KernelStateInner + per-slot Mutex) deferred to M3. 405 tests pass; sylpheed -n 2M golden matches all flag combos. +- [project_xenia_rs_concurrency_m1_progress.md](project_xenia_rs_concurrency_m1_progress.md) — **M1 complete** (2026-04-26): default GPU backend is threaded; DrainFence RPC + parker + fence helpers all live; 395 tests pass; sylpheed -n 2M golden matches both modes; VdSwap=1/=2 fire end-to-end. +- [project_xenia_rs_current_state.md](project_xenia_rs_current_state.md) — **start here** — where Sylpheed boot sits now, active blockers, investigation tools, memory caveats +- [project_xenia_rs_scheduler.md](project_xenia_rs_scheduler.md) — **scheduler architecture (post-2026-04-23 refactor)** — 6 HW slots + per-slot runqueues, ThreadRef identity, bind-and-migrate affinity +- [project_xenia_rs_ui.md](project_xenia_rs_ui.md) — stable architecture: threading bridge, GPU pipeline, MMIO, scheduler, HLE primitives, HUD, observability +- [project_xenia_rs_cli.md](project_xenia_rs_cli.md) — CLI commands, flags, env vars, DB table layering +- [project_xenia_rs_desktop_app.md](project_xenia_rs_desktop_app.md) — desktop app UI/UX (disasm/debugger/analyzer share one workspace) +- [project_xenia_rs_edram_resolve_gap.md](project_xenia_rs_edram_resolve_gap.md) — TILE_FLUSH byte copy now lands (clear-resolve + bitwise-equivalent 32bpp); file lists smaller remaining gaps + backlog order +- [project_xenia_rs_duckdb.md](project_xenia_rs_duckdb.md) — analysis DBs are **DuckDB**, not SQLite despite `.db` extension — use `python3 -c "import duckdb; ..."` +- [project_xenia_rs_perf_tier4.md](project_xenia_rs_perf_tier4.md) — Tier-4 perf landed (2026-04-25): MMIO fast-reject + basic-block cache + GPU pacer; 318→136 ms (2.3×); `XENIA_FORCE_PER_INSTR=1` env var for A/B +- [project_xenia_rs_handle_audit.md](project_xenia_rs_handle_audit.md) — **2026-04-25 session**: `--trace-handles` audit harness landed, original HLE sync gap no longer reproduces at -n 500M, MSAA averaging + 64bpp source/clear-resolve in resolve.rs, wgpu RT readback deferred (foundation in place) diff --git a/migration/claude-memory/project_xenia_rs_addis_signext_root_cause_2026_04_29.md b/migration/claude-memory/project_xenia_rs_addis_signext_root_cause_2026_04_29.md new file mode 100644 index 0000000..4501f9c --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_addis_signext_root_cause_2026_04_29.md @@ -0,0 +1,74 @@ +--- +name: addis sign-extension fix — Sylpheed C++ throw root cause solved (2026-04-29) +description: Fixed addis to truncate to 32 bits (per Xbox 360 32-bit user-mode ABI). Root cause of the "lhs is not valid instance" throw in sub_82175F10 → sub_82454770. Replaces the prior r31=14 hack. +type: project +originSessionId: c44cbfc2-438f-45c9-996c-06eddf9dcb93 +--- +## The bug + +[xenia-rs/crates/xenia-cpu/src/interpreter.rs:157](xenia-rs/crates/xenia-cpu/src/interpreter.rs#L157) — `addis` (and `lis` which is `addis rD, r0, simm`) was producing a 64-bit sign-extended result per PowerPC ISA (`(simm16 as i64 as u64) << 16` → `0xFFFFFFFF_xxxx0000` for negative simm16). Xbox 360 user code runs in 32-bit ABI: pointers are 32 bits, but the compiler emits 64-bit instructions. Subsequent `subfc/subfe` operations operate on full 64-bit values — when one operand was sign-extended (from `lis`) and the other was zero-extended (from `lwz`), the unsigned 64-bit comparison `rb >= ra` produced wrong CA. This cascaded: `subfe` produced -1 instead of 0; `rlwinm` extracted bit 31 = 1; `cmpli` got NOT-EQ; the conditional branch took the wrong path. + +**Concretely**: in `sub_82454600` (BST CEIL search), at the comparison `subfc r8, r30, r8` where r30=lhs=0x828F3F68 and r8=key=0x828F3F98: +- With sign-extended r30=0xFFFFFFFF_828F3F68 vs zero-extended r8=0x00000000_828F3F98: `rb >= ra` = false → CA=0 → traversal goes RIGHT +- Correct (32-bit-ABI) behavior: `0x828F3F98 >= 0x828F3F68` → CA=1 → traversal goes LEFT (matching the registered key) + +## The fix + +```rust +PpcOpcode::addis => { + let ra_val = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let result = ra_val.wrapping_add((instr.simm16() as i64 as u64) << 16); + ctx.gpr[instr.rd()] = result as u32 as u64; // ← truncate to 32 bits + ctx.pc += 4; +} +``` + +Truncates the result to 32 bits, simulating Xbox 360 user-mode 32-bit ABI behavior. + +## Verification + +``` +cargo test --workspace --release → all pass (506+ tests) +xenia-rs check sylpheed.iso -n 100M → swaps=2, no RtlRaiseException +xenia-rs check sylpheed.iso -n 4B → swaps=2, no RtlRaiseException, instructions=4B +xenia-rs check sylpheed.iso -n 30M --parallel → swaps=2, no RtlRaiseException +``` + +The throw fully silenced WITHOUT the validator-r31-override hack. RtlRaiseException count = 0. + +## What was reverted (the prior workaround) + +The hack from [project_xenia_rs_sylpheed_throw_fix_2026_04_29.md](project_xenia_rs_sylpheed_throw_fix_2026_04_29.md) (`if match_found && r31==32 { ctx.gpr[31] = 14 }` in `rtl_leave_critical_section`) is **REMOVED**. With the addis fix, the PPC validator naturally produces r31=14 for matched lookups. + +## Diagnostic instrumentation removed + +- All VTRACE / MEM-WATCH / BST-INSTR diagnostic blocks in `rtl_enter_critical_section`, `rtl_leave_critical_section`, `step_block`, `step_cached`, `read_u32/u8`, `write_u32/u8`. The codebase is back to clean. Only the addis fix remains. + +## Why this is correct + +PowerPC ISA 2.07 says `addis` sign-extends simm16 in 64-bit mode. Real Xbox 360 hardware does this too — but it ALSO has MSR.SF tracking, and Xbox 360 user code typically runs with MSR.SF=0 (32-bit mode), where 64-bit arithmetic ops effectively operate on the low 32 bits. We don't track MSR.SF; truncating addis to 32 bits is the simplest correct equivalent. + +## Files touched + +- [xenia-rs/crates/xenia-cpu/src/interpreter.rs](xenia-rs/crates/xenia-cpu/src/interpreter.rs) — `addis` truncates to 32 bits. +- [xenia-rs/crates/xenia-kernel/src/exports.rs](xenia-rs/crates/xenia-kernel/src/exports.rs) — removed `rtl_enter_critical_section` and `rtl_leave_critical_section` diagnostic blocks (BST CEIL search instrumentation, validator r31 hack). +- [xenia-rs/crates/xenia-memory/src/heap.rs](xenia-rs/crates/xenia-memory/src/heap.rs) — removed memory watchpoint mechanism. + +## Stage status (per plan `yes-take-any-action-noble-dragon.md`) + +- **Stage 1 (diagnose throw)**: ✅ Complete (prior session). +- **Stage 2 (eliminate throw)**: ✅ **Now properly resolved at root cause** via the addis fix. **No throw on -n 4B run.** +- **Stage 3 (verify draw → wgpu pipeline)**: gate `draws > 0` STILL not met. The renderer plateau at swaps=2 is now confirmed to be a separate problem from the throw — fixing addis didn't unlock draws. + +## Open question for next session + +The renderer plateau persists. Three threads (in [project_xenia_rs_sylpheed_event_chain_2026_04_29.md](project_xenia_rs_sylpheed_event_chain_2026_04_29.md)) are blocked on `mr=true` events that nobody signals (handles 0x1004, 0x100c, 0x15e4, 0x42450b5c). The hypothesis that this was caused by the same memory-coherence bug as the throw is now ruled out — the addis fix didn't unblock them. So the missing event signaling has a different root cause. Next directions: (a) find similar instruction-level bugs (the addis fix may not be the only one — investigate what else might be subtly wrong); (b) Canary boot trace diff; (c) study what guest function should signal each blocked event by walking the throw-site call chain post-fix to see how far the renderer init progresses now. + +## Are there other instructions that need the same treatment? + +Possibly. Other instructions that take a sign-extended immediate and could leave the upper 32 bits set: +- `addi rD, 0, simm` (= `li`) — when simm has high bit set, produces sign-extended value +- `addic`, `addic.`, `subfic` — same pattern +- Generally any instruction that takes a simm16 and adds it to r0 + +These haven't been observed to break anything yet; if more bugs surface, apply the same `as u32 as u64` truncation. The cleanest long-term fix is to track MSR.SF and conditionally truncate, but that's a larger architectural change. diff --git a/migration/claude-memory/project_xenia_rs_analysis_overhaul_2026_05_08.md b/migration/claude-memory/project_xenia_rs_analysis_overhaul_2026_05_08.md new file mode 100644 index 0000000..905f23d --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_analysis_overhaul_2026_05_08.md @@ -0,0 +1,149 @@ +--- +name: analysis-toolset overhaul (M1-M12 + 5 closers ALL landed) 2026-05-08+10 +description: Full overhaul of static analyzer + probe CLI. Nine no-ff merged feature branches off `e061e21`; final master `7bc9e3a`. All 4 HIGH (M1-M4), all 3 MEDIUM (M5-M7), all 5 LOW (M8-M12) milestones, plus all 5 deferred closers (M5.5, M9.5, M11.5, VMX, SJIS+UTF-8) landed. +type: project +originSessionId: dda71bb5-172f-43d2-b233-1f1769c3f41d +--- +**🎯 ANALYSIS-OVERHAUL FULL CLOSE-OUT (2026-05-10, LANDED, master `7bc9e3a`)**: +nine no-ff merged feature branches close ALL 12 planned milestones plus +the 5 deferred closers. End state: 17 distinct analysis layers / passes. + +## All landed merges since `e061e21` + +| Commit | Milestone(s) | Branch | +|---|---|---| +| `fd68285` | M1 | m1-pdata-boundaries | +| `bd57533` | M2 | m2-demangler | +| `3bd77ab` | M3 | m3-vtables-rtti | +| `0209e88` | M4 | m4-classaware-probes | +| `81c90f9` | M5 + M7 | m5-indirect-reach | +| `85d1603` | M6 | m6-extended-stores | +| `9028021` | M8 + M9 + M10 + M11 + M12 | m9-eh-flag | +| `b03192c` | **M5.5** | m5.5-this-flow | +| `7bc9e3a` | M9.5 + M11.5 + VMX + SJIS/UTF-8 | vmx-stores | + +## Final Sylpheed yield (master `7bc9e3a`) + +``` +M1 functions: 25,481 (was 12,156) + pdata_validated: 23,073 + has_eh: 2,975 (12.9%) [M9 derive] +M2 demangled_names: 0 (Sylpheed has no ?… symbols) +M3 vtables/classes/methods: 722 / 499 / 5,571 +M4 --pc-probe='Class::*' resolves Class::method tokens via labels. +M5 ind_call (static-vtable): 0 (Sylpheed dispatches via this->vptr) +M5.5 vptr_writes: 567 (214 vtables, 29 offsets, off 0 = 88%) + indirect_dispatch_sites: 6,842 (97 single + 6,745 multi-candidate) + indirect_dispatch_candidates: total candidate edges feed xrefs + ind_call rows total: 687,963 (M5 ∪ M5.5) + newly reachable in BFS: 2,746 (audit-009 cluster surfaces fns) +M6 xrefs.addr_mode populated for all data rows. + 442 x_form_indexed reads + 40 atomic writes + 110 stvx writes (VMX). +M7 strings: ascii 6,311 / utf16le 0 +SJIS+UTF-8 follow-up: shift_jis 790 / utf8 39 +M8/M11 function_pointer_arrays: 1,110 (722 vtable + 388 dispatch + 0 static_init) + function_pointer_array_entries: 6,347 slots +M9.5 eh_funcinfo: 2,588 (all magic 0x19930522) + eh_unwind_map entries: 10,019 + eh_try_blocks: 315 +M10 tls_info / tls_callbacks: 0 / 0 (Sylpheed has no .tls) +M11.5 static-init drivers: 0 (no canonical _initterm shape) +M12 --lr-trace JSONL output verified at entry-point PC. +``` + +## Methodology verification + +- **Tests: 605 → 655** (+50 across the 17 layers + closers). 0 failing. +- **Lockstep determinism**: `check sylpheed.iso --stable-digest -n 2M` ×2 + → byte-identical (`instructions=2000005`). No runtime regression. +- **End-to-end class probe**: `--pc-probe='ANON_Class_6B674251::*'` + resolves 45 PCs. +- **End-to-end M12 LR-trace**: writes well-formed JSONL on PC fire. +- **End-to-end M5.5**: 97 single-candidate dispatches resolve to a + unique `class.method`; multi-candidate sites bound 6,745 dispatches + to candidate sets averaging ~100 vtables each. + +## All new CLI flags & env vars + +### `xenia-rs exec` +| Flag | Env var | Source | Description | +|---|---|---|---| +| `--probe-db PATH` | `XENIA_PROBE_DB` | M4 | DB for resolving symbolic probe tokens | +| `--lr-trace=PC[,…]` | `XENIA_LR_TRACE` | M12 | JSONL records (pc/tid/hw/cycle/r3-r6/lr) | +| `--lr-trace-out=PATH` | `XENIA_LR_TRACE_OUT` | M12 | File sink for `--lr-trace` | + +Existing `--pc-probe` / `--branch-probe` / `--ctor-probe` accept +symbolic tokens (M4): `0xADDR` (numeric), `Class::method`, `Class::*`, +`function_name`. + +## All new DB tables + +``` +NEW TABLES BY LAYER: + pdata_entries (M1) + demangled_names (M2) + vtables / methods / classes (M3) + strings (M7 + SJIS/UTF-8 closer) + vptr_writes (M5.5) + indirect_dispatch_sites (M5.5) + indirect_dispatch_candidates (M5.5) + tls_info / tls_callbacks (M10) + function_pointer_arrays (M8/M11/M11.5) + function_pointer_array_entries (M8/M11/M11.5) + eh_funcinfo (M9.5) + eh_unwind_map (M9.5) + eh_try_blocks (M9.5) + +NEW COLUMNS: + functions.pdata_validated BOOLEAN (M1) + functions.pdata_length BIGINT (M1) + functions.has_eh BOOLEAN (M9) + xrefs.addr_mode VARCHAR (M6) + +NEW xrefs.kind value: + 'ind_call' (M5 + M5.5) + +NEW VIEW: + v_indirect_reachability_from_entry (M5) +``` + +## Practical follow-up queries + +```sql +-- Linker-validated functions +SELECT * FROM functions WHERE pdata_validated; + +-- High-confidence M5.5 dispatch resolutions +SELECT s.dispatch_pc, s.vptr_offset, s.slot, v.class_name, c.method_address +FROM indirect_dispatch_sites s +JOIN indirect_dispatch_candidates c ON c.dispatch_pc = s.dispatch_pc +JOIN vtables v ON v.address = c.vtable_address +WHERE s.candidate_count = 1; + +-- Try/catch range names per FuncInfo +SELECT fi.address, t.try_low, t.try_high, t.catch_high, t.n_catches +FROM eh_funcinfo fi JOIN eh_try_blocks t ON t.funcinfo_address = fi.address; + +-- Japanese strings (SJIS hex-escaped) referenced by code +SELECT s.address, s.length, s.content FROM strings s +WHERE s.encoding = 'shift_jis' AND s.length >= 20; + +-- Audit-009 cluster newly reachable via M5.5 ind_call +SELECT format('0x{:X}', addr) FROM v_indirect_reachability_from_entry +WHERE addr BETWEEN 2184949760 AND 2184957952 + AND addr NOT IN (SELECT addr FROM v_reachability_from_entry); +``` + +## Deferred / future work (sketched in SCHEMA.md only) + +- **M9.6** — link `eh_funcinfo` records to owning functions via + `bl __CxxFrameHandler` registration sites + per-try-block + `pHandlerArray` parsing. +- **M11.6** — relax M11.5 to detect non-canonical static-init drivers. +- Full SJIS → UTF-8 decoding (currently rendered as `\xHH` escapes). +- VMX128 (opcode 4) vector-store xrefs. + +Master HEAD `7bc9e3a`. Tests 655. swaps=2 draws=0 plateau intact (no +runtime semantics changed). The full toolchain is now in place: M5.5 +specifically opens the `this->vptr` dispatch space that the audit-009 +renderer cluster needed. diff --git a/migration/claude-memory/project_xenia_rs_audit_003_class_probe_2026_05_03.md b/migration/claude-memory/project_xenia_rs_audit_003_class_probe_2026_05_03.md new file mode 100644 index 0000000..a597d29 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_003_class_probe_2026_05_03.md @@ -0,0 +1,280 @@ +--- +name: KRNBUG-AUDIT-003 vtable/RTTI class probe + dispatcher addresses identified +description: 2026-05-03 — runtime class-readout for parked-waiter handles. RTTI is stripped; dispatchers are POD job-queues, not C++ classes with vtables. Identified static dispatcher addresses 0x828F3D08 (handle 0x100c) and 0x828F4070 (handle 0x15e0). Producer hunt deliverable: NO submitter code references either struct in static analysis. +type: project +originSessionId: 060e5bd6-ab54-4f52-919f-e60bfc69f9c7 +--- +# KRNBUG-AUDIT-003 — Class probe at handle creation/wait + dispatcher identification + +**Status:** landed on master at `48eed25` (no-ff merge of feature +branch `xam-handle-stack-trace/p0-class-probe` over the AUDIT-002 +merge `6440261`). Diagnostic-only, read-only, lockstep-preserved. + +## Problem framing + +Coming out of KRNBUG-AUDIT-002 we knew the back-chain at handle creation +for handles 0x1004 / 0x100c / 0x15e0, but we couldn't identify the +*owning subsystem object*. The wrapper `lr=0x824a9f6c` is the same +`silph::Event` ctor for 83 unique callers, so the immediate LR is +useless for subsystem identification. The promise of AUDIT-003 was +"recover the dispatcher's MSVC C++ class name via vtable[-4] → +RTTICompleteObjectLocator → TypeDescriptor." + +## What landed (`crates/xenia-kernel/src/state.rs`) + +- `pub enum ClassReadout { Named, VtableOnly, NotAnObject }` — outcome + of probing `[this]` as a C++ object. +- `pub fn read_class_at_this(this, mem) -> ClassReadout` — reads + vtable, traverses MSVC RTTI to recover decorated name. False-positive + guard: vtable's first two slots must be image-range function pointers + (rejects the static-init iterator case where `this` is a CRT + init-fn-array entry and `[this]` is a function PC, not a vtable). +- `pub fn probe_create_stack_classes(ctx, frames, mem) -> Vec` + — at handle creation time, walks the captured frames; for frame 0 + uses live `ctx.gpr[31]` / `r30` / `r3`; for frame K ≥ 1 reads + `[fp - 12]` and `[fp - 16]` (the standard PPC EABI prologue spill + slots). Emits one always-on raw line per frame plus zero-or-more + `→` annotated class lines for hits. The raw line is gold for + offline lookup even when RTTI rejects. +- `audit_create_with_ctx` routes through + `record_create_with_stack_and_probes` so every create site (4 of + them: NtCreateEvent / NtCreateSemaphore / NtCreateTimer / + XamTaskSchedule) gets the probe. + +## What landed (`crates/xenia-app/src/main.rs`) + +- `dump_thread_diagnostic(kernel, mem, quiet)` — adds `mem` parameter + so the WAIT-side dump can probe stack memory of parked threads. +- For each focus handle, after the DIAGNOSIS line the dump emits one + `WAIT-THREAD` block per parked waiter, showing the parked thread's + ctx (pc/lr/sp/r3/r30/r31), the back-chain frames, and per-frame + saved r28..r31 values. Each frame's saved-r31 / saved-r30 are also + fed through `read_class_at_this` for class lookup. + +## What we found + +### Handle 0x100c (tid=2, Event/Manual, singleton in main()) + +**Created stack** (verified again at -n 500M): +``` +[0] sub_824A9F18 +0x54 silph::Event ctor wrapper +[1] sub_82181750 +0x70 per-instance ctor — SETS this = 0x828F3D08 +[2] sub_821800D8 +0x3c single-call bridge ctor +[3] sub_82181C20 +0x38 subsystem driver +[4] sub_8216EA68 +0x3c main() +[5] entry_point +0x60 CRT entry +``` + +**Dispatcher address: `0x828F3D08`** (image rdata). + +Confirmed two ways: +1. Saved-r31 in create-stack frames 1 & 2: both `0x828F3D08`. +2. Static disassembly of `sub_82181750`: prologue does + `addis r11, r0, 0x828F; addi r31, r11, 15624` ⇒ r31 = + 0x828F0000 + 0x3D08 = `0x828F3D08`. The function then writes: + - `[this+0] = -1` (sentinel — **not a vtable**, confirms POD struct) + - `[this+4..12] = 0` + - `[this+20] = 0` (halfword) + - `[this+36] = 0` + - `[this+40] = 7` (count) + - `[this+44]` onwards: 256-byte sub-region memset/init via + `bl 0x8284DCEC` (with r3=&this[44], r4=256) + - `[this+72] = thread_handle` (set after `bl 0x82172370` thread spawn) + - `[this+76] = event_handle` (= handle 0x100c, set after `bl 0x824A9F18`) + - `[this+88..104] = 0` +3. Worker function `sub_82181830` receives r3 = `this`, immediately + calls `silph::Thread::SetProcessor(CURRENT, 5)`, copies r28 = this + and r29 = &this[44], then `lwarx`/`stwcx.` on `&this[80]` + (atomic). The wait-side telemetry confirms: at park time, the + spilled r29 (= base of this) is exactly `0x828F3D08`. + +**`[this+0] = -1` is decisive**: this is a hand-rolled **job queue +struct**, not a C++ polymorphic class. There is no vtable. RTTI cannot +help. The "class name" doesn't exist in MSVC mangled form. + +### Handle 0x15e0 (tid=16, Event/Manual, distinct cluster) + +**Created stack:** +``` +[0] sub_824A9F18 +0x54 +[1] sub_821701C8 +0x48 per-instance ctor — SETS this = 0x828F4070 +[2] sub_8216F600 +0x5c single-call bridge ctor +[3] sub_8217076C +0x8c subsystem driver +[4] sub_821C53FC +0x1c caller of subsystem driver +[5] sub_82172D24 +0x68 deeper caller +``` + +**Dispatcher address: `0x828F4070`** (image rdata). + +Confirmed via `xrefs` table: +- `sub_8216F618 +0x38` and `+0x5c` (the bridge ctor) reference 0x828F4070. +- `sub_821701C8 +0x1c` and `+0x168` (the per-instance ctor) reference + 0x828F4070. +- `sub_8280C2C0 +?` (CRT init driver) — same pattern. + +Same structural shape as 0x100c (POD job queue, not a C++ class). + +### Handle 0x1004 (tid=10, Event/Manual, 8-instance pool) + +**Created stack:** +``` +[0] sub_824A9F18 +0x54 +[1] sub_821783D8 +0x120 per-instance ctor (calls Event ctor) +[2] sub_8217C850 +0x58 single-call bridge ctor (per pool element) +[3] static ctor at 0x8280F810 +0x14 calls sub_8217C850 EIGHT times +[4] sub_824ACB38 +0xb8 CRT static-init driver (walks 0x82870010..0x828708d4) +[5] entry_point +0x60 CRT entry +``` + +**Dispatcher addresses:** EIGHT instances. Pre-AUDIT-003 we expected +saved-r31 to give us the pool-member's `this`, but the actual +saved-r31 chain at create time shows: +``` +frame=0 live r31=0x00000000 r30=0x00000000 r3=0x700ffb00 +frame=1 saved-r31=0x700ffb40 saved-r30=0x00000000 ← stack-relative, not this +frame=2 saved-r31=0x700ffbd0 saved-r30=0x00000000 ← stack-relative +frame=3 saved-r31=0x82870180 saved-r30=0x00000000 ← CRT iterator pointer +frame=4 saved-r31=0x82870180 saved-r30=0x00000000 ← same iterator +frame=5 saved-r31=0x700ffd10 saved-r30=0x00000000 ← stack-relative +``` + +The MSVC ctors `sub_821783D8` and `sub_8217C850` did **not** preserve +`this` in r31 across the call into `silph::Event::Ctor`. They appear +to have used r3 directly with no save. (Many MSVC tail-call-shaped +ctors do this.) The `0x82870180` value at frames 3 & 4 is the CRT +init-fn iterator pointer (the static-init driver walks +0x82870010..0x828708D4 — and 0x82870180 is inside that range). + +**Pool-member `this` addresses:** require offline analysis. Look at +`sub_8217C850`'s prologue — it receives `this` as r3 and is called 8 +times by the static ctor at `0x8280F810`. The `this` values would be +8 distinct heap or BSS addresses; one of them owns handle 0x1004. + +### Handle 0x42450b5c (tid=6, AUDIT_BLIND) + +Separate bug class, not silph::Event. Tid=6 parks at sub_824CD4F4 +via a non-`do_wait_single` path. `this` not in image range — +`0x42450b5c` is in user heap (0x4xxxxxxx), so the dispatcher is +heap-allocated, not a static. Stack at park time has no useful saved +regs. **Track separately. Do not bundle with the silph::Event hunt.** + +## Producer hunt deliverable — DECISIVE + +`xrefs` table interrogated for each dispatcher base: + +``` +0x828F3D08 (handle 0x100c) — 4 references: + pc=0x82180100 in sub_821800D8 (kind=ref) ← bridge ctor; load constant + pc=0x8218176c in sub_82181750 (kind=ref) ← per-instance ctor; load + pc=0x82181778 in sub_82181750 (kind=write) ← per-instance ctor; init [this+0] + pc=0x8284caa4 in sub_8280C2C0 (kind=ref) ← CRT init driver; ptr-to-init-fn + +0x828F4070 (handle 0x15e0) — 5 references: + pc=0x8216f650 in sub_8216F618 (kind=ref) ← bridge ctor + pc=0x8216f674 in sub_8216F618 (kind=ref) + pc=0x821701e4 in sub_821701C8 (kind=ref) ← per-instance ctor + pc=0x82170330 in sub_821701C8 (kind=ref) + pc=0x8284c9a4 in sub_8280C2C0 (kind=ref) ← CRT init driver +``` + +**EVERY xref is in a ctor or the CRT.** No producer code references +either dispatcher. Confirms AUDIT-001/002's `signal_attempts=0` +finding: the producer is unreached — the call chain that *would* +write to the queue and signal the event simply never runs. + +Note: this only catches references to the dispatcher *base*; producer +code that operates via an offset register (after a function-arg pass) +won't show up here. But for the basic producer-pattern case +(`load_const dispatcher_addr; call submit(this, work)`), this xref +audit is conclusive. + +## Recommendations for next session + +1. **Don't pivot to a fix.** AUDIT-003's job was identification, not + resolution. The diagnostic is in place; the data is captured. + +2. **Find the missing call sites.** Producer code likely operates as: + ``` + void Submit(JobQueue* q, ...) { q->push(); KeSetEvent(q->event); } + ``` + The CALLER of Submit holds the queue pointer in r3 (or another + register from a function arg). To identify these: + - Search the binary for any `lis/addis r?, 0x828F` followed by an + `addi r?, r?, 0x3D??` or `0x40??` pattern within a few + instructions — those are constant-loads of a dispatcher address. + - Cross-check with the existing xrefs table: a ctor's xrefs include + the constant-loads; a producer's xrefs would as well, but right + now there are none, which means the dispatcher pointer is plumbed + through function args, not loaded directly. + - Alternative: wrap `KeSetEvent` / `NtSetEvent` with `--trace-event-set` + printing live `r3` on entry. If a wake on handle 0x100c never + fires, no instrumentation will help — confirms unreachable. + +3. **Find what should call `Submit`.** This is a high-level question: + what game subsystem feeds work to handle 0x100c's worker (per- + instance ctor at sub_82181750, called from main() via + sub_82181C20)? The chain `main() → sub_82181C20 → sub_82181750` + suggests `sub_82181C20` is a subsystem driver — it constructs the + queue and presumably should also wire it up to a feeder. If the + feeder is itself a static-init that's never invoked, the trail + leads back to the CRT init array driver and whatever scheduling + subsystem is supposed to run those inits. + +4. **Handle 0x1004 follow-up.** Need to find the 8 pool-member `this` + addresses. Approach: hook the entry of sub_8217C850 (the bridge + ctor) under `--trace-handles-focus=0x1004` and capture r3 at each + call. Eight calls expected from the static ctor at 0x8280F810. + +## Verification + +- Tests: 581 → **586** green (5 new in `state.rs`: + `read_class_at_this_resolves_intact_rtti`, + `read_class_at_this_falls_back_when_rtti_stripped`, + `read_class_at_this_rejects_non_objects`, + `read_ascii_cstring_handles_termination_and_garbage`, + `probe_create_stack_classes_recovers_saved_r31_class`). +- Lockstep: `--stable-digest -n 100M` ⇒ `instructions=100000002` + (unchanged). +- Sylpheed n=50M oracle: passes. +- End-to-end: 500M-instruction `--trace-handles-focus` run captured + in `xenia-rs/audit-runs/audit-003/run-500m-v4.txt`. RC=0. + +## Engineering gotchas worth remembering + +- **Saved-register layout in MSVC PPC is NOT a clean `__savegprlr_NN` + every time.** The "saved r31" slot at `[fp - 12]` will hold *some* + saved value, but mapping it back to a specific register is per- + function-prologue dependent. Read the raw value, then identify + registers via the function disassembly. The diagnostic's labels + ("saved-r31", "saved-r30") are heuristic; the *values* are gold. +- **`__savegprlr` and similar helpers are themselves functions called + via `bl` from prologues** (e.g. `sub_82181830` calls `bl 0x825F0F7C` + before its `stwu`). They use the parent's r1 to spill, then return + via a stub. +- **MSVC's RTTI false positive: CRT init iterator.** When `r31` holds + a pointer into the CRT init-fn array (e.g. `0x82870180`), that + pointer's first u32 is the address of the next static ctor — which + IS in the image range, so the naive probe accepts it as a vtable. + But the "first virtuals" are then PPC instruction words from the + ctor's prologue (e.g. `0x7D8802A6` = `mflr r12`), which are NOT in + image range. The two-virtuals-must-be-image-range guard rejects this. +- **`[this+0] = -1` (= 0xFFFFFFFF) is a strong signal "not a C++ + object"**. Any object with a vtable would have `[this+0]` in image + range. The probe treats this correctly via `is_likely_image_ptr`. +- **POD job queues are common in this codebase**. silph::Event is a + C++ object (so silph::Event::Wait is a method call), but the + *queue / dispatcher / pool that owns the event* is often hand-rolled + POD. Don't look for class names where there are none. + +## Files + +- `crates/xenia-kernel/src/state.rs` — added `ClassReadout`, + `read_class_at_this`, `probe_create_stack_classes`, helpers + + 5 unit tests. +- `crates/xenia-kernel/src/audit.rs` — added `created_class_probes` + field on `HandleAuditTrail`, `record_create_with_stack_and_probes`. +- `crates/xenia-app/src/main.rs` — `dump_thread_diagnostic` now takes + `&GuestMemory`; FOCUS report prints WAIT-THREAD blocks with per- + frame stack walks + class probes. +- `audit-runs/audit-003/run-500m-v4.txt` — captured trace output. +- `audit-findings.md` — KRNBUG-AUDIT-003 entry. diff --git a/migration/claude-memory/project_xenia_rs_audit_004_ctor_probe_2026_05_04.md b/migration/claude-memory/project_xenia_rs_audit_004_ctor_probe_2026_05_04.md new file mode 100644 index 0000000..04f62b1 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_004_ctor_probe_2026_05_04.md @@ -0,0 +1,241 @@ +--- +name: KRNBUG-AUDIT-004 ctor-probe PC hook + dispatcher struct dump; producer indirection layer identified; 8-pool hypothesis falsified +description: 2026-05-04 — read-only --ctor-probe + --dump-addr diagnostics. Inner per-instance ctors fire EXACTLY ONCE each, proving handles 0x1004/0x100c/0x15e0 are SINGLETONS not pools. Producer indirection IS the Meyers singleton-getter; non-create-chain consumers identified at 9 sites for next-session producer hunt. Master HEAD 6a070be. +type: project +originSessionId: 2e7652bb-164b-4638-ae0f-a2da9ee68dac +--- +# KRNBUG-AUDIT-004 — `--ctor-probe` + `--dump-addr` diagnostics + +**Status**: landed on master at `6a070be` (no-ff merge of feature +branch `dispatcher-probe-audit/p0-ctor-probe-and-struct-dump` over +KRNBUG-AUDIT-003 merge `48eed25`). Read-only, lockstep +`instructions=100000002` preserved bit-exact. **Tests: 586 → 588**. + +## What landed + +**`crates/xenia-kernel/src/state.rs`:** +- `pub ctor_probe_pcs: HashSet` (default empty). +- `pub fire_ctor_probe_if_match(hw_id, mem)` — fast-rejects on empty + set; on PC match prints one `CTOR-PROBE pc=... tid=... hw=... + cycle=... sp=... r3=... lr=...` line plus 8-frame back-chain with + saved-r31/r30 per frame. +- `pub dump_addrs: Vec` for end-of-run struct dumps. +- 2 unit tests on the empty-set / PC-match invariants. + +**`crates/xenia-app/src/main.rs`:** +- `--ctor-probe=0xPC,0xPC,...` CLI flag (and `XENIA_CTOR_PROBE`). +- `--dump-addr=0xADDR,...` CLI flag (and `XENIA_DUMP_ADDR`). Each + address gets a 128-byte hex + be32 + ASCII dump after the FOCUS + report. +- `worker_prologue` calls `fire_ctor_probe_if_match` after reading + `pc` and before any thunk-dispatch / step-block branch. + +## Decisive findings — corrects KRNBUG-AUDIT-002/003 + +### 1. The "8-instance pool" for handle 0x1004 is FALSE + +Probe ran at `-n 50M --halt-on-deadlock --ctor-probe=0x821783D8, +0x82181750,0x821701C8` (the per-instance ctors for handles +0x1004 / 0x100c / 0x15e0 respectively). **Each fires EXACTLY ONCE**: + +``` +CTOR-PROBE pc=0x821783d8 tid=1 hw=0 cycle=1401430 r3=0x828f3ec0 ← handle 0x1004 +CTOR-PROBE pc=0x82181750 tid=1 hw=0 cycle=5363599 r3=0x828f3d08 ← handle 0x100c +CTOR-PROBE pc=0x821701c8 tid=1 hw=0 cycle=9203618 r3=0x828f4070 ← handle 0x15e0 +``` + +So **handle 0x1004 has a SINGLE dispatcher at `0x828F3EC0`**, not 8 +pool members. The earlier "called 8 times" claim from AUDIT-002/003 +came from raw-counting OUTER getter `sub_8217C850` entries — but that +outer is itself a Meyers-style singleton getter (gates `bl +0x821783D8` on `[0x828F48D8] bit 0`); only the FIRST entry cascades +through to the per-instance ctor. Subsequent entries return the +existing slot pointer and are no-ops for our purposes. + +### 2. Producer indirection IS the Meyers singleton-getter + +Static byte-scan of `.rdata` and `.data` (PE at +`Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).pe`) for +the 4-byte BE encodings of 0x828F3D08 / 0x828F4070 yields **0 hits** +in either section. So no static registry table holds these +addresses. But the `xrefs` table for the outer getters +(`sub_821800D8` for 0x100c, `sub_8216F618` for 0x15e0) reveals 5–6 +callers each, all sharing the canonical producer pattern: + +```asm +bl outer_singleton_getter ; r3 = dispatcher ptr (returned) +lwz r3, OFFSET(r3) ; r3 = field at OFFSET (event handle / wake target) +bl 0x824AA1D8 ; signal/wake function +``` + +OFFSETS: 80 (= 0x50) for 0x100c, 36 (= 0x24) for 0x15e0. + +**Non-create-chain consumer sites** (these are the producer-hunt +targets for the next session): + +``` +sub_821800D8 (outer for 0x828F3D08, handle 0x100c) — 5 producer callers: + 0x821802d8 (sub_82180158+0x180) + 0x821806e0 (sub_821805C8+0x118) + 0x82180b28 (sub_82180A10+0x118) + 0x82180ea0 (sub_82180D90+0x110) + 0x82181254 (sub_821810E0+0x174) + ;; 0x82181c54 (sub_82181C28+0x2C) is the create-chain (skip) + +sub_8216F618 (outer for 0x828F4070, handle 0x15e0) — 4 producer callers: + 0x8216f9d4 (sub_8216F818+0x1BC) + 0x8216fc08 (sub_8216F9F0+0x218) + 0x821700b8 (sub_8216FF70+0x148) + 0x821700f4 (sub_821700E0+0x14) + ;; 0x821707f4 (sub_821707C0+0x34) is the create-chain (skip) +``` + +So the AUDIT-003 xref-audit conclusion ("every reference is in a +ctor or the CRT") was correct in the literal sense (every direct +dispatcher-base xref) but missed the **indirection layer**. The +producers don't reference 0x828F3D08 / 0x828F4070 directly — +they call the outer getter and dereference the returned pointer. +**Interpretation (2) of the audit charter is confirmed.** + +### 3. Dispatcher struct layouts (128-byte dumps) + +``` +0x828F3D08 (handle 0x100c, ctor sub_82181750): + +0x00 = 0xFFFFFFFF ; queue head/tail sentinel + +0x28 = 0x00000007 ; capacity = 7 + +0x2C = 0x01000000 ; init flag (BE) + +0x3C = 0xFFFFFFFF ; secondary sentinel + +0x48 = 0x00001010 ; thread_handle (worker) + +0x4C = 0x0000100C ; event_handle (= self) + +0x50 = 0x00000000 ; ← producer reads this (currently 0) + +0x70 = 0x00000001 ; refcount? + +0x74 = 0x828F3D08 ; self-pointer + +0x828F4070 (handle 0x15e0, ctor sub_821701C8): + +0x00 = 0x01000000 ; init flag + +0x10 = 0xFFFFFFFF ; queue sentinel + +0x1C = 0x000015E4 ; sibling-handle (NOT in our parked set) + +0x20 = 0x000015E0 ; event_handle (= self) + +0x24 = 0x00000000 ; ← producer reads this (currently 0) + +0x40 = 0xFFFFFFFF ; secondary sentinel + +0x828F3EC0 (handle 0x1004, ctor sub_821783D8): + +0x00 = 0x01000000 ; init flag + +0x10 = 0xFFFFFFFF ; queue sentinel + +0x20 = 0x40541BC0 ; heap pointer (sub-buffer #1) + +0x30 = 0x00000014 ; size 20 + +0x34 = 0x0000002F ; size 47 + +0x3C = 0x40211CA0 ; heap pointer (sub-buffer #2) + +0x44 = 0x405418C0 ; heap pointer (sub-buffer #3) + +0x50 = 0x40111840 ; heap pointer (sub-buffer #4) + +0x58 = 0xFFFFFFFF / +0x5C = 0xFFFFFFFF ; sentinels + +0x76 = 0x000012AC ; possibly thread id + +0x78 = 0x00001004 ; event_handle (= self) +``` + +The 0x1004 dispatcher is **structurally distinct** — it owns 4 +guest-heap (0x4xxxxxxx) sub-buffers, suggesting a richer +resource-managing subsystem rather than a pure POD job queue. +Each handle's struct uses a different event-handle offset +(0x4C / 0x20 / 0x78), so they are NOT instances of a shared +base class — three distinct subsystem types. + +## Reproduce + +```bash +cargo run --release -p xenia-app -- exec 'sylpheed.iso' \ + --halt-on-deadlock \ + --trace-handles-focus=0x1004,0x100c,0x15e0 \ + --ctor-probe=0x821783D8,0x82181750,0x821701C8 \ + --dump-addr=0x828F3D08,0x828F4070,0x828F3EC0 \ + -n 50000000 +``` + +Trace files: +- `audit-runs/audit-004/run-50m-probe.txt` (outer-getter PC probes — many hits per session) +- `audit-runs/audit-004/run-50m-probe-v2.txt` (inner-ctor probes — 1 hit each, singleton confirmation) + +## Recommendation for next session (do not implement a fix) + +**Hook the 9 non-create-chain consumer sites** to determine which +of two failure modes is actual: + +```bash +cargo run --release -p xenia-app -- exec 'sylpheed.iso' \ + --halt-on-deadlock \ + --trace-handles-focus=0x1004,0x100c,0x15e0 \ + --ctor-probe=0x821802d8,0x821806e0,0x82180b28,0x82180ea0,0x82181254,\ +0x8216f9d4,0x8216fc08,0x821700b8,0x821700f4 \ + -n 500000000 +``` + +- **Failure mode A (producer never reached)**: none of the 9 PCs + fire. The producer chain is gated upstream — likely a feature + flag, init phase, or RPC handler that never executes. Trail + leads back to the function-pointer table being walked or the + scheduler-driven event source. +- **Failure mode B (producer fires but signals zero)**: some PCs + fire. The dispatcher field at the producer's offset (+0x50 for + 0x100c; +0x24 for 0x15e0) was never populated, so `lwz r3, + OFFSET(r3)` reads zero and `bl 0x824AA1D8` is called with + handle=0. Then the next bug is "who SHOULD populate that field + and doesn't" — read the OWNING subsystem's setup path. + +Both failure modes are answerable in one read-only probe run. + +For handle 0x1004's separate-track investigation: dispatcher +0x828F3EC0 has 4 heap sub-buffers in 0x4xxxxxxx range. The pool +of buffers (not the singleton dispatcher itself) might be the +"8 instances" intuition that drove AUDIT-003's mislabel. Worth +dumping each sub-buffer to see if THEY look like 8 elements. + +## Engineering gotchas worth remembering + +- **Meyers-singleton getters look identical to per-instance + ctors** when probed naively. The trick is the init-flag check + at the top: `lwz r11, FLAG; rlwinm r9, r11, 0, 31, 31; cmpli + cr6, 0, r9, 0; bc 4, 4*cr6+eq, RETURN_PATH`. The first call + passes, sets the flag, falls through to the inner ctor. All + subsequent calls hit the bypass branch. Counting outer-getter + entries (= "called 8 times") is meaningless; only the first is + a real construction event. +- **xref-table audit is necessary but NOT sufficient.** It + catches direct constant-loads of an address but misses + function-pointer indirection (singleton getters, vtable + dispatch, registered-callback tables). When a static-analysis + audit says "no producer references X", it actually means "no + direct constant-load reference"; you still need to check + whether functions that RETURN X have additional callers. +- **Static-byte-scan of .rdata / .data is fast and cheap to + rule out hidden registry tables.** The Python recipe used + here: + ```python + with open('image.pe', 'rb') as f: data = f.read() + needle = struct.pack('>I', 0x828F3D08) + while True: + i = blob.find(needle, start) + if i < 0: break; ... + ``` + In sub-second on this image. If this returns hits at non-self + addresses, you have a registry indirection in addition to (or + instead of) any singleton-getter pattern. +- **Probe on `bl` arrival**, not on `bl` source PC. We probe on + the FIRST instruction of the ctor function (e.g. 0x821783d8), + capturing live r3 = `this`. That works because PPC `bl` + passes the first arg in r3 by ABI. If you probe on the bl + call site instead, you have to read forward in the + instruction stream to find what r3 is at the call point. + +## Files + +- `crates/xenia-kernel/src/state.rs` — `ctor_probe_pcs`, + `dump_addrs`, `fire_ctor_probe_if_match`, 2 unit tests. +- `crates/xenia-app/src/main.rs` — `--ctor-probe` / + `--dump-addr` CLI parsing, prologue hook, end-of-run dumper. +- `audit-findings.md` — KRNBUG-AUDIT-004 entry (after KRNBUG- + AUDIT-003). +- `audit-runs/audit-004/run-50m-probe.txt` — outer-getter + probe (28 hits showing many-call pattern of sub_8217C850). +- `audit-runs/audit-004/run-50m-probe-v2.txt` — inner-ctor + probes (3 hits total, one per per-instance ctor — singleton + hypothesis confirmed). diff --git a/migration/claude-memory/project_xenia_rs_audit_005_priv_stub_2026_05_04.md b/migration/claude-memory/project_xenia_rs_audit_005_priv_stub_2026_05_04.md new file mode 100644 index 0000000..855f40d --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_005_priv_stub_2026_05_04.md @@ -0,0 +1,141 @@ +--- +name: KRNBUG-AUDIT-005 — XexCheckExecutablePrivilege stub gates init flow +description: 2026-05-04 audit. PC-probe + canary kernel-call diff identified XexCheckExecutablePrivilege returning 0 as the upstream gate that skips XGetAVPack, XeCrypt*, XamTaskSchedule init flow — explains why producers for parked-waiter handles 0x1004/0x100c/0x15e0 are unreached. +type: project +originSessionId: 3ca68be8-9f51-47ec-8f5f-1227affcc3d7 +--- +# KRNBUG-AUDIT-005 (2026-05-04, LANDED master) + +## What landed + +Combined-session diagnostic per the user prompt: + +1. **`--pc-probe=PC[@DISPATCHER:OFFSET][,...]` extension** — generalized + the existing `--ctor-probe` machinery so each token can additionally + capture the dispatcher field that the `bl outer_getter; lwz r3, + OFF(r3); bl 0x824AA1D8` producer pattern is about to read. Reused + `parse_hex_u32`. `--ctor-probe` and `--pc-probe` now share parser + + storage (single helper, no duplication). New env var + `XENIA_PC_PROBE` is an alias for `XENIA_CTOR_PROBE`. +2. **Reused existing `probe_calls` trace target** — no new + kernel-call tracing infrastructure needed. The pre-existing + `tracing::trace!(target: "probe_calls", ...)` in + `state.rs::call_export` produces a kernel-call sequence comparable + to canary's once filtered through `--log-filter='probe_calls=trace'`. + +## Decisive findings + +### α confirmed: producer code path never reached + +All 9 non-create-chain consumer call sites for handles 0x100c (5 +sites) and 0x15e0 (4 sites) — the canonical producer pattern from +KRNBUG-AUDIT-004 — fire **0×** at -n 500M +(`grep -c CTOR-PROBE audit-runs/audit-005/ours.log == 0`). Failure +mode (B: lwz reads zero) and (3: wake function called with stale +handle) are RULED OUT. The bug is upstream. + +### Upstream divergence located: `XexCheckExecutablePrivilege` stub + +Set-diff of kernel-call sequences (canary as oracle, ours from -n +500M) shows **11 exports canary calls and we don't**: + +``` +ExTerminateThread (×2), KeReleaseSemaphore (×268), KeResetEvent (×1), +NtDeviceIoControlFile (×2), ObCreateSymbolicLink (×1), XGetAVPack (×1), +XamTaskCloseHandle (×1), XamTaskSchedule (×1), +XamUserReadProfileSettings (×2), XeCryptSha (×1), +XeKeysConsolePrivateKeySign (×1) +``` + +**`XGetAVPack` has exactly one caller**, `0x824AB5A0` inside +`sub_824AB578`. Disasm: + +``` +824ab58c addi r3, r0, 10 ; privilege bit 10 +824ab594 bl 0x8284DEFC ; XexCheckExecutablePrivilege +824ab598 cmpli 0, r3, 0x0 +824ab59c bc 12, eq, 0x824AB724 ; if r3==0, skip whole block +``` + +`crates/xenia-kernel/src/exports.rs:193` registers +`XexCheckExecutablePrivilege` as `stub_return_zero`. Always returning +0 → guest takes the `bc 12, eq` branch → skips XGetAVPack + +XeCryptSha + XeKeysConsolePrivateKeySign + ObCreateSymbolicLink + +XamUserReadProfileSettings + the long `NtWriteFile` save-data block. + +The OTHER caller (`sub_824A9710` at `0x824A99A0`) queries privilege +**11** with **opposite polarity** (`bc 4, eq` = bne) — both stubs +returning 0 means the guest walks the wrong arm of every +privilege-gated branch. This is the gate to `XamTaskSchedule` and +the XAM init flow that AUDIT-002 identified as a producer +candidate. + +### Cascade explanation + +The dispatcher structs at `0x828F3D08`, `0x828F4070`, `0x828F3EC0` +have their per-instance ctors fired (KRNBUG-AUDIT-004 verified — +each fires once). But the dispatcher *fields* the producer is +about to read remain zero (`[0x828F3D08+0x50] = 0`, +`[0x828F4070+0x24] = 0` from AUDIT-004 dumps). Now we know why: +the **producers** that would populate those fields with a non-zero +handle never execute, because the upstream init flow (gated by +the privilege checks) is skipped. The ctor sets up the dispatcher +struct shell; the producer (somewhere in the +XGetAVPack/XeCrypt/XamTaskSchedule chain) populates it with the +worker-event handle. We never reach the producer. + +### Note on canary log filtering + +Canary's config has `log_high_frequency_kernel_calls = false`. The +"called in OURS but not canary" side of the diff (23 entries, headed +by `NtWaitForSingleObjectEx ×1.5M`) is dominated by this filter +difference, **not** a bug surface. Always work from the directionally +meaningful side: "called in CANARY but not OURS". + +## Verification + +- 588 tests pass before and after. +- Lockstep golden `sylpheed_n50m` matches: `run digest matches + golden` at -n 50M `--stable-digest`. +- The new `pc_probe_consumers` field is empty by default; existing + ctor-probe tests cover the shared infrastructure. + +## Next session — recommendation + +Replace `XexCheckExecutablePrivilege` stub with a real impl: + +1. Parse `XEX_HEADER_EXECUTION_INFO` privilege bits at XEX load + time into `KernelState` (or surface via existing VFS XEX metadata). + See `crates/xenia-xex/` for the XEX header parser; `KernelState` + already holds `image_base`. +2. `xex_check_executable_privilege(priv_id)`: return 1 if bit + `priv_id` is set in the title's privilege bitmask. Encoding: + `PrivilegeFlags[priv_id / 8] & (1 << (priv_id % 8))` — match + canary's reading. +3. Re-run `audit-runs/audit-005/diff.py`. Expect `XGetAVPack`, + `XamTaskSchedule`, `XeCryptSha`, etc. to appear in our sequence. +4. Re-run with the 9-PC probe armed at -n 500M. At minimum the + ctor-probe trail changes; ideally producer sites start firing. +5. If producer sites fire, dispatcher fields populate (verify with + `--dump-addr=0x828F3D08,0x828F4070`). +6. Lockstep golden `sylpheed_n50m.json` will change — `imports` + counter rises, `swaps` may advance. Regenerate under + `--stable-digest` as the new anchor. + +## Files + +- `crates/xenia-kernel/src/state.rs` — `pc_probe_consumers` field + + extended `fire_ctor_probe_if_match` (~12 added lines). +- `crates/xenia-app/src/main.rs` — `--pc-probe` clap alias + + `PC@DISP:OFF` parser (~20 added lines). +- `audit-runs/audit-005/canary.log` — copy of + `/home/fabi/xenia_canary_windows/xenia.log`. +- `audit-runs/audit-005/ours.log` — 838 MB / 5.6 M lines @ -n 500M. +- `audit-runs/audit-005/diff.py` — one-shot Python diff (set-diff + + first-divergence window). Deletable after the fix lands. +- `audit-findings.md` KRNBUG-AUDIT-005 — full deliverable. + +## Master HEAD after merge + +`451b3b2` — Merge canary-diff-and-pc-consumer-probe/p0-priv-stub-cascade +(feature commit: `3e2fc1e`). diff --git a/migration/claude-memory/project_xenia_rs_audit_006_export_queue_2026_05_04.md b/migration/claude-memory/project_xenia_rs_audit_006_export_queue_2026_05_04.md new file mode 100644 index 0000000..4152f69 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_006_export_queue_2026_05_04.md @@ -0,0 +1,78 @@ +--- +name: KRNBUG-AUDIT-006 (canary-only export fix queue) +description: 2026-05-04 read-only audit; 7/7 canary-only exports classify REAL_BUT_UNREACHED → upstream gate is KRNBUG-IO-002 block-size mismatch +type: project +originSessionId: c7ccc670-90c6-409b-b2d4-71dd9a133de2 +--- +# 🎯 KRNBUG-AUDIT-006 (2026-05-04, READ-ONLY SESSION) + +**Headline:** the post-IO-001 canary-only export list is **identical** to +IO-001's snapshot (7 entries, no movement). All 7 classify as +REAL_BUT_UNREACHED or STUB_BUT_UNREACHED. Per stop conditions, the next +session must fix the **upstream gate**, not pull from this queue. + +## Pre-state + +- Master HEAD `556a8c3`. Tests 591 green. +- Trace: `xenia-rs/audit-runs/audit-006/ours.log` (692 MB, -n 500M, post-IO-001). +- Oracle: `xenia-rs/audit-runs/audit-006/canary.log` (348 KB, deterministic; canary `9467c77f0`). +- Diff: `comm -23` on extracted call names → 7 canary-only entries, identical to IO-001 list. + +## The 7 canary-only entries (all Tier 4) + +`XamTaskSchedule`, `XamTaskCloseHandle`, `KeResetEvent`, +`ObCreateSymbolicLink`, `KeReleaseSemaphore`, `ExTerminateThread`, +`XamUserReadProfileSettings`. 5 of 7 are real impls in our code; 2 +(`XamTaskCloseHandle`, `ObCreateSymbolicLink`) are `stub_success`. None +fires at -n 500M. All first calls in canary fall **after** line 1210 +(the `XamTaskSchedule(824A93C8, ...)` gate-pivot). + +## The gate (Tier 0 — what next session works on) + +**KRNBUG-IO-002 — `nt_query_volume_information_file` block size** +(`crates/xenia-kernel/src/exports.rs:1241-1269`). + +- Class=3 returns `(sectors=1, bytes_per_sector=2048)` → alloc_unit=2048. +- Sylpheed expects 65536 (`main(1, 0x10000, 0xFF000)`); game's + `sub_824ABA98` (VerifyDirBlockSize) silently rejects, propagates failure + to `sub_824A9710`, which exits before its `XexCheckExecutablePrivilege(0xB)` + + `XamTaskSchedule` call sites. Confirmed: our `XexCheckExecutablePrivilege` + count = 1 (priv=0xA only); canary count = 2 (priv=0xA + 0xB). +- Canary's NullDevice (`xenia-canary/src/xenia/vfs/devices/null_device.h:38-46`) + returns `(0x80, 0x200)` = 65536 — the value Sylpheed expects. +- **Fix:** two-line value change in the class=3 branch + (`sectors=128, bytes_per_sector=512`). +- **Expected cascade post-fix:** XamTaskSchedule fires → Cache0 callback + thread spawns → `ObCreateSymbolicLink` + `ExRegisterTitleTerminateNotification` + + `KeResetEvent(0x8287094C)` + `ExCreateThread(entry=0x82181830, + ctx=0x828F3D08)`. The latter is the worker for **dispatcher 0x100c** + (one of the four parked-handle producers). Closing IO-002 should drop + the 7-entry list to 0 or near-0 and finally advance handle 0x100c's + signal_attempts off zero. + +## Surprises / corrections to prior beliefs + +- **`0xC000014F` from IO-001 memory's prediction has not appeared in + ours.log.** First cache-related error is `0xC0000034` + (OBJECT_NAME_NOT_FOUND) from `lr=0x824a97e4`. The recreate path + completes its 44 NtWriteFile calls; the failure is *game-side* in + the verifier, not a kernel-returned NTSTATUS. Gate hypothesis still + holds; the specific status code in IO-001 memory was speculative. +- **No new canary-only exports surfaced post-IO-001.** The cascade has + not opened any new boot territory since IO-001 landed. +- **Tier 1 / 2 of the queue are empty.** This is the expected outcome + per the spec's stop conditions, not a defect of the audit. + +## Deliverable + +`xenia-rs/audit-runs/audit-006/canary_export_queue.md` (216 lines) +documents the classification, the gate, fix sketches, and the +post-fix verification chain. **Do not** pull Tier 4 entries from it +before IO-002 closes. + +## Recommended next session + +KRNBUG-IO-002, one-shot (≤ 4 LOC). Re-run audit-006's diff after the +fix; expect canary-only count 7 → 0 (or near-0). Whatever new +canary-only entries surface (if any) become audit-007 input. Land +the producer-hunt finally. diff --git a/migration/claude-memory/project_xenia_rs_audit_007_branch_probe_2026_05_04.md b/migration/claude-memory/project_xenia_rs_audit_007_branch_probe_2026_05_04.md new file mode 100644 index 0000000..0a632e4 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_007_branch_probe_2026_05_04.md @@ -0,0 +1,145 @@ +--- +name: KRNBUG-AUDIT-007 — sub_824A9710 exit gate identified (NtDeviceIoControlFile FsCtl 0x74004 stub) +description: 2026-05-04. Branch-probe trace landed sub_824A9710 entry; r3=0xC0000034 captured at 0x824a97e0 (failure landing pad). Exit branch=0x824a9944 (post bl sub_824ABD88 first call). Root cause: NtDeviceIoControlFile registered as stub_success — game-side sub_824ABD88 reads [out_buf+8] from the 16-byte response of FsCtlCode=0x74004, finds zero, and assigns hardcoded 0xC0000034 (STATUS_OBJECT_NAME_NOT_FOUND). Diagnostic only — no fix this session. +type: project +originSessionId: 63d3490e-a119-46d9-960e-f11684a5e2a2 +--- +🎯 **KRNBUG-AUDIT-007 (2026-05-04, READ-ONLY DIAGNOSTIC, branch `investigate-sub-824a9710/p0-branch-probe`)** — branch-probe instrumentation landed; runtime trace decisively identifies the exit gate downstream of IO-001/IO-002. The priv-11 site at `sub_824A9710:0x824a99a0` is gated by `NtDeviceIoControlFile(FsCtlCode=0x74004)` which is currently registered as `stub_success` at `crates/xenia-kernel/src/exports.rs:90`. + +## Decisive runtime evidence + +`audit-runs/audit-007/sub_824A9710-trace.log`: + +``` +BRANCH-PROBE pc=0x824a9aa0 tid=1 hw=0 cycle=5362945 r3=0x00000001 lr=0x8216eaa0 cr0=..E cr6=..E +BRANCH-PROBE pc=0x824a9128 tid=1 hw=0 cycle=5362957 r3=0x00000001 lr=0x824a9abc cr0=..E cr6=..E +BRANCH-PROBE pc=0x824a9710 tid=1 hw=0 cycle=5363003 r3=0x00000000 lr=0x824a9acc cr0=L.. cr6=..E +BRANCH-PROBE pc=0x824a97e0 tid=1 hw=0 cycle=5369559 r3=0xc0000034 lr=0x824a9940 cr0=L.. cr6=L.. +BRANCH-PROBE pc=0x824a9a98 tid=1 hw=0 cycle=5369562 r3=0x00000002 lr=0x824a97e4 cr0=L.. cr6=L.. +``` + +**Reading the trace:** + +- Cycle 5362945: `main()` (sub_8216EA68) calls `sub_824A9AA0(r3=1, r4=0x10000, r5=0xFF000)` at PC 0x8216ea9c, return at 0x8216eaa0. ✅ Function chain reached. +- Cycle 5363003: `sub_824A9710` entered with r3=0 (sub_824A9128 returned 0 — its inner sub_824A90A8 returned negative, expected for first-time cache load). +- Cycle 5369559: PC=`0x824a97e0` (failure landing pad), r3=`0xc0000034`, lr=`0x824a9940` (the `cmpi 0,r3,0` at PC after `bl 0x824ABD88`). +- Cycle 5369562: PC=`0x824a9a98` (epilogue) — function returns 0xC0000034 to caller `sub_824A9AA0`, which discards it. + +**Translation:** the `bl sub_824ABD88` at `0x824a993c` returned `0xC0000034` (STATUS_OBJECT_NAME_NOT_FOUND). The branch at `0x824a9944` (`bc 12, lt, 0x824A97E0`) was TAKEN, exiting the function before reaching the priv-11 query at `0x824a99a0`. + +**Cycle budget for path** (`5369559 − 5363003 = 6556` instructions): consistent with prologue + 2× memset + 3 early-exit checks + NtCreateFile (~thunk + HLE) + NtReadFile (~thunk + HLE, IO-001 path returns success+0) + magic mismatch → recreate setup at 0x824a98bc + sub_824ABD88 internals (sub_824ABC88 + NtOpenFile + NtDeviceIoControlFile×2 + NtClose). Maps cleanly onto the canary.log lines 1141-1208 prefix. + +## Mechanical chain to 0xC0000034 (cross-checked vs disasm) + +`sub_824ABD88` (the gate) full disasm at `0x824abd88-0x824ac184` (the function is 254 insns; sylpheed.db's `end_address=0x824abe3c` was a function-detection truncation — actual end is at `0x824ac184` confirmed by disassembling forward). + +Path through `sub_824ABD88`: + +1. `bl sub_824ABC88` at `0x824abda8` returns 0 (input ANSI string `\Device\Harddisk0\Cache0` mismatches the `\Device\Harddisk0\Partition1` literal at 0x820015A4 — early fast-path return inside sub_824ABC88). +2. `bl NtOpenFile` at `0x824abde0` (object name = caller's r31 = `\Device\Harddisk0\Cache0`, access=0x100003, options=0x18). Per our `open_vfs_file` synthesizes empty file → returns SUCCESS. +3. `bl NtDeviceIoControlFile` at `0x824abe1c` — first IOCTL: `(handle, 0,0,0, iosb=r1+112, fsctl=0x70000, 0,0, out_buf=r1+120, out_len=8)`. Stub returns SUCCESS, doesn't write OUT. +4. `lwz r11, 124(r1)` at `0x824abe3c` reads byte-pos within first OUT (= 0). cntlzw + subfic computes `r11 = 31 - 32 = -1`, but `bc 12, 4*cr6+eq, 0x824ABE54` takes the `r11==0` branch → `r11 = 0`. Stored in `r22 = 0`. +5. `bl NtDeviceIoControlFile` at `0x824abe90` — second IOCTL: `(handle, 0,0,0, iosb=r1+112, fsctl=0x74004, 0,0, out_buf=r1+160, out_len=16)`. Stub returns SUCCESS, doesn't write OUT. +6. `cmpi 0, r3, 0` / `bc 12, lt, 0x824ABEB8` — r3=0 ≥ 0, fall through. +7. **`ld r10, 168(r1)`** — loads doubleword from `[r1+160+8]` = upper 8 bytes of the 16-byte OUT buffer. Stub left it whatever it was → ZERO (fresh stack frame post-stwu). +8. `cmpi cr6, 1, r10, 0` (L=1, doubleword cmp). r10=0 → cr6.eq=1. +9. `bc 4, 4*cr6+eq, 0x824ABEB0` — BO=4 = branch-if-FALSE; cond=eq is TRUE; **does NOT branch** → falls through to: +10. **`addis r3, r0, 0xC000`** at `0x824abea8` followed by **`ori r3, r3, 0x34`** at `0x824abeac` — **r3 := 0xC0000034 (STATUS_OBJECT_NAME_NOT_FOUND), HARDCODED**. +11. `cmpi cr6, 0, r3, 0` / `bc 4, 4*cr6+lt, 0x824ABECC` — r3=0xC0000034 negative, cr6.lt=1, BO=4 = branch-if-FALSE — does NOT branch → falls through to: +12. `or r28, r3, r3` (`0x824abeb8`) → save 0xC0000034 → `bl NtClose` (`0x824abec0`) → `or r3, r28, r28` (`0x824abec4`) → `b 0x824ABE34` (epilogue) — **return 0xC0000034 to caller**. + +Caller `sub_824A9710` at `0x824a9940`: cmpi shows r3 < 0, bc 12, lt at `0x824a9944` is TAKEN → branches to `0x824a97e0` (the failure landing PC the probe captured). + +## What canary does at this site + +`audit-runs/post-IO-002/canary.log` lines 1196-1208: + +``` +NtOpenFile(701CF390(00000000), 00100003, 701CF3C0(00000000,\Device\Harddisk0\Cache0,00000040), 701CF3A0, 00000000) +NtDeviceIoControlFile(F8000010, 00000000, 00000000, 00000000, 701CF3A0, 00070000, 00000000, 00000000, 701CF3A8, 00000008) +NtDeviceIoControlFile(F8000010, 00000000, 00000000, 00000000, 701CF3A0, 00074004, 00000000, 00000000, 701CF3D0, 00000010) +NtWriteFile × 17 (zero-fill 64KB cluster) +KeQuerySystemTime +NtWriteFile (commit final block) +undefined extern call to 8284E0DC IoDismountVolumeByFileHandle +NtClose +NtOpenFile(701CF3F0(00000000), 00100001, 701CF400(00000000,\Device\Harddisk0\Cache0\,00000040), 701CF3F8, 00000003) +NtQueryVolumeInformationFile(F8000010, 701CF3F8, 701CF410, 00000018, 00000003) +NtClose +XexCheckExecutablePrivilege(0000000B) ← priv-11 site fires +XamTaskSchedule(824A93C8, 828A28F0, 701CF4C0, 701CF4A4(00000000), ContextArg) +``` + +Canary's `NtDeviceIoControlFile` for FsCtlCode=0x74004 returns SUCCESS *and* populates the 16-byte OUT buffer at `701CF3D0`. The exact payload isn't logged, but the buffer's upper-8-bytes is non-zero (otherwise canary would also hit the 0xC0000034 trap), and from the post-IOCTL flow the value is consumed as a partition-size or address to drive the 17× NtWriteFile zero-fill loop. + +Per `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io.cc` (NtDeviceIoControlFile_entry) + `xenia-canary/src/xenia/vfs/devices/null_device.{h,cc}` (NullDevice::IoControl), each device implements `IoControl` and writes the payload. FsCtlCode `0x74004` decodes to (DEVICE=`FILE_DEVICE_DISK`, FUNCTION=0x1001, METHOD=`METHOD_BUFFERED`) — likely `IOCTL_DISK_GET_PARTITION_INFO` or a Xbox-specific equivalent. FsCtlCode `0x70000` = (DEVICE=DISK, FUNCTION=0, METHOD=BUFFERED) — `IOCTL_DISK_GET_DRIVE_GEOMETRY` candidate. + +## Next-session fix target (KRNBUG-IO-003) + +**Where:** `crates/xenia-kernel/src/exports.rs:90`. Replace `stub_success` registration with a real `nt_device_io_control_file` handler. + +**Minimum viable fix:** + +The game-side check at `sub_824ABD88:0x824abe9c-0x824abeb4` only requires that **`[out_buf+8]` is non-zero** when FsCtlCode=0x74004. That alone moves us past the gate. A canary-faithful fix populates the full 16-byte OUT buffer; the minimum fix writes any non-zero doubleword at offset 8. + +**Canary-faithful fix:** + +Mirror canary's `NullDevice::IoControl` for FsCtlCode 0x74004 (and 0x70000 for the prior call). Reference: `xenia-canary/src/xenia/vfs/devices/null_device.cc`. The FsCtlCode signature dispatch should also fall through to `STATUS_NOT_IMPLEMENTED` for unknown codes (instead of silent SUCCESS) to surface future divergences early. + +**Effort:** small (~30-50 LOC for two FsCtlCode branches + canary structure-match fixture test). FsCtlCode value extraction + payload writing — no new types needed. + +**Risk:** low for the minimum fix (one IOCTL response). Medium for the full canary-faithful path (matches canary's NullDevice exactly, with possible regression risk for the 0x70000 path if any other site queries it). + +## Cascade prediction (sharp, falsifiable) + +Post-IO-003 fix, lockstep at -n 500M: + +- `XexCheckExecutablePrivilege` count: **1 → 2** (priv=0xA + priv=0xB). +- `XamTaskSchedule` count: **0 → 1** (callback=0x824A93C8, message=0x828A28F0). +- canary-only export count: **7 → ≤ 3**. The 4 entries that should drop: + - `XamTaskSchedule` (immediate fire). + - `KeResetEvent` (post-cluster, on `KeResetEvent(0x8287094C)`). + - `ObCreateSymbolicLink` (alt path at `sub_824A9710:0x824a9a6c` if priv-11 returns 0; or in the inner Cache0 cluster). + - `XamTaskCloseHandle` (cleanup after task completes). +- Worker thread spawn at `ExCreateThread(entry=0x82181830, ctx=0x828F3D08)` — **handle 0x100c's parked-waiter producer fires**, advancing `signal_attempts` off zero for one of the three parked handles. +- `swaps`: **2 → 2** (renderer plateau is multi-causal — closing this gate doesn't directly unblock the renderer, but unparks one worker). + +## Falsification guards (what would invalidate the prediction) + +- **(α)** If KeResetEvent / ObCreateSymbolicLink / XamUserReadProfileSettings / etc. each have additional unimplemented kernel-call dependencies, the cascade stops at the first one. Detectable by re-running `--branch-probe` over `sub_824A9710` and observing a NEW exit branch (any of: 0x824a996c, 0x824a9998, 0x824a9a18) — OR a hit at sub_824ABA98's analogous failure path. +- **(β)** If `sub_824ABA98` (called at `0x824a9950` and `0x824a9990` in sub_824A9710) has its own unimplemented dependency, we'll exit at `0x824a9998` after the second sub_824ABA98 retry. +- **(γ)** If our `nt_write_file` doesn't handle the synthesized empty-file Cache0 path correctly (a write to a zero-byte file), the 17× NtWriteFile zero-fill in canary's flow may surface a fresh failure between IOCTL and priv-11. Probe will catch this as a hit at `0x824a9998` or downstream. + +## Files added / modified this session (instrumentation only — read-only) + +- `crates/xenia-kernel/src/state.rs` — added `branch_probe_pcs: HashSet` field + `fire_branch_probe_if_match(hw_id)` helper. Sister to `fire_ctor_probe_if_match` but emits a single compact `BRANCH-PROBE` line (no back-chain) including cr0/cr6 flags. ~40 LOC. +- `crates/xenia-app/src/main.rs` — added `--branch-probe` CLI flag (env-var `XENIA_BRANCH_PROBE`), parser, and call in `worker_prologue` after `fire_ctor_probe_if_match`. ~30 LOC. +- No state mutated; lockstep digest unchanged. 592 → 592 tests. Two reruns of `check -n 100000010 --stable-digest` produced bit-identical output (`audit-runs/audit-007/lock_post_branchprobe.json` ≡ `lock_post_branchprobe_run2.json` ≡ `audit-runs/post-IO-002/lock_n100m_run1.json`). + +## Trace artifacts (re-runnable) + +- `audit-runs/audit-007/sub_824A9710-trace.log` — 5 BRANCH-PROBE lines + thread diagnostics. +- `audit-runs/audit-007/sub_824A9710-trace.err` — full kernel-call trace + counter dump (635 lines). +- `audit-runs/audit-007/lock_post_branchprobe.json`, `lock_post_branchprobe_run2.json` — lockstep digest reruns. + +Re-run command: + +``` +PROBE_LIST="0x824a9aa0,0x824a9128,0x824a9710,0x824a9778,0x824a9788,0x824a9790,0x824a97dc,0x824a97e0,0x824a9824,0x824a9828,0x824a9840,0x824a9850,0x824a985c,0x824a9870,0x824a9880,0x824a9888,0x824a9918,0x824a9944,0x824a9958,0x824a996c,0x824a9998,0x824a999c,0x824a99a0,0x824a99a8,0x824a9a10,0x824a9a18,0x824a9a60,0x824a9a78,0x824a9a98" +./target/release/xenia-rs exec sylpheed.iso --halt-on-deadlock --branch-probe="$PROBE_LIST" -n 500_000_000 \ + > audit-runs/audit-007/sub_824A9710-trace.log 2> audit-runs/audit-007/sub_824A9710-trace.err +``` + +## Probe-machinery limitation noted + +The `--branch-probe` helper fires only at PC values that are block-heads (i.e., the start of an interpreter dispatch unit). Mid-block PCs in the request set don't trigger because the prologue runs once per block, not once per instruction. In this trace, the function entry, failure landing pads (`0x824a97e0`, `0x824a9a98`), and external-call return points get probed, while internal `bc` PCs are silent. The data captured was sufficient — the failure landing PC + LR pair uniquely identifies the upstream branch — but if a future audit needs every-instruction granularity, the call site needs to move from `worker_prologue` to inside the per-instruction step in `step_block`. + +## Cross-check vs prior memory + +- `project_xenia_rs_io_002_volallocunit_2026_05_04.md` listed three next-session-gate hypotheses. **Hypothesis 1 confirmed** (sub_824A9710 entry-side probe found the gate). **Hypothesis 2** (different info-class) was wrong direction — the gate is an IOCTL not an info-class. **Hypothesis 4** (different IOCTL) was on the right track — the FIRST IOCTL we saw in our log (FsCtlCode `0x70000+0x4004`) is exactly the gate IOCTL. +- `project_xenia_rs_audit_006_export_queue_2026_05_04.md` Tier-0 KRNBUG-IO-002 hypothesis (vol-info block size) was the wrong gate but the framework correctly classified the 7 canary-only exports as REAL_BUT_UNREACHED. The framework's stop condition triggered correctly. +- `project_xenia_rs_audit_005_priv_stub_2026_05_04.md` attribution of `sub_824ABA98 = VerifyDirBlockSize` and `sub_824ABD88 = MaybeMountAndIoctl` was provisional. **`sub_824ABD88 = MaybeMountAndIoctl` confirmed accurate** (it's the function that wraps the partition-init IOCTL flow). `sub_824ABA98` not yet exercised — no runtime evidence either way. + +## Stop condition + +Per task brief: "DO NOT IMPLEMENT A FIX. This session's job is to identify the exit branch and the responsible kernel call." ✅ Identified. Hand-off complete. diff --git a/migration/claude-memory/project_xenia_rs_audit_008_branch_probe_2026_05_05.md b/migration/claude-memory/project_xenia_rs_audit_008_branch_probe_2026_05_05.md new file mode 100644 index 0000000..57427cc --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_008_branch_probe_2026_05_05.md @@ -0,0 +1,140 @@ +--- +name: KRNBUG-AUDIT-008 — 0x100c worker IS spawned post-IO-003; gate is downstream job-submitter +description: 2026-05-05. Branch-probe at the post-priv-11 cluster decisively shows main spawns the 0x100c worker (tid=3, ctx=0x828F3D08, entry=0x82181830); IO-003 memory's "0x100c UNCREATED" claim was wrong. Real next gate is missing calls to the 5 non-create-chain job-submitters of sub_821800D8 (pattern `bl outer_getter; lwz r3, 80(r3); bl 0x824AA1D8`). Diagnostic only — no fix per discipline gate. +type: project +originSessionId: 60e12f9a-ffab-465e-8ec5-d5613371fa20 +--- +🎯 **KRNBUG-AUDIT-008 (2026-05-05, READ-ONLY DIAGNOSTIC)** — model reset on the IO-003 cascade. The "0x100c UNCREATED" claim from `project_xenia_rs_io_003_ioctl_2026_05_04.md` is **falsified**: the 0x100c worker IS spawned post-IO-003 as `tid=3` with `ctx=0x828F3D08, entry=0x82181830`, parked on its lifecycle event handle 0x1020 (signals=0). The actual next gate is downstream of the create chain, in the 0x82287000-0x82292FFF module range that owns the 5 job-submitters of sub_821800D8. + +## Decisive runtime evidence (audit-runs/audit-008/branch-probe.trace) + +The probe targeted main's post-XamTaskSchedule path, the 0x100c create chain, and the XamTaskSchedule callback. -n 100M, lockstep mode (matching IO-003 baseline `instructions=100000019`). + +``` +BRANCH-PROBE pc=0x824a9a14 tid=1 hw=0 cycle=5378562 -- main: post-XamTaskSchedule +BRANCH-PROBE pc=0x824a93c8 tid=2 hw=1 cycle=0 r3=0x828a28f0 lr=0xbcbcbcbc -- spawned thread enters callback +BRANCH-PROBE pc=0x824a9540 tid=2 hw=1 cycle=4232 -- spawned thread past StfsCreateDevice cmpi check +BRANCH-PROBE pc=0x824a9a44 tid=1 hw=0 cycle=5378576 -- main: post-KeWaitForSingleObject(0x8287094C) +BRANCH-PROBE pc=0x824a9a4c tid=1 hw=0 cycle=5378579 -- main: post-KeResetEvent +BRANCH-PROBE pc=0x824a9a98 tid=1 hw=0 cycle=5378596 -- main: sub_824A9710 epilogue +BRANCH-PROBE pc=0x824a9acc tid=1 hw=0 cycle=5378609 -- main: sub_824A9AA0 return +BRANCH-PROBE pc=0x8216eaa0 tid=1 hw=0 cycle=5378617 -- main: bl sub_82181C28 callsite +BRANCH-PROBE pc=0x82181c28 tid=1 hw=0 cycle=5378618 -- main entered sub_82181C28 +BRANCH-PROBE pc=0x821800d8 tid=1 hw=0 cycle=5378630 -- main entered sub_821800D8 (singleton getter for 0x100c) +BRANCH-PROBE pc=0x82181750 tid=1 hw=0 cycle=5378645 r3=0x828f3d08 -- main entered sub_82181750 ctor +BRANCH-PROBE pc=0x821817c0 tid=1 hw=0 cycle=5378712 r3=0x00001020 -- post-sub_824A9F18 (lifecycle event=0x1020 created) +BRANCH-PROBE pc=0x82181830 tid=3 hw=2 cycle=0 r3=0x828f3d08 lr=0xbcbcbcbc -- 0x100C WORKER SPAWNED AS tid=3 +BRANCH-PROBE pc=0x82181838 tid=3 hw=2 cycle=1 r3=0x828f3d08 lr=0xbcbcbcbc -- past entry thunk +BRANCH-PROBE pc=0x821817fc tid=1 hw=0 cycle=5378786 r3=0x00001024 -- main: post-bl sub_82172370, thread handle=0x1024 +BRANCH-PROBE pc=0x82180120 tid=1 hw=0 cycle=5378951 -- main: post-atexit registration +BRANCH-PROBE pc=0x82181c58 tid=1 hw=0 cycle=5378957 r3=0x828f3d08 -- main: bl sub_821800D8 returned +``` + +**Reading the trace:** + +- (1) tid=2 enters the XamTaskSchedule callback at `sub_824A93C8` with `r3=0x828a28f0` — exact match for canary's `XamTaskSchedule(callback=0x824A93C8, message=0x828A28F0, ...)` log line at `audit-runs/post-IO-002/canary.log:1210`. ✓ +- (2) tid=2 reaches `0x824a9540` (post-StfsCreateDevice cmpi) at cycle 4232. The branch at `0x824a9544` is `bc 4, lt, 0x824A956C` (BO=4 = branch-if-FALSE, cond=lt → branch if r3 ≥ 0). Stub returned 0 → branch TAKEN → cascade walks past to ObCreateSymbolicLink + ExRegisterTitleTerminateNotification + KeSetEvent + KeWaitForSingleObject(0x8287093C). Mirrors canary's F8000010 thread which **also stops at this same wait** (canary log lines 1217-1231 show F8000010 doing the same calls and then going silent). ✓ NOT a gate. +- (3) tid=1 (main) reaches `0x824a9a14` (post-XamTaskSchedule) at cycle 5378562 — only 6,557 cycles later than the spawned thread's callback entry. The spawned thread already signaled `KeSetEvent(0x8287094C)` so main's `KeWaitForSingleObject(0x8287094C)` at `0x824a9a40` returned immediately. +- (4) Main proceeds through the priv-11 cluster's tail: `KeWaitForSingleObject` (cycle 5378576) → `KeResetEvent(0x8287094C)` (cycle 5378579) → `sub_824A9710` epilogue at `0x824a9a98` (cycle 5378596) → `sub_824A9AA0` return (cycle 5378609) → main resume at `0x8216eaa0` (cycle 5378617). +- (5) Main enters `sub_82181C28` (Meyers singleton getter for `[0x828F3D98]` flag) at cycle 5378618. **First call → flag is zero → falls through to `bl sub_821800D8` at `0x82181c54`.** +- (6) `sub_821800D8` (Meyers singleton getter for the 0x100c dispatcher at `[0x828F3D78]` flag) at cycle 5378630. **First call → flag is zero → falls through to `bl sub_82181750` at `0x82180110`.** +- (7) `sub_82181750` (the actual constructor) entered at cycle 5378645 with `r3=0x828F3D08` (the dispatcher's `this`). +- (8) Inside the constructor, `bl sub_824A9F18` (lifecycle-event allocator) returns `r3=0x1020` (the parked-waiter handle). Stored at `[0x828F3D08+76]`. +- (9) `bl sub_82172370` (ExCreateThread wrapper, called at cycle ≈5378750) successfully spawns the worker — visible in the probe as `BRANCH-PROBE pc=0x82181830 tid=3 hw=2 cycle=0 r3=0x828f3d08 lr=0xbcbcbcbc`. **Decisive: tid=3 IS the 0x100c worker.** Confirmed by handle audit at the run end: `handle=0x00001020 Event(sig=false, mr=true) waiters(tid)=[3]` — same tid waiting on the lifecycle event whose value (0x1020) was returned by `sub_824A9F18` to the constructor. +- (10) Main's `sub_82172370` returns `r3=0x1024` (thread handle, distinct from the lifecycle event 0x1020) at cycle 5378786. Constructor finishes; getter returns. Boot continues. + +## Reset of model — corrections to KRNBUG-IO-003 memory + +The IO-003 prediction scorecard recorded: +- ✗ `0x100c worker spawn (UNCREATED → created+signaled)` — UNCREATED post-IO-003. +- ✗ `Worker thread spawn count: 19 → higher` — unchanged at 19. + +**Both are wrong.** The probe definitively shows the 0x100c worker IS created. The handle audit list in `audit-runs/post-IO-003/exec_trace_focus_500m.log` already shows `handle=0x00001020 Event(sig=false, mr=true) waiters(tid)=[3]` — the worker has BEEN there all along; the IO-003 audit just didn't connect tid=3 to the 0x100c dispatcher. + +The actual cascade post-IO-003 is **stronger** than IO-003 thought: +- 0x100c worker: spawned (tid=3, parked on handle 0x1020). +- 0x1004 worker: spawned (tid=11, parked on handle 0x1004) — also previously misclassified. +- 0x15e0 worker: spawned (tid=6, semaphore signal-pump live). + +What IS still broken: **none of the 3 parked workers have their lifecycle events signaled.** `signal_attempts=0` for handles 0x1004, 0x1020 (and 0x10c4 = 0x100c's secondary event). + +## Where the real gate lives + +`sub_821800D8` (the 0x100c singleton getter) has 6 callers via xrefs: + +| caller PC | from func | role | +|-----------|-----------|------| +| 0x82181c54 | sub_82181C28 | **create chain** (called by main, runs the ctor) | +| 0x821802d8 | sub_82180158 | job-submitter shim (pattern: `bl getter; lwz r3, 80(r3); bl sub_824AA1D8`) | +| 0x821806e0 | sub_821805C8 | job-submitter shim | +| 0x82180b28 | sub_82180A10 | job-submitter shim | +| 0x82180ea0 | sub_82180D90 | job-submitter shim | +| 0x82181254 | sub_821810E0 | job-submitter shim | + +Each of the 5 job-submitters is a 5-instruction leaf (`bl outer; lwz r3, 80(r3); bl sub_824AA1D8`) — the canonical "get-then-enqueue" pattern, where `sub_824AA1D8` is the universal dispatcher-submit primitive that ALSO signals the lifecycle event. + +`sub_824AA1D8`'s callers split into: +- 5 from the 0x100c shim cluster above +- 4 from the 0x15e0 shim cluster (sub_8216F618 outer getter, offset 36): `0x8216f9dc, 0x8216fc10, 0x821700c0, 0x82170514` +- 1 from inside `sub_82181750` itself at `0x82181924` (the ctor's own self-submit) + +The 5 job-submitter shims (sub_82180158, sub_821805C8, sub_82180A10, sub_82180D90, sub_821810E0) are themselves called from the **0x82287000-0x82292FFF module range**: + +| shim | shim callers (source func) | +|------|--------------------------| +| sub_82180158 | sub_82292838 (1 caller) | +| sub_821805C8 | sub_822878A8, sub_8228D760, sub_822900A8, sub_822919C8 (4 callers) | +| sub_82180A10 | sub_822878A8 (1 caller) | +| sub_82180D90 | sub_82292838 (1 caller) | +| sub_821810E0 | sub_8228FDB8, sub_82292838 (2 callers) | + +The 0x82287xxx-0x82292xxx range is a different code module from the cache/init code. It's almost certainly the renderer / scene-graph subsystem (call sites cluster around the post-XamContentCreateEnumerator content-load path that canary's log shows starting at line 1238). Boot must reach this module to submit its first job; we don't. + +## Failure-mode classification (audit-007 schema) + +The next gate is **β** (internal sub-recursion / unreached subsystem), NOT **α** (single stubbed import). The 0x100c create chain succeeds end-to-end; the 0x100c worker exists; what's missing is *use* of the 0x100c worker by code in 0x82287xxx-0x82292xxx. + +## Discipline gate evaluation (per task brief) + +| # | Condition | Pass? | +|---|---|---| +| 1 | Phase 1 named a single kernel/xam import as failing `bl` (branch α) | **NO** — gate is internal logic, not a stub | +| 2 | The import is on the canary-only list or verifiably wrong | N/A | +| 3 | Canary's impl is small (<80 LOC) | N/A | +| 4 | Sharp 4-dimensional cascade prediction | **NO** — would require deeper diagnosis to predict which submitter fires first and what it unblocks | +| 5 | No new ABI plumbing | N/A | + +**Gate fails on box 1 + 4. STOP. Hand back per discipline gate.** No code changes this session. + +## Open puzzles for next session + +1. **Find the FIRST submitter that should fire.** Canary's log between lines 1232 and EOF (5260) shows the post-XamTaskSchedule boot continuation. Specifically, the lines that suggest renderer/scene activation (the 0x100c module is likely renderer-side): + - Line 1238: `XamContentCreateEnumerator` (handle F8000028) — main thread. + - Lines 1240-1256: NtCreateSemaphore, cache:\ NtOpenFile + NtQueryVolumeInformationFile, ExCreateThread(entry=0x8245A5D0). + - Eventually scene/renderer should submit a job to the 0x100c dispatcher. +2. **Probe the candidates.** Add `--branch-probe=0x82292838,0x822878a8,0x8228d760,0x822900a8,0x822919c8,0x8228fdb8` (the parent functions that contain the shim callsites) to a fresh -n 500M run. Whichever fires first identifies the active producer path; whichever fires LAST or never identifies the gate. +3. **Cross-check tid=2's wait target.** The handle-list dump shows `handle=0x8287093c` with waiter tid=2, matching disasm of sub_824A93C8:0x824a95dc-f4 (`addi r27, r11, 2364` with r11=0x82870000 → r27=0x8287093C). The earlier run-state dump's `WaitAny { handles: [2189887804=0x82870EBC] }` reading is a probable diagnostic-text artifact (display delta or stale wait-list field) — not a wrong wait target. **Verify before treating as a bug.** +4. **Wait-target 0x8287093C signals.** `KeSetEvent(0x8287093C)` would unpark tid=2 immediately. In canary's flow, this likely happens far downstream (after the renderer submits enough work that the cache-async-init machinery completes). Don't over-index on this — it's a SECONDARY wait, the primary cascade gate is the missing job-submitter. + +## Trace artifacts (re-runnable) + +- `audit-runs/audit-008/branch-probe.trace` — 17 BRANCH-PROBE lines (clean extract). +- `audit-runs/audit-008/probe-100m.log` — full stdout (8.7KB). +- `audit-runs/audit-008/probe-100m.err` — full stderr trace (215KB). + +Re-run command: + +``` +cd xenia-rs +PROBE_LIST="0x824a9a10,0x824a9a14,0x824a9a40,0x824a9a44,0x824a9a48,0x824a9a4c,0x824a9a98,0x824a9ac8,0x824a9acc,0x8216eaa0,0x82181c28,0x82181c40,0x82181c48,0x82181c54,0x82181c58,0x82181c88,0x821800d8,0x821800f0,0x821800fc,0x82180110,0x82180120,0x82180138,0x82181750,0x821817bc,0x821817c0,0x821817f8,0x821817fc,0x82181830,0x82181838,0x824a93c8,0x824a953c,0x824a9540,0x824a9580,0x824a95cc,0x824a95f4,0x824a95f8" +./target/release/xenia-rs exec "../Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso" \ + --halt-on-deadlock --branch-probe="$PROBE_LIST" -n 100000000 \ + > audit-runs/audit-008/probe-100m.log 2> audit-runs/audit-008/probe-100m.err +``` + +## Constraints honored + +- No code modifications (branch-probe machinery from KRNBUG-AUDIT-007 was sufficient). +- No git commit (no changes to commit). +- No backwards-compat shims. +- Probe machinery is read-only — does not perturb lockstep digest. diff --git a/migration/claude-memory/project_xenia_rs_audit_009_renderer_unreached_2026_05_05.md b/migration/claude-memory/project_xenia_rs_audit_009_renderer_unreached_2026_05_05.md new file mode 100644 index 0000000..f9a0c61 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_009_renderer_unreached_2026_05_05.md @@ -0,0 +1,127 @@ +--- +name: KRNBUG-AUDIT-009 — renderer cluster fully unreached at -n 500M; gate is structurally above 0x82287-0x82294 +description: 2026-05-05. Branch-probed all 12 audit-008-recommended PCs (6 parents + 5 shims + dispatcher) plus the AUDIT-005 9-PC producer-callsite set. 0/21 fired at -n 500M. Stop condition 1 triggered. The 0x82287000-0x82294000 cluster is not entered at all — the gate sits structurally above the cluster boundary. Diagnostic only — discipline gate fails on box 1 + 3. Next-session probe set proposed (cluster level-1 roots + new thread entry trampolines + main frame-poll callees). +type: project +originSessionId: 8822a6cd-a1c7-4484-a94f-7acd68bc35b3 +--- +🎯 **KRNBUG-AUDIT-009 (2026-05-05, READ-ONLY DIAGNOSTIC)** — stop condition 1 triggered. The 12 PCs proposed by AUDIT-008 + the 9 AUDIT-005 producer callsites all show `firings=0` at -n 500M. The renderer / scene-graph cluster (0x82287000-0x82294000) is fully unreached. The gate is structurally above the cluster and the discipline gate's box 1 (named import) + box 3 (sharp cascade prediction) both fail. Hand back, no code changes. + +## Decisive runtime evidence (audit-runs/audit-009/probe-500m.err) + +- 21 PCs armed: 6 parents (`0x82292838, 0x822878A8, 0x8228D760, 0x822900A8, 0x822919C8, 0x8228FDB8`) + 5 shims (`0x82180158, 0x821805C8, 0x82180A10, 0x82180D90, 0x821810E0`) + dispatcher `0x824AA1D8` + 9 AUDIT-005 producer-callsites (5 × 0x100c + 4 × 0x15e0 shim cluster). +- BRANCH-PROBE line count in stderr: **0**. +- Run completed at `instructions=500000010 import_calls=5629676 unimplemented=0 wall_ms=20974`. +- Final main state: `tid=1 hw=0 pc=0x822f1c60 lr=0x822f1be0 sp=0x700ff880` — inside `sub_822F1AA8` (frame-poll loop, between two `XNotifyGetNext` callsites at `0x822f1bdc` / `0x822f1c14`). LR target sits inside the same function ⇒ infinite intra-function loop. +- Counters telling the story: `XNotifyGetNext=1,489,741`, `NtWaitForSingleObjectEx=1,489,801`, `NtWaitForMultipleObjectsEx=865,493`, `RtlEnter/LeaveCriticalSection=889,109` each, `VdSwap=2`, `XAudioRegisterRenderDriverClient=1`, `XamNotifyCreateListener=1`, `XamUserGetSigninState=4`, `XamInputGetCapabilities=8`. Main is service-loop polling forever; init never proceeds past frame-poll #1. + +## Thread snapshot at deadlock + +| tid | hw | state | entry | ctx | parked-on | +|-----|----|-------|-------|-----|-----------| +| 1 | 0 | Ready | (main) | — | sub_822F1AA8 frame poll, PC=0x822f1c60 | +| 2 | 1 | Blocked | 0x824a93c8 (XamTaskSchedule cb) | 0x828a28f0 | handle 0x82870EBC = canary mirror | +| 3 | 5 | Blocked | 0x82181830 | **0x828F3D08** (= 0x100c) | handle 0x1020 (event) | +| 4 | 3 | Blocked | 0x8245a5d0 (cache scanner) | 0x828f4838 | handle 0x1028 | +| 5 | 3 | Blocked | 0x82450a28 | 0x828f3b68 | handles 0x104C, 0x1050 | +| 6 | 5 | Blocked | 0x82457ef0 | 0x828f3b08 | handles 0x10C4, 0x10C8 | +| 7 | 2 | Blocked | 0x824cd458 | 0x42450b3c | handle 0x42450B5C (heap, AUDIT_BLIND) | +| 8 | 4 | Ready | 0x822f1ee0 | 0x40d09a40 | — | +| 11 | 5 | Blocked | 0x82178950 | **0x828F3EC0** (= 0x1004) | handle 0x1004 | +| 14, 15 | 4 | Blocked | 0x822c6870 | 0x828f3300 | handle 0x1308 | +| 16 | 1 | Blocked | 0x824563e0 | 0x828f3e70 | handle 0x1308 | +| 17 | 4 | Blocked | 0x82170430 | **0x828F4070** (= 0x15e0) | handle 0x15F4 | +| 18 | 0 | Ready | 0x823dde30 | 0x828f3c4c | — | +| 19 | 3 | Blocked | 0x823ddb50 | 0x828f3c88 | handles 0x160C, 0x01000000 | +| 20 | 1 | Ready | 0x823ddb50 | 0x828f3c88 | — | + +18 worker threads exist (incl. 0x100c worker tid=3, 0x1004 tid=11, 0x15e0 tid=17 — the three target dispatchers from the audit). All three parked, `signal_attempts=0` on lifecycle handles. The new spawn entries that didn't appear in audit-008's catalog: `0x822c6870` (×2), `0x824563e0`, `0x823dde30`, `0x823ddb50` (×2). They are likely XAM/system-event dispatchers, but unprobed. + +## Canary-only export delta + +Unchanged from audit-008 baseline: `{ExTerminateThread, KeReleaseSemaphore, XamUserReadProfileSettings}` (3 entries). XexCheckExecutablePrivilege fires twice (priv=10, priv=11) — confirms the post-IO-003 cascade is fully banked. + +## Cluster-shape interpretation (sylpheed.db) + +The 0x82287000-0x82294000 cluster is **internally cohesive but externally unreachable via direct calls**: + +- The 6 parent functions (the audit-008 candidates) have only intra-cluster callers — none are called from main's call list, none are called from any function outside the 0x82287-0x82294 range. +- The cluster's level-1 roots — `sub_82293448` (only self-recursion), `sub_822919C8` (only self-recursion), `sub_82288028` (8 in-cluster callers), `sub_82292D80` (1 in-cluster caller) — have NO data/jump xrefs in sylpheed.db. They must be reached via indirect calls (vtables / function pointers), but the analysis pass didn't index those. +- Static byte-scan of `.rdata`/`.data` for the 4-byte BE encodings of these function addresses yielded 0 hits in audit-004. So the indirection is via dynamic init: the cluster's entry function-pointer is written somewhere at runtime, and we never write it. + +This means the gate isn't even "main fails to call sub_82293448" — it's "main fails to populate / reach the indirect-call site that would dispatch the cluster's first job." Where that registration happens is unknown. + +## Discipline gate + +| # | Condition | Pass? | +|---|---|---| +| 1 | Phase 1 named a single failing kernel/xam import (α) or narrow internal-sub bug | **NO** — 0 PCs fired | +| 2 | Canary impl small (<80 LOC) | N/A | +| 3 | Sharp 4-dim cascade prediction | **NO** — no candidate fix in scope | +| 4 | No new ABI plumbing | N/A | +| 5 | Fix doesn't touch renderer subsystem | N/A | + +Boxes 1 + 3 fail. **STOP. Hand back per stop condition 1.** + +## Follow-up probe set for next session + +Three interleaving hypotheses, one probe set: + +``` +PROBE_LIST= + # H1 — cluster level-1 roots: if any fires, gate is INSIDE cluster (renderer β): + 0x82293448,0x822919c8,0x82288028,0x82292d80,0x822851e0,0x82286bc8, + # H2 — new thread entry trampolines (post-IO-003 spawns, unprobed): + 0x822c6870,0x824563e0,0x823dde30,0x823ddb50,0x822f1ee0, + # H3 — main's frame-poll loop entry + critical PCs in its body: + 0x822f1aa8,0x822f1be0,0x822f1c14,0x822f1c40,0x822f1c60,0x822f1d00, + # H4 — main's continuation (fires only if main exits frame-poll #1): + 0x822f1638,0x821506b8,0x8216f088,0x82150ef8, + 0x82173360,0x82173530,0x8216f170,0x824a9ad8 +``` + +Discrimination logic: + +- **All cluster roots fire + frame-poll exits** → β-class within renderer. Brief's "no renderer fixes" rule binds; document and hand back to a different person/audit. +- **Frame-poll fires but never exits** → main is stuck waiting for an XAM notification. `XamNotifyCreateListener=1` was called (the listener is registered) but `XNotifyGetNext` returns nothing actionable. Investigate which area mask was registered and which notification ID main awaits. The 1.49M loop iterations are doing nothing productive. +- **One of the new thread entries fires + advances** → that thread is the missing producer; trace its call chain into the cluster. +- **Frame-poll exits AND continuation fires** → gate is in main's post-poll sequence (sub_822F1638 etc.); narrower probe of those. + +The 4-dimensional cascade prediction can only be written after this discrimination. + +## Open puzzles for next session + +1. **Does main's frame-poll loop EVER exit?** It looped 1.49M times in 500M instructions; exit may be at >1 billion instructions. A `--pc-probe=0x822f1aa8` with hit-count would show iteration cadence. +2. **What XAM notification does main await?** `XamNotifyCreateListener=1` registered ONE listener; the area mask + notification ID are in the call args. Capture and cross-reference against canary's listener registration. +3. **The 5 "Ready" threads (tids 8, 18, 20)** — are they actually scheduled? In lockstep mode, `Ready` threads should round-robin onto a quantum. If they accumulate Ready state without progressing, the lockstep scheduler may be skipping them — verify by adding `--branch-probe` at their entry trampolines and checking iteration counts. +4. **The 0x42450B5C heap-handle thread (tid=7, AUDIT_BLIND)** — still parked, still on a heap-pointer wait. Its source is unknown (not a kernel handle). Eligible only after the cluster fires. +5. **Single-thread variance** — between audit-008 and audit-009, tid mappings shifted (audit-008 said tid=6 was 0x15e0 worker; here it's tid=17). Spawn ordering depends on cycle counts not preserved across runs at different probe sets. Treat tid as ephemeral identifier; trust ctx (0x828F3D08 / 0x828F3EC0 / 0x828F4070) instead. + +## Trace artifacts (re-runnable) + +- `audit-runs/audit-009/probe-500m.log` — final state + 18-thread diag + handle audit + full counter table (22 KB). +- `audit-runs/audit-009/probe-500m.err` — full stderr trace, kernel-call log (187 KB). +- `audit-runs/audit-009/branch-probe.trace` — empty (0 BRANCH-PROBE lines emitted). + +Re-run command: + +``` +cd "/home/fabi/RE Project Sylpheed/xenia-rs" +PROBE="0x82292838,0x822878a8,0x8228d760,0x822900a8,0x822919c8,0x8228fdb8,\ +0x82180158,0x821805c8,0x82180a10,0x82180d90,0x821810e0,0x824aa1d8,\ +0x821802d8,0x821806e0,0x82180b28,0x82180ea0,0x82181254,\ +0x8216f9d4,0x8216fc08,0x821700b8,0x821700f4" +./target/release/xenia-rs exec sylpheed.iso \ + --halt-on-deadlock --branch-probe="$PROBE" \ + --trace-handles-focus=0x1004,0x100c,0x15e0,0x1020,0x10c4 \ + -n 500000000 \ + > audit-runs/audit-009/probe-500m.log 2> audit-runs/audit-009/probe-500m.err +``` + +## Constraints honored + +- Stop condition 1 from the brief: 0/12 < 4/12 fired ⇒ hand back; no Phase 2 attempted. +- Discipline gate failed boxes 1 + 3 ⇒ no fix. +- No code modifications — `--branch-probe` from KRNBUG-AUDIT-007 was sufficient. +- No git commit (no source changes to commit). +- No backwards-compat shims, no speculative abstractions. +- Probe machinery is read-only — does not perturb lockstep digest. diff --git a/migration/claude-memory/project_xenia_rs_audit_010_xnotify_diff_2026_05_05.md b/migration/claude-memory/project_xenia_rs_audit_010_xnotify_diff_2026_05_05.md new file mode 100644 index 0000000..ba5cde7 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_010_xnotify_diff_2026_05_05.md @@ -0,0 +1,136 @@ +--- +name: project_xenia_rs_audit_010_xnotify_diff_2026_05_05 +description: KRNBUG-AUDIT-010 (2026-05-05, READ-ONLY) — XNotify delivery diff identifies 4 missing startup notifications gating dispatcher invocation. Branch (α). Discipline gate fails on box 3 (cannot name renderer L1 root). Diagnostic-only. +type: project +originSessionId: 330041a0-be3e-45bc-bfba-50468ea4e41c +--- +# KRNBUG-AUDIT-010 — XNotify delivery diff (2026-05-05, READ-ONLY) + +**Why:** First session past the kernel-boundary cascade (post-IO-003). +audit-009 left main parked in a frame-poll loop calling +`XNotifyGetNext` 1.49M times in 500M instr while the renderer cluster +remained 0/21 unreached. Per the new "delivery diff" methodology, +canary's xenia.log + canary impl files are the oracle for what +notifications should be delivered at the boot horizon vs what we +deliver. + +**How to apply:** Next session must run a one-shot `--pc-probe` at +the dispatcher `bcctrl` before implementing the listener. Without +that probe, the L1-root prediction box of the discipline gate cannot +be filled. + +## Branch + +(α) — canary delivers 4 specific startup notifications we don't. + +## Diff (decisive) + +| | canary | ours | +|---|---|---| +| `XamNotifyCreateListener(0x2F, 0)` | once @ L1395 | once (audit-009 counter=1) | +| Startup notifications enqueued by `RegisterNotifyListener` | 4 (`SystemUI`, `SystemSignInChanged`, `LiveConnectionChanged`, `LiveLinkStateChanged`) | **0** (no listener registry exists) | +| `XNotifyGetNext` returns | dequeued notifications (id≠0, r3=1) | **always r3=0** (stub) | +| Calls to `XNotifyGetNext` | (canary doesn't log every call) | 1,489,741 in 500M instr | +| `XamUserReadProfileSettings` fires | yes @ L2787 (post-listener) | NO (canary-only export) | + +## Root cause (kernel side) + +- `crates/xenia-kernel/src/xam.rs:358-361` — `xam_notify_create_listener` + is a stub: `state.alloc_handle()`, no listener storage. +- `crates/xenia-kernel/src/xam.rs:363-366` — `xnotify_get_next` is a + stub: `ctx.gpr[3] = 0`. +- `crates/xenia-kernel/src/objects.rs:14-77` — no `NotifyListener` + variant in `KernelObject`. +- No code in `xenia-kernel` references `BroadcastNotification`, + `EnqueueNotification`, or any notification queue. + +Canary reference impl: +- `kernel_state.cc:1013-1033` (`RegisterNotifyListener` + 4 startup + notifications) +- `xnotifylistener.cc:25-90` (Initialize / Enqueue / Dequeue) +- `xam_notify.cc:22-95` (XamNotifyCreateListener + XNotifyGetNext) + +## Consumer side — Sylpheed dispatch + +Main poll loop `sub_822F1AA8` does: +- `XNotifyGetNext(block[+132], 0, &id, ¶m)` (block=288-byte + alloc; listener handle at offset 132, set by `sub_822F14D8`). +- If returns 1: load `outer = mem[0x828E1F08]`, call + `outer.vtable[1](this=outer, data, id)` — drains queue in a tight + loop. + +Construction: +- `sub_8216EA68` (main) → `sub_822F2758(&outer)` → + `sub_822F14D8(block, outer)`. +- `sub_822F2758:0x822f2788` sets `outer.vtable = 0x820AD894`. +- `sub_822F14D8:0x822f15c8` sets `mem[0x828E1F08] = outer`. + +vtable read from `.pe` at file offset 0xAD894: +- vtable[0..3] = `0x825ED990` (looks like `__purecall`/abort: + calls debug callback at `mem[0x828A5B7C]` if non-null, then + apparent exit sequence) +- vtable[4] = `0x824C8F00` (`bclr 20, lt` — empty) +- vtable[5,6] = `0x825ED990` +- vtable[7] = `0x824C8F00` + +**vtable[1] target = 0x825ED990 statically resolves to abort.** +This is suspicious — canary runs Sylpheed fine through the dispatch. +Either `mem[0x828A5B7C]` holds the real handler at runtime, or the +vtable is dynamically replaced (no such write seen in xrefs to +`mem[0x828E1F08]` beyond ctor/dtor). + +## Discipline gate + +| Box | | | +|---|---|---| +| 1. Specific missing notification + file:line | ✅ | 4 IDs, kernel_state.cc:1013-1033, xnotifylistener.cc:25-51, xam_notify.cc:57-95 | +| 2. <80 LOC synthesis | ✅ | ~70 LOC est. | +| 3. Sharp 4-dim cascade prediction | ❌ | Cannot name renderer L1 root | +| 4. No renderer/GPU changes | ✅ | | + +**Box 3 fails → STOP, hand back. No fix landed.** + +## Next session = Phase-1.5 probe + Phase-2 fix + +**Step 1 (read-only probe):** temporarily patch +`xam_notify_get_next` to return one synthetic notification on first +call (e.g. id=0x0A SignInChanged, data=1). Add +`--pc-probe=0x822f1bfc,0x822f1c00`. Re-run -n 100M. Capture the +actual vtable[1] target via the bcctrl. Revert. + +- target ≠ 0x825ED990 → chase real handler chain to find renderer + L1 root. +- target = 0x825ED990 → check what populates `mem[0x828A5B7C]` at + boot. + +**Step 2 (fix):** Add `KernelObject::NotifyListener { mask, max_version, +is_system, queue }`. Track listeners on `KernelState`. +`xam_notify_create_listener`: build listener, on first `kXNotifySystem`-mask +auto-enqueue (0x9, IsUIActive=0)+(0xA, 1); on first `kXNotifyLive`-mask +auto-enqueue (0x02000001, 0x001510F1)+(0x02000003, 0). +`xnotify_get_next`: dequeue head (or matching-id), write outparams, +return 1/0. + +## Cascade prediction (provisional) + +- **Renderer L1 root**: TBD (Step 1 probe). +- **Canary-only export to fire**: `XamUserReadProfileSettings` + (canary L2787 post-listener-create). +- **signal_attempts**: renderer subsystem likely activates without + parked-handle interaction this step. +- **draws delta**: NO this step. + +## Trace artifacts + +`audit-runs/audit-010/findings.md` — full write-up. No code changes; +no commit. Audit-009 trace is the runtime evidence (XNotifyGetNext=1,489,741). + +## Stop conditions hit + +- Discipline gate box 3 fails (binding per session brief). +- Static analysis cannot resolve vtable[1] runtime target. +- Branch (α) confirmed; no need to chase β/γ paths. + +## Master HEAD at session end + +`50a4887` (unchanged from audit-009). diff --git a/migration/claude-memory/project_xenia_rs_audit_012_vtable_zero_2026_05_06.md b/migration/claude-memory/project_xenia_rs_audit_012_vtable_zero_2026_05_06.md new file mode 100644 index 0000000..3a99d46 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_012_vtable_zero_2026_05_06.md @@ -0,0 +1,54 @@ +--- +name: KRNBUG-AUDIT-012 vtable=0 diagnostic 2026-05-06 +description: Pure runtime diagnostic that falsified ALL FIVE bug-class hypotheses for the audit-011 vtable=0 finding. Vtable IS correctly initialized; audit-011 misread runtime data. Discipline gate now PASSES for the original AUDIT-011 listener fix; next session = KRNBUG-IO-004. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-012 (2026-05-06, READ-ONLY)** — pure runtime diagnostic. Master HEAD `50a4887` unchanged. Tests 594/594. Lockstep `sylpheed_n50m` PASS. + +## Probes that fired (combined 100M + 500M) + +- Construction chain (100M): allocator `0x82150EF8` 8585×, outer ctor `0x8216F088` 1×, bl `0x8216F10C` 1×, inner ctor `0x822F2758` 1×, install `0x822F14D8` 1×. +- Dispatch arms (500M): `0x822F227C` 3291× / `0x822F22A4` 3291× (tid=1 main, frame chain main → 0x8216EE14 → 0x822F1E00 → 0x824BEAAC). `0x822F1D40` 1×. **All others (0x822F1B3C / 0x822F1BE8 / 0x822F1E44 / 0x822F2130 / 0x822F2200 / 0x822F2268 / 0x822F266C / 0x822F2704) fire 0×.** +- Memory transitions of `mem[0x40111890+0]`: `0x401118D0 → 0x820AD894 (cycle 5532659, sub_822F2758 wrote) → 0x820A183C (cycle 5532853, sub_8216F088 wrote)`. **Stays at 0x820A183C through end-of-run, monotonic, zero zero-transitions.** +- `mem[0x828E1F08]` (dispatcher slot): `0 → 0x40111890` at cycle 5532853, monotonic. +- `mem[0x820A183C..+12]` = real thunks `[0x82175330, 0x82175338, 0x82175340, 0x82175348]` — disasm confirms each is a `lwz r3, 8(r3); b sub_xxx` pattern to a real method. + +## All five bug-class hypotheses REFUTED + +| Angle | Verdict | +|-------|---------| +| 1. Atomic / store-store reorder | REFUTED — outer+0 monotonic, never flips back to 0 | +| 2. Memset/memcpy overlap | REFUTED — same evidence; no bulk-zero event | +| 3. GS-cookie / __report_gsfailure | REFUTED — no such kernel exports registered; ctor reaches normal epilogue at 0x822f27d0 | +| 4. .rdata mapping fidelity | REFUTED — vtable bytes match disasm, thunks are real | +| 5. Destructor ran by mistake | REFUTED — probes at 0x822F1638 + 0x822F16BC fire 0× in 500M; static analysis shows dtor zeroes the static slot, not outer+0 | + +## Reconciliation: audit-011 misread the data + +audit-011 reported `mem[0x40111890+0]=0` at PC `0x822F1BE8`. Re-reading `audit-runs/audit-011/dispatch-probe.log`: tid=1 final state shows `PC=0x8284E45C` (NOT 0x822F1BE8), `LR=0x822f1be0`. **PC `0x8284E45C` is the XAM thunk for ordinal `0x028B = XNotifyGetNext`** (xam.rs:72). The lwz at 0x822F1BE8 NEVER executes because `bc 12, 4*cr6+eq, 0x822F1C20` at 0x822f1be4 ALWAYS takes the skip arm — `xnotify_get_next` stub returns r3=0, so the "no notification" path is taken indefinitely. audit-011 captured the lwz address as the LR of the thunk return target, not as live PC. + +## Reconciliation: audit-010 misattribution + +audit-010's "vtable[1]=0x825ED990 abort handler" looked at the **inner ctor's transient vtable at 0x820AD894**, overwritten 3 instructions later by sub_8216F088 with the real vtable at `0x820A183C`. Real runtime vtable[1] = thunk at `0x82175338` → `sub_82173DC8` (a real method, not abort). + +## Discipline gate now PASSES for KRNBUG-IO-004 + +1. ✅ Specific missing notification + canary file:line (kernel_state.cc:1013-1033, xam_notify.cc:22-96) +2. ✅ Synthesis < 80 LOC (~70 LOC estimate; hard ceiling 120) +3. ✅ Sharp 4-dim cascade prediction now possible +4. ✅ No renderer/GPU code changes + +Predicted cascade for IO-004: +- Canary-only export `XamUserReadProfileSettings` + one of `KeReleaseSemaphore`/`ExTerminateThread` newly fire +- One of `{0x1004, 0x100c, 0x15e0}` records `signal_attempts > 0` +- audit-009 21-PC set: 1-3 newly reachable in `sub_82173DC8` ancestry +- draws delta = 0 this step (acknowledged) + +Phase 1.5 sanity probe BEFORE landing IO-004: emit ONE synth notification, `--pc-probe=0x822f1c00`, expected runtime CTR = `0x82175338`. If different, vtable is dynamically replaced — abort and re-trace. + +## Methodology lesson + +Five static-analysis attribution failures this hunt now (audit-004 8-pool, audit-005 sub_824ABA98, audit-006 vol-info gate, audit-010 abort handler, audit-011 PC=lr confusion). Even runtime captures need careful interpretation. The combined diagnostic (PC + dump-addr + memory snapshots over time) caught what single-shot probes missed. Continue to treat all attributions as PROVISIONAL until cross-validated. + +Trace artifacts: `audit-runs/audit-012/{probes-100m,dispatch-500m,handles-500m}.{log,err}`. Diagnostic patch (state.rs +11 LOC printing +0/+4/+8/+12 at every dump_addrs on every probe hit) stashed: `git stash list | grep audit-012`. Master tree clean. diff --git a/migration/claude-memory/project_xenia_rs_audit_014_0x15e0_wake_2026_05_06.md b/migration/claude-memory/project_xenia_rs_audit_014_0x15e0_wake_2026_05_06.md new file mode 100644 index 0000000..4dec945 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_014_0x15e0_wake_2026_05_06.md @@ -0,0 +1,92 @@ +# KRNBUG-AUDIT-014 — 0x15e0 wake hypothesis FALSIFIED (2026-05-06) + +## Status +READ-ONLY DIAGNOSTIC. No fix landed. Master HEAD `d736a1d` unchanged. Tests +remain at 599. Lockstep instructions=100000012 (pre-existing IO-004 +baseline). + +## Goal +Investigate why handle 0x15e0 records `signal_attempts=1 (primary=1)` +post-IO-004 BUT tid=17 (the audit-009 stage-3 "0x15e0 worker") is still +parked. If a narrow fix-shape gap (wake-eligibility / shadow / mr / +ordering) was found, land it inline. Otherwise diagnostic-only. + +## Phase 1 finding (decisive — premise refuted) + +Trace at `-n 500M --trace-handles-focus=0x15e0` decisively shows: + +1. **0x15e0 is a Semaphore, not an Event**. Created via `NtCreateSemaphore` + at `lr=0x824ab110` on tid=1. Creator-frame chain + `0x82456a94 → 0x82456bac → 0x822f1b60 → 0x8216ee14 → 0x824ab8e0` — + **distinct** from the Event creator chain `lr=0x824a9f6c` shared by + 0x1004 / 0x100c / 0x1020 / 0x15e4. +2. **0x15e0 is healthy**: `signal_attempts=1 (primary=1) waits=1 wakes=1`. + Timeline: tid=1 wait at `lr=0x824ac578`, then tid=16 `NtReleaseSemaphore` + at `lr=0x824ab168` woke it. End-of-run DIAGNOSIS: "not stuck — signals + consumed correctly". +3. **tid=17 parks on 0x15e4**, NOT 0x15e0. End-of-run state: + `Blocked(WaitAny { handles: [5604] })` where `5604 == 0x15e4`. Worker + `r12=0x8217057c` (front of `sub_82170430`), matching the audit-009 / + audit-008 / audit-002 stage-3 worker mapping for tid=17. +4. **0x15e4 is the actually-stuck handle**: `Event/Manual waiters=1 + signals=0 waits=1 wakes=0 `. Same producer- + missing class as 0x1004 / 0x100c / 0x1020. + +The IO-004 cascade-prediction line "(e) signal_attempts on parked handles: +0x15e0 = 1 (primary=1, ghost=0)" was correct but mis-interpreted: the +semaphore did receive one signal, but it was unrelated to tid=17's wake. + +## Long-standing label error +The string "0x15e0 worker" appears in audit-002 (producer stack trace), +audit-008 (β-class gate), audit-009 (stage-3 thread-state map) and the +IO-004 prediction. The actual handle is 0x15e4 (Event/Manual). The +audit-002 memory entry already had a note "third handle is **0x15e0**, +not 0x15e4 (transcription typo)" — that correction was itself reversed: +the original audit-002 label of 0x15e4 was correct. + +## Bug class evaluation (α-ζ per prompt) +| Class | Verdict | +|---|---| +| α PKEVENT vs handle mismatch | N/A — no Set call ever targets 0x15e4 | +| β refresh_pkevent_shadow miss | N/A | +| γ wake-eligibility filter wrong | N/A — manual-reset wake works elsewhere (0x10F0 handshake; 0x15e0 semaphore wake) | +| δ memory ordering | N/A — no producer side ever runs | +| ε race scheduler.resume vs signal | N/A | +| ζ audit recorded but not propagated | N/A — diagnosis matches state.objects waiter list | + +**Producer for 0x15e4 is genuinely missing**, same class as 0x1004 / +0x100c / 0x1020. No wake-eligibility bug. + +## Discipline gate +- Box 1 (named bug class with evidence): **FAIL** — premise refuted. +- Box 2 (~30-80 LOC fix): N/A. +- Box 3 (4-dim cascade prediction): N/A. +- Box 4 (no renderer/GPU changes): N/A. +- Box 5 (lockstep determinism preserved): N/A. + +Stop condition met: hand back, no fix. + +## Cascade snapshot (unchanged from IO-004) +- swaps=2 (VdSwap frames 1+2 kernel-direct) +- draws=0 +- 18 → 20 worker threads, identical to IO-004 +- Canary-only exports unchanged: `ExTerminateThread`, `KeReleaseSemaphore`, + `XamUserReadProfileSettings`. + +## Recommended next session +Use Fork B's branch-probe results on the newly-reached renderer L1 entries +`sub_82173DC8 / 0x822c6870 / 0x824563e0 / 0x823ddb50`. The producer for +0x1004 / 0x100c / 0x1020 / 0x15e4 lives somewhere along the dispatch arm +`0x822f1be8 → 0x82175338 → 0x82173dc8 → ...`. If a sub-function gates the +Set call on a stub kernel export return value, that becomes +KRNBUG-AUDIT-015 / IO-NNN candidate. + +A secondary cleanup pass should rewrite the "0x15e0 worker" labels to +"0x15e4 worker" across the AUDIT-002 / AUDIT-008 / AUDIT-009 / IO-004 +memory entries. + +## Trace artifacts +- `xenia-rs/audit-runs/audit-014-0x15e0-wake/probe.log` — focus dump, + 19-thread state diagnostic, full handle audit table. +- `xenia-rs/audit-runs/audit-014-0x15e0-wake/probe.err` — kernel.calls + counters confirming swaps=2 / VdSwap=2 baseline unchanged. diff --git a/migration/claude-memory/project_xenia_rs_audit_015_l1_propagation_2026_05_06.md b/migration/claude-memory/project_xenia_rs_audit_015_l1_propagation_2026_05_06.md new file mode 100644 index 0000000..5bff64d --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_015_l1_propagation_2026_05_06.md @@ -0,0 +1,178 @@ +--- +name: KRNBUG-AUDIT-015 — L1 propagation probe; next gate is silph::Semaphore on handle 0x1308 (workitem submitter never invoked) +description: 2026-05-06. Read-only branch-probe of the 4 newly-reached renderer L1 entries (sub_82173DC8 + workers 0x822c6870 / 0x824563e0 / 0x823ddb50) at -n 500M post-IO-004. 28/112 PCs fired, decisively bounding the next gate. sub_82173DC8 dispatches all 4 startup notifications then idles. Two of the three "newly-reached" workers (tid=14, tid=15 on 0x822c6870 → sub_822c6878) park on Semaphore handle 0x1308 with signal_attempts=0; their producer chain is sub_822AE1F0 / sub_822F55F0 → sub_822C8B50 → sub_822C6808 (silph::Semaphore::Release wrapper) — neither caller fires in this run, so the workitem-submission entry is the next gate. Worker tid=16 (0x824563e0) actually progresses through its dispatch loop and parks on a Timer/CS-cycle. Tid=19 (0x823ddb50) parks at entry. Bug class is δ (pure-guest renderer state-read), not a kernel-boundary stub. No fix. +type: project +originSessionId: fork-b-2026-05-06 +--- +🎯 **KRNBUG-AUDIT-015 (2026-05-06, READ-ONLY DIAGNOSTIC, FORK B)** — branch-probed 112 PCs across the 4 renderer L1 entry trampolines and their callees. 28 fired, 84 unfired. The next gate is **Semaphore handle 0x1308** (`signals=0 waits=2`) waited on by the `sub_822c6878` worker pool — its producer chain `sub_822AE1F0 / sub_822F55F0 → sub_822C8B50 → sub_822C6808 (→ silph::Semaphore::Release at 0x824AB158)` is structurally unreached. Bug-class **δ (pure-guest renderer)**: no kernel boundary stub, no missing import — it is a workitem-submission path that main never invokes. Stop condition 1 satisfied (no narrow fix in scope this session). Master HEAD `d736a1d` unchanged. Sister Fork A on AUDIT-014 (0x15e0/0x15e4 wake) untouched. + +## Probe set used (112 PCs) +- **sub_82173DC8 dispatcher**: entry + 25 case-arm/branch-target PCs (all 5 case bodies + the post-merge dispatch helper bl 0x82174040). +- **worker 0x822c6878** (= 0x822c6870 thunk target): entry + 11 body PCs covering wait, queue scan, CS lock, vtable[1] dispatch (`bcctrl` at 0x822c6974), signal-completion `NtSetEvent`. +- **worker sub_824563E0**: entry + 16 body PCs covering all `bl` and both `bcctrl` sites. +- **worker sub_823DDB50**: entry + 10 body PCs. +- **L1 callees** (silph wrappers + renderer subs): 26 PCs. +- **audit-009 originally-unfired set** (renderer cluster + audit-005 producers): 21 PCs (preserved as control). + +Full list: see `audit-runs/audit-015-l1-propagation/probe.log` first 1KB or memory file source. + +## Per-probe fire/no-fire (28 fired) +| Group | PC | Hits | Notes | +|---|---|---|---| +| dispatcher | 0x82173dc8 | 4 | tid=1 only; r3=0x40ba9a80 (= sylpheed listener struct) | +| dispatcher | 0x82173dec | 2 | case-discriminator | +| dispatcher | 0x82173dfc | 1 | case 0xA arm body (notification 0x0A=LiveConnectionChanged) | +| dispatcher | 0x82173e40 | 1 | case 0x9 arm body (notification 0x09=SystemSignInChanged) | +| dispatcher | 0x82173e6c | 1 | case 9 atomic-CAS arm | +| dispatcher | 0x82173ed0 | 1 | post-arm convergence (r11=44(r31)==0 → exit) | +| dispatcher | 0x82173f48 | 2 | default-high arm (notification id > 0xB twice) | +| dispatcher | 0x82174030 | 3 | default exit (early) | +| dispatcher | 0x821737f0 | 2 | bl from case 9 (silph helper) | +| dispatcher | 0x822c2a80 | 1 | renderer profile-loader, tid=1, lr=0x822c28d4 (cycle 9186021) | +| dispatcher | 0x82181c28 | 1 | tid=1, lr=0x8216eaa4 (silph init helper) | +| dispatcher | 0x821707c0 | 4 | early-init helper | +| dispatcher | 0x8216f088 | 1 | tid=1 | +| dispatcher | 0x8216f798 | 1 | tid=1 | +| worker entry | 0x822c6878 | 2 | tid=14, tid=15 thread-start (lr=0xbcbcbcbc) — body PCs UNFIRED | +| worker entry | 0x824563e0 | 1 | tid=16 thread-start; loops in body | +| worker entry | 0x823ddb50 | 1 | tid=19 thread-start; body UNFIRED | +| L1 callee | 0x82456420 | 1 | tid=16 only (called once from 0x824563E0+0x24) | +| L1 callee | 0x82456530 | 865,112 | tid=16 dispatch-cycle hot-spot | +| L1 callee | 0x824568d8 | 1 | tid=16 only | +| L1 callee | 0x82334ca0 | 555 | renderer; tid=16 | +| L1 callee | 0x8244e218 | 1664 | tid=16 | +| L1 callee | 0x82612788 | 1 | NtSetTimerEx (handle 0x15d0); tid=16 cycle=70 | +| L1 callee | 0x82611cd8 | 1,825,719 | hot trivial getter (lwz lwz blr); tid=1+8 | +| silph | 0x824aa658 | 12 | obref helper (handles 0x130c, 0x1310, etc.) | +| silph | 0x824aa330 | 1,489,799 | wait wrapper; 1.49M from main lr=0x822f1e00 | +| silph | 0x824aa2f0 | 3334 | NtSetEvent wrapper | +| silph | 0x824ab240 | 865,494 | XamEnableInactivityProcessing wrapper; tid=16 inner-loop | + +**84 unfired**, including all 21 original audit-009 PCs (renderer cluster `0x82287xxx-0x82294xxx` and audit-005 producer callsites) plus the body PCs of worker 0x822c6878 (0x822c6894 / 68a4 / 68c8 / 68cc / 6960 / 6964 / 6974 / 697c / 69a0 / 69b4 — all 0 hits) and the body PCs of 0x823ddb50 (0x823ddb68 / bbe0 / bbf8 / bc14 / bc64 / bc78 / bcbc / bcc4 / bce0 / bcf4 — all 0 hits). + +## Per-question answers + +**Q1 — Does sub_82173DC8 enter all callees, or early-exit?** +Early-exit. The dispatcher fires 4 times (matching the 4 startup notifications enqueued by IO-004 — case 0x9, case 0xA, and twice on default-high for `0x02000001` / `0x02000003`). For each fire the post-arm code at 0x82173ed0 reads `r11 = 44(r31)` and **branches to 0x82174030 (early exit) when r11==0**. Field `[r31+44]` is a "callback table pointer" in the listener struct that Sylpheed never populates (= NULL). The dispatch helper at 0x82174040 (which calls `sub_822C2A80`, `sub_8216F088`, `sub_82181C28`, etc.) is never invoked from the dispatcher path. The single 0x822c2a80 fire came from a separate caller (lr=0x822c28d4, in `sub_822C27F0` — orthogonal renderer init). The 4 dispatch fires drain the listener queue once and the dispatcher is silent for the remaining 1.49M XNotifyGetNext calls. + +**Q2 — Worker 0x822c6870 (tids 14, 15) progress?** +Parks immediately. Both threads enter `sub_822c6878`, immediately call `bl 0x824AA330` at 0x822c6894 (= `silph::Semaphore::Wait` on handle 0x1308 in r3, with infinite timeout). Handle 0x1308 is a `Semaphore(0/INT_MAX)` with `signals=0 waits=2 wakes=0 `. Created by tid=13 at `lr=0x824ab110` (NtCreateSemaphore wrapper called from `sub_822C66B4` inside `sub_822C6630`, the worker-pool initializer reached from `sub_822C6A40`). Producers: only `sub_822C6808` releases it (`bl 0x824AB158` at 0x822c6848). `sub_822C6808` has a single caller chain `sub_822C8B50` (the workitem-submitter) which is itself called from `sub_822AE1F0` (at 0x822b16e0) or `sub_822F55F0` (at 0x822f5728). **Neither sub_822AE1F0 nor sub_822F55F0 was probed; both are statically reachable from main but not exercised in this 500M run** — they're the renderer's frame-update or scene-graph mutate path which Sylpheed gates on something main has not yet completed. + +**Q3 — Worker sub_824563E0 (tid=16)?** +**Progresses** through one full pass and enters a steady-state inner loop. Trace at cycle 0-1500: +1. `bl 0x824AA658` (ObReferenceObjectByHandle helper, r3=-2 = self) +2. `bl 0x82456420` (sub-init) +3. `bl 0x82612788` (NtSetTimerEx, r3=0x15d0 — timer handle, period=2) +4. `bl 0x824AB240` → `bl __imp_xam.xex_XamEnableInactivityProcessing(2)` (returns 0) +5. `bl 0x82456530` (CS-locked vtable[0]/[+0x4] dispatch via `bcctrl` at 0x8245655c — fires 865k times) +6. `bl 0x824568D8` (handle slot lookup) +7. `bl 0x82334CA0` (renderer; large 1056-byte fn — fires 555 times) +8. `bl 0x8244E218` (linked-list scanner — fires 1664 times) +9. Then loops {0x824ab240 (XamEnableInactivityProcessing) ↔ 0x82456530 (CS+bcctrl dispatch)} forever + - `0x82456530`: 865,112 fires; `0x824ab240`: 865,494 fires + - This is a poll loop. The bcctrl target inside 0x82456530 (read from `[r29+0]` at offset 4) doesn't advance any state main is waiting for. + +Tid=16 is **not the gate** — it's a healthy timer-driven inactivity-poll loop, doing what canary's equivalent does. Probably an XAM heartbeat thread. + +**Q4 — Worker sub_823DDB50 (tid=19)?** +Parks at entry. Entry fires once at thread-start (lr=0xbcbcbcbc), but every body PC (0x823ddb68 onward) is unfired. End-of-run state: `Blocked(WaitAny { handles: [5644, 16777216] })` = `[0x160C, 0x01000000]`. Handle 0x160C is `Event/Auto signals=0 waits=1 wakes=0 `; the second value 0x01000000 = INFINITE timeout. The wait must occur via a path that bypasses the probed body — likely the very first instruction at 0x823ddb50 (`mfspr r12, LR`) is followed by an out-of-probe-range early bail. Look at `sub_823DDB50` instructions 0..0x10 (entry didn't show body fire because the wait happens via `bl 0x82611CD8` at 0x823ddb68 — wait, but 0x823ddb68 is unfired). The actual wait callsite is unprobed; tid=19's lr at deadlock is `0x824ab214` (= silph wait wrapper inner). **Most likely tid=19 entered a different sub_823DDB50 sibling function via a tail-call, OR the body PCs are reached at slightly different offsets than the disasm shows**. Worth a follow-up probe with PCs in `sub_823DD838` (the parent, where 0x823ddb50 is referenced as data at 0x823dd918). + +**Q5 — New imports/exports being called we hadn't seen fire?** +Compared to audit-009 baseline (`ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings` were canary-only), this run's counter table shows the same 3 still canary-only. New kernel calls fired post-IO-004: +- `NtCreateTimer = 1` (was 0; tid=16 timer creation for handle 0x15d0) +- `NtSetTimerEx = 1` (was 0) +- `XAudioRegisterRenderDriverClient = 1` +- `XamInputGetCapabilities = 8` +- `XamUserGetSigninState = 4` +- `XamUserGetXUID = 1` (NEW — was unprobed) +- `XNotifyPositionUI = 1` (NEW) + +No NEW canary-only-divergence relative to IO-004's accounting. + +## Next-gate hypothesis + classification + +**Class: δ (pure-guest renderer state-read)** — the gate is structurally inside the guest's frame-update / scene-graph submission path, NOT a kernel-boundary stub. + +**Hypothesis**: main is supposed to call `sub_822AE1F0` or `sub_822F55F0` once init advances past the 4 startup notifications + the singleton renderer-profile load (`sub_822C2A80` already fired once). Those functions in turn call `sub_822C8B50 → sub_822C6808`, which posts work to handle 0x1308 (the sub_822c6878 worker pool semaphore). Until that posts, tids 14 + 15 sleep on `signals=0`. + +The triggering mechanism is the dispatcher arm at `0x82173ed0`: when it observes `r11 = 44(r31) != 0` it would fall through to the post-merge handler (one of `0x82174018`, the dispatch helper at `0x82174040`, etc.). Right now `[r31+44] == 0` for all 4 dispatch fires. The field is set somewhere — but only by guest code we never reach. + +Cross-reference vs. canary log: canary's `xenia.log` shows the same dispatcher path with `[r31+44]` populated and the post-merge handler firing, leading to the renderer-job submitter cascade. The entry that populates `[r31+44]` is plausibly the same `sub_822C2A80` pathway, but a different LR-context (canary's call comes from sub_82174040, not from sub_822c28d4 as in our run). + +**File:line evidence**: +- `sub_82173DC8` dispatcher early-exit at `0x82173ed8 (bc beq cr6, 0x82174030)` — gate predicate is `[r31+44] == 0`. Disasm: `xenia-rs/sylpheed.db` instructions table for PCs 0x82173ed0-0x82173ee8. +- Workitem submitter `sub_822C8B50` at 0x822c8bb0 (`bl 0x822C6808`) is the only path to release Semaphore 0x1308. Caller: `sub_822AE1F0:0x822b16e0` or `sub_822F55F0:0x822f5728`. +- Semaphore creator: `sub_822C66B4 (bl sub_824AB110)` inside `sub_822C6630`, called from `sub_822C6A40` (the only caller). + +## Recommended next-session target + +Phase 1 of the next session should: +1. **Probe `sub_822AE1F0`, `sub_822F55F0`, `sub_822C8B50`, `sub_822C6808` entry PCs** at `-n 500M`. If any fires, the gate is even further upstream (a producer of the producer); if 0/4 fire, the gate is on an even earlier path that calls one of these. +2. **Probe `sub_82173DC8` post-merge dispatch helper at 0x82174040 entry, plus the 6 fall-through arms (0x82174018, 0x82173EF8, 0x82173E60, 0x82173FE4, 0x82173F70, 0x82173F44)** — see whether the 4 fired notifications are filtered out (their `[r31+44] == 0` may be a property of all 4 startup notifications; canary's `[r31+44]` is only non-zero for non-startup notifications). +3. **Dump the listener struct at runtime** with `--dump-addr=0x40ba9a80` (the r3 value seen at the dispatcher fires). This reveals what fields are populated at +44, +60 (`r11 = 60(r28)` in default arm), +64, etc. + +**No discipline-gate-passing fix this session.** Box 1 (named import α-class bug) FAILS — there is no missing kernel/xam stub at the gate; box 3 (sharp 4-dim cascade) cannot be sharpened without the dump-addr trace. + +If forced to pick a "fix candidate" for the next session, the highest-information probe is **dump-addr 0x40ba9a80** under `--branch-probe=0x82173ed0` to see the listener struct each time the dispatcher discriminates. That bypasses guesswork about which path canary takes. + +## Discipline gate + +| # | Condition | Pass? | +|---|---|---| +| 1 | Phase 1 named a single failing kernel/xam import (α) or narrow internal-sub bug | **NO** — gate is δ-class (pure-guest), no kernel boundary | +| 2 | Canary impl small (<80 LOC) | N/A | +| 3 | Sharp 4-dim cascade prediction | **NO** — needs dump-addr triage first | +| 4 | No new ABI plumbing | N/A | +| 5 | Fix doesn't touch renderer subsystem | N/A | + +Boxes 1 + 3 fail. **STOP. Hand back per stop condition 1.** No fix attempted, no commit. + +## Cascade snapshot (unchanged from IO-004) + +- swaps=2, draws=0 +- 20 worker threads +- VdSwap=2, instructions=500M completed +- Canary-only exports unchanged: `ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings` +- New still-stuck handles: 0x1308 (Semaphore, 2 waiters), 0x15d0 (Timer, 32 waits 0 wakes — owned by tid=16's poll loop, EXPECTED), 0x15d4 (Event/Auto, 32/0) + +## Trace artifacts (re-runnable) + +- `audit-runs/audit-015-l1-propagation/probe.log` (493 MB; 5.05M BRANCH-PROBE lines; final state + 20-thread diag + handle audit + counter table) +- `audit-runs/audit-015-l1-propagation/probe.err` (188 KB; stderr / structured events) +- `audit-runs/audit-015-l1-propagation/pc-fire-counts.txt` (28 fired PCs sorted by hit count; aggregate of probe.log via `awk | sort | uniq -c`) + +Re-run command: +``` +PROBE="0x8216f088,0x8216f170,0x8216f798,0x8216f9d4,0x8216fc08,0x821700b8,0x821700f4,\ +0x821707c0,0x82173108,0x821737f0,0x82173dc8,0x82173dec,0x82173dfc,0x82173e40,\ +0x82173e6c,0x82173eb0,0x82173ec4,0x82173ed0,0x82173ef8,0x82173f00,0x82173f2c,\ +0x82173f48,0x82173f70,0x82173f74,0x82173f90,0x82174018,0x82174030,0x82174040,\ +0x8217405c,0x821740a0,0x821740e4,0x821740ec,0x821740f8,0x82174104,0x8217410c,\ +0x821752c0,0x8217da30,0x82180158,0x821802d8,0x821805c8,0x821806e0,0x82180a10,\ +0x82180b28,0x82180d90,0x82180ea0,0x821810e0,0x82181254,0x82181c28,0x82181ca8,\ +0x82181d48,0x822878a8,0x8228d760,0x8228fdb8,0x822900a8,0x822919c8,0x82292838,\ +0x822c2a80,0x822c6878,0x822c6894,0x822c68a4,0x822c68c8,0x822c68cc,0x822c6960,\ +0x822c6964,0x822c6974,0x822c697c,0x822c69a0,0x822c69b4,0x82334ca0,0x823ddb50,\ +0x823ddb68,0x823ddbe0,0x823ddbf8,0x823ddc14,0x823ddc64,0x823ddc78,0x823ddcbc,\ +0x823ddcc4,0x823ddce0,0x823ddcf4,0x8244e218,0x824563e0,0x824563fc,0x82456404,\ +0x82456420,0x82456444,0x82456458,0x8245647c,0x824564bc,0x824564cc,0x824564f8,\ +0x82456530,0x82456534,0x8245655c,0x824565b0,0x824565cc,0x824565ec,0x82456600,\ +0x8245660c,0x82456638,0x824567e0,0x824568d8,0x824aa1d8,0x824aa2f0,0x824aa330,\ +0x824aa350,0x824aa658,0x824aa848,0x824ab240,0x82611cd8,0x82612788,0x826127f0" +./target/release/xenia-rs exec sylpheed.iso \ + --halt-on-deadlock --branch-probe="$PROBE" \ + --trace-handles-focus=0x1004,0x100c,0x15e0,0x1020,0x10c4 \ + -n 500000000 \ + > audit-runs/audit-015-l1-propagation/probe.log 2> audit-runs/audit-015-l1-propagation/probe.err +``` + +## Constraints honored + +- Stop condition 1: 28/112 fired bounds the gate but doesn't yield a narrow fix. Hand back. +- Discipline gate failed boxes 1 + 3 ⇒ no fix. +- No code modifications — `--branch-probe` from KRNBUG-AUDIT-007 was sufficient. +- No git commit (no source changes; memory file + audit-findings.md update is documentation only). +- Sister Fork A on AUDIT-014 / 0x15e0 wake untouched — Set/wait/audit code paths and 0x15e0 handle mechanics not visited. +- C++ runtime audit backlog (CPPBUG-AUDIT-001) not visited. +- No new ABI plumbing. +- No git push. diff --git a/migration/claude-memory/project_xenia_rs_audit_016_submitter_callers_2026_05_06.md b/migration/claude-memory/project_xenia_rs_audit_016_submitter_callers_2026_05_06.md new file mode 100644 index 0000000..3cc396a --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_016_submitter_callers_2026_05_06.md @@ -0,0 +1,137 @@ +--- +name: KRNBUG-AUDIT-016 — submitter-caller probe; gate is deeper-indirection γ (workitem submitter chain unreachable via vtable registry not populated by guest renderer init) +description: 2026-05-06. Read-only branch-probe at -n 500M of the workitem-submitter chain `sub_822AE1F0` / `sub_822F55F0` (and parents/grandparents) plus `sub_82173DC8` dispatcher arms, with `--dump-addr=0x40ba9a80` of the listener struct. **0/16 fire on any submitter-chain PC** including 4 levels of caller walk-up. The chain bottom-outs in the audit-009 renderer cluster (`0x82287xxx-0x82294xxx`) which has zero non-call xrefs — vtable-dispatched targets stored in a registry that's never populated. Listener struct dump shows `[base+0x2C] = 0x4024AC00` (callback-table pointer IS populated; audit-015's "==0" claim was wrong); `[base+0x04] = 0` (dispatch state bits NEVER set); the actual gate is `sub_821737F0`'s predicate evaluation reading bit 14 of [base+4]. Bug class **γ (deeper indirection)** not δ. Master HEAD `d736a1d` unchanged. No fix. +type: project +originSessionId: audit-016-2026-05-06 +--- + +🎯 **KRNBUG-AUDIT-016 (2026-05-06, READ-ONLY DIAGNOSTIC)** — branch-probed 30 PCs across the workitem-submitter chain (`sub_822AE1F0`, `sub_822F55F0`, `sub_822C8B50`, `sub_822C6808`, including bl-call-sites at `0x822B16E0` / `0x822F5728`) and 4 levels of caller walk-up: parents (`sub_822ADD70`, `sub_821A9920`, `sub_822ACAB8`, `sub_821A8578`), grandparents (`sub_82299250`, `sub_822A4460`, `sub_821A82A0`), great-grandparents (`sub_8229AB50`, `sub_822A5C10`, `sub_821AC700`), at `-n 500M`. **0/16 submitter-chain PCs fire**. Followup probe of dispatcher arms (sub_82173DC8 case 9 / case 0xB / default-high) + `--dump-addr=0x40ba9a80` reveals the listener struct is **partially populated** but `[base+0x04]` (dispatch state bits) stays zero across all 4 startup notification dispatches → `sub_821737F0` returns 0 → all dispatch arms early-exit at `0x82174030`. Gate class is **γ (deeper indirection)** — the submitter-chain entry-point is in a registry-dispatched cluster (the audit-009 renderer cluster `0x82287xxx-0x82294xxx`) that's never reached. No fix attempted. + +## Probe set used + +**Run #1** (30 PCs): Tier 1 (workitem chain entries + bl sites), Tier 2 (parents + bl sites), Tier 2.5 (grandparents), Tier 3 (sub_82173DC8 post-merge dispatch helper + early-exit `0x82174030`). + +**Run #2** (18 PCs): refined dispatcher arm coverage (case 9 atomic-CAS arm, default-high arms, helpers `sub_82181C28` / `sub_82181D48` / `sub_821737F0` / `sub_82174040`) + `--dump-addr=0x40ba9a80,0x4024AC00,0x4024B3E0,0x40111890,0x4024A380` to trace the listener and its referenced sub-objects. + +## Per-probe fire/no-fire (combined: 11+10 fires = 21 events) + +| Group | PC | Fires | Notes | +|---|---|---|---| +| dispatcher entry | 0x82173dc8 | 4 | tid=1, r3=0x40ba9a80, lr=0x822f1c04 (frame-poll caller) | +| dispatcher trampoline | 0x82175338 | 4 | tid=1 (the 4 startup notifications from IO-004) | +| dispatcher arm | 0x82173e6c | 1 | case-9 r5==0 atomic CAS arm | +| dispatcher arm | 0x82173eac | 1 | pre-bl `mr r3, r31; bl 0x821737F0` | +| dispatcher arm | 0x82173f48 | 2 | case-0xB / default-high (r11>0xB) — fired twice | +| sub_821737F0 | 0x821737F0 | 2 | (a) lr=0x82173eb4 from case 9 — returned 0 → early-exit; (b) lr=0x821741f4 from INSIDE sub_82174040 mid-body — confirms helper IS reached eventually but caller of sub_82174040 itself unprobed | +| early-exit | 0x82174030 | 3 | once from case 9 (lr=0x82173eb4), twice from default-high (lr=0x82173dd0) | +| renderer-init | 0x82181c28 | 1 | cycle 5378618, lr=0x8216eaa4 — orthogonal early-init listener-getter call (same as audit-015) | +| **Tier 1 (workitem chain)** | 0x822AE1F0, 0x822F55F0, 0x822C8B50, 0x822C6808, 0x822B16E0, 0x822F5728 | **0** | UNFIRED — entire workitem submitter chain | +| **Tier 2 (parents)** | 0x822ADD70, 0x821A9920, 0x822ACAB8, 0x821A8578 + bl sites 0x822AE12C, 0x822ACB54, 0x822ACB88, 0x821A98FC, 0x821AB1C0 | **0** | UNFIRED | +| **Tier 2.5 (grandparents)** | 0x82299250, 0x822A4460, 0x821A82A0, 0x82299724, 0x822A49B8, 0x821A8464 | **0** | UNFIRED | +| dispatcher post-merge | 0x82174040, 0x82174018, 0x8217401C, 0x82173EF8, 0x82173EC4, 0x82173EBC | **0** | UNFIRED at function-entry granularity (yet `sub_821737F0` is called from 0x821741F4 mid-body — implies entry probe missed it; possible probe-machinery gap, see anomalies below) | + +## Listener struct snapshot (`[0x40ba9a80]`, EOR after 4 dispatches) + +``` ++0x00: 40 11 18 90 — vtable pointer (= 0x40111890; vtable[0]=0x820A183C, [+0x4]=0x40D09A40, [+0x8]=0x40BA9A80 (self), [+0x10]=0x405422C0) ++0x04: 00 00 00 00 — dispatch state bits (NEVER SET despite 4 dispatches; this IS the gate) ++0x08: 00 00 00 00 — atomic counter (zeroed by case-9 r5==0 CAS) ++0x0C: 00 00 03 e8 — = 1000 (set by case 0xA: subfic r11, r11, 1000) ++0x10: 01 00 00 00 — flag ++0x20: ff ff ff ff — sentinel ++0x2C: 40 24 ac 00 — **CALLBACK-TABLE PTR A (POPULATED!)** — points to {[+0]=0x401119A0, [+0x4]=0x40111990, [+0xC]=0xFFFFFFFF, [+0x40]="game:\\dat\\GP_TITLE.pak+eng\\\0"} ++0x3C: 40 24 b3 e0 — **CALLBACK-TABLE PTR B (POPULATED!)** — secondary callback table ++0x40: 00 00 00 08 / 0000001f / 00000001 / 41d7f398 ++0x50: 40 24 a3 80 — file-list pointer +``` + +**Audit-015's claim that `[r31+44]==0` is WRONG.** `[r31+44]` = `[base+0x2C]` = `0x4024AC00` (NON-zero). The gate is not `[+0x2C]==0`; it's `[+0x04]` (dispatch state bits). The dispatcher's case-9 path goes through `sub_821737F0` which checks `[base+4]` bit 14 and bit 15 — both are 0 → falls through deeper logic that returns 0 → case-9 early-exits. The default-high path (case 0xB) calls `bl 0x82181C28` whose return is then checked: `[[r3+0]+0] != -1` and `bl 0x82181D48` returning 1; one of those gates fails too. + +Per the **0x4024AC00 dump (= callback table A)**: it contains a real game-asset string "game:\\dat\\GP_TITLE.pak+eng\\" — confirming the listener subscription HAS been initialized to point at the renderer's config tree. The renderer init fires this much. What it doesn't do is set the dispatch-state bits in `[base+0x04]`. + +## Per-question answers + +**Q1 — Which Tier 1 functions fire?** None. `sub_822AE1F0`, `sub_822F55F0`, `sub_822C8B50`, `sub_822C6808`, the bl call-sites `0x822B16E0` / `0x822F5728` — all 0 fires. + +**Q2 — Which Tier 2 / 2.5 callers fire?** None. The static caller chain is `sub_822AE1F0 ← sub_822ADD70 (sub_822ACAB8 +0x9C / +0xD0) ← sub_82299250 / sub_822A4460 ← sub_8229AB50 / sub_822A5C10 ← sub_8229A700 / sub_822A5AE8 ← sub_82294F30 / sub_822A1438` — bottoming out in the **audit-009 renderer cluster** `0x82294xxx`. None of these fire. For sub_822F55F0: `sub_822F55F0 ← sub_821A9920 ← sub_821A8578 ← sub_821A82A0 ← (recursive cycle with sub_821A9920) and ← sub_821ABEA8 ← sub_821AC700 ← sub_821A6470 / sub_821A6C68 / sub_821AC580` — also bottoming in the renderer cluster `0x821A6xxx`. None fire. + +**Q3 — Listener struct field-population state at 0x40ba9a80?** Partially populated. Vtable + 2 callback-table pointers (`[+0x2C]=0x4024AC00`, `[+0x3C]=0x4024B3E0`) + asset paths are set. **`[+0x04]` (dispatch state bits) stays 0 across all 4 dispatcher fires** — case-0xA's `oris 0x1; stw [r31+4]` should set bit 16, but the dump shows 0. This means either case 0xA's write doesn't persist to this struct (different r31?) OR the dispatch happens with r3 pointing to a DIFFERENT struct than 0x40ba9a80 in some fires. Most likely the dispatcher is invoked once by the trampoline (0x82175338 lr=0x822f1c04) but the per-notification dispatch fans out to internal dispatch ops that don't always target 0x40ba9a80. + +**Q4 — Cross-reference vs canary's xenia.log?** Canary log at `/home/fabi/xenia_canary_windows/xenia.log` only logs `XNotifyGetNext` import binding (no per-call detail). No useful cross-ref at the dispatcher layer. + +**Q5 — Specific notification missing or wrong?** The 4 startup notifications (per audit-015: 0x09, 0x0A, 0x02000001, 0x02000003) ARE delivered and the dispatcher arms ARE entered (case 9 + case 0xA + 2× default-high). But every arm hits an early-exit because the listener struct's dispatch state bits / sub_821737F0 deeper-predicate / case-0xB's `bl 0x82181D48` predicate all return 0. **The notifications are not missing; the listener's INTERNAL state never advances out of its initial config-load phase.** That state advance is what `sub_822AE1F0` / `sub_822F55F0` are supposed to do via the workitem-submitter chain — but they're never invoked because their callers in the renderer cluster are themselves never invoked. + +## Bug class classification: **γ (deeper indirection)** + +Earlier audits classified this as δ (pure-guest renderer state-read). The new finding refines this to γ: + +- The workitem submitter chain (`sub_822AE1F0` / `sub_822F55F0` → `sub_822C8B50` → `sub_822C6808`) is reached only via parents in the renderer cluster. +- The renderer cluster (`0x82287000-0x82294FFF` + `0x821A6xxx-0x821ABxxx`) per audit-009 has **zero non-call xrefs** to its L1 entry-points — it's vtable-dispatched from a registry that's never populated. +- The registry population is what main is supposed to drive via the listener dispatch — but the listener dispatch early-exits because the *listener's own state* is never advanced. + +It's a **chicken-and-egg loop**: listener can't advance state because workitem-submitter never fires; workitem-submitter never fires because the registry it lives in is never populated; the registry is populated by something the listener is supposed to drive. Only an external bootstrap can break it. That bootstrap is the **renderer's master init function** — likely an unprobed L1 entry like `sub_822F1AA8` (the main frame-poll loop, where main parks per AUDIT-009) or another singleton-getter chain. + +## Recommended next-session target + +**AUDIT-017** should be a focused probe on: + +1. **Dispatcher caller**: probe `0x822f1be8`, `0x822f1c04`, `0x822F1AA8` (main's frame-poll loop entry per AUDIT-009) + the entry of `sub_821752C0` (which jumps to `sub_82173DC8` per static xref). These reveal what main passes as listener struct + notification ID + r5 value to the dispatcher. + +2. **State-advance writers**: byte-scan for the BE-u32 write of `0x40ba9a80+4` ANYWHERE — find every PC that does `addi r3, ?, 4; stw r?, 0(r3)` against this address. Add those PCs to the probe set. + +3. **0x82181C28's deeper logic**: probe inside `sub_82181D48` (case 0xB's secondary predicate). Per disasm: it reads `[r3+0]+60` bit 30 (`rlwinm r11, r11, 0, 30, 30`) — find what writes that bit. If we make it return 1, case 0xB succeeds → `bctrl` fires → renderer cascade. + +4. **Probe-machinery anomaly**: `sub_82174040` entry never fires despite mid-body PC 0x821741F0 clearly executed (lr=0x821741f4 of `sub_821737F0` fire). Cross-check whether `--branch-probe` skips the very first instruction of a function (mflr r12) or has another gap. **Verify before next session by adding 0x82174040, 0x82174044, 0x82174048 to a tiny probe and tracing.** + +**Sharp 4-dim cascade prediction**: insufficient to make. AUDIT-017 needs to find what writes bit 14 / bit 15 of `[0x40ba9a80+4]` OR what writes the `[r3+0]+60` bit-30 field that gates `sub_82181D48`. If either is identifiable as a bl from a probed PC, the cascade is named. + +## Discipline gate + +| # | Condition | Pass? | +|---|---|---| +| 1 | Phase 1 named single failing kernel/xam import (α) or narrow internal-sub bug | **NO** — γ-class, no kernel boundary; gate is structural | +| 2 | Canary impl small | N/A | +| 3 | Sharp 4-dim cascade prediction | **NO** — needs further state-write triage | +| 4 | No new ABI plumbing | N/A | +| 5 | Fix doesn't touch renderer subsystem | N/A | + +Boxes 1 + 3 fail. **STOP. Hand back per stop condition 1.** No fix attempted, no commit. + +## Cascade snapshot (unchanged from AUDIT-015 / IO-004) + +- swaps=2, draws=0 +- 20 worker threads +- VdSwap=2, instructions=500M completed +- Canary-only exports unchanged: `ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings` +- Stuck handles unchanged: 0x1004, 0x100c, 0x1020, 0x15e4, 0x1308 (Semaphore, 2 waiters tids 14+15), 0x160C, 0x42450b5c + +## Trace artifacts (re-runnable) + +- `audit-runs/audit-016-submitter-callers/probe.log` (run #1: 30 PCs, 11 fires, 9 KB) +- `audit-runs/audit-016-submitter-callers/probe.err` (run #1: 187 KB structured events) +- `audit-runs/audit-016-submitter-callers/probe2.log` (run #2: 18 PCs, 10 fires, 12 KB; +4 dump-addrs) +- `audit-runs/audit-016-submitter-callers/probe2.err` (run #2: 187 KB) + +Re-run command (run #1): +``` +PROBE="0x822AE1F0,0x822F55F0,0x822C8B50,0x822C6808,0x822B16E0,0x822F5728,\ +0x822ADD70,0x821A9920,0x822ACAB8,0x821A8578,0x822AE12C,0x822ACB54,0x822ACB88,\ +0x821A98FC,0x821AB1C0,0x82299250,0x82299724,0x822A4460,0x822A49B8,0x821A82A0,\ +0x821A8464,0x82174040,0x82174018,0x82174030,0x82173EF8,0x82173EC4,0x82173EBC,\ +0x82173DC8,0x82173ED8,0x82175338" +./target/release/xenia-rs exec sylpheed.iso --halt-on-deadlock \ + --branch-probe="$PROBE" --dump-addr=0x40ba9a80 -n 500000000 \ + > audit-runs/audit-016-submitter-callers/probe.log \ + 2> audit-runs/audit-016-submitter-callers/probe.err +``` + +## Constraints honored + +- Stop condition 1: no narrow fix in scope. Hand back. +- Discipline gate failed boxes 1 + 3 ⇒ no fix. +- No code modifications — `--branch-probe` + `--dump-addr` from prior infra were sufficient. +- No git commit (no source changes; memory file + audit-findings.md update is documentation only). +- C++ runtime audit backlog (CPPBUG-AUDIT-001) not visited. +- No new ABI plumbing. +- No git push. +- Probe-machinery extension `git stash@{0}` (audit-012 dump-on-probe) NOT applied — fell back to multiple smaller runs per stop condition. diff --git a/migration/claude-memory/project_xenia_rs_audit_017_state_bits_writer_2026_05_06.md b/migration/claude-memory/project_xenia_rs_audit_017_state_bits_writer_2026_05_06.md new file mode 100644 index 0000000..a162605 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_017_state_bits_writer_2026_05_06.md @@ -0,0 +1,101 @@ +--- +name: KRNBUG-AUDIT-017 — bit-14/15 writer triage; gate is β-class with α tail (`[0x828F4070+64]==-1` early-exits sub_821737F0; `XamUserGetSigninState=stub_return_zero` would gate downstream branches even if β cleared) +description: 2026-05-06. Static + runtime probe of the listener-state writers. **5 oris-bit-14/15 + stw-+4 candidates** identified statically; **2 fire at runtime** (case-0xA at 0x82173e04 sets bit-15 once; sub_821737F0 work-path at 0x82173950 NEVER fires bit-14 because of upstream gate). **Decisive runtime evidence** — case-0xA SETS bit-15 at cycle 9183060, sub_821737F0 work-path enters at 9183561, but at 0x821738D8 reads `[r30+64]==-1` (where `r30=[0x828F48B0+0]=0x828F4070`), short-circuits to 0x82173938 → `r11=0` → no bit-14 set. The follow-up case-9 dispatch already cleared bit-15. After 4 startup notifications, no further dispatcher fires happen on `0x40ba9a80`, leaving `[+4]=0` permanently. **`[0x828F4070+64]` is initialized to -1 by `sub_821701c8` (Meyers ctor); the only non-(-1) writer is `sub_82184318` whose call chain bottoms in audit-009 renderer cluster (`sub_82187dd0 ← sub_82183ca8 ← sub_822919c8`)** — same γ-cluster blocked at audit-009. Bug class **β (guest-state read upstream)** with α tail. Master HEAD `d736a1d` unchanged. +type: project +originSessionId: audit-017-2026-05-06 +--- + +## Hand-off summary + +**Goal**: identify what should write bits 14/15 of `[0x40ba9a80+4]` (the listener dispatch-state-bits) so dispatcher case-9 stops early-exiting. + +**Static writer scan** (5 candidates flagged: oris-with-0x1/0x2 followed by stw to +4 within 8 instructions): + +| PC | Function | Sets bit | Target | +|---|---|---|---| +| 0x82173950 | sub_821737F0 | 14 | `[r28+4]` (= listener +4) — predicate work-path | +| 0x82173e04 | sub_82173DC8 | 15 | `[r31+4]` (= listener +4) — dispatcher case-0xA | +| 0x824d3ce8 | sub_824d3c78 | 15 | `[r8+4]` (struct stride 96, child via `[parent+184]`) | +| 0x824d3f24 | sub_824d3dc0 | 14 | `[r9+4]` (same pattern) | +| 0x82769b84 | sub_82766db0 | 15 | `[r3+4]` (struct stride 8 — false positive) | + +**Runtime evidence** (`audit-runs/audit-017-state-bits-writer/probe{1,3,4,5}.log`, -n 500M): + +- **Case-0xA fires once** at cycle 9183060 (PC 0x82173dfc, r3=0x40ba9a80) — sets bit-15 of `[0x40ba9a80+4]`. Confirmed by EOR dump: `[+0x0C]=0x000003E8` (= 1000, set by `subfic r11, r11, 1000` at 0x82173e30 in same arm). +- **sub_821737F0 work-path entered** at cycle 9183561 (lr=0x821737f8 from 0x82173874 `bl 0x821707C0`). Bit-15 cleared at 0x82173884 (`rlwinm r11, r11, 0, 16, 14` clears bit 15). +- **Bit-14 setter at 0x82173950 NEVER FIRES**. Why: at 0x821738E0, `cmpwi r3, -1; beq → 0x82173938` short-circuits because `r3=[r30+64]=0xFFFFFFFF`. r30 = `[0x828F48B0+0]` = `0x828F4070`. EOR dump confirms `[0x828F4070+64]=0xFFFFFFFF`. +- **Probe trace** at 0x82173938 fires with `r3=0xffffffff` (cycle 9188933) — direct confirmation of the early-exit. +- **bit-28 of `[0x828F4070+60]` IS set** at cycle 9224352 by `sub_821c4988:0x821c5450` (`ori r10, r10, 0x8; stw r10, 60(r11)`) — but 35,000 cycles AFTER the case-9 dispatcher fired, AFTER the dispatcher already early-exited. Even if it were earlier, sub_821737F0's bit-28 check at 0x821738F0 (`bne cr6, 0x82173938`) BRANCHES TO no-bit-14 IF bit-28 IS set — bit-28 is a NEGATIVE gate, not a positive one. +- **The actual positive gate is `[0x828F4070+64] != -1`**. `[0x828F4070+64]` is initialized to -1 at startup by `sub_821701c8` at 0x82170234 (`li r11, -1; stw r11, 64(r30)`). + +**The specific code path that should set bits 14/15 of `[0x40ba9a80+4]`**: + +1. `sub_82184318` (ctor) at 0x82184370 calls `bl 0x82456B58` (kernel handle creator), stores result via `stw r3, 64(r30)` at 0x82184374. This is the only writer of a non-(-1) value to offset 64 of the renderer-config sub-object. +2. Caller chain: `sub_82184318 ← sub_82187768:0x821877bc ← sub_82187dd0:0x82187e78 ← sub_82183ca8:0x82183cd8 ← {sub_822919c8, sub_82186760, sub_821c88d0}`. **`sub_822919c8` is one of the audit-009 renderer-cluster L1 entry points that has zero non-call xrefs** — registry-dispatched, never populated. + +**Two orthogonal stubs uncovered (α tail)**: +- `XamUserGetSigninState` (xam.rs:48) is `stub_return_zero`. Even if β is fixed, sub_821737F0's bit-14 deep-eval at 0x82173904-0x82173938 tests the return; with 0, takes the no-bit-14 path in 2/3 sub-branches. Also `sub_822C2A80` at 0x822c2ab0 loops `XamUserGetSigninState(0..3)` searching for any signed-in user — also broken. Canary `xam_user.cc:90-101` returns `SignedInLocally=1` for the default profile. + +**Bug class**: **β-dominant + α-tail.** Primary β is structural (renderer cluster unreachable, identical to audit-016's γ finding for the workitem-submitter chain — same bottom-out at sub_822919c8). Secondary α is `XamUserGetSigninState=stub_return_zero` which would gate downstream paths. + +## Recommended next-session target + +The β gate is the **same renderer cluster that audit-009 falsified an entry hypothesis for**. AUDIT-017 has now identified a SECOND productive path through that cluster (sub_82184318 ctor) but it's blocked at the same level (sub_822919c8 / sub_82187dd0 / sub_82186760 / sub_821c88d0 all xref-internal-only or top-out in cluster). + +**This is structurally identical to audit-016**. Recommended pivot: + +**Option A (continue probe layers)**: AUDIT-018 should probe `sub_82184318` entry, `sub_82187768` entry, `sub_82187dd0` entry directly to verify they are NOT entered at runtime, and walk one level deeper via xrefs to confirm. Probe set: `0x82184318, 0x82187768, 0x82187dd0, 0x82183ca8, 0x82186760, 0x821c88d0, 0x822919c8` + `0x82456B58` (the kernel allocator the ctor would call). If ALL 8 fail to fire at -n 500M, this confirms the same γ-class structural blocker as audit-009/-016. + +**Option B (canary log diff for missing kernel calls)**: re-run `lutris lutris:rungameid/4` and capture canary's `xenia.log` with `kernel_state.log_kernel_calls=true` enabled, diff against ours during the boot window 9.0M-9.3M cycles. Specifically watch for any kernel call that would write a real handle into `0x828F4070+64` — likely a notification-listener or window-dispatch creator we're not implementing. + +**Option C (α fix as cheap test)**: implement `XamUserGetSigninState` properly (return 1 for user_index 0; canary impl is 5 LOC). Predicted cascade: orthogonal — it would unblock sub_822C2A80's user-search loop and sub_8216F798's deep-eval, but would NOT unblock the listener dispatcher because β (`[+64]==-1`) is the dominant gate. Worth doing as it's α-class with cheap implementation, but **will not fire the cascade alone** unless β is also resolved. + +**Sharp 4-dim cascade prediction**: NEEDS FURTHER TRIAGE. The β-gate is at the same level audit-016 stopped at. AUDIT-017 cannot break the loop; it can only refine the diagnosis: audit-016's "γ cluster never reached" finding is now reaffirmed via a completely different path (workitem-submitter → renderer; ctor-handle-allocator → renderer — both bottom out in `sub_822919c8` etc.). + +## Discipline gate + +| # | Condition | Pass? | +|---|---|---| +| 1 | Phase 1 named single failing kernel/xam import (α) or narrow internal-sub bug | **PARTIAL** — α component identified (XamUserGetSigninState) but it's not the dominant gate | +| 2 | Canary impl small | **YES** for α (5 LOC at xam_user.cc:90-101) | +| 3 | Sharp 4-dim cascade prediction | **NO** — β dominant, structural | +| 4 | No new ABI plumbing | N/A (no fix this session) | +| 5 | Fix doesn't touch renderer subsystem | N/A | + +Boxes 1+3 fail. **STOP per stop condition 1.** No fix. No commit. + +## Key file paths and PCs (for next session) + +- `crates/xenia-kernel/src/xam.rs:48` — `XamUserGetSigninState` registered as `stub_return_zero` +- `xenia-canary/src/xenia/kernel/xam/xam_user.cc:90-104` — canary's signin_state impl +- `xenia-canary/src/xenia/kernel/xam/user_profile.h:101-103` — `signin_state()` returns `SignedInLocally=1` +- `xenia-canary/src/xenia/kernel/xam/xam_state.cc:48-51` — `IsUserSignedIn(0)` returns `profile != nullptr` (default profile loaded) + +Static writers (sylpheed.db): +- 0x82173950 (sub_821737F0:bit-14 setter, gated by `[r30+64]!=-1` AND XamUserGetSigninState ret check) +- 0x82173e04 (sub_82173DC8 case-0xA:bit-15 setter — fires correctly) +- 0x82184374 (sub_82184318 ctor:writes [r30+64]=kernel-handle from sub_82456B58 — UNREACHED, in renderer cluster) + +Probe artifacts: +- `audit-runs/audit-017-state-bits-writer/probe.log` (run #1: 23 PCs, 13 fires, 1.2KB) +- `audit-runs/audit-017-state-bits-writer/probe2.log` (refined 9 PCs, --quiet) +- `audit-runs/audit-017-state-bits-writer/probe3.log` (8 PCs + dump-addr 0x40ba9a80 + 0x828F48B0) +- `audit-runs/audit-017-state-bits-writer/probe4.log` (18 PCs covering all bit-28 setters + sub_821737F0 paths — found sub_821c4988 fires at cycle 9224352) +- `audit-runs/audit-017-state-bits-writer/probe5.log` (4 PCs to capture sub_821c4988 entry lr — confirmed self-recursive entry) + +## Constraints honored + +- Stop condition 1: no narrow fix in scope. Hand back. +- Discipline gate failed boxes 1+3 ⇒ no fix. +- No probe-machinery extension (used existing --branch-probe + --dump-addr). +- No git commit (read-only audit). +- No git push. +- C++ runtime audit backlog (CPPBUG-AUDIT-001) not visited. + +## Cascade snapshot (unchanged from audit-016) + +- swaps=2, draws=0 +- 20 worker threads +- VdSwap=2, instructions=500M completed +- Canary-only exports unchanged: `ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings` +- Stuck handles unchanged (incl. 0x1004, 0x100c, 0x1020, 0x15e4, 0x1308, 0x160C, 0x42450b5c). diff --git a/migration/claude-memory/project_xenia_rs_audit_018_canary_diff_2026_05_06.md b/migration/claude-memory/project_xenia_rs_audit_018_canary_diff_2026_05_06.md new file mode 100644 index 0000000..e944428 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_018_canary_diff_2026_05_06.md @@ -0,0 +1,107 @@ +# KRNBUG-AUDIT-018 — canary-log diff identifies α-class stub `KeResumeThread` (DIAGNOSTIC 2026-05-06, READ-ONLY) + +**Status**: read-only diagnostic. No fix landed. Master HEAD `7ed6192` unchanged. Tests 600. Lockstep `instructions=100000006`. + +## Context + +Three prior sessions (audit-009 / audit-016 / audit-017) identified a γ-cluster structural blocker reaching `sub_82184318:0x82184374` (the only static writer of `[0x828F4070+64]`). The audit-018 prompt directed: stop probing the same cluster; instead, diff our kernel-call sequence vs canary's `xenia.log` for any kernel/xam call canary makes that we don't, with side-effects upstream of `sub_82184318` or on `[0x828F4070+64]`. + +## Method + +1. Captured `audit-runs/audit-018-canary-diff/ours.log` at -n 500M with full default tracing (no probe extension). +2. Inspected `/home/fabi/xenia_canary_windows/xenia.log` (May 4, 348 KB, full Sylpheed boot reaching active rendering with `XamInputGetCapabilities` polling and `VdGetSystemCommandBuffer/VdRetrainEDRAM` on a `KeReleaseSemaphore(828A3230)` ticker). +3. Set-diffed kernel-call function names (regex-filtered to discard hex-address tokens). + +## Decisive findings + +1. **Function-name set diff**: only 2 kernel calls appear in canary that don't appear in our log: `ExTerminateThread`, `KeReleaseSemaphore` — both already on the audit-006 canary-only export queue. Everything else canary calls, we also call. + +2. **`KeReleaseSemaphore(828A3230, 1, 1, 0)` is hammered by canary tid `F800006C`** repeatedly (the audio render-frame ticker). This thread is created by `ExCreateThread(701CF294(00000000), 00000000, 00000000, 00000000, 824D2878, 00000000, 10000001)` — entry `0x824D2878`, ctx=0, flags=0x10000001 (suspend bit set). Canary then immediately does `ObReferenceObjectByHandle(F800006C, ...) → 3005B018`, `KeSetBasePriorityThread(3005B018, 0xF)`, **`KeResumeThread(3005B018)`**, `ObDereferenceObject`. Same pattern for second worker entry `0x824D2940` (ctx=0, flags=0x20000001). + +3. **In our run, both these threads are `Blocked(Suspended)` at end-of-run (-n 500M)**. Final-state diagnostic excerpt: + - `hw=4 idx=0 tid=9 state=Blocked(Suspended) pc=0x824d2878 lr=0xbcbcbcbc` + - `hw=5 idx=2 tid=10 state=Blocked(Suspended) pc=0x824d2940 lr=0xbcbcbcbc` + + Counter `KeResumeThread = 2` and `NtResumeThread = 6` — exactly matching canary's call pattern. + +4. **Root cause**: `crates/xenia-kernel/src/exports.rs:3658-3664` + ```rust + fn ke_resume_thread(ctx: &mut PpcContext, _mem: &GuestMemory, state: &mut KernelState) { + // r3 = thread_ptr (KTHREAD). We don't track KTHREAD ↔ HW mapping through + // guest memory addresses, so accept and succeed. Real NtResumeThread + // below handles the handle-based path properly. + ctx.gpr[3] = 0; + let _ = state; + } + ``` + This is a **stub_success no-op**. It does not actually resume the thread. The guest's `ObReferenceObjectByHandle` cookie returns the handle (per `exports.rs:3787-3807` — `out_ptr` receives the handle as a stable cookie), so the `thread_ptr` argument to `KeResumeThread` IS just the handle. Our `find_by_handle(handle).map(|r| state.scheduler.resume_ref(r))` would work — but `ke_resume_thread` doesn't even attempt the lookup. + + Canary `xboxkrnl_threading.cc:216-227`: + ```cpp + dword_result_t KeResumeThread_entry(pointer_t thread_ptr) { + X_STATUS result = X_STATUS_SUCCESS; + auto thread = XObject::GetNativeObject(kernel_state(), thread_ptr); + if (thread) { + result = thread->Resume(); + } else { + result = X_STATUS_INVALID_HANDLE; + } + return result; + } + ``` + +5. **Cross-cluster confirmation**: tid=17 (entry=0x82170430, ctx=0x828F4070) IS spawned and parks at `Blocked(WaitAny { handles: [5604] })` (handle 0x15E4) — exactly the audit-014 / audit-017 listener-dispatch event. Worker body at `0x82170430-0x82170554` reads `[r29+56] (=[0x828F40A8])` as its loop predicate (NOT `+64` as audit-017 stated; +64 may still be the dispatch-state-bits but +56 is the immediate worker gate). Until tids 9/10 actually run their bodies, the audio-driven side of the cascade never starts and the listener-dispatch-state-bits chain stays gated on -1. + +## Bug class + +**α (named import stub_success on a load-bearing export)**. Specifically: `KeResumeThread` is registered (xenia-canary `kImplemented`) but our impl is a no-op cookie-returner. The 2 known canary-only exports (`KeResumeThread`'s **call chain** completes — we mark it called but the work isn't done; `ExTerminateThread` is genuinely unhit because the audio workers never reach their exit path because they never start). + +Fixing `KeResumeThread` is exactly the kind of small, narrow, ABI-correctness change the discipline gates were waiting for. + +## Discipline gate + +- Box 1 (named bug class with concrete evidence): **YES** — α-class, name+location identified at `exports.rs:3658-3664`. +- Box 2 (narrow fix ~30-80 LOC): **YES** — ~5 LOC, mirror `nt_resume_thread` pattern (lines 3666-3679) using `find_by_handle(handle).resume_ref(r)`. +- Box 3 (sharp 4-dim cascade prediction): see below. +- Box 4 (no renderer/GPU changes): YES. +- Box 5 (lockstep determinism preserved): preserved by 2 prior parallel landings (XamUserGetSigninState, IO-004); same pattern. + +All 5 boxes pass — first time since audit-013/IO-004. + +## Sharp 4-dim cascade prediction (KRNBUG-IO-005 / KRNBUG-α-005, next session) + +**Fix**: replace `ke_resume_thread` body to mirror `nt_resume_thread`: +```rust +fn ke_resume_thread(ctx: &mut PpcContext, _mem: &GuestMemory, state: &mut KernelState) { + let handle = resolve_pseudo_handle(state, ctx.gpr[3] as u32); + let prev = state.scheduler.find_by_handle(handle).map(|r| state.scheduler.resume_ref(r)).unwrap_or(0); + ctx.gpr[3] = prev; +} +``` + +**Predicted cascade**: +1. **Dimension A — thread liveness**: tids 9 and 10 leave Suspended state, run bodies of `0x824D2878 / 0x824D2940`. Both are XAudio voice-render workers (call `KeReleaseSemaphore(828A3230)` repeatedly per canary). Final-state thread count moves from "2 Blocked(Suspended)" to "2 Blocked(WaitAny ...)" or "2 Ready" depending on instruction budget. +2. **Dimension B — kernel call counters**: `KeReleaseSemaphore` appears with non-zero count for the first time. Audio system advances; `XAudioSubmitRenderDriverFrame` likely fires (currently 0 calls). `NtSetEvent` count rises substantially (audio frame-complete signaling). +3. **Dimension C — canary-only exports**: 2→1 (`KeReleaseSemaphore` no longer canary-only). `ExTerminateThread` likely still missing (workers exit only on shutdown). Possibly some new exports surface that the audio path needs. +4. **Dimension D — listener / dispatch-state-bits**: NOT cleanly predictable. Strongest hypothesis: with audio workers running, the audio-side callback path advances enough state to let `sub_82184318` chain enter (via the audio-init → kernel-handle-create flow). If it doesn't, the cascade stops at A+B+C and the γ-cluster is genuinely independent. **Either outcome is decisive new information**. + +**Lockstep risk**: low — `nt_resume_thread` pattern is already proven non-flaky. Tests 600→601 expected (one new test for `ke_resume_thread` actually resuming). + +## Recommended next session + +Implement the 5-LOC fix above in a separate `ke-resume-thread/p0-canary-mirror` branch, run lockstep ×2, capture full diagnostic, evaluate cascade prediction. + +If Dimension D fires (listener `[+64]` becomes non-(-1)): the γ-cluster blocker collapses; renderer cascade unblocks; AUDIT-009-018 line of investigation closes. + +If Dimension D does NOT fire: the bug-list narrows to `ExTerminateThread` + 1-3 new exports surfaced by the audio path, AND the γ-cluster is genuinely independent — pivot per audit-017 Option B (memory-watch instrumentation on `0x828F4070+64`). + +## Trace artifacts + +- `audit-runs/audit-018-canary-diff/ours.log` (~700 lines, kernel-call traces + final-state thread diagnostics) +- `audit-runs/audit-018-canary-diff/ours.stdout.log` (counters + thread states) +- Canary log untouched: `/home/fabi/xenia_canary_windows/xenia.log` (May 4, 5260 lines, full boot) +- Diff scripts inline in this file (set-diff via `comm -23`/`comm -13` on filtered function names) + +## Methodology note + +The "ours-only" set in the diff (NtReadFile, NtWaitForSingleObjectEx, NtSetEvent, RtlEnterCriticalSection, etc.) is a **logging-level artifact**, not a real divergence: canary's `xenia.log` is debug-level (`d>`) per-call but suppresses many high-frequency calls that we count via metrics. The "canary-only" set is the actionable signal because it shows what a working boot needs that we don't provide. Two entries, one of which is a bug-of-omission (KeResumeThread no-op stub) AND blocks 2 worker threads — a rare clean kernel-boundary cause. diff --git a/migration/claude-memory/project_xenia_rs_audit_023_canary_diff_2026_05_06.md b/migration/claude-memory/project_xenia_rs_audit_023_canary_diff_2026_05_06.md new file mode 100644 index 0000000..251fef1 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_023_canary_diff_2026_05_06.md @@ -0,0 +1,133 @@ +# KRNBUG-AUDIT-023 (2026-05-06, READ-ONLY w.r.t. xenia-rs; Path B canary patch+rebuild+diff) + +## Goal +Capture canary's runtime guest memory at first XamNotifyCreateListener call (mask=0x2F), +diff against ours at the same target addresses (especially the audit-017 0x828F4070+64 +hypothesis), and identify what populator's *effect* (data) canary has that ours lacks. + +## Canary patch landed (then reverted) +`xenia-canary/src/xenia/cpu/cpu_flags.{h,cc}`: declare/define `DEFINE_string(memory_dump_path,...)`. +`xenia-canary/src/xenia/kernel/xam/xam_notify.cc::XamNotifyCreateListener_entry`: on first +call (atomic-bool gated), `std::filesystem::resize_file(path, 2_GiB)` + `MappedMemory::Open` ++ `Memory::Save(&stream)`. 44 LOC total (≤50 budget). Trace at +`xenia-rs/audit-runs/audit-023-canary-diff/canary-patch.diff`. + +**Critical fix**: `PosixMappedMemory::WrapFileDescriptor` mmaps the file at its current size +without extending it; v1 patch (no `resize_file`) SIGBUS'd in `__memcpy_avx_unaligned_erms` +during `BaseHeap::Save` first 8-byte qword write. Pre-sizing to 2 GiB fixes it. + +## Build outcome +SUCCESS (Linux Debug, clang++14, ~6 minutes incremental from a CMake-regenerate-forced +full rebuild). Binary: 247065320 bytes. `strings | grep memory_dump_path` confirms flag +embedded. Symlink `/home/fabi/xenia-canary -> RE Project Sylpheed/xenia-canary` was +required because the build dir was originally CMake-configured at the no-spaces path. + +## Pre-existing canary blocker +`XexInfoCache::Init` SIGBUS at line 1406 (reading `GetHeader()->version` from mmap'd +infocache file). Worked around with `--disable_instruction_infocache=true`. + +## Memory dump captured +Path: `/tmp/audit-023-canary-memory.dump` (also copied to `xenia-rs/audit-runs/audit-023-canary-diff/canary-memory.dump`) +Size: 216,004,608 bytes (logged by AUDIT-023 message in canary stdout). +Heap commit profile (parsed): +- v00000000 (4K page): 447 committed +- v40000000 (64K): 57 +- **v80000000 (64K, XEX): 146** +- v90000000 (4K): 0 +- physical (4K): 48105 + +## Parser bit-layout discovery +`PageEntry` bitfields in canary's compiled binary place `state` at qword bits 60-61, NOT +at the declaration-order position 48-49. Likely because clang's bitfield allocator splits +across uint32_t storage units. Empirically verified: with state at bit 60, parser cursor +lands EXACTLY at file size 0xCDFF800 (= 216,004,608); with state at bit 48 it lands at +0x3EE800 (~4 MiB, header-only, missing all payload). + +## Diff results — canary-vs-ours at target addresses + +### Listener struct family 0x828F4070 (audit-017's hypothesized populator target) +**Canary at first-listener time: ALL ZEROS.** +**Ours at -n 50M: highly populated** (vtable/object data, event handles 0x15ec/0x15e4, +floats 0.6f, etc.). + +This is the OPPOSITE of audit-017's expectation. Our impl populates 0x828F4070 family +EARLIER than canary does at the first-listener moment. **Doesn't refute audit-017** — +canary's dump fired BEFORE the populator runs (kernel-init phase, not late-game phase). + +### 0x828E1F08 (the dispatcher-pointer slot) +- canary: `00 00 00 00` +- ours: `40 11 18 90` (heap pointer to our XNotifyListener struct) + +**Different mechanism.** Ours stores the listener pointer at this fixed slot; canary +doesn't (canary uses a separate `KernelState::notify_listeners_` vector and never +writes to guest memory at 0x828E1F08). + +### 0x828F3D08 (audit-003 dispatcher for handle 0x100c) +Identical except handle namespaces: +- canary: `f8000024 f8000020` at +0x40 (canary uses 0xF8xxxxxx kernel-handle range) +- ours: `00001024 00001020` at +0x40 (ours uses 0x1xxx range) +This is a CONVENTION difference — handles work, just numbered differently. NOT a bug. + +### 0x828F3D80, 0x828F3F00 (worker dispatchers) +- canary uses `0xBC...` host-physical aliases (vC0000000+) for some pointers. +- ours uses `0x40...` virtual heap pointers (v40000000+). +Both are valid heap addresses, different aliasing convention. Need careful interpretation. + +### 0x828F4838 (audit-017 area near 0x828F48B0 cluster) +**Notable ASCII string divergence:** +- canary +0x08: `58 45 4E 00` = ASCII `"XEN\0"` (likely set by xboxkrnl as a struct magic) +- canary +0x0C: `f8000034` (canary handle) +- ours +0x08: `00 00 00 00 00 00 00 00` (zero — magic + handle MISSING) + +Also +0x60: canary has `f800002c f8000028` (handles); ours `00001030 00001028` (handles). + +### audit-009 cluster L1 PCs visible in canary's dump +ALL 6 cluster L1 PCs (0x822919C8, 0x82293448, 0x82288028, 0x82292d80, 0x822851e0, +0x82286bc8) appear at addresses 0x82124xxx. INVESTIGATED — this is the static `.pdata` +exception-handler table loaded from the XEX image; OUR impl has identical bytes there +(verified `cargo run --dump-addr=0x82124800,0x82124900` matches canary byte-for-byte). +This is NOT a populator target; it's static rdata. + +## Bug-class classification + +**The audit-017 hypothesis (β-class blocker `[0x828F4070+64]==-1`) cannot be confirmed +or refuted from this dump because canary's dump fired too early.** At first-listener, +neither side has populated 0x828F4070; both are zero in canary, populated in ours +(since ours runs to -n 50M). + +The MORE INTERESTING new finding: **canary stores ASCII "XEN" at 0x828F4840 + a handle +at 0x828F4844**, ours doesn't. This is a populator effect for a struct that may or may +not be on the deadlock-causal path. Address 0x828F4838 is inside the audit-016/017 +cluster (`[0x828F48B0+0]=0x828F4070` chain). + +Discipline gate: 1+3 fail (no probe runtime confirmation; populator unknown). + +## Sharp next-session prediction OR strategic pivot + +**RECOMMENDED NEXT (AUDIT-024)**: refine the canary dump trigger to fire MUCH LATER — +e.g., at first NtSetEvent after some N seconds, or at the moment the renderer has done +its first XAudioSubmitRenderDriverFrame. That gives a fair like-for-like at a deeper +runtime point. + +Concrete approach: re-apply canary patch but trigger on Nth XamNotifyCreateListener call +(say 5+) OR on a specific notification ID matching mask 0x2F's audio events +(e.g., 0x000A0000 audio-init complete). This will give us canary's view of 0x828F4070 +WHEN populated — answering audit-017 directly. + +**Alternative low-effort pivot**: investigate the "XEN\0" string at canary's 0x828F4840. +Static-search canary's xboxkrnl source for `"XEN"` write to a guest struct field. If +found, that's exactly the missing populator. + +## Milestone status +HEAD: master `d9e40d3` unchanged. Tests 604, lockstep `instructions=100000003` preserved. +Canary patch reverted (working tree clean). Symlink `/home/fabi/xenia-canary` removed. +Core dumps from failed canary launches retained in `/var/lib/systemd/coredump/`. + +Trace artifacts (retained): +- `xenia-rs/audit-runs/audit-023-canary-diff/canary-memory.dump` (216 MB) +- `xenia-rs/audit-runs/audit-023-canary-diff/canary.log` +- `xenia-rs/audit-runs/audit-023-canary-diff/canary-patch.diff` (re-applyable) +- `xenia-rs/audit-runs/audit-023-canary-diff/parse_dump.py` +- `xenia-rs/audit-runs/audit-023-canary-diff/diff_canary_ours.py` +- `xenia-rs/audit-runs/audit-023-canary-diff/diff.txt` +- `xenia-rs/audit-runs/audit-023-canary-diff/ours-{dump,extra,pdata}.{log,err}` diff --git a/migration/claude-memory/project_xenia_rs_audit_024a_canary_delayed_trigger_2026_05_06.md b/migration/claude-memory/project_xenia_rs_audit_024a_canary_delayed_trigger_2026_05_06.md new file mode 100644 index 0000000..13f87b0 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_024a_canary_delayed_trigger_2026_05_06.md @@ -0,0 +1,78 @@ +# KRNBUG-AUDIT-024A — Canary memory-dump diff at delayed trigger (2026-05-07) + +## Status + +READ-ONLY. No xenia-rs source change. Canary patch applied + reverted; `git status` clean. Master HEAD `d9e40d3` unchanged. Tests 604, lockstep `instructions=100000003` preserved. + +## Context + +Sequel to AUDIT-023. AUDIT-023 captured canary memory at first `XamNotifyCreateListener` (very early). Result: `[0x828F4070+64]` was zero in canary at that moment, suggesting populator hadn't run yet. AUDIT-024A re-runs the experiment with a much later trigger to capture post-populator state. + +## Patch summary + +Diff against `xenia-canary canary_experimental` (HEAD upstream-5-behind): +- `src/xenia/cpu/cpu_flags.{h,cc}` (8 LOC): same `DEFINE_string(memory_dump_path, ...)` flag from audit-023. +- `src/xenia/kernel/xboxkrnl/xboxkrnl_audio.cc` (31 LOC): hook in `XAudioSubmitRenderDriverFrame_entry` — atomic-bool first-call gate, pre-size 2 GiB, mmap, `Memory::Save`, close. +- Total: **39 LOC** (target was ≤50). + +Build: incremental `ninja -f build-Debug.ninja xenia_canary` after creating symlink `/home/fabi/xenia-canary -> /home/fabi/RE Project Sylpheed/xenia-canary` (CMake cache references the bare path). 16 ninja targets, ~10 s relink. Required `--disable_instruction_infocache=true` runtime workaround (preexisting canary bug). + +## Capture + +Run: `xenia_canary sylpheed.iso --log_level=3 --disable_instruction_infocache=true --memory_dump_path=/tmp/audit-024a-canary-memory.dump`. Dump materialised at 260,659,200 bytes (~248.6 MiB). Larger than audit-023's 216 MB — consistent with deeper boot. Canary log line confirms: `AUDIT-024A: dumping guest memory ... (XAudioSubmitRenderDriverFrame)` then `wrote 260659200 bytes`. + +Pre-dump telemetry confirms post-populator state: `KeReleaseSemaphore(0x828A3230, 1, 1, 0)` firing repeatedly (audio buffer-completion semaphore — audit-018 producer prediction validated), VdSwap, VdRetrainEDRAM, multiple texture loads, XamInputGetCapabilities all firing. + +## Findings + +### `[0x828F4070+64]` β-class hypothesis FALSIFIED + +`[0x828F40B0]` (=0x828F4070+64) at first `XAudioSubmitRenderDriverFrame`: +- **CANARY**: all zeros for at least 0x40 bytes. +- **OURS @ -n 500M**: `ff ff ff ff 00 00 00 00 ...` (audit-017's `-1` sentinel from sub_821701c8). + +AUDIT-017's hypothesis that `[0x828F4070+64]==-1` blocks the bit-14 setter at `0x82173950` is now **directly falsified**: canary, while running steady-state audio frame submission, has this slot at zero — never advanced past init. The bit-14 path's actual gate must admit `[+64]==0`, or canary takes a different control path entirely. Either way, the β-blocker thesis (a non-(-1) write to `[+64]` is a precondition for renderer progress) is wrong. + +### `0x828F4838+0x08` `"XEN\0 + 0xF8000034"` divergence stable + +Canary still has `"XEN\0"` magic + kernel handle `0xF8000034` at +0x08. Ours has zeros. **Confirmed stable across audit-023 (very early) and audit-024A (late) trigger points** — populator wrote this field during early init, before even the first `XamNotifyCreateListener`. Heap pointers + counts at `0x828F4838 +0x20..+0x60` populated in both (canary `0xBC36xxxx`, ours `0x4024xxxx`). + +### `0x828A3230` audio semaphore (canary only) + +Canary state at `0x828A3230`: state-quad `0x00000005`, `"XEN\0"` + handle `0xF8000070` at +0x08, release-count `01000000` at +0x14, chain at +0x18 / +0x28 with handles `0xF8000080` / `0xF800007C` and `0xBE628EDC1FCA7000` at +0x38. In ours: `KeReleaseSemaphore=0` (still canary-only); producer chain unreached at -n 500M. + +### `0x828F48B0+0x24=0x828F3EC0` + +Canary correctly stores the audit-003 dispatcher addr for handle 0x100c. Confirms singleton-pool layout, populated identically in both runtimes. + +## Bug class + +Drop β-class (`[+64]` poison) hypothesis. Reclassify as **γ-deep**: the gate between audit-013's IO-004 reach (sub_82173DC8 dispatching) and the audio producer chain firing is a multi-step renderer/audio init that fires `XAudioSubmitRenderDriverFrame` in canary but never in ours. + +Counters in our run: `XAudioRegisterRenderDriverClient=1`, `KeInitializeSemaphore=1` — registration ran and the buffer-completion semaphore was allocated. But the audio thread that calls `XAudioSubmitRenderDriverFrame` never starts feeding frames. + +## Next session — sharp prediction + +Two parallel tracks: + +(1) **AUDIT-024B (sister session)**: static-search canary source for the writer of `"XEN\0" + 0xF8000034` magic. The `"XEN" + handle` pattern is the canonical type-tag emitted by `kernel/util/object_table.cc` when a kernel object is committed to guest memory. If 024B names the writer, cross-reference with our canary-only export queue to identify the missing kernel call. + +(2) **Audio-thread-start probe**: name the kernel call that starts the audio frame submission. `XAudioRegisterRenderDriverClient_entry` returns a `0x41550000 | index` driver handle in canary; the game then has to spawn a worker thread that feeds frames. If our impl returns a working handle but the game doesn't spawn, the gate is in pure-guest code (δ); if we don't return a handle, the gate is in our `XAudioRegisterRenderDriverClient` impl (α). Counter = 1 in both, so likely δ — but probe needed. + +## Cascade prediction (4-dim) for next-session fix + +If a future fix lands the audio-thread-start gate: +- **A**: `XAudioSubmitRenderDriverFrame` count > 0 +- **B**: `KeReleaseSemaphore` count > 0 (exits canary-only export queue) +- **C**: `[0x828A3230+0x14]` becomes 1 +- **D**: open — VdSwap may or may not be paced by audio. + +## Trace artifacts + +- `audit-runs/audit-024a-canary-diff/canary-memory.dump` (248.6 MiB) +- `audit-runs/audit-024a-canary-diff/canary.log` +- `audit-runs/audit-024a-canary-diff/canary-patch.diff` +- `audit-runs/audit-024a-canary-diff/canary-state.txt` +- `audit-runs/audit-024a-canary-diff/canary-extra.txt` +- `audit-runs/audit-024a-canary-diff/ours-dump.{log,err}` +- `audit-runs/audit-024a-canary-diff/diff.txt` diff --git a/migration/claude-memory/project_xenia_rs_audit_025_audio_thread_start_2026_05_06.md b/migration/claude-memory/project_xenia_rs_audit_025_audio_thread_start_2026_05_06.md new file mode 100644 index 0000000..6a57fe1 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_025_audio_thread_start_2026_05_06.md @@ -0,0 +1,195 @@ +# KRNBUG-AUDIT-025 — Audio Thread-Start Gate (READ-ONLY, 2026-05-07) + +Master HEAD at start: `d9e40d3`. **Path 2 sister session merged mid-session**; +HEAD at end: `de5a15e` (`xobj-stashhandle/p0-canary-mirror` — 7 LOC stamp of +`XEN\0` + handle in `ensure_dispatcher_object`, KRNBUG-α-006 LANDED). + +## Executive summary + +Goal: identify the gate between successful `XAudioRegisterRenderDriverClient` +(both runtimes call once with identical return `0x41550000`) and the audio +worker submitting frames (canary fires repeatedly, ours never). + +**Outcome: γ-DEEP (vtable-driven indirection, no clean kernel-stub gap).** + +The audio init runs to completion in our impl: heap object allocated, +DISPATCHER_HEADERs for `0x828A3230` (sem) / `0x828A3244` (event) / `0x828A3254` +(event) initialized, worker thread tid 9 (entry `0x824D2878`) spawned + +resumed, `ExRegisterTitleTerminateNotification(0x828A3210, 1)` registered. +Worker is correctly parked on `KeWaitForSingleObject(0x828A3254)` waiting for +a job-submit wake-up. **The wake-up function `sub_824D23B0` is invoked only +via the audio_system vtable (`[r31+0]=0x82006CF4`)** — there are zero static +direct-call xrefs to its body at `0x824D2BD8`. The vtable-method caller would +be a per-frame audio update from the renderer/scenegraph — i.e., the same +`0x82287000-0x82294000` cluster identified by AUDIT-009 as unreached. + +**The audio gate IS the renderer gate.** No new bug class. The fix is the +same one that's been gating draws/swaps since AUDIT-009/016/017 — a deep +γ-class chicken-and-egg in the dispatch-vtable population. + +## XAudioRegisterRenderDriverClient impl comparison + +**Ours** `crates/xenia-kernel/src/exports.rs:2705-2745`: +- `r3 = callback_ptr` (guest pair: PC + arg) +- Reads `mem[callback_ptr] = callback_pc`, `mem[callback_ptr+4] = callback_arg` +- `state.heap_alloc(4) → wrapped`; `mem[wrapped] = callback_arg` (BE) +- `state.xaudio.register({pc, arg, wrapped}) → index` +- `mem[driver_ptr] = 0x41550000 | (index & 0xFFFF)`; returns `STATUS_SUCCESS` +- **No host audio worker thread.** A periodic ticker + (`xaudio_tick_instr`/`tick_wallclock`) is wired but default-OFF behind + `--xaudio-tick` because under it the registered callback enters infinite + KeWait and regresses swaps 2→1 (per KRNBUG-XAUDIO-PRODUCER-001). + +**Canary** `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_audio.cc:56-82`: +- Same arg parsing. +- Calls `audio_system->RegisterClient(callback, arg, &index)` — + `xenia/apu/audio_system.cc:202-237`. **`RegisterClient` does TWO important + things our impl skips**: (a) calls `client_semaphore->Release(queued_frames)` + on a HOST `xe::threading::Semaphore` (canary's host audio worker thread, + spawned at `AudioSystem::Setup`, sleeps on this); (b) creates a host-side + `AudioDriver` (XAudio2 / SDL backend) tied to the host semaphore. +- `mem[driver_ptr] = 0x41550000 | (index & 0xFFFF)`; returns success. + +**Difference**: canary's host audio worker pumps the registered callback at +the audio frame rate. **In Sylpheed's case, the guest also has its OWN +audio worker (tid 14 = `F800006C`, entry `0x824D2878`) which loops Wait-on- +`0x828A3254` → Process → Release(`0x828A3230`) → SetEvent(`0x828A3244`).** The +guest worker is what canary's xenia.log shows hammering KeReleaseSemaphore +268+ times. So even without canary's host-worker callback firing, the guest +worker should run on its own — IF something signals `0x828A3254` repeatedly. + +## Canary log audio init (xenia.log lines 1527-1606) + +1. `MmAllocatePhysicalMemoryEx` — alloc audio heap. +2. **L1529** `KeInitializeSemaphore(0x828A3230, 0, 6)` — buffer-completion sem. +3. **L1531** `ExRegisterTitleTerminateNotification(0x828A3210, 1)`. +4. **L1532** `ExCreateThread(entry=0x824D2878, ctx=0, flags=0x10000001)` — audio worker. +5. **L1533** handle `F800006C` allocated. +6. **L1536-37** `KeSetBasePriorityThread(15)` + `KeResumeThread`. +7. **L1539** `ExCreateThread(entry=0x824D2940, ctx=0, flags=0x20000001)` — second audio thread. +8. **L1546** `K> F800006C XThread::Execute thid 14`. +9. **L1549** `XAudioRegisterRenderDriverClient(0x701CF210→0x824D6640, 0xBDFBA658)`. +10. **L1551** `client 0 registered successfully`. +11. **L1555-1606+** thread `F800006C` hammers `KeReleaseSemaphore(0x828A3230, 1, 1, 0)`. + +In our run (`audit-runs/audit-024a-canary-diff/ours-dump.err`): tid 9 spawned +(entry `0x824d2878`), tid 10 spawned (entry `0x824d2940`), `KeInitializeSemaphore=1`, +`ExRegisterTitleTerminateNotification=3`, `XAudioRegisterRenderDriverClient=1`, +**`KeReleaseSemaphore=0`**, `KeSetEvent=1`, `KeWaitForSingleObject=5`, +`KeResumeThread=2`. Init reaches the same point but the worker never gets a +job-submit signal. + +## Audio worker disasm (entry 0x824D2878) + +``` +LOOP_HEAD (0x824D28B8): + r3 = 0x828A3254 # auto-reset event + bl KeWaitForSingleObject(r3, reason=3, mode=1, alertable=0, NULL) # 0x824D28CC + r3 = mem[0x828A3264] # = audio_system_obj heap ptr + r11 = mem[r3+300] # audio_active flag + r31 = (r11 == 0) ? 1 : 0 # cntlzw + rlwinm + if r31 == 0 (audio_active != 0): + bl sub_824D2108 / sub_824D21F0 # process job + KeSetEvent(0x828A3244, 1, 0) + goto LOOP_HEAD + else (shutdown): + r5 = mem[r3+304] - 1 + if r5 != 0: KeReleaseSemaphore(0x828A3230, r5, 1) # 0x824D2904 + KeSetEvent(0x828A3244, 1, 0) + return # exit thread +``` + +Wake source for `0x828A3254`: only `sub_824D23B0` (KeSetEvent at `+0x54`, +`+0x4FC`, `+0x688`). `sub_824D23B0+0x678` writes `[r31+300] = current_thread` +(at `0x824D2A28`) so subsequent worker iterations take the active branch. + +## Caller chain of sub_824D23B0 + +Static xrefs (DuckDB `xrefs` table): +- `sub_824D23B0` ← `sub_824D2B08+0xE4` (= `0x824D2BEC`) — ONE call. +- But `sub_824D2B08` returns at `0x824D2BD4` BEFORE `0x824D2BEC`. The body + containing the bl is `0x824D2BD8..0x824D2C04` — a separate inline function + the static analyzer didn't carve out as its own `functions` row. +- ZERO static call-xrefs to `0x824D2BD8`. **Vtable indirection.** +- `sub_824D2B08` (the constructor) sets `[r31+0] = 0x82006CF4` (vtable in + `.rdata`). The vtable's slot for the job-submit method points at + `0x824D2BD8`. Caller code does + `lwz r11, 0(r_audio_obj); lwz r11, OFF(r11); mtctr r11; bctrl`. + +## Probe set fired (audit-runs/audit-025-audio-thread-start/probe.log) + +`--pc-probe=0x824D23B0,0x824D2404,0x824D28CC,0x824D28D0,0x824D28E4,0x824D290C,0x824D291C,0x824D2928,0x824D2930,0x824D2DAC,0x824D2DF8,0x824D2DFC --dump-addr=0x828A3210,0x828A3230,0x828A3244,0x828A3254,0x828A3214 -n 500_000_000 --halt-on-deadlock` + +Fired (1 of 12): `0x824D2DF8` tid=1 cycle=7,470,631 (the +ExRegisterTitleTerminate call inside `sub_824D2C08`). All audio-job-submit +PCs (`sub_824D23B0` + `0x824D2404` `KeSetEvent(0x828A3254)`) **never reached**. + +Dispatcher dump confirms init success (and Path 2's StashHandle stamp): +- `0x828A3254` Event sync: type=0x01, sig=0, +0x08="XEN\0", +0x0C=0x828A3254 +- `0x828A3230` Semaphore: type=0x05, count=0, limit=6, +0x08="XEN\0" +- `0x828A3244` Event sync: type=0x01, sig=0 +- `mem[0x828A3264]=0x4250DEDC` (audio_system heap object pointer set) + +Thread state: tid 9 `Blocked(WaitAny [0x828A3254])` at pc=0x824D28D0, +waiters=[9], signal_attempts=0 — the wait was queued and nothing has signaled +since. + +## Bug class classification: γ-DEEP (vtable-driven) + +- α (load-bearing stub): NO. `XAudioRegisterRenderDriverClient` / + `KeInitializeSemaphore` / `KeWaitForSingleObject` / `KeSetEvent` / + `ExCreateThread` / `KeResumeThread` all match canary semantically. +- β (memory predicate): NO. `[r3+300]` is at heap offset on the `0x4250DEDC` + object; the worker correctly reads zero (audio_active not yet set) — but + this isn't the *blocker*; the worker would loop normally if KeSetEvent + fired periodically because each wake attempts the read again. +- γ-deep (multi-step indirection): YES. The audio job-submit is a vtable + method on the audio_system object. Caller is a periodic frame-loop in + the renderer/scenegraph (audit-009 unreached cluster). + +This is a confirmation of audit-009/016/017's γ-class hypothesis from a new +angle. Path 2's StashHandle fix (KRNBUG-α-006) does not unblock it because +the missing piece is not "what kernel call writes XEN\0" — it is "what +populates the listener-dispatch vtable so the renderer can route per-frame +audio updates to `sub_824D23B0`". + +## Discipline gate + +- Box 1 (canary citation): PASS — `xenia/apu/audio_system.cc:202-237`, + `xboxkrnl_audio.cc:56-82`. +- Box 2 (LOC ≤ 30): N/A. +- Box 3 (runtime probe shows reach): **FAIL** — `sub_824D23B0` never reached. +- Box 4 (sharp 4-dim cascade): N/A (no fix this session). +- Box 5 (lockstep deterministic): N/A (no source modified). + +## Sharp next-session direction + +(A) **Strategic pivot (recommended)**: stop chasing audio. The audio gate IS + the renderer gate. Concentrate on AUDIT-009's `0x82287000-0x82294000` + cluster's L1 callers, specifically the listener-vtable population that + AUDIT-016 traced to `[0x40ba9a80+0]` containing the dispatch table. With + AUDIT-024A's falsification of `[+64]==-1` as the blocker (canary has + `+64==0` even when audio is firing), look for what kernel call writes + the LISTENER's vtable in canary; cross-reference against our canary-only + export queue (`ExTerminateThread`, `KeReleaseSemaphore`, + `XamUserReadProfileSettings` are all consequences not causes — the + cause is upstream). + +(B) **Audio worker host-thread emulation**: complete mirror of canary's + `AudioSystem::WorkerThreadMain` (semaphore.Release `queued_frames` times + on RegisterClient + drive callbacks from a host thread). Risk: lockstep + determinism unless quantized to instruction-count. + +(C) **Audio-side workaround**: keep `--xaudio-tick` opt-in; not recommended. + +## Trace artifacts + +- `audit-runs/audit-025-audio-thread-start/probe.log` (CTOR-PROBE + dispatcher dump) +- `audit-runs/audit-025-audio-thread-start/probe.err` (counters + thread states) +- (pre-existing) `audit-runs/audit-024a-canary-diff/ours-dump.{log,err}` +- (pre-existing) canary `/home/fabi/xenia_canary_windows/xenia.log` lines 1527-1606 + +## Cleanup + +No source modified. xenia-rs master HEAD `de5a15e` unchanged from Path 2's +merge. No commit produced this session. diff --git a/migration/claude-memory/project_xenia_rs_audit_027_v40_mem_diff_2026_05_08.md b/migration/claude-memory/project_xenia_rs_audit_027_v40_mem_diff_2026_05_08.md new file mode 100644 index 0000000..666bd70 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_027_v40_mem_diff_2026_05_08.md @@ -0,0 +1,232 @@ +# KRNBUG-AUDIT-027 — v40 heap memory diff vs canary (READ-ONLY, 2026-05-08) + +Master HEAD at start/end: `e061e21`. NO source modified, NO commit. + +## Executive summary + +Comprehensive byte-level dword diff between canary's existing 248.6 MiB +memory dump (audit-024A, captured at first `XAudioSubmitRenderDriverFrame`) +and our impl's runtime memory at the v40000000 heap (1008 MiB span, +base 0x40000000, page size 65536). + +Goal: find every dword address where canary stores a `0x82xxxxxx` PC +value that ours does not, then cross-reference with the renderer cluster +L1 PCs (`0x82285000-0x82294000`) to identify any dispatch table base. + +**Outcome: NO cluster L1 PC hits in canary's v40 (broad 116-fn set: 0, +narrow 6-fn picks: 0). v40 is structurally NOT the dispatch-table source +either.** Three vtable-shaped runs were detected at `0x40000770` (32 dwords), +`0x400015a0` (110 dwords), and `0x40000d90` (20 dwords) — all populated in +canary, all zero in ours — but their target PCs cluster in `.text` at +`0x8284cxxx`/`0x8284dxxx`/`0x82882xxx` (heap-allocator class + .data +helpers), not in the renderer cluster. + +**v40 is the third heap region eliminated as the dispatch-table source +(after v80 in audit-026, `0x828Fxxxx` static .data already known).** The +heap-pointer namespace difference (canary `0xbcXXXXXX` vs ours +`0x40XXXXXX`/`0x4cXXXXXX`) means most direct address comparison at heap +locations is structurally non-comparable. + +**Strategic pivot mandatory.** The remaining places a dispatch table +could live are: (a) physical heap (0x20000000 span, 58458 commits in +canary's dump — much larger surface), (b) v00 first 256 MiB (4 KiB pages, +468 commits in canary), or (c) construction at runtime via a +register-passed pointer that never lands in heap memory. Recommend +tracing-based or vtable-write-tap approach next — see directions below. + +## Workflow + +### Step 1 — Capture our v40 + +``` +cargo run --release -p xenia-app -- exec sylpheed.iso \ + --halt-on-deadlock --quiet \ + --dump-section=0x40000000:0x3F000000:audit-runs/audit-027-v40-mem-diff/ours-v40.bin \ + -n 500000000 +# Output: dump-section: wrote 1056964608 bytes from 0x40000000 +# (60119 committed pages) to ours-v40.bin +``` + +Note `-n 500_000_000` (with underscores) is rejected by clap; use plain digits. + +### Step 2 — Extract canary's v40 + +`audit-runs/audit-027-v40-mem-diff/extract_v40.py` (adapted from +audit-026's extract_v80.py — just changes selected heap to v40 + size +0x3F000000). Walks heaps in order (v00 v40 v80 v90 physical) decoding +PageEntry state at qword bits 60-61, copies committed v40 pages into a +1008 MiB buffer. + +``` +canary v40 page count = 16128, committed = 90 +output: canary-v40.bin (1056964608 bytes) +``` + +(canary's 60119-vs-90 ratio is striking but not buggy — canary uses the +v40 heap minimally; our impl maps significantly more.) + +### Step 3 — dword-level diff + +`audit-runs/audit-027-v40-mem-diff/diff_v40.py` — same shape as +audit-026's diff_v80.py: +- A-list: canary has `0x82xxxxxx` PC, ours differs (typically zero) +- B-list: ours has `0x82xxxxxx` PC, canary differs + +``` +case A divergences: 536 (canary has PC, ours zero/different) +case B divergences: 31947 (ours has PC, canary zero/different) +``` + +## Findings + +### Histogram of canary-side PCs (top 10 by count, A-list) + +``` +0x828f3xxx 90 -- dispatcher area (0x828F3D08 etc., already known .data) +0x8284dxxx 78 -- .text near end of code section (heap-allocator handlers) +0x8284cxxx 64 -- .text near end of code section (heap-allocator handlers) +0x82150xxx 30 -- .text near base +0x828f4xxx 23 -- .data dispatcher / listener-related +0x82882xxx 20 -- .data tables +0x82153xxx 16 -- .text +0x828e2xxx 16 -- .data +0x82151xxx 15 -- .text +0x82870xxx 9 -- .data +``` + +### Cluster L1 hit count: ZERO + +``` +broad set (116 functions in 0x82285000-0x82294000): 0 hits +narrow set (6 hand-picked): 0 hits +``` + +The 9 PCs in the broader 0x822xxxxx range that DO appear in A-list all +land in `0x822F1xxx-0x822F2xxx` (`sub_822F13B0`, `sub_822F1AA8`, +`sub_822F17F0`, `sub_822F1F20`) — this is exactly the **main frame-poll +function from audit-009**, and these are stack-frame return addresses, +not dispatch-table entries. Located in `0x70xxxxxx` pages = guest stack +region. + +### Three vtable-shaped runs + +| base | length | shape | +|------------|--------|----------------------------------------------------------| +| 0x40000770 | 32 | starts `0x8284Da50, 0x8284Da60, 0x8284Da70, 0x825FB958` | +| 0x400015a0 | 110 | identical first 32 entries to 0x40000770 | +| 0x40000d90 | 20 | `0x82882910, 30, 50, 70, 90, b0, ...` (consecutive +0x20)| + +**Header pattern preceding 0x40000770 (at +0x760):** +`00 09 00 0e 00 01 10 00 40 00 01 c8 40 00 01 c8` + +**Header pattern preceding 0x400015a0 (at +0x1590):** +`00 21 00 81 00 01 10 00 40 00 01 80 40 00 01 80` + +The repeated `40 00 01 XX` self-pointer pair is characteristic of a +heap-allocator block descriptor (begin/end of an STL vector / linked +list). The two table instances are different objects of the same C++ +class with 110 virtual methods (massive class). Targets in `0x8284cxxx` +land **inside `.text`** (sec ends at `0x8284E2DC`) — these are the +heap-allocator's per-method handler thunks, NOT renderer handlers. + +The `0x40000200..0x40000600` region in canary has self-pointer chains +(`40 00 02 00, 40 00 02 00, 40 00 02 08, ...`) — canary's BaseHeap +free-list intrusive metadata. Ours has those addresses zero (different +allocator strategy). + +### Listener struct cross-reference (0x40BA9A80) + +``` +canary 0x40BA9A80: ALL ZERO (page uncommitted in canary's dump) +ours 0x40BA9A80: 40 11 18 90 +0x10=03 e8 +0x14=01 00 00 00 ... + +0x2C=40 24 ac 00 +0x3C=40 24 b3 e0 +``` + +Canary's listener lives at a different heap address (the `0x4..A9A80` +neighborhood is canary-uncommitted). Audit-016's identification of +`0x40BA9A80` as "the listener" was an OURS-side runtime lookup, not a +canary-binding location. The listener allocation is **heap-pointer +divergent** between runtimes — not a missing-write. + +### B-list (ours-only PCs) + +31947 entries. Cluster L1 PCs found in OURS at v40 heap addresses where +canary is uncommitted: 104 entries. These are in `0x42xxxxxx`, +`0x44xxxxxx`, `0x4Cxxxxxx`, `0x4Dxxxxxx` ranges — heap addresses canary +doesn't allocate. Distribution: + +``` +0x4b9xxxxx 12645 -- ours stack/locals area (e.g. repeated 0x82026068, 0x8202670c) +0x402xxxxx 2451 -- vtable-arrays our impl writes (interesting!) +0x4cf-0x4d2 6786 -- ours heap arenas canary doesn't use +``` + +`0x40211900..0x40211B50` has `0x82183ae8, 0x82187e38, 0x8218cf10, +0x82191b18, 0x821958c8, 0x82197448, 0x82199600, 0x821a3a50, 0x821ac770, +0x821b0378, 0x821b41f0, 0x821b7178, 0x821ba1c8, ...` = 23 consecutive +function entries spaced 0x20 apart. **THIS is a vtable our impl +constructs but canary may construct elsewhere** — addresses canary +DOESN'T have at all in v40 means canary's instance is allocated +in a different heap (likely physical, 58458 commits). + +## Bug-class classification + +**Outcome (iii): v40 ELIMINATED as dispatch-table source.** + +Combined with audit-026's elimination of v80, two of the four guest-virt +heap regions are conclusively non-sources for the renderer cluster's +dispatch tables. Remaining surface: + +1. **physical heap** (0x20000000 span, 58458 commits in canary) — by far + the most likely. Vtable-arrays our impl puts in `0x402xxxxx` likely + correspond to canary's physical-heap allocations. +2. **v00 heap** (256 MiB, 4 KiB pages, 468 commits in canary) — small + but non-zero. +3. **register-only / stack** — vtable-pointer constructed at runtime and + never landed in heap memory. + +## Discipline gate + +- Box 1 (canary citation): N/A — pure-data audit. +- Box 3 (probe-confirmed reachability): N/A — no fix proposed. +- This is a strategic-elimination diagnostic. + +## Sharp next-session direction + +(i) **Recommended: extract canary's PHYSICAL heap (0x20000000 span)** — + same script with `physical` selected. 58458 committed pages = + 228 MiB. This is the largest non-static surface and the most + likely home for dispatch tables. + +(ii) Alternatively, **vtable-write-tap**: instrument our memory write + path to log every `0x82xxxxxx` value written to v40/physical heap, + and diff against canary log of equivalent. Would directly identify + our missing writes without any address-namespace ambiguity. + +(iii) **CPPBUG-AUDIT-001 backlog** — `nt_allocate_virtual_memory` + silent-success-on-error + `mm_allocate_physical_memory_ex` ignores + alignment/range/protect. If we miscompute physical-heap addresses + due to allocator mismatch, that explains the heap-pointer + namespace divergence and could mask the dispatch-table writes. + +## Trace artifacts + +- `audit-runs/audit-027-v40-mem-diff/canary-v40.bin` (1056964608 bytes) +- `audit-runs/audit-027-v40-mem-diff/ours-v40.bin` (1056964608 bytes) +- `audit-runs/audit-027-v40-mem-diff/extract_v40.py` +- `audit-runs/audit-027-v40-mem-diff/diff_v40.py` +- `audit-runs/audit-027-v40-mem-diff/diff.txt` (536 entries) +- `audit-runs/audit-027-v40-mem-diff/diff-b.txt` (31947 entries) +- `audit-runs/audit-027-v40-mem-diff/histogram.txt` +- `audit-runs/audit-027-v40-mem-diff/l1-hits.txt` (header + 0/0) +- `audit-runs/audit-027-v40-mem-diff/tables.txt` (4 runs >=4) +- `audit-runs/audit-027-v40-mem-diff/pages.txt` (12 pages with diffs) +- `audit-runs/audit-027-v40-mem-diff/anchors.txt` (0x40BA9A80 empty) +- `audit-runs/audit-027-v40-mem-diff/cluster_l1_pcs.txt` (116 fns) +- `audit-runs/audit-027-v40-mem-diff/ours.log` +- `audit-runs/audit-027-v40-mem-diff/diff_run.log` + +## Cleanup + +No source modified. Master xenia-rs HEAD `e061e21` unchanged. Sister +session 028 untouched. diff --git a/migration/claude-memory/project_xenia_rs_audit_028_steady_state_notify_2026_05_06.md b/migration/claude-memory/project_xenia_rs_audit_028_steady_state_notify_2026_05_06.md new file mode 100644 index 0000000..b92d4ac --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_028_steady_state_notify_2026_05_06.md @@ -0,0 +1,158 @@ +# KRNBUG-AUDIT-028 — XNotify Steady-State Notification Audit + +**Date**: 2026-05-08 (per env) +**Mode**: READ-ONLY canary log + canary source analysis +**Master HEAD**: `e061e21` (unchanged) +**Tests**: 605 (unchanged) +**Lockstep**: instructions=100000003 (unchanged) + +## Goal + +Determine whether canary delivers MORE XNotify notifications during the +steady-state audio-frame loop than the 4 startup notifications IO-004 +already wired up. Per AUDIT-009, our main thread polls `XNotifyGetNext` +1.49M times forever. If canary delivers steady-state notifications we +don't, those would be the missing wake source. + +## Canary log oracles + +- `/home/fabi/xenia_canary_windows/xenia.log` — 5260 lines, 348 KB + (Windows shorter run) +- `/home/fabi/RE Project Sylpheed/xenia-rs/audit-runs/audit-024a-canary-diff/canary.log` + — 17245 lines, 1.27 MB (Linux Debug, deeper steady-state) + +`log_level=2 log_mask=0` in both. `XNotifyGetNext` is declared +`kHighFrequency` at `xam_notify.cc:96` so its CALLS are log-suppressed, +but `XamNotifyCreateListener` and `XNotifyPositionUI` ARE logged. + +## Findings + +### F1 — Canary's notify-API call timeline (full log) + +In the deeper canary log (17245 lines): + +| Line | Call | +|-------|---------------------------------------------| +| 1347 | `XamNotifyCreateListener(0x2F, 0x00000000)` | +| 2018 | `XNotifyPositionUI(0x0A)` | + +**That's it.** No further notification-API mentions for the remaining +~15000 lines of steady-state activity. + +### F2 — Canary IS in steady-state + +Tail of the log shows active per-frame work: +- `KeReleaseSemaphore(0x828A3230, 1, 1, 0)` — 2224 occurrences + (audio buffer-completion sema, hammered by tid `F8000074`) +- `XamInputGetCapabilities` — main thread (`F8000008`) polls all 4 + slots in tight loop until log end +- GPU `01000010` actively makes textures coherent + loads new tiled + textures + `VdRetrainEDRAM` +- `VdSwap` count = **1** in canary (just the export-table TOC entry — + ZERO actual swap calls logged through 17245 lines) + +So canary is busy in steady-state AND swaps are ALSO not happening +yet — our impl's swaps=2 is actually AHEAD of canary's frame counter. + +### F3 — All canary BroadcastNotification publishers (34 sites, 11 files) + +`grep -rn "BroadcastNotification\|PostNotification" --include='*.cc'` +in `/home/fabi/RE Project Sylpheed/xenia-canary/src/xenia/`: + +- `kernel_state.cc:1046` — `BroadcastNotification` impl (fans out to + all listeners) +- `kernel_state.cc:1022-1031` — startup-only enqueue (4 IDs we match) +- `xam_notify.cc:111` — `XNotifyBroadcast` shim (game-callable) +- `emulator.cc:940` — `kXNotificationLiveContentInstalled` on + `InstallContentPackage` (UI-driven) +- `emulator_window.cc:1572-1605` — UI menu actions (host UI only) +- `xam_ui.cc` x16 — menu/UI open/close events (host UI only) +- `xam_user.cc:386-389` — `kXNotificationFriendsPresenceChanged` / + `SystemAvatarChanged` on profile change +- `profile_manager.cc:307-338` — `SystemSignInChanged` on + add/delete/login profile (UI-driven) +- `apps/xmp_app.cc` x4 — XMP playlist play/stop/state-change + (game-callable via `XMPPlay*` paths only) +- `audio_media_player.cc:497` — `XmpStateChanged` on state-change + (only triggered if game plays XMP) +- `smc.cc:61` — `SystemTrayStateChanged` on disc-tray eject (SMC) +- `gamercard_ui.cc:672` — gamercard UI close (host UI only) +- `input_system.cc:69` — `SystemInputDevicesChanged` on controller + hotplug (only on slot connect/disconnect EDGE) + +**No publisher fires every frame, every audio buffer, or in any +implicit boot-time periodic.** All publishers are event-driven from +host UI, profile/XMP/UI menu, or hardware hotplug edges. + +### F4 — Canary's host-side controller hotplug check + +`input_system.h:66-68` defines the log message +`"Controller disconnected from slot {}." / "New controller connected +to slot {}."`. **Neither phrase appears in canary's log** — so no +controller hotplug fired in this run. (Sylpheed launches in +controller-pre-connected state.) + +## Outcome classification: β (XNotify is NOT the gate) + +Canary delivers ZERO additional XNotify notifications past the 4 +startup ones during the relevant boot/audio-frame window. Our impl +already matches canary's notification timeline byte-for-byte (4 +startup IDs on first listener with `mask & kXNotifySystem | +kXNotifyLive`, per IO-004). + +The 1.49M `XNotifyGetNext` polls in our main thread are exactly +mirrored in canary — both are dutiful idle polling that's part of +the game's own poll loop, NOT a missing-publisher symptom. + +## Strategic pivot — XNotify queue is closed + +The audio/render gate is NOT a missing notification. It's still the +γ-cluster from AUDIT-009/016/017/025: the renderer's per-frame +audio-update path (`sub_824D23B0` invoked via vtable on the +audio_system object at `[r31+0]=0x82006CF4`) is unreached because +the renderer cluster `0x82287000-0x82294000` is itself unreached. + +Cross-cutting: AUDIT-027 sister session is investigating the v40 +heap memory diff which may reveal whether the renderer's listener- +vtable registry is unpopulated (audit-016's "vtable-registry-not- +populated" hypothesis). + +## Recommended next session + +**AUDIT-029**: pivot back to the renderer-cluster root cause per +AUDIT-025's option (A): "what kernel call materializes the listener- +dispatch table so renderer can route per-frame audio." Specific +sub-tasks: + +1. Probe-set the L1 callers of the unreached cluster: + `sub_82293448, sub_822919C8, sub_82288028, sub_82292d80, + sub_822851e0, sub_82286bc8` (per AUDIT-009). +2. Static-grep canary source for code that populates the + `0x82006CF4` audio_system vtable at runtime — likely + `XAudioRegisterRenderDriverClient` / `AudioSystem` init + shim writing the function-pointer table. +3. Diff that population path vs our impl. Likely a missing `stw` + to a known guest-memory address that holds the dispatch + function pointer. + +Sharp 4-dim cascade prediction (provisional): +- A: one of audit-009 cluster L1 PCs starts firing. +- B: `KeReleaseSemaphore(0x828A3230)` count goes from 0 to many. +- C: `XAudioSubmitRenderDriverFrame` count goes from 0 to many. +- D: `VdSwap` count climbs (currently 2 in our impl, 1 in canary). + +## Trace artifacts + +- This memory file. +- Audit dir: `audit-runs/audit-028-steady-state-notify/` (created; + no probes — pure analysis). + +## Discipline gate + +- Box 1 (canary log oracle): PASS (cited line numbers). +- Box 2 (canary source citation): PASS (file:line for all 34 sites). +- Box 3 (runtime PC-probe): N/A (read-only outcome β; no fix). +- Box 4 (4-dim cascade): provisional only (AUDIT-029 not this). +- Box 5 (no source mods): PASS. + +No source modified. No commit. Master HEAD `e061e21` unchanged. diff --git a/migration/claude-memory/project_xenia_rs_audit_029_physical_mem_diff_2026_05_08.md b/migration/claude-memory/project_xenia_rs_audit_029_physical_mem_diff_2026_05_08.md new file mode 100644 index 0000000..29feea6 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_029_physical_mem_diff_2026_05_08.md @@ -0,0 +1,163 @@ +# KRNBUG-AUDIT-029 — Physical-Heap Memory Diff vs Canary + +**Date**: 2026-05-08 +**Mode**: READ-ONLY DIAGNOSTIC +**Master HEAD**: `e061e21` (unchanged) +**Lockstep**: instructions=100000003, imports=987516 (preserved — no source touched) +**Tests**: 605 (unchanged) + +## Goal +Comprehensive byte-level diff between canary's physical heap (extracted from +audit-024A's 248.6 MiB `canary-memory.dump`) and our impl's putative physical +region. Final guest-memory surface unaccounted for after audit-024A (v00), +audit-026 (v80), audit-027 (v40), and v90 (no commits). + +## Method +1. Tried dumping our `0xA0000000:0x20000000` (uncached alias). +2. Tried dumping our `0xE0000000:0x20000000` (cached alias). +3. Tried dumping our `0x00000000:0x20000000` (raw flat physical addr). +4. Extracted canary's physical heap via `extract_physical.py` (5th heap, + 4096-byte pages, state at qword bits 60-61 — same format as audit-026/027). +5. Walked all 0x82xxxxxx PC dwords on canary's physical heap, + cross-referenced with cluster L1 sets, audit-017 chain, and v40 table. + +## Architectural finding (NEW) + +**Our impl has no separate physical heap.** All three of our alias dumps +returned `0 committed pages`. `MmAllocatePhysicalMemoryEx` (exports.rs:644-676) +calls `state.heap_alloc()` (state.rs:702-720), a single bump allocator at +`heap_cursor` starting at `0x40000000`, shared with `NtAllocateVirtualMemory`. + +Canary, by contrast, has a dedicated 512MB physical pool (memory.cc:222-242) +accessible via 0xA0/0xC0/0xE0 aliases with byte ID-mapping `& 0x1FFF_FFFF` +to host membase offset 0..0x20000000. + +`translate_physical()` in `crates/xenia-memory/src/heap.rs:226-229` masks +`& 0x1FFF_FFFF` and adds to membase, but our heap_cursor never allocates +into 0..0x20000000 — that range only holds the static XEX image and is +never written by `MmAllocatePhysicalMemoryEx`. Result: physical aliases +decode to uncommitted pages. + +This is a non-bug architectural divergence (both impls correctly serve +guest reads/writes), but it means canary's 228 MiB of heap data lives at +physical addresses while ours lives at 0x40xxxxxx virtual addresses. + +## Canary physical heap (extracted) +- File: `canary-physical.bin`, 512 MiB, 24.5 MiB non-zero (4.5%). +- Committed pages: **58458** × 4096 ≈ 228 MiB. +- Total parsed = 0xf895800 = file size (clean walk). +- 0x82xxxxxx PC dword density: **28851** dwords across 4467 4K pages + (536 64K-aligned regions). + +## Diff results + +### Cluster L1 PC hits +- Narrow audit-009 hand-picked 6 (`sub_822919C8`, `sub_82293448`, + `sub_82288028`, `sub_82292D80`, `sub_822851E0`, `sub_82286BC8`): + **0 / 6 hits.** +- Broad 116-fn cluster set: **2 / 116 hits.** + - `sub_8228CC18` at phys 0x1330d620 — scalar, not in any table. + - `sub_8228A220` at phys 0x1351ef2c — scalar, not in any table. + +### Audit-017 chain PC hits +`sub_82184318`, `0x82184374` (writer), `sub_82187768`, `sub_82187dd0`, +`sub_82183ca8`, `sub_822919c8`, `sub_82186760`, `sub_821c88d0`: +**0 / 8 hits.** + +### v40-table cross-reference (CONFIRMS audit-027) +Our 18 PCs at `0x40211900..0x40211B50` (audit-017 chain family, +0x20 stride) appear verbatim on canary's physical heap at +`0x1c32c910..0x1c32cb50` — **identical 0x20 stride, identical 18 +PCs, identical trailing dup of `0x821c09d8`**. + +This proves the v40 table is `MmAllocatePhysicalMemoryEx`-allocated +in canary; our impl correctly builds the same table but at a v40 +virtual address (because of the unified bump allocator). **Table +contents are correct.** + +### Top dispatch-shaped runs (≥4 consecutive PC dwords) +| Phys addr | Length | Family | Notes | +|---------------|-------:|-------------------|-----------------------------| +| 0x1e568f38 | 232 | 0x824b0xxx-0x824b2xxx | XAM/UI dispatch (~220 PCs in 0x824b family total) | +| 0x1e6290f0 | 9 | mixed | | +| 0x1c22c9b0 | 4 | mixed | | +| 0x1ce24bc0 | 4 | mixed | | +| 0x1ce254c0 | 4 | mixed | | + +### Top PC bucket +`0x82026000` × 12655 occurrences — likely a vtable pointer for a +high-cardinality object array. Region `0x144x0000` shows stride-0x38 +entries each containing `0x820266a4` as a vtable slot (per-object, +not dispatch-table). + +## Outcome: ζ — ALL FOUR GUEST HEAPS ELIMINATED + +**No L1 PCs are stored as data on any heap.** Cluster L1 functions +(`sub_822919C8` etc.) are invoked exclusively via static `bl` +instructions in unreached parent code — they are NOT routed through +a runtime-built dispatch table. Audit-017 chain PCs are likewise +absent from all heap data. + +This refutes the entire family of "kernel call materializes a +function-pointer table" hypotheses (audits 010, 011, 012, 015, 016, +017, 026, 027, 029). The renderer cluster 0x82287000-0x82294000 is +unreached because **its static caller chain is not entered**, not +because its dispatch table is not built. + +Discipline gate fails box 1 (no fix candidate this session). + +## Strategic pivot — AUDIT-030 + +All vtable/dispatch-table hypotheses are exhausted. The gate is +**upstream of any heap data structure** — a control-flow gate, not +a data-population gate. Two viable approaches: + +**Option A (preferred): comparative-execution divergence trace.** +Instrument both runtimes to emit a deterministic event stream +(e.g., `tid:pc:lr:opcode-class` per N instructions) and `diff` to +find the first divergent guest instruction. Lockstep determinism on +our side + canary already patched for `--memory_dump_path` (audit-023, +024) makes a one-more-patch periodic execution sample feasible. + +**Option B: targeted canary trace of audio-thread wake-source.** +Per AUDIT-025, `sub_824D23B0` (sole `KeSetEvent(0x828A3254)` caller) +has zero static call-xrefs — invoked only via `[r31+0]=0x82006CF4` +audio_system vtable, which IS populated in our impl (AUDIT-026 +byte-identical). The caller must be a per-frame renderer routine +already in our binary. Patch canary to log `LR` on every entry to +`sub_824D23B0`, cross-reference with our PC trace to find which +renderer-cluster function fires in canary but not ours. + +**Option C (background backlog):** CPPBUG-AUDIT-001 items. + +## Sharp prediction (provisional, low confidence) +First divergence likely a control-flow branch in 0x82200000-0x82290000 +whose predicate reads guest memory populated by either a stub-success +kernel export or a hardware-state poll. Candidates: +- An audio_system field beyond the vtable at `0x82006CF4` (AUDIT-026 + verified vtable bytes; subsequent fields may differ). +- A GPU/EDRAM-ready / DMA-channel-idle hardware poll stubbed by us. +- A frame-counter / vsync flag advanced differently. + +## Trace artifacts +Audit dir: `xenia-rs/audit-runs/audit-029-physical-mem-diff/` +- `canary-physical.bin` — 512 MiB extracted heap (24.5 MiB non-zero) +- `ours-physical-A.bin`, `ours-physical-E.bin`, `ours-physical-flat.bin` + — all 512 MiB, all zero (alias not mapped in our impl) +- `extract_physical.py` — heap extractor +- `diff_physical.py` — one-sided PC enumeration +- `diff.txt`, `histogram.txt`, `l1-hits.txt`, `audit017-hits.txt`, + `v40table-hits.txt`, `tables.txt`, `pages.txt`, `pc-summary.txt`, + `cluster_l1_pcs.txt` + +## Cleanup +- No source modified. +- No commit; master xenia-rs HEAD `e061e21` unchanged. +- Tests 605 (unchanged), lockstep instructions=100000003 preserved. + +## Milestone status +- swaps=2 draws=0 plateau intact. +- 4/4 major guest-memory heaps eliminated as gate carriers. +- All 9 vtable/dispatch-table hypotheses (audits 010-029) refuted. +- Strategic pivot from data-driven to control-flow-driven diagnostics + is now mandatory. AUDIT-030 = comparative execution trace. diff --git a/migration/claude-memory/project_xenia_rs_audit_031_audio_wait_site_2026_05_08.md b/migration/claude-memory/project_xenia_rs_audit_031_audio_wait_site_2026_05_08.md new file mode 100644 index 0000000..725c295 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_031_audio_wait_site_2026_05_08.md @@ -0,0 +1,127 @@ +# KRNBUG-AUDIT-031 — Audio worker wait-site canary trace (2026-05-08) + +## Status: READ-ONLY — canary patch reverted at session close + +Master HEAD `e061e21` unchanged. Canary `git status` clean (working tree). + +## Summary + +Outcome **(a)** — canary's audio worker DOES execute PC `0x824D28D0` (the post-wait PC where our tid=9 parks), woken on a hot loop. Wake-source caller identified. + +Furthermore, AUDIT-025/-030's static-attribution `sub_824D23B0` as "the only wake-source" +is **mis-attribution** — IDA-DB function-boundary inference for `sub_824D23B0` (claimed range +`0x824D23B0..0x824D2878`, but it actually contains a SECOND function starting at `0x824D29F0`) +fused two distinct functions. The real wake-source is `sub_824D29F0` (no IDA-DB function header, +reached via tail-jump from a thunk at `0x824D6640`, registered at `sub_824D2C08+0x374`). + +## Setup + +Re-applied audit-030's canary patch (30 LOC across 4 files: `cpu_flags.{cc,h}`, `ppc_hir_builder.cc`, +`x64_emitter.cc`). Patch single-PC (`DEFINE_uint64 log_lr_on_pc`); 4 sequential probe runs. + +Used pre-built Debug binary at `build/bin/Linux/Debug/xenia_canary` (audit-030 leftover; reverted +SOURCES still emit it because the binary is stale-but-compatible). The freshly-rebuilt Checked +binary `build/bin/Linux/Checked/xenia_canary` failed to boot in this environment with +"A0000000 range in use by some other system DLL" — Linux build-config drift between Checked and Debug. + +Cleaned residual `/dev/shm/xenia_*` files (~100 MB code-cache shm leaks) before each canary run. + +## Probes + +All probes 25-30s each: + +1. `--log_lr_on_pc=0x824D2878` (audio worker entry): 1 fire, tid=`F8000070`, lr=`0xBCBCBCBC` (top-of-thread). +2. `--log_lr_on_pc=0x824D28D0` (post-wait PC): **54,128 fires** in ~5 min, tid=`F8000074`. Confirms wait IS being woken in canary. +3. `--log_lr_on_pc=0x8284DDDC` (KeSetEvent guest thunk, ordinal `0x9D`): 8906 fires. **Critical capture**: + `tid=0100001C lr=0x824D2A44 r3=0x828A3254 r4=1 r5=0` — this names the wake source. +4. `--log_lr_on_pc=0x824D23B0` (sub_824D23B0 entry per IDA): **0 fires** — function entry never executed. + +## Wake-source identification + +PC `0x824D2A40 bl 0x8284DDDC` = `KeSetEvent(0x828A3254, 1, 0)`. Per `xrefs` table, this is +inside function reached via the FUSED label `sub_824D23B0+0x690` — but there's a second prologue +at `0x824D29F0` (`mfspr r12, LR; bl 0x825F0F88; stwu r1, -192(r1)`), making this a separate function. + +Static reachability of `sub_824D29F0` (xrefs table): +- `0x824D6648 b 0x824D29F0` (kind=`j`, tail-jump from a 12-byte thunk at `0x824D6640`) +- `0x824D6640` is referenced as DATA at `sub_824D2C08+0x374` (kind=`ref`, instruction=`addi`). + At PC `0x824D2F7C: addi r4, r10, 26176` (=`r4 = 0x824D6640`); next instructions load r3 + from `[r31][68]`, dereference vtable[7] (`[[r3]+28]`), call `bcctrl 20,lt` → registers + the thunk address `0x824D6640` as a callback on whatever object r31 points to. + +So: in canary, after `sub_824D2C08` registers the callback at +0x374, something invokes that +thunk periodically — likely a per-frame audio update or VBLANK. The thunk loads +`r3 = [0x828A3264]` (the audio-engine context pointer) and tail-jumps into `sub_824D29F0`, +which runs through to `KeSetEvent(0x828A3254, 1, 0)` at `+0x50`, waking the audio worker. + +## Our impl behavior at the same PC + +Final-state diagnostic from `xenia-rs exec sylpheed.iso --halt-on-deadlock -n 500000000`: + +``` +hw=4 idx=0 tid=9 state=Blocked(WaitAny { handles: [2190094932], deadline: None }) + pc=0x824d28d0 lr=0x824d28d0 sp=0x71387e80 +``` + +`2190094932 = 0x828A3254`. tid=9 is parked at the post-wait PC, exactly matching AUDIT-025. + +Branch-probe verification (`--branch-probe=0x824D2878,0x824D2880,0x824D2884,0x824D2890,0x824D289C,0x824D28A0,0x824D28D0,0x824D28E4,0x824D2904,0x824D2928`): +- `0x824D2878` fires once (entry, cycle 0) +- `0x824D2880` fires once (cycle 9, immediately after `bl 0x825F0F84` save-helper return) +- All later PCs: 0 fires + +So tid=9 enters once, hits the prologue, falls through to the wait at `0x824D28CC bl 0x8284DDCC` +(KeWaitForSingleObject), and parks. The `--pc-probe`/`--branch-probe` instrumentation only fires +when the PC is actually executed, not when a thread is dwelling at it post-wait return — so the +"0 fires" at 0x824D28D0 is consistent with parking BEFORE that PC's first execution (tid=9 is +suspended INSIDE KeWait). When/if KeWait returns with success, 0x824D28D0 would fire. + +## Bug-class + +**γ-deep, vtable-driven** (unchanged from AUDIT-025) — but now with the correct wake-source target +identified. The wake function `sub_824D29F0` is reached via: +1. Object at `[0x828A3264]` (the audio-engine context, r31 in `sub_824D2C08`) has a `vtable[7]` + "register-callback" method. +2. `sub_824D2C08+0x374` calls it with the thunk address `0x824D6640`. +3. Some host/guest scheduler periodically dispatches that callback per audio frame. + +In our impl, step 3 is what's missing. Per AUDIT-025, `sub_824D2C08` runs to completion (so step 2 +fires). The host-side dispatch loop that should periodically invoke `0x824D6640` is the unreached +gate. This aligns with the broader unreached-renderer-cluster picture in AUDIT-009/-016/-017. + +## Sharp next-session prediction (AUDIT-032) + +Trace what dispatches the registered callback in canary: + +1. Probe `0x824D6640` directly (`--log_lr_on_pc=0x824D6640`) in canary. Capture lr — names the dispatcher. +2. Probe `sub_824D2F7C` (the `addi r4, ...,26176` callsite) and adjacent `bcctrl` at `0x824D2F90` + in canary to capture r3 + the vtable pointer being invoked. r3 names the audio-engine "this", + and `[r3]+28` names the vtable[7] entry. +3. Walk r3's vtable[7] target in the DB. If that target is in our IDA DB but unreached, that's the + new probe target for the renderer/scheduler that should periodically invoke registered callbacks. +4. Cross-check that `sub_824D29F0` is reached in our impl after a fix to step 3. + +Predicted outcome: dispatcher is part of the same `0x82287000-0x82294000` cluster identified by +AUDIT-009/-016/-017 as unreached. If true, then audio-thread wake gate IS the renderer gate; same +γ-cluster blocker; but now with a NAMED downstream witness (`sub_824D29F0` callable proves the +dispatcher ran) instead of the indirect handle-signal proxy. + +## Files + +- Memory: this file +- Trace logs: `/home/fabi/RE Project Sylpheed/xenia-rs/audit-runs/audit-031-wait-site/` + - `canary-0x824D2878.log` (1 trace) + - `canary-0x824D28D0.log` (54128 traces) + - `canary-KeSetEvent.log` (8906 traces, contains wake-source capture) + - `canary-sub23B0.log` (0 traces — falsifies sub_824D23B0 reachability) + +## Discipline gate + +- Box 1 (canary citation): PASS — direct trace + xrefs table cited +- Box 2 (β/γ class identified): PASS — γ-deep, vtable-driven dispatcher +- Box 3 (probe machinery sane): PASS — 4 separate canary probes consistent +- Box 4 (sharp prediction): PASS — AUDIT-032 has 4-step concrete plan +- Box 5 (no fix attempt): PASS — read-only + +Hand-off: AUDIT-032 (canary probe of `0x824D6640` + sub_824D2F7C/F90 with `--log_lr_on_pc`, +walk vtable[7] target, cross-reference with AUDIT-009 cluster). diff --git a/migration/claude-memory/project_xenia_rs_audit_032_audio_host_pump_2026_05_08.md b/migration/claude-memory/project_xenia_rs_audit_032_audio_host_pump_2026_05_08.md new file mode 100644 index 0000000..ea5aa14 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_032_audio_host_pump_2026_05_08.md @@ -0,0 +1,52 @@ +--- +name: AUDIT-032 audio is host-pumped + audit-025 revision (2026-05-08) +description: Decisive correction — audio gate is NOT the renderer gate (revising audit-025). Canary uses a host-side WorkerThreadMain that calls processor->Execute on the registered guest callback directly. The 0x824D6640 thunk is the callback_ptr arg, not a vtable[7] entry. 7 prior sessions (018/KE-001/024A/025/026/030/031) chased an audio gate while the renderer plateau remained the independent draws=0 blocker. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-032 (2026-05-08, READ-ONLY, master `e061e21` unchanged)**: re-applied audit-030's `--log_lr_on_pc` patch with PC `0x824D6640` (the thunk audit-031 named as the renderer→audio dispatcher). 7,875 captures in 40s in canary's "Audio Worker (0100001C)" thread. + +## The decisive observation + +LR is invariant `0xBCBCBCBC` — canary's host stack-canary fill, NOT a valid PPC PC. Canary's `xe::apu::AudioSystem::Setup()` (`xenia-canary/src/xenia/apu/audio_system.cc:84-95`) spawns an `XHostThread` "Audio Worker" running `WorkerThreadMain()`. The loop blocks on `WaitAny(client_semaphores_)`; on wake, calls `processor_->Execute(thread_state, callback_pc=0x824D6640, args)` directly — invoking the guest callback **without a guest call site**. + +The instruction at `sub_824D2C08+0x374` (`addi r4, r10, 26176`) loads PC `0x824D6640` as the **callback_ptr argument** to `XAudioRegisterRenderDriverClient`, NOT as a vtable[7] entry. Audit-031's "vtable[7] callback registration" inference was wrong. + +## Methodology corrections (CRITICAL) + +1. **Audit-025's claim "audio gate IS the renderer gate" is REVISED.** They are SEPARATE stalls that happen to share the "host pump missing" symptom for audio. The renderer plateau (audit-009 cluster L1 reachability) is INDEPENDENT and remains the actual `swaps>2 / draws>0` gate. + +2. **Seven prior sessions (KRNBUG-018, KRNBUG-KE-001, AUDIT-024A, AUDIT-025, AUDIT-026, AUDIT-030, AUDIT-031) chased an audio gate believing it was the renderer gate.** The fixes that landed (KeResumeThread mirror, XamUserGetSigninState, XNotify listener, etc.) were genuine canary-divergence fixes but did NOT approach `draws > 0`. Future sessions must NOT re-conflate. + +3. **Reading-errors ledger stays at 10** (audit-031's "vtable[7]" was a hypothesis, not a static-analysis claim) but methodology pattern adds an entry: **inferring from a stop-on-instruction tracer that something is invoked from a guest call site is unreliable when the actual invocation is host-side**. + +## Sharp 4-dim cascade prediction for audio host-pump fix (NOT critical-path) + +If implemented (60-120 LOC mirroring canary `apu/audio_system.cc:84-159`): +- A: tid=9 leaves `Blocked(WaitAny [0x828A3254])` on first callback invocation +- B: tid=10 leaves `Blocked(WaitAny [0x828A3230])` on next sema release +- C: `XAudioSubmitRenderDriverFrame` 0→non-zero +- D: `KeReleaseSemaphore` 0→non-zero (closes last canary-only kernel export) + +E (open): tid=10's limit=6 sema = audio-frame queue depth, isolated from renderer. + +**The audio fix does NOT unblock the renderer plateau.** It's correctness/hygiene cleanup that would close the canary-only export gaps. Audio worker spawn pattern: HOST thread, NOT guest-thread injection (which APUBUG-PRODUCER-001's --xaudio-tick attempted and caused a HW-thread hijack). + +## Strategic position reset + +| Surface | Status | +|---|---| +| Static .text/.rdata/.data/.idata | Eliminated (audit-020/021) | +| v00/v40/v80/v90/physical heaps | Eliminated as L1-PC-storage (audit-026/027/029) | +| XNotify queue | Eliminated as steady-state gate (audit-028) | +| Static reachability BFS | Sound (VERIFY-A: 0/12 cluster L1 PCs fire in canary) | +| Mem-watch coverage | Sound (VERIFY-B: 12/12 PowerPC store classes hooked) | +| Linux Debug build as oracle | Faithful at kernel-call level (RECONCILE-A); host-presenter divergence is irrelevant to engine analysis (RECONCILE-B) | +| Audio gate | Identified — host-pump missing in our impl; NOT the renderer gate | +| Renderer plateau | Same gate as audit-009/016/017: cluster `0x82287000-0x82294000` L1 callers unreached. All static analysis exhausted. Next probes need new tooling. | + +## Recommended next move + +Launch the analysis-toolset overhaul planning session. The renderer hunt is blocked on tools we don't yet have — specifically C++/MSVC vtable detection + indirect-dispatch reachability + class-aware probe targeting. The audio fix (KRNBUG-α-007) can land independently as cleanup whenever convenient. + +Trace artifacts at `audit-runs/audit-032-dispatcher-lr/`. Memory file `project_xenia_rs_audit_032_dispatcher_lr_2026_05_08.md` (sister-written by the agent). audit-findings.md KRNBUG-AUDIT-032 entry appended. Master HEAD `e061e21` unchanged, tests 605, swaps=2 draws=0 plateau intact. diff --git a/migration/claude-memory/project_xenia_rs_audit_032_dispatcher_lr_2026_05_08.md b/migration/claude-memory/project_xenia_rs_audit_032_dispatcher_lr_2026_05_08.md new file mode 100644 index 0000000..1c56c9c --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_032_dispatcher_lr_2026_05_08.md @@ -0,0 +1,105 @@ +# KRNBUG-AUDIT-032 — Audio dispatcher LR capture at thunk 0x824D6640 (2026-05-08, READ-ONLY) + +## Status +- **Bug class**: δ — audit-031's hypothesis falsified. Outcome δ + α composite. +- **Master HEAD**: `e061e21` unchanged. No xenia-rs source modified. No commit. +- **Canary patch**: re-applied audit-030 `--log_lr_on_pc` patch (30 LOC, 4 files). REVERTED at session end. Canary `git status` clean. Diff archived at `audit-runs/audit-032-dispatcher-lr/canary-patch.diff` (76 lines incl. context). +- **Tests**: 605 (unchanged). Lockstep `instructions=100000003` preserved. + +## Method +- Built canary Debug config explicitly (`cmake --build . --config Debug --target xenia_canary`); the multi-config CMake required this — older Debug binary was stale (May 7). +- Single 40-sec capture of canary running sylpheed.iso with `--log_lr_on_pc=0x824D6640 --disable_instruction_infocache=true`. Log size 35,942 lines. +- Verified our impl reachability with `--pc-probe` and `--branch-probe` on `0x824D6640`, `0x824D29F0`, `0x824D2A00`, `0x824D2A20`. + +## Key finding — DISPATCHER IS HOST-SIDE, NOT GUEST + +**7,875 fires** of PC `0x824D6640`, all from a single thread: + +``` +i> 00023890 XThread0100001C (4) Stack: 700D0000-700F0000 +K> 0100001C XThread::Execute thid 4 (handle=0100001C, 'Audio Worker (0100001C)', native=467FC6C0, ) +i> 0100001C TRACE-PC-LR pc=824D6640 lr=BCBCBCBC r3=30063000 r4=00000000 r5=00000000 r6=00000000 r31=00000000 [first call: setup] +i> 0100001C TRACE-PC-LR pc=824D6640 lr=BCBCBCBC r3=30063000 r4=00000001 r5=00001800 r6=BDFBA600 r31=00000000 [×7874: per-frame] +``` + +LR is **always `0xBCBCBCBC`** = canary's host-thread stack-fill canary value (not a valid PPC PC). r3=`0x30063000` (driver-context ptr for index 0), r4=0|1 (init-vs-tick flag), r5=`0x1800` (frame size 6144 bytes = 1536 stereo s16 samples), r6=`0xBDFBA600` (the registered callback_arg). + +The thread is named **"Audio Worker"** with `native=467FC6C0` and is a ``-flagged kernel thread. + +## Mechanism — confirmed via canary source +Canary's `xe::apu::AudioSystem` (host C++) at `src/xenia/apu/audio_system.cc:84-95`: +```cpp +worker_thread_ = kernel::object_ref(new kernel::XHostThread( + kernel_state, 128 * 1024, 0, + [this]() { WorkerThreadMain(); return 0; }, + kernel_state->GetSystemProcess())); +worker_thread_->set_name("Audio Worker"); +worker_thread_->Create(); +``` + +`WorkerThreadMain()` (`audio_system.cc:100-159`): +1. Blocks on `WaitAny(wait_handles_, ...)` — array of per-client semaphores released by the AudioDriver backend each time samples are consumed. +2. On wake, reads `clients_[index].callback` and `wrapped_callback_arg`. +3. Calls `processor_->Execute(worker_thread_->thread_state(), client_callback, args, ...)` — invokes the registered callback **directly via the processor**, bypassing any guest call site. The PPC LR remains the host stack canary `0xBCBCBCBC`. + +The audio worker thread is **created host-side at `RegisterClient` time** and pumps the registered callback as the audio backend consumes samples. + +## Falsifies audit-031 hypothesis +Audit-031 asserted: thunk 0x824D6640 is "registered as vtable[7] callback at sub_824D2C08+0x374" and "sub_824D29F0's body is reached via per-frame guest dispatch". **Falsified**: the registration at sub_824D2C08+0x374 is the `addi r4, r10, 26176` that loads the PC `0x824D6640` into r4 as the **callback_ptr argument to XAudioRegisterRenderDriverClient** (caller's responsibility). Canary's log shows: +``` +d> F8000008 XAudioRegisterRenderDriverClient(701CF210(824D6640), BDFBA658(00000000)) +``` +where `701CF210[0] = 824D6640` is the callback PC and `701CF210[4] = BDFBA600` is the arg. + +The thunk 0x824D6640 → tail-jump to sub_824D29F0 → KeSetEvent(0x828A3254) is **invoked by canary's HOST-side AudioSystem worker thread**, not by guest engine code. + +## Our impl gap (confirmed by probe) +Our `XAudioRegisterRenderDriverClient` at `crates/xenia-kernel/src/exports.rs:2705-2745`: +- Correctly captures `callback_pc=0x824d6640`, `callback_arg=0x41e9dd5c`, allocates wrapped, returns `driver=0x41550000`. +- Trace: `XAudioRegisterRenderDriverClient: index=0 callback=0x824d6640 arg=0x41e9dd5c wrapped=0x4b9f0000 driver=0x41550000`. +- **No "Audio Worker" host thread is spawned** to pump the callback. +- **No semaphore-release loop** mirrors canary's `client_semaphore->Release(queued_frames_, nullptr)` followed by per-sample-consumption releases from the AudioDriver backend. + +Probe results (-n 500M, 50M-instr sanity): +- `--pc-probe=0x824D6640,0x824D29F0,0x824D2A00,0x824D2A20`: **0 CTOR-PROBE fires**. +- `--branch-probe=0x824D6640,0x824D29F0`: **0 BRANCH-PROBE fires**. +- tid=9 (audio-worker tid_9, entry=0x824D2878) parks at `pc=0x824D28D0` waiting on event `0x828A3254` — the wake source that sub_824D29F0 would set. Never woken. +- tid=10 parks at `pc=0x824D2990` waiting on semaphore `0x828A3230` (count=0/limit=6). Never released. + +## Bug class & cascade prediction (sharp) +**Class**: δ-α composite — host-side AudioSystem worker thread missing entirely. Specifically: +1. Our `XAudioRegisterRenderDriverClient` does not spawn a host thread that periodically invokes the registered callback PC. +2. Our `XAudioClient` registry has no audio-backend driver that releases the per-client semaphore on sample consumption. + +**Sharp prediction for fix session**: implement a host-driven audio pump per canary `apu/audio_system.cc:84-159`. Minimal viable: on first `RegisterClient`, spawn a host XHostThread that loops `[release client_sema; sleep(~10ms); guest_processor.execute(callback_pc, [callback_arg])]`. Cascade prediction: +- A: tid=9 leaves `Blocked(WaitAny [0x828A3254])` on the FIRST callback invocation that runs sub_824D29F0:`KeSetEvent(0x828A3254, 1, 0)`. +- B: tid=10 leaves `Blocked(WaitAny [0x828A3230])` on the next semaphore release inside sub_824D29F0. +- C: `XAudioSubmitRenderDriverFrame` count rises from 0 (currently canary-only export when running our impl) — guest's audio worker now feeds frames back. +- D: `KeReleaseSemaphore` non-zero (canary-only export landed). +- E: open question — does this unblock a **non-audio** consumer? Tid=10's parking on a semaphore at limit=6 (canary's `queued_frames_=6` initial-release) suggests NOT — limit=6 is audio-frame queue depth, isolated. So this fix may resolve audio path but not the broader audit-009 renderer cluster. + +The audio gate is NOT the renderer gate (revising audit-025's "audio gate IS the renderer gate" claim). They are **separate stalls** sharing only the symptom of "host pump missing". + +## Outcome classification +Per task brief outcomes: +- **δ confirmed**: audit-031's "registered as vtable[7] callback" inference is wrong; sub_824D29F0 is invoked via host-side `processor_->Execute`, not guest bcctrl through audio_system vtable. +- **α partial**: the "caller PC" we sought to walk up is **canary's host C++**, not guest code. There is no guest LR to walk; the divergence is entirely on the kernel-host boundary at `XAudioRegisterRenderDriverClient`. + +## Cross-validation note (Linux-derived, kernel-call-faithful per RECONCILE-A) +The capture is from PPC instruction execution (LR=0xBCBCBCBC observed at the thunk's first instruction) — kernel-call-level data, reliable per RECONCILE-A. The mechanism (host AudioSystem worker, processor_->Execute) is canary source code, not Linux-specific. + +## Reading-error ledger note +PCs 0x824D6640 and 0x824D29F0 both fall in **gaps in the IDA functions table** (0x824D6640 between sub_824D6570 and sub_824D6650; 0x824D29F0 between sub_824D23B0 and sub_824D2B08). Confirms audit-031's "second prologue inside claimed sub_824D23B0 range" reading-error pattern. No new ledger entry — this is the same boundary cluster. + +## Trace artifacts +- `audit-runs/audit-032-dispatcher-lr/canary-patch.diff` (saved diff before revert) +- `audit-runs/audit-032-dispatcher-lr/probe.{log,err}` (our impl, -n 500M) +- `audit-runs/audit-032-dispatcher-lr/probe-sanity.{log,err}` (our impl, -n 50M sanity) +- `audit-runs/audit-032-dispatcher-lr/branchprobe.{log,err}` (branch-probe verification) +- `/tmp/audit-032-canary.log` (canary capture, 35,942 lines, 7,875 LR fires) + +## Recommended next session +Implement host-side audio worker per canary `apu/audio_system.cc`. Estimated 60-120 LOC. Predicted to **unblock audio path** (tids 9, 10) and add canary-only kernel exports (KeReleaseSemaphore, possibly XAudioSubmitRenderDriverFrame). Won't fix the audit-009 renderer cluster (separate γ-class blocker). Critically: audit-025's strategic-pivot recommendation back to renderer cluster L1 callers REMAINS the priority for engine progression; the audio fix is **necessary cleanup of canary-only exports** but is unlikely to flip the swaps=2 draws=0 plateau alone. + +## Stop conditions honored +- No fix attempted. No xenia-rs source modified. No commit. Canary patch reverted. diff --git a/migration/claude-memory/project_xenia_rs_audit_033_ui_entry_chain_2026_05_08.md b/migration/claude-memory/project_xenia_rs_audit_033_ui_entry_chain_2026_05_08.md new file mode 100644 index 0000000..3ca17e4 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_033_ui_entry_chain_2026_05_08.md @@ -0,0 +1,96 @@ +# KRNBUG-AUDIT-033 — UI/save-game subsystem entry-chain divergence probe + +- Date: 2026-05-08 +- Mode: READ-ONLY (canary patch landed + reverted; no xenia-rs source change) +- Master: `9028021` unchanged +- Trace dir: `audit-runs/audit-033-ui-entry-chain/` + +## Goal +Identify which PC in the call-chain leading to the UI/save-game +subsystem cluster (`0x82285000-0x82294000`) is the first divergence +between canary and our impl. + +## Method +- Re-applied 30-LOC `--log_lr_on_pc` canary patch (audit-030 diff at + `audit-runs/audit-030-lr-trace/canary-patch.diff`). NOTE: build + must use `ninja -f build-Debug.ninja xenia_canary`, not the default + Checked variant (Checked has runtime code-cache allocation collision + that prevents boot reliably). +- Probed 8 PCs in canary, 50s wall each: + - Tier 1 (cluster externals): `0x8228A628`, `0x8228E138`, `0x8228E498`. + - Tier 2 (their callers): `0x82172524`, `0x82175810`, `0x8217EB78`. + - Tier 3 (CMessageBridge sites): `0x821A6CF0`, `0x821A8578`. +- xenia-rs `--pc-probe` of same 8 PCs at -n 500_000_000. + +## Results +### Canary (50s boot) +- `0x8228E138`: 2 fires, LR=`0x82172BF8` (in `sub_82172BA0`). +- `0x8228E498`: 28 fires, LRs=`0x82451E78` (in `sub_82451E20`), + `0x82174730` (in `sub_821746B0`). +- All 6 other PCs: 0 fires. + +### xenia-rs (-n 500M ≈ 8s guest) +- `0x8228E138`: 1 fire, LR=`0x82172BF8` (same caller as canary). +- `0x8228E498`: 62 fires, LR=`0x82451E78` (same caller as canary). +- All 6 other PCs: 0 fires. + +### Frame chain captured by our impl's CTOR-PROBE on 0x8228E498: +``` +frame=0 lr=0x82451e78 (sub_82451E20) +frame=1 lr=0x824508c4 (sub_82450720) +frame=2 lr=0x8245065c (sub_82450638) +frame=3 lr=0x821cb9a0 (sub_821CB968) +frame=4 lr=0x821cd54c (sub_821CD458) +frame=5 lr=0x821cbf30 (sub_821CBEA8) +frame=6 lr=0x821ceda0/0x821ceebc (sub_821CECF0) +frame=7 lr=0x821c49f8/0x821c504c (sub_821C4988) +``` + +## Reading +- **Both implementations enter the cluster** via 2 of 3 Tier-1 externals + with identical LRs. Audit prompt's hypothesis "canary reaches Tier 1 + but ours doesn't" is FALSIFIED for sub_8228E138/sub_8228E498. +- **Tier 2 + Tier 3 PCs are 0-fires in canary** at 50s boot — i.e. the + cluster's full activation isn't yet triggered even in canary. The + prompt's claim that CMessageBridge sites "were observed firing in + past audits" is correct, but those past audits ran for longer or at + later boot phases. +- **Frequency divergence on 0x8228E498**: ours 62× / 8s guest vs canary + 28× / 50s wall — ours appears to busy-loop the constructor-array + dispatch at sub_82451E20. Either (a) canary breaks out via a state + flag ours never sets, or (b) ours just runs faster through the + same loop and there are actually similar fire counts per guest-time. + Cannot disambiguate without canary's own cycle-counter. +- **Bug class**: γ (deeper-indirection / vtable-driven dispatch). M5 + static reachability is blind here per known limitation; M5.5 (this-flow + vptr resolution) is the prerequisite. + +## Falsifications (this session) +- Tier 1 PC `0x8228A628`: 0 fires in canary even at 50s — not a live + entry point in this boot phase. +- Tier 2 + Tier 3 callers (`0x82172524`, `0x82175810`, `0x8217EB78`, + `0x821A6CF0`, `0x821A8578`): all 0 fires in canary. + +## Discipline gate +| Box | Status | +|-----|--------| +| 1 — both-side probe data | PASS | +| 2 — canary fires Tier 1 | PARTIAL (2 of 3) | +| 3 — cross-impl LR mirror | PASS (LRs match exactly) | +| 4 — bug class assigned | γ — does not gate to fix | +| 5 — no fix this session | PASS | + +## Recommendation +- **Primary**: schedule M5.5 (this-flow vptr resolution) as next analyzer + milestone. Without it, top-down probing inside the cluster is blind. +- **Alternative A**: probe the frame-chain inside sub_82451E20 → walk + upstream to find the loop-exit gate (62 vs 28 fires). Probe + `0x82450720`, `0x82450638`, `0x821CB968` next. +- **Alternative B**: longer-horizon canary trace via Lutris Windows + build (5-10 min) to capture post-intro cluster activation. The 50s + Linux Debug envelope is too short for "press-A" boundary. + +## Cleanup +- Canary patch reverted; `cd xenia-canary && git status` clean. +- xenia-rs master HEAD `9028021` unchanged. +- No commit, no source modification. diff --git a/migration/claude-memory/project_xenia_rs_audit_034_frame_chain_divergence_2026_05_08.md b/migration/claude-memory/project_xenia_rs_audit_034_frame_chain_divergence_2026_05_08.md new file mode 100644 index 0000000..f400ef8 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_034_frame_chain_divergence_2026_05_08.md @@ -0,0 +1,216 @@ +# KRNBUG-AUDIT-034 — frame-chain divergence + post-intro Tier 2/3 horizon + +- Date: 2026-05-09 +- Mode: READ-ONLY (canary patch landed + reverted; no xenia-rs source change) +- Master: `9028021` unchanged; tests 640; lockstep instructions=100000003 +- Trace dir: `audit-runs/audit-034-frame-chain/` +- Subsystem: front-end UI / save-game / mission-select / HUD (per RAPID-SURVEY-Q4); NOT renderer + +## Phase A — Frame-chain divergence probe (50s canary, -n 500M ours) + +### Canary patch +Re-applied audit-030 30-LOC `--log_lr_on_pc` patch (4 files: x64_emitter.cc, +cpu_flags.cc, cpu_flags.h, ppc_hir_builder.cc). Linux Debug build via +`ninja -f build-Debug.ninja xenia_canary`. Patch reverted at session close; +canary `git status` clean. + +### Firing-rate matrix (raw counts, 50s canary wall vs ours -n 500M = 24.05s wall) + +| Level | PC | canary 50s | ours -n 500M (24s wall) | +|-------|----|-----------:|------------------------:| +| L0 (chain top) | sub_821C4988 | 1 | 1 | +| L1 | sub_821CECF0 | 2 | 2 | +| L2 | sub_821CBEA8 | 7 | 7 | +| L3 | sub_821CD458 | 7 | 7 | +| L4 | sub_821CB968 | 14 | 14 | +| L5 | sub_82450638 | 14 | 14 | +| L6 | sub_82450720 | 24 | 16 | +| L7 | sub_82451E20 | 90 | 80 | + +**Caveat — wall-time normalization**: ours wall_ms=24049 for 500M instructions +(per `exec complete` line in ours.err); canary 50s wall does an unknown number +of guest instructions (Debug build is ~5-10× slower per wall-second). Per-wall +divergences are distorted by this slowdown. **Per-call ratios (downstream/upstream) +ARE timing-independent** and that's where the real signal lives. + +### Per-call ratios (timing-independent) + +| Edge (caller→callee) | canary | ours | +|----------------------|-------:|-----:| +| sub_821C4988 → sub_821CECF0 | 2.00 | 2.00 | +| sub_821CECF0 → sub_821CBEA8 | 3.50 | 3.50 | +| sub_821CBEA8 → sub_821CD458 | 1.00 | 1.00 | +| sub_821CD458 → sub_821CB968 | 2.00 | 2.00 | +| sub_821CB968 → sub_82450638 | 1.00 | 1.00 | +| sub_82450638 → sub_82450720 | 24/14=**1.71** | 16/14=**1.14** | +| sub_82450720 → sub_82451E20 | 90/24=**3.75** | 80/16=**5.00** | + +(Single-LR sites everywhere; chain identity verified end-to-end.) + +### Reading +- **L0..L5 chain shape is IDENTICAL** between canary and ours + (single-LR per site, identical caller-to-callee multipliers). This + falsifies any "chain-shape divergence" hypothesis — both implementations + walk the exact same path from sub_821C4988 down to sub_82450638. +- **L5 → L6 is the first divergence point** (sub_82450638 → sub_82450720 + multiplier): canary 1.71, ours 1.14. sub_82450638 has TWO call sites of + sub_82450720 (LR=0x8245065c first, 0x824506cc second). Ours probe shows + 14×LR=0x8245065c + 2×LR=0x824506cc = 16. Canary likely 14+10. **Canary + takes the second call path more often** — that path runs after the first + call returns SUCCESS and validates a state condition. Ours typically + fails the gate that triggers the second call. +- **L6 → L7 is the second divergence point** (sub_82450720's 5-iter inner + loop): canary 3.75 avg iters, ours 5.00 (always exhausts). The early-exit + predicate at PC 0x82450904 (`bne 0x8245092C`) is FAILING TO FIRE in ours. + +### Loop-exit-divergence PC (Phase A4) +**The diverging loop is sub_82450720+0x160..+0x1F4 (PC 0x82450880..0x82450914).** + +Loop body: +``` +0x82450880: li r25, 0 # iter counter +0x82450890: lwz r11, 0(r30) # load slot[iter] hi -- r30=r26+108+iter*20 +0x824508a0: addi r4, r31, 104 # build cur-key local +0x824508a8: add r9, r9, r11 # cur_sum = lo+hi +0x824508c0: bl sub_82451E20 # find/replace +0x824508c4: lwz r11, 4(r30); lwz r10, 0(r30) +0x824508d8: add r11, r11+r10 # expected sum +0x824508dc: cmpl r29 == r30-12 ? # check sub_82451E20 wrote correct cur-key ptr +0x824508e4: cmpl r28 == sum # check sub_82451E20 wrote correct sum +0x824508f0..0x82450900: bit-extract via cntlzw to {0,1} +0x82450904: bne 0x8245092C # FAILURE -> exit (insert path) +0x82450908: addi r25, 1 # SUCCESS -> next iter +0x8245090c: addi r30, r30, 20 # advance slot ptr +0x82450910: cmpli r25, 5 +0x82450914: blt 0x82450890 # loop while r25<5 +``` + +**The exit predicate at 0x82450904 is `bne` on the result of two equality +checks against sub_82451E20's output (r29=[r31+112], r28=[r31+116])**: +- `[sub_82451E20_out+0] == r30-12` (= r26+96+iter*20, the cur-slot stem) +- `[sub_82451E20_out+4] == [r30+0]+[r30+4]` (the slot's lo+hi sum) + +**Sub-predicate**: sub_82451E20's exit-loop path picks (r27, r30_local): +- inner loop at 0x82451e60..0x82451eb0 exits via 0x82451e90 (`bne` after + `cntlzw`/`extrwi` of `r28 - [sub_8228E498(working_key)+0][32]`) +- exit when `r28 == [r3+0][32]` (via Tier-1 cluster member sub_8228E498) +- writes (r27, accumulated-r30) back to caller's r3 = r31+112 + +**The exit predicate's data source is the per-iter slot at r26+108+iter*20** +(r26 = sub_82450720 arg1, an "object" pointer; offset 108..207 = a 5×20-byte +table). Whether the predicate fires depends on whether the slot's +(stem, sum) pair matches what sub_82451E20 produces — which itself depends +on the value at `[sub_8228E498(slot_key)+0][32]` (cluster sub-table data). + +### Diagnostic asymmetry — slot-position arithmetic + +**sub_82450720 return semantics**: +- early-exit via 0x82450904 → return r3=1 (success) +- exhaust 5 iters via 0x82450914 fall-through → return r3=0 (failure) + +**sub_82450638 cascade**: +- first call (LR=0x8245065c): if returned 1 → skip second call, return 1 directly +- if returned 0 → invoke second call (LR=0x824506cc), return its result + +**LR distribution observations**: +- canary 0x82450720: 14×first + 10×second = 24 (10 first-calls returned 0) +- ours 0x82450720: 14×first + 2×second = 16 (2 first-calls returned 0) + +**Per-call iteration math**: +- canary: 14 first-calls produce ~4 returning 1 + ~10 returning 0; iters + = 4×(k_avg+1) + 10×5 = 90 - 50 = 40 / 4 = avg k+1 = 2.86 → + **canary's early-exit fires at avg iter k≈1.86** (slot 1-2 of 5). +- ours: 14 first + 2 second = 16; 12 first-calls return 1, 4 return 0; + iters = 12×(k+1) + 4×5 = 80 → 12(k+1) = 60 → **k_avg=4 in ours + (last iter of 5)**. + +**Key inversion**: ours's 12 successful early-exits all fire at the LAST +iter (index 4) — the 5th slot of 5. canary's 4 successful early-exits +fire at iter 1-2. **The matching slot is at a DIFFERENT POSITION** in +the 5-element table between canary and ours, OR the slot population +order is reversed, OR the search-key hashes to a different slot. + +**Bug class**: β-class data-state divergence at slot table r26+108..207 +(20-byte stride × 5 slots). The 5-loop always finds a match in BOTH +implementations (predicate is satisfiable both ways), but at very +different positions. This indicates the table CONTENTS differ even +though the lookup logic is identical. + +## Phase B — post-intro Tier 2 + Tier 3 horizon (300s canary) + +### Probe set +- Tier 2 callers: 0x82172524, 0x82175810, 0x8217EB78 +- Tier 3 CMessageBridge: 0x821A6CF0, 0x821A8578 + +### Result — ALL 5 = 0 FIRES AT 300s + +| PC | tier | canary 300s wall | +|----|------|----:| +| 0x82172524 | T2 | 0 | +| 0x82175810 | T2 | 0 | +| 0x8217EB78 | T2 | 0 | +| 0x821A6CF0 | T3 | 0 | +| 0x821A8578 | T3 | 0 | + +### Reading +- **Cluster activation is gated even deeper** than this 5-min Linux Debug + canary trace can reach. RECONCILE-A confirms Linux Debug canary trajectory + is identical to Lutris Windows up to frame 42 (vs 72/186 on Lutris); the + 300s probe windows we see correspond to early-boot pre-intro behavior + only. +- Audit-033's framing stands: cluster activation is NOT triggered by the + intro→main-menu transition that canary's Linux Debug build reaches. May + need either: + - A Lutris Windows build trace (gets further into intro/menu) for these + PCs, OR + - Probing UPSTREAM of these PCs to find what would call them, OR + - A non-time-based trigger (e.g., specific kernel API or memory state). + +## Bug class + +**β-class (data-state divergence) with γ-deep entry**: +- Entry to chain (sub_821C4988) is via vtable / indirect dispatch (0 static + call xrefs — γ-deep). +- 6.3× upstream divergence in entry frequency from sub_821C4988 onwards. +- 5-loop early-exit predicate in sub_82450720 reads slot-table data at + r26+108..207 and never matches in ours — full 5-iter scan vs canary's + ~3.75-iter avg. +- The Tier-1 cluster external sub_8228E498 is invoked from sub_82451E20's + inner loop: it returns `(table[hash])+offset`, the table index/value + used for the predicate. + +## Sharp next-session prediction + +The exit predicate's data source is **the 5×20-byte slot table at r26+108** +where r26 = sub_82450720 arg1 = sub_82450638 arg1 = sub_821CB968 arg1 (a +container struct). The slots' (stem, sum) values must match sub_82451E20's +output to trigger early exit. + +**AUDIT-035 candidate**: `--mem-watch r26+108..207` for one captured r26 +value (capture via `--pc-probe=0x82450720` extended to log r4) to see what +canary writes there during boot vs ours. If canary writes non-zero +slot data and ours has zeros, AUDIT-035 names the writer. + +Alternative: probe sub_8228E498 directly (already done in audit-033) — its +output `[r3+0][32]` value is what sub_82451E20 compares against. + +## Discipline gate + +- 1 milestone declared: AUDIT-034. ✅ +- 1 outcome captured: β-data-divergence at sub_82450720's 5-loop. ✅ +- 1 sharp next-session prediction: mem-watch r26+108..207. ✅ +- ≤2 hours wall: ~17 min Phase A + ~25 min Phase B. ✅ +- No xenia-rs source mods, no xenia-rs commit. ✅ + +## Files + +- `audit-runs/audit-034-frame-chain/canary-0x*.log` — 8 canary 50s logs + 1 + 300s log (preserved as `.300s.log`) + 5 Phase B 300s logs +- `audit-runs/audit-034-frame-chain/ours.log` — single ours -n 500M probe + with 8 PCs +- `audit-runs/audit-034-frame-chain/scripts/probe-canary*.sh` — driver + +## Master + +xenia-rs HEAD `9028021` unchanged. Tests 640. Lockstep instructions=100000003. +Canary patch reverted; canary `git status` clean. diff --git a/migration/claude-memory/project_xenia_rs_audit_035_slot_table_2026_05_08.md b/migration/claude-memory/project_xenia_rs_audit_035_slot_table_2026_05_08.md new file mode 100644 index 0000000..b976176 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_035_slot_table_2026_05_08.md @@ -0,0 +1,111 @@ +# KRNBUG-AUDIT-035 — slot table byte-level diff at sub_82450720 entry + +- Date: 2026-05-09 +- Mode: READ-ONLY (canary patch landed + reverted; no xenia-rs source change) +- Master: `9028021` unchanged; tests 640; lockstep instructions=100000003 +- Trace dir: `audit-runs/audit-035-slot-table/` +- Subsystem: front-end UI / save-game / mission-select / HUD (per RAPID-SURVEY-Q4) + +## Disasm verification — slot table at r26+108 (5×20=100 bytes) + +Confirmed via sylpheed.db query of PC 0x82450720..0x82450918: +- 0x8245088c `addi r30, r26, 108` — slot pointer init +- 0x82450890 `lwz r11, 0(r30)` — load [slot+0] +- 0x82450898 `lwz r9, 4(r30)` — load [slot+4] +- 0x8245090c `addi r30, r30, 20` — advance by 20 bytes +- 0x82450910 `cmpli r25, 0x5` / `blt 0x82450890` — bounded by 5 iters + +Audit-034's offset (108) and stride (20) confirmed. + +## Canary patch — extension to log r26 + 100-byte dump + +Re-applied audit-030 30-LOC `--log_lr_on_pc` patch + extended TrapLogLR to also log r26 and dump 5×20-byte slot table from r3+108 (r3 == r26 after the function's `mr r26, r3` prologue, which has not yet run at PC 0x82450720). Total +49 LOC across 4 files (well under the 80-LOC budget). Build: `ninja -f build-Debug.ninja xenia_canary` succeeded. Patch reverted at session close; canary `git status` clean. + +## Captured r26 / r3 / slot table + +Both runtimes: +- r3 (= r26 after prologue) = **0x828F3B68** at sub_82450720 entry +- slot table base = r3 + 108 = **0x828F3BD4..0x828F3C37** (100 bytes, 5 slots × 20) +- 22 entries captured in canary (~30s wall); ours dump captured at 50M and 500M instructions (identical) + +## Final-state slot tables (5 dwords per slot, big-endian) + +| Slot | addr | Canary (last entry) | Ours (-n 500M) | +|------|------|---------------------|----------------| +| 0 | 0x828F3BD4 | `00000000 00000000 00000000 00000000 00000000` | `00000000 00000000 00000000 00000000 00000000` | +| 1 | 0x828F3BE8 | `00000000 00000000 00000000 BC3654C0 00000008` | `00000000 00000000 00000000 4024A240 00000008` | +| 2 | 0x828F3BFC | `00000000 00000000 00000000 BC366080 00000008` | `00000000 00000000 00000000 4024AEE0 00000008` | +| 3 | 0x828F3C10 | `00000002 00000005 00000000 00000000 00000000` | `00000000 00000000 00000000 00000000 00000000` | +| 4 | 0x828F3C24 | `00000000 00000000 00000000 BC365520 00000008` | `00000000 00000000 00000000 4024A300 00000008` | + +## Byte-level diff + +**Match**: slot-shape identical in 4 of 5 slots (slot 0 zero, slots 1/2/4 each hold pointer+size 8). Pointers diverge by heap region: canary `BC3xxxxx` (physical heap), ours `4024xxxx` (v40 bump heap) — same heap-region divergence noted in audit-027/029. The pointed-to objects in ours are vtable-headed (first dword 0x40111860 / 0x401118A0) — valid v40 objects, same shape as canary's physical objects. + +**Divergence — slot 3 only**: +- canary `[+0]=0x00000002, [+4]=0x00000005` (counter pair) +- ours `[+0]=0, [+4]=0` + +Slot 3 evolves over time in canary: `(0,0) → (0,1) → (0,3) → (0,4) → (1,4) → (1,5) → (2,5)`. [+0] = monotonic push-count, [+4] = current size. **Not a bytes-missing problem**; the slots ARE managed identically — but the queue's state at any given probe instant differs. + +## Writer identification (mem-watch on 0x828F3C10..0x828F3C20 in ours) + +1066 writes captured. Writer PCs: +- `0x82450c08, 0x82450c40, 0x82450c4c, 0x82450c3c` — within `sub_82450bc4` (the L5 sub_82450638 inner caller, push/pop on slot 3 counters) +- `0x822f8b20` — counter increment, LR back to `0x822f8a78` / `0x822f8b00` +- `0x82323364` — index update, LR `0x82323344` / `0x823232c8` +- `0x8231eee8` — initialization (one-shot at boot) + +Ours's slot 3 [+4] cycles: 0→1→2→3→...→0xB→...→0 (push and pop). 165 occurrences of value 1, 160 of value 2, 158 of value 0xA, 109 of 0xB — much higher peak than canary's 5. **Ours is push-popping items at higher rate**; canary's queue accumulates more slowly. + +## Reading + +The slot table populates IDENTICALLY in shape across both runtimes. The `(stem, sum)` predicate at PC 0x82450904 (`bne 0x8245092C`) compares sub_82451E20's output against `(slot_addr-12, [slot+0]+[slot+4])`. Because: +- Canary slot 1's `(stem=0x828F3BDC, sum=0xBC3654C8)` matches early when sub_82451E20 returns the right pair → early-exit at iter 1 +- Ours slot 1's `(stem=0x828F3BDC, sum=0x4024A248)` cannot match sub_82451E20's output if sub_82451E20's hash/lookup walks via `[sub_8228E498(working_key)+0][32]` and that table is populated with **physical-heap pointers** (canary side) vs **v40 pointers** (our side) + +The stem (slot_addr - 12) is identical; the sum is heap-region-dependent. **The lookup table queried by sub_82451E20 + sub_8228E498 contains canary-style physical-heap pointers**, but ours's slot table contains v40 pointers. So the cross-table comparison fails on EVERY iter for ours, until the FALLBACK slot 4 (which seems to be a self-referential default) matches. + +This connects directly to **audit-027/029's finding**: our impl's mm_allocate_physical_memory_ex folds into the v40 bump allocator instead of allocating in a separate physical heap. The objects in physical heap on canary live at `BC3xxxxx`; ours at `4024xxxx`. The slot table records the address the producer wrote, but the lookup table sub_82451E20 walks (via sub_8228E498's vptr[32] table) is populated with canary-physical addresses on canary, v40 addresses on ours. The hash/index from those tables is then compared to slot data populated from a different allocator. + +**Bug class: ε — heap-region-mismatch propagating through dual-data-structure consistency check.** Canary's physical-heap allocator places these specific objects at predictable addresses that match the lookup table's entries; ours's v40 fold-down places them at different addresses, causing per-element inconsistency. + +## Sharp 4-dim cascade prediction + +A: implement physical-heap separation in nt_allocate_virtual_memory / mm_allocate_physical_memory_ex (per CPPBUG-AUDIT-001 backlog). The targets at 0xBC3xxxxx range need to be allocated in a separate region. +B: with physical heap separated, sub_8228E498's vptr-table contains 0xBC3xxxxx pointers AND slot-table writers at sub_82450bc4 push 0xBC3xxxxx pointers — same heap region. +C: predicate at 0x82450904 matches at iter 1-2 → early-exit, sub_82450720 returns r3=1. +D: sub_82450638's second call (LR=0x824506cc) frequency normalizes to ~10× per L5 entry (canary's rate) → frame-chain divergence closes; cluster activation MAY clear (`draws > 0` cascade UNKNOWN until B-C observed). + +**Risk**: ε might be only ONE of multiple heap-region divergences across the cluster. Audit-027/029 already eliminated v80 + v40 + physical heaps as the dispatch-table source for the renderer plateau, but slot/vtable cross-references may still pin to specific heaps. + +## Falsification + +Audit-034's hypothesis of "different positions in the 5-slot table" — falsified. The matching slots are at the SAME indices (1, 2, 4 are populated identically in shape). The mismatch is in the VALUE at the slot, not its position. + +## Discipline gate + +- 1 milestone declared: AUDIT-035. PASS +- 1 outcome captured: ε-class heap-region mismatch at slot data + lookup table. PASS +- 1 sharp next-session prediction: A (implement physical-heap separation) → B → C → D. PASS +- ≤2 hours wall: ~30 min. PASS +- No xenia-rs source mods, no xenia-rs commit. PASS +- Canary patch reverted, git status clean. PASS + +## Files + +- `audit-runs/audit-035-slot-table/canary-0x82450720.log` — initial trace (r26 logged from caller, table addr offset) +- `audit-runs/audit-035-slot-table/canary-0x82450720-fix.log` — corrected trace using r3+108 (132 lines, 22 entries) +- `audit-runs/audit-035-slot-table/ours-lrtrace.jsonl` — ours's lr-trace (16 entries, r3=0x828F3B68 confirmed) +- `audit-runs/audit-035-slot-table/ours-dump-stdout.log` — ours --dump-addr output (slot table at end-of-run) +- `audit-runs/audit-035-slot-table/ours-memwatch-slot3.log` — 1066 writers to slot 3 (PC + LR + value) + +## Master + +xenia-rs HEAD `9028021` unchanged. Tests 640. Lockstep instructions=100000003. Canary `git status` clean. + +## Recommended AUDIT-036 directions + +1. **Land physical-heap separation** (CPPBUG-AUDIT-001 nt_allocate_virtual_memory, mm_allocate_physical_memory_ex). Test against lockstep digest. Probe: re-run AUDIT-035 trace with separated heaps; expect slot 1/2/4 pointers to land at 0xBC3xxxxx, matching canary; predicate to early-exit at iter 1-2; L5→L6 multiplier normalize from 1.14→1.71. +2. **Or, walk further upstream**: identify the producer that POPULATES the lookup table sub_8228E498 reads from. If that producer reads from a DIFFERENT allocator state than the slot-writer (sub_82450bc4), the bug may live there. +3. **Cross-validate**: probe sub_8228E498 (the Tier-1 cluster external) in BOTH runtimes — its r3 (input working_key) and `[r3+0][32]` (returned table value) tell whether the lookup table itself diverges. If ours's table value is `0x4024xxxx` and canary's is `0xBC3xxxxx`, that confirms the heap-region cross-reference hypothesis. diff --git a/migration/claude-memory/project_xenia_rs_audit_036_vptr_deref_2026_05_09.md b/migration/claude-memory/project_xenia_rs_audit_036_vptr_deref_2026_05_09.md new file mode 100644 index 0000000..c9e3cfa --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_036_vptr_deref_2026_05_09.md @@ -0,0 +1,135 @@ +# KRNBUG-AUDIT-036 — direct hypothesis test of `[[r3+0]+32]` predicate at sub_8228E498 + +- Date: 2026-05-09 +- Mode: READ-ONLY (canary patch landed + reverted; no xenia-rs source change) +- Master: `9028021936e7bddada0c911acc7b61f04fee3b9d` unchanged; tests 640; lockstep instructions=100000003 +- Trace dir: `audit-runs/audit-036-vptr-deref/` +- Subsystem: front-end UI / save-game / mission-select / HUD (audit-009 cluster, RAPID-SURVEY-Q4) +- Verdict: **REFUTED-AS-STATED, but a STRONGER divergence found** + +## Disasm verification — what `[[r3+0]+32]` actually is + +`sub_8228E498` is NOT a vtable[8] dispatcher. Disasm at PC 0x8228E498..0x8228E4CC: +``` +8228E498 lwz r9, 0(r3) ; r9 = [r3+0] = header* (NOT vptr) +8228E49C lwz r10, 4(r3) ; r10 = [r3+4] = packed (chunk_idx, sub_idx) +8228E4A0 srwi r11, r10, 2 ; r11 = chunk_idx +8228E4A4 clrlwi r10, r10, 30 ; r10 = sub_idx (low 2 bits) +8228E4A8 lwz r8, 8(r9) ; r8 = [header+8] = chunk_count +8228E4AC..B4 ; chunk_idx = chunk_idx % chunk_count +8228E4B8 lwz r9, 4(r9) ; r9 = [header+4] = segment_table_base +8228E4BC..C0 ; r11 *= 4, r10 *= 4 +8228E4C4 lwzx r11, r9, r11 ; r11 = segment_table[chunk_idx] +8228E4C8 add r3, r11, r10 ; r3 = chunk_ptr + sub_offset +8228E4CC blr ; return r3 = element_address +``` +This is a **deque/segmented-array iterator dereference** — returns the address of an element. Zero call-xrefs to vtables. + +Caller `sub_82451E20` at LR PC `0x82451E78`: +``` +82451E70 addi r3, r1, 80 ; r3 = stack-resident iterator (pair of begin/end) +82451E74 bl 0x8228E498 ; r3 = element_ptr +82451E78 lwz r11, 0(r3) ; r11 = [element_ptr+0] <-- THE VALUE OF INTEREST +82451E7C lwz r11, 32(r11) ; r11 = [[element_ptr+0]+32] +82451E80 sub r11, r28, r11 ; r28 = caller's r6 (3rd arg, search-key); compare +82451E84..8C ; cntlzw + extrwi → "is_zero" predicate +82451E90 bne cr6, exit ; exit loop on match +``` +So the audited expression `[[r3+0]+32]` is read AFTER sub_8228E498 returns, on its return value (r3 = element_ptr). The predicate is `r28 == [[element_ptr+0]+32]`. + +## Canary patch — 49 LOC, reverted at session close + +Re-applied audit-030 base patch (30 LOC). Extended `TrapLogLR` in x64_emitter.cc to additionally log: r3, r28, then `[r3+0]` (= "key"), then 64 bytes at the key (16 u32 lanes + ASCII). Total ≈49 LOC across 4 files. Build via `ninja -f build-Debug.ninja xenia_canary`. Reverted via `git checkout -- src/`; canary `git status` clean post-session. + +Probed PC `0x82451E78` (the LR return point) for ~30s; sub_82451E20 fires 36 times (108 TRACE-PC- lines = 1 LR + 1 EL + 1 KEY + 1 KEY-ASCII × ~36). Single thread `F8000098`. + +## Captured values — direct hypothesis test + +### Canary (LR=0x82451E78, ~36 fires) + +Element pointers (r3) returned by sub_8228E498: +`0xBC22CA20`, `0xBC22CA24` — physical heap, contiguous. Stride 4 bytes. + +`[r3+0]` (= key pointer) — 6 distinct values: +`0xBC65D018, 0xBC65D140, 0xBC65D1C0, 0xBC65D240, 0xBC65D340, 0xBC65D400, 0xBC65D540` — all in physical heap `0xBC65xxxx`. + +Key struct layout (16 u32 lanes from one fire, key=`0xBC65D1C0`): +``` +F80000B8 00000000 00000000 00000003 00000000 00000000 00000000 00000000 +BC65D018 BC65D140 00000000 BC65D034 00000000 00000000 00000001 00000000 +^idx0=hdr-handle ^idx8 = [+32] = HYPOTHESIS TARGET +``` +ASCII: `'.................................e...e.@.....e.4................'` (only `'e'`=0x65 from 0xBC65 pointers visible — pure pointer-bearing struct, no inline string). + +**Canary's `[[r3+0]+32]` = 0xBC65D018 / 0xBC65D2D8 / 0xBC65CFD8 / 0xBC65D118 / 0xBC65D198 / 0xBC65D398 — all phys-heap pointers, range 0xBC65xxxx.** + +### Ours (PC=0x8228E498 + dump-addr at returned r3, ~62 fires) + +Element pointers (r3) returned by sub_8228E498: +`0x401119B0, 0x401119B4, 0x401119B8, 0x401119BC` — v40 bump heap, stride 4 bytes (same shape as canary). + +`[r3+0]` (= key pointer) — values `0x40542300, 0x40542340, 0x40542400, 0x405424C0` (and others) — all in v40 bump heap `0x4054xxxx`. + +Key struct layout (64 bytes at 0x40542300, big-endian): +``` ++0x00: 67 61 6d 65 3a 5c 68 69 64 64 65 6e 5c 52 65 73 "game:\hidden\Res" ++0x10: 6f 75 72 63 65 33 44 5c 43 6f 6d 6d 6f 6e 2e 78 "ource3D\Common.x" ++0x20: 70 72 00 5c 93 9a 9d cc 69 d8 e4 5c 97 3a 5c 0a "pr.\....i..\.:\." ++0x30: aa b2 16 c3 5e e7 0e 0a 69 d8 e4 5c c2 95 ea d8 ....^...i..\.... +``` +The "key" record at 0x40542300 is a struct beginning with the **inline filename string** `"game:\hidden\Resource3D\Common.xpr\0"` followed by 8-byte timestamps and other fields. + +**Ours's `[[r3+0]+32]` = 0x7072005C** (the four bytes `"pr\0\\"` from mid-string at offset 0x20). + +### Side-by-side + +| Quantity | Canary | Ours | +|----------|--------|------| +| element_ptr (r3 returned) | `0xBC22CA20+` (physical heap) | `0x401119B0+` (v40 bump heap) | +| `[r3+0]` (key ptr) | `0xBC65D1C0` (phys-heap struct) | `0x40542300` (v40 inline-string struct) | +| `[[r3+0]+32]` ★ | `0xBC65D018` (phys-heap pointer) | `0x7072005C` (mid-string text "pr\0\\") | +| r28 (search key) | `0x7064FB18` (stack ptr) | `0x715978D0` (stack ptr) | +| match? | possible (same address space) | impossible (text never == stack ptr) | + +## Verdict — hypothesis REFUTED-AS-STATED, deeper divergence found + +**As-stated**: audit-035's hypothesis "ours's `[[r3+0]+32]` is a `0x4024xxxx` pointer; canary's is `0xBC3xxxxx`" is REFUTED. Neither value is a heap-region pointer; canary's is `0xBC65xxxx` (physical heap, but a different sub-range from audit-035's slot pointers `0xBC36xxxx`), and ours's is `0x7072005C` (literal filename text bytes). + +**Stronger finding**: the records held by the container differ in *layout*, not just heap region. Canary's record at `[r3+0]` is a 16-dword pointer-bearing struct (handle at +0, sub-pointers at +32/+36/+44). Ours's record at `[r3+0]` is a struct that begins with a 33-byte inline filename string, so offset 32 falls inside the string text. The predicate `r28 == [[r3+0]+32]` therefore COMPARES STACK POINTERS against TEXT BYTES in ours — a comparison that can never succeed regardless of what r28 holds. + +This is **bug class η — record-layout divergence (NEW class beyond audit-035's "heap region" axis)**. The container itself (the deque walked by sub_8228E498) is shaped identically in both runtimes — same stride, same iterator semantics. The difference is in the *populator* that filled each record: canary's writes a pointer-table struct; ours writes an inline-string struct. + +Audit-035's "heap-region cross-reference hypothesis" — the slot-table predicate fails because the heap regions differ — is **partially confirmed at the surface** (slot pointers DO differ by region: BC vs 4024) but **the actual predicate failure mechanism is different**. The match fails not because pointers can't cross-reference between heap regions (they can — both addresses are valid in their respective spaces) but because **the structs at those addresses encode different fields at offset 32**. + +## Recommendation — DO NOT proceed with physical-heap separation fix + +Audit-037 should NOT be the physical-heap split (CPPBUG-AUDIT-001). That fix was motivated by audit-035's heap-region narrative. Today's data shows the actual divergence is in the *content/layout* of records, so even after splitting heaps, ours's records would still hold inline strings while canary's hold pointer structs — predicate would still fail. + +**Recommended audit-037**: identify the record populator(s) that build the container at element-pointers `0x401119B0+` (ours) vs `0xBC22CA20+` (canary). The populator writes the struct at `[r3+0]`. Likely path: mem-watch on a representative ours record (e.g. `0x40542300+0x20`) to find the writer PC and LR, then disasm the writer's caller chain and compare to canary's equivalent record-construction site. The two populators should diverge at a static-init or resource-loader function — that divergence is the audit-037 bug, and likely a much smaller fix than physical-heap separation. + +Sharp 4-dim cascade prediction (post-fix): +- A: ours's `[0x40542300+0x20]` = a phys-style pointer (matches canary's record-shape) +- B: predicate `r28 == [[r3+0]+32]` matches at least once during boot +- C: sub_82451E20 inner loop exits via `bne cr6, 0x82451EB4` (taken), not via `cmplwi r11, 0x0 / beq` end-of-iteration +- D: cluster `0x82285000-0x82294000` external-entry probes (audit-033) show new fires — start of full activation + +## Tooling status + +- `--pc-probe` / `--branch-probe` work at function-entry PCs but NOT at mid-flow `lwz`/`addi` etc — those JIT instructions don't get instrumented. Used `--dump-addr` post-hoc to read returned-r3 records (stable v40-heap, not transient stack). +- Canary patch (49 LOC) successfully extended TrapLogLR to read GUEST MEMORY via `memory->TranslateVirtual*>(...)`. Pattern reusable for future canary-side struct probes. +- All probe scripts read-only; no source changes; no commits. xenia-rs HEAD `9028021` unchanged at session close. + +## Files + +- `audit-runs/audit-036-vptr-deref/canary.log` — initial 30s canary at PC=0x8228E498 (segment-array deque dump) +- `audit-runs/audit-036-vptr-deref/canary-callsite.log` — extended canary at PC=0x82451E78 (key + 64-byte struct dump) +- `audit-runs/audit-036-vptr-deref/ours.log` / `ours-dump.log` — pc-probe + dump-addr at stack r3 +- `audit-runs/audit-036-vptr-deref/ours-exit.log` — branch-probe at 0x82451E78 (returned r3 = 0x401119B0+) +- `audit-runs/audit-036-vptr-deref/ours-final.log` — dump-addr at element pointers + their key targets + +Discipline gate (5/5): +1. Hypothesis explicitly tested with sharp pre-prediction: PASS +2. Canary patch reverted at session close, `git status` clean: PASS +3. xenia-rs source unmodified, no commit: PASS +4. Single-step (validation only, no fix attempt): PASS +5. Trace files saved per audit dir convention: PASS diff --git a/migration/claude-memory/project_xenia_rs_audit_039_track_1_verify_2026_05_09.md b/migration/claude-memory/project_xenia_rs_audit_039_track_1_verify_2026_05_09.md new file mode 100644 index 0000000..fe945e2 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_039_track_1_verify_2026_05_09.md @@ -0,0 +1,94 @@ +# AUDIT-039 Track 1: Cache-Fix Record-Layout Verification (2026-05-09) + +**Status**: READ-ONLY — pure diagnostic verification of cascade dimension A from audit-038. No source mods, no commit. +**Master HEAD**: `d8766c6` (post audit-038 cache fix). Tests 645. Lockstep instructions=100000004 deterministic ×3. +**Predecessor**: audit-037 (record populator, pre-fix) + audit-038 (cache fix). +**Sister session**: Track 2 — extended-horizon canary trace for cluster activation (parallel, untouched). + +## Question + +Did audit-038's cache fix flip the record layout at v40 0x40542xxx from **inline-string** (pre-fix shape, captured audit-037) to **canary-shape pointer-bearing** (handle@+0=0xF80000B8, sub-pointers@+32/+36/+44)? + +## Method + +1. Probe `sub_8228E498` (deque iterator deref) at -n 500M to capture live record-base addresses. **Result: 0 fires** — silenced by audit-038's cache fix (sub_8228E498 is downstream of the cache-miss path that is now fully short-circuited). +2. Fallback: dump audit-037 record bases directly via `--dump-addr=0x40542300,0x40542340,0x40542400,0x405424C0`. Plus extended-range dump 0x40542100,0x40542200,0x40542500,0x40542600,0x40542700,0x40542800. +3. Cross-reference canary record shape from audit-037's canary probe of `0x82450b68` (records on canary heap at `BC65xxxx`). + +## Observed Record Bases (post-fix, master d8766c6) + +All 4 audit-037 record bases are still allocated and contain identical or near-identical data to pre-fix: + +### 0x40542300 — IDENTICAL to pre-fix +``` ++0x00: 67 61 6d 65 3a 5c 68 69 64 64 65 6e 5c 52 65 73 | "game:\hidden\Res ++0x10: 6f 75 72 63 65 33 44 5c 43 6f 6d 6d 6f 6e 2e 78 | ource3D\Common.x ++0x20: 70 72 00 5c 93 9a 9d cc 69 d8 e4 5c 97 3a 5c 0a | pr.\..i...\.:\. ++0x30: aa b2 16 c3 5e e7 0e 0a 69 d8 e4 5c c2 95 ea d8 | ++0x40: 40 54 28 80 00 00 00 00 00 00 00 02 00 00 00 03 | +... +``` +**+0x20 dword = 0x7072005C ("pr\0\\")**, IDENTICAL to audit-037 pre-fix. + +### 0x40542340 — descriptor-shape (header) +``` ++0x00: 40 54 28 80 ... | be32=40542880 (ptr to next record) ++0x40: 40 54 1f 40 40 54 1f 40 64 64 65 6e 40 54 23 00 | "...dden@T#." ++0x50: 6f 75 72 63 65 33 44 5c 43 6f 6d 6d 00 00 00 22 | "ource3D\Comm..." +``` + +### 0x40542400 — descriptor-shape +``` ++0x00: 40 54 24 80 ... | be32=40542480 (ptr) ++0x40: 40 54 26 00 40 54 1e c0 40 54 25 40 5f 54 49 54 | "@T&.@T..@T%@_TIT" +``` + +### 0x405424c0 — pointer-bearing PARTIAL +``` ++0x00: 40 54 25 80 ... | be32=40542580 (ptr) ++0x20: 40 54 1e d8 00 00 00 00 00 00 00 00 40 54 1e f4 | be32=40541ed8 ... 40541ef4 ++0x40: 40 54 23 40 3a 5c 68 69 64 64 65 6e 5c 52 65 73 | "@T#@:\hidden\Res ++0x50: 6f 75 72 63 65 33 44 5c 70 74 63 5f 70 61 63 6b | "ource3D\ptc_pack +``` +**+0x20 dword = 0x40541ED8 (pointer)** — descriptor-shape; **+0x44 onward holds inline path string** "game:\hidden\Resource3D\ptc_pack.xpr". + +## Canary Comparison (audit-037 ground truth) + +Canary populates filenames via `RtlInitAnsiString(BC365xxx,"game:\\hidden\\Resource3D\\Common.xpr")` at `0xBC365xxx` separately from the per-file struct at `0xBC65xxxx` — the struct holds **pointers** (handle@+0=0xF80000B8 + sub-pointers BC65xxxx/BC36xxxx). Strings live on a different heap from records. + +Our impl: filenames are **inlined into the record itself** at the record base or at +0x40. No separate filename heap is allocated; the populator path that splits filename→pointer doesn't run. + +## Verdict — Cascade Dimension A + +**FAIL.** Cache fix (audit-038) did NOT flip record layout to canary-shape. + +- 0x40542300: inline-string layout unchanged. +0x20 = 0x7072005C (text bytes "pr\0\\") — IDENTICAL to audit-037 pre-fix. +- 0x405424c0: descriptor-shape with pointers at +0x20, but **filename still inlined at +0x44** rather than externalized to a separate `RtlInitAnsiString`-style heap pointer. +- No record begins with `0xF80000B8` handle. No record contains BC65xxxx-equivalent sub-pointers. The populator that should externalize filenames into ANSI-string heap before the pointer-bearing record stage is NOT running. + +Audit-038's fix is correct hygiene (cache:/* paths persist via `/tmp/xenia-rs-cache--/`); it silenced sub_82459D18/sub_8245D230/0x82450904 (cache-miss/resize path). But the **record-population transformation step** (filename string → ANSI-heap → pointer-bearing struct) is a DIFFERENT mechanism, upstream or sibling to the cache machinery, that audit-038 didn't touch. + +## Lockstep Determinism + +Probe-element run: `instructions=500000019, imports=5629636, swaps=2, VdSwap=2`. Stable. + +## Recommended Next + +Cache fix is **necessary but not sufficient**. The hidden Track is: + +- **Option A** — find the ANSI-string populator: trace `RtlInitAnsiString` callers in our impl vs canary; specifically the path that takes a `game:\hidden\…` literal and writes it to a fresh heap allocation before the per-file record sees it. If this path doesn't fire in our impl, that's the missing transformation. +- **Option B** — mem-watch the +0x20 dword on a specific record (e.g. 0x40542320) to capture WHO writes the inline-string bytes. The writer's PC + LR identifies the populator function; cross-check whether canary's equivalent function instead writes a pointer. +- **Option C** — let sister Track 2's extended-horizon canary trace land first; if cluster L1 activates in canary at e.g. T+30s, rule out timing/horizon as a confound before declaring transformation-step missing. +- **Option D** — KRNBUG entry: audit our `RtlInitAnsiString` (and adjacent string-init paths) for handling of `cache:/*` vs `game:/*` vs `dat:/*` prefixes; if the populator branches on prefix and we mis-handle one, that's the bug. + +Sister Track 2's findings are now load-bearing: if cluster L1 fires in canary at extended horizon, then transformation-step is the gate; if it doesn't fire there either, we need a higher-up activation trigger. + +## Trace + +`audit-runs/audit-039-track-1-verify/`: +- `probe-element.{out,log}` — pc-probe sub_8228E498 (0 fires) + 4-record dump +- `dump-extended.{out,log}` — extended dump 0x40542100..0x40542800 + +## Reading-Error Ledger + +No new errors. Audit-037's record-base attribution (0x40542300 etc) holds post-fix; audit-038's cache-fix scope is correctly characterized as cache-only (not record-layout). diff --git a/migration/claude-memory/project_xenia_rs_audit_039_track_2_extended_canary_2026_05_09.md b/migration/claude-memory/project_xenia_rs_audit_039_track_2_extended_canary_2026_05_09.md new file mode 100644 index 0000000..823854d --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_039_track_2_extended_canary_2026_05_09.md @@ -0,0 +1,98 @@ +# AUDIT-039 Track 2: Extended-Horizon Canary Trace for Cluster Activation (2026-05-09) + +**Status**: READ-ONLY — canary instrumentation only (patch reverted at session close). NO xenia-rs source mods, NO commit. +**Canary HEAD**: `6de80dffe` (clean baseline; patch applied + reverted in-session). +**xenia-rs HEAD (untouched)**: `d8766c6` (post audit-038 cache fix). Tests 645. +**Sister session**: Track 1 — cache-fix record-layout verification (parallel, untouched on xenia-rs side). + +## Question + +At extended horizon (10–15 min, 2–3× longer than audit-034 Phase B's 5 min), does Linux Debug canary EVER reach the audit-009 cluster's Tier-2 callers (`sub_82172524`, `sub_82175810`, `sub_8217EB78`) — and through them the cluster's L1 entries? + +If YES → capture LR (caller PC) → name the activation gate. +If NO → cluster activation is past Linux Debug's reach in 15 min → strategic pivot mandatory. + +## Method + +1. Re-applied audit-030 `--log_lr_on_pc` canary patch (30 LOC across 4 files). +2. Build via `ninja -f build-Debug.ninja xenia_canary` (success, 13 objects). +3. Probed 3 Tier-2 PCs serially with 15-min wallclock each (single PC at a time per audit-031 finding): + - `0x82172524` — actual run 22 min (timeout(1) didn't enforce the 900s SIGTERM cleanly until force-kill) + - `0x82175810` — 15 min wallclock + - `0x8217EB78` — 15 min wallclock +4. Trace marker `TRACE-PC-LR pc=...lr=...r3=...r4=...r5=...r6=...r31=...` per fire. +5. Compressed plan per task brief: skip Tier-1 (3 PCs) + L1 (6 PCs) when Tier-2 = 0× (consequences of Tier-2 firing). + +## Result Table + +| Tier | PC | Horizon (wall) | Hits | LR captured | Notes | +|------|-------------|----------------|------|-------------|--------------------------------------------------| +| T2-A | 0x82172524 | 22 min | **0** | — | Steady-state idle: 240k KeReleaseSemaphore / 75k texture-load loop | +| T2-B | 0x82175810 | 15 min | **0** | — | Steady-state idle (same kernel-call mix) | +| T2-C | 0x8217EB78 | 15 min | **0** | — | Steady-state idle (same kernel-call mix) | + +Total wallclock: ~52 min canary CPU. All three external Tier-2 callers of the cluster STAYED 0× across extended horizons. + +## Steady-State Engine Mix (representative T2-A 22 min) + +``` +240438 KeReleaseSemaphore(828A3230, 1, 1, 0) ← audio sema repeat + 74635 VdRetrainEDRAM, VdGetSystemCommandBuffer ← renderer idle pump + 74635 XamInputGetCapabilities(0..3) ← input poll + 432 Kernel object Removed; 396 Added; 381 NtStatusToDosError +``` + +Identical mix in T2-B, T2-C. Engine reaches the same plateau as 5-min Phase B but progresses no further across 3× the wallclock. RECONCILE-A's "Linux reaches frame 42/186 vs Lutris Windows 72/186 in 10× more time" framing holds: the engine is alive at the kernel-call level but not advancing through the front-end-UI / save-game state machine. + +## Verdict — OUTCOME (ii) + +**Cluster activation is past Linux Debug's reach in 15 min.** Per task brief Step 3 outcome (ii): + +> Tier 2 stays 0× even at 15 min: cluster activation is past Linux Debug's reach in 15 min. The gate may be tied to intro-video duration / state-transition that Linux can't reach (per RECONCILE-B host-presenter caveat). + +Confirms and extends audit-034 Phase B (5 min, 0× Tier-2/3) and VERIFY-A (35 sec, 0/12 cluster L1). The static-reachability claim from audit-009 stays sound; the runtime gate is genuinely upstream of Tier-2 calls in the front-end-UI subsystem. + +## Strategic Implication + +The **shared Linux-host-presenter caveat (RECONCILE-B)** dominates: Vulkan/XCB on Linux fails to display intro video; user confirmed Weston also shows black; the front-end-UI state machine never advances past the post-intro state-transition that Tier-2 callers gate on. Three independent canary horizons (35 sec / 5 min / 15 min) all stop in the same idle loop. + +Methodology consequence: **15-min Linux Debug canary cannot witness the cluster activation event on this host.** Continued probing at higher horizons on Linux is unlikely to yield. Two pivots open: + +- **Pivot A — Lutris Windows canary instrumentation.** The user's Lutris Windows build of canary reaches further (frame 72/186 in observed traces). Re-port the `--log_lr_on_pc` patch to a Windows build and probe Tier-2 there. Higher cost (Windows toolchain, Lutris config, longer iteration), but could finally witness Tier-2 fires and LR-name the trigger. +- **Pivot B — Static-only path forward.** Drop runtime probing on this side; lean on M5.5 (alias-aware vtable dispatch resolution per analysis-overhaul SCHEMA.md) to statically name the gate function in xenia-rs's IDA database, then probe THAT function in our impl + canary-Linux at 5-min horizon to see if it fires there at all. + +**Recommendation**: Pivot B first (low-cost, exhausts static analysis avenue per audit-029 verdict); Pivot A as fallback if M5.5 doesn't reach a probeable witness. + +## Sister-Session Coordination + +Track 1's verdict on cascade dimension A: **FAIL** — audit-038 cache fix did NOT flip record layout to canary-shape. Track 1 recommended waiting for Track 2 before declaring transformation-step missing (Option C) to rule out horizon-as-confound. Track 2 now rules that out: 15-min horizon does not move the needle. Combined hand-off: **transformation-step (RtlInitAnsiString-driven filename externalization) IS missing AND cluster activation IS past Linux Debug's reach.** These are independent gates; Track 1's Option A (trace `RtlInitAnsiString` callers on the `game:/dat:/cache:` prefix family) becomes the next concrete xenia-rs-side action regardless of cluster activation horizon. + +## Falsifications + +- Audit-034 Phase B's "5 min may be too short" caveat is now closed: 15 min doesn't reach Tier-2 either. +- Hypothesis "extended horizon would witness cluster activation" — falsified for Linux Debug at 15 min. + +## Trace + +`audit-runs/audit-039-track-2-extended-canary/`: +- `canary-0x82172524.{log,err}` — 77 MB log, 0 fires, 22-min wall (force-killed) +- `canary-0x82175810.{log,err}` — 52 MB log, 0 fires, 15-min wall +- `canary-0x8217EB78.{log,err}` — 55 MB log, 0 fires, 15-min wall (force-killed at +3s post-timeout) + +## Cleanup + +- Canary patch reverted: `git checkout -- src/`; `git status` clean; HEAD `6de80dffe` unchanged. +- xenia-rs source unmodified, no commit, no push. +- Sister Track 1's territory (xenia-rs runtime probe) untouched. + +## Discipline Gate (5/5 PASS) + +1. Hypothesis explicitly tested with sharp pre-prediction (Tier-2 fires → LR-names gate; 0 fires → outcome ii). +2. Canary patch applied + reverted at session close (clean baseline confirmed). +3. xenia-rs source unmodified, no commit. +4. Single-step (verification only, no fix attempt). +5. Trace files saved per audit dir convention. + +## Reading-Error Ledger + +No new errors. Reaffirms (does not contradict) the cluster's "front-end UI / save-game / mission-select" subsystem identity per RAPID-SURVEY-Q4 (NOT renderer plateau). diff --git a/migration/claude-memory/project_xenia_rs_audit_040_record_ctor_inputs_2026_05_09.md b/migration/claude-memory/project_xenia_rs_audit_040_record_ctor_inputs_2026_05_09.md new file mode 100644 index 0000000..15a6eb9 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_040_record_ctor_inputs_2026_05_09.md @@ -0,0 +1,131 @@ +# KRNBUG-AUDIT-040 — record ctor input divergence (sub_8244FC90) + +**Date:** 2026-05-09 (session 1 of 10-session autonomous budget) +**Status:** READ-ONLY. Canary patch applied + reverted. Master HEAD `d8766c6` unchanged. Tests 645. +**Lockstep:** instructions=100000004, swaps=2, draws=0 (plateau). + +## Goal + +Identify the divergent INPUT to `sub_8244FC90` (record ctor) between canary and ours. Per audit-037 the ctor fires identically in both impls but produces different layouts (canary = pointer-bearing, ours = inline-string). + +## Method + +Re-applied audit-030 `--log_lr_on_pc` canary patch and EXTENDED `TrapLogLR` to log r3..r10 + r28..r31 + LR + a 32-byte hex dump from `*r4` and `*r5` (sub_8244FC90 is `_init(this=r3, src=r4, struct=r5, arg=r6, arg=r7)` per disasm). +56 LOC across 4 files (under 80 LOC cap). Build via `ninja -f build-Debug.ninja xenia_canary`. Saved patch reference at `audit-runs/audit-040-record-ctor-inputs/canary-patch.diff`. Reverted at session close (`git status` clean). + +Captured 30 s canary trace at `--log_lr_on_pc=0x8244FC90` (33 fires). Captured ours via `--lr-trace=0x8244FC90` (8 fires) + `--dump-addr` against the recurring r4/r5 values. + +## Calling convention (sub_8244FC90, from sylpheed.db disasm 0x8244FC90..0x8244FD30) + +- **r3** = dest record (allocated by caller via `operator new` at `sub_824503A0+0x418`) +- **r4** = source struct ptr (28 bytes; saved as r29; loop at +0xFD1C..+0xFD2C copies 7 dwords from `*r4` to `r31+60`) +- **r5** = secondary "this" ptr (stored at `r31+36`, vtable bearing in canary) +- **r6** = scalar (stored at `r31+48`) +- **r7** = scalar (stored at `r31+52`) + +## Concrete register snapshots (one representative fire each) + +| reg | canary (fire 2) | ours (fire 2) | +|-----|-----------------|---------------| +| r3 | `BC65D440` | `405420C0` | +| **r4** | **`BC79C9EC`** | **`406819EC`** | +| r5 | `BC65D2C0` | `40542100` | +| r6 | `BC79C9C4` | `406819C4` | +| r7 | `0000000C` | `40542100`-region | +| LR | `82450440` | `82450440` | + +Both impls call from the **same site** (LR = `0x82450440` = `sub_824503A0+0xA0`, the `bl 0x8244FC90` instruction). + +## The divergence — content of `*r4` (the 28-byte source struct memcpy'd into the record) + +| word | canary `0xBC79C9EC` | ours `0x406819EC` (≈ same role at `0x40681A4C`) | classification | +|------|---------------------|-----------|----------------| +| +0 | **`F80000DC`** | **`00001454`** | **DIFFERENT** — kernel handle vs handle | +| +4 | `00000000` | `00000000` | same | +| +8 | `00000000` | `00000002` | DIFFERENT | +| +12 | `00000003` | `00000003` | same | +| +16 | `00000000` | `0000000C` | DIFFERENT | +| +20 | `0000000C` | `0000000C` | same | +| +24 | `00000000` | `00000000` | same | + +The first dword (the slot at `[r31+44]` of `sub_822DFBC8`'s "this") is the load-bearing divergence: it is later read at `sub_822DFBC8+0x5C` (`lwz r3, 0(r30)`) and waited on via `bl 0x824AA330` (= `WaitForSingleObject(handle, -1)`). + +## Upstream caller — where the divergent dword originates + +Backtrace from sub_8244FC90 entry: + +``` +frame 0 sub_8244FC90 (r3=dest, r4=*src-struct) +frame 1 sub_824503A0+0xA0 bl 0x8244FC90 (LR=0x82450440) +frame 2 sub_824528A8+0x90 bl 0x824503A0 — vtable forwarder, r9=src-struct +frame 3 sub_822DFBC8+0x40 bcctrl 20,lt — vtable[7] of [r31+40], r8=r31+44 +frame 4 sub_822DFCC4 (callers x4 — sub_822DFCC4/E20/ECC, sub_822E0334) +``` + +In **sub_822DFC74** (the +0x822DFCC4 caller frame), the slot at `r31+44` is populated as: + +``` +0x822DFC8C-90 r3=r4=r5=r6=0; bl 0x824A9F18 ; create event, returns handle in r3 +0x822DFC94 or r4, r3, r3 ; r4 = handle +0x822DFC98-9C r3 = r1+80; bl 0x821820B0 ; stw r4, 0(r3) at r1+80 +0x822DFCA0 lwz r11, 80(r1) ; r11 = handle +0x822DFCB8 stw r11, 44(r31) ; *** [this+44] = handle *** +0x822DFCC4 bl sub_822DFBC8 ; dispatch (memcpys this+44 into record) +``` + +`sub_824A9F18` is a **wrapper around `NtCreateEvent`** (xboxkrnl.exe ordinal 209, at thunk `0x8284DF1C`). It calls `bl 0x824AC268` which itself uses `RtlInitAnsiString` (ordinal 300) for the optional name. + +So the divergent dword is **the OUT handle returned by `NtCreateEvent`**: +- canary: `NtCreateEvent` → `0xF80000DC` (kernel-region pseudo-handle, canary's `XObject` namespace) +- ours: `NtCreateEvent` → `0x00001454` (small-int handle ID, our `KernelState::handle_table` namespace) + +## Bug-class refinement + +This is **NOT a logic bug in our code path**. Both impls call `NtCreateEvent` 395 times during boot; both succeed. The DIVERGENCE is **handle-namespace cosmetics**: canary returns `0xF8000xxx`, ours returns `0x10xx-0x14xx`. Both are valid handle values within their own kernel. + +**However:** the divergent dword IS the load-bearing first-word of the source struct that sub_8244FC90 memcpys into the record. The downstream code interprets `[record+0x3C]` as a handle and does `WaitForSingleObject(handle, INFINITE)` — which works in both runtimes. This handle-namespace divergence is therefore NOT a stuck state in the record-ctor flow itself. + +The audit-037 framing of "canary records hold pointer-bearing structs while ours holds inline-string structs" needs a careful re-read: +- The 28 bytes copied at sub_8244FC90 (record `+0x3C..+0x57`) ARE different in handle slot, but only by namespace. +- The "filename text starting at +0" the audit-037 saw must be at a DIFFERENT offset of the dest record (e.g. `+0x40..` of our `0x40542100` dump shows `40541F80 40542000 745c4750` then ASCII `LE.pak\0eng\p`) — that's the SECOND dword block written by sub_824503A0 AFTER sub_8244FC90 returns, NOT the source struct. + +**Provisional class:** δ-namespace (handle representation divergence; lossy round-trip when guest treats handle as semantically meaningful in a struct field). Sister of the audit-024A "F8 vs heap pointer" class. + +## Falsifications + +- The 11-fire claim from audit-037 is too low — 30 s canary window yields 33 fires, suggesting earlier session terminated too quickly (≤ 10 s effective trace). +- The "record holds inline-string struct" framing is partially incorrect: the inline string lives in the dest at offset `+0x40+`, written by sub_824503A0 OR a subsequent write — not by sub_8244FC90's memcpy. Track-1's "filename text starting at +0" finding is approximate; needs re-verification against record offset boundaries. + +## Recommended audit-041 + +**Two parallel options:** + +1. **DOWNSTREAM-USE PROBE (preferred)** — instrument the record AFTER sub_8244FC90 returns. Where does the record's `+0x3C` (handle slot) get READ in canary vs ours? If both runtimes correctly route the handle through `WaitForSingleObject` then the namespace divergence is benign and the actual blocker is elsewhere. Probe `sub_822DFC34` (`bl 0x824AA330`) entry in BOTH runtimes; capture r3 = handle being waited on; verify wait completes (signal source). Targeted PCs: `0x822DFC34` (waitsite), `0x824AA330` (KeWaitForSingleObject thunk). + +2. **AUDIT-037 RE-VERIFICATION** — re-read audit-037's "record at +0x40 has filename text vs canary has pointers" claim against the new ground truth. Specifically dump 128 bytes from canary's `r3=BC65D440` and ours's `r3=0x405420C0` AT THE EXIT of sub_8244FC90 (not at session-end which is far later). If the filename-text-at-+0x40 bytes are written by sub_824503A0+0x478 (`bl 0x822F8A70` / `bl 0x82150030`) then those callees are the actual filename-vs-pointer divergence and are the real audit-041 target, not sub_8244FC90 itself. + +**Stop conditions for audit-041:** if the downstream probe shows canary's wait at `0x822DFC34` ALSO never completes (matching ours), the handle-namespace finding is benign and the gate is upstream of `WaitForSingleObject`; pivot to RDX-search-criteria producer. If canary's wait completes but ours doesn't, it's a **signaler-missing** bug; trace which kernel call signals canary's `0xF80000DC` event handle. + +## Trace artifacts + +- `audit-runs/audit-040-record-ctor-inputs/canary-0x8244FC90.log` (33 canary fires + dumps) +- `audit-runs/audit-040-record-ctor-inputs/ours-lrtrace.jsonl` (8 fires r3-r6+LR JSONL) +- `audit-runs/audit-040-record-ctor-inputs/ours-dump.log` (10 dump-addr snapshots) +- `audit-runs/audit-040-record-ctor-inputs/canary-patch.diff` (notes/reference, code reverted) + +## Ledger update + +This audit ADDS one entry to the running 11-error ledger: +- **Subsystem-mislabel adjacent:** "audit-037 r4 source struct = inline filename" → corrected to "r4 source struct = NtCreateEvent handle slot at this+44 of sub_822DFBC8's class instance". The filename text lives in a DIFFERENT region of the dest record, written by a sibling callee (`sub_822F8A70` / `sub_82150030` family). + +## Discipline gate + +5/5 PASS: +- Read-only on xenia-rs source: PASS +- Canary patch reverted (`git status` clean): PASS +- ≤80 LOC patch: PASS (56 LOC) +- No fix this session: PASS +- Sharp prediction provided for audit-041: PASS + +## Master HEAD + +xenia-rs `d8766c6` unchanged (645 tests). xenia-canary `6de80dffe` clean post-revert. diff --git a/migration/claude-memory/project_xenia_rs_audit_041_wait_site_2026_05_09.md b/migration/claude-memory/project_xenia_rs_audit_041_wait_site_2026_05_09.md new file mode 100644 index 0000000..0aadeae --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_041_wait_site_2026_05_09.md @@ -0,0 +1,133 @@ +# KRNBUG-AUDIT-041 — wait-site signaler determination (READ-ONLY, 2026-05-09) + +**Master HEAD**: `d8766c6` unchanged. +**Canary HEAD**: `6de80dffe` unchanged. +**Tests**: 645. **Lockstep**: instructions=100000004 unchanged. +**Trace dir**: `audit-runs/audit-041-wait-site/`. + +## Goal + +Determine whether the handle-namespace divergence identified in AUDIT-040 +(canary `0xF80000DC` family vs ours `0x00001454` family for event handles +created at sub_822DFC74 → NtCreateEvent ord 209) is **benign** or +**load-bearing** — i.e., does ours actually stall at the wait while +canary completes? + +## Method + +- Re-applied audit-030 `--log_lr_on_pc` canary patch (30 LOC, 4 files); + rebuilt Linux Debug; reverted at session close (canary clean). +- Identified wait site: PC `0x822DFC34` `bl 0x824AA330` + (KeWaitForSingleObject wrapper, alertable=0, timeout INFINITE). +- Containing function: **sub_822DFBC8** (range 0x822DFBC8..0x822DFC68); + caller LR `0x822DFC0C` (single caller, sub_822DFBC8 itself). +- Wait loop: `0x822DFC30 addi r4,r0,-1` (INFINITE) → `0x822DFC34 bl wait` + → `0x822DFC38 cmpli cr6,0,r3,0x102` (STATUS_TIMEOUT) → branch to + `0x822DFC18` retry on timeout OR on `[r31+52]==3`. +- Captured 30s canary traces at PC 0x822DFC34 (bl) + 0x822DFC38 (post-bl). +- Captured our impl via `--lr-trace=0x822DFC30,0x822DFC38 -n 500M` + (probing `bl` itself returned 0 fires in our HIR — `bl` is elided as + a control-flow terminator; pre-bl `addi` at 0x822DFC30 is the fair + comparison). + +## Findings + +### Wait completion ratio + +| Runtime | bl/pre-bl fires | post-bl fires | completes? | +|---------|-----------------|---------------|------------| +| canary | 9 | 9 | 9/9 = 100% | +| ours | 7 (pre-bl) | 6 | **6/7 = 85%** | + +The 7th wait in ours stalls. Sample handles: +- canary r3: `0xF80000CC`, `0xF80000C0` (kHostHandleBase namespace). +- ours r3: `0x000012C0`, `0x000012FC`, **`0x00001454`** (audit-040 family). +- The stalled 7th wait is on handle `0x00001454`, cycle 48,849. All 6 + earlier waits (handles 0x12C0/0x12FC) returned r3=0 (STATUS_SUCCESS). + +**Outcome (i) confirmed**: handle-namespace divergence is +**load-bearing**, manifesting as a stalled wait in ours. + +### Signaler identification + +Probed canary's KeSetEvent thunk (0x8284DDDC) and NtSetEvent thunk +(0x8284DF5C) over 30s. Filter for r3 ∈ {F80000CC, F80000C0}: + +- KeSetEvent: 20,588 fires globally, **0** on F80000CC/C0 (KeSetEvent + takes a KEVENT*, not a handle). +- **NtSetEvent: 9,245 fires globally, 2 on F80000CC/C0**: + ``` + pc=8284DF5C lr=824AA304 r3=F80000C0 r4=00000000 r5=F80000C0 r6=03A72328 r31=BC65CF98 + pc=8284DF5C lr=824AA304 r3=F80000CC r4=00000000 r5=F80000CC r6=03A72328 r31=BC65D058 + ``` + +**Signaler = NtSetEvent** (xboxkrnl ord 246), thunk PC 0x8284DF5C. +LR=0x824AA304 → wrapper sub_824AA2F0 (single-arg passthrough at +0x824AA300). sub_824AA2F0 has **89 callers** in the static graph; the +actual signaler caller chain is the next investigation step. + +### Cross-check: ours fires NtSetEvent on 0x1454? + +Probed our impl's NtSetEvent thunk at 0x8284DF5C: 3,334 total fires. +- `r3=0x000012C0` × 1 (cycle 514,138) +- `r3=0x000012FC` × 3 +- `r3=0x00001454` × **1** (cycle 3,519,453) + +**The signaler IS firing on 0x1454 in ours.** But the wait at cycle +48,849 (much earlier) never returns. So this is NOT signaler-missing in +the trivial sense — the signaler call exists. The signal arrives but +the waiter isn't woken. + +## Bug class refinement (provisional) + +This is **δ-namespace AND δ-wakeup**, not pure signaler-missing: + +- The handle namespace divergence (audit-040) means our `0x1454` is + *some* event in our table, but the NtSetEvent on 0x1454 may resolve + to a *different KEVENT object* than the one our wait registered for — + if our handle table aliases handle slots between Nt-create epochs. +- Alternatively: the signal hits the right object, but our `KeWaitForSingleObject` + / `KeSetEvent` plumbing has a missed-wake (signal-before-wait race or + event reset path). +- Three sites where pre-bl fires but no post-bl: cycle 48,849 with handle + 0x1454. The signal at cycle 3.5M is AFTER the wait registered, so + signal-before-wait race is ruled out. + +## Discipline gate (5/5 PASS) + +1. Hypothesis explicitly tested (wait-completion-ratio canary vs ours). +2. Canary patch applied (30 LOC) + reverted at session close. +3. xenia-rs source unmodified, no commit. +4. Single-step (data-gathering only, no fix attempt). +5. Trace files saved: `canary-bl-0x822DFC34.log` (4.5 MB), + `canary-postbl-0x822DFC38.log`, `canary-keset-0x8284DDDC.log` (22 MB), + `canary-ntset-0x8284DF5C.log` (11 MB), `ours-pre-bl.jsonl`, + `ours-lr.jsonl`, `ours-ntset.jsonl`. + +## Recommended audit-042 (autonomous) + +**Priority**: pivot to audit-042 to identify which of NtSetEvent's 89 +callers signals our hung 0x1454, AND verify whether the handle binds +to the same KEVENT pointer in both runtimes. + +Two-track: + +1. **Caller chain**: probe sub_824AA2F0 (NtSetEvent wrapper) entry; on + each fire log r3 + LR + r31. Filter for r3=0x1454 (ours) / + r3=0xF80000CC family (canary). The caller LR + r31 names the + creator-side state machine. + +2. **Handle resolution**: in our impl, dump handle table state for slot + 0x1454 at cycles 48,849 (wait) and 3,519,453 (signal). Verify that + both lookups return the same `Arc`. If not — handle + table aliases between epochs (creator in audit-040 made a NEW + 0x1454 between wait and signal because NtClose recycled the slot). + +If handle aliasing confirmed → bug class collapses to handle-recycling +in our slab allocator (`xenia_kernel::handle_table`) — fix is in our +impl, not in the kernel exports. + +If handle resolution stable → bug is in our `KeSetEvent` / +wait-queue machinery (signal lands but waiter list pointer is wrong). + +Both fixes are small (≤60 LOC) and both fit the discipline budget. diff --git a/migration/claude-memory/project_xenia_rs_audit_042_handle_lifecycle_2026_05_09.md b/migration/claude-memory/project_xenia_rs_audit_042_handle_lifecycle_2026_05_09.md new file mode 100644 index 0000000..c49c258 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_042_handle_lifecycle_2026_05_09.md @@ -0,0 +1,172 @@ +# KRNBUG-AUDIT-042 — handle 0x1454 lifecycle disambiguation (READ-ONLY, 2026-05-09) + +**Master HEAD**: `d8766c6` unchanged. +**Canary HEAD**: `6de80dffe` unchanged (patch reverted at session close). +**Tests**: 645. **Lockstep**: instructions=100000004 unchanged. +**Trace dir**: `audit-runs/audit-042-handle-lifecycle/`. + +## Goal + +Disambiguate audit-041's stall hypothesis for handle `0x1454`: +- **(A)** handle-recycling — slot 0x1454 reused between create-epochs; + signal hits new instance, waiter is on old. +- **(B)** wakeup-plumbing — same KEVENT object, signal lands but + waiter list pointer / wake-eligibility check is wrong. + +## Method + +- Ours: `cargo run --release -p xenia-app -- exec sylpheed.iso + --halt-on-deadlock --probe-db=sylpheed.db + --trace-handles-focus=0x1454 -n 500_000_000` + (existing `xenia_kernel::audit` infrastructure; no source mods). + Two reruns identical. +- Canary: re-applied `audit-030-lr-trace/canary-patch.diff` (30 LOC, + 4 files); rebuilt Linux Debug; ran with `--log_lr_on_pc=0x8284DF1C` + (NtCreateEvent thunk, ord 209). Reverted at session close. +- Cross-ref: re-grepped audit-041's existing `canary-bl-0x822DFC34.log` + for `Added handle:` / `Removed handle:` lifecycle markers. + +## Findings + +### (1) Allocator architecture — ours + +`KernelState::alloc_handle` (state.rs:588-593): +```rust +pub fn alloc_handle(&mut self) -> u32 { + self.next_handle.fetch_add(4, Relaxed) // init = 0x1000 +} +``` +**Monotonic atomic counter, bump-only.** `nt_close` (exports.rs:1869) +removes the object from `state.objects` but never returns the handle +ID to a free list. **Recycling is structurally impossible.** + +### (2) Lifecycle of 0x1454 in ours (deterministic across reruns) + +| event | cycle | tid | LR | source | +|---|---|---|---|---| +| create | 0\* | 13 | 0x824a9f6c | NtCreateEvent (Event/Manual) | +| wait | 0\* | 13 | 0x824ac578 | do_wait_single | +| signal | 0\* | 5 | 0x824aa304 | NtSetEvent | +| wake | 0\* | 5 | — | wake_eligible_waiters/auto | + +\* `cycle=0` is a separate audit-instrumentation gap +(`audit_entry` reads `scheduler.ctx(0).timebase` which is 0 in this +build); counts and ordering remain authoritative because rings are +append-only. + +**Final state**: `waiters=0 signaled=true signal_attempts=1 waits=1 +wakes=1`. Single create, single wait, single signal, single wake — +**fully consumed**. **0x1454 is not stuck.** + +Created stack: `lr=0x822dfc94` (audit-041's sub_822DFC74 caller), +chained through `0x822e0344, 0x822d2ca4, 0x822de768, 0x821c4b1c`. + +### (3) Lifecycle of 0xF80000CC family in canary + +From `canary-bl-0x822DFC34.log` (audit-041) + `canary-create-0x8284DF1C.log` +(this audit, ~11.5 MB): +``` +Added handle:F80000CC for XObject (fresh KEVENT slot) +NtDuplicateObject(F80000CC, ...) × 3 +TRACE-PC-LR pc=822DFC34 r3=F80000CC (the wait, lr=822DFC0C) +NtClose(F80000CC) → Removed handle:F80000CC for XEvent +Added handle:F80000CC for XEvent (NEW KEVENT, SAME SLOT) +NtClose → Removed → Added × 4 more iterations +``` + +Recycling counts in 30s: +- `F8000098`: 130 reuses +- `F80000D0`: 95 +- `F80000DC`: 71 +- `F80000C0`: 10 +- `F80000CC`: 5 + +Canary's `ObjectTable::AllocateHandle` is a slab/free-list allocator; +ours is bump-only. **Canary recycles heavily, ours never recycles.** + +### (4) Decisive disambiguation + +| | ours | canary | +|---|---|---| +| handle 0x1454 NtCreateEvent fires | 1 | n/a | +| handle 0xF80000CC `Added handle:` | n/a | 5+ in 30s | +| recycling? | **NO** | **YES** | +| 0x1454 wait completes? | **YES** (wakes=1) | n/a | + +**Verdict: ROOT CAUSE IS NOT (A) HANDLE-RECYCLING.** Recycling is +structurally impossible in our impl. The signal lands on the same +object the waiter registered for. + +**Sub-conclusion on audit-041**: audit-041 inferred a stall from +"7 pre-bl, 6 post-bl" lr-trace data, but ran with `--quiet` so +end-of-run audit dumps were suppressed. The post-bl miss is +explained by the wait's return path not crossing the post-bl PC +(KeWaitForSingleObject's wake-side context-restore can bypass it). +**Audit-041's "wait NEVER returns" premise is provisionally +falsified for handle 0x1454.** + +### (5) Real wedge points (end-of-run handle waiter list) + +Stalled handles in this session's run, all ``: +- `0x1004` Event/Manual (tid=11 parked) +- `0x1020` Event/Manual (tid=3 parked) +- `0x1040` Event/Auto (tid=5 via WaitMultiple) +- `0x1544` Event/Manual (tid=17 parked) +- `0x1578` Event/Auto (tid=19 parked) +- `0x12ac` Semaphore (tid=14, 15 parked) +- `0x10a0` Event/Auto + `0x10a4` Semaphore (tid=6 paired wait) + +These are **γ-class missing-signaler** candidates; distinct from +0x1454. + +## Bug-class refinement + +- **δ-wakeup**: RULED OUT for 0x1454 (wake fired). +- **δ-namespace**: RULED OUT (single create, no aliasing). +- **Wedge migrates** to a different handle set — re-pivot needed. + +## Sharp 4-dim cascade prediction (for audit-043 fix on real stalled handle) + +- **A**: target handle's `signal_attempts` 0 → ≥1. +- **B**: stalled tid transitions Blocked → Ready/Exited. +- **C**: `` count drops by ≥2 (cascading dependents). +- **D**: `swaps` past 2 OR `draws` 0 → >0. Probability: low — γ-cluster + plateau likely needs multiple gates. + +## Recommended audit-043 (autonomous) + +**Pivot off 0x1454.** Target the actually-stalled handles, ranked: + +1. **`0x10a0` Event/Auto + `0x10a4` Semaphore on tid=6** — + Event+Semaphore pair = canonical worker-waits-for-job pattern. +2. **`0x12ac` Semaphore (2 waiters: tid=14, 15)** — `KeReleaseSemaphore` + source is the target. +3. **`0x1004` Event/Manual on tid=11** — earliest-created stalled + handle. + +For each: `--trace-handles-focus=` → capture created stack +→ identify producer-side function. Canary cross-trace via +`--log_lr_on_pc=0x8284DF5C` (NtSetEvent) / `0x8284DDDC` (KeSetEvent) +filtering for equivalent canary handle. + +**Bug class for audit-043**: **γ (missing signaler)** — primary +candidate. NOT δ-namespace, NOT δ-wakeup. Audit-040's +handle-namespace divergence appears benign at the level of 0x1454. + +## Milestone status + +- Tests: 645 (unchanged). +- Lockstep: instructions=100000004 unchanged (no source mods). +- Master HEAD: `d8766c6` (unchanged). +- Canary HEAD: `6de80dffe` (clean, post-revert). +- Patch budget: 0 LOC consumed (read-only audit). +- Discipline gate: 5/5 PASS. +- Session 3 of 10 budget consumed. + +## Trace artifacts + +- `audit-runs/audit-042-handle-lifecycle/probe.log` (15 KB) — ours run 1 +- `audit-runs/audit-042-handle-lifecycle/probe-run2.log` (15 KB) — ours run 2 (determinism) +- `audit-runs/audit-042-handle-lifecycle/canary-create-0x8284DF1C.log` (11.5 MB) — canary NtCreateEvent fires (~452 in 35s window) +- Cross-ref: `audit-runs/audit-041-wait-site/canary-bl-0x822DFC34.log` + (existing) re-grepped for canary's `Added/Removed handle:` markers. diff --git a/migration/claude-memory/project_xenia_rs_audit_043_record_zero_offset_2026_05_09.md b/migration/claude-memory/project_xenia_rs_audit_043_record_zero_offset_2026_05_09.md new file mode 100644 index 0000000..67dd806 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_043_record_zero_offset_2026_05_09.md @@ -0,0 +1,124 @@ +# KRNBUG-AUDIT-043 — Record +0x00 writer identification (2026-05-09) + +**Status**: READ-ONLY, master `d8766c6` unchanged, canary patch reverted at session close. + +**Tests**: 645 (unchanged). **Lockstep**: instructions=100000004 (no source mods). + +## Goal + +Identify the writer of `+0x00` in records at `0x40542300/0x40542340/0x40542400/0x405424c0` in our impl and reconcile with canary's divergent value. Per audit-039, ours has inline-filename text "game" (`0x67616D65`); canary has handle `0xF80000B8`. + +## Method + +1. Mem-watch `0x40542300/04, 0x40542340/44, 0x40542400/04, 0x405424C0/C4` in our impl, `-n 500_000_000`. +2. Group writers by (PC, LR), look up containing fns in `sylpheed.db`. +3. Disasm writer fns + caller chain. +4. Apply audit-030 `--log_lr_on_pc` patch to canary; probe writer PC = `0x825F1080` (memcpy core) and pool-init PC `0x82152728` in canary. + +## Key findings + +### Writer attribution (our impl) + +Mem-watch produced 92 events, 16 distinct (PC, LR) pairs. + +The `0x67616D65` ("game") writes — the anomaly under investigation — fire **only** at: + +- **PC `0x825F1080`, LR `0x825ED608`**, `store_len=8` (`stdu r7, 8(r3)` loop body of memcpy). +- 16 fires total across all 4 records. + +Containing functions: +- `0x825F1000-0x825F10B4` = **memcpy** (classic dword `ldu/stdu` loop with fall-thru to byte tail). +- `0x825ED588-0x825ED660` = **memcpy_s wrapper** (`r3=dest, r4=destsize, r5=src, r6=srclen` → `bl memcpy(r3,r4=src,r5=len)` at `+0x7C`). + +### The records are NOT records — they are **64-byte slots in a Sylpheed-managed pool allocator** + +Walking the chain: + +- The pool layout is built by `sub_82152728` (called from `sub_82152570`, called from `sub_821505D8` at boot). +- `sub_821505D8` allocates a ~58 MB region via `sub_824A8858` (size `0x03A723D0`, type `0x20000004`, alignment 4) — VirtualAlloc-style. +- `sub_82152728` chains free-lists in two blocks. **Block 2 is 64-byte-stride slots over a 1.25 MB span**: chain step `r11 += 0x40` until `r9 = r31 + 0x140000`. The records `0x40542300/40/400/4C0` ARE these 64-byte slots. +- The slot-size table (`0x82150610`+) lists the bucket sizes: 4, 16, 32, 64, 96, 128, 160, 192, 256. + +The `0x67616D65` writer is **`std::basic_string::reserve_then_assign`** at `sub_8216E138`: +- Calls heap allocator `sub_8216D9C8` (returns one of the 64-byte pool slots). +- `bl memcpy_s` at `+0xC8` copies the source string ("game\\source3D\\Common.x...") into the slot. + +So the 4 records are **transient std::string heap buffers**, not statically-laid-out resource records. + +### Canary writer comparison (PC `0x825F1080`) + +Applied `audit-030` LR-trace patch to canary, probed `pc=0x825F1080`: +- **94,945 fires in 25s**, never to address `0x40542xxx`. +- Top destinations (r3 prefix): `705D` (76674), `7033` (6642), `7036` (6254), `7043` (5952), `BC36` (1211), `BD17`, `B4Dx`, etc. +- Top LR distribution: `0x824AB1D4` (84,400), `0x824C7EF0` (9727), `0x824C27AC` (4880), `0x824C2760` (4880), `0x825ED608` (1,782), … +- Memcpy fires from `LR=0x825ED608` (matching our impl) **only 1,782× and never targets `0x40542xxx`**. + +Pool-init probe `pc=0x82152728` in canary fires **once** with `r3=0xBC32C880` — canary's pool BASE address. + +## Divergence — final interpretation + +**Both impls run the same guest code** (`sub_82152728` initializes the same pool layout in both). The divergence is purely **virtual-address-space layout in the host allocator**: + +- **Our impl**: VirtualAlloc-style backing returns address `~0x40541xxx` for the 58 MB pool → block-2 64-byte slots fall on `0x40542300, 40, 400, 4C0`. +- **Canary**: Same call returns `0xBC32C880` → block-2 slots fall on `0xBCxxxxxx` instead. + +The **same guest virtual address `0x40542300` therefore backs DIFFERENT live data in the two emulators**: +- Ours: a 64-byte std::string heap buffer holding "game\\source3D\\Common.x..." +- Canary: a kernel object handle slot (or another structure entirely) where `+0x00 = 0xF80000B8`. + +The "0xF80000B8 vs 'game'" comparison from audit-039 was therefore **not a comparison of equivalent data** — it was an artifact of comparing two emulators' memory at the same guest address while their respective allocators had returned that address for two different purposes. + +The audit-040 "handle@+0x00=0xF80000B8" interpretation **stands but applies to canary's own use of that VA, not to a record class shared between impls**. There is **no missing/wrong write at +0x00 in our impl**; ours is correctly populating its own pool-slot's std::string. + +## Bug class refinement + +- **NOT a record-layout divergence** (no shared "record" class — disjoint allocations). +- **NOT a missing kernel write** (the +0x00 in canary is canary's own NotifyListener / kernel handle storage at *that* VA; ours legitimately puts a string there). +- **Underlying class = ε (allocator address-space divergence)**: same guest API call returns different host-side VAs in canary (0xBC) vs ours (0x40). + +This invalidates the "record-by-VA cross-comparison" methodology used in audits 037/039/040 for the 0x40542300+ block. **Reading-error ledger update**: add a 12th entry — *VA-equality fallacy*: comparing two emulators' memory at identical guest VAs assumes both allocators return the same VA for the same logical allocation. Sylpheed's pool factory makes this assumption false in general. + +## Recommended audit-044 + +**Pivot**: drop the "record at 0x40542300+" line of investigation entirely. The records do not exist as a shared structure. + +Re-pivot to the **actually-stalled-handle** plan from audit-042: + +1. Trace `0x10A0` (Event/Auto) + `0x10A4` (Semaphore) waited by tid=6. Producer chain via `--trace-handles-focus=0x10a0,0x10a4`. +2. Trace `0x12AC` (Semaphore, 2 waiters tid=14/15). KeReleaseSemaphore source. +3. Trace `0x1004` (Event/Manual on tid=11) — earliest-created stalled handle. + +Each: get creator + signaler PC in our impl, map to canary equivalent via `--log_lr_on_pc` on `KeSetEvent` / `KeReleaseSemaphore` thunks. + +**Discipline reminder**: cross-emulator VA equality is unreliable. When comparing memory contents, compare the *logical* allocation (resolve via the allocator that produced it), not the raw VA. + +## Discipline gate (5/5 PASS) + +1. Hypothesis explicitly tested (writer-of-+0x00 isolated; canary equivalence checked). +2. Canary patch applied (30 LOC audit-030 base) + reverted at session close (`git status` clean, config TOML restored). +3. xenia-rs source unmodified, no commit. +4. Single-step (data-gathering only, no fix attempt). +5. Trace files saved: `audit-runs/audit-043-record-zero-offset/{mem-watch.log, mem-watch.stdout, canary-825f1080-traces.txt.gz, audit-043-canary-poolinit.log}`. + +## Status + +- Tests: 645 (unchanged). +- Lockstep: instructions=100000004 (unchanged — no source mods). +- Master HEAD: `d8766c6` (unchanged). +- Canary HEAD: post-revert clean (working tree restored). + +## Key PCs / LRs (cross-ref) + +| Role | PC | LR | Containing fn | +|------|----|----|---| +| memcpy core (ldu/stdu loop) | `0x825F1080` | (varies) | `sub_825F1000` (memcpy) | +| memcpy_s wrapper | (calls memcpy at `0x825ED604`) | `0x825ED608` (return) | `sub_825ED588` (memcpy_s) | +| std::string assign w/ realloc | `0x8216E200` (call to memcpy_s) | (callers list 19) | `sub_8216E138` | +| pool-slot heap allocator | (entry) | — | `sub_8216D9C8` | +| pool free-list initializer | `0x82152728` | `0x82152634` | `sub_82152728` | +| pool factory (VirtualAlloc + init) | `0x821505D8`-`0x821506B8` | from `0x8280C42C` | `sub_821505D8` | +| std::string default ctor / assign | — | — | `sub_82454498` / `sub_82454580` | + +## Canary pool base address + +`r3 = 0xBC32C880` at sub_82152728 entry — canary's 58 MB pool starts here. Ours' equivalent base address backs the 0x40541xxx region (slot offsets land 0x40542300, 0x40542340, 0x40542400, 0x405424C0). diff --git a/migration/claude-memory/project_xenia_rs_audit_044_m55_cluster_survey_2026_05_09.md b/migration/claude-memory/project_xenia_rs_audit_044_m55_cluster_survey_2026_05_09.md new file mode 100644 index 0000000..e7aa4a5 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_044_m55_cluster_survey_2026_05_09.md @@ -0,0 +1,56 @@ +--- +name: AUDIT-044 M5.5 cluster reachability survey 2026-05-09 +description: M5.5 typed-vptr BFS lifts audit-009 cluster from 0/321 static-reach to 41/321 indirect-reach (12.8%). Lift is surface-level — entry via 4 vtable methods on 2 vtables. Cluster's actual constructors (6 vptr writers) remain dead in BOTH views. Highest-leverage probe target: sub_8228F858 writer at 0x8228FAC8. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-044 (2026-05-09, READ-ONLY, master `7bc9e3a` unchanged)**: post-overhaul cluster reachability survey. Pure DuckDB queries on `/home/fabi/RE Project Sylpheed/xenia-rs/sylpheed.db` against M5.5 typed-vptr indirect BFS view `v_indirect_reachability_from_entry`. No xenia-rs binary executions, no canary patches. + +## Numerics + +- **Cluster `0x82285000-0x82294000`**: 321 fns total (308 pdata_validated) +- **Static-bl-only reach**: **0 / 321** +- **M5.5 indirect-vptr reach**: **41 / 321 (12.8%)** ← M5.5 lift +- **Globally**: static reach 1383 fns → indirect reach 4129 fns → M5.5 newly-reached **2746 fns** (matches MEMORY.md) +- **Audit-009 L1 PCs (6)**: only `sub_822919C8` becomes reachable (2-hop via `sub_82291C90` slot 1 of vtable `0x820a9c28`). Other 5 (`sub_82293448, sub_82288028, sub_82292D80, sub_822851E0, sub_82286BC8`) remain dead. +- **5 sub-buckets totally dead** (0x87/0x88/0x89/0x8d/0x92 in 0x82285000..0x82294000) = 131 fns = 41% of cluster + +## Cluster vtable structure (M3) + +Only **TWO** classes have methods inside the cluster: +- vtable `0x820a9c28` (`ANON_Class_F990DC4E`, length 3): slots 0/1/2 = `sub_82291740 / sub_82291BD0 / sub_82291C90` +- vtable `0x820aa024` (`ANON_Class_443BCBAF`, length 8): slot 0 = `sub_82293ED8` (other 7 outside cluster) + +So 321 cluster fns but only **4 OOP-visible methods** (RTTI stripped, M3 sees anon classes only). + +## How the 41 lift happens + +Cluster bootstraps **transitively** from these 4 vtable methods. M5.5 sees external dispatchers (mostly cand_count=203 noise sites — "any of 203 tracked vtables") nominally landing in slot-1 of `0x820a9c28` or slot-0 of `0x820aa024`, then static-bl follows downstream. The cluster's **actual constructors** — 6 vptr-writer fns `sub_8228F858, sub_82293EC8, sub_82294110, sub_82294898, sub_822A0860, sub_822A0E90` — are **dead in both views** (no static caller, no indirect call). The cluster is reachable from outside but is never *instantiated* through a known top-down path. + +## Audit-033 chain status + +`sub_82451E20 / sub_82450720 / sub_82450638 / sub_821CB968 / sub_821CD458 / sub_821CBEA8` ALL indirect-reachable. Terminates at `sub_821CECF0 → sub_821C4988 → sub_821C4EB0` — **genuine orphans** (7-hop backward BFS finds zero reachable ancestor under any xref kind). Audit-033 saw probes hit `sub_82451E20` 62×/8s in ours vs 28×/50s wall in canary — busy-loop divergence already isolated; loop-exit gate remains the divergence target there. + +## Audit-033 named callers status + +`sub_82172BA0` (caller of `0x8228E138`) is dead in both views — not statically reachable, not indirect-reachable. M5.5 didn't bridge it (no vptr-write inference). Yet audit-033 saw it fire 2× canary / 1× ours. **The plain BFS misses these paths entirely** — likely because LR=`0x82172BF8` is reached via a dispatch that M5.5 marks cand_count=203 (noise) and the BFS prunes. Worth a sharper M5.5 follow-up later. + +## Top-3 probe targets ranked + +1. **`sub_8228F858` writer at PC `0x8228FAC8`** — the missing constructor for vtable `0x820a9c28` (the only fully-cluster-resident vtable). 2 callers `sub_82289FD0/sub_82285838` also dead. **Probe `--log_lr_on_pc=0x8228FAC8` in canary** to capture LRs. If fires: walk upward in DB. If doesn't fire in canary either: the cluster's UI subsystem isn't activated in canary at this boot horizon. +2. **`sub_822F1AA8`** — statically reachable, **685 outgoing ind_calls** including 4 distinct cluster targets (`0x822F1B4C → sub_82293ED8/sub_82291740`, `0x822F1C00 → sub_82291C90`, `0x822F1D58 → sub_82291BD0`). Highest-fanout statically-reachable function feeding the cluster. Probe `--lr-trace` on entry; check if any of 4 ind_call sites fire canary-vs-ours divergent. +3. **`sub_82172BA0` LR `0x82172BF8`** — confirm audit-033's 2× canary / 1× ours divergence holds post-overhaul; pivot to disasm of predicate inside. + +## Cand_count precision note + +M5.5's `indirect_dispatch_sites.candidate_count` is a useful precision indicator. **`cand_count ≤ 10` = actionable**, `cand_count = 203` = "any of 203 tracked vtables" = effectively noise for cluster-target ranking. Only **5 dispatch sites globally** have `cand_count=2` with a cluster target (`0x821E7BE8/D30/DA0/0x821E8344/0x8241F348` → vtable `0x820aa024 slot=5/6`). Future M5.5 work should filter by `cand_count` first. + +## Methodology / ledger + +No new reading-error ledger entry (still 12). Corroborates: cluster identity reframe (UI/save-game), audit-009 framing, audit-031/032 "pure this->vptr" claim, audit-033 chain. + +Trace artifacts at `/home/fabi/RE Project Sylpheed/xenia-rs/audit-runs/audit-044-m55-cluster-survey/` — `survey.py`, `queries.sql`, `schema.txt`, `query_outputs/{q3_newly_reachable,q5_dispatch_sites,q6_cluster_vtables}.csv`. Master HEAD `7bc9e3a` unchanged. swaps=2 draws=0 plateau intact. + +## Recommended AUDIT-045 + +Dispatch with: re-apply audit-030 `--log_lr_on_pc` patch (30 LOC, 4 files) to canary; probe `0x8228FAC8` (writer site) + `0x8228F858` (entry) + `0x82172BF8` (audit-033 LR); also `0x822F1B4C / 0x822F1C00 / 0x822F1D58` (sub_822F1AA8's 4 cluster-targeting ind_calls). Cross-check with our impl `--pc-probe` (-n 500M). Revert patch at session close. diff --git a/migration/claude-memory/project_xenia_rs_audit_045_cluster_ctor_probe_2026_05_10.md b/migration/claude-memory/project_xenia_rs_audit_045_cluster_ctor_probe_2026_05_10.md new file mode 100644 index 0000000..d22c427 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_045_cluster_ctor_probe_2026_05_10.md @@ -0,0 +1,64 @@ +--- +name: AUDIT-045 cluster ctor probe + 13th reading-error class 2026-05-10 +description: Falsified audit-044's "missing cluster constructor" hypothesis — sub_8228F858 vptr-write at 0x8228FAC8 fires 0× in canary too (50s wall) AND in ours (-n 500M). Cluster genuinely not instantiated at this boot horizon in either engine. T6 (audit-033 LR 0x82172BF8) confirms gateway chain entry→sub_8216EA68→sub_822F1AA8→sub_82172BA0 runs in both. New 13th reading-error class: probe-firing-granularity divergence — ours --pc-probe fires only at basic-block entry, canary --log_lr_on_pc per-instruction. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-045 (2026-05-10, READ-ONLY, master `7bc9e3a` unchanged, canary `6de80dffe` patch reverted clean)**: cluster constructor activation probe via re-applied audit-030 `--log_lr_on_pc` canary patch (30 LOC, 4 files). 6 target PCs probed in both engines. + +## Headline (4 of 6 results) + +| Tag | PC | Site | Canary | Ours | Class | +|-----|----|----|----|----|----| +| T1 | 0x8228FAC8 | vptr-write of vtable 0x820a9c28 in sub_8228F858 | **0** (50s) | **0** (-n 500M) | (c) neither — REFUTES audit-044 hypothesis | +| T2 | 0x8228F858 | entry of cluster ctor sub_8228F858 | **0** | **0** | (c) neither | +| T3-T5 | 0x822F1B4C/0x822F1C00/0x822F1D58 | ind_call sites in sub_822F1AA8 | 1/5/4646 | 0/0/0 | (b) but **artefactual** — reading-error #13 | +| T6 | 0x82172BF8 | post-bl PC inside sub_82172BA0 (audit-033 LR) | 2 (90s) | 1 | (a) **both fire** — gateway chain runs | + +## Decisive falsifications + +1. **T1=T2=0 in CANARY** — refutes audit-044's "missing cluster ctor is the bootstrap divergence" hypothesis. Cluster's 6 vptr-writer ctors (`sub_8228F858, sub_82293EC8, sub_82294110, sub_82294898, sub_822A0860, sub_822A0E90`) fire 0× in canary at 50s — not just in ours. Consistent with RECONCILE-B: Linux Vulkan/XCB host-presenter stalls before front-end-UI state machine advances; Lutris Windows reaches frame 72/186 vs Linux 42/186. The cluster genuinely **isn't activated at this boot horizon in either engine**. + +2. **T6 fires in both** with exact LR-walk verified-direct chain in ours: +``` +frame 0: pc=0x82172BF8 in sub_82172BA0 (post-bl) +frame 1: lr=0x821744cc +frame 2: lr=0x822f1d5c in sub_822F1AA8 +0x2B4 (post T5 ind_call) +frame 3: lr=0x8216ee14 in sub_8216EA68 +0x3AC (post bl 0x822F1AA8) +frame 4: lr=0x824ab8e0 in entry_point +0x198 +``` +Static edges verified: `0x8216EE10 bl 0x822F1AA8` direct from sub_8216EA68; entry_point→sub_8216EA68 direct. **Gateway path entry→sub_8216EA68→sub_822F1AA8→sub_82172BA0 runs in both engines.** What doesn't run is the cluster-ctor branch off this gateway. The ind_call at 0x822F1D58 dispatches on r3=`0xBC22C910` singleton (canary, fires 4646× hot) whose vtable is NOT 0x820a9c28 (cluster). Cluster vtables are statically 2-of-203 candidates at each ind_call but at this horizon never materialise as live targets. + +## Reading-error ledger — 13th entry: probe-firing-granularity divergence + +xenia-rs `--pc-probe` fires inside worker_prologue at **basic-block entry only** (single `kernel.fire_ctor_probe_if_match(hw_id, mem)` check in `crates/xenia-app/src/main.rs:2200`). Canary's audit-030 `--log_lr_on_pc` patch emits `Trap(100)` **inline in HIR** for every guest instruction whose PC matches — fires per-execution. Comparing fire counts at mid-block PCs (ind_call instructions like 0x822F1D58, store-multiple, mid-loop bodies) systematically yields ours=0 even when code executes equivalently. **Mitigation**: prefer function-entry PCs (always block-entry in both engines) or post-bl return PCs (block-entry by JIT construction); for mid-block PCs, validate via back-chain reachability of the containing function rather than fire-count parity. Joins existing 12 ledger entries. + +## DB usage caveat (NOT a data bug) + +`v_call_graph` filters by `xrefs.source` (= instruction PC of the BL) rather than `source_func`. Querying "callees of sub_8216EA68" via the view yields 0; querying via `xrefs.source_func` correctly yields 24 static callees including `sub_822F1AA8`. **Future audits**: prefer `xrefs.source_func` for caller-set queries. + +## Strategic position + +Audit-009 cluster activation is genuinely past Linux Debug canary's reach (3 horizons confirmed: VERIFY-A 35s, audit-034 5min, audit-039 15min, audit-045 50s on more PCs — all 0 fires for cluster ctors). To capture cluster activation in canary, we'd need to bypass the Linux Vulkan/XCB host-presenter — that means rebuilding canary under Wine/Lutris (large investment). + +ALTERNATIVE pivot: focus on the audit-034/035 frame-chain divergence at sub_82450720 — that's a concrete, reachable, divergent code path (5/5 ours iter vs 3.75/5 canary). Audit-035's slot-table writer divergence (canary 0xBC3xxxxx physical-heap pointers vs ours 0x4024xxxx v40-bump) is a candidate. **Probe predicate `0x82450904` in both engines** to capture loop-exit cr0.eq=1 firing (audit-046 path B). + +## Recommended AUDIT-046 (path B preferred) + +Probe loop-exit `bne` predicate at PC `0x82450904` (audit-034's identified divergence in `sub_82450720+0x160..+0x1F4`) with `--log_lr_on_pc=0x82450904` in BOTH engines, capturing iteration count via cycle delta within each loop entry. + +Sharp 4-dim cascade prediction: +- A: predicate at 0x82450904 fires with `cr0.eq=1` on iter 3-4 in canary, never in ours +- B: stalled-tid state unchanged (renderer plateau is upstream of audit-009) +- C: sub_82451E20 over-firing ratio 5.5× (62× ours / 28× canary in audit-033) drops toward 1× +- D: `draws>0` unlikely in this single-step fix (audit-009 cluster is upstream-blocked) + +Cost: 1 PC × 50s canary + 1 ours run = ~5 min runtime + analysis. Modest. + +Path A (fallback): rebuild canary under Wine/Lutris (Windows host stack) to bypass Vulkan/XCB block; rerun T1/T2 there. Larger investment. + +## Discipline + +xenia-rs HEAD `7bc9e3a` unchanged; canary HEAD `6de80dffe` clean (`git status --short` empty for tracked); patch saved at `audit-runs/audit-045-cluster-ctor-probe/canary-patch.diff`; raw probe logs preserved. Tests count untouched. + +Trace at `audit-runs/audit-045-cluster-ctor-probe/` (8 canary logs + 12 ours logs + findings.md + canary-patch.diff). diff --git a/migration/claude-memory/project_xenia_rs_audit_046_loop_exit_2026_05_10.md b/migration/claude-memory/project_xenia_rs_audit_046_loop_exit_2026_05_10.md new file mode 100644 index 0000000..eefca13 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_046_loop_exit_2026_05_10.md @@ -0,0 +1,48 @@ +--- +name: AUDIT-046 loop-exit predicate falsification 2026-05-10 +description: Falsifies audit-035's slot-pointer-region divergence as causally responsible AND audit-034's "canary 3.75/5 vs ours 5/5" loop iter divergence. Both engines run 5/5 iters at sub_82450720+0x160..+0x1F4 and fall through to no-match exit. Slot-table region divergence (canary 0xBC3xxxxx vs ours 0x4024xxxx) is REAL but behaviorally inert — predicate compares within each engine's own heap region. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-046 (2026-05-10, READ-ONLY, master `7bc9e3a` unchanged, canary `6de80dffe` clean)**: probed loop-exit predicate at PC `0x82450904` inside `sub_82450720+0x160..+0x1F4`. Re-applied audit-030 patch verbatim from audit-045's saved diff. + +## Falsifications + +1. **Audit-035's slot-pointer heap-region divergence as causal**: REFUTED. Slot table at `0x828F3BD4` in ours has `0x4024xxxx` v40-bump pointers (slots 1/2/4: 0x4024A240/0x4024AEE0/0x4024A300, all size=8); slot 0 zero; slot 3 zero (audit-035's "over-cycled 0..0xB" stabilized). Canary has `0xBC3xxxxx` physical-heap pointers per audit-035. The predicate at 0x82450904 compares sub_82451E20's vptr-walk return (LHS) vs slot-local sum (RHS). **Both LHS and RHS originate from the same heap region within a given engine**, so the cross-engine address-space difference is internally consistent and **cancels as a divergence cause**. The region split is REAL but BEHAVIORALLY INERT here. + +2. **Audit-034's "canary 3.75/5 vs ours 5/5" loop-iteration divergence**: REFUTED at current revision. Canary 70s probe `0x82450904` = 140 fires; canary 70s probe `0x82450918` (loop-completion no-match exit) = 22 outer-call completions; ratio 140/22 = **6.4 fires per outer call** ≈ 5+ iters/call when accounting for fallthrough fires. Ours single-run probe `0x82450908` (post-bne block-entry, fires every iter that fails) = **80 fires** = 16 outer calls × 5 iters/call exactly. **Both engines exhibit identical 5/5 loop shape with 0 early matches**. Audit-034's earlier measurement was either pre-overhaul or stale. + +## Reading-error class 13 reconfirmed (mid-block PC unprobeable in ours) + +`0x82450904` is the `bne` instruction itself (mid-block); ours `--pc-probe` only fires at basic-block entry → 0 fires in ours despite hot execution. Workaround used: probe block-entry alternatives `0x82450890` (loop-back top), `0x82450908` (post-bne fall-through), `0x8245092C` (predicate-match exit), `0x82450918` (loop-completion no-match exit). Cross-validated: `0x82450908` = 80 fires (post-fail-increment) = 16×5; `0x82450918` = 16 fires (outer completions) = 16×1; ratio 5/1 confirms 5/5 iters/call. This handles AUDIT-045's reading-error #13 properly — a confirmed working pattern. + +## What this means strategically + +The audit-035/036 frame-chain divergence at sub_82450720 is now triple-refuted as a causal target: +- Audit-036 already refuted "η-class record-layout divergence as stated" (refined to ε-class heap-region per audit-035) +- Audit-043 refuted ε-class heap-region as "missing/wrong write at +0x00" (12th reading-error class: VA-equality fallacy) +- Audit-046 refutes ε-class heap-region as causally responsible at predicate 0x82450904 + +The slot-table divergence is real DATA but doesn't gate any predicate behavior. **Drop sub_82450720 chain entirely as critical-path target.** + +## Recommended AUDIT-047 + +**Option C (preferred): γ-cluster handle wedges per audit-042**. Concrete stalled handles in our impl at end-of-run: +- `0x10A0+0x10A4` (worker Event+Sema pair, tid=6) +- `0x12AC` (Semaphore with 2 waiters, tid=14,15) +- `0x1004` (Event/Manual, tid=11) +- `0x1020`, `0x1040`, `0x1544`, `0x1578` (additional NO_SIGNALS_DESPITE_WAITS) + +These are γ-class missing-signaler bugs — concrete and reachable. Approach: dump full handle table state at -n 500M; for each wedge entry, capture create-LR + wait-LR + nearest-expected-signaler from caller chain. Cross-check against canary's `KeSetEvent`/`NtSetEvent`/`KeReleaseSemaphore` calls (need handle-namespace δ-class mapping per audit-040: ours `0x12AC` ↔ canary `0xF8000xxx` for the corresponding object). + +Sharp cascade prediction: +- A: signal_attempts on the wedged handle 0 → ≥1 +- B: stalled tid state Blocked → Ready +- C: `` count drops ≥2 +- D: `swaps>2` OR `draws>0` UNKNOWN — γ-cluster typically plateaus on a sister wedge, low cascade probability per audit-042 + +Cost: 1 ours run + 1-2 canary probes = ~10 min runtime + analysis. + +## Discipline + +xenia-rs HEAD `7bc9e3a` unchanged; canary HEAD `6de80dffe` clean tree confirmed. Patch reverted. Tests count untouched. Trace at `audit-runs/audit-046-loop-exit/{canary-0x82450904,canary-0x82450918,canary-0x82450934,ours-loop-probe,ours-loop-detailed,ours-entry-probe}.{log,err}` + `slot-table-dump-ours.log` + `canary-patch.diff`. diff --git a/migration/claude-memory/project_xenia_rs_audit_047_gamma_wedges_2026_05_10.md b/migration/claude-memory/project_xenia_rs_audit_047_gamma_wedges_2026_05_10.md new file mode 100644 index 0000000..390181f --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_047_gamma_wedges_2026_05_10.md @@ -0,0 +1,68 @@ +--- +name: AUDIT-047 γ-cluster handle wedges + signaler reachability 2026-05-10 +description: 10 NO_SIGNALS_DESPITE_WAITS handles inventoried; per-wedge create-LR/wait-LR/expected-signaler tabulated. Best near-reachable signaler `sub_8245AD00` is statically reachable BUT its callers sit in audit-009 unreachability island. KeReleaseSemaphore = 0 ours / 73,914 canary, all from PC `0x824D229C` = sub_824D21F0+0xAC on r3=0x828A3230 (XAudio mixer) — already-known AUDIT-032 audio host-pump gap. Discipline 3/5 PASS, D dim = NO. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-047 (2026-05-10, READ-ONLY, master `7bc9e3a` unchanged, canary `6de80dffe` clean)**: γ-cluster handle wedge inventory + signaler reachability cross-check. End-of-run dump from xenia-rs `-n 500M --trace-handles`; canary KeReleaseSemaphore probe via re-applied audit-030 patch. + +## Per-wedge table (10 wedges + XAudio mixer) + +| Handle | Kind | tids | wait-fn | reach (s/ind) | best-sig nearby | sig reach | priority | +|---|---|---|---|---|---|---|---| +| 0x1004 | Event/M | 11 | sub_82178960 | N/N | sub_82174AF8 | N/Y | medium | +| 0x1020 | Event/M | 3 | sub_82181838 | Y/Y | sub_821819A8 | N/Y | medium | +| 0x1040 | Event/A | 5 | sub_82450A68 | N/N | **sub_82450218** | **Y/Y** | high | +| 0x10A0 | Event/A | 6 | sub_82458B90 | N/N | **sub_8245AD00** | **Y/Y** | high | +| 0x10A4 | Sema | 6 | sub_82458B90 | N/N | sub_8245AD00 | Y/Y (8 sigs / 5 wakes — wake-eligibility issue) | high | +| 0x12AC | Sema | 14,15 | sub_822C6878 | N/Y | sub_822C8B50 | N/Y | medium | +| 0x1530 | Timer/A | 16 | sub_824560F8 | Y/Y | sub_8245AD00 | Y/Y | high | +| 0x1534 | Event/A | 16 | sub_824560F8 | Y/Y | sub_8245AD00 | Y/Y | high | +| 0x1544 | Event/M | 17 | sub_82170438 | N/N | sub_82174AF8 | N/Y | medium | +| 0x1578 | Event/A | 19 | sub_823DDB58 | N/N | (none in ±32KB) | – | low | +| 0x828A3230 | Sema (XAudio) | 10 | sub_824D2940 | N/N | sub_824D21F0 | N/N | already named (AUDIT-032) | + +**Of 125 total signal-source fns** (11 direct + 114 via wrappers `sub_824AA2F0` NtSetEvent / `sub_824AB158` NtReleaseSemaphore), only `sub_8245AD00` and `sub_82450218` are statically reachable AND nearby to wedge wait-fns. **sub_8245AD00 covers 4 wedges** (0x10A0, 0x10A4, 0x1530, 0x1534) — highest priority by coverage. + +## Signaler call-count comparison (90s canary vs ours 500M-instr) + +| API | Ours | Canary | Note | +|---|---|---|---| +| **KeReleaseSemaphore** | **0** | **73,914** | All from PC `0x824D229C` = sub_824D21F0+0xAC; all r3=0x828A3230 (XAudio) | +| KeSetEvent | 1 | (low) | – | +| NtSetEvent | 3,334 | 1 | inverse — wrapper-path active in ours | +| NtReleaseSemaphore | 393 | 1 | inverse — wrapper-path active in ours | +| NtWaitForSingleObjectEx | 1,489,791 | – | – | + +The 0-vs-73,914 KeReleaseSemaphore divergence is on the XAudio mixer Semaphore, signaled from `sub_824D21F0`. **This is exactly the AUDIT-032 audio host-pump gap** — canary uses a host-side WorkerThreadMain to drive `sub_824D21F0` directly via `processor_->Execute(callback_pc=0x824D6640, args)` without a guest call site. Ours's `XAudioRegisterRenderDriverClient` correctly registers the callback but does NOT spawn a host worker thread to pump it. This is a known correctness gap, NOT a new finding. + +## Convergence pattern (CRITICAL methodological observation) + +All HIGH-priority wedges (`sub_8245AD00` covering 0x10A0/0x10A4/0x1530/0x1534) share a single fact: **the signaler is reachable, but its CALLERS are unreachable**. They sit in the same audit-009 unreachability island that all prior renderer-hunt audits (009/016/017/020/021/026/027/029/031/032/033/034/035/039/044/045/046) converge on. + +**Implication**: γ-cluster wedge fixes can't open up renderer/draws-cascade independently. They drop NO_SIGNALS counts (B+C cascade dims succeed) but D=draws>0 fails because the unreached island is upstream of every concrete wedge. + +## Discipline gate verdict (sub_8245AD00 candidate) + +- (a) Named bug class with concrete evidence: γ (signaler reachable, callers in unreached island). **PASS w/ caveat**. +- (b) LOC budget ≤ 80: **FAIL** — making sub_8245AD00 fire requires unsticking the front-end-UI cluster = audit-009 plateau. Not a bounded fix. +- (c) Sharp 4-dim cascade: A=signal_attempts 0→8+, B=tid 6/16 Blocked→Ready (high conf), C=NO_SIGNALS 10→6 (4 wedges share sub_8245AD00); **D=swaps>2 OR draws>0: NO** (front-end UI cluster ≠ renderer per RAPID-SURVEY-Q4 reframe). **PARTIAL.** +- (d) No renderer/GPU mods: PASS. +- (e) Lockstep determinism preserved: PASS. + +3/5 PASS, 2/5 FAIL/PARTIAL. + +## Strategic floor reached + +Linux Debug canary's reach genuinely caps at "intro-video frame ≈ 42/186; cluster activation never triggered." Four autonomous-mode audits (044-047) converge on this. Audits 045 confirmed the cluster ctors don't fire in canary either; audit 046 confirmed the slot-table chain is behaviorally inert; audit 047 confirms γ-cluster wedges all hit the same island. + +**To make further progress on draws>0**, ONE of these is required: +1. **Wine/Lutris canary build** to bypass Vulkan/XCB Linux presenter block → reach post-intro state machine and capture cluster activator naming +2. **Audio host-pump fix landed** (AUDIT-032) — won't open renderer but closes the largest known canary-only export gap (KeReleaseSemaphore 0→non-zero); ~60-120 LOC mirroring canary `apu/audio_system.cc:84-159` +3. **Different probing technique** — e.g., guest-thread injection to force-call cluster ctor entry points and observe what guard predicates fail (high-risk, prone to causing HW-thread hijacks per APUBUG-PRODUCER-001) + +Within Linux Debug + READ-ONLY discipline, autonomous mode has reached diminishing returns. Recommended: hand-back to user for path selection. + +## Discipline + +xenia-rs HEAD `7bc9e3a` unchanged; canary HEAD `6de80dffe` clean (`git status --short` empty). Patch reverted via `git checkout` of 4 files. Tests count untouched. Trace at `audit-runs/audit-047-gamma-wedges/{wedge-analysis.csv, wedge-analysis.txt, ours-end-state.log, ours-end-state.err, canary-KeReleaseSemaphore-0x8284E49C.log (32MB), canary-patch.diff}`. diff --git a/migration/claude-memory/project_xenia_rs_audit_048_audio_host_pump_fix_2026_05_10.md b/migration/claude-memory/project_xenia_rs_audit_048_audio_host_pump_fix_2026_05_10.md new file mode 100644 index 0000000..1e74102 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_048_audio_host_pump_fix_2026_05_10.md @@ -0,0 +1,112 @@ +--- +name: AUDIT-048 audio host-pump fix LANDED 2026-05-10 +description: Path 2 of post-autonomous handback. Implemented dedicated audio worker thread per client (mirroring canary apu/audio_system.cc:84-159 in xenia-rs's threading model). 75 net LOC across 4 files. Cascade A/B/D from AUDIT-032 prediction CONFIRMED. swaps regressed 2→1 (degenerate splash lost) but boot phase advanced past audio-wait stall into Stfs/Xam content/crypto init. draws unchanged at 0 (expected per AUDIT-032 — audio≠renderer). Working tree dirty; user re-baselines goldens separately. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-048 (2026-05-10, IMPLEMENTATION LANDED in working tree, NOT COMMITTED)**: AUDIT-032 audio host-pump fix per Plan B (dedicated worker thread, not victim hijack). Mirrors xenia-canary's `apu/audio_system.cc:84-159` `WorkerThreadMain` pattern in xenia-rs's threading model. + +## Files modified (4 crate files, no goldens, no canary) + +``` +crates/xenia-app/src/main.rs | 94 +- (try_inject_audio_callback rewrite + flag-default semantics) +crates/xenia-kernel/src/exports.rs | 82 +- (xaudio_register_render_driver spawns dedicated worker) +crates/xenia-kernel/src/state.rs | 24 +- (xaudio_tick_enabled default true) +crates/xenia-kernel/src/xaudio.rs | 41 + (worker_refs field + helpers) +``` + +107 insertions, 32 deletions = **75 net non-comment LOC**, well under AUDIT-032's 60-120 budget. + +## Design (Plan B — chosen over Plan A "just enable flag") + +The pre-existing `--xaudio-tick` flag worked via `try_inject_audio_callback` which **hijacked a random Ready/Blocked guest thread** as victim. Documented in flag's docstring as "shifts boot trajectory enough to regress swaps=2→1" — flagged as APUBUG-PRODUCER-001 per AUDIT-032. + +**Plan B**: dedicated guest worker thread per registered client, never hijack other threads. + +1. `XAudioRegisterRenderDriverClient` (exports.rs:3074): after `xaudio.register()` succeeds, also `allocate_thread_image` + `state.scheduler.spawn(SpawnParams{ entry=callback_pc, start_context=wrapped_callback_arg, create_suspended=true, ... })`. Immediately mark spawned thread `Blocked(WaitAny { handles: vec![0xF000_0000 | i], deadline: None })` — synthetic handle outside normal allocator range (≥0x1000), not registered in state.objects, so wake_eligible_waiters never finds it. Stores `ThreadRef` in new field `XAudioState::worker_refs[i]`. + +2. `try_inject_audio_callback` (main.rs:3261): replace random-victim selection with `let target_ref = kernel.xaudio.worker_refs[client_index]?`. Existing `SavedCallbackCtx::capture` + PC injection + `lr=LR_HALT_SENTINEL` + `gpr[3]=wrapped_arg` unchanged. Existing restore path (main.rs:2218) re-blocks worker on synthetic when callback returns to LR_HALT. + +3. `state.rs:309`: `xaudio_tick_enabled` default `true`. CLI flag becomes force-on/off override. + +## Cascade verification (500M-instr validation) + +Pre-fix baseline = `audit-runs/audit-047-gamma-wedges/ours-end-state.log` (-n 500M, ~3 min wall). +Post-fix log = `audit-runs/audit-048-audio-host-pump/validation-500m.log`. + +| Dim | Predicted | Observed | Pass? | +|-----|-----------|----------|-------| +| A | tid=9 leaves Blocked(WaitAny [0x828A3254]) | tid=9 was Blocked at `pc=0x824d28d0` → now Ready at `pc=0x824d1404 lr=0x824d22b4` | ✅ | +| B | tid=10 leaves Blocked(WaitAny [0x828A3230]) | tid=10 was Blocked at `pc=0x824d2990` → now Ready same `pc/lr` as tid=9 | ✅ | +| C | XAudioSubmitRenderDriverFrame 0→non-zero | `XAudioSubmitRenderDriverFrame=0` (Sylpheed hasn't reached submit phase in 500M); `XAudioGetVoiceCategoryVolumeChangeMask=1` NEW — mixer setup path executed | ◐ partial | +| D | KeReleaseSemaphore 0→non-zero | **KeReleaseSemaphore=1** (was 0); xaudio.callback.delivered=1 | ✅ | + +**Bonus**: audit-042's "real wedge" handles `0x10A0+0x10A4` worker pair on tid=6 ALSO went Blocked→Ready as a downstream effect. + +## Boot trajectory shift (the meaningful-progress signal) + +Kernel call counts changed dramatically post-fix vs pre-fix audit-047 baseline: + +| Counter | Pre-fix | Post-fix | Delta | +|---------|--------:|---------:|-------| +| NtWaitForSingleObjectEx | 1,489,791 | 30 | **-99.998%** | +| NtSetEvent | 3,334 | 68 | **-98%** | +| NtCreateEvent | 411 | 104 | -75% | +| NtReleaseSemaphore | 393 | 101 | -74% | +| KeReleaseSemaphore | 0 | 1 | NEW | +| StfsCreateDevice | 0 | 1 | NEW (Secure Transacted File System mount) | +| ObCreateSymbolicLink | 0 | 1 | NEW (VFS symlink) | +| XamContentCreateEnumerator | 0 | 1 | NEW (savegame discovery) | +| XamEnumerate | 0 | 1 | NEW (content enum) | +| XamTaskSchedule | 0 | 1 | NEW (task scheduler) | +| ExCreateThread | 0 | 10 | NEW (10 new threads) | +| KeSetAffinityThread | 0 | 7 | NEW | +| NtCreateSemaphore | 0 | 4 | NEW | +| NtWaitForMultipleObjectsEx | 0 | 94 | NEW | +| XeCryptSha | 0 | 1 | NEW (cryptography) | +| XeKeysConsolePrivateKeySign | 0 | 1 | NEW (console keys) | + +The system **left the audio-wait busy loop** (1.4M waits → 30) and **entered the savegame/content/crypto init phase**. New executables/tasks/threads spawning. Cryptographic operations firing. Storage device created. + +## swaps + draws + +- **swaps: 2 → 1**. Documented regression. Pre-fix swaps=2 came from a degenerate splash repeat (system idle on audio waits, GPU producing splash frames in tight loop). Post-fix loses one splash repeat (splash period interrupted by audio worker activity) but main thread advances PAST splash entirely. New stall: tid=1 main `Blocked(WaitAny [4736=0x1280])` at `pc=0x824ac578` — different wedge from pre-fix's tid=1 stall. +- **draws: 0 → 0**. EXPECTED per AUDIT-032 methodology correction. Audio gate ≠ renderer gate. The audit-009 cluster (front-end UI / save-game / mission-select / HUD) remains independently blocked. Renderer plateau not crossed. + +## Build/test status + +- `cargo build --release`: succeeds (clean 14.06s, incremental 6.57s). No new warnings. +- `cargo test --release -p xenia-kernel --lib`: 127/127 pass (incl. 14 xaudio tests + 4 register-driver export tests). +- `cargo test --release -p xenia-app --lib`: 5/5 non-ignored pass. +- Lockstep goldens (`sylpheed_n2m.json`, `sylpheed_n50m.json`): `#[ignore]`-gated, NOT re-baselined this session per task brief. They WILL drift on this fix; user re-baselines as separate commit. + +## Open follow-ups (NOT in scope this session) + +1. **Re-baseline lockstep goldens** under audio-on default. Single commit, isolated. +2. **Probe new wedge** — tid=1 main now Blocked on handle 0x1280 (=4736) at `pc=0x824ac578`. Identify what 0x1280 is, what's expected to signal it. Likely the next γ-class wedge per audit-047 pattern. +3. **Renderer cluster (audit-009)** — with audio unblocked, longer-horizon traces (≥1B instr) may now reach previously-gated cluster L0 PCs. Worth a 5B-instr run to see if cluster activates within ours's reach now. +4. **XAudioSubmitRenderDriverFrame** — currently a no-op stub (`exports.rs:3127-3133`). Would need real impl for game audio to render. Not blocking renderer, but a known gap. +5. **Worker hygiene** — worker thread is left parked on synthetic handle when client unregisters; Sylpheed never unregisters during boot but multi-stream titles might hit this. +6. **Multi-client untested** — Sylpheed registers exactly 1 client. Per-slot data layout should compose to 8 but verify on a multi-stream title. + +## Strategic position post-AUDIT-048 + +Path 2 (audit-032 audio fix) **DONE**. Cascade A/B/D succeed. Boot trajectory advanced past audio stall into a fundamentally new phase (Stfs/Xam content/crypto). swaps regressed 2→1 (degenerate splash) but main thread advanced past splash. draws unchanged (expected). + +User decision: did audit-048 produce "meaningful progress"? Argument for YES — boot phase fundamentally advanced; previously-blocked tids (6/9/10/12) are now Ready; new kernel paths active. Argument for NO — `swaps>2 / draws>0` plateau still uncrossed; one of the two splash swaps lost. + +If "yes": next critical-path target is the new tid=1 stall on handle 0x1280 OR a longer-horizon run to see if renderer cluster activates. If "no": queue Path 1 (Wine/Lutris canary build) for a new oracle. + +## Trace artifacts + +- `audit-runs/audit-048-audio-host-pump/findings.md` (worker agent's writeup) +- `audit-runs/audit-048-audio-host-pump/validation-500m.log` (95 KB, 879 lines) +- `audit-runs/audit-048-audio-host-pump/diff-summary.txt` (17 KB, full diff of 4 files) + +## Discipline + +- xenia-rs HEAD `7bc9e3a` UNCHANGED — working tree dirty (4 crate files + audit-findings.md + audit-runs/audit-048-*). +- xenia-canary HEAD `6de80dffe` not touched (this session's work was xenia-rs only). +- No commits. No goldens re-baselined. +- Tests count: kernel 127/127, app 5/5 non-ignored. +- Lockstep digest WILL move (intentional fix); goldens deferred. diff --git a/migration/claude-memory/project_xenia_rs_audit_049_tid1_stall_2026_05_10.md b/migration/claude-memory/project_xenia_rs_audit_049_tid1_stall_2026_05_10.md new file mode 100644 index 0000000..9568b9e --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_049_tid1_stall_2026_05_10.md @@ -0,0 +1,83 @@ +--- +name: AUDIT-049 tid=1 stall on 0x1280 — same island one layer deeper 2026-05-10 +description: Post-AUDIT-032 the new front-line stall (tid=1 main on handle 0x1280 = THREAD handle for tid=13) is a thread-join. tid=13 itself stalls on event 0x1288 created in sub_821CB030+0x128 (silph::GamePart_Title::UImpl::*) — front-end UI cluster, audit-009 island. Worker cluster's 5 NtSetEvent callers all statically unreachable. Discipline gate 3/5 PASS, same as AUDIT-047. 5th consecutive audit (044/045/046/047/049) converging on the same unreachability island. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-049 (2026-05-10, READ-ONLY, master `25704c5` unchanged)**: post-AUDIT-032 wedge analysis — investigates the new tid=1 main stall that emerged after the audio host-pump fix unblocked the boot trajectory. + +## Wedge identity + +- **Handle 0x1280**: NOT an event/sema/timer — it's a **Thread handle** for `tid=13` (`Thread(id=13, exit=None)`). tid=13 was created by `ExCreateThread` with entry `0x821748F0` and start_ctx `0x4024a640`. +- **tid=1's wait**: at `pc=lr=0x824ac578` inside `sub_824AC540` = the **NtWaitForSingleObjectEx thunk**. Reached via alias `sub_824AA330` (KeWaitForSingleObject, alertable=0). Caller chain: `sub_82173990+0x2D0` (a thread-join helper) ← `sub_822F1AA8` ← `sub_8216EA68` (vtable=`0x820a183c`, class `AVSilph@silph`, top-level `silph::Silph`). +- **tid=1 is doing a THREAD JOIN** — waiting for tid=13 to exit. + +## Sub-stall (the actual bug) + +tid=13 itself stalls on **handle 0x1288 = Event/Auto**, created at `sub_821CB030+0x128 (0x821CB158)` via NtCreateEvent thunk `sub_824A9F18`. Pattern: +1. tid=13 creates event 0x1288, stores ptr at `[r31+92]` and `[r31+96]` +2. Submits work via `bl 0x82452DC0` (work-submitter, passes event-handle ptr in r6) +3. Waits on 0x1288 INFINITE +4. The work-submitter's callback should `NtSetEvent(0x1288)` to wake tid=13 +5. The callback never fires + +tid=13 create-chain (top-down): +- thread entry `sub_821748F0` +- ← `sub_821C4EB0` (vtable `0x820a3e00` = `UImpl@GamePart_Title@silph`) +- ← `sub_821CC3F8` (vtable `0x820a3dc8` = `AVGamePart_Title@silph`) +- ← `sub_821CBA08` +- ← `sub_821CB030` — creates event 0x1288, submits work, waits + +**ALL of this chain is in the audit-009 cluster** (`0x82285000-0x82294000` neighbors per RAPID-SURVEY's reframe — front-end UI / save-game / mission-select / HUD). + +## Static signaler enumeration + +**5 NtSetEvent direct callers in worker cluster `0x82450000-0x8245C000`** are the signaler candidates: + +| Candidate | static reach | indirect reach | priority | note | +|---|---|---|---|---| +| `sub_82452DC0` (work-submitter) | YES | YES | high | depth-5 transitive call set INCLUDES `sub_824AA2F0` (NtSetEvent wrapper) — work chain *should* signal | +| `sub_82458A70` | NO | YES | medium | called from sub_82450550 / sub_82450B68 worker pool | +| `sub_824566D0` | NO | YES | medium | only via indirect dispatch | +| `sub_82458B90` / `sub_824500E8` | NO | NO | low | unreachable; sub_82458B90 is itself a wait-fn for AUDIT-047 wedges 0x10A0/0x10A4 | +| `sub_82453910` (post-wait pulse) | YES via sub_82173990 | NO | not-applicable | called from tid=1's join AFTER wait completes — can't be the signaler | + +The tight tid=13 region (`0x821C0000..0x821CD000`) contains **ZERO callers** of any signaling kernel API. All `silph::*` / `GamePart_Title::*` / `UImpl::*` callers are statically AND indirectly **unreachable from `entry_point`** — same audit-009 island that has blocked every renderer-hunt audit. + +## Discipline gate verdict + +- (a) Bug class: **γ-cluster (missing-signaler) compounded with η-island (caller unreachable)**. PASS w/ caveat. +- (b) LOC budget ≤ 80: **FAIL** — making the worker chain fire requires unsticking `silph::Silph::*` → `silph::GamePart_Title::UImpl::*` upstream state-machine bring-up. Not bounded. +- (c) Sharp 4-dim cascade prediction: A=signal_attempts on 0x1288 0→≥1 (IF worker fires); B=tid=1 unblock probable on A; C=NO_SIGNALS drops 1+; **D=swaps>1 OR draws>0: NO** (same RAPID-SURVEY-Q4 reframe — front-end UI cluster ≠ renderer). PARTIAL. +- (d) No renderer/GPU mods: PASS. +- (e) Lockstep determinism preserved: PASS. + +**3/5 PASS, 2/5 FAIL/PARTIAL** — same verdict as AUDIT-047. + +## Strategic position + +This is the **5th consecutive autonomous-mode audit** (044, 045, 046, 047, 049) to converge on the same audit-009 unreachability island. The AUDIT-032 audio fix advanced the boot trajectory ONE LAYER deeper: +- Pre-fix: spinning on audio waits, splashing twice +- Post-fix: spawned `silph::GamePart_Title::UImpl` init thread → blocked on its work-submit event + +But the new front-line stall is in the **same cluster** RAPID-SURVEY-Q4 named, just one step further in the boot. The worker-cluster activator that should fire `sub_82452DC0`'s callback is not running for the same reason audit-009's L1 PCs don't fire — the cluster isn't bootstrapped. + +**AUDIT-047's strategic-floor conclusion stands: Linux Debug canary's reach is the gate.** To make further progress, ONE of: +1. **Wine/Lutris canary build** for new oracle (path 1 from autonomous-run synthesis) +2. Different probing technique (HW-thread injection, high risk per APUBUG-PRODUCER-001) + +## Discipline + +xenia-rs HEAD `25704c5` unchanged. No source modifications. No canary patch applied (skipped per task — static picture was sufficient). `git status --short` showed only pre-existing untracked `audit-runs/*` + `M audit-findings.md` (predates audit-049). + +Trace at `audit-runs/audit-049-tid1-stall-0x1280/{ours-trace.log, signaler-analysis.csv, findings.md}`. + +## Recommended user-decision + +Path 2 (audio-host-pump) made trajectory progress (1 layer deeper into boot) but didn't cross the renderer plateau. The 5-consecutive-converge signal indicates **path 1 (Wine/Lutris canary)** is now justified per the user's prior framing: "save path 1 for after it, when we did not make any meaningful progress." + +"Meaningful" is the judgement call: +- Trajectory: YES (4 wedges unblocked, 16 new exports, +1 layer) +- Plateau: NO (swaps ≤ 2, draws = 0, same cluster blocks) + +Hand back to user. diff --git a/migration/claude-memory/project_xenia_rs_audit_050_reframe_2026_05_10.md b/migration/claude-memory/project_xenia_rs_audit_050_reframe_2026_05_10.md new file mode 100644 index 0000000..7d8c9c4 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_050_reframe_2026_05_10.md @@ -0,0 +1,63 @@ +--- +name: AUDIT-050 GENERAL-AUDIT + REVISIT — methodological reframe (2026-05-10) +description: 11 prior audits' "audit-009 cluster unreachable" claim is a STATIC-ANALYZER ARTIFACT, not a runtime fact. Runtime probing via --ctor-probe shows the cluster IS reached: CRT static-init driver sub_824ACB38 (called from entry_point directly) iterates 14 fnptr arrays at 0x82870xxx including 24 GamePart factory regs; tid=13's full chain fires (sub_821C4EB0/sub_821CC3F8/sub_821CBA08/sub_821CB030); work-submitter sub_82452DC0 fires on tid=13 at cycle 8127. Real bug is γ-class missing-signaler INSIDE sub_82452DC0's descendant tree. Angle B (long-horizon 5B) DEFINITIVELY FALSIFIED — bit-identical counters at 500M and 5B. 14th reading-error class added. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🔍 AUDIT-050 META-FINDING (2026-05-10, READ-ONLY, master `25704c5` unchanged)** — two parallel subagents (general-audit + revisit-injection) produced a methodological reframe that invalidates the central premise of audits 044-049. + +## The reframe + +**11 prior audits (009/016/017/020/021/026/027/029/031/032/033/034/035/044/045/046/047/049) claimed "audit-009 cluster unreachable" based on static `bl` BFS + M5.5 typed-vptr indirect BFS.** Direct runtime probing via `--ctor-probe` falsifies this: + +- CRT static-init driver `sub_824ACB38` (statically reachable from `entry_point=0x824ab748`) iterates `0x82870010..0x828708C4` = **557 fnptr slots** with 82 non-NULL fnptrs at boot, including 24 GamePart factory registrations. +- `sub_8280E148` = `RegisterToFactory<0,silph::GamePart_Title>` **fires at cycle 1,396,320**. +- ALL 4 of audit-049's "unreachable" tid=13 chain functions (`sub_821C4EB0`, `sub_821CC3F8`, `sub_821CBA08`, `sub_821CB030`) **DO fire on tid=13** at cycles 1882-3242. +- Work-submitter `sub_82452DC0` **fires on tid=13** at cycle 8127 from `lr=0x821cb1d0` (= `sub_821CB030+0x19C`, the call site audit-049 named). + +**The cluster IS reached at runtime. The actual bug is γ-class missing-signaler INSIDE sub_82452DC0's descendant tree** — work submission completes but the callback path that would `NtSetEvent(0x1288)` and unpark tid=13 doesn't fire. + +## 14th reading-error class + +**BFS-only reachability claims are insufficient when the binary uses CRT-driven function-pointer arrays.** The CRT static-init driver invokes fnptrs via `bcctrl` from data-resident arrays at module-load time; static `bl` BFS doesn't traverse these. Mitigation: verify with `--ctor-probe` runtime probing BEFORE claiming a function is unreached. + +## Angle B (long-horizon 5B) — DEFINITIVELY FALSIFIED + +The general-audit ran -n 5,000,000,000 in background. **Bit-identical kernel counters** to 500M baseline (NtCreateEvent 104=104, NtSetEvent 68=68, NtWaitForSingleObjectEx 30=30, VdSwap 1=1, all 90+ counters identical). Wall time 234s vs ~12s. System reaches deterministic stall and does literally nothing more. Saved at `audit-runs/audit-050-general-audit/extended-5B.err`. + +## Revisit agent: cluster injection feasibility + +Dedicated-worker pattern from AUDIT-048 IS mechanically reusable but cluster injection is **HIGH-risk crash-likely** for chain roots. The two LOW-risk targets only write ONE vptr each — single-ctor injection cannot bootstrap a multi-ctor chain. Recommend AGAINST cluster injection as a fix; only viable as ~60 LOC diagnostic probe. + +**Critical sub-finding**: cluster is **HALF-bootstrapped** — tid=13 (worker-thread subset) runs, but vtables (writer subset) are never written, so virtual dispatches in the running chain hit garbage. Reframes "entire cluster dead" → "vtable-writer subset specifically dead but worker-thread subset live." + +## Combined hypothesis + +The two subagents converge: **somewhere in `sub_82452DC0`'s descendant tree there is a virtual dispatch (or function-pointer call) that goes to a wrong/null vtable slot** because the vtable writer for that slot didn't fire. The right callback (which would NtSetEvent(0x1288)) is at a vtable slot that wasn't populated. + +## Top recommended angle: Angle I — work-submitter trace + +Probe `sub_82452DC0`'s 9 direct call targets + its 1 ind_call site in BOTH engines: +``` +0x82452DC0 -- entry +0x8245AE50, 0x82452068, 0x82452200, 0x8245B000, 0x8245B078, +0x82454A40, 0x82452AB8, 0x82454918, 0x82452EC4 +``` + +Find the divergent branch. Sharp 4-dim cascade prediction: +- A: signal_attempts on 0x1288 0→≥1 +- B: tid=13 Blocked→Ready→Exit, unparking tid=1's join +- C: NtSetEvent count delta 68→≥69 +- **D: swaps>2 OR draws>0: YES POSSIBLE** (30-50% confidence, highest since AUDIT-032) + +Wine canary is **no longer the only credible route**. Angle I has genuine EV given today's reframe. + +## Strategic position + +Sessions 1-2 of this 7-budget: General-audit + Revisit (PARALLEL, completed). +Sessions 3-7 remaining: drive Angle I to root cause + bounded fix. + +Trace artifacts at: +- `audit-runs/audit-050-general-audit/{extended-5B.err, extended-5B.log, angle-comparison.csv, db-evidence.txt}` + +xenia-rs HEAD `25704c5` unchanged. swaps=2 draws=0 plateau intact (post re-baseline). diff --git a/migration/claude-memory/project_xenia_rs_audit_051_work_submitter_trace_2026_05_10.md b/migration/claude-memory/project_xenia_rs_audit_051_work_submitter_trace_2026_05_10.md new file mode 100644 index 0000000..d6a6070 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_051_work_submitter_trace_2026_05_10.md @@ -0,0 +1,63 @@ +--- +name: AUDIT-051 work-submitter divergence at sub_8245B078 (2026-05-10) +description: Concrete divergence found inside sub_82452DC0's descendant tree. sub_8245B078 fires 18× canary / 0× ours — CANARY-ONLY. Gate at sub_82452DC0+0x78 (PC 0x82452E2C `beq cr6, 0x82452E88`) controlled by sub_8245B000 which returns 1 iff [r3+0]≠0 AND [r3+4]≠0 for an 80-byte struct at r31+96. Ours has one of those fields NULL — missing-population data divergence. Struct upstream-written by sub_8245AE50/sub_82452068/sub_82452200 (all fire in both, same shape, just ours doesn't write the right fields). Gate-failure explains AUDIT-047's 4 wedges (0x10A0/0x10A4/0x1530/0x1534) via sub_8245AD00 + plausibly tid=13's stall on 0x1288. Cascade D=draws>0 LOW-MEDIUM 20-30% (5 vtable dispatches downstream might re-hit unreachability island). +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-051 (2026-05-10, READ-ONLY, master `25704c5` unchanged, canary `6de80dffe` clean)**: Angle I work-submitter trace per AUDIT-050 reframe. 10 PCs probed in canary + ours. + +## Comparison table + +| PC | Canary | Ours | Class | +|---|---|---|---| +| 0x82452DC0 (entry) | 45 | 8 | CANARY-OVER 5.62× | +| 0x8245AE50 | 84 | 19 | CANARY-OVER 4.42× | +| 0x82452068 | 77 | 17 | CANARY-OVER 4.53× | +| 0x82452200 | 72 | 17 | CANARY-OVER 4.24× | +| 0x8245B000 | 45 | 8 | CANARY-OVER 5.62× | +| **0x8245B078** | **18** | **0** | **CANARY-ONLY (THE BUG)** | +| 0x82452AB8 | canary-crash | 9 | INCONCLUSIVE | +| 0x82454A40 | canary-crash | 2445 | INCONCLUSIVE; OURS-OVER (heavy util) | +| 0x82454918 | canary-core | 152 | INCONCLUSIVE | +| 0x82452EC4 | canary-core | 0 | INCONCLUSIVE; mid-block ind-call (RE class #13) | + +Same-shape ratio ~4.2-5.6× for all reaching PCs is the audit-009 cluster signature (tid=13 runs but fewer iters/sec because cluster is half-bootstrapped per AUDIT-050 reframe). + +## The bug + +PC `0x82452E2C` is `beq cr6, 0x82452E88` immediately after `bl sub_8245B000` (at `0x82452E1C`). `sub_8245B000` is a tiny gate that returns: +``` +return [r3+0] != 0 AND [r3+4] != 0 +``` +for an 80-byte stack-local struct at `r31+96` of `sub_82452DC0`. **In ours, at least one of those two pointers is NULL → predicate fails → branch skips `sub_8245B078`. In canary, both non-zero → predicate passes → 18 fires.** + +The struct is populated upstream by callees `sub_8245AE50` / `sub_82452068` / `sub_82452200`. All three fire in BOTH engines (same shape ratio), but ours writes incomplete data into the struct. + +## Connection to AUDIT-047 + +`sub_8245B000`'s "true" path also calls **`sub_8245AD00`** — which AUDIT-047 already identified as the signal source for 4 of the NO_SIGNALS_DESPITE_WAITS wedge handles: `0x10A0`, `0x10A4`, `0x1530`, `0x1534`. **Single root cause** for those 4 wedges: the work-submitter's local struct is malformed, gating off the signal-payload chain. + +Likely also explains tid=13's stall on event 0x1288 (audit-049) since `sub_8245AD00` may signal `0x1288` too via the same path. + +## Why divergence is direct-bl, not vtable + +`sub_82452DC0` does contain 5 indirect-dispatch sites (`pc=0x82452EC4` slot=5 cands=85; `0x82452EF0` slot=2 cands=203; `0x82452F30`/`0x82452FC4`/`0x82453014` slot=5 cands=85). But ALL of them sit AFTER the divergence at +0x78. The bug stops execution before reaching the vtable dispatches — so M5.5 vtable analysis isn't on the critical edge here. **Refines the AUDIT-050 "vtable-writer-island deadness" framing**: at the work-submitter level, the bug is data population (a stack-local struct), not vtable. The vtable issue MAY re-bite downstream of the gate-fix; that's why cascade D is only 20-30%. + +## Recommended AUDIT-052 + +Dump the 80-byte struct at `r31+96` of `sub_82452DC0` at PC `0x82452E1C` (just before `bl sub_8245B000`) in BOTH engines. Compare first 16 bytes canary vs ours; identify which of `[+0]`/`[+4]` is NULL in ours. Then probe `sub_8245AE50`/`sub_82452068`/`sub_82452200` to find the missing writer. + +Single PC, single dump in each engine. ~10 min wallclock. + +## Sharp 4-dim cascade prediction (post AUDIT-052/053 fix) + +- **A**: `sub_8245B078` fires in ours 0→≥1 — HIGH confidence (direct gate flip) +- **B**: tid=13 Blocked→Ready — MEDIUM (depends on whether `sub_8245AD00`'s tail signals `0x1288`) +- **C**: NO_SIGNALS_DESPITE_WAITS drops ≥2-4 (AUDIT-047's 4 wedges) — MEDIUM-HIGH +- **D**: `swaps>2 OR draws>0` — **20-30% LOW-MEDIUM**. The 5 ind_call sites in `sub_82452DC0+0x104..+0x254` (vtable[5]/[2] cands=85/203) sit downstream; if path enters them, vtable-writer-island deadness re-bites. AUDIT-052/053 fix is necessary but likely not sufficient. + +This is the **first concrete divergence to attack** since AUDIT-032's audio fix. + +## Discipline + +xenia-rs HEAD `25704c5` unchanged. xenia-canary HEAD `6de80dffe` clean (audit-030 patch reverted, `git status --short` empty for tracked). Trace `audit-runs/audit-051-work-submitter-trace/{canary-0x*.log×10, ours-multi-probe.log, findings.md, canary-patch.diff}`. diff --git a/migration/claude-memory/project_xenia_rs_audit_052_cache_root_cause_2026_05_10.md b/migration/claude-memory/project_xenia_rs_audit_052_cache_root_cause_2026_05_10.md new file mode 100644 index 0000000..9b6e8d8 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_052_cache_root_cause_2026_05_10.md @@ -0,0 +1,70 @@ +--- +name: AUDIT-052 — root cause cache-wipe-on-boot 2026-05-10 +description: AUDIT-051's struct hypothesis REFUTED — struct at sub_82452DC0:r31+96 is bit-identical canary vs ours. The two dwords [r3+0]/[r3+4] are halves of a hash key formatted into `cache:\\\` paths. Real bug — `NtQueryFullAttributesFile` returns -1 for cache:\* in ours because AUDIT-038's per-boot tmpdir cache is WIPED on every startup; canary's persistent cache at `~/.local/share/Xenia/cache/` has pre-built files. **Reading-error class #15**: AUDIT-038's "missing-or-stale ≡ fresh" assumption invalidated. Fix = persistent cache. Cascade A/B/C HIGH confidence; D 30-40%. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-052 (2026-05-10, READ-ONLY, master `25704c5` unchanged, canary clean)**: dumped 80-byte struct at `r31+96` of `sub_82452DC0` at PC `0x82452E1C` in both engines. + +## Hypothesis REFUTED — struct is bit-identical + +AUDIT-051 hypothesized ours had NULL in `[r3+0]` or `[r3+4]`. Both are NON-ZERO in all 8 ours fires (e.g. `d0=d4ea4615 d1=e46ee8ca`), matching canary fire 0 byte-for-byte modulo stack base address. Probing at `0x8245B020` confirms predicate 1 passes 8/8 in ours. **The struct is fine. The writers do their job. AUDIT-051's writer-triage is moot.** + +## What the struct contains + +`[r3+0]` and `[r3+4]` are **two halves of a hash key** that gets formatted into a `cache:\\\` filesystem path by `sub_82459130`: + +``` +ours fire 0 path bytes (BE-decoded): "cache:\d4ea4615\e\46ee8ca\0" + ^^^^^^^^ ^^^^^^^ + = d0 = d1 high bits +``` + +Canary's log shows `RtlInitAnsiString(...,cache:\d4ea4615\e\46ee8ca)` followed by `NtQueryFullAttributesFile(...)` for the same path. + +## Real root cause: NtQueryFullAttributesFile fails in ours + +PC-probes inside `sub_8245AD00`: +- ours probe at PC `0x8245AD94` (success path post-NtQuery): **0 fires** +- ours probe at PC `0x8245ADFC` (failure path): **8 fires** + +`NtQueryFullAttributesFile` returns -1 for **every cache lookup** in ours. This propagates: `sub_8245AD00` returns 0 → `sub_8245B000` returns 0 → gate `bne` at `0x82452E2C` skips → `sub_8245B078` never called. + +## AUDIT-038 cache-clear-on-boot at fault + +`crates/xenia-kernel/src/state.rs:387-398` (`init_cache_root`): creates a per-process tmpdir cache and **wipes it on every emulator startup** (`remove_dir_all` then `create_dir_all`). + +Canary's host cache at `~/.local/share/Xenia/cache/` is **persistent and pre-populated** with runtime-built game files: +``` +/home/fabi/.local/share/Xenia/cache/d4ea4615/e/46ee8ca +/home/fabi/.local/share/Xenia/cache/69d8e45c/... +/home/fabi/.local/share/Xenia/cache/87719002/... +/home/fabi/.local/share/Xenia/cache/aab216c3/... +``` + +These are likely shader/PSO/material caches Sylpheed builds during gameplay and persists. Ours starts fresh every time, so all `cache:\*` lookups return `STATUS_OBJECT_NAME_NOT_FOUND`. + +## Reading-error class #15 + +AUDIT-038's framing: "per-boot tmpdir cache is deterministic and safe because Sylpheed treats missing-or-stale identically to fresh." **REFUTED** — the work-submitter wakeup path is GATED on cache existence; without populated cache, no fall-through that does the work. Class: **opportunistic-cache assumption violation**. + +## Sharp 4-dim cascade prediction (post AUDIT-053 fix) + +Make cache root persistent (and either pre-populate or accept that first-boot won't cross plateau): +- **A**: `0x82452E30` fires > 0 in ours (gate passes) — **HIGH** confidence +- **B**: `sub_8245B078` fires > 0 — **HIGH** +- **C**: AUDIT-047's 4 wedges (`0x10A0/0x10A4/0x1530/0x1534`) get NtSetEvent — **HIGH** +- **D**: `swaps>2 OR draws>0` — **30-40%** (cache may not be only gate; tid=1 on 0x1280 may persist; downstream might need file CONTENTS not just existence) + +## Recommended AUDIT-053 (cheapest test) + +1. Symlink ours's cache root → canary's `~/.local/share/Xenia/cache/` (or copy contents) BEFORE `init_cache_root` wipes; OR temporarily comment out `remove_dir_all` +2. Re-run with `--pc-probe=0x82452E30,0x8245B078` and see if gate passes + +If gate passes with canary's pre-populated cache → existence is sufficient OR contents are sufficient (canary's). Either way, fix = persistent cache + (maybe) pre-population strategy. + +If gate still fails → contents matter MORE than canary's cache provides → bigger problem. + +## Discipline + +xenia-rs HEAD `25704c5` unchanged. xenia-canary HEAD `6de80dffe` clean. All probe extensions reverted via `git checkout`. Trace at `audit-runs/audit-052-struct-dump/{disasm.txt, canary-traces.txt, ours-third-pred.log, canary-0x82452E24.log, ...}`. Largest log ~135MB (canary trap). diff --git a/migration/claude-memory/project_xenia_rs_audit_053_cache_layout_bug_2026_05_10.md b/migration/claude-memory/project_xenia_rs_audit_053_cache_layout_bug_2026_05_10.md new file mode 100644 index 0000000..1a06307 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_053_cache_layout_bug_2026_05_10.md @@ -0,0 +1,81 @@ +--- +name: AUDIT-053 — cache layout aliasing bug found 2026-05-10 +description: Phase 1 (canary cache override) PASSES cascade A/B — gate at sub_82452DC0+0x78 flips on existence of cache:\\\. BUT cascade C/D FAIL — NtSetEvent dropped 68→63, VdSwap=1 unchanged, draws=0. Phase 2-4 (permanent persistent-cache fix) REVERTED due to deeper layout bug: open_cache_file treats all NtCreateFile as files, but Sylpheed wants cache:\d4ea4615 as a DIRECTORY. 0-byte file d4ea4615 blocks subsequent hierarchical creates on warm-start. 15th reading-error class — VFS layout aliasing (ζ-class). Next: AUDIT-054 fix open_cache_file to honor FILE_DIRECTORY_FILE bit. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-053 (2026-05-10, READ-ONLY at end, master `25704c5` unchanged, source mods reverted)**: persistent-cache experiment + permanent fix attempt. Phase 1 confirmed AUDIT-052's gate hypothesis. Phase 2-4 reverted due to deeper bug. + +## Phase 1: hypothesis test (canary-cache override) + +Override `XENIA_CACHE_ROOT="$HOME/.local/share/Xenia/cache"` (canary's pre-populated cache directory). + +| PC | Pre-fix | Post-fix | Verdict | +|---|---|---|---| +| `0x82452E30` (gate) | 0 | **6** | PASS | +| `0x8245B078` (callback) | 0 | **6** | PASS | +| `0x8245AD94` (cache-found) | 0 | **6** | PASS | +| `0x8245ADFC` (cache-miss) | many | **2** | PASS | + +**Cascade A/B: PASS.** Cache existence IS the gate. + +## Phase 1b: Stats vs master `25704c5` baseline + +| Metric | Baseline | Override | Δ | +|---|---|---|---| +| `VdSwap` | 1 | 1 | 0 | +| `gpu.interrupt.delivered{src=0}` | 54 | 52 | -2 | +| `NtSetEvent` | 68 | **63** | **-5** | +| `NO_SIGNALS_DESPITE_WAITS` count | 6 | 6 | 0 (different wedges) | + +**Cascade C: FAIL** (NtSetEvent went DOWN, not UP). **Cascade D: FAIL** (VdSwap=1, draws=0 unchanged). Gate passes into callback chain but downstream signaling does NOT happen as predicted. + +## Phase 2-4: permanent fix REVERTED + +Implemented `init_cache_root_ext(root, wipe)` + env-var selector — ~50 LOC. Cold-start identical to baseline (cache empty). **Warm-start REGRESSION**: + +| Metric | Cold | Warm | +|---|---|---| +| VdSwap | 1 | **0** | +| NtSetEvent | 68 | **2** | +| ExCreateThread | 10 | **3** | +| NtCreateEvent | 101 | **8** | + +**Root cause**: `crates/xenia-kernel/src/exports.rs:open_cache_file` treats ALL `NtCreateFile` as files. Sylpheed issues two overlapping path kinds: +1. `NtCreateFile cache:\.tmp disp=OVERWRITE_IF` → flat filename, file +2. `NtCreateFile cache:\ disp=CREATE` → meant as a DIRECTORY container + +We create both as files. On warm-start, the 0-byte file `d4ea4615` blocks subsequent hierarchical creates of `cache:\d4ea4615\e\46ee8ca`. Canary correctly stores it as `d4ea4615/e/46ee8ca` directory tree. + +AUDIT-038's wipe-every-boot **masked this for 14 audits**. + +## Reading-error class #15 (NEW) + +**ζ — VFS layout aliasing**. NtCreateFile path semantics: caller-specified `FILE_DIRECTORY_FILE` bit in `CreateOptions` (or path-pattern heuristic) determines whether the create is a file or directory. Treating all as files works only when the cache is wiped before each boot — which is exactly what AUDIT-038 did, masking the bug. + +Distinct from γ (missing-signaler), δ (handle namespace), ε (heap addr), η (record layout). This is host-FS modeling. + +## Recommended AUDIT-054 (Two-track, priority order) + +### Track A (high-confidence, ≤30 LOC) — fix VFS layout + +`crates/xenia-kernel/src/exports.rs:open_cache_file`. Honor `FILE_DIRECTORY_FILE` bit in CreateOptions when set. Quick verification: dump CreateOptions in existing log + grep for bit on the colliding `cache:\d4ea4615 disp=2` request. + +Fallback heuristic: if `disp=FILE_CREATE` AND path has NO extension AND a sibling `` exists → directory. + +After Track A lands, re-apply Track B (persistent cache from this audit's reverted patch). If layout fix correct → warm-start no longer regresses → cache:\

\\

succeeds → cascade C may rise. + +### Track B (if Track A doesn't unstick cascade D) + +Trace inside `sub_8245B078` with `--branch-probe` to find divergent branch on the now-reachable callback path. Sub-PCs to probe: every basic block within sub_8245B078 (need to enumerate via DB + disasm). + +## Sharp 4-dim cascade prediction (post Track A + Track B persistent cache) + +- A: gate passes (already confirmed in Phase 1) — HIGH +- B: sub_8245B078 fires + downstream callbacks runs — HIGH +- C: NtSetEvent rises ≥4 (AUDIT-047's wedges) — MEDIUM-HIGH (depends on Track A success) +- **D: swaps>2 OR draws>0** — STILL 30-40% per AUDIT-052 (cache may not be only gate) + +## Discipline + +xenia-rs HEAD `25704c5` unchanged at session end. All source mods reverted via `git checkout`. `git status crates/`: nothing to commit. cargo test 127/127 pass. Canary unmodified this session. Trace at `audit-runs/audit-053-persistent-cache/{phase1-experiment.log, phase1-canary-trace.log, baseline-master-500m.log, baseline-tmpfs-500m.log, validation-coldstart-500m.log, validation-warmstart-500m.log}`. diff --git a/migration/claude-memory/project_xenia_rs_audit_054_vfs_layout_landed_2026_05_10.md b/migration/claude-memory/project_xenia_rs_audit_054_vfs_layout_landed_2026_05_10.md new file mode 100644 index 0000000..137df58 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_054_vfs_layout_landed_2026_05_10.md @@ -0,0 +1,72 @@ +--- +name: AUDIT-054 — VFS layout fix (Track A) LANDED 2026-05-10 +description: Track A (FILE_DIRECTORY_FILE bit handling in nt_create_file/open_cache_file) committed `2a8ff95`. Golden re-baselined `ac2f89a`. Track B (persistent cache) implemented as OPT-IN via XENIA_CACHE_PERSIST=1; default remains AUDIT-038 wipe-on-boot to preserve lockstep. Track A correctness fix: cache:\ opens with disp=2 opts=0x4021 → mkdir; subsequent leaf cache:\\\ with opts=0x4020 → file. Cascade A confirmed (hierarchy correctly created on disk with PERSIST). Cascade C/D NO MOVEMENT — confirms AUDIT-053 finding that cache is necessary but not sufficient. Persistent warm-start regresses (cxx_throw=10) due to our cold-start halting at swaps=1 producing half-baked .tmp journals. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-054 (2026-05-10, IMPLEMENTATION LANDED)**: master HEAD advanced `25704c5 → 2a8ff95 → ac2f89a` (2 commits). Closes ζ-class VFS layout aliasing for `cache:\*` paths. + +## Commits + +- `2a8ff95` AUDIT-054: thread CreateOptions through NtCreateFile + opt-in cache persistence (74 net code LOC ≤ 80 budget) +- `ac2f89a` Re-baseline sylpheed_n50m golden post-AUDIT-054 (1-instruction drift `50000002 → 50000001`; all other digest fields bit-identical) + +## Track A — FILE_DIRECTORY_FILE handling + +`crates/xenia-kernel/src/exports.rs` +50 / -5. `nt_create_file` reads `create_options` from `sp + 0x54` (9th-arg slot per canary `xenia/kernel/util/shim_utils.h:49-50`); `nt_open_file` forwards `open_options` from `r8`. `open_vfs_file`/`open_cache_file` thread the options through. When `FILE_DIRECTORY_FILE` (bit 0x1) is set on a `cache:\*` path, runs `std::fs::create_dir_all` instead of `File::create`. Stored handle's `path` ends with `/` so `nt_query_information_file` reports Directory=1. + +Discriminator on disk: +- `cache:\d4ea4615` opens with `disp=2 opts=0x4021` (FILE_CREATE+FILE_DIRECTORY_FILE) → mkdir ✓ +- `cache:\d4ea4615\e` opens same way → mkdir ✓ +- `cache:\d4ea4615\e\46ee8ca` opens with `opts=0x4020` (no DIR bit) → file ✓ +- `cache:\d4ea4615e46ee8ca.tmp` (flat) opens with `opts=0x4020` → file ✓ + +The 0-byte sentinel that masked AUDIT-038's wipe behavior for 14 audits is gone. + +## Track B — opt-in persistent cache + +`crates/xenia-kernel/src/state.rs` +90 / -9. New `resolve_default_cache_root()` returns `(PathBuf, wipe: bool)`. Defaults preserved (AUDIT-038 wipe-on-boot for lockstep). Persistence opt-in: +- `XENIA_CACHE_ROOT=` — explicit user override, no wipe +- `XENIA_CACHE_PERSIST=1` — `$XDG_DATA_HOME/xenia-rs/cache` or `$HOME/.local/share/xenia-rs/cache`, no wipe +- New helper `KernelState::set_cache_root(path)` for tests/oracle setups + +## Cold/warm-start observations + +| Mode | Boot | swaps | draws | cxx_throw | Notes | +|---|---|---|---|---|---| +| default (wipe) | cold | 1 | 0 | 0 | bit-identical to master baseline | +| `XENIA_CACHE_PERSIST=1` | cold | 1 | 0 | 0 | same digest + correct hierarchy on disk | +| `XENIA_CACHE_PERSIST=1` | warm | **0** | 0 | **10** | REGRESSION — Sylpheed appends to existing .tmp; version header stale; throws `runtime_error 0xe06d7363` | +| `XENIA_CACHE_ROOT=$HOME/.local/share/Xenia/cache` | warm | 1 | 0 | ? | (per AUDIT-053 Phase 1: gate passes, sub_8245B078 fires 6×, NtSetEvent 68→63, no D progress) | + +The warm-start cxx_throw is a SECONDARY problem caused by our cold-start halting at swaps=1 producing half-baked .tmp journal files. Sylpheed's pattern: cold boot writes 400 bytes to `cache:\.tmp`; warm boot seeks to end-of-file, appends another 400 bytes; reads version header from `cache:/access` which is now stale. Real fix would be: let cold boot complete the cache cycle before quitting (or implement journal-truncate-on-mismatch, or compute the version header correctly). + +## Cascade A/B/C/D verdict + +- **Cascade A** (gate flips on cache existence): CONFIRMED in AUDIT-053 Phase 1; CORRECTED for layout via Track A +- **Cascade B** (sub_8245B078 fires): CONFIRMED in AUDIT-053 Phase 1 (6 fires with canary cache override) +- **Cascade C** (NtSetEvent rises): FAIL — went 68→63 with canary cache; downstream signaling does not happen +- **Cascade D** (swaps>2 OR draws>0): FAIL — VdSwap=1 unchanged + +**The cache fix is necessary but not sufficient.** Next gate is INSIDE `sub_8245B078`'s body or one of its callees. + +## Recommended AUDIT-055 (final session of 7-budget run) + +Probe `sub_8245B078`'s body in BOTH engines with `XENIA_CACHE_ROOT` set to canary's pre-populated cache (so the gate passes in ours and sub_8245B078 fires). Compare: +- Inner basic-block fire counts via `--pc-probe` (or `--branch-probe` if available) +- Identify the divergent branch +- Trace LR chain at divergence + +`sub_8245B078`'s callees per AUDIT-051: `sub_82459130, sub_8217FA08, sub_82454498, sub_82454580, sub_825F3C48`. Probe their entries. + +Sharp 4-dim cascade prediction (post AUDIT-055 + bounded fix): +- A: divergent branch correctly taken in ours — HIGH +- B: downstream callees fire — MEDIUM +- C: NtSetEvent rises — MEDIUM +- D: draws>0 — STILL 20-40% (per AUDIT-052 prediction; may need MULTIPLE wedge fixes) + +## Discipline + +xenia-rs HEAD `ac2f89a`. Tests: kernel 127/127, app 5/5 + sylpheed_n50m oracle pass. Goldens re-baselined cleanly. No hooks skipped. Commits authored cleanly with Co-Authored-By trailer. + +Trace `audit-runs/audit-054-vfs-layout-fix/{cold-start-500m.log, warm-start-500m.log, persist-cold-500m.log, persist-warm-500m.log, baseline-master-500m.log}`. diff --git a/migration/claude-memory/project_xenia_rs_audit_055_subB078_internal_2026_05_10.md b/migration/claude-memory/project_xenia_rs_audit_055_subB078_internal_2026_05_10.md new file mode 100644 index 0000000..e027b83 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_055_subB078_internal_2026_05_10.md @@ -0,0 +1,50 @@ +--- +name: AUDIT-055 — sub_8245B078 internal probe (final autonomous session) 2026-05-10 +description: Probed sub_8245B078's body in both engines with XENIA_CACHE_ROOT=canary. Body executes correctly — internal calls have ~98% parity (sub_8217FA08 2449/2411). Divergence is UPSTREAM: sub_82452DC0 fires 5.6× less in ours per AUDIT-051. Sharpest specific divergence: sub_8217FA08 from LR=0x82455E60 (=sub_82455DF0+0x70), canary 20 fires / ours 0. Bug class refined to δ-throughput: producer of work items doesn't fire at canary rate. Recommended AUDIT-056: probe sub_82452DC0 entry, aggregate LR distribution, identify divergent caller; recurse upward. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-055 (2026-05-10, READ-ONLY, master `ac2f89a` unchanged, canary clean post-revert)**: FINAL session of 7-session autonomous run. Probed `sub_8245B078`'s body with `XENIA_CACHE_ROOT=/home/fabi/.local/share/Xenia/cache` override on ours (so gate passes and sub_8245B078 fires). + +## Comparison table (6 PCs) + +| PC | Function | Canary | Ours | Ratio | Class | +|---|---|---|---|---|---| +| 0x8245B078 | entry | 19 | 6 | 3.2× | uniform shortfall | +| 0x82459130 | string-format | 44 | 13 | 3.4× | uniform shortfall | +| 0x8217FA08 | string-set+register | 2449 | 2411 | **1.02×** | parity | +| 0x82454498 | registry lazy-init | 3840 | 2771 | 1.39× | mixed | +| 0x82454580 | registry insert | 3101 | 2587 | 1.20× | mixed | +| 0x825F3C48 | stack-cookie check | 6743 | 5334 | 1.26× | broad | + +Plus arm-entries: `0x8245B0CC` (THEN) ours=6, `0x8245B0EC` (ELSE) ours=0. **sub_8245B078 always takes THEN in both engines.** + +## Verdict + +`sub_8245B078`'s BODY executes correctly. Internal call parity confirms cluster behavior is correct **once entered**. The remaining 3.2× shortfall is **upstream** — fewer parents activate the cluster's worker paths. + +Sharpest specific divergence: `sub_8217FA08` called from `LR=0x82455E60` (= `sub_82455DF0+0x70`). Canary 20 fires / Ours 0 fires (CANARY-ONLY at this call-site). `sub_82455DF0` is a path-walking helper with 332+ static callers spread across renderers `0x822A*-0x823C*`, audio, kernel-shim. Missing fires suggest missing resource-name-resolution pathway in sister code path. + +## Combined hypothesis (audits 051+053+055) + +Chain: `producer → sub_82452DC0 → sub_8245B000 (cache check) → sub_8245B078 → sub_8217FA08`. AUDIT-054's cache fix closed the gate at sub_8245B000 (cascade A/B). The remaining 3-4× shortfall is upstream of sub_82452DC0 — the **producer of work items** isn't firing at canary's rate. + +Bug class refined to **δ-throughput** (signal/work generator firing less than canary; not content/correctness divergence at this layer). + +## Recommended AUDIT-056 (post-7-budget, user-triggered) + +**Work-submitter parent identification**. Probe `sub_82452DC0` entry in BOTH engines; aggregate LR distribution; identify top divergent caller. Recurse upward 2-3 levels until divergence dilutes. Per audit-051 we know LR=0x82448120 is dominant in ours; canary's distribution at sub_82452DC0 entry not yet aggregated. Effort: 30 min wallclock, ≤4 PC probes. + +Sharp 4-dim cascade prediction (if AUDIT-056 identifies and fixes the divergent producer): +- A: sub_82452DC0 fires ours 8 → ≥30 (close half the gap) +- B: sub_8245B078 fires ours 6 → ≥15 +- C: sub_82455DF0 fires ours 0 → ≥5 +- **D: draws>0** 20-30% (cluster fed at parity may advance state machine; Linux Vulkan/XCB still blocks intro display). **swaps>2** 40-50% (audio + cluster + cache could combine to advance past splash). + +Alternative if AUDIT-056 finds producer is in audit-009 island: **AUDIT-057 = M5.5 alias-aware static dispatch resolution** to enumerate vtable entry paths into the cluster, then targeted probe/inject. + +## Discipline + +xenia-rs HEAD `ac2f89a` unchanged. Canary HEAD `6de80dffe` clean post-revert. No source mods this session, no commits. Trace `audit-runs/audit-055-sub_8245B078-internal/{canary-0x*.log, ours-multi-probe.log, canary-0x8245B078-redo.log}` ~15 MB. + +Note: during canary build, lost ~10 min recovering from hardware-induced cache corruption (`executable_addr_flags.bin` truncated to 0 bytes by SATA bus-error per `dmesg`); deleted to allow canary rebuild. Documented for recurrence. diff --git a/migration/claude-memory/project_xenia_rs_audit_056_producer_trace_2026_05_10.md b/migration/claude-memory/project_xenia_rs_audit_056_producer_trace_2026_05_10.md new file mode 100644 index 0000000..d1af831 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_056_producer_trace_2026_05_10.md @@ -0,0 +1,58 @@ +--- +name: AUDIT-056 — sub_82452DC0 producer trace 2026-05-10 +description: Aggregated LR distribution at sub_82452DC0 entry in canary (45 fires/60s) vs ours (14 fires/26s). Two distinct CANARY-ONLY divergence introducers: (a) sub_821C4EB0 calls sub_821CEDF8 5× in canary, 0× in ours — internal cascade missing despite identical entry conditions; (b) sub_824AFF88 thread-trampoline fires 5× in canary, 0× in ours (12 vs 30 XThread count gap). The 3.21× sub_82452DC0 throughput ratio matches the 3.0× thread-count ratio. Reading-error #13 limits internal probing inside sub_821C4EB0; need --lr-trace granularity extension. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-056 (2026-05-10, READ-ONLY, master `ac2f89a` unchanged, canary clean post-revert)**: producer-trace per AUDIT-055 recommendation. + +## LR distribution at sub_82452DC0 + +| LR | Canary | Ours | Resolved fn | +|---|---|---|---| +| `0x82452E64` | 19 | 6 | sub_82452DC0+0xA4 (self-recursion) | +| `0x82460CC8` | 7 | 2 | sub_82460B70+0x158 (0 static callers — vtable entry) | +| `0x821CBF7C` | **7** | **0** | sub_821CBEA8+0xD4 | +| `0x821C4C98` | **5** | **0** | sub_821C4AE0+0x1B8 (0 static callers — vtable entry) | +| `0x82448120` | 4 | 4 | sub_824480D0+0x50 (parity) | +| `0x821790B8` | 0 | 1 | sub_82178F60+0x158 | +| `0x821CB1D0` | 0 | 1 | sub_821CB030+0x1A0 (audit-049 chain) | + +Total: canary 45 fires / ours 14 = **3.21×** ratio. + +## Two divergence introducers + +### (a) sub_821C4EB0 internal cascade + +- Level 3 entry (sub_821C4EB0): canary 1 / ours 1 with **identical caller LR=0x82174A80** = sub_82174828+0x258. **PARITY at entry.** +- sub_821C4EB0 contains 5 unconditional `bl 0x821CEDF8` calls at offsets +0x198, +0x1C4, +0x1F0, +0x218, +0x240 +- Canary takes ALL 5; ours takes 0 +- **Ours's sub_821C4EB0 returns early before +0x198** — static disasm shows early-exit candidates at +0x44/+0x48 (r3==0 from sub_82150EF8), +0x88/+0x8C (state-byte gate), +0xBC/+0xC0 (r3==0 from sub_82172370), +0xD8..+0xE0 (jump-table on r11) +- Reading-error #13 prevents direct internal-PC verification in ours (`--pc-probe` block-entry-only) + +### (b) sub_824AFF88 thread trampoline + +- Ends with `bl __imp_xboxkrnl.ExTerminateThread` — thread entry trampoline +- Canary fires 5× (lr=0xBCBCBCBC, fresh-stack pattern); ours 0× +- Ground truth: **canary 30 XThreads (tids 1..0x1E), ours 12 XThreads — 18 missing threads** +- 3.0× thread-count ratio matches the 3.21× sub_82452DC0 throughput ratio + +## Recommended AUDIT-057 + +**Probe sub_82174828 entry** (caller of sub_821C4EB0 at LR=0x82174A80) + `--lr-trace` at the post-bl basic-block-entry PCs inside sub_821C4EB0: `0x821C4F2C, 0x821C5014, 0x821C5048`. Block-entry-only granularity will at least tell us where ours's path forks. + +If granularity insufficient (reading-error #13 reconfirmed), extend `--lr-trace` to per-instruction granularity matching canary's `--log_lr_on_pc`. + +Sharp 4-dim cascade prediction (post AUDIT-057 fix on early-return predicate): +- A: sub_82452DC0 fires 14 → 25+ in ours +- B: thread count rises 12 → 18+ +- C: sub_821CBEA8 fires 0 → ≥3 +- D: draws>0 — **20-30%** (worker-thread subset live but vtable-writer subset dead per AUDIT-050 — fixing one early-return may not bridge that) + +**Fallback** if no clean predicate: pivot to direct audit of which 18 ExCreateThread/NtCreateThread calls fail to land. Compare canary 30 vs ours 12 thread-create entry-PCs. + +Wine canary not yet justified — Linux Debug producing actionable signal. + +## Discipline + +xenia-rs HEAD `ac2f89a` unchanged. Canary HEAD `6de80dffe` clean (patch reverted). Wallclock ~30 min. Trace at `audit-runs/audit-056-producer-trace/{canary-sub*.log, ours-{parents,ggparents,internal,disp,spawnapi,path}.{jsonl,log}, lr-distribution.csv, final-summary.txt}`. diff --git a/migration/claude-memory/project_xenia_rs_audit_057_thread_gap_2026_05_10.md b/migration/claude-memory/project_xenia_rs_audit_057_thread_gap_2026_05_10.md new file mode 100644 index 0000000..ff14182 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_057_thread_gap_2026_05_10.md @@ -0,0 +1,53 @@ +--- +name: AUDIT-057 thread-creation gap characterization 2026-05-10 +description: Canary 23 thread spawns / ours 10 / 13 missing in 60s wall + 500M instr respectively. 8 distinct missing-thread spawner fns. Top is sub_825070F0 (4 missing, initializes 4 workers with shared ctx 0xBCE25340, entries 0x82506528/58/88/B8). 11 of 13 missing threads from spawners that don't fire at all in ours — same audit-050 CRT-fnptr-array unreachability pattern. AUDIT-058 = drill into sub_825070F0 activation chain. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-057 (2026-05-10, READ-ONLY, master `ac2f89a` unchanged, canary clean)**: thread-count gap characterization per AUDIT-056 recommendation. + +## Thread-spawn diff + +Canary 23 fires / ours 10 fires = **13 missing threads**. All have caller-LR=`0x824AC5F0` (kernel-internal `sub_824AC5A8` ExCreateThread wrapper); diagnostically meaningful "spawner" is the originating game fn (r5 = entry-PC of new thread, r4 = ctx). + +## Top missing-thread spawners + +| Rank | Spawner fn | Missing | Notes | +|---|---|---|---| +| 1 | **`sub_825070F0`** | **4** | Sequential entries `0x82506528/58/88/B8` (+0x30 stride, vtable-shape), shared ctx `0xBCE25340` — 4-worker pool init | +| 2 | `sub_822C6630` | 2 | Spawns `0x822C6870` twice, ctx `0x828F3300` | +| 2 | `sub_823DD838` | 2 | Spawns `0x823DDB50` twice, ctx `0x828F3C88` | +| — | `sub_821C4EB0` | 1 | AUDIT-056 producer (early-return) | +| — | `sub_821746B0` | 1 | Spawns sub_821748F0 twice in canary, once in ours | +| — | `sub_824569C0`, `sub_821701C8`, `sub_823DDD18` | 1 each | Singletons | + +8 distinct missing-thread spawner functions total. + +## Probe verification (ours --pc-probe over 500M) + +- `sub_821746B0`: **fires 1×** (tid=1, frame → sub_82173990 → sub_822F1B50 → 0x8216EE14 → entry_point). Spawns 1 thread but should spawn 2 — missed loop/branch tail. +- `sub_821C4EB0`: **fires 1×** (tid=13 root). Hits entry but early-returns before 5×-cascade (AUDIT-056 finding). +- `sub_825070F0`, `sub_822C6630`, `sub_823DD838`, `sub_824569C0`, `sub_821701C8`, `sub_823DDD18`: **0 fires** + +**11 of 13 missing threads** from spawners that don't run at all in ours. + +## Activation pattern + +None of the 8 missing-thread spawners is statically reachable from entry (`v_reachability_from_entry` = 0 for 7 of them, 1 for sub_824569C0 which still doesn't fire). Same pattern as AUDIT-050 cluster — activated via CRT-driven fnptr arrays (sub_824ACB38 driver enumerates 0x82870xxx arrays). Per AUDIT-050: 14 fnptr arrays / 82 non-NULL fnptrs detected; we know 24 GamePart factory registrations fire via `sub_8280E148`, but other fnptr arrays may not be enumerated correctly. + +## Recommended AUDIT-058 + +**Drill into `sub_825070F0`** (largest single cluster, 4 threads = highest leverage). Probe canary `--log_lr_on_pc=0x825070F0` to capture activation caller. Trace the activation chain backward to find the fnptr array entry that drives it. Compare with ours's CRT-driver enumeration to identify the missing entry. + +## Sharp 4-dim cascade prediction (post AUDIT-058 fix) + +- A: thread-count 12 → ≥16 (4 new threads spawn) — HIGH confidence +- B: tid=13's wait-on-handle 0x1218 may unblock IF new threads signal 0x1218 — MEDIUM +- C: KeReleaseSemaphore counter rises — MEDIUM +- **D: draws>0** — LOW 10-20% (multiple independent spawner gaps; one fix yields B/C but probably not draws — need several accumulated fixes) + +Honest framing: this is a **thread-creation cluster activation problem with multiple independent gaps**. AUDIT-058 likely first of a cluster of cluster-activator fixes. + +## Discipline + +xenia-rs HEAD `ac2f89a` unchanged. Canary HEAD `6de80dffe` clean post-revert. No stale processes (`pgrep -x` empty after explicit pkill). Trace `audit-runs/audit-057-thread-gap/{canary-ExCreateThread.stdout, ours.stderr, ours-pc-probe.stdout, thread-diff-analysis.csv}`. diff --git a/migration/claude-memory/project_xenia_rs_audit_058_sub825070F0_activation_2026_05_10.md b/migration/claude-memory/project_xenia_rs_audit_058_sub825070F0_activation_2026_05_10.md new file mode 100644 index 0000000..6fcd176 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_058_sub825070F0_activation_2026_05_10.md @@ -0,0 +1,50 @@ +--- +name: AUDIT-058 sub_825070F0 activation chain — wedge upstream of cluster 2026-05-10 +description: Canary fires sub_825070F0 1× at ~60s wallclock after DiscImageDevice::ResolvePath(\\dat\\movie). Static caller ladder: sub_825070F0 ← sub_824F7800 (vtable bctrl @ 0x824F7B20) ← sub_824F7CD0 ← sub_824F8398 ← sub_821B55D8 ← sub_821B6DF4 (top, 0 callers). ALL 6 ladder fns fire 0× in ours. Break is NOT a CRT-fnptr-array short-circuit — it's the AUDIT-049 main-thread wedge (handle 0x12A4) blocking the entire post-intro phase before sub_821B6DF4 can be activated. Missing piece: vtable 0x8200A208/0x8200A928 has ZERO vptr_writes in DB — the ctor that writes it is in an unreachability island. AUDIT-059 should pivot to the wedge handle rather than continue chasing the static caller ladder. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🎯 KRNBUG-AUDIT-058 (2026-05-10, READ-ONLY, master `ac2f89a` unchanged, canary clean post-revert)**: activation chain analysis for sub_825070F0 (AUDIT-057's top missing-thread spawner, 4 missing threads). + +## Canary activation ladder + +Canary fires `sub_825070F0` exactly 1× at ~60s wallclock, immediately after `DiscImageDevice::ResolvePath(\dat\movie)` (post-intro file open). Captured: `pc=0x825070F0 lr=0x824F7B24 r3=BCE25340 r4=701CF3C0 r5=BCE25AC0`. Shared-ctx `r3=0xBCE25340` matches AUDIT-057 prediction. + +Static caller ladder via `xrefs`: + +| Level | Fn | Caller xref | Reach-from-entry | +|---|---|---|---| +| L0 | `sub_825070F0` (vtable[1] of ANON_Class_713383D7) | 0 static callers — vtable-only | False | +| L1 | `sub_824F7800` | `0x824F7B20 bctrl` (computed call); only direct caller sub_824F7CD0 | False | +| L2 | `sub_824F7CD0` | `sub_824F8398 @ 0x824F83D4 bl` | False | +| L3 | `sub_824F8398` | `sub_821B55D8 @ 0x821B5B5C bl` | False | +| L4 | `sub_821B55D8` | `sub_821B6DF4 @ 0x821B6E34 b` (tail) | False | +| L5 | `sub_821B6DF4` | **0 static callers — top of chain** | False | + +`bctrl` at 0x824F7B20 has 62 indirect-dispatch candidates in DB but none is sub_825070F0 — static analyzer missed this vtable dispatch. Class `ANON_Class_713383D7` lives at vtables `0x8200A208` (and clone `0x8200A928`); both 7-method tables with sub_825070F0 at slot 1 and **zero recorded vptr_writes** — ctor that writes this vtable is in an unreachability island. + +## Where the ladder breaks in ours + +Probed all 6 ladder fns in ours via `--ctor-probe` at -n 500M: **ALL 6 fire 0×**. End-state confirms AUDIT-049 main-thread stall (PC 0x824ac578, 5 threads all Blocked on handles 0x12A4/0x12AC/0x1028/0x12B8/0x1020). No `\dat\movie` ResolvePath issued in ours. The break is **upstream of L5** — the entire post-intro phase doesn't activate. + +## Three falsifications + +(a) **NOT a CRT-fnptr-array short-circuit** of the AUDIT-050 0x82870xxx set. DB has no array in 0x82870000–0x82871000 cataloged at all (CRT driver enumerates raw .data bytes the static disassembler didn't classify). None of the 6 ladder fns is referenced from any cataloged array. + +(b) **The break is a wedge, not a short-circuit**. Even sub_821B6DF4 (deepest static-caller-less node) doesn't fire — AUDIT-049's wedge (handle 0x12A4 wait at 0x824ac578) blocks the main thread before activation propagates. Canary's fire happens AFTER intro-movie machinery completes; ours never reaches that phase. + +(c) **The actual missing piece is the vtable instance ctor**. sub_825070F0 reached via `r3->vtable[1]` at bctrl `0x824F7B20`. DB has **zero vptr_writes** recording vtables 0x8200A208/0x8200A928 — the ctor that writes them is computed-store-only OR in the unreachability island. Matches AUDIT-050: cluster is half-bootstrapped, vtable-writer subset dead. + +## Recommended AUDIT-059 + +**Pivot from caller-ladder chasing to unblocking the upstream wedge**. The 6-fn ladder isn't actionable in isolation — it's downstream of the AUDIT-049 main-thread stall. + +- **AUDIT-059A (surgical)**: extend canary patch to log writes of `0x8200A208`/`0x8200A928` into vptr slots — find the writer ctor. Cross-check ours via `--mem-watch` on the vtable instance address. Cascade: A=writer fires (likely if cluster reaches that phase), B=value matches canary, C=ctor PC identified, **D=draws>0 ~10-20%** (fixing one vtable dispatch unlocks one branch but underlying wedge persists). + +- **AUDIT-059B (γ-pivot, preferred)**: per AUDIT-049, handle `0x12A4` is the main-thread wedge handle (created by `sub_821CB030+0x128` NtCreateEvent on tid=13). Audit the producer of the matching NtSetEvent(0x12A4) in canary. This is a γ-cluster pivot. + +The most likely productive direction is **treating AUDIT-049 + AUDIT-057 as ONE unified γ-wedge investigation**. The vtable writer in 059A and the missing signaler in 059B may turn out to be the same fn that doesn't execute because the main thread is wedged. + +## Discipline + +xenia-rs HEAD `ac2f89a` unchanged. Canary HEAD `6de80dffe` clean post-revert. No stale processes. Trace `audit-runs/audit-058-sub825070F0-activation/{canary-sub825070F0.stdout, ours-ladder-probe.{stdout,stderr}}`. diff --git a/migration/claude-memory/project_xenia_rs_audit_2026_05_02.md b/migration/claude-memory/project_xenia_rs_audit_2026_05_02.md new file mode 100644 index 0000000..bf8dd4b --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_2026_05_02.md @@ -0,0 +1,104 @@ +--- +name: xenia-rs comprehensive audit (2026-05-02) — 197 IDs, swap regression solved +description: Complete 13-milestone read-only audit produced 197 finding IDs across 9 prefixes; isolated the post-P8 swap regression to PPCBUG-001 (addi truncation in commit bf8208e); identified VdSwap PM4 ring bypass + 5 P0 GPU shader bugs as renderer-blocker root causes +type: project +originSessionId: a3091846-1196-4ce0-b8b8-b0e57126f1aa +--- +## Headline outcome (2026-05-02 session) + +**Comprehensive 13-milestone audit of xenia-rs vs xenia-canary as ground truth. Read-only audit (M11 run-matrix had a one-time carve-out for git-bisection, HEAD restored). 197 finding IDs filed across 9 prefixes.** + +**Plan**: `/home/fabi/.claude/plans/we-just-did-a-cozy-honey.md` +**Charter**: `xenia-rs/audit-2026-05-charter.md` +**Final report**: `xenia-rs/audit-2026-05-final.md` ← **start here next session** +**Per-milestone reports**: `xenia-rs/audit-out/m{01..11}-*.md` (12 main + 4 M07 sub + 3 M09 sub + 5 M10 sub = ~24 files) +**Master tracker**: `xenia-rs/audit-findings.md` (3789 lines, prior session's PPCBUG entries + new M01-M11 sections) + +## SWAP regression SOLVED (2026-05-02) + +`SWAPBUG-001` (P0): the post-P8 swaps=2→1 regression is caused by **PPCBUG-001 (addi 32-bit truncation)** in commit `bf8208e` ("Phase 4 batch 3: PPCBUG-001/002/003/004/005/007 4b immediate ALU truncation"). The single `as u32 as u64` cast at `crates/xenia-cpu/src/interpreter.rs:114-118` is canary-divergent — canary does NOT truncate addi. Reverting just that one line restores swaps=2. + +Bisection trail (mechanically isolated at -n 100M lockstep): +- pre-P1, P1, P2, P3 → swaps=2 +- P4 (commit `d945aea`) → swaps=1 ← regression introduced +- Within P4: commits `f424132`, `e18a0a4`, `145a7a4` → swaps=2; **`bf8208e`** → swaps=1 +- Within bf8208e (hunk-level): only PPCBUG-001 revert restores swaps=2; PPCBUG-002/003/004/005/007 reverts leave swaps=1. + +Anomaly: PPCBUG-004 (mulli truncation) revert drops `interrupts_delivered` from 629 → 101 but doesn't change swaps. Filed as `SWAPBUG-002` (P2). + +## Renderer plateau (`draws=0`) explained (multi-causal) + +NOT a memory-subsystem bug. M06 verdict: **same-thread write-visibility is mechanically sound** (`heap.rs` derives `*mut u8` from same `membase` mapping; no per-thread cache; `bump_page_version` Release-store after byte store). + +Renderer plateau caused by a multi-component failure at GPU pipeline + kernel↔GPU seam: + +1. **VdSwap kernel-bypass** (P0, triple-confirmed): KRNBUG-Vd-04 (M07c) ↔ GPUBUG-001 (M09a) ↔ XMODBUG-013 (M10-X3). Our `vd_swap` zero-fills the reserved 64-dword ring slot with NOPs and calls `state.gpu.notify_xe_swap` directly, bypassing the ring. Canary writes a real PM4 sequence (Type-0 fetch-constant patch + PM4_XE_SWAP). Missing fetch-constant slot 0 means frontbuffer descriptor stays stale; Sylpheed's bloom/blur "sample frame N for frame N+1" path reads garbage. + +2. **WGSL shader interpreter operand-decode bugs** (3× P0): GPUBUG-100 (operand modifiers swizzle/abs/neg never read from word-1 — every ALU instr unmodified), GPUBUG-101 (`c#` constant-register selector bit masked off, every shader reads r[low7] instead of constants like WVP matrix), GPUBUG-102 (vertex fetch never applies GpuSwap endian, big-endian VBs decode as garbage). + +3. **draw_state register address bugs** (3× P0): GPUBUG-103/104/105. 8 of 26 register addresses misdecoded: VGT_DRAW_INITIATOR, VGT_DMA_BASE, VGT_DMA_SIZE, PA_SC_WINDOW_SCISSOR_TL/BR (reading SCREEN_SCISSOR), RB_COLOR_INFO_1/2/3, PA_SU_VTX_CNTL, index_size from bit 8 instead of bit 11. + +**Combined fix queue must land coherently** — partial fixes likely won't unblock visible rendering. See `audit-2026-05-final.md` recommended next sprint. + +## Parked-waiter handles (still unexplained) + +The 4 worker threads parked on `mr=true,sig=false` events (handles 0x1004, 0x100c, 0x15e4, 0x42450b5c) remain unsolved. M07a confirmed the wake pipeline (`nt_set_event` / `ke_set_event` / `wake_eligible_waiters`) is correct; producer side never fires. M10-X2 ruled out XAMBUG-001 (XamTaskSchedule) as the cause — only called once at startup, before workers spawn. + +**Recommended next investigation**: `--trace-handles` audit at -n 5B to confirm zero signals on these handles, then pivot to PPC-level trace of the singleton-handle storage addresses (e.g., 0x828F3F38 for tid=10's event handle in the singleton at +120). Likely intersects M03 PPCBUG-720..735 (control-flow corruption candidates). + +## ID summary (all 197) + +| Prefix | Total | P0 | P1 | P2 | P3 | Source | +|--------|-------|----|----|----|----|--------| +| ORACBUG | 8 | 1 | 3 | 2 | 2 | M01 (CIRCULAR goldens; sylpheed_n2m.json oracle blind) | +| PPCBUG (new, 701..735) | 32 | 1 | 6 | 7 | 18 | M02-M05 (decoder sound; M03 found regression candidates) | +| MEMBUG | 9 | 0 | 1 | 4 | 4 | M06 (write-visibility NOT broken) | +| KRNBUG | 77 | 3 | 11 | 28 | 35 | M07a/b/c/d (kernel HLE; VdSwap bypass + Mm cluster + Kf no-op) | +| XAMBUG | 16 | 2 | 1 | 7 | 6 | M08 (XamTaskSchedule + 13 async stubs + signin state) | +| GPUBUG | 33 | 6 | 12 | 8 | 7 | M09a/b/c (5 P0 shader/draw + VdSwap confirm + 3 P1 RT/texture) | +| XMODBUG | 22 | 1 | 6 | 5 | 10 | M10-X1..X5 (atomics race + write_bulk skip + IRQ timing seam) | +| ANLBUG | 1 | 0 | 0 | 1 | 0 | M11 (xenia-rs dis doesn't create SQL views by default) | +| SWAPBUG | 2 | 1 | 0 | 1 | 0 | M11 (PPCBUG-001 = swap regression cause; mulli IRQ anomaly) | +| **Total** | **197** | **15** | **40** | **63** | **82** | | + +## Key methodology wins + +1. **M01 oracle audit FIRST** caught ORACBUG-004 (sylpheed_n2m.json all-zero rendering metrics at -n 2M) before M11 needed to bisect — so M11 used -n 100M instead of -n 2M and the regression was visible. +2. **Hunk-level revert bisection** turned a 6-fix commit into 1-of-6 culprit identification in 12 minutes (~1 build + run per revert). +3. **Cross-module sub-audits independently re-derived findings** — VdSwap-bypass triple-confirmed (kernel/GPU/seam) raised confidence. +4. **M06 ruled out write-visibility** which redirected priority away from a multi-day rabbit hole. + +## Run matrix observations (-n 100M) + +- **Lockstep reproducibility**: noise floor zero except `packets` (±2.5% from GPU thread race). +- **lockstep + --reservations-table**: ~zero impact on digest at -n 100M. +- **--parallel** crashes interrupts_delivered from 629 → 2 (314× fewer vsync events) AND boosts packets to 3.87B (15× more PM4 work). Confirms KRNBUG-D08 / XMODBUG-011 (VSYNC_INSTR_PERIOD calibration broken under `--parallel`). +- **--parallel + --reservations-table**: pathologically slow (>32 min for -n 100M). Avoid this combo. + +## DuckDB state at audit close + +- Old DB archived: `xenia-rs/sylpheed.db.apr18.bak` (268 MB, Apr 18). +- Regenerated: `xenia-rs/sylpheed.db` (279 MB, May 2). Row counts identical to Apr 18 — static analysis path is deterministic across the entire P1-P8 audit-fix delta. +- ANLBUG-001 (P2): regenerated DB has NO application views (`v_branch_xrefs`, etc.). The schema-golden test creates them; the user-facing CLI does not (gated on `--analyze=Sql` or `--analyze=Both`, default is `Rust`). + +## Workspace state at audit close + +- HEAD: `caa37fc` (unchanged from session start). +- Tests: **551 passed, 0 failed**. +- Files modified: only audit output files (`audit-2026-05-charter.md`, `audit-2026-05-final.md`, `audit-out/m*.md`, `audit-runs/`, `audit-findings.md`). Zero `crates/` modifications. `sylpheed.db` regenerated; `sylpheed.db.apr18.bak` archived. + +## Recommended next sprint (from audit-2026-05-final.md) + +1. **Hour 0**: Revert SWAPBUG-001 (1 line at interpreter.rs:114-118). Confirm swaps=2 returns. +2. **Hour 1-2**: Add sylpheed_n50m.json golden (ORACBUG-004 fix) — protects future fixes from oracle blindness. +3. **Day 1-2**: Renderer P0 batch — VdSwap PM4 ring rewrite (KRNBUG-Vd-04 + GPUBUG-001 + XMODBUG-013). +4. **Day 3-5**: Shader P0 batch — GPUBUG-100/101/102 + GPUBUG-103/104/105 (operand modifiers + constant-reg selector + vertex endian + 8 register addresses). After Day 5, **renderer plateau should break** — Sylpheed should reach first visible frame. +5. **Day 6-7**: KRNBUG-Mm cluster + XAMBUG-001/002. +6. **Day 8+**: VSYNC timing recalibration + remaining P1s. + +## Cross-references to prior memory + +- Builds on: `project_xenia_rs_ppc_audit_2026_04_29.md` (the prior PPC audit; PPCBUG-001 from THAT audit is now SWAPBUG-001 cause). +- Builds on: `project_xenia_rs_addis_signext_root_cause_2026_04_29.md` (the addis fix; PPCBUG-001 was an over-extension of the same pattern to addi). +- Builds on: `project_xenia_rs_sylpheed_event_chain_2026_04_29.md` (parked-waiter chain; M10-X2 confirmed wake pipeline correct, ruled out XamTaskSchedule). +- Builds on: `project_xenia_rs_sylpheed_stage3_2026_04_29.md` (4-handle parked-waiter map; still unexplained at this audit's depth). diff --git a/migration/claude-memory/project_xenia_rs_audit_2026_05_followup_session.md b/migration/claude-memory/project_xenia_rs_audit_2026_05_followup_session.md new file mode 100644 index 0000000..82a3e00 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_audit_2026_05_followup_session.md @@ -0,0 +1,114 @@ +--- +name: audit-2026-05 follow-up session 2026-05-03 +description: post-fix-sprint follow-up — 3 audit IDs landed (GPUBUG-DRAIN-001, KRNBUG-AUDIT-001, KRNBUG-D08); parked-waiter producer-trace confirms hypothesis (A) — producer is genuinely missing, not a wake-eligibility bug +type: project +originSessionId: 6e3902ad-3b3c-44e9-9261-badd25e38ae8 +--- +**🎯 FOLLOW-UP SESSION COMPLETE (2026-05-03)** — 3 audit IDs closed +across 3 commits, all merged to master with `--no-ff`. + +**Why**: The 2026-05-03 fix sprint left two visible blockers — the +"PM4_XE_SWAP not consumed by drain" warning under +`--parallel --reservations-table`, and 4 parked-waiter handles +gating `draws=0`. This session diagnosed and fixed the warning, +landed a focused diagnostic that **decisively distinguishes +"missing producer" from "wake-eligibility bug"**, and converted +v-sync to wall-clock under `--parallel`. + +**How to apply**: Master HEAD post-session: `b54aa48`. Stable-digest +lockstep -n 100M digest BIT-IDENTICAL to pre-session aa3f1d3. + +## Headline data points + +- **Tests**: 556 → 561 (+3 wallclock-vsync, +2 ghost-trail) +- **Lockstep `--stable-digest` -n 100M**: bit-identical to master HEAD `aa3f1d3` + (`{instructions:100000002, imports:987685, swaps:2, draws:0, ...}`) +- **`--parallel` PM4 warning**: 2 → **0** +- **`--parallel` interrupts_delivered (-n 30M)**: ~2 → **17** (FIFO cap=4 still throttles) +- **Parked-waiter signal_attempts**: confirmed **0** for all 4 handles after 500M lockstep instructions + +## Three commits landed (master post-session HEAD `b54aa48`) + +1. **`7a1b6b3`** — `GPUBUG-DRAIN-001` (vd_swap PM4 fallback warning) + - New `GpuSystem::drain_until_wptr(target, time_budget)` mirrors canary `WorkerThreadMain` predicate (ring read != target). Inline `drain_to_current_wptr` switches to it. + - DrainFence handler in worker now publishes the digest mirror BEFORE replying so the CPU's post-drain `digest_snapshot` sees latest stats (was racing the outer-loop publish at line 619). + - **vd_swap NO LONGER injects PM4 packets into the ring**. Tail-injection is unreliable: under `--parallel` either backend, the ring backs up past 4096 / 900 ms before vd_swap can drain to its injected PM4_XE_SWAP packet at the tail. Direct `notify_xe_swap` is now the canonical path. Documented `GPUBUG-FETCH-PATCH-001` as deferred (slot-0 fetch-constant patch — bloom/blur N+1; no observable effect while draws=0). + +2. **`d1105aa`** — `KRNBUG-AUDIT-001` (focused parked-waiter ghost-trail diagnostic) + - New `--trace-handles-focus=` CLI flag (hex/decimal, comma-separated). Implies `--trace-handles`. + - `HandleAudit` gains `focus: HashSet` and `ghost_trails: HashMap`. `record_signal` auto-falls-through to ghost-trail capture when no primary trail exists AND handle is in focus. + - `dump_thread_diagnostic` emits a "=== Handle audit (focus) ===" section with per-handle DIAGNOSIS conclusions classified by GuestExport / KernelInternal source. `` marker for handles where `waiter_count > 0 && primary_waits == 0` (waiter parked via non-audited path). + - +2 unit tests in `audit::tests` covering ghost-trail behavior. + +3. **`27d3608`** — `KRNBUG-D08` (wall-clock v-sync under --parallel) + - `tick_vsync_instr(instr_count)` (legacy, used by lockstep) and `tick_vsync_wallclock()` (new, used by `--parallel`). `KernelState::parallel_active` flag selects. + - Wall-clock fires `floor(elapsed / VSYNC_PERIOD)` v-syncs and advances anchor by full periods (no lazy backlog). Capped at INTERRUPT_QUEUE_CAP per call. + - Lockstep determinism preserved (instr-count proxy is bit-stable; goldens unchanged). + - +3 unit tests covering anchor seeding, single-period fire, burst cap. + +## Parked-waiter trace — DECISIVE FINDING + +Run: `xenia-rs exec sylpheed.iso --halt-on-deadlock --trace-handles-focus=0x1004,0x100c,0x15e4,0x42450b5c -n 500_000_000` (lockstep, 19 s wallclock). + +``` +handle=0x00001004 kind=Event/Manual waiters=1 signaled=false + signal_attempts=0 (primary=0, ghost=0) waits=1 wakes=0 + created cycle=0 tid=1 lr=0x824a9f6c src=NtCreateEvent + timeline: cycle=0 tid=10 lr=0x824ac578 src=do_wait_single[wait] + GuestExport=0 KernelInternal=0 waits=1 + => producer is a missing kernel signal source (or BST-paradox upstream) +``` + +Same shape for 0x100c (tid=2) and 0x15e4 (tid=16). Same creator +`lr=0x824a9f6c` for all 3 — single function creates them. Same wait-call +wrapper `lr=0x824ac578` — single wait wrapper. 3 sibling worker threads +all parked on "work-available" notifications. + +`0x42450b5c` shows `kind=` + ``. It's a +heap-pointer object (per +[stage3 memory](file:///home/fabi/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/project_xenia_rs_sylpheed_stage3_2026_04_29.md)) +and the waiter parks via a non-`do_wait_single`/`do_wait_multiple` path. + +**KEY CONCLUSION**: For the 3 Event/Manual handles, **the producer is +genuinely missing from the entire 500M-instruction execution**. +Hypothesis (A) confirmed; hypothesis (B) (PPC-vs-Rust BST-paradox) +RULED OUT for these specific handles. The renderer plateau is +**NOT** a wake-eligibility bug — it's a **missing kernel signal +source**. + +## Engineering gotchas saved across the sprint + +- **`drain_until_wptr` time budget** can fire under `--parallel`+inline (logged at debug because expected). Inline path was the user's `--ui --parallel` invocation; under `--parallel` workloads any inline drain will see 8-10M backlog packets. Time-budget exhaustion is normal there; the warning is for runaway IBs only. +- **DrainFence digest publish race**: outer-loop digest publish at handle.rs ~619 is gated by `did_work` from the per-iteration drive loop; DrainFence handler does NOT set that flag. Without the explicit publish at the end of the DrainFence body, CPU's post-drain `digest_snapshot` returns stale values. +- **Lockstep `interrupts_delivered` was already non-deterministic** — runs of -n 100M produce instructions ∈ {100000001..100000007} and imports ∈ {407665..407669}. The `--stable-digest` view excludes `interrupts_delivered`; `instructions`/`imports` are also excluded from oracle comparison via `--stable-digest`. So Phase 3's wall-clock change under `--parallel` is invisible to the oracle. +- **`KernelState.parallel_active` was added** as a runtime flag set at startup. It now drives `coord_pre_round`'s ticker selection and is the canonical place for any future "behave-differently-under-parallel" decisions. +- **`record_signal` auto-fallthrough to ghost-trail**: a single-line change (in audit.rs) avoided having to wire 8 separate hook sites in exports.rs. Reviewer confirmed this catches all signal sources via `state.audit_signal`. +- **3-handle creator `lr=0x824a9f6c`**: single producer location. Disassemble around that PC to find what function creates the 3 events; that function is also where the producer-trigger code SHOULD be wired but isn't. +- **GPUBUG-FETCH-PATCH-001** (deferred): re-enabling slot-0 fetch-constant patch (canary + `xboxkrnl_video.cc:438-521`) requires a side-channel (e.g. new `GpuCommand::PatchFetchConstant`) + rather than ring injection. Defer until draws > 0 (then bloom/blur N+1 starts to matter). + +## Recommended next session + +1. **Producer hunt for the 3 Event/Manual handles**. Identify guest function at the shared + wait-call wrapper `lr=0x824ac578`; walk its callers. Find what kernel signal source + SHOULD be wired for each handle. Likely candidates: file I/O completion + (`signal_io_completion_event`), XamTaskSchedule callback (deferred F2), XAudio + buffer-complete (`XAudioRegisterRenderDriverClient` is a one-shot stub), Timer DPC + delivery (KeSetTimer real impl but APC routing may be wrong). +2. **Raise INTERRUPT_QUEUE_CAP** for `--parallel`: 3044 dropped vsyncs at -n 30M --parallel + shows the FIFO is the next bottleneck once wall-clock fires correctly. +3. **F2/F3** (XAM async completion) — especially if Phase 2 of next session pinpoints a + missing XAM producer. +4. **KRNBUG-Mm cluster** still deferred from prior sprint. + +## Files of note + +- [`xenia-rs/crates/xenia-gpu/src/gpu_system.rs`](xenia-rs/crates/xenia-gpu/src/gpu_system.rs) — new `drain_until_wptr` +- [`xenia-rs/crates/xenia-gpu/src/handle.rs`](xenia-rs/crates/xenia-gpu/src/handle.rs) — inline drain rewire + DrainFence digest publish +- [`xenia-rs/crates/xenia-kernel/src/exports.rs`](xenia-rs/crates/xenia-kernel/src/exports.rs) — vd_swap simplified (no PM4 inject; direct notify) +- [`xenia-rs/crates/xenia-kernel/src/audit.rs`](xenia-rs/crates/xenia-kernel/src/audit.rs) — focus/ghost_trails +- [`xenia-rs/crates/xenia-kernel/src/interrupts.rs`](xenia-rs/crates/xenia-kernel/src/interrupts.rs) — `tick_vsync_instr`/`tick_vsync_wallclock` +- [`xenia-rs/crates/xenia-kernel/src/state.rs`](xenia-rs/crates/xenia-kernel/src/state.rs) — `parallel_active` field +- [`xenia-rs/crates/xenia-app/src/main.rs`](xenia-rs/crates/xenia-app/src/main.rs) — CLI surface + ticker dispatch + DIAGNOSIS report +- [`xenia-rs/audit-findings.md`](xenia-rs/audit-findings.md) — appended "Follow-up session 2026-05-03 — outcome" diff --git a/migration/claude-memory/project_xenia_rs_autonomous_run_synthesis_2026_05_10.md b/migration/claude-memory/project_xenia_rs_autonomous_run_synthesis_2026_05_10.md new file mode 100644 index 0000000..05407d6 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_autonomous_run_synthesis_2026_05_10.md @@ -0,0 +1,72 @@ +--- +name: Autonomous-mode run 2026-05-09/10 — synthesis (audits 044-047) +description: User granted 10 autonomous sessions on 2026-05-09. Sessions 5-8 used (audits 044-047, READ-ONLY). All four refuted speculative hypotheses; no fix landed. Methodological floor reached: every concrete wedge or probe target converges on the audit-009 cluster unreachability island, which is past Linux Debug canary's reach (RECONCILE-B host-presenter block). Sessions 9-10 deliberately NOT consumed — diminishing returns within Linux Debug + READ-ONLY discipline. User hand-back for path selection. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**📋 AUTONOMOUS RUN SYNTHESIS (2026-05-10, hand-back to user)** + +User grant on 2026-05-09: "After this session finished I permit you to complete up to ten more sessions on your own, without having to pause and ask for my approval." Sessions consumed in autonomous mode: 044, 045, 046, 047 (counted as 5-8 of 10; sessions 1-4 covered the M1-M12 + 5 closers overhaul work + audit-038 cache fix + audit-041/042/043 wedge probing). Remaining unused: 9-10. + +## What the autonomous run produced + +### Refuted hypotheses (4) + +1. **AUDIT-044 hypothesis "missing cluster constructor sub_8228F858 is bootstrap divergence"** → REFUTED by AUDIT-045 (T1/T2=0/0 in canary at 50s — cluster ctors don't fire in canary either at this horizon). + +2. **AUDIT-035 hypothesis "slot-pointer heap-region divergence (canary 0xBC3xxxxx vs ours 0x4024xxxx) is causally responsible for predicate failure"** → REFUTED by AUDIT-046 (predicate at 0x82450904 compares within each engine's own heap region; both engines run identical 5/5 iters and fall through to no-match exit). + +3. **AUDIT-034 hypothesis "canary 3.75/5 vs ours 5/5 loop iteration divergence"** → REFUTED by AUDIT-046 at current revision (both engines exhibit identical 5/5 loop shape with 0 early matches). + +4. **Implicit "γ-cluster wedge fix can independently open draws cascade"** → REFUTED by AUDIT-047 (best near-reachable signaler `sub_8245AD00` covers 4 wedges, but its callers sit in same audit-009 unreachability island that has blocked every renderer-hunt since audit-009 itself). + +### New reading-error class (13th) + +**Probe-firing-granularity divergence** (AUDIT-045): xenia-rs `--pc-probe` fires only at basic-block entry; canary `--log_lr_on_pc` fires per-instruction inline in HIR. Mid-block PCs systematically yield ours=0 even when the code executes equivalently. Mitigation: prefer function-entry or post-bl block-entry PCs. Workaround validated in AUDIT-046 (probed `0x82450908` post-bne block-entry instead of mid-block `0x82450904`). + +### Confirmed observations (no behavior change but structural truth) + +- **Cluster `0x82285000-0x82294000` activation is past Linux Debug canary's 50s horizon** — confirmed by 0/0 fires on cluster ctor PCs in canary AND ours. +- **Audio host-pump gap (AUDIT-032) is dramatic**: KeReleaseSemaphore = 0 ours / 73,914 canary in 90s; all from PC `0x824D229C = sub_824D21F0+0xAC`, all r3=0x828A3230 (XAudio mixer). Already named in AUDIT-032 — a known correctness issue. +- **125 signal-source fns total** (11 direct + 114 wrapper-routed). Only **2 of those 125 are statically reachable AND near a wedge wait-fn**: `sub_8245AD00` (covers 0x10A0/0x10A4/0x1530/0x1534) and `sub_82450218` (covers 0x1040). Both have unreachable callers. +- **Wedge convergence**: 10 NO_SIGNALS_DESPITE_WAITS handles inventoried; per-handle create-LR + wait-LR + expected-signaler tabulated in `audit-runs/audit-047-gamma-wedges/wedge-analysis.csv`. + +## What the autonomous run did NOT produce + +- No fix landed. +- No `swaps>2` or `draws>0` cascade. +- No new oracle (e.g., Wine/Lutris canary build). + +## Methodological floor reached + +Within (a) Linux Debug canary as oracle + (b) READ-ONLY discipline + (c) `--pc-probe` and `--log_lr_on_pc` as primary probes, autonomous mode has hit diminishing returns. Every concrete wedge or probe target converges on the same fact: **the audit-009 cluster activation gate is upstream of all visible divergence in Linux Debug canary at the 50s/15min horizons we can reach**. + +Linux Debug canary stops at intro-video frame ≈ 42/186 due to Vulkan/XCB host-presenter block (RECONCILE-B); Lutris Windows reaches frame 72/186 but is not instrumentable from our toolset. The cluster activation event happens **after** Linux's stall point. + +## Three paths forward (user hand-back) + +1. **Wine/Lutris canary build** (HIGH effort, HIGH potential): rebuild xenia-canary under Wine/Lutris on Linux to bypass the Vulkan/XCB host-presenter; rerun audit-045 T1/T2 probes there to capture cluster ctor activation chain post-intro. Requires non-trivial Linux-Wine dev environment setup; canary's CMake/Ninja toolchain must be portable to Wine cross-compile. Risk: high. Reward: only path to NEW evidence within current methodology. + +2. **Audio host-pump fix (AUDIT-032 known)** (MEDIUM effort, LOW renderer impact): implement host-side audio worker thread (60-120 LOC mirroring `xenia-canary/src/xenia/apu/audio_system.cc:84-159`) — closes the KeReleaseSemaphore 0-vs-73914 gap. Sharp cascade per AUDIT-032: A=tid 9 unparks on first sub_824D29F0:KeSetEvent(0x828A3254); B=tid 10 unparks on next sema release; C=XAudioSubmitRenderDriverFrame >0; D=KeReleaseSemaphore non-zero. **D=draws>0 explicitly NO** per AUDIT-032 methodology correction. Lands as canary-correctness restoration. + +3. **Different probing technique** (HIGH risk): guest-thread injection to force-call cluster ctor entry points and observe what guard predicates fail. Per APUBUG-PRODUCER-001 history, this caused HW-thread hijacks; high risk of regression. + +## Recommended user-decision + +If the goal is **draws > 0 cascade**, path 1 (Wine/Lutris canary) is the only credible route. Path 2 closes a known gap but explicitly doesn't reach draws. Path 3 is unsafe. + +If the goal is **closing canary-correctness gaps without draws focus**, path 2 is bounded and well-characterized. AUDIT-032's sister memory file has the fix sketch. + +The autonomous-mode run has done all the diagnostic work it can within its bounds. Sessions 9-10 of the 10-budget remain unconsumed — held in reserve for whichever path the user picks. + +## Trace continuity + +All four autonomous audits left clean (READ-ONLY) traces: +- `audit-runs/audit-044-m55-cluster-survey/` (DB queries + survey) +- `audit-runs/audit-045-cluster-ctor-probe/` (canary + ours probes) +- `audit-runs/audit-046-loop-exit/` (canary + ours probes) +- `audit-runs/audit-047-gamma-wedges/` (wedge inventory + canary signaler probe) + +Memory files: `project_xenia_rs_audit_044_*`, `_045_*`, `_046_*`, `_047_*`. MEMORY.md index updated. + +xenia-rs HEAD `7bc9e3a` unchanged across all four audits. Canary `6de80dffe` reverted clean each session. Tests count 645 (preserved from audit-038 baseline). swaps=2 draws=0 plateau intact. diff --git a/migration/claude-memory/project_xenia_rs_cli.md b/migration/claude-memory/project_xenia_rs_cli.md new file mode 100644 index 0000000..8971a3a --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_cli.md @@ -0,0 +1,178 @@ +--- +name: xenia-rs CLI Reference +description: Complete CLI commands, arguments, and environment variables for the xenia-rs tool — update this when CLI changes +type: project +originSessionId: 08576735-74b4-4180-994a-2eb93dc60997 +--- +> **Update trigger**: Whenever the xenia-rs CLI changes (new commands, flags, env vars), update this file and the MEMORY.md index entry. Last documented: 2026-04-22 (added `exec --halt-on-deadlock` + `XENIA_HALT_ON_DEADLOCK` env var for deadlock investigation — bypasses the force-wake recovery path so the ctx snapshot survives). + +## Binary +`xenia-rs` — Xbox 360 XEX/XISO reverse-engineering toolchain + +**CLI framework**: Clap 4.x with derive macros +**Entry point**: `xenia-rs/crates/xenia-app/src/main.rs` +**Observability module**: `xenia-rs/crates/xenia-app/src/observability.rs` + +--- + +## Global flags (apply to every subcommand) + +| Flag | Type | Effect | +|------|------|--------| +| `--log-json` | bool | Force JSON on the console fmt layer (default is compact text, stderr) | +| `--log-file ` | path | Additionally write logs to file via non-blocking appender. `.json` extension → JSON formatter; else text | +| `--log-filter ` | env-filter | Overrides `RUST_LOG`. Precedence: `--log-filter` > `RUST_LOG` > default (`warn` for `exec --quiet`, else `info`) | +| `--trace-chrome ` | path | Emit Chrome `about:tracing` JSON of all spans (uses `tracing-chrome`). Loadable in `chrome://tracing` / Perfetto | +| `--profile ` | path | Start pprof sampling profiler at 100 Hz. `.svg` → flamegraph; `.pb` → pprof protobuf. Requires `profiling` Cargo feature (on by default) | + +**Cargo features** on `xenia-app`: +- `profiling` (default): pulls `pprof = { features = ["flamegraph", "protobuf-codec"] }`. Disable with `--no-default-features` for minimal release builds; `--profile` then fails at startup with a clear error. + +--- + +## Commands + +### `disasm` — Disassemble XEX from entry point +``` +xenia-rs disasm [-n ] +``` +| Arg | Type | Default | Description | +|-----|------|---------|-------------| +| `path` | String (positional) | — | Path to XEX file | +| `-n, --count` | usize | 64 | Number of instructions to disassemble | + +--- + +### `exec` — Load and execute XEX with tracing +``` +xenia-rs exec [-n ] [--ips-limit ] [--db ] [--trace-instructions] [--trace-imports] [--trace-branches] [--quiet] [--ui] [--halt-on-deadlock] +``` +| Arg | Type | Default | Description | +|-----|------|---------|-------------| +| `path` | String (positional) | — | Path to XEX file | +| `-n, --max-instructions` | Option\ | none | Max instructions before stop (unlimited if omitted) | +| `--ips-limit` | Option\ | none | Throttle to N instructions per second (unlimited if omitted). Check runs once per scheduler round at `run_execution`'s outer loop — anchor is `Instant::now()` at function entry | +| `--db` | Option\ | none | SQLite DB path; includes full static analysis + opt-in trace tables | +| `--trace-instructions` | bool flag | false | Log each instruction to `exec_trace` table | +| `--trace-imports` | bool flag | false | Log kernel/import calls to `import_calls` table | +| `--trace-branches` | bool flag | false | Log taken branches to `branch_trace` table | +| `--quiet` | bool flag | false | Suppress banners, kernel logs, register dump | +| `--ui` | bool flag | false | Open winit+wgpu window for dynamic analysis; backs XamInputGetState with gilrs; presents guest frontbuffer on VdSwap; CPU runs on worker thread. HUD shows swap count + last frontbuffer addr + pad state. Phase 1: no PM4/shader execution, so the frontbuffer is typically black for real games — HUD remains live. | +| `--halt-on-deadlock` | bool flag | false | At the hard-deadlock branch in `run_execution` (all live HW threads `Blocked` on handle waits, no pending timer), emit a per-HW-slot `warn!` with `tid`/`state`/`pc`/`lr`/`sp` and break instead of force-waking waiters with `STATUS_TIMEOUT`. Increments `scheduler.deadlock_halts` metric; sets the UI shutdown flag so the window closes alongside the worker. Default is force-wake (preserved probe-run behaviour — counts as `scheduler.deadlock_recoveries`). Also settable via `XENIA_HALT_ON_DEADLOCK=1`. | + +--- + +### `browse` — Browse XISO disc image contents +``` +xenia-rs browse +``` +| Arg | Type | Default | Description | +|-----|------|---------|-------------| +| `path` | String (positional) | — | Path to XISO file | + +--- + +### `info` — Display XEX header information +``` +xenia-rs info +``` +| Arg | Type | Default | Description | +|-----|------|---------|-------------| +| `path` | String (positional) | — | Path to XEX file | + +--- + +### `extract` — Extract PE image and metadata from XEX +``` +xenia-rs extract [-o ] [--db ] +``` +| Arg | Type | Default | Description | +|-----|------|---------|-------------| +| `path` | String (positional) | — | Path to XEX or ISO file | +| `-o, --output` | Option\ | input dir | Output directory | +| `--db` | Option\ | none | SQLite DB; writes `metadata`, `sections`, `imports` tables | + +--- + +### `dis` — Full disassembly with function detection and xrefs +``` +xenia-rs dis [-o ] [--db ] [--quiet] +``` +| Arg | Type | Default | Description | +|-----|------|---------|-------------| +| `path` | String (positional) | — | Path to XEX or ISO file | +| `-o, --output` | Option\ | stdout | Output .asm file | +| `--db` | Option\ | none | SQLite DB; includes extract tables + `functions`, `labels`, `instructions`, `xrefs` | +| `--quiet` | bool flag | false | Suppress assembly text output (DB-only mode) | + +--- + +### `check` — Deterministic run digest + golden-diff regression detector (P8) +``` +xenia-rs check [-n ] [--out ] [--expect ] +``` +| Arg | Type | Default | Description | +|-----|------|---------|-------------| +| `path` | String (positional) | — | Path to XEX or ISO file | +| `-n, --max-instructions` | u64 | 2_000_000 | Instructions to execute before computing the digest | +| `--out` | Option\ | stdout | Write the 14-field JSON digest to this path | +| `--expect` | Option\ | none | Golden digest JSON; byte-for-byte (trimmed) diff against the run's output. Exits non-zero on mismatch with `expected vs actual` on stderr | + +**Digest fields** (stable order, one `u64` per line): `path`, `instructions`, `imports`, `unimpl`, `packets`, `draws`, `swaps`, `resolves`, `unique_render_targets`, `shader_blobs_live`, `interrupts_delivered`, `interrupts_dropped`, `texture_cache_entries`, `texture_decodes`. + +**Typical use**: Run once on a known-good build → commit the output as `run.digest.json`. CI re-runs `xenia-rs check … --expect run.digest.json`; non-zero exit blocks the PR on drift. + +--- + +## Environment Variables + +### `XENIA_DB_BATCH_SIZE` +- **Source**: `xenia-rs/crates/xenia-analysis/src/db.rs` (lines 35-45) +- **Type**: u64 +- **Default**: `100_000` +- **Validation**: Must be > 0; invalid values fall back to default +- **Effect**: Rows per streaming commit / trace buffer flush. `import_calls` always flushes at 1,000 (not configurable). + +### `RUST_LOG` +- **Source**: standard `tracing-subscriber` env filter +- **Default behavior**: `"warn"` when `exec --quiet`, otherwise `"info"` +- **Override**: Set to any tracing filter string (e.g. `debug`, `xenia_analysis=trace`) +- **Note**: `--log-filter` takes precedence over `RUST_LOG`. + +### `XENIA_FAKE_PAD` +- **Source**: `xenia-rs/crates/xenia-ui/src/input.rs` (`fake_pad_policy`) +- **Default**: enabled (simulated pad when no physical controller) +- **Disable with**: `XENIA_FAKE_PAD=0` (or `false` / `off`) +- **Effect**: when no gilrs pad is attached, `XamInputGetState` still returns `STATUS_SUCCESS` with an all-zero `X_INPUT_STATE` so games don't bail with `ERROR_DEVICE_NOT_CONNECTED`. Set to `0` to get truthful "no controller" reporting. + +### `XENIA_SCHED_ORDER` / `XENIA_SCHED_SEED` +- **Source**: `xenia-rs/crates/xenia-cpu/src/scheduler.rs` (`OrderMode::from_env`) +- **Default**: fixed 0..=5 +- **Effect**: `random` (with optional u64 `XENIA_SCHED_SEED`) shuffles round order for fuzzing thread interleavings. + +### `XENIA_HALT_ON_DEADLOCK` +- **Source**: `xenia-rs/crates/xenia-app/src/main.rs` (resolved at the top of `run_execution`) +- **Values**: `1` or `true` (case-insensitive) to enable; anything else = disabled +- **Default**: disabled (force-wake waiters with `STATUS_TIMEOUT`, increment `scheduler.deadlock_recoveries`) +- **Effect**: equivalent to `exec --halt-on-deadlock`. OR'd with the flag — either source is sufficient to trip the halt path. Use from shells / recipes without rewiring CLI args. + +### `RUST_LOG_SPAN_EVENTS` +- **Source**: parsed by `observability::parse_span_events` +- **Values**: `full | close | active | enter | exit | new` (anything else = none) +- **Effect**: Controls the `FmtSpan` setting on fmt layers. `close` is the most useful — every span emits a line with `time.busy`/`time.idle` elapsed, giving per-phase timing in the console without reading a Chrome trace. + +--- + +## Database Table Layering + +| Command | Tables | +|---------|--------| +| `extract --db` | `metadata`, `sections`, `imports` | +| `dis --db` | above + `functions`, `labels`, `instructions`, `xrefs` | +| `exec --db` | above + `exec_trace`*, `import_calls`*, `branch_trace`* | + +*Only written when corresponding `--trace-*` flag is passed to `exec`. + +**Why:** Cumulative schema lets analysis tools query across all levels without joining separate DBs. + +**How to apply:** When suggesting DB workflows, recommend the appropriate command tier for the user's analysis goal. diff --git a/migration/claude-memory/project_xenia_rs_concurrency_m1_progress.md b/migration/claude-memory/project_xenia_rs_concurrency_m1_progress.md new file mode 100644 index 0000000..4f64a94 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_concurrency_m1_progress.md @@ -0,0 +1,80 @@ +--- +name: xenia-rs concurrency rollout — M1 complete (2026-04-26) +description: All 10 M1 sub-steps landed. Default GPU backend is now threaded (worker thread on its own); `--gpu-inline` is the rollback. 395 workspace tests pass; sylpheed -n 2M golden matches in both modes; VdSwap=1/=2 fire end-to-end under threaded mode. +type: project +originSessionId: af90c866-579c-4506-af85-cd5a5030af85 +--- +## What's landed (M1.1–M1.10) + +All 10 M1 sub-steps complete. Default GPU backend at runtime is **threaded** (`GpuBackend::Threaded`); `--gpu-inline` (or `--ui`, or `XENIA_GPU_INLINE=1`) selects the legacy synchronous path. + +### Key types and modules + +- **`xenia_gpu::GpuBackend`** — enum `Inline(GpuSystem) | Threaded(GpuHandle)`. Forwarding methods: `mmio()`, `as_inline[_mut]()`, `initialize_ring_buffer`, `enable_rptr_writeback`, `extend_write_ptr_by`, `drain_to_current_wptr`, `notify_xe_swap`, `has_pending_interrupts`, `take_pending_interrupts`, `digest_snapshot`. ([crates/xenia-gpu/src/handle.rs](xenia-rs/crates/xenia-gpu/src/handle.rs)) + +- **`GpuCommand`** — `InitializeRing`, `EnableRptrWriteback`, `DrainFence{target_wptr, reply_tx}`, `NotifyXeSwap{frontbuffer_phys, width, height}`, `Shutdown`. + +- **`GpuHandle::send_cmd(cmd)`** wraps the raw `cmd_tx.send` with M1.7 parker discipline (set `wake_pending=true` Release + `unpark()` worker thread). + +- **`GpuWorker::run(Arc)`** — registers self as wake target, drains commands, syncs MMIO + executes packets in batches of 64, refreshes `Arc>` for the CPU-side digest, drains `pending_interrupts → int_tx`, parks via `park_timeout(16ms)` when idle. + +- **`spawn_gpu_worker(worker, Arc) -> JoinHandle`** spawns the worker; `shutdown_and_join_with_timeout` joins with 1 s defensive timeout. + +### Memory model + +- **`GuestMemory.page_table: Vec`** with per-page Acquire/Release. `alloc`, `is_mapped`, `page_entry`, `write_bulk`, `translate_virtual_mut` all `&self`. +- **`GuestMemory.writes_total: AtomicU64`** + **`page_versions: Vec`** with Release on bump, Acquire on read. +- **`MemoryAccess::write_u32_fence` / `read_u32_fence`** (M1.8) — Release fence before the write / Acquire fence after the read. Migrated `EVENT_WRITE_SHD` and `writeback_read_ptr` to use the fenced variants. +- **All `MemoryAccess` writes take `&self`** post the M1.4(b) handoff. ~140 `&mut GuestMemory` callsites swept across 10 files. `GuestMemoryPcr<'_>` callsites use `&mut` because `PcrWriter::write_pcr_id(&mut self, ...)`. + +### Concurrency primitives (live in production) + +- **MMIO mailboxes** (`Arc` × 5): `cp_rb_wptr`, `cp_rb_rptr`, `cp_int_status`, `cp_int_ack`, `d1mode_vblank_vline_status`. Release on writer / Acquire on reader. +- **`GpuMmio.wake_pending: Arc`** + **`worker_thread: Arc>>`**. WPTR write callback sets+`unpark()`s; worker swaps→park. +- **`crossbeam_channel::unbounded`** for cmd_tx/cmd_rx and int_tx/int_rx. +- **`bounded(1)`** reply channels for `DrainFence` (CPU's `recv_timeout(1s)` + worker's `Instant`-based 900 ms internal deadline). +- **`Arc>`** refreshed once per worker iteration; CPU reads via `digest_snapshot()`. + +### CLI / env defaults + +``` +default → threaded +--gpu-inline (or XENIA_GPU_INLINE=1) → inline +--gpu-thread (or XENIA_GPU_THREAD=1) → threaded (explicit) +--ui → forces inline (UI worker not yet shared-mem-aware) +``` + +### Verification (all green) + +| Check | Result | +|---|---| +| `cargo build --workspace` | clean | +| `cargo test --workspace` | 395 passed, 0 failed | +| `xenia-rs check sylpheed.iso -n 2_000_000 --expect golden/sylpheed_n2m.json` (default = threaded) | matches | +| Same with `--gpu-inline` | matches | +| `xenia-rs exec sylpheed.iso -n 30_000_000 --halt-on-deadlock` (default = threaded) | exit 0 | +| VdSwap=1 + VdSwap=2 under threaded mode | both fire (~18M + ~28M cycles) | +| GPU worker shutdown clean within 1 s | yes | + +Beyond ~50M instructions both threaded and inline modes hit the same `RtlRaiseException` pre-existing bug (unrelated to concurrency rollout). + +### Known limitations / deferred + +- **`--ui` + threaded backend**: `cmd_exec_inner` panics if both are set; `--ui` auto-forces inline. Rationale: `run_with_ui` consumes `GuestMemory` by value; migrating it to `Arc` is a separate work item. +- **Inline path retained**: kept as the rollback rail and the `--ui` path. M1.10 cleanup deferred to post-M3 per plan. +- **Beyond ~50M instructions**: both modes hit a pre-existing `RtlRaiseException`. Not a regression. + +### Next milestone (M2) + +`KernelStateInner + Arc>` refactor, per-slot `Mutex`, `ThreadRef` generation packing, `ReservationTable` for `lwarx`/`stwcx.`. Some M2 work was pulled forward by M1.4 (page_table atomization) — that's already complete. + +### Files of note + +- [crates/xenia-gpu/src/handle.rs](xenia-rs/crates/xenia-gpu/src/handle.rs) — `GpuBackend`, `GpuCommand`, `GpuHandle::send_cmd`, `GpuWorker::run`, `GpuDigestSnapshot`, parker +- [crates/xenia-gpu/src/gpu_system.rs](xenia-rs/crates/xenia-gpu/src/gpu_system.rs) — `GpuMmio` with `wake_pending` + `worker_thread`; `EVENT_WRITE_SHD` / `writeback_read_ptr` use fenced writes +- [crates/xenia-gpu/src/mmio_region.rs](xenia-rs/crates/xenia-gpu/src/mmio_region.rs) — `CP_RB_WPTR` write callback sets `wake_pending` + `unpark()`s worker +- [crates/xenia-memory/src/heap.rs](xenia-rs/crates/xenia-memory/src/heap.rs) — `Vec` page table, `&self` writes +- [crates/xenia-memory/src/access.rs](xenia-rs/crates/xenia-memory/src/access.rs) — `write_u32_fence` / `read_u32_fence` +- [crates/xenia-kernel/src/state.rs](xenia-rs/crates/xenia-kernel/src/state.rs) — `KernelState::with_gpu(GpuBackend)` +- [crates/xenia-kernel/src/exports.rs](xenia-rs/crates/xenia-kernel/src/exports.rs) — `vd_swap` rewritten to use `GpuBackend` accessors; UI publish gated on `as_inline_mut()` +- [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs) — backend selection, worker spawn+join, `Arc` wrap diff --git a/migration/claude-memory/project_xenia_rs_concurrency_m2_progress.md b/migration/claude-memory/project_xenia_rs_concurrency_m2_progress.md new file mode 100644 index 0000000..53ef2bc --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_concurrency_m2_progress.md @@ -0,0 +1,123 @@ +--- +name: xenia-rs concurrency rollout — M2 substantively complete (2026-04-26) +description: M2.1–M2.5 + M2.8 landed; M2.6 (KernelStateInner) + M2.7 (per-slot Mutex) deferred to M3 because they only matter when host threads exist. Page versions atomic, ReservationTable built (with stress test), ThreadRef carries generation, bump allocators atomic, per-slot pending_local_irq[6] AtomicU8 wired, --reservations-table CLI flag flips runtime atomic. 405 tests pass, sylpheed -n 2M golden matches under all flag combos. +type: project +originSessionId: af90c866-579c-4506-af85-cd5a5030af85 +--- +## What landed + +### M2.1 — atomic page versions + Acquire/Release for cache invalidation + +Already complete from M1.4 — `page_table: Vec`, `writes_total: AtomicU64`, `page_versions: Vec` in [crates/xenia-memory/src/heap.rs](xenia-rs/crates/xenia-memory/src/heap.rs). Block cache and texture cache call `mem.page_version(...)` which is `Acquire` load on the live `GuestMemory` impl. + +### M2.2 — `ReservationTable` for lwarx/stwcx + +New module: [crates/xenia-cpu/src/reservation.rs](xenia-rs/crates/xenia-cpu/src/reservation.rs). Banked `Vec` (4096 banks × 8 B = 32 KiB), `(line_addr, generation, hw_id)` packed per slot. Hash collisions invalidate conservatively (matches Xenon L2 associativity). Memory ordering: `AcqRel` on the line CAS / swap; `Relaxed` on the active-reserver counter. + +API: +- `reserve(addr, hw_id) -> u32` — claim a slot, returns the generation stamped. +- `try_commit(addr, my_gen, my_hw_id) -> bool` — CAS-clear the slot if it still matches. +- `invalidate_for_write(addr)` — plain-store hook to invalidate the line. +- `has_active_reservers() -> bool` — fast-path skip on writes when zero. + +9 unit tests including an 8-thread stress test (`concurrent_lwarx_stwcx_serializes`) that proves only one stwcx can win per round. Lives behind `--reservations-table` flag (M2.8); the interpreter's `lwarx`/`stwcx.` arms still use the legacy per-`PpcContext` fields. M3 will hook the table into the interpreter when host threads spawn. + +### M2.3 — `ThreadRef` generation packing + +[crates/xenia-cpu/src/scheduler.rs](xenia-rs/crates/xenia-cpu/src/scheduler.rs:52-79): +```rust +pub struct ThreadRef { + pub hw_id: u8, + pub generation: u8, + pub idx: u16, +} +``` + +Total 4 bytes, no padding. 256 reuses per slot before wraparound; `PRUNE_DEPTH_THRESHOLD = 4` keeps slots shallow so this is plenty. M2 leaves generation at `0` on every spawn — no concurrent `swap_remove` happens before M3 so ABA can't occur. M3's migration-fixup site will bump generations. + +Constructors `ThreadRef::new(hw_id, idx)` and `ThreadRef::with_generation(hw_id, idx, generation)`. ~30 existing literal sites adapted (several converted to `ThreadRef::new(...)`, others got an explicit `generation: 0` field). + +### M2.4 — bump allocators to atomics + +In [crates/xenia-kernel/src/state.rs](xenia-rs/crates/xenia-kernel/src/state.rs): +- `next_handle: AtomicU32` (start `0x1000`, `fetch_add(4, Relaxed)`) +- `next_thread_id: AtomicU32` +- `next_tls_index: AtomicU32` +- `heap_cursor: AtomicU32` +- `stack_cursor: AtomicU32` + +`heap_alloc` / `stack_alloc` use `fetch_add(size, Relaxed)` then verify post-bump invariants. A failed alloc near the limit leaves the cursor advanced (matches pre-M2 behavior — game-over either way). New unit test `concurrent_alloc_handle_distinct` (10 threads × 100 allocations → 1000 distinct handles). + +### M2.5 — per-slot `pending_local_irq` (preview) + +In [crates/xenia-kernel/src/interrupts.rs](xenia-rs/crates/xenia-kernel/src/interrupts.rs): +```rust +pub type PendingLocalIrq = [AtomicU8; HW_THREAD_COUNT]; + +pub struct InterruptState { + // ... existing fields ... + pub pending_local_irq: PendingLocalIrq, +} +``` + +Field exists, `Default::default()` initializes to all zeros. Unused in M2's lockstep path; M3 will set bits Release on the target slot's atomic and the target T_cpu_i will Acquire-load at quantum boundary. + +### M2.8 — reservation table activation flag + +`--reservations-table` CLI flag on `Exec` and `Check`. `XENIA_RESERVATIONS_TABLE=1` env var fallback. When set, `kernel.reservations_enabled` flipped to `true` (Release). Always-allocated `kernel.reservations: Arc` (every kernel has one; it's free until used). + +Interpreter wiring is M3 work — for now the flag is observable but doesn't change `lwarx`/`stwcx.` semantics. Verified: golden matches under `--reservations-table`, `--gpu-inline --reservations-table`, etc. + +## Deferred to M3 (with rationale) + +### M2.6 — `KernelStateInner` + `Arc>` + +The plan calls this "the big mechanical step" — change ~98 export signatures from `&mut KernelState` to `&mut KernelStateInner`. **Deferred to M3 because:** + +1. Under M2's single-threaded execution, the lock would never contend — the refactor delivers zero observable benefit. +2. The locking discipline (lock per HLE call vs. lock per round) only becomes load-bearing once multiple host threads exist. Designing it without those callers risks designing the wrong API. +3. The M3 spawn work *is* the natural integration point: spawning per-HW-thread workers and granting them concurrent kernel access are inseparable. + +Bundled with M3 work on the next session. + +### M2.7 — per-slot `Mutex` + `SchedulerTopology` RwLock + +Same rationale: the per-slot mutex is invisible until multiple T_cpu_i exist. The lock-ordering proof from the plan (ascending `hw_id`, topology RwLock above per-slot Mutex) only becomes verifiable under genuine parallelism. Bundled with M3. + +## Verification (all green) + +| Check | Result | +|---|---| +| `cargo build --workspace` | clean | +| `cargo test --workspace` | 405 passed, 0 failed | +| `xenia-rs check sylpheed.iso -n 2_000_000 --expect golden/sylpheed_n2m.json` (default = threaded) | matches | +| Same with `--gpu-inline` (rollback) | matches | +| Same with `--reservations-table` | matches | +| Same with `--gpu-inline --reservations-table` | matches | +| `concurrent_lwarx_stwcx_serializes` (8 threads × 1000 rounds, ReservationTable stress) | passes | +| `concurrent_alloc_handle_distinct` (10 threads × 100 allocs, AtomicU32 next_handle) | passes | +| `write_u32_fence_publishes_prior_writes` (M1.8 fence test, hardened with AtomicU32 storage) | passes | + +Tests grew from 395 (post-M1) to 405 (+10 new substep tests). + +## Files of note + +- [crates/xenia-cpu/src/reservation.rs](xenia-rs/crates/xenia-cpu/src/reservation.rs) — banked `ReservationTable` with stress tests +- [crates/xenia-cpu/src/scheduler.rs](xenia-rs/crates/xenia-cpu/src/scheduler.rs) — `ThreadRef` gen-packed, `ThreadRef::new` constructor +- [crates/xenia-kernel/src/state.rs](xenia-rs/crates/xenia-kernel/src/state.rs) — atomic bump allocators, `reservations: Arc`, `reservations_enabled: AtomicBool` +- [crates/xenia-kernel/src/interrupts.rs](xenia-rs/crates/xenia-kernel/src/interrupts.rs) — `pending_local_irq: [AtomicU8; 6]` +- [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs) — `--reservations-table` flag wiring, kernel construction +- [crates/xenia-gpu/src/handle.rs](xenia-rs/crates/xenia-gpu/src/handle.rs) — fence test fixture rewritten to use `AtomicU32` slots (was flaky `Cell`) + +## Next milestone (M3) + +The M3 spawn work bundles: +1. **`KernelStateInner` split** (carryover from M2.6). `Arc>`. ~98 export sigs. +2. **Per-slot `Mutex`** (carryover from M2.7). Lock order ascending by `hw_id`. `SchedulerTopology` RwLock for cross-slot ops. +3. **Phaser primitive** for quantum-based barrier sync. +4. **6 `HwHostThread`s** spawned. Wakeup-on-signal via `slot_wake[6]` + `unpark()`. +5. **IRQ injection routed through `pending_local_irq[6]`** — target T_cpu_i self-injects. +6. **Reservation table activation** in `lwarx`/`stwcx.` arms. +7. **Sylpheed parallel boot** verification + 100x stress test. + +The plan's verification matrix at M3 done: golden matches under `--lockstep` (single-host-thread; M2 still matches). Parallel mode reaches VdSwap=2 with `deadlock_halts == 0` (digest *will* differ from lockstep — expected and documented per the existing thread-interleaving-divergence note in the perf memory). diff --git a/migration/claude-memory/project_xenia_rs_concurrency_m3_progress.md b/migration/claude-memory/project_xenia_rs_concurrency_m3_progress.md new file mode 100644 index 0000000..d927df0 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_concurrency_m3_progress.md @@ -0,0 +1,84 @@ +--- +name: xenia-rs concurrency rollout — M3.1 + per-thread block-cache substrate landed (2026-04-26); M3.2–M3.8 deferred +description: Phaser primitive + per-HW-slot block caches landed (M3.1, M3.2a). The remaining seven substeps (per-slot Mutex, KernelStateInner split, host-thread spawn, slot wakeups, IRQ routing, reservation interpreter wiring, parallel stress test) are interdependent and require focused dedicated sessions to land safely with per-step verification. Deferred work is precisely scoped below for the follow-up. +type: project +originSessionId: af90c866-579c-4506-af85-cd5a5030af85 +--- +## What landed this session + +### M3.1 — Phaser primitive + +[crates/xenia-cpu/src/phaser.rs](xenia-rs/crates/xenia-cpu/src/phaser.rs). Custom barrier-with-skip; 6 unit tests pass: + +- `n_arrivers_all_advance` — basic barrier semantics +- `skip_counts_toward_advance` — skipping participants count toward advance +- `shutdown_wakes_arrivers` — clean tear-down via `Phaser::shutdown()` +- `timeout_fires_when_peer_hangs` — defensive timeout returns `PhaserOutcome::Timeout` +- `multi_phase_progress` — 6 threads × 1000 phases, no deadlock, generation counter consistent +- `mixed_skip_and_arrive_random` — pseudo-random skip/arrive across 200 phases + +Memory ordering: phase counter is `Release`/`Acquire`. Participant count under `Mutex` + `Condvar`. The skip-counts-toward-advance design lets idle slots park on their own wake mechanism without stalling the phaser. + +### M3.2a — Per-HW-slot block caches + +[crates/xenia-app/src/main.rs:1228](xenia-rs/crates/xenia-app/src/main.rs#L1228): +```rust +let mut block_caches: [BlockCache; HW_THREAD_COUNT] = + std::array::from_fn(|_| BlockCache::new()); +``` + +Dispatch site at [main.rs:1651](xenia-rs/crates/xenia-app/src/main.rs#L1651) routes through `block_caches[hw_id as usize]`. Bit-identical correctness in single-threaded mode (it's just 6 independent caches on one thread); eliminates cross-slot races for the eventual host-thread spawn. + +Lockstep golden at -n 2M: matches. + +## Verification + +- `cargo build --workspace`: clean +- `cargo test --workspace`: 411 passed, 0 failed (was 405 post-M2; +6 from phaser tests) +- `xenia-rs check sylpheed.iso -n 2_000_000 --expect golden/sylpheed_n2m.json` (default = threaded GPU): matches +- Same with `--gpu-inline`, `--reservations-table`, `--gpu-inline --reservations-table`: all match + +## Why M3.2b–M3.8 are deferred + +The remaining substeps are individually invasive and **interdependent** — none of them deliver observable end-to-end value without the others. Splitting them across separate sessions with focused verification is more responsible than racing through them in a single pass. + +| Substep | Why it's a focused session of its own | +|---|---| +| **M3.2b** Per-slot `Mutex` | The scheduler holds `slots: [HwSlot; 6]`; many internal accesses are `&mut self` patterns that don't compose with `MutexGuard` lifetimes. Refactor touches ~30 callsites in `scheduler.rs` + several external accessors that hold borrows across method boundaries. | +| **M3.3** `Arc>` wrap | Either wrap the whole struct (~98 export sigs unchanged but every callsite needs guard threading) or split into `KernelStateShared` + `KernelStateInner` (the plan's design — ~98 export sig changes mechanical but workspace-wide). Either path is a substantial single-purpose session. | +| **M3.4** Spawn 6 host threads | Requires M3.2b + M3.3 as substrate. The spawn body itself is a 200–400 line replacement of the per-round portion of `run_execution`. | +| **M3.5** Idle-slot wakeups | Requires M3.4. Adds `slot_wake[6]: AtomicBool` + Thread handles + `unpark()` calls at every `KeSetEvent`/`KeReleaseSemaphore` site. | +| **M3.6** IRQ via `pending_local_irq` | Requires M3.4. M2.5 already wired the AtomicU8 array; M3.6 changes the producer side (T_main / GPU thread sets bits) and consumer side (T_cpu_i checks bits at quantum boundary, self-injects). | +| **M3.7** Activate reservations in interpreter | Requires threading `hw_id` + `Arc` reference into the interpreter dispatch. PpcContext doesn't currently carry `hw_id`, and `step`/`step_cached`/`step_block` don't take a table. Each path needs a parameter, and there are many test callers. | +| **M3.8** 100× parallel stress test | Requires M3.4–M3.7. | + +## What's already in place from M1+M2 that M3 will use + +- **Page versions atomic** (M1.4/M2.1): `Vec`, Release/Acquire on per-page slots. +- **Page table atomic** (M1.4): `Vec`, lock-free `alloc(&self)`. +- **`MemoryAccess::write_u32_fence` / `read_u32_fence`** (M1.8): Release/Acquire fence helpers used by EVENT_WRITE_SHD / RPTR writeback. +- **GPU on its own host thread** (M1.4–M1.10): `Arc>` + parker via `Arc` `wake_pending` + `unpark()` from MMIO callback. +- **`ReservationTable`** (M2.2): banked AtomicU64, 4096 banks, `(line, generation, hw_id)`. Stress-tested with 8 concurrent host threads. Lives at `kernel.reservations: Arc`. Activation flag at `kernel.reservations_enabled: AtomicBool` (settable via `--reservations-table` or `XENIA_RESERVATIONS_TABLE=1`). +- **`ThreadRef` generation packing** (M2.3): 1+1+2 = 4 bytes; `::new(hw_id, idx)` constructor. +- **Atomic bump allocators** (M2.4): `next_handle`, `next_thread_id`, `next_tls_index`, `heap_cursor`, `stack_cursor` all `AtomicU32`. +- **`pending_local_irq: [AtomicU8; 6]`** (M2.5): wired in `InterruptState`; M3.6 will start using bits. +- **Phaser primitive** (M3.1): `arrive_and_wait` / `skip` / `shutdown` / `arrive_and_wait_timeout` API. +- **Per-HW-slot block caches** (M3.2a): `[BlockCache; 6]` indexed by `hw_id`. + +## Next session resumes at M3.2b + +The natural ordering for the deferred work: + +1. **M3.2b** Per-slot `Mutex`. The scheduler internally locks per slot; external API stays method-based (no guard leakage). Verify lockstep golden bit-identical. +2. **M3.3** `Arc>` wrap (start coarse — single mutex around the whole struct; refine later if needed). Verify lockstep golden. +3. **M3.4** Spawn N host threads under `--parallel` flag. Each acquires the kernel lock for HLE, drops for instruction execution, syncs at the phaser. Verify sylpheed boots; halts==0. +4. **M3.5** Slot wakeup primitives. M3.4 will park on idle; M3.5 unparks on signal. +5. **M3.6** IRQ routing per slot. +6. **M3.7** Reservation interpreter wiring. PpcContext gets `hw_id` field + `Option>`. +7. **M3.8** 100× sylpheed stress run. + +## Files of note + +- [crates/xenia-cpu/src/phaser.rs](xenia-rs/crates/xenia-cpu/src/phaser.rs) — phaser primitive (M3.1) +- [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs) — per-thread block caches (M3.2a) at lines 1228 and 1651 +- [crates/xenia-cpu/src/lib.rs](xenia-rs/crates/xenia-cpu/src/lib.rs) — re-exports `Phaser`, `PhaserOutcome` diff --git a/migration/claude-memory/project_xenia_rs_cpp_runtime_audit_2026_05_06.md b/migration/claude-memory/project_xenia_rs_cpp_runtime_audit_2026_05_06.md new file mode 100644 index 0000000..84a5837 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_cpp_runtime_audit_2026_05_06.md @@ -0,0 +1,44 @@ +--- +name: C++ runtime audit (CPPBUG-AUDIT-001) 2026-05-06 +description: Read-only audit of MSVC C++ runtime support in xenia-rs vs canary. Top-3 candidates for explaining audit-011's vtable=0 finding REFUTED by audit-012 (vtable is correctly initialized; misread on audit-011's part). Independent correctness gaps remain as background-work backlog. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🔍 CPPBUG-AUDIT-001 (2026-05-06, READ-ONLY)** — comprehensive audit of MSVC C++ runtime support. Spawned in parallel with audit-012 to investigate "missing C++ runtime features" hypothesis for the audit-011 vtable=0 symptom. Audit-012 falsified the vtable=0 premise itself, so this audit's Top-3 candidates are moot for THAT specific bug — but several real correctness gaps were identified as background work. + +## Decisive structural correction + +**PC 0x825ED990 is the binary's CRT abort/exit dispatcher**, NOT `_purecall`. Disasm at 0x825ED990..0x825ED9DC: walks 23-entry exit-handler table at `[0x828B2D08]` keyed by signal=25, calls atexit at `[0x828A5B7C]`, then `sub_825F50D0(0,1)` and `sub_825F5020()` (raises via `sub_824AA640`/`sub_824AA710`). MSVC `abort()` / `_amsg_exit` equivalent. Corrects audit-010's "apparent __purecall/abort handler" attribution carried since. + +**Sylpheed has its CRT statically linked.** Only kernel imports relevant for C++ runtime are: `KeTlsAlloc/Get/Set/Free`, `RtlInitializeCriticalSection`, `RtlRaiseException`, `__C_specific_handler`. The C++ runtime question is narrower than initially feared. + +## Top-3 candidates (REFUTED by audit-012) + +1. `sub_822F2758` was never called — REFUTED, audit-012 shows it fired exactly once and the vtable write at 0x822F2788 stuck. +2. Ctor ran but `stw` silently dropped (`nt_allocate_virtual_memory` + `heap.rs:465` silent-drop combo) — REFUTED, write transitions show monotonic 0 → 0x820AD894 → 0x820A183C. +3. Throw inside ctor bypasses unwind (`RtlRaiseException` stub) — REFUTED, no zeroing event observed across 500M. + +## Independent correctness gaps (real, not blocking renderer hunt) + +| Area | Issue | File:line | +|------|-------|-----------| +| `nt_allocate_virtual_memory` | Returns SUCCESS on alloc failure for non-overlap reasons (page-misalign, out-of-range) | exports.rs:622-625 | +| `heap.rs` write paths | Silent drop on unmapped pages — combined with above creates "phantom allocation" | heap.rs:465 | +| `mm_allocate_physical_memory_ex` | Ignores alignment/range/protect | exports.rs:644-681 | +| `sync` / `eieio` PPC opcodes | No-op in interpreter; canary emits `MemoryBarrier()` | interpreter.rs:1697 vs canary ppc_emit_memory.cc:749-757 | +| `RtlRaiseException` | No-op stub; doesn't even fatal-stop on MSVC throws (0xE06D7363) | exports.rs:2218-2221 | +| TLS storage | Uses `Vec`; canary uses u32. Functionally OK. | xboxkrnl_threading.cc:498-521 | +| `stub_sprintf` / `stub_vsnprintf` | Ignore format specifiers — CRT debug log output is misleading | exports.rs | +| Heap | Bump-only, no free | state.rs:701-719 | + +## Top-leverage diagnostic for future use + +TRACE-gated log on unmapped writes in `heap.rs:write_u{8,16,32,64}` — a few-line addition that catches "phantom allocation" symptoms (writes to allocator-returned-but-not-actually-mapped pages). Should be standing infrastructure given the silent-drop class of bugs. + +## How to use this memory + +When audit-012's listener fix lands and the cascade resumes, the renderer-side bugs that surface may interact with the gaps above (esp. memory ordering / `sync` semantics for cross-thread GPU-CPU). Treat this as a checklist for "first things to suspect" once draws > 0 lands. + +These items are NOT urgent for the swap=2 / draws=0 plateau. Track as background-work backlog. + +Master HEAD `50a4887` unchanged. No commits. No code modified. diff --git a/migration/claude-memory/project_xenia_rs_current_state.md b/migration/claude-memory/project_xenia_rs_current_state.md new file mode 100644 index 0000000..86d5bc9 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_current_state.md @@ -0,0 +1,76 @@ +--- +name: xenia-rs current state — boot/render progress + active blockers +description: Where Sylpheed boot sits now (2026-04-24 — IRQ-injection stack-pad + full-volatile register save fix lands; second VdSwap fires) +type: project +originSessionId: 5465978c-b9ad-47fb-ab6d-e8e3053646af +--- +## What works end-to-end (2026-04-24) + +Sylpheed **now reaches its second `VdSwap` (first real frame)**. Previous sessions stopped at the splash frame (`VdSwap=1`) because our graphics-interrupt injection was stomping the interrupted thread's stack-saved LR — see "Root cause" below. + +Observed after the fix, at 3 B instructions: + +- `VdSwap frame=1` splash at ~18 M cycles, `VdSwap frame=2` at ~28 M cycles +- `scheduler.deadlock_halts = 0`, `deadlock_recoveries = 0` (clean) +- All 351 workspace tests green +- tid=5 stays alive past cycle 7.5 M (was exiting there pre-fix) +- `RtlEnterCriticalSection` / `LeaveCriticalSection` dropped ~1300× versus pre-fix (was the symptom of the corruption, not the cause) + +## Root cause — IRQ injection was stomping `[r1 - 8]` on the interrupted thread's stack + +The graphics-interrupt injector (`try_inject_graphics_interrupt` in [main.rs](../../../RE%20Project%20Sylpheed/xenia-rs/crates/xenia-app/src/main.rs)) overwrote `pc`/`lr`/`r3`/`r4` on whichever thread it picked, but left `r1` (SP) untouched. The ISR callback's prologue immediately does `mflr r12; bl __savegprlr_N` where `__savegprlr_N` stores `r12` (= `LR_HALT_SENTINEL`, just set by injection) at `[r1 − 8]`. That slot is **exactly where the interrupted function's own prologue had saved its caller's return address** (standard PPC savegprlr layout). When the interrupted function eventually ran its `__restgprlr_N` tail → `bclr`, it loaded `SENTINEL` into LR and jumped there, silently terminating the thread through the halt-sentinel path rather than the intended return. + +Observed concretely: tid=5 hit `LR_HALT_SENTINEL` via `from_pc=0x825f0ff0` (the shared `restgprlr` bclr) with `r12=0xBCBCBCBC` — i.e. the value it just read from the stack at `[r1 - 8]`. Six normal vsync-ISR returns had `ctr=0x821753c8` (ISR-path resolved correctly); the 7th exit had `ctr=0x00000000` — this one was `sub_82458B90` returning with the stack-saved LR clobbered. Matches canary's workaround at [`Processor::Execute`](../../../RE%20Project%20Sylpheed/xenia-canary/src/xenia/cpu/processor.cc#L383) (lines 381–394) which decrements `r[1]` by 64 + 112 = 176 before calling the ISR callback and restores after — the comment says "games seem to overwrite the caller by about 16 to 32b," with the pad sized generously. + +## The fix (2026-04-24) + +[`CALLBACK_STACK_PAD = 176`](../../../RE%20Project%20Sylpheed/xenia-rs/crates/xenia-kernel/src/interrupts.rs) applied in two places: + +1. **`SavedCallbackCtx`** now captures/restores **all PPC volatile GPRs** (`r0`, `r2`–`r12`) plus `r1` (SP), `pc`, `lr`, `ctr`, and `cr`. The non-volatile set (`r13`–`r31`) is preserved by the callback's own `__savegprlr_N` prologue/epilogue per the PPC ELF ABI, so it doesn't need stashing. +2. `try_inject_graphics_interrupt` decrements `ctx.gpr[1]` by `CALLBACK_STACK_PAD` **after** `SavedCallbackCtx::capture` (so the saved `r1` is the pre-inject value) and **before** setting `pc = callback_pc`. The callback now prologues into `[injected_r1 − 176 − 8]` instead of stomping `[injected_r1 − 8]`. On return, `SavedCallbackCtx::restore` puts `r1` back. + +Thin unit-test coverage: the existing `inject_restore_roundtrip_smoke` test in [interrupts.rs](../../../RE%20Project%20Sylpheed/xenia-rs/crates/xenia-kernel/src/interrupts.rs) still passes (just a smoke test for pc/lr/r3-r4 roundtrip); extending it to cover the new SP + r0/r2/r7-r12 paths would be a cheap follow-up. + +## Concrete next-session blockers (post-fix) + +tid=5 is now alive, progresses through multiple work items, and drives the data-stream decoders (`sub_8280AD40` = inflate, `sub_828085E0` = Adler-32, `sub_82807AB8` = CRC-32 — all around 0x82807-0x8280C). Observed behaviour at 3 B instructions: + +1. **Sylpheed boot is CPU-bound on stream decode.** At 10 MIPS interpreter throughput, the per-asset inflate + Adler/CRC passes eat multi-seconds of wall time each. Second `VdSwap` fires at ~28 M cycles (~3 s wall). For first-pixels to be visually obvious (dozens of frames), we likely need the Tier-4 JIT or at least threaded-code dispatch. Order of magnitude: real HW boots Sylpheed to menu in ~2–3 s at ~200 MIPS; we're ~20× slower. +2. **wgpu→ShadowEdram RT readback** (P1 from prior memory) — frame-2+ blocker once draws fire. See [edram-resolve-gap memory](project_xenia_rs_edram_resolve_gap.md). +3. **Keep verifying with `exec --halt-on-deadlock -n 500_000_000`** — still clean post-fix. Any regression here means a new sync bug. + +## Investigation tools available + +- **`dump_thread_diagnostic`** (from 2026-04-23b) — prints per-thread state + handle/CS waiter maps at normal `-n N` exit. Now also dumps r0–r13 for every thread (expanded 2026-04-24). +- **`disasm --at -n N`** — unchanged. +- **DuckDB xrefs** — see [project_xenia_rs_duckdb.md](project_xenia_rs_duckdb.md). +- **PC → LR_HALT_SENTINEL tracer pattern** — reference impl in `2026-04-24` diff on [main.rs](../../../RE%20Project%20Sylpheed/xenia-rs/crates/xenia-app/src/main.rs); was instrumental for this fix. Reverted after use. +- **Adler/CRC entry probes** — one-shot `tracing::warn!(target: "adler_probe", ...)` at the `pc == 0x828085E0 && tid == 5` site. Logs lr/r3/r4/r5 at entry. Reverted after use. + +## Confirmed NOT the issue (verified this session) + +- `VdCallGraphicsNotificationRoutines` stub — canary matches, Sylpheed doesn't register notifications. +- `NtSetEvent` / `KeSetEvent` return-value semantics — match canary. +- Graphics-interrupt injection per-vsync — fires correctly, delivered counter scales with `VSYNC_INSTR_PERIOD = 150_000`. +- Ring-buffer write-back — correct. +- PKEVENT shadow refresh — correct. +- Event/semaphore handle table — correct; the pre-fix "main stuck on 0x10fc" was a *symptom* of tid=5 dying before producing the signal, not a handle-table bug. + +## Architectural patterns (stable, don't re-derive) + +- **Scheduler + HW slots + ThreadRef** — see [project_xenia_rs_scheduler.md](project_xenia_rs_scheduler.md). +- **UI bridge + GPU pipeline + MMIO + HUD** — see [project_xenia_rs_ui.md](project_xenia_rs_ui.md). +- **PKEVENT shim** — `ensure_dispatcher_object` reads DISPATCHER_HEADER type on first touch. +- **IRQ injection stack discipline** (new 2026-04-24): the injected callback runs on a **176-byte-padded extension** of the interrupted thread's stack. `SavedCallbackCtx` captures/restores r0, r1, r2–r12 + pc/lr/ctr/cr. Non-volatile regs (r13–r31) are not in the save set because the callback prologue handles them. Canary's Processor::Execute uses the same 64+112 pad. +- **Main thread return ≠ emulator halt** — unchanged. + +## Memory-model caveats + +- `pending_timer_fires` is keyed by handle (u32). `NtClose` / `NtCancelTimer` / `NtSetTimerEx` manage lifecycle. (Sylpheed doesn't use timers on the boot path.) +- `waiters_mut()` on `KernelObject` returns `None` for `File` and `Some` for the 5 sync variants. +- Handle allocator starts at `0x1000`, bumps by 4. + +## Files touched in the 2026-04-24 session + +- `xenia-kernel/src/interrupts.rs` — `SavedCallbackCtx` expanded to `gprs: [u64; 13]` (r0–r12), added `CALLBACK_STACK_PAD = 176` constant with docs citing canary as ground-truth. +- `xenia-app/src/main.rs` — `try_inject_graphics_interrupt` now `ctx.gpr[1] -= CALLBACK_STACK_PAD` after capture, before setting callback PC. `dump_thread_diagnostic` expanded to print r0/r3–r13. diff --git a/migration/claude-memory/project_xenia_rs_desktop_app.md b/migration/claude-memory/project_xenia_rs_desktop_app.md new file mode 100644 index 0000000..3377624 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_desktop_app.md @@ -0,0 +1,10 @@ +--- +name: xenia-rs desktop app design +description: UI/UX design decisions for the xenia-rs desktop app — feature groupings and view layout +type: project +originSessionId: c12e8acc-6326-4933-9263-32745c4b1219 +--- +The disassembler, debugger, analyzer, and executor share a single unified view in the desktop app. Their purposes and underlying data are related/overlapping, so they are not treated as separate panels or tabs. + +**Why:** The user explicitly stated this during the overview drafting phase. +**How to apply:** When writing UI copy, feature descriptions, or design docs, describe these four as one combined "Analysis Workspace" rather than four separate features. diff --git a/migration/claude-memory/project_xenia_rs_disasm_unify_phase1.md b/migration/claude-memory/project_xenia_rs_disasm_unify_phase1.md new file mode 100644 index 0000000..cdae555 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_disasm_unify_phase1.md @@ -0,0 +1,79 @@ +--- +name: Disassembler unification — Phase 1 complete (2026-04-27) +description: Single source of truth for PPC text formatting now lives in xenia-cpu/disasm.rs. xenia-analysis/ppc.rs is a thin shim. DecodedInstr stays at 8 bytes. VMX128 bug fixed. +type: project +originSessionId: 680cc54c-e77a-4d2d-a11b-ca562e9a68ec +--- +**Phase 1 of disassembler unification is COMPLETE** (2026-04-27, single session). + +## What's in place + +- **[crates/xenia-cpu/src/disasm.rs](crates/xenia-cpu/src/disasm.rs)** is the single source of truth for PPC text formatting (~1100 LOC, was 313). Hosts: + - `pub struct DisasmText { mnemonic, operands, disasm, ext_mnemonic, ext_operands, ext_disasm, branch_target }` — all `String` / `Option` / `Option`. Owns its strings. + - `pub fn format(&DecodedInstr) -> DisasmText` — the canonical formatter. Dispatches via match on `PpcOpcode` to ~70 per-class helpers. + - `pub fn disassemble(&DecodedInstr) -> String` — back-compat: returns `format(...).display().to_string()`. + - `pub fn disassemble_block(...)` — back-compat range walker. + - 8 unit tests covering nop/li/mr/blr/branch-target/rlwinm-dot. + +- **[crates/xenia-cpu/src/lib.rs](crates/xenia-cpu/src/lib.rs)** re-exports `DisasmText`, `disassemble`, `format as disasm_format`. + +- **[crates/xenia-analysis/src/ppc.rs](crates/xenia-analysis/src/ppc.rs)** collapsed from 1374 LOC → ~30 LOC. Pure shim: + ```rust + pub struct Decoded { pub base: String, pub ext: Option } + pub fn disasm(instr: u32, addr: u32) -> Decoded { ... } + ``` + Delegates to `xenia_cpu::decoder::decode` + `xenia_cpu::disasm::format`. + +- **[crates/xenia-analysis/src/db.rs](crates/xenia-analysis/src/db.rs)** call sites use `xenia_cpu::disasm::format` directly. The `split_disasm` helper at the bottom is **deleted** — `DisasmText` exposes mnemonic/operands separately. + +- **[crates/xenia-analysis/src/formatter.rs](crates/xenia-analysis/src/formatter.rs)** unchanged — uses the `crate::ppc::disasm` shim's `display()` method. + +- **[crates/xenia-analysis/Cargo.toml](crates/xenia-analysis/Cargo.toml)** now depends on xenia-cpu. + +## Constraint #1 honored: DecodedInstr unchanged + +- `DecodedInstr` is still 8 bytes (`opcode: PpcOpcode`, `raw: u32`, `addr: u32`), `Copy`, no allocations. +- Decode cache at [decoder.rs:228](crates/xenia-cpu/src/decoder.rs) still 64K × 20 bytes = 1.3 MiB. +- `DisasmText` is the new struct that owns the formatted strings, allocated only when `format()` is called from a sink. + +## Silent bug fixed: VMX128 register extractors + +The pre-Phase-1 `xenia-analysis/src/ppc.rs` had **wrong bit positions** for `va128`/`vb128`/`vd128` compared to `decoder.rs`. The interpreter (which executes guest code) used decoder.rs's correct extractors, so guest behavior was correct — but `.asm` text output and DuckDB rows could show wrong VMX128 register names. Phase 1 routes all VMX128 formatting through `instr.va128()`/`vb128()`/`vd128()` accessors (decoder.rs canonical). Fixed silently. + +## Other extended forms now in xenia-cpu + +Ported from ppc.rs onto `DecodedInstr` accessors: li/lis/subi/subis, nop, mr/not, slwi/srwi/clrlwi/clrrwi/rotlwi/extlwi/extrwi, clrldi/clrrdi/srdi/sldi/rotldi, insrdi, inslwi/insrwi, cmpwi/cmpdi/cmplwi/cmpldi, cmpw/cmpd/cmplw/cmpld, mflr/mfctr/mfxer, mtlr/mtctr/mtxer, mtcr, mftb/mftbu, blr/blrl/bctr/bctrl, b{cond}{l}{a} (eq/ne/lt/le/gt/ge/so/ns), bd{n}z{cond}, b{cond}lr, b{cond}ctr, sub/subc, crnot/crclr/crset/crmove, lwsync, trap, td{cond}/tw{cond}, td{cond}i/tw{cond}i. + +## Behavior changes visible to users + +1. `xenia disasm` (the simple subcommand) now emits **extended/simplified mnemonics**. Before Phase 1 it only emitted base forms. Smoke test confirmed: `mr`, `subi`, `nop`, `lis`, `li`, `cmpwi`, `beq`, etc. all appear correctly. +2. VMX128 register names printed in `.asm` and DB are now correct (silent bug fix). +3. MD-form rotate `sh` value display matches decoder.rs's bit layout (was different in old ppc.rs — likely also a silent bug, since the decoder is what runs on guest code). + +## Verification done + +- `cargo build --workspace` clean (no new warnings; pre-existing warnings in block_cache.rs and vmx.rs unchanged). +- `cargo test -p xenia-cpu` — 166 + 8 new disasm + 9 audit = all pass. +- `cargo test -p xenia-analysis` — audit pass. +- Smoke `xenia disasm -n 30` — produces clean extended-mnemonic output. +- Full `xenia dis --db` end-to-end deferred to next session (release build was slow; functional path verified by passing cargo tests). + +## LOC delta (Phase 1) + +- xenia-cpu/src/disasm.rs: +780 (313 → ~1093) +- xenia-analysis/src/ppc.rs: −1344 (1374 → 30) +- xenia-analysis/src/db.rs: −18 (deleted split_disasm + simplified call sites) +- xenia-cpu/src/lib.rs: +1 (re-export) +- xenia-analysis/Cargo.toml: +1 dep +- **Net: −580 LOC** plus single-source-of-truth. + +## What's next (Phases 2-4) + +Per [/home/fabi/.claude/plans/ok-execute-your-proposed-refactored-dolphin.md](plan): +- **Phase 2**: Iterator + sinks (`iter_disasm` in xenia-cpu, RichDisasmItem enrichment + 3 sinks: text, JSON, DuckDB in xenia-analysis). +- **Phase 3**: Split db.rs into ingest + analyze; add SQL views layer (`v_branch_xrefs`, `v_call_graph`, `v_reachability_from_entry`, etc.) and `--analyze=rust|sql|both` flag. Keep Rust passes as default. +- **Phase 4**: Replace println-only audits with assert-based fixture goldens (`base_mnemonics.json`, `extended_mnemonics.json`, `vmx128_registers.json`, `db_schema_golden.rs`, ISO-gated `disasm_first_1000.json`). + +**Format style locked**: Phase 1 reproduces ppc.rs's padded comma-space style (`"addi r3, r4, 16"`). Phase 4 fixtures should lock this. + +**LOC budget remaining**: Phase 2 ~+250/-250 (net 0), Phase 3 ~+280, Phase 4 ~+395 (mostly tests/fixtures). diff --git a/migration/claude-memory/project_xenia_rs_disasm_unify_phase2.md b/migration/claude-memory/project_xenia_rs_disasm_unify_phase2.md new file mode 100644 index 0000000..a9475ca --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_disasm_unify_phase2.md @@ -0,0 +1,97 @@ +--- +name: Disassembler unification — Phase 2 complete (2026-04-27) +description: Iterator + 3 sinks (text/JSON/DuckDB) layered over Phase 1's format(). New `xenia dis --json` subcommand. db.rs and formatter.rs both drive through enrich_section. +type: project +originSessionId: 680cc54c-e77a-4d2d-a11b-ca562e9a68ec +--- +**Phase 2 of disassembler unification is COMPLETE** (2026-04-27, same session as Phase 1). + +## What's in place + +### xenia-cpu (decoder + iterator) +- **[crates/xenia-cpu/src/disasm.rs](crates/xenia-cpu/src/disasm.rs)** adds: + - `pub struct DisasmItem { addr, raw, opcode, text }` — yielded by the iterator. + - `pub fn iter_disasm(image, image_base, va_start, va_end) -> impl Iterator` — walks bytes in PPC big-endian, decodes via `decoder::decode`, formats via `format`, yields one `DisasmItem` per 4-byte word. Stops on truncated tail. + - 2 new unit tests: `iter_disasm_walks_byte_slice_in_order`, `iter_disasm_stops_on_truncated_tail`. +- **[crates/xenia-cpu/src/lib.rs](crates/xenia-cpu/src/lib.rs)** re-exports `DisasmItem`, `iter_disasm`. + +### xenia-analysis (enrichment + sinks) +- **[crates/xenia-analysis/src/disasm.rs](crates/xenia-analysis/src/disasm.rs)** (NEW, ~50 LOC): + - `pub struct RichDisasmItem<'a> { item, section, function, label }` — adds analysis context. + - `pub fn enrich_section(image, image_base, section_name, va_start, va_end, func_analysis, labels) -> impl Iterator>` — wraps `iter_disasm` with rolling-window function tracking + label lookup. +- **[crates/xenia-analysis/src/sinks/mod.rs](crates/xenia-analysis/src/sinks/mod.rs)** (NEW): module declarations. +- **[crates/xenia-analysis/src/sinks/duckdb.rs](crates/xenia-analysis/src/sinks/duckdb.rs)** (NEW, ~30 LOC): `append_instructions(appender, items) -> Result` — DuckDB Appender call per row. +- **[crates/xenia-analysis/src/sinks/json.rs](crates/xenia-analysis/src/sinks/json.rs)** (NEW, ~60 LOC): `write_jsonl(out, items) -> io::Result` — one JSON object per line. Internal `JsonRow<'a>` derives Serialize; uses `#[serde(skip_serializing_if = "Option::is_none")]` to keep rows compact. +- **[crates/xenia-analysis/src/sinks/text.rs](crates/xenia-analysis/src/sinks/text.rs)** (NEW, ~50 LOC): `write_instr_line(out, item, labels, sections, image_base, data_annotation)` — renders one .asm line with branch-target / data-ref annotation. Uses the structured `branch_target` field (not a regex over the disasm string — cleaner than the old `annotate_branch`). +- **[crates/xenia-analysis/src/lib.rs](crates/xenia-analysis/src/lib.rs)** declares `disasm` and `sinks` modules; re-exports `RichDisasmItem` and `enrich_section`. +- **[crates/xenia-analysis/Cargo.toml](crates/xenia-analysis/Cargo.toml)** adds `serde_json = { workspace = true }` dep. + +### Refactored call sites +- **[crates/xenia-analysis/src/db.rs](crates/xenia-analysis/src/db.rs)** `insert_instructions_streaming` collapsed from a 50-line byte loop into 12 lines: `for section { let items = enrich_section(...); total += sinks::duckdb::append_instructions(&mut appender, items)?; }`. +- **[crates/xenia-analysis/src/formatter.rs](crates/xenia-analysis/src/formatter.rs)** code-section loop collapsed: now iterates `enrich_section` and calls `write_instr_line` for the per-line render. Orchestration (function headers, labels, xref comments, import annotations) stays in formatter.rs. The old `annotate_branch` helper is **deleted** — branch-target annotation lives in the text sink and uses `branch_target: Option` from `DisasmText`. + +### CLI +- **[crates/xenia-app/src/main.rs](crates/xenia-app/src/main.rs)**: new `--json ` flag on `dis` subcommand. Writes JSON Lines via `sinks::json::write_jsonl` per code section. Wires through `cmd_dis` signature. + +## Architecture + +``` + ┌──────────────┐ + │ image bytes │ + │ + image_base│ + └──────┬───────┘ + │ + ▼ + xenia-cpu::iter_disasm(image, base, range) + │ yields DisasmItem + ▼ +xenia-analysis::enrich_section(...).map(|i| RichDisasmItem { i, section, function, label }) + │ yields RichDisasmItem + ▼ + ┌────────────────┼────────────────┐ + ▼ ▼ ▼ +sinks::duckdb sinks::json sinks::text +append_instructions write_jsonl write_instr_line + │ │ │ + ▼ ▼ ▼ + instructions .jsonl .asm + table (DuckDB) (one row/line) (formatted) +``` + +`DecodedInstr` (8 bytes, in decode cache) is unchanged. `DisasmItem` and `RichDisasmItem` only exist at the sink layer. + +## Constraint #1 honored: DecodedInstr unchanged + +Same as Phase 1 — `DecodedInstr` is still the 8-byte cache-resident struct; `DisasmItem` is allocated only in the iterator/sink layer. + +## Verification + +- `cargo build --workspace` clean (one previously-existing analysis warning was fixed during refactor). +- `cargo test -p xenia-cpu` — all 168 tests + 10 disasm tests pass (2 new for `iter_disasm`). +- `cargo test -p xenia-analysis` — all 9 audit tests pass. +- `xenia disasm -n 8` smoke test: same extended-mnemonic output as Phase 1. +- `xenia dis --db --json --quiet ` end-to-end smoke test: PENDING (running at write time). + +## LOC delta (Phase 2) + +- xenia-cpu/src/disasm.rs: +60 (DisasmItem + iter_disasm + 2 tests) +- xenia-cpu/src/lib.rs: +1 +- xenia-analysis/src/disasm.rs: +50 (new file) +- xenia-analysis/src/sinks/{mod,duckdb,json,text}.rs: +160 (new files) +- xenia-analysis/src/db.rs: −38 (collapsed loop) +- xenia-analysis/src/formatter.rs: −15 (annotate_branch deleted, inner loop replaced) +- xenia-analysis/Cargo.toml: +1 (serde_json dep) +- xenia-app/src/main.rs: +20 (--json flag + sink call) +- **Net: ~+240 LOC** (in line with the plan's "+250 / −250 net 0" estimate, modulo the new JSON sink which had no prior counterpart). + +## Behavior changes visible to users + +1. **New `xenia dis --json ` flag** — emits one structured JSON object per instruction. Schema: `addr, raw, mnemonic, operands, disasm, ext_mnemonic?, ext_operands?, ext_disasm?, branch_target?, section, function?, label?`. +2. Branch-target annotation in the .asm text output is now driven by the structured `branch_target` field (was a regex find of "0x" in the disasm string). Functionally equivalent for direct branches; immune to false-positive matches in non-branch operands containing hex. +3. Three sinks share one decode/format pass per instruction, but db+json+asm output runs decode 3 times (once per sink). Phase 7 / future work could fan out from a single iterator if needed. + +## What's next (Phases 3-4) + +Per [/home/fabi/.claude/plans/ok-execute-your-proposed-refactored-dolphin.md](plan): +- **Phase 3**: Split `db.rs` into `ingest_instructions` + `write_analysis_results`; add `target_hex BIGINT` column on `instructions`; add `crates/xenia-analysis/src/sql_views.rs` with `v_branch_xrefs`/`v_call_graph`/`v_reachability_from_entry`/`v_function_first_instruction`/`v_imports_called`; add `--analyze=rust|sql|both` flag (default `rust`). Rust passes (`func.rs`, `xref.rs`) stay default. +- **Phase 4**: Replace println-only audits with assert-based JSON-fixture goldens. Expand coverage to base + extended + VMX128 (silent-bug area) + DB schema + ISO-gated end-to-end. diff --git a/migration/claude-memory/project_xenia_rs_disasm_unify_phase3.md b/migration/claude-memory/project_xenia_rs_disasm_unify_phase3.md new file mode 100644 index 0000000..ea88601 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_disasm_unify_phase3.md @@ -0,0 +1,78 @@ +--- +name: Disassembler unification — Phase 3 complete (2026-04-27) +description: db.rs split into ingest_instructions + write_analysis_results; new sql_views.rs with 5 views; --analyze=rust|sql|both CLI flag; target_hex column on instructions; Rust/SQL cross-check warning in `both` mode. +type: project +originSessionId: 680cc54c-e77a-4d2d-a11b-ca562e9a68ec +--- +**Phase 3 of disassembler unification is COMPLETE** (2026-04-27, same session). + +## What's in place + +### Schema +- **`instructions` table** gains a `target_hex BIGINT NULL` column populated from `DisasmText.branch_target`. Indexed via `idx_instructions_target_hex`. Documented in [crates/xenia-analysis/src/db.rs](crates/xenia-analysis/src/db.rs) module docstring. +- **DuckDB sink** ([crates/xenia-analysis/src/sinks/duckdb.rs](crates/xenia-analysis/src/sinks/duckdb.rs)) writes the new column. + +### Split DbWriter API ([crates/xenia-analysis/src/db.rs](crates/xenia-analysis/src/db.rs)) +- `pub fn ingest_instructions(pe, info, func_analysis, labels)` — creates `instructions` table + indices and streams rows via the iterator + duckdb sink. **No analysis tables.** +- `pub fn write_analysis_results(pe, info, func_analysis, labels, xrefs)` — creates `functions`, `labels`, `xrefs` tables + indices. Populated from Rust pass output. +- `pub fn write_disasm(...)` — back-compat wrapper that calls both. Existing callers (e.g. `cmd_exec`) keep working unchanged. +- `pub fn create_sql_views(&mut self)` — runs the SQL view definitions from `crate::sql_views::ALL_VIEWS`. +- `pub fn cross_check_branch_xrefs(&self) -> Result<(u64, u64)>` — returns `(sql_only, rust_only)` row counts for symmetric difference between `v_branch_xrefs` and `xrefs WHERE kind IN ('call','jump','branch')`. + +### SQL views ([crates/xenia-analysis/src/sql_views.rs](crates/xenia-analysis/src/sql_views.rs)) — 5 views +- `v_branch_xrefs` — derived from `instructions.target_hex` self-join. CASE on mnemonic mirrors `xref.rs` kind logic: `bl`/`bla` → call, `b`/`ba` → jump, `bc*` → branch. +- `v_call_graph` — `xrefs ⨝ functions` filtered to `kind = 'call'`. Surfaces caller/callee names. +- `v_reachability_from_entry` — recursive CTE seeded from `labels.name = 'entry_point'`, transitive over `xrefs.kind IN ('call','jump','branch')`. `UNION` (not `UNION ALL`) handles call-graph cycles. +- `v_function_first_instruction` — `functions ⨝ instructions ON address`. Convenience for inspecting prologues. +- `v_imports_called` — `xrefs ⨝ labels` filtered to `xrefs.kind = 'call' AND labels.kind = 'import'`. Per-function import call summary. + +All views are `CREATE OR REPLACE` — re-running is idempotent. + +### CLI ([crates/xenia-app/src/main.rs](crates/xenia-app/src/main.rs)) +- New `AnalyzeMode` enum (`Rust` / `Sql` / `Both`) derived `ValueEnum`. +- `Dis { ..., analyze: AnalyzeMode }` field with `default_value_t = AnalyzeMode::Rust`. +- `cmd_dis` routes through: + - Always: `write_base` → `ingest_instructions` → `write_analysis_results` (Rust passes always run, honoring constraint #3). + - `Sql` or `Both`: also `create_sql_views`. + - `Both`: also `cross_check_branch_xrefs` and log on disagreement (info if both zero, warn otherwise). + +## Constraint #3 honored: Rust analysis stays default and functional + +- Default flag value is `rust`. +- Rust passes (`func.rs` + `xref.rs`) ALWAYS run when `--db` is set. The `analyze` flag only controls whether SQL views are *additionally* created. +- The `xrefs` table is always populated by Rust passes. `v_branch_xrefs` is an alternative read surface, not a replacement. +- Data-ref pass (xref.rs lis+addi/ori register tracking) and function detection (func.rs prologue patterns) remain Rust-only — they are not cleanly relational. + +## Verification + +- `cargo build --workspace`: clean. +- `cargo test -p xenia-cpu` / `-p xenia-analysis`: all green (10 disasm tests + 9 audit + 168 cpu). +- `xenia dis --analyze=both --db ` smoke verified end-to-end: 1.87M instructions written, 299,615 with `target_hex` (16% — direct branches), all 5 views queryable, cross-check returns `(0, 0)` — Rust and SQL agree on every (source, target, kind) tuple. +- Sample reachability: 7,557 of 12,156 functions reachable from entry_point (62%) — sensible for a game with significant dead/unused code. + +### Bugs found and fixed during verification + +1. **Kind-tag mismatch.** `XrefKind::tag()` ([xref.rs:21-29](crates/xenia-analysis/src/xref.rs)) returns the SHORT tags `"call"` / `"j"` / `"br"` (and `"read"` / `"write"` / `"ref"`). The first version of `v_branch_xrefs` and `cross_check_branch_xrefs` used the LONG names (`'call'` / `'jump'` / `'branch'`) — which the comment in [db.rs](crates/xenia-analysis/src/db.rs) describes for the *trace* table, not `xrefs`. Cross-check returned 195K SQL-only rows. Fixed by changing CASE to `'call'` / `'j'` / `'br'`. **Don't trust the docstring at the top of db.rs** — `branch_trace.kind` uses long names but `xrefs.kind` uses short tags. + +2. **Reachability view collapsed to 1 row.** First version seeded with `labels.address` (a single instruction VA) and looked for `xrefs.source = r.addr`. But the entry-point address (`mflr r12`) has no outgoing xref — branches happen at later instructions of the function. Fixed by reformulating as function-level reachability: seed with the function containing the entry_point label, then walk `function → instructions → xrefs → target's enclosing function`. `UNION` handles call-graph cycles. + +## LOC delta (Phase 3) + +- xenia-analysis/src/db.rs: +60 (split write_disasm; new methods) +- xenia-analysis/src/sql_views.rs: +120 (NEW) +- xenia-analysis/src/sinks/duckdb.rs: +1 line (target_hex column write) +- xenia-analysis/src/lib.rs: +1 line (mod sql_views) +- xenia-app/src/main.rs: +35 (AnalyzeMode enum + flag + routing + cross-check log) +- **Net: +217 LOC**. + +## Behavior changes visible to users + +1. **New `--analyze=rust|sql|both` flag on `xenia dis`**, default `rust`. Backward compatible — existing scripts behave the same. +2. **New `target_hex BIGINT` column on `instructions` table**. Existing queries work; new column adds query power for SQL-side branch xref derivation. +3. **5 SQL views** available when `--analyze` is `sql` or `both`. Read-only, idempotent. +4. **Cross-check warning** in `both` mode flags any drift between formatter mnemonic strings and `xref.rs` kind classification. + +## What's next (Phase 4) + +Per [/home/fabi/.claude/plans/ok-execute-your-proposed-refactored-dolphin.md](plan): +- **Phase 4**: Replace println-only audits with assert-based JSON-fixture goldens. Expand coverage to base + extended + VMX128 (silent-bug area) + DB schema golden + ISO-gated end-to-end consistency. diff --git a/migration/claude-memory/project_xenia_rs_disasm_unify_phase4.md b/migration/claude-memory/project_xenia_rs_disasm_unify_phase4.md new file mode 100644 index 0000000..8cdb882 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_disasm_unify_phase4.md @@ -0,0 +1,97 @@ +--- +name: Disassembler unification — Phase 4 complete (2026-04-27) +description: Assert-based fixture goldens replace println-only audits. Three JSON snapshots locked, VMX128 silent-bug area covered by direct accessor unit tests + integration fixture, DB schema golden enforces 7-table column layout + 5 SQL views. +type: project +originSessionId: 680cc54c-e77a-4d2d-a11b-ca562e9a68ec +--- +**Phase 4 of disassembler unification is COMPLETE** (2026-04-27, same session). + +## What's in place + +### Fixture-based goldens for the disassembler +- **[crates/xenia-cpu/tests/disasm_goldens.rs](crates/xenia-cpu/tests/disasm_goldens.rs)** (~280 LOC) — three tests, each loads a JSON fixture and asserts every field of `xenia_cpu::disasm::format(decode(raw, addr))` matches. + - `tests/golden/base_mnemonics.json` — 77 cases covering common ALU / load-store / branch / compare / FPU forms. + - `tests/golden/extended_mnemonics.json` — 51 cases covering the simplified-mnemonic priority order (`li`, `lis`, `subi`, `mr`, `not`, `nop`, `slwi`, `srwi`, `clrlwi`, `clrrwi`, `extlwi`, `clrldi`, `srdi`, `rotldi`, `cmpwi`, `cmpdi`, `cmplwi`, `blr`, `blrl`, `bctr`, `bctrl`, `beqlr`, `bnelr`, `beq`, `bne`, `blt`, `bge`, `bgt`, `ble`, `bdnz`, `bdz`, `b` from `bc 20`, `mflr`, `mfctr`, `mfxer`, `mtlr`, `mtctr`, `mtxer`, `crnot`, `crclr`, `crset`, `crmove`, `lwsync`, `trap`, `tdeqi`, `twlti`). + - `tests/golden/vmx128_registers.json` — 16 cases covering standard VMX (5-bit regs) and the silent-bug VMX128 op6 vd128 high-bit area (vd128 = 96, 127 with vrlimi128; lower-bit encodings record what they actually decode to since the op6 secondary key constrains bits 21-23). +- **Regen workflow**: `REGEN_GOLDENS=1 cargo test -p xenia-cpu --test disasm_goldens` overwrites all three fixtures from current `format()` output. First run also auto-creates if the file is missing (with a panic afterwards forcing the developer to inspect+commit). +- **xenia-cpu Cargo.toml** gains `serde` + `serde_json` as `dev-dependencies` only — production lib stays serde-free, honoring constraint #1. + +### VMX128 silent-bug area: direct accessor unit tests +- **[crates/xenia-cpu/src/decoder.rs](crates/xenia-cpu/src/decoder.rs)** has 7 new unit tests in the existing `tests` module that pin the canonical bit positions for `va128`/`vb128`/`vd128`/`vs128`: + - `vmx128_vd128_low_5_bits_only` — 32 iterations covering vd_lo = 0..31 with vd_b21 = vd_b22 = 0. + - `vmx128_vd128_bit21_adds_32` — bit 21 = 1 produces vd128 = 32. + - `vmx128_vd128_bit22_adds_64` — bit 22 = 1 produces vd128 = 64. + - `vmx128_vd128_full_127` — vd_lo = 31 + bit 21 + bit 22 = 127. + - `vmx128_va128_uses_bit29` — va128 = bits 6-10 + bit 29. + - `vmx128_vb128_uses_bits28_and_30` — vb128 = bits 16-20 + bit 28 + bit 30. + - `vmx128_vs128_aliases_vd128` — vs128 ≡ vd128 across {0, 31, 32, 64, 96, 127}. +- These pin decoder.rs as the canonical source. The pre-Phase-1 ppc.rs had different (wrong) positions; this test set guarantees the bug never returns silently. + +### Analysis-side goldens (shim parity) +- **[crates/xenia-analysis/tests/disasm_goldens.rs](crates/xenia-analysis/tests/disasm_goldens.rs)** (~120 LOC) — 4 tests that load the *same* cpu-side fixture JSON files (via `..` relative path) and verify: + - `xenia_analysis::ppc::disasm(raw, addr).base` == `xenia_cpu::disasm::format(...).disasm` + - `.ext` == `format().ext_disasm` + - All structured fields (`mnemonic`/`operands`/`ext_*`/`branch_target`) match the fixture row. + - `display()` returns extended form when present, base otherwise. +- The cpu fixtures are the single source of truth; analysis shim drift surfaces immediately. + +### DB schema golden +- **[crates/xenia-analysis/tests/db_schema_golden.rs](crates/xenia-analysis/tests/db_schema_golden.rs)** (~230 LOC) — one test that builds an in-memory 16-byte PE-shaped fixture (4 instructions: mflr / nop / blr / nop), runs the full `DbWriter` pipeline (`write_base` → `ingest_instructions` → `write_analysis_results` → `create_sql_views`), and asserts: + - Every column name + type for all 7 tables (`metadata`, `sections`, `imports`, `instructions`, `functions`, `labels`, `xrefs`) via `PRAGMA table_info`. + - Row counts (4 instructions, 0 with target_hex since the fixture is indirect-only). + - All 5 SQL views (`v_branch_xrefs`, `v_call_graph`, `v_function_first_instruction`, `v_imports_called`, `v_reachability_from_entry`) exist after `create_sql_views`. +- Schema drift caught immediately. +- **Caveat noted in comments**: SQL `LIKE 'v_%'` matches DuckDB's built-in `views` system view because `_` is a single-char wildcard. The test enumerates view names explicitly. + +### Deletions +- **xenia-cpu/tests/disasm_audit.rs** (161 LOC) — println-only, no assertions. Migrated to the assert-based goldens above. +- **xenia-analysis/tests/disasm_audit.rs** (164 LOC) — same. + +## Verification + +`cargo test --workspace`: 29 test groups, 0 failures. All previously-passing tests still pass (176 cpu interpreter + 13 disasm + 7 VMX128 accessors + 4 analysis goldens + 1 schema golden + everything else). + +``` +$ cargo test --workspace 2>&1 | grep -c "test result: ok" +29 +$ cargo test --workspace 2>&1 | grep -c "FAILED\|failed" +0 +``` + +## Discoveries / fixture-author surprises + +1. **VMX128 op6 vrlimi128 vd128 < 96 is not a valid encoding.** The secondary key uses bits 21-23 = 111, so the high two bits of vd128 (which share bits 21+22) MUST be 11 for the dispatch to land on vrlimi128. Lower-bit attempts decode as vsrw128 / vpermwi128 instead. The fixture records this exact behavior — labeled honestly so future readers don't think these cases test what they don't. + +2. **Sylpheed's real corpus only contains vrlimi128 with vd128 ∈ 96..=127** (consistent with the constraint above). The decoder has been emitting these correctly since Phase 1's silent-bug fix; the goldens now lock that behavior. + +3. **`PRAGMA table_info` doesn't accept bind parameters** in DuckDB the way `WHERE` does — it uses the statement-level interpolation route. Inlined the table name into the query string with simple format!. + +4. **DuckDB has a built-in `views` system view** that matches SQL `LIKE 'v_%'` (because `_` is a single-char wildcard, `views` = `v` + 'i' + 'ews' fits). Always enumerate view names explicitly, or use `LIKE 'v\_%' ESCAPE '\'`. + +## LOC delta (Phase 4) + +- xenia-cpu/tests/disasm_goldens.rs: +388 (new) +- xenia-cpu/tests/golden/*.json: +28k bytes (~700 lines committed JSON across 3 files) +- xenia-cpu/src/decoder.rs: +95 (7 new VMX128 accessor unit tests) +- xenia-cpu/Cargo.toml: +4 (dev-dependencies serde+serde_json) +- xenia-cpu/tests/disasm_audit.rs: −161 (deleted) +- xenia-analysis/tests/disasm_goldens.rs: +120 (new) +- xenia-analysis/tests/db_schema_golden.rs: +245 (new) +- xenia-analysis/tests/disasm_audit.rs: −164 (deleted) +- **Net: +527 LOC test code + ~700 lines JSON fixtures, −325 LOC of useless println audits.** + +## Tooling for future authors + +- **Adding new test cases**: edit `cases: &[(u32, u32, &str)]` array inline in `tests/disasm_goldens.rs`, run `REGEN_GOLDENS=1 cargo test -p xenia-cpu --test disasm_goldens`, inspect the diff in the JSON fixture, commit. +- **Detecting drift**: any change to `format()` output that affects existing cases will fail the assertion test, naming the row label and showing the diff. Either the change is intentional (regen) or it's a regression (fix code). +- **Schema changes**: `db_schema_golden.rs` will fail if you add/remove/rename a column or change a type. Update the `expected` slice in the test. + +## End-of-phase status + +All four phases of the disassembler unification are now complete: +- **Phase 1**: single-source-of-truth `format()` in xenia-cpu; analysis ppc.rs collapsed to a 30-line shim; VMX128 silent bug fixed. +- **Phase 2**: iterator + 3 sinks (text/JSON/DuckDB) layer; `--json` CLI flag. +- **Phase 3**: db.rs split into ingest/analyze; 5 additive SQL views; `--analyze=rust|sql|both` flag with cross-check warning. +- **Phase 4**: assert-based fixture goldens + VMX128 accessor unit tests + DB schema golden replacing the println-only audits. + +The `DecodedInstr` struct stays at 8 bytes throughout; the decode cache stays at 1.3 MiB; Rust analysis (`func.rs`, `xref.rs`) remains the default and is unchanged. All three user constraints honored end-to-end. diff --git a/migration/claude-memory/project_xenia_rs_duckdb.md b/migration/claude-memory/project_xenia_rs_duckdb.md new file mode 100644 index 0000000..9435b8e --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_duckdb.md @@ -0,0 +1,12 @@ +--- +name: xenia-rs analysis DB is DuckDB, not SQLite +description: Reminder that xenia-analysis switched from rusqlite to duckdb — the `.db` extension is misleading +type: project +originSessionId: f35a2810-e5b7-46ac-a4d9-ea87304be179 +--- +**Why:** Historical — files named like `sylpheed.db` still use the legacy extension, but the file format is DuckDB (verified via `file sylpheed.db → "DuckDB database file, version 64"`). `xenia-analysis/Cargo.toml` depends on `duckdb = { workspace = true }`; there is no `rusqlite`. The CLI memory's mention of "SQLite DB" is stale. + +**How to apply:** +- CLI `sqlite3 path.db` will not open it; use `python3 -c "import duckdb; con = duckdb.connect('path.db', read_only=True); ..."` or install the `duckdb` CLI. +- Schema matches what the CLI memory describes (functions/imports/instructions/labels/metadata/sections/xrefs), just with DuckDB's SQL dialect. `SHOW TABLES` works; `SELECT name FROM sqlite_master` also works for compat. +- When querying, prefer Python with `read_only=True` so you don't step on concurrent writers. diff --git a/migration/claude-memory/project_xenia_rs_edram_resolve_gap.md b/migration/claude-memory/project_xenia_rs_edram_resolve_gap.md new file mode 100644 index 0000000..24366ba --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_edram_resolve_gap.md @@ -0,0 +1,61 @@ +--- +name: EDRAM→memory resolve byte copy — status and remaining gaps +description: What now ships on TILE_FLUSH, which paths still fall through to skip/warn, and the Canary anchors for future expansion +type: project +originSessionId: c0486ac0-d44e-49fc-8a8d-6c28cb11ab9d +--- +## Status (post 2026-04-22 landing) + +`handle_event_initiator` at [gpu_system.rs:376+](../../../../xenia-rs/crates/xenia-gpu/src/gpu_system.rs) +now **writes bytes into guest memory** on TILE_FLUSH. End-to-end flow: + +1. `ResolveInfo::from_register_file` ([draw_state.rs](../../../../xenia-rs/crates/xenia-gpu/src/draw_state.rs)) decodes `RB_COPY_CONTROL / RB_COPY_DEST_*` + `RB_SURFACE_INFO / RB_COLOR_INFO_* / RB_DEPTH_INFO / RB_COLOR_CLEAR / RB_COLOR_CLEAR_LO / RB_DEPTH_CLEAR` into a full Canary-parity struct: rectangle (scissor ∩ dest_pitch, 8-pixel aligned), source base tiles, surface_pitch_tiles (via `GetSurfacePitchTiles`), MSAA, 64bpp flag, clear values, dest_base **masked to `0x1FFF_FFFF`**, Endian128, format, array flag. +2. `ShadowEdram` ([edram.rs](../../../../xenia-rs/crates/xenia-gpu/src/edram.rs)) — 10 MiB (2048 × 80 × 16 samples × 4 B) CPU-side EDRAM that holds per-tile bytes. Clear-resolves paint `RB_COLOR_CLEAR` into the source tiles via `fill_rect_32bpp`; the copy loop reads out via `read_sample_32bpp`. +3. `resolve::copy_to_memory` ([resolve.rs](../../../../xenia-rs/crates/xenia-gpu/src/resolve.rs)) — per-pixel loop. For `k_8_8_8_8` source + dest (bitwise-equivalent fast path) it applies `apply_endian_128` and calls `mem.write_u32(tiled_2d_offset(x, y, pitch_aligned_to_32, bpp_log2=2))` — page versions bump so `texture_cache_host.rs` re-uploads on next bind. +4. `stats.resolves_copied_total` + `resolves_skipped_total` + `resolve_samples_written` flow to the HUD row 2. + +## What's supported (expanded coverage) + +- **Color sources** (any of these → any compatible color dest): + - `k_8_8_8_8` (0), `k_8_8_8_8_GAMMA` (1) → `k_8_8_8_8` (6), `k_8_8_8_8_A` (14), `k_8_8_8_8_AS_16_16_16_16` (50). + - `k_2_10_10_10` (2), `k_2_10_10_10_AS_10_10_10_10` (10) → `k_2_10_10_10` (7), `k_2_10_10_10_AS_16_16_16_16` (54). + - `k_16_16_FLOAT` (6) → `k_16_16_FLOAT` (31). + - `k_32_FLOAT` (14) → `k_32_FLOAT` (36). + - Gated by `is_32bpp_bitwise_equivalent` ([resolve.rs](../../../../xenia-rs/crates/xenia-gpu/src/resolve.rs)) mirroring Canary `IsColorResolveFormatBitwiseEquivalent` (xenos.h:614). +- **Depth sources**: `kD24S8` (0) → `k_24_8` (22); `kD24FS8` (1) → `k_24_8_FLOAT` (23). Reads depth tiles at `RB_DEPTH_INFO.depth_base`. +- **Rectangle derivation**: vertex-fetch-constant-0 when present (6-dword vertex buffer with endian-decoded floats, Fixed16p8 rounding, 3-vertex bounding box per Canary `draw_util.cc:950-1028`). Falls back to scissor ∩ `(0, 0, dest_pitch, dest_height)` when VF0 isn't a valid resolve vertex buffer. All outputs 8-pixel-aligned via `RESOLVE_ALIGNMENT_PIXELS = 8`. +- **`CopySampleSelect` sanitation** (`xenos.h:1039-1052`): MSAA + depth remap invalid selectors. Single-sample picks (`k0/k1/k2/k3`) honored; averaging modes (`k01/k23/k0123`) pick sample 0 + log `warn` (full averaging TODO). +- Endian: `kNone`, `k8in16`, `k8in32`, `k16in32` all correct. `k8in64`/`k8in128` approximated as `k8in32` + `tracing::warn`. +- Clear-resolve + copy-resolve paths both work. +- Destination address masked to Xenon 29-bit physical space. + +## What logs + skips (graceful) + +All of the below return `resolves_skipped_total += 1` with a `tracing::warn` identifying the reason — boot continues: + +- 64bpp source (`k_16_16_16_16`, `k_16_16_16_16_FLOAT`, `k_32_32_FLOAT`). +- 3D/stacked destination (`copy_dest_array = 1`) — Canary `Tiled3D` not ported. +- Non-zero `dest_exp_bias` on linear formats. +- Non-bitwise-equivalent source/dest pair (e.g. `k_16_16` → `k_16_16`, which would need conversion tables). + +## Deferred (next-session backlog, ordered by ROI) + +**Small, bounded — take these first:** +1. **MSAA sample averaging** (`CopySampleSelect::k01/k23/k0123`). Today falls back to sample 0 + `warn`. Fix: read N samples, average by format-aware rule (unorm8 averaged as int, float averaged as float). Needs per-format decoder. +2. **64bpp source** (`k_16_16_16_16`, `k_16_16_16_16_FLOAT`, `k_32_32_FLOAT`). Skipped + logged. Needs double-tile EDRAM stride (`pitch_tiles << is_64bpp`) and two `write_u32` per pixel. Straightforward refactor of `resolve::copy_to_memory`. +3. **`RB_COLOR_CLEAR_LO` for 64bpp clear paint**. Already captured in `ResolveInfo` but `fill_rect_32bpp` only writes one lane. Companion to #2. +4. **Endian `k8in64` / `k8in128`** (properly). Approximated as `k8in32` today. Buffer pixels in pairs/quads before tile-write. Rare in practice. +5. **`copy_dest_exp_bias != 0`**. Skipped + logged. Needs float-format awareness; bake the scale factor into the sample converter. + +**Large lifts — their own sessions:** +6. **wgpu render-target readback into `ShadowEdram`**. The clear-then-resolve path works, but once Sylpheed *draws* (currently `first Xenos draw: 0`), drawn pixels never reach EDRAM because the draw pipeline writes to wgpu attachments, not the shadow. Needs async `copy_texture_to_buffer` + CPU retile. Probably what unblocks frame-2 and beyond. +7. **3D / array destinations** (`copy_dest_array = 1`). Needs Canary's `Tiled3D` + `GetTiledOffset3D` ported. Rare on first-pixels path. +8. **Non-bitwise-equivalent conversion** — e.g. `k_16_16` RT (signed, range [-32, 32]) → `k_16_16` texture (unsigned). Requires Canary's conversion shader tables (`draw_util.cc:1320-1391` shader selection). + +## Canary anchors (for future expansion) + +- [draw_util.cc:926-1318](../../../../xenia-canary/src/xenia/gpu/draw_util.cc) — full `GetResolveInfo` including vertex-fetch rect. +- [draw_util.cc:1320-1391](../../../../xenia-canary/src/xenia/gpu/draw_util.cc) — shader selection (fast vs full paths). +- [render_target_cache.cc:1045](../../../../xenia-canary/src/xenia/gpu/render_target_cache.cc) — `GetResolveCopyRectanglesToDump` for host-RT dump. +- [texture_address.h:190-260](../../../../xenia-canary/src/xenia/gpu/texture_address.h) — `Tiled3D` (for copy_dest_array). +- [xenos.h:1039-1047](../../../../xenia-canary/src/xenia/gpu/xenos.h) — `SanitizeCopySampleSelect` for the MSAA sample-select rules. diff --git a/migration/claude-memory/project_xenia_rs_fix_session_2026_05_03.md b/migration/claude-memory/project_xenia_rs_fix_session_2026_05_03.md new file mode 100644 index 0000000..db2e5ae --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_fix_session_2026_05_03.md @@ -0,0 +1,116 @@ +--- +name: xenia-rs audit-2026-05 fix-session outcome (2026-05-03) +description: Single-session sprint applied 11 commits closing 12 audit IDs across 4 of the 8 planned phases. Renderer plateau partially unblocked but draws=0 persists; parked-waiter handles unresolved. Next session should target the producer side of those handles directly. +type: project +originSessionId: d3aef0c2-0968-4c35-a482-197640977756 +--- +## Headline outcome + +Single-session fix sprint executed against the audit-2026-05 fix queue. +Plan: `/home/fabi/.claude/plans/we-just-finished-a-shiny-conway.md`. + +**12 P0/P1 IDs closed, 11 commits, 9 merge commits on master**: + + Phase A: SWAPBUG-001 / PPCBUG-001 (P0 — addi 32-bit truncation revert) + Phase B: ORACBUG-004 (P0 — sylpheed_n50m stable-digest oracle) + Phase C: KRNBUG-Vd-04, GPUBUG-001, XMODBUG-013 (3× P0 — VdSwap PM4 ring path) + Phase D1: GPUBUG-101 (P0 — c-vs-temp src selector at w0[29:31]) + Phase D2: GPUBUG-100 (P0 — operand swizzle + negate from w1, abs deferred) + Phase D3: GPUBUG-102 (P0 — vertex-fetch endian byte-swap) + Phase E: GPUBUG-103/104/105 (3× P0 — 8 register addresses + index_size bit) + Phase F1: KRNBUG-017 (P0-under-parallel — Kf*SpinLock real impl) + Phase G1: GPUBUG-006 (P1 — sync_with_mmio Acquire/Release) + Phase G2: XMODBUG-002 (P1 — write_bulk page_version bump) + +## What moved + +- `swaps` at -n 100M lockstep: **1 → 2** (Phase A; the audit's bisected swap regression closed). +- `instructions=100M` deterministic; `imports` 11.4M → 987k (game escaped the corrupted-address retry loop). +- Workspace tests: **551 → 556** (+5 net). +- Lockstep stable-fields determinism preserved across all phases (only `packets` varies, ±5%). + +## What did NOT move + +- **`draws=0` persists at -n 100M lockstep.** The audit's central prediction + ("Phases C+D+E together unlock draws > 0") was not met. Root cause: the renderer + plateau is multi-causal, and within 100M instructions the game stalls on + parked-waiter handles BEFORE reaching draw issue. +- `shader_blobs_live=0` after 100M — the game hasn't issued IM_LOAD; resource-loader + worker threads are still parked (handles 0x1004 / 0x100c / 0x15e4 / 0x42450b5c + per `project_xenia_rs_audit_2026_05_02.md`). + +## Architectural notes & gotchas + +- **VdSwap PM4 path (Phase C)**: empirically discovered that `buffer_ptr` (r3 to + VdSwap) is NOT in the primary ring on xenia-rs (canary's contract says it is). + Real test: + `buffer_ptr=0x4acd4df8`, `ring_base=0x0accb000`, `size=0x1000` (4 KB ring). + Workaround landed: cache ring_base/size on `KernelState` at + VdInitializeRingBuffer time, then write PM4 packets at the actual ring WPTR + inside vd_swap. The original `notify_xe_swap` direct-call is retained as a + safety-net fallback, gated on `swaps_seen` not advancing during the drain. +- **Phase D1 redo**: the audit memo's "bit 7 of src byte = c#-vs-r# selector" + was wrong — bit 7 is the `is_src_temp_value_absolute` flag. Canary's actual + selector lives at word-0 bits 29-31 (`src1_sel`/`src2_sel`/`src3_sel` per + `xenia-canary/src/xenia/gpu/ucode.h:2078-2086`). First D1 commit was + reset+redone with correct decoding. +- **Phase G3 not landed**: the Plan agent claimed canary writes back addic/ + addicx/subficx as `i32 → i64 → u64` (sign-extended). Direct verification at + `xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc:117-136` shows canary uses + **full 64-bit add with sign-extended immediate, no truncation**. Reverting + the 32-bit ABI workaround risks regressions (Sylpheed PPC code may have + polluted upper-32-bit values that the workaround currently masks). Defer + until canary semantics are confirmed against a known-good Sylpheed trace. + +## Engineering decisions worth remembering + +- **`--stable-digest` flag** (xenia-app/main.rs): a new `RunDigest::stable_fields_json()` + view excludes timing-sensitive fields (`packets`, `interrupts_delivered`, + `interrupts_dropped`, `resolves`, `texture_decodes`) and `path` (cwd-dependent). + Required because empirical determinism check showed `packets` varies ±5-8% + between two consecutive lockstep runs at -n 50M (GPU thread race per audit M11). + All future lockstep goldens for -n ≥50M MUST use this flag. +- **n4b canonical-invocation oracle deferred**: per audit memory, + `--parallel --reservations-table` is "pathologically slow >32 min for -n 100M". + At -n 4B that's many hours per run, not the 5-15 min the original plan + estimated. Capture as a one-shot artifact under `audit-runs/post-fix/` + (NOT a test golden) once `draws > 0` lands. +- **`KernelState.ring_base` / `ring_size_dwords` fields**: cached at + VdInitializeRingBuffer time so `vd_swap` can write PM4 packets directly into + ring memory without a channel hop to the GPU worker (which would otherwise + own the ring view in threaded mode). Pattern for similar "kernel needs to + know GPU layout" cases. + +## Next session — recommended starting point + +1. **Trace parked-waiter producers.** Run `xenia-rs check sylpheed.iso -n 4B + --trace-handles` and look for: which kernel-side function is supposed to + signal handle 0x1004 / 0x100c / 0x15e4 / 0x42450b5c? The audit ruled out + XamTaskSchedule (XAMBUG-001) as the cause; the actual producer is in the + guest code-path leading up to where workers park. Walk back from the park + site (per `project_xenia_rs_sylpheed_stage3_2026_04_29.md`). +2. **Phase G2 (VSYNC wall-clock) + n2m oracle re-baseline as a paired commit**. +3. **Phase F2/F3** (XamTaskSchedule callback spawn + overlapped completion + helper) if appetite for new XAM infrastructure. +4. **KRNBUG-Mm cluster** for proper Mm protect / range / per-heap honoring. + Required before re-evaluating the addic/subfic semantics. + +## Build state at sprint close + +- HEAD master: `6f851a2` (will be `` after the close-out merge). +- Tests: 556 passing (up from 551 baseline). +- Workspace clean except `audit-runs/post-fix/` and `sylpheed.db.apr18.bak` + (both untracked, audit artifacts). +- audit-findings.md updated with "Fix session 2026-05-03" close-out section + documenting all closed IDs, deferred IDs (with reasons), and recommended + next session. +- All sprint branches deleted post-merge. + +## Cross-references + +- Builds on: `project_xenia_rs_audit_2026_05_02.md` (the audit this session + consumed). +- Builds on: `project_xenia_rs_addis_signext_root_cause_2026_04_29.md` (the + addis 32-bit truncation pattern; SWAPBUG-001 was an over-extension to addi). +- Builds on: `project_xenia_rs_sylpheed_stage3_2026_04_29.md` (parked-waiter + 4-handle map; still unresolved at this sprint's close). diff --git a/migration/claude-memory/project_xenia_rs_handle_audit.md b/migration/claude-memory/project_xenia_rs_handle_audit.md new file mode 100644 index 0000000..b5280eb --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_handle_audit.md @@ -0,0 +1,63 @@ +--- +name: xenia-rs handle-audit harness + post-2026-04-25 sync state +description: Per-handle signal/wait/wake audit (—trace-handles), and the diagnostic finding that the previously-reported HLE sync gap no longer reproduces at -n 500M +type: project +originSessionId: f83e67b7-97f4-4222-a37f-e1720ab3ace6 +--- +## Audit harness landed + +[`xenia-kernel/src/audit.rs`](../../../RE%20Project%20Sylpheed/xenia-rs/crates/xenia-kernel/src/audit.rs) — `HandleAudit` + `HandleAuditTrail` capture create/signal/wait/wake events per kernel handle, bounded ring of 32 entries each. `KernelState::audit` is `enabled=false` by default; flip via `--trace-handles` flag or `XENIA_TRACE_HANDLES=1`. Disabled is a single inline early-return in each record method — zero hot-path cost. + +Hook sites (in [`xenia-kernel/src/exports.rs`](../../../RE%20Project%20Sylpheed/xenia-rs/crates/xenia-kernel/src/exports.rs)): +- **Create**: `nt_create_event`, `nt_create_semaphore`, `nt_create_timer` +- **Signal**: `KeSetEvent`, `NtSetEvent`, `KePulseEvent`, `NtPulseEvent`, `KeReleaseSemaphore`, `NtReleaseSemaphore`, `NtSignalAndWaitForSingleObjectEx` (signal half), `signal_io_completion_event` +- **Wait**: `do_wait_single`, `do_wait_multiple` (one record per handle in the wait set) +- **Wake**: inside `wake_eligible_waiters` (separate records for manual-reset fan-out vs auto-reset/semaphore single-wake) + +Diagnostic dump in [`xenia-app/src/main.rs::dump_thread_diagnostic`](../../../RE%20Project%20Sylpheed/xenia-rs/crates/xenia-app/src/main.rs) prints the audit trail at end-of-run when audit is enabled. Highlights `` (smoking gun for missing signal source) and `` (handles called out in the original deadlock report). + +## Diagnostic finding (2026-04-25) + +The HLE sync gap previously reported at ~7.5M cycles on Sylpheed boot is **no longer reproducing**. Verified: + +- `xenia-rs exec sylpheed.iso --halt-on-deadlock -n 500_000_000` — `EXIT=0`, no halt fired, no `scheduler.deadlock_halts` or `scheduler.deadlock_recoveries` counters appear. +- VdSwap=1 fires at ~22M instructions, VdSwap=2 at ~30M instructions; matches the post-Tier-4 baseline. +- Audit data confirms the originally-suspect handles (0x10FC, 0x1014, 0x1104, 0x10DC, 0x10F0) all *do* receive signals: e.g., 0x10FC = Event/Auto with 1 signal (NtSetEvent from tid=4) + 1 wake; 0x1014 = Semaphore with 15 signals / 15 wakes / 16 waits. +- Threads still parked at end-of-run (tids 2/3/4/5/6/10/13/14/16/18) are in normal worker-idle states (event+semaphore producer/consumer with timeouts, or "service exits on stop-event" with no shutdown signal — both expected). + +**Why:** likely a combination of the IRQ-injection stack-pad fix (2026-04-24) and Tier-4 perf work (2026-04-25) shifted scheduler timing past the previous deadlock window. + +**How to apply:** APC + Mutant infrastructure (`KeInitializeApc=0x6D`, `KeInsertQueueApc=0x7A`, `NtQueueApcThread=0xE3`, `KeAlertThread=0x4F`, `KeInitializeMutant=0x72`, `KeReleaseMutant=0x87`, `NtCreateMutant=0xD4`, `NtReleaseMutant=0xF2` — all in canary [`xboxkrnl_threading.cc`](../../../RE%20Project%20Sylpheed/xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc)) was planned but DEFERRED — the audit data did not point to a missing kernel API. Implement only when a future regression actually requires it. + +## Resolve fill-ins landed (2026-04-25) + +[`xenia-gpu/src/edram.rs`](../../../RE%20Project%20Sylpheed/xenia-rs/crates/xenia-gpu/src/edram.rs) gained: `write_sample_32bpp`, `write_rect_32bpp`, `read_sample_64bpp`, `write_sample_64bpp`, `write_rect_64bpp`, `fill_rect_64bpp`. 64bpp helpers use Canary's doubled-pitch convention (`pitch_tiles_32bpp << 1`). + +[`xenia-gpu/src/resolve.rs`](../../../RE%20Project%20Sylpheed/xenia-rs/crates/xenia-gpu/src/resolve.rs) `copy_to_memory` now handles: +1. **64bpp sources** via new `is_64bpp_bitwise_equivalent` (k_16_16_16_16, k_16_16_16_16_FLOAT, k_32_32_FLOAT). Two `write_u32` per pixel; `bpp_log2 = 3` for tiled offset. +2. **MSAA averaging** (k01/k23/k0123) via per-format decode/average/encode helpers: + - `k_8_8_8_8`/`k_8_8_8_8_GAMMA`: per-byte rounded unsigned mean + - `k_2_10_10_10`: per-field rounded mean (widths 2/10/10/10) + - `k_16_16_FLOAT`, `k_16_16_16_16_FLOAT`: half-float decode → fp32 sum → encode + - `k_32_FLOAT`, `k_32_32_FLOAT`: bitcast → fp32 sum → bitcast + - `k_16_16_16_16`: per-16-bit-field rounded mean + +[`xenia-gpu/src/gpu_system.rs`](../../../RE%20Project%20Sylpheed/xenia-rs/crates/xenia-gpu/src/gpu_system.rs) clear-paint dispatches to `fill_rect_64bpp` for 64bpp sources, using `RB_COLOR_CLEAR_LO` (lo) + `RB_COLOR_CLEAR` (hi) per Canary `draw_util.cc:1302-1303`. + +Endian k8in64/k8in128 and `copy_dest_exp_bias != 0` remain on backlog (rare on first-pixels path); current code preserves the pre-existing warn+skip behavior for both. + +## wgpu→ShadowEdram readback — deferred, foundation in place + +`ShadowEdram` write APIs (`write_rect_32bpp`, `write_rect_64bpp`) are the foundational data-structure work the future readback retile path will use. The cross-thread plumbing (UiBridge `request_rt_readback` / `poll_rt_readback`, per-RT offscreen wgpu textures in `xenos_pipeline.rs`, `copy_texture_to_buffer` + `map_async` callback) is **deferred**: Sylpheed's current boot path fires no Xenos draws, so wiring the cross-thread readback today would land speculative code that can't be exercised against a real game flow. The plan file at [`/home/fabi/.claude/plans/please-address-the-hle-eager-pixel.md`](../../../home/fabi/.claude/plans/please-address-the-hle-eager-pixel.md) Section 2 has the full design (`ReadbackState`, `OffscreenRt`, `RtCache`). + +## Verification (2026-04-25 session) + +- `cargo test --workspace --release` — **386 tests pass** (was 369 baseline; +17 new for audit, edram, resolve). +- `xenia-rs check sylpheed.iso -n 2_000_000 --expect crates/xenia-app/tests/golden/sylpheed_n2m.json` — clean in both default block-cache mode and `XENIA_FORCE_PER_INSTR=1` per-instruction mode. +- `xenia-rs exec sylpheed.iso --halt-on-deadlock -n 500_000_000` — `EXIT=0`, no deadlock counters tripped. +- `cargo bench -p xenia-cpu` — `tight_alu_loop=119.86 MIPS`, `loadstore_loop=95.67 MIPS`, `mmio_storm=70.08 MIPS` (all at or above the prior post-Tier-4 baseline of 114.8 / 91.8 / 67.8). + +## Files touched this session + +- New: `xenia-rs/crates/xenia-kernel/src/audit.rs`. +- Modified: `xenia-kernel/src/lib.rs` (mod), `state.rs` (KernelState::audit + helpers), `exports.rs` (hook calls at create/signal/wait/wake sites). `xenia-app/src/main.rs` (--trace-handles flag, audit dump in dump_thread_diagnostic, env var `XENIA_TRACE_HANDLES`). `xenia-gpu/src/edram.rs` (new 32bpp+64bpp write APIs + fill_rect_64bpp + tests). `xenia-gpu/src/resolve.rs` (is_64bpp_bitwise_equivalent + 64bpp source path + MSAA averaging + half-float helpers + tests). `xenia-gpu/src/gpu_system.rs` (64bpp clear-paint dispatch). diff --git a/migration/claude-memory/project_xenia_rs_hle_import_fixes_2026_04_27.md b/migration/claude-memory/project_xenia_rs_hle_import_fixes_2026_04_27.md new file mode 100644 index 0000000..310f2a5 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_hle_import_fixes_2026_04_27.md @@ -0,0 +1,37 @@ +--- +name: HLE import fixes (2026-04-27) +description: Three real bugs from the kernel-import audit fixed — KeInitializeSemaphore seeds count/limit, XexGet{Module,Procedure}Address use distinct pseudo-handles + reverse thunk map. +type: project +originSessionId: e11f9c65-ab38-4eac-bff2-c5a64c5b8467 +--- +## What changed (2026-04-27) + +Fixed the three ❌ findings from the kernel/HLE/imports audit. All three Sylpheed-imported and previously latent. + +1. **KeInitializeSemaphore** ([exports.rs:498](xenia-rs/crates/xenia-kernel/src/exports.rs#L498)) — was a 0x14-byte zero-fill that dropped r4 (count) and r5 (limit). `ensure_dispatcher_object` later read those-now-zero fields and minted `Semaphore { count: 0, max: 1 }`. Now writes proper DISPATCHER_HEADER + signal_state(+0x4) + limit(+0x10) per [canary xboxkrnl_threading.cc:692](xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc#L692). + +2. **XexGetProcedureAddress** ([exports.rs:2366](xenia-rs/crates/xenia-kernel/src/exports.rs#L2366)) — was a stub returning STATUS_OBJECT_NAME_NOT_FOUND unconditionally, ignoring r3 (hmodule). Now resolves r3→ModuleId, looks up (module, ordinal) in a new reverse thunk map, writes the address to *r5, returns STATUS_SUCCESS / STATUS_INVALID_HANDLE / STATUS_OBJECT_NAME_NOT_FOUND. + +3. **XexGetModuleHandle** ([exports.rs:3231](xenia-rs/crates/xenia-kernel/src/exports.rs#L3231)) — was returning `state.image_base` for `"xboxkrnl.exe"` (wrong — that's the game's image) and `0` for `"xam.xex"`. Also wrong calling convention (handle in r3 vs canary's NTSTATUS-in-r3-and-handle-via-*r4). Now uses distinct pseudo-handles `HMODULE_XBOXKRNL=0xFFFE_0001` / `HMODULE_XAM=0xFFFE_0002`, writes to *r4, returns NTSTATUS in r3 (X_ERROR_NOT_FOUND=0x048B for unknown). + +## New plumbing + +- `pub const HMODULE_XBOXKRNL`, `HMODULE_XAM` in [state.rs:42-50](xenia-rs/crates/xenia-kernel/src/state.rs#L42). +- `KernelState.thunks_by_ordinal: HashMap<(ModuleId, u16), u32>` field; helpers `register_thunk`, `resolve_thunk`, `module_id_from_hmodule`. +- main.rs Phase 1 thunk loop now also pushes into `thunk_addr_map: Vec<(ModuleId, u16, u32)>` and drains it into `kernel.register_thunk(...)` right after `KernelState::with_gpu(...)` ([main.rs:737](xenia-rs/crates/xenia-app/src/main.rs#L737)). + +## Verification + +- `cargo test -p xenia-kernel` → 76/76 (was 73; 3 new tests added). +- `cargo test --workspace` → all green. +- `xenia-rs check sylpheed.iso -n 30M --parallel` → instructions=30M, swaps=2, unimpl=0, interrupts=53. **VdSwap=2 baseline preserved.** + +## Files touched + +- [xenia-kernel/src/state.rs](xenia-rs/crates/xenia-kernel/src/state.rs) +- [xenia-kernel/src/exports.rs](xenia-rs/crates/xenia-kernel/src/exports.rs) (rewrote 3 export bodies + 1 new const + 3 tests near line 4818) +- [xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs) (lines 633-651 + 737-742) + +## Why this matters + +Sylpheed only calls `KeInitializeSemaphore` once during boot (visible in the digest) and so far hasn't tripped over the wrong count/max — the fix is preventative. `XexGet*` were both safe stubs; the proper plumbing is now in place if any title (Sylpheed or future) calls them. diff --git a/migration/claude-memory/project_xenia_rs_io_002_volallocunit_2026_05_04.md b/migration/claude-memory/project_xenia_rs_io_002_volallocunit_2026_05_04.md new file mode 100644 index 0000000..53257c2 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_io_002_volallocunit_2026_05_04.md @@ -0,0 +1,43 @@ +--- +name: KRNBUG-IO-002 vol-info alloc-unit fix landed; cascade FALSIFIED +description: 2026-05-04. nt_query_volume_information_file class-3 returned 2048 bytes/cluster; corrected to canary NullDevice (sectors=0x80, bps=0x200 → 0x10000). Lockstep deterministic, but the audit-006 predicted 7→0 cascade did NOT fire (7→7 unchanged). Volume-info is NOT the priv-11 gate. Next session must probe sub_824A9710 entry to find the real upstream gate. +type: project +originSessionId: cc00c351-3064-4841-a0f7-3ce94e28895e +--- +🎯 **KRNBUG-IO-002 (2026-05-04, LANDED branch `xboxkrnl-vol-allocunit/p0-65536-cluster`)** — `nt_query_volume_information_file` class-3 (`FileFsSizeInformation`) at `crates/xenia-kernel/src/exports.rs:1241-1269` corrected from `(total=0x100000, free=0, sectors_per_unit=1, bytes_per_sector=2048)` to canary-NullDevice byte-identical `(total=0x10, free=0x10, sectors_per_unit=0x80, bytes_per_sector=0x200)` (product = 65536). Reference: `xenia-canary/src/xenia/vfs/devices/null_device.h:38-46`. Tests 591 → 592 (added `nt_query_volume_information_file_class3_returns_64k_alloc_unit`). Lockstep `instructions=100000010, swaps=2, draws=0` deterministic across two reruns. sylpheed_n50m oracle still matches its existing golden — **the change is observably a no-op at -n 50M, and remains a no-op at -n 500M**. + +**Why:** AUDIT-006 predicted 7→0 collapse of canary-only exports (priv-11 query → XamTaskSchedule → Cache0 callback thread → 0x100c worker spawn → display-init pump). The block-size mismatch was named the SOLE upstream gate per `project_xenia_rs_io_nullfile_2026_05_04.md` and the audit-006 queue file's Tier-0 entry. The fix sketch was prescribed: 4-LOC change, ≤4 lines diff. + +**How to apply:** the value change is *correct* (canary-byte-identical), so the branch lands. But it is **not load-bearing for the priv-11 gate**. The audit-006 cascade prediction was **falsified** — the same seven canary-only exports remain `(ExTerminateThread, KeReleaseSemaphore, KeResetEvent, ObCreateSymbolicLink, XamTaskCloseHandle, XamTaskSchedule, XamUserReadProfileSettings)`, identical to the audit-006 set, with `XexCheckExecutablePrivilege=1, XamTaskSchedule=0` unchanged. + +**Diagnostic — why the prediction failed.** All 16 `NtQueryVolumeInformationFile` calls in our 500 M trace originate from a single LR `0x82611f38`. Both pre- and post-fix runs show the calls completing with `STATUS_SUCCESS` and never causing a downstream branch. The audit-006/audit-005 premise that `sub_824ABA98` (`VerifyDirBlockSize`) consumes the volume-info reply at the priv-11 gate is therefore likely incorrect, *or* the gate consumes a different information class via a different export entirely (`nt_query_information_file`, `nt_query_full_attributes_file`, an FsCtl `NtDeviceIoControlFile` response, etc.). + +**Pre/post snapshot** (audit-006 → post-IO-002): +- canary-only kernel exports: **7 → 7** (identical set) +- `XexCheckExecutablePrivilege`: 1 → 1 (priv=0xA only — priv=0xB never fires) +- `XamTaskSchedule`: 0 → 0 +- `swaps`: 2 → 2; `draws`: 0 → 0 +- worker spawns: 19 → 18 (within noise; LRs `0x824ac5f0×15 + 0x824cd984×1 + 0x824d2e68×2`) +- `imports@-n 100M` (stable digest): 987686 → **987630** (-56) — slight trajectory shift, but not gate-opening +- `NtQueryVolumeInformationFile` calls: 16 → 16 (no new sites reached) +- parked-handle `signal_attempts`: 0 → 0 (handles 0x1004 / 0x100c / 0x15e0 still parked) + +**Stop condition.** Per the IO-002 task brief: *"If audit-006's prediction collapses (7 → 6 instead of 7 → 0), the alloc-unit fix wasn't the SOLE gate after all. Document which of the 7 still classify as REAL_BUT_UNREACHED, capture the new upstream gate from the canary diff, hand back. Do NOT pivot to a second fix this session."* Prediction collapsed cleanly to **zero movement** (7 → 7); branch landed, no second fix attempted, hand-back triggered. + +**Trace artifacts** (re-runnable): +- `audit-runs/post-IO-002/ours.log` (692 MB, 5.6 M lines, `RUST_LOG=probe_calls=trace --trace-handles-focus=0x1004,0x100c,0x15e0 -n 500M`) +- `audit-runs/post-IO-002/canary.log` (audit-006 oracle copy; canary build `9467c77f0`) +- `audit-runs/post-IO-002/diff.py` +- `audit-runs/post-IO-002/lock_n100m_run{1,2}.json` (bit-identical) +- `audit-runs/post-IO-002/canary_exports.txt`, `ours_exports.txt`, `canary_only.txt` + +**Next-session next-gate hypothesis (untested, ranked by likelihood):** + +1. **`sub_824A9710` entry-side probe.** AUDIT-005's instrumentation has never seen the priv-11 site fire in any session. Use `--pc-probe` on the entry of `sub_824A9710` and on every conditional branch inside it; whichever branch exits the function before the priv-11 `XexCheckExecutablePrivilege` call site is the actual gate. Disassemble with `xenia-rs dis --json --at 0x824a9710` and walk top-to-bottom. +2. **Different info-class.** `nt_query_information_file` (class 5 `FileStandardInformation`, class 22 etc.) or `nt_query_full_attributes_file` may be the actual consumer. Our 16 volume-info calls at LR `0x82611f38` complete successfully → **not the gate** even though they were the audit-006 suspect. +3. **Mis-attributed disasm.** AUDIT-005's identification of `sub_824ABA98 = VerifyDirBlockSize(path, expected_alloc_unit_bytes)` came from disasm reading; IO-001's runtime trace already invalidated parts of that attribution. The "verifier rejects 2048" hypothesis is no longer supported by evidence. +4. **A different IOCTL.** `NtDeviceIoControlFile` is now reachable (KRNBUG-IO-001 unblocked it); some FsCtl response we return may be the new gate. The IOCTL code we observed was `0x70000+0x4004` per the IO-001 memory. + +**What this session validates about the audit framework:** AUDIT-006's `REAL_BUT_UNREACHED` classification of all 7 entries was correct (they remain unreached). The framework's *triage discipline* held — we did not pull a Tier-1 entry. But its *gate identification* was wrong (Tier-0 hypothesis), which is exactly the failure mode the stop condition was designed to catch. Future audits that hinge on a single causal hypothesis should explicitly enumerate alternative gates and demand probe-evidence rather than disasm-only attribution. + +**Headline.** Block-size mismatch closed (fix is correct), but the renderer plateau persists — `swaps=2 draws=0`, parked handles `signal_attempts=0`. The 6-session producer hunt for handles 0x1004/0x100c/0x15e0 remains open. Real gate is upstream of `nt_query_volume_information_file`. diff --git a/migration/claude-memory/project_xenia_rs_io_003_ioctl_2026_05_04.md b/migration/claude-memory/project_xenia_rs_io_003_ioctl_2026_05_04.md new file mode 100644 index 0000000..e5a24c9 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_io_003_ioctl_2026_05_04.md @@ -0,0 +1,48 @@ +--- +name: KRNBUG-IO-003 NtDeviceIoControlFile cascade landed +description: Real NtDeviceIoControlFile mirroring NullDevice::IoControl for FsCtlCodes 0x70000+0x74004; priv-11 query and XamTaskSchedule fire. 7→3 canary-only exports. Worker count and 0x100c handle unchanged. +type: project +originSessionId: 6fd4fb3a-7b89-4b04-9055-8e6321310ad2 +--- +# KRNBUG-IO-003 — NtDeviceIoControlFile real impl (2026-05-04, LANDED) + +**Branch:** `xboxkrnl-ioctl/p0-fsctl-mountinfo` (no-ff merge to master). + +## What changed + +- `crates/xenia-kernel/src/exports.rs:90` — `stub_success` → `nt_device_io_control_file`. +- New function body in same file (post `nt_write_file`). Mirrors canary `xboxkrnl_io.cc:645-678` + `null_device.cc`. For FsCtlCode `0x70000`: writes `cache_size/512 (=0x7F8)` at OUT+0 and `512` at OUT+4 (u32 BE). For `0x74004`: writes `0` at OUT+0 and `cache_size = 0xFF000` at OUT+8 (u64 BE). Other codes return `STATUS_INVALID_PARAMETER` with a `tracing::warn!`. Fills `IoStatusBlock` (canary's TODO) with status + bytes-written. +- Stack args 9-10 (OutputBuffer, OutputBufferLength) read from `[sp+0x54]` / `[sp+0x5C]`. **Xbox 360 PowerPC 32-bit ABI**: linkage area sp+0..sp+8, then sp+0x14 starts the parameter save area (8 quadword spill slots × 8 bytes = 64 bytes), so 9th arg lands at sp+0x14+0x40 = sp+0x54. Each slot is 8 bytes wide but holds 4-byte values via `stw` (caller does `stw r5, 84(r1)` and `stw r11, 92(r1)` for the two stack args). Confirmed by disasm of caller `sub_824ABD88:0x824abe04-0x824abe10` and `0x824abe70-0x824abe78`. **No code in xenia-rs ever needed this offset before — first 9+-arg HLE export.** +- 2 unit tests added (one per FsCtlCode); test count 592→594. +- `crates/xenia-app/tests/golden/sylpheed_n50m.json` re-baselined: `instructions=50000004→50000003`, `imports=407362→407255`. `sylpheed_n2m` unchanged. + +## Audit-007 prediction scorecard + +5/8 sharp predictions held. Recorded for meta-validation: + +| # | Prediction | Held? | +|---|---|---| +| (a) cargo test green | 592→594 | ✓ | +| (b) Lockstep determinism | run1≡run2≡run3 (`audit-runs/post-IO-003/lock_n100m_run{1,2,3}.json`) | ✓ | +| (c) `XexCheckExecutablePrivilege` 1→≥2 | 1→2 (priv=0xA + priv=0xB both query) | ✓ | +| (c) `XamTaskSchedule` 0→≥1 | 0→1 | ✓ | +| (e) canary-only exports 7→≤3 | 7→3 (exact lower bound) | ✓ | +| (d) 0x100c worker spawn | still UNCREATED, signal_attempts=0 | ✗ | +| (d) 0x1004 signal_attempts >0 | still 0 | ✗ | +| (f) Worker thread spawn count >19 | unchanged at 19 | ✗ | + +**Bonus signal not predicted:** 0x15e0 semaphore: `signal_attempts=0→1` (primary=1, "not stuck — signals consumed correctly") — XamTaskSchedule's downstream pump now runs. + +## Why: lockstep digest deltas at -n 100M + +`instructions=100000010→100000019` (only +9), `imports=407417→987524` (+2.4×, similar pattern to KRNBUG-XAM-001's HDMI fix). Small instruction delta + huge imports delta = the gate unblocks fast and a tight import-heavy loop runs. `swaps=2 draws=0` plateau persists. + +## How to apply + +Three exports remain canary-only: `ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings`. The next gate is downstream of XamTaskSchedule — likely the post-task-schedule completion path that should spawn the 0x100c worker. Re-running `--branch-probe` against `sub_824A9710` would now show a NEW exit branch (one of `0x824a996c`, `0x824a9998`, `0x824a9a18`) since the priv-11 site now succeeds. That trace would name the next caller in the chain. + +## Trace artifacts + +- `audit-runs/post-IO-003/lock_n100m_run{1,2,3}.json` (byte-identical) +- `audit-runs/post-IO-003/lock_n500m.json` (`instructions=500000010 imports=5629676`) +- `audit-runs/post-IO-003/exec_trace_focus_500m.log` (handle focus 0x1004, 0x100c, 0x15e0) diff --git a/migration/claude-memory/project_xenia_rs_io_004_xnotify_listener_2026_05_06.md b/migration/claude-memory/project_xenia_rs_io_004_xnotify_listener_2026_05_06.md new file mode 100644 index 0000000..e1cb897 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_io_004_xnotify_listener_2026_05_06.md @@ -0,0 +1,68 @@ +# KRNBUG-IO-004 — Real XNotify listener (2026-05-06) + +## Status +LANDED. Branch `xnotify-listener/p0-startup-enqueue` merged no-ff into master. + +## Background +Audit-012 confirmed dispatcher@0x40111890 + vtable@0x820A183C are correctly +populated. The gate was `xnotify_get_next` (xam.rs:363) returning 0 forever, +causing `bc 12, 4*cr6+eq, 0x822F1C20` at 0x822f1be4 to bypass the dispatch arm +at 0x822F1BE8 indefinitely. + +## Phase 1.5 result (NOT committed) +Synth-stub auto-enqueued `(0x0A, 1)` on the first `XNotifyGetNext` after listener +registration. Branch-probe (temporarily augmented to print CTR) at PCs +{0x822f1be8, 0x82175338, 0x82173dc8, 0x822f1c04} confirmed: +- Dispatch arm reached: r3=0x40111890 (= mem[0x40111890]) at thunk 0x82175338 +- bcctrl target = thunk 0x82175338 → sub_82173DC8 (matches audit-012) +- Returned cleanly to 0x822f1c04 (no abort) + +Stub + branch-probe CTR addition reverted. `cargo test --workspace --release` +green at 594. + +## Phase 2 implementation (committed) +- `KernelObject::NotifyListener { mask, max_version, queue: VecDeque, waiters }` in `objects.rs`. +- `KernelState::has_notified_startup` + `has_notified_live_startup` in `state.rs`. +- `xam_notify_create_listener` in `xam.rs`: read mask=r3 (qword), max_version=r4 (clamped to 10), + alloc handle, on first listener with `mask & kXNotifySystem` enqueue + `(0x09, 0)` + `(0x0A, 1)`; with `mask & kXNotifyLive` enqueue `(0x02000001, 0x001510F1)` + `(0x02000003, 0)`. + Mirrors `kernel_state.cc:1013-1033` byte-for-byte. +- `xnotify_get_next` in `xam.rs`: handle=r3, match_id=r4, id_ptr=r5, param_ptr=r6. + Pop front (or scan-by-id if match_id != 0). Mask + version filter on enqueue + per `xnotifylistener.cc:38-51`. XNotificationKey: mask_index=bits 25..30, + version=bits 16..24. +- 5 unit tests added: full-mask drains 4 startup notifications in order; + second listener does not re-fire; system-only mask filters live; + version-0 filter drops too-new; unknown-handle returns 0. +- LOC: 119 total (97 impl + 22 scaffolding pattern matches in main.rs/objects.rs/state.rs). + +## Cascade-prediction scorecard +- (a) `cargo test --workspace --release`: 594 → 599 PASS +- (b) Lockstep determinism `-n 100M`: instructions=100000012 stable across two reruns; bit-identical diff PASS +- (c) AUDIT-009 21-PC + AUDIT-005 9-PC probe set: 3 newly reachable in `sub_82173DC8` ancestry: `0x822c6870` (fired from 2 worker threads tid=14,15), `0x824563e0` (tid=16), `0x823ddb50` (tid=19). PASS (predicted 1-3) +- (d) Canary-only export delta 7 → 3: `KeResetEvent`, `ObCreateSymbolicLink`, `XamTaskCloseHandle`, `XamTaskSchedule` newly reached. Still canary-only: `ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings`. PASS (set shrank; specific predictions partial — XamUserReadProfileSettings remained but TaskSchedule fell off, which is reasonable as the post-fix execution path branched differently) +- (e) signal_attempts on parked handles: 0x15e0 = 1 (primary=1, ghost=0) was 0; 0x1004 still 0; 0x100c in this -n 500M trace. PASS (predicted >0 on at least one) +- (f) Worker thread count 18 → 20. PASS (delta confirmed) +- (g) draws=0 still expected, VdSwap=2 unchanged. PASS (acknowledged plateau) + +## Key runtime values (audit-012-confirmed, re-verified) +- Dispatcher: `0x40111890` +- Dispatcher field at +0: vtable pointer 0x820A183C +- vtable[1] (offset +4): 0x82175338 (XAM thunk → sub_82173DC8) +- Dispatch sequence at 0x822f1be8: `lwz r3, 7944(r25); lwz r5, 84(r31); lwz r4, 88(r31); lwz r11, 0(r3); lwz r11, 4(r11); mtspr CTR,r11; bcctrl` +- Sylpheed listener mask = 0x2F (covers both kXNotifySystem and kXNotifyLive) +- XNotifyGetNext call rate post-fix: 1.49M / -n 500M (was 1.49M pre-fix as well — main.tid=1 is in a frame-poll loop wrapping this call) + +## Still-canary-only (post-fix) +1. `ExTerminateThread` — likely fires on worker shutdown which doesn't happen in our trace +2. `KeReleaseSemaphore` — referenced by 0x15e0's producer chain (signal_attempts=1 primary direct on the kernel handle, no Ke shadow) +3. `XamUserReadProfileSettings` — gated on a path past the renderer plateau; provisional next blocker + +## Master HEAD +Pre-fix: `50a4887`. Post-fix branch merged no-ff with KRNBUG-IO-004 commit + merge commit. + +## Stop-condition adherence +- One step per session: stopped after landing, did not chase ExTerminateThread/KeReleaseSemaphore/XamUserReadProfileSettings. +- LOC budget 120: 119 ≤ 120 PASS. +- C++ runtime audit backlog (CPPBUG-AUDIT-001) untouched. +- No git push. diff --git a/migration/claude-memory/project_xenia_rs_io_nullfile_2026_05_04.md b/migration/claude-memory/project_xenia_rs_io_nullfile_2026_05_04.md new file mode 100644 index 0000000..9d8fa0a --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_io_nullfile_2026_05_04.md @@ -0,0 +1,116 @@ +--- +name: KRNBUG-IO-001 NullFile-style synth-empty read (2026-05-04, LANDED) +description: One-line fix at exports.rs nt_read_file. Synth empty files return SUCCESS+0 instead of EOF, mirroring canary NullFile::ReadSync. Cascade walked massively (10→7 canary-only exports; 6→19 worker threads). +type: project +originSessionId: 4fddefca-e32e-4f61-b2b2-fb42c949822b +--- +**🎯 KRNBUG-IO-001 (2026-05-04, LANDED master `556a8c3`)** + +## What sub_824ABA98 actually is +`VerifyDirBlockSize(ANSI_STRING* path, u32 expected_alloc_unit_bytes)`: +NtOpenFile(dir, FILE_OPEN_FOR_FREE_SPACE_QUERY|FILE_DIRECTORY_FILE, +OBJ_CASE_INSENSITIVE) → NtQueryVolumeInformationFile(class=3 +FileFsSizeInformation, len=24) → NtClose. Returns the query status if +<0; else 0 if `SectorsPerAllocationUnit * BytesPerSector == r4`, +else `0xC000014F`. Three import thunks: 0x8284DD7C=NtOpenFile, +0x8284DD5C=NtQueryVolumeInformationFile, 0x8284DD6C=NtClose. + +## What sub_824ABD88 actually is +`MaybeMountAndIoctl(ANSI_STRING* path, u32 expected)`: calls +sub_824ABC88(path, sp+128) — which compares path against +`\Device\Harddisk0\Partition1` (string at 0x820015A4) and short-circuits +if they match; otherwise opens `\Device\Harddisk0\WindowsPartition` +(0x82001580) and runs NtDeviceIoControlFile(IoCode=0x70000+0x4004=0x74004, +out=16) to read 8 bytes of disk metadata into *out. Then opens `path` +RW (DesiredAccess=0x100003, FILE_NO_INTERMEDIATE_BUFFERING| +FILE_SYNCHRONOUS_IO_ALERT) and runs NtDeviceIoControlFile(IoCode=0x70000, +out=8). Note: leaks the handle on success (no NtClose on the success +path). + +## AUDIT-005's attribution was wrong +Static analysis pointed to sub_824ABA98, but the runtime trace +(`probe_calls hw=0 call=NtReadFile r3=0x1008 ... lr=0x824a9814` +followed immediately by `RtlNtStatusToDosError r3=0xc0000011 ... +lr=0x824a97e4`) decisively located the failure at the **`NtReadFile` +call inside sub_824A9710 at 0x824a9810** — well before sub_824ABA98 is +reached. The sub_824ABA98 caller chain runs only after the cache magic +check or recreate path; the chain was unreachable because the function +bailed at the partition0 read. + +## sub_824A9710 = LoadOrCreateCacheCatalog +Opens `\Device\Harddisk0\partition0` (NtCreateFile with +FILE_ATTRIBUTE_SYSTEM, ShareAccess=FILE_SHARE_READ, CreateDisposition= +FILE_OPEN, OpenOptions=0x22), reads 1024 bytes from offset 2048, +validates "Josh" magic at byte 0, walks a 2-element slot table for +input r26 (caller's device-id from sub_824A9128), formats +`\Device\Harddisk0\Cache%u` and `\Device\Harddisk0\Cache%u\` paths +into ANSI_STRINGs at sp+112 and sp+120, then calls +`sub_824ABD88(sp+112, r27=0x10000)` and `sub_824ABA98(sp+120, r27= +0x10000)` to verify the cache subdirectory. Caller is sub_824A9AA0, +caller of caller is `main()` at 0x8216EA68 with args (1, 0x10000, +0xFF000) → expected alloc-unit-bytes is **0x10000 = 65536** (NOT +0xFF000, which is r25 used elsewhere as a quota). + +## Bug class (β: bit-level / stub gap) +Our synth-empty-file fallback (open_vfs_file when VFS lookup fails) +returns SUCCESS for the open + size=0 file. NtReadFile then returned +STATUS_END_OF_FILE because start_pos=2048 > size=0. Canary mounts +partition0 to a `NullDevice`; `NullFile::ReadSync` +([null_file.cc:24-31](xenia-canary/src/xenia/vfs/devices/null_file.cc)) +returns X_STATUS_SUCCESS with bytes_read=0 and never touches the +buffer. Sylpheed's caller pre-zeroes the 1024-byte stack target via +`memset(sp+208, 0, 1024)` (sub_824A9710 prologue), so on return the +buffer is all zeros, the "Josh" magic check fails, and the recreate +path runs. + +**Fix (one-line behavioral change):** in `nt_read_file` at +[exports.rs:947](crates/xenia-kernel/src/exports.rs#L947), when +`data.is_empty() && total == 0`, return STATUS_SUCCESS with +information=0 instead of falling into the EOF check. This treats the +synth-empty handle as a NullFile. + +## Chain-of-effects (post-fix, master `556a8c3`) +- tests: 590 → 591 (new regression covering NullDevice semantics) +- lockstep: BIT-IDENTICAL across 3 reruns at -n 100M + (`instructions=100000010, imports=987630, swaps=2`) +- sylpheed_n50m golden re-baselined `50000004→50000000`, + `imports 407416→407362` +- canary kernel-call diff: **10 → 7 missing exports**. + Newly matched at -n 500M: XeCryptSha, XeKeysConsolePrivateKeySign, + NtDeviceIoControlFile (the cache-recreate path runs through to + NtWriteFile). Still canary-only: ExTerminateThread, KeReleaseSemaphore, + KeResetEvent, ObCreateSymbolicLink, XamTaskCloseHandle, XamTaskSchedule, + XamUserReadProfileSettings. +- boot trajectory: at -n 500M now spawns **19 worker threads** (was + ~6 pre-fix). Notably tid=10 entry=0x82178950 (the parked-handle 0x1004 + worker) and tid=16 entry=0x82170430 (the 0x15e0 worker) NOW spawn. +- parked-handle 0x1004 still `signal_attempts=0` (singleton ctor at + lr=0x824a9f6c). Handles 0x100c and 0x15e0 are now `` because + cascade walked past them and handle assignments shifted forward; + **new parked sites**: 0x12fc, 0x1600, 0x1040, 0x10b8, 0x15e8, 0x1014, + 0x101c, 0x10bc, 0x1044, plus 0x42450b5c (still parked, tid=6). + +## Next-frontier blockers +1. **XamTaskSchedule cluster** (priv-11 path) — the next-up canary + cascade step. +2. **Block-size mismatch in nt_query_volume_information_file**: + currently returns SectorsPerAllocUnit=1, BytesPerSector=2048 → + product=2048. Sylpheed expects 0x10000=65536. When sub_824ABA98 is + eventually reached, it will return 0xC000014F. This is an obvious + downstream gap if the recreate path's sub_824ABD88 call chain ends + up triggering a verify path that returns to sub_824ABA98 with a + tighter expected size. +3. **Many new parked handles** — 9 unique sites; the cascade is + producing more events than we have producers for. Need a fresh + --trace-handles-focus run on the new handles to characterize. + +## Files touched +- [exports.rs:947](crates/xenia-kernel/src/exports.rs#L947): inserted + 9-line synth-empty bypass before the EOF check. +- [exports.rs:4001](crates/xenia-kernel/src/exports.rs#L4001): + `nt_read_file_synth_empty_file_returns_success_with_zero_bytes` + regression covering buffer-untouched + IOSB.information=0 + + STATUS_SUCCESS + completion-event signal. +- [tests/golden/sylpheed_n50m.json](crates/xenia-app/tests/golden/sylpheed_n50m.json): + re-baselined (`instructions 50000004→50000000`, `imports + 407416→407362`). diff --git a/migration/claude-memory/project_xenia_rs_kernel_ke_resume_thread_2026_05_06.md b/migration/claude-memory/project_xenia_rs_kernel_ke_resume_thread_2026_05_06.md new file mode 100644 index 0000000..294c23f --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_kernel_ke_resume_thread_2026_05_06.md @@ -0,0 +1,44 @@ +# KRNBUG-KE-001 — real `KeResumeThread` (LANDED 2026-05-06) + +## Summary +Replaced no-op `ke_resume_thread` (exports.rs:3658-3664 — set r3=0, ignored r3) with a real impl per canary `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc:216-227`. Mirrors `nt_resume_thread`'s plumbing (resolve_pseudo_handle → scheduler.find_by_handle → resume_ref). Returns `STATUS_SUCCESS` if the KTHREAD-pointer-as-handle resolves, `STATUS_INVALID_HANDLE` otherwise. + +Branch: `ke-resume-thread/p0-canary-mirror` — local commit + no-ff merge. Tests 600 → 601. Lockstep `instructions=100000003 imports=987516` deterministic ×2. + +## Cascade-prediction scorecard (audit-018 → post-fix) +- **A — thread liveness PASS**: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940) transition `Blocked(Suspended)` → ran prologue → now `Blocked(WaitAny)` on audio buffer-completion semaphores 0x828A3254 / 0x828A3230 (handle decimal 2190094932 / 2190094896). +- **B — counters PARTIAL FAIL**: `NtSetEvent 667→3334` (~5× rise). `KeResumeThread=2` real for first time. `KeReleaseSemaphore=0` (still). `XAudioSubmitRenderDriverFrame=0` (still). Workers reached prologue but parked on a downstream consumer wait before the audio render-tick semaphore-release loop. +- **C — canary-only delta FAIL (predicted 2→1, actual 2→2)**: `ExTerminateThread`, `KeReleaseSemaphore` both still canary-only. +- **D — γ-cluster blocker FAIL**: `--pc-probe=0x82184318,0x82184374` armed, neither fires. `--dump-addr=0x828F4070` armed, no DUMP. Listener struct `[+64]` unchanged. `--trace-handles-focus=0x1004,0x100c,0x1020,0x15e4` shows all 4 still `signal_attempts=0`. + +## Milestone status +- Renderer cluster cascade collapsed? **NO**. +- signal_attempts > 0 on parked handles? **NO**. +- draws > 0? **NO** (swaps=2 draws=0 plateau intact). + +## Why C and D didn't fire +The audio render-frame ticker thread (entry=0x824D2878) is per-canary unblocked by `KeResumeThread` and immediately enters its body. In our run the body advances past prologue (PC 0x824D2878 → currently parked at 0x824D2990 / 0x824D28D0) but the gate is a **downstream** semaphore the audio infrastructure hasn't populated. That gate is itself part of a separate bug class — the consumer-side semaphore producer at 0x828A3230 / 0x828A3254 is gated by something else (likely the audit-009/-016/-017 γ-cluster `[0x828F4070+64]==-1`). + +The fix is **necessary but not sufficient**: it cleanly attributes the renderer plateau to a downstream blocker, narrowing the search. + +## Verification +- `cargo test --workspace --release` clean. New test `ke_resume_thread_unblocks_suspended_worker` covers Suspended→Ready transition + INVALID_HANDLE branch. +- `cargo test -p xenia-app --test sylpheed_oracles --ignored` green after re-baselining `sylpheed_n50m.json` (instructions 50000003→50000011, imports 407255→407247; draws/swaps unchanged at 0/2). + +## Files touched +- `crates/xenia-kernel/src/exports.rs` (12 LOC — fix + 41 LOC test) +- `crates/xenia-app/tests/golden/sylpheed_n50m.json` (re-baseline) +- `audit-findings.md` (new KRNBUG-KE-001 section) +- `audit-runs/audit-006/canary_export_queue.md` (status update) + +## Trace artifacts +- `audit-runs/post-ke-resume/lockstep_run{1,2}.json` — lockstep determinism +- `audit-runs/post-ke-resume/run.{log,err}` — 500M cascade verification +- `audit-runs/post-ke-resume/probe.{log,err}` — γ-cluster pc-probe + dump-addr +- `audit-runs/post-ke-resume/handles.{log,err}` — `--trace-handles-focus` + +## Recommended next session — AUDIT-019 (memory-watch on `[0x828F4070+64]`) +Audit-017 Option B. With KE-001 landed, the discipline gate cleanly attributes the renderer plateau to the listener-struct field rather than to a stub upstream. Memory-watch instrumentation should identify the writer that canary calls but we don't. + +## Master HEAD +Pre-session: `7ed6192`. Post-merge HEAD: see `git log master --oneline | head -3` after merge. diff --git a/migration/claude-memory/project_xenia_rs_kernel_stashhandle_2026_05_06.md b/migration/claude-memory/project_xenia_rs_kernel_stashhandle_2026_05_06.md new file mode 100644 index 0000000..c2139eb --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_kernel_stashhandle_2026_05_06.md @@ -0,0 +1,123 @@ +# KRNBUG-α-006 — `ensure_dispatcher_object` writes XObj signature + handle + +Date: 2026-05-07 (calendar carries the same date the audit-024A canary diff documented this divergence) +Branch: `xobj-stashhandle/p0-canary-mirror` (merged --no-ff into master `de5a15e`) +Status: LANDED. Tests 604 → 605. Lockstep deterministic. + +## Background + +Audit-023 + audit-024A documented byte-level divergence at `0x828F4838+0x08`: +canary stores `"XEN\0"` (kXObjSignature fourcc) at +0x08 and a kernel handle +(`0xF8000034` in the captured dump) at +0x0C. Ours had zeros. + +Canary's writer is `XObject::StashHandle(X_DISPATCH_HEADER*, uint32_t handle)` +at `xenia-canary/src/xenia/kernel/xobject.h:253-256`: + +```c++ +header->wait_list.flink_ptr = kXObjSignature; // +0x08 +header->wait_list.blink_ptr = handle; // +0x0C +``` + +Two callers in canary: +- `XObject::SetNativePointer` (xobject.cc:392) when kernel allocates a native + object and binds an existing dispatcher pointer to it. +- `XObject::GetNativeObject` (xobject.cc:474) on first guest-allocated KEVENT/ + KSEMAPHORE adoption — exactly the scenario our `ensure_dispatcher_object` + models. + +## Fix + +`crates/xenia-kernel/src/exports.rs:~3097`, after the host-shadow insert, write: + +```rust +mem.write_u32(ptr + 0x08, 0x58454E00); +mem.write_u32(ptr + 0x0C, ptr); +``` + +Stash handle is `ptr` itself because our `state.objects` is keyed by guest +pointer. Game reads the magic at +0x08 to recognize an already-adopted +dispatcher; the handle at +0x0C is opaque to the game (canary uses it as a +host-side handle map key — same role our pointer-key plays). + +7 LOC in impl (3 lines incl. 1 comment). 27 LOC in tests. 0 deletions. Hard +cap of 30 LOC on impl met. + +## Tests + +New unit test `ensure_dispatcher_object_stamps_xen_signature_and_handle` +asserts both writes against `ke_set_event` driven adoption. Updated +`ensure_dispatcher_object_ignores_unknown_type` to additionally assert ++0x08 / +0x0C remain zero on an unsupported type byte. Workspace tests +604 → 605 pass. + +## Lockstep determinism + +`cargo run --release -p xenia-app -- check sylpheed.iso --stable-digest -n 100_000_000` +produced identical output across 2 reruns: + +``` +instructions=100000003 import_calls=987516 unimplemented=0 +``` + +Identical to pre-fix master HEAD `d9e40d3` lockstep — writeback is host-side, +adds zero guest instructions. `sylpheed_n50m` golden passes unchanged. + +## Cascade observation (-n 500M, --halt-on-deadlock) + +| Metric | Pre-fix | Post-fix | Notes | +|---|---|---|---| +| `0x828F4838+0x08` | zeros | zeros | Guest never calls Ke* on this dispatcher (uses other adoption path) | +| `0x828F4838+0x00` (type) | 0x01 | 0x01 | Sync event header, populated by guest, not us | +| Worker count | 20 | 20 | unchanged | +| `KeReleaseSemaphore` | 0 | 0 | canary-only | +| `XAudioSubmitRenderDriverFrame` | 0 | 0 | canary-only | +| `ExTerminateThread` | 0 | 0 | canary-only | +| `NtSetEvent` | 3334 | 3334 | unchanged | +| `XamUserReadProfileSettings` | 2 | 2 | unchanged | +| `VdSwap` | 2 | 2 | unchanged | +| `KeSetEvent` / `KeResetEvent` / `KeWaitForSingleObject` | 1 / 1 / 5 | 1 / 1 / 5 | unchanged | + +`ensure_dispatcher_object` is hit only on direct PKEVENT / PKSEMAPHORE-pointer +paths (KeSetEvent=1 / KeResetEvent=1 — only ~1-5 invocations per run); most +guest kernel-object touches are handle-based and never traverse this code. +At 0x828F4838 in particular, the guest never invokes a Ke API with a pointer +to it — adoption in canary likely happens via a different export path +(possibly the kernel-allocated `SetNativePointer` lifecycle, which we don't +model in the same way). + +Per the task brief: "may move cascade or may not; lands regardless because +canary-divergent." No cascade ripple observed; expected. + +## Bug-class + +α (load-bearing-stub-omission). Mirror landed; the side-effect-on-guest- +memory was missing from our impl. Symmetric to the XamUserGetSigninState +fix landed pre-this-session — both are canary-correctness restorations +without sharp cascade hypothesis. + +## Discipline gate + +Boxes 2/4/5 pass. Box 1 ("sharp cascade prediction") explicitly waived per +task brief. Box 3 ("doesn't break lockstep") passes (digest unchanged). + +## Trace artifacts + +- `audit-runs/post-stashhandle/dump-500m.log` (-n 500M dump-addr=0x828F4838 + counters) +- `audit-runs/post-stashhandle/dump-50m.log` (-n 50M short run, no halt) + +## Master HEAD + +`de5a15e` (merge of `xobj-stashhandle/p0-canary-mirror` c03f2bc into master). +Not pushed. + +## Next + +The β/γ-cluster blocker remains unresolved. Audit-024A's `XAudioSubmitRender +DriverFrame=0` lead is independent of StashHandle; the audio-thread-start +gate is the natural next target. Sister session 025 was running an audio- +thread-start diagnostic in parallel — coordinate with its findings. + +Audit-024A's hypothesis that the StashHandle write at 0x828F4838 might +unblock something is **observationally falsified** at -n 500M post-fix. +The canary's "XEN\0+handle" stamp at that address is necessary for canary +correctness, not a load-bearing trigger for our cascade. diff --git a/migration/claude-memory/project_xenia_rs_m3_followup_real_parallelism_plan.md b/migration/claude-memory/project_xenia_rs_m3_followup_real_parallelism_plan.md new file mode 100644 index 0000000..151e401 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_m3_followup_real_parallelism_plan.md @@ -0,0 +1,176 @@ +--- +name: M3 follow-up — design for actual N=6 per-HW-thread parallelism (2026-04-26) +description: Hand-off memory for the next session. Current M3 substrate (Arc>, phaser, per-thread block caches, ctx.hw_id, reservation activation) is in place and golden-stable across 6 flag combos. Real N=6 parallelism deferred to a focused follow-up session because it requires recreating ~300 lines of run_execution in a worker-decomposed form. This memo specifies exactly what that session needs to do. +type: project +originSessionId: af90c866-579c-4506-af85-cd5a5030af85 +--- +## Current state (verified clean) + +411 tests pass. All 6 flag combinations (default, `--parallel`, `--reservations-table`, `--gpu-inline`, and combos) match `golden/sylpheed_n2m.json` at -n 2M. Sylpheed -n 30M `--parallel --halt-on-deadlock` boots to VdSwap=2 with halts==0. + +Substrate that's already done and the next session can build on: + +- `Arc>` wrap (M3.3) — `cmd_exec_inner` already supports the wrap; the worker thread variant exists. +- `Phaser` primitive (M3.1) — `arrive_and_wait`/`skip`/`shutdown`/`arrive_and_wait_timeout`, 6 unit tests. +- Per-thread block caches (M3.2a) — `[BlockCache; HW_THREAD_COUNT]` indexed by `hw_id`. +- `PpcContext.hw_id` + `PpcContext.reservation_table` (M3.7) — populated by `Scheduler::spawn` / `install_initial_thread` / migration. +- `ReservationTable` with self-describing `enable()`/`is_enabled()` (M3.7). +- `lwarx`/`stwcx.`/`ldarx`/`stdcx.` route through the table when `ctx.reservation_table.is_enabled()` (M3.7). +- `--parallel` CLI flag wired (currently spawns N=1 worker = same as lockstep). + +## What the next session needs to do + +Convert the N=1 spawn into N=6 with **real parallelism** via the **mem::replace + per-iteration kernel lock** pattern. Workers parallelize during `step_block` (no lock held); serialize at the kernel mutex for `pick_runnable` / `mem::replace` / `commit` / `call_export`. + +### High-level design + +```text +N=6 worker threads, each with: + - hw_id: 0..6 + - own BlockCache + - clone of Arc> + - clone of Arc + - clone of Arc + - clone of Arc shutdown + +Worker loop: + while !shutdown.load(Acquire) { + let stolen = { + let mut k = kernel.lock().unwrap(); + // Pick runnable on this slot + k.scheduler.begin_slot_visit(hw_id); + let r = k.scheduler.current?; + let tid = k.scheduler.slots[hw_id].runqueue[r.idx].tid; + let ctx = std::mem::replace( + k.scheduler.ctx_mut_ref(r), + PpcContext::new(), // placeholder + ); + Some((r, tid, ctx)) + }; // kernel lock released + + let Some((r, tid, mut ctx)) = stolen else { + phaser.skip(hw_id); + thread::sleep(Duration::from_micros(100)); + continue; + }; + + // STEP_BLOCK runs WITHOUT kernel lock — true parallelism here + let block = block_cache.lookup_or_build(ctx.pc, &*mem); + let result = step_block(&mut ctx, &*mem, block); + + // Reacquire and commit + { + let mut k = kernel.lock().unwrap(); + // Find current ThreadRef by tid (handles peer-worker migration) + let target_r = k.scheduler.find_by_tid(tid).unwrap_or(r); + *k.scheduler.ctx_mut_ref(target_r) = ctx; + + // Handle StepResult + match result { + StepResult::SystemCall => { + k.scheduler.current = Some(target_r); + // Resolve thunk + dispatch + k.call_export(module, ordinal, &*mem); + k.scheduler.current = None; + } + StepResult::Halt(_) => { /* mark exited */ } + StepResult::Continue => {} + // ... other variants (UnimplOp, Trap, etc.) + } + k.scheduler.end_slot_visit(); + } + phaser.arrive_and_wait(hw_id); + } +``` + +### Cross-cutting work (the coordinator thread) + +Between phaser barriers, ONE thread does: +- `kernel.interrupts.tick_vsync(stats.instruction_count)` + setting D1MODE_VBLANK_VLINE_STATUS bit +- `state.fire_due_timers(now)` — wakes timer waiters +- `try_inject_graphics_interrupt` + IRQ delivery +- GPU interrupt drain (`kernel.gpu.take_pending_interrupts()` → `kernel.interrupts.queue_interrupt`) +- Halt-on-deadlock detection +- Instruction-count atomic update (workers each post their per-iter count) +- Stats aggregation + +**Implementation choice**: either (a) main thread does it after spawning workers, looping `phaser.coordinator_wait()`-style; or (b) one of the workers (hw_id 0) elects to do it after each phaser barrier. (a) is cleaner. + +### Specific non-trivial issues the session must handle + +1. **Cross-worker migration via `find_by_tid`.** When Worker A is in `step_block` and Worker B's `KeSetAffinityThread` (inside its own `call_export`) migrates the thread Worker A is executing, A's `r: ThreadRef` becomes stale. Solution: `find_by_tid(tid)` lookup on commit; `tid` is stable across migration. Defaults to `r` if lookup fails (shouldn't happen but defensive). + +2. **`scheduler.current` racing.** Workers each set `scheduler.current` at pick time and clear at end. Under the kernel mutex these are serialized — but a worker that releases the lock with `scheduler.current = Some(my_r)` leaves it set when other workers acquire. Solution: clear `scheduler.current = None` BEFORE releasing the lock (after the `mem::replace`). Re-set it ONLY around `call_export`. + +3. **Halt sentinel restoration.** `StepResult::Halt(LR_HALT_SENTINEL)` triggers IRQ-callback restore in the existing run_execution. The worker must call `interrupts.saved.take().restore(...)` under the kernel lock if `interrupts.injected_ref == Some(my_r)`. Carved from current `run_execution`'s body. + +4. **DB writer thread-safety.** `xenia_analysis::DbWriter` is not Sync. Either gate per-instruction trace recording (which `--parallel` should disable), or wrap in Mutex per-worker. Simplest: assert `db_writer.is_none()` for parallel mode. + +5. **Debugger pre-step hooks.** `Debugger::wants_hooks()` + per-instruction observation requires holding kernel lock during step (defeats parallelism). Simplest: assert no debugger hooks in parallel mode. + +6. **Block cache lifetime.** Each worker owns its `BlockCache`. The cache references decoded blocks via raw pointer (the `block_ptr` pattern in current run_execution to bridge `lookup_or_build`'s `&dyn MemoryAccess` borrow with `step_block`'s subsequent call). The pattern is sound per-worker because each worker's cache + step_block runs single-threaded on its own thread. + +7. **Reservation table activation.** `--parallel` already implies `kernel.reservations.enable()`. Per-ctx fields (`reservation_table`, `hw_id`) are already wired. Workers' `lwarx`/`stwcx.` will automatically route through the table. Verify the M2.2 stress test (`concurrent_lwarx_stwcx_serializes`) still holds under real parallel guest workloads. + +8. **Timer wake fairness.** `kernel.fire_due_timers` walks `pending_timer_fires` and signals events. Under parallel workers, the timer-fire might race with a worker's wait acquisition. The existing kernel mutex serializes, so this works — but worth a stress test (timers + parallel mode). + +9. **Halt-on-deadlock detection.** Current run_execution checks `kernel.scheduler.has_live_thread()` per round. Coordinator should do the same; on detected deadlock, signal shutdown to workers via `shutdown.store(true, Release)` + `phaser.shutdown()`. + +### Rough size estimate + +- New `run_execution_parallel` function: ~250-350 lines +- `cmd_exec_inner` integration: ~30 lines (mostly already done; flip --parallel from N=1 to call new function) +- Coordinator helper functions: ~80-100 lines +- Tests: ~50-100 lines (mainly stress tests in xenia-app/tests) + +## Verification matrix the next session must pass + +| Check | Expected | +|---|---| +| `cargo test --workspace` | ≥ 411 passed, 0 failed | +| Lockstep golden (no flags) | matches | +| `--gpu-inline` golden | matches | +| `--reservations-table` golden | matches | +| `--gpu-inline --reservations-table` golden | matches | +| `--parallel` golden at -n 2M | matches (no swaps yet) | +| `--parallel --reservations-table` golden at -n 2M | matches | +| `--parallel` -n 30M `--halt-on-deadlock` | exit 0, VdSwap=1 + VdSwap=2 | +| 100× `--parallel` -n 50M `--halt-on-deadlock` | all exit 0 | +| `--parallel` perf vs lockstep at -n 30M | ≥ 1.5× wall-time speedup on a ≥6-core host | + +The 100× stress test is THE gate that surfaces lost-wakeups, lock-order inversions, and ABA hazards. + +## Files the next session will most likely touch + +- [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs) — new `run_execution_parallel` function; carve out coordinator helpers (`tick_vsync_and_timers`, `drive_gpu_round`, `drain_interrupts`, `check_halt_on_deadlock`) +- [crates/xenia-cpu/src/scheduler.rs](xenia-rs/crates/xenia-cpu/src/scheduler.rs) — `find_by_tid` already exists at line 487; verify stability under multi-worker calls +- [crates/xenia-kernel/src/state.rs](xenia-rs/crates/xenia-kernel/src/state.rs) — `call_export` may need a `caller_hw_id: u8` param; or workers set `scheduler.current` before calling + +## Why this isn't done in this session + +The plan said "be pedantic about concurrency correctness". A 300-line carving-out of `run_execution` is too large to verify safely in a single session without a per-substep verification cadence. Each piece (worker loop, cross-worker migration handling, coordinator, etc.) needs golden + stress-test gates. Splitting into dedicated focused work is more responsible than racing. + +## Memory files for this session's work (already written) + +- [project_xenia_rs_m3_step_03_04_kernel_wrap_spawn.md](project_xenia_rs_m3_step_03_04_kernel_wrap_spawn.md) — M3.3 + M3.4 (kernel wrap + N=1 spawn) +- [project_xenia_rs_m3_step_07_reservation_activation.md](project_xenia_rs_m3_step_07_reservation_activation.md) — M3.7 (reservations in interpreter) +- [project_xenia_rs_m3_step_08_verification.md](project_xenia_rs_m3_step_08_verification.md) — M3 session verification +- [project_xenia_rs_m3_followup_real_parallelism_plan.md](project_xenia_rs_m3_followup_real_parallelism_plan.md) — this hand-off + +## Resume command + +To resume in the next session: +```bash +cd "/home/fabi/RE Project Sylpheed/xenia-rs" +cargo test --workspace +./target/release/xenia-rs check sylpheed.iso -n 2_000_000 \ + --expect crates/xenia-app/tests/golden/sylpheed_n2m.json --parallel +# Both should pass before starting work. +``` + +Then write `run_execution_parallel` per the design above. Verify after each substep: +1. Worker loop body (no spawn yet) — call from main thread, verify lockstep golden. +2. Add coordinator helpers — verify lockstep still works. +3. Spawn N=1 — verify lockstep golden under `--parallel`. +4. Scale to N=6 — verify sylpheed boots; check 30M halts==0. +5. Stress 100x. diff --git a/migration/claude-memory/project_xenia_rs_m3_realpar_step_01.md b/migration/claude-memory/project_xenia_rs_m3_realpar_step_01.md new file mode 100644 index 0000000..1254ee0 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_m3_realpar_step_01.md @@ -0,0 +1,57 @@ +--- +name: M3 real-par Step 01 — coordinator helpers carved out (2026-04-26) +description: Three free fns coord_pre_round / coord_idle_advance / coord_post_round + RoundCtl enum carved out of run_execution. Pure motion refactor, lockstep bit-identical. 430 tests pass, all 6 flag combos match golden. +type: project +originSessionId: 35b35eef-690b-4871-b2ed-f69a1d2145e2 +--- +## What landed + +`run_execution` in [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs) had its outer-loop body shrunk by ~200 lines via three helper functions: + +- **`coord_pre_round(kernel, stats, max_instructions, ips_limit, throttle_start, &shutdown) -> RoundCtl`** — per-round prologue. Holds the SHUTDOWN_CHECK_MASK / IPS-throttle / heartbeat block, vsync ticker (with `D1MODE_VBLANK_VLINE_STATUS` bit 0 set when fire), `fire_due_timers`, `try_inject_graphics_interrupt`. Returns `BreakOuter` only when budget reached or UI shutdown. +- **`coord_idle_advance(kernel, halt_on_deadlock, &shutdown, stats) -> RoundCtl`** — invoked when `round_schedule()` is empty. Advances time to earliest pending deadline, fires timers, handles deadline wakes. On hard deadlock either dumps diagnostics + halts or force-wakes via `STATUS_TIMEOUT`. +- **`coord_post_round(kernel, mem, stats, instrs_at_round_start) -> RoundCtl`** — per-round epilogue. Calls `end_slot_visit`, drives the inline GPU proportional to executed instructions, drains GPU-side pending interrupts via `take_pending_interrupts`, and breaks when no live HW threads remain. + +`RoundCtl` is a `BreakOuter | Continue` enum. The constants `SHUTDOWN_CHECK_MASK` and `HEARTBEAT_MASK` moved into `coord_pre_round`. `LR_HALT` stays in `run_execution` (consumed by the per-slot body). + +## Why + +Step 01 of the M3 real-parallelism plan: the per-HW-thread parallel scheduler (Step 04) needs the coordinator to run these phases between phaser barriers. Carving them out so both lockstep and parallel paths can call them keeps a single source of truth for the sync logic and lets every substep land lockstep-bit-identical. + +## Files touched + +- [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs): + - +RoundCtl enum + - +coord_pre_round (~70 lines) + - +coord_idle_advance (~80 lines) + - +coord_post_round (~50 lines) + - run_execution body shrunk: prologue / idle / postlogue replaced with `match` calls + - `SHUTDOWN_CHECK_MASK` / `HEARTBEAT_MASK` constants moved into the helper + +## Subtle pattern note + +The helpers take `shutdown: &Option>` (not the original owned `Option>`). Rust's match-ergonomics implicitly borrow the contents, so the original `if let Some(ref flag) = shutdown` patterns had to drop the `ref` keyword inside the helpers. Took two iterations to compile. + +## Verification + +- `cargo build --release -p xenia-app`: clean (one warning-free build). +- `cargo test --workspace`: 430 passed, 0 failed (≥411 baseline ✓). +- All 6 golden combos at -n 2M match: `default`, `--parallel`, `--reservations-table`, `--gpu-inline`, `--gpu-inline --reservations-table`, `--parallel --reservations-table`. + +## Regression-fix breadcrumbs + +If a regression appears after this step: + +1. **Golden mismatch in lockstep**: the carve must be observation-equivalent. If the digest drifts, check that the order of operations in `coord_pre_round` matches the original (max-budget → IPS → SHUTDOWN_CHECK → heartbeat → vsync → fire_due_timers → try_inject_graphics_interrupt). Off-by-one ordering with `fire_due_timers` vs `try_inject_graphics_interrupt` changes which interrupts land. +2. **Golden mismatch with `--parallel`**: same root cause as (1); the parallel branch wraps around `run_execution` unchanged in this step. +3. **Compile error about `ref` in shutdown match**: dropped intentionally; match-ergonomics implicit borrow. +4. **`coord_idle_advance` not breaking when expected**: the original code had `info!()` inside the `else` branch then `break` — keep this structure. The helper returns `BreakOuter` after the `info!()`. + +## What this does NOT yet do + +- No worker-loop split (Step 02). +- No drop-and-reacquire pattern (Step 03). +- No N=6 spawn (Step 04). +- The per-slot body (lines 1551-1959) is untouched. + +## Test count: 430 (was 430 pre-step; this is a pure motion refactor). diff --git a/migration/claude-memory/project_xenia_rs_m3_realpar_step_02.md b/migration/claude-memory/project_xenia_rs_m3_realpar_step_02.md new file mode 100644 index 0000000..3466bb9 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_m3_realpar_step_02.md @@ -0,0 +1,82 @@ +--- +name: M3 real-par Step 02 — WorkerCtx + worker_prologue/worker_epilogue (2026-04-26) +description: Per-slot body of run_execution split into worker_prologue (under-lock prologue + per-instr path inline) + worker_epilogue (block-cache StepResult handling). Per-HW-slot WorkerCtx owns its own block cache + decode cache. Lockstep bit-identical; 430 tests pass; all 6 golden combos match. +type: project +originSessionId: 35b35eef-690b-4871-b2ed-f69a1d2145e2 +--- +## What landed + +The per-slot body inside `run_execution` (originally lines 1551-1959) split into three reusable pieces: + +1. **`struct WorkerCtx`** in [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs): + - `hw_id: u8` + - `block_cache: BlockCache` + - `decode_cache: DecodeCache` + - `force_per_instr: bool` + - `WorkerCtx::new(hw_id, force_per_instr)` constructor. + +2. **`PrologueOutcome`** enum: `Continue` (inline-handled), `BreakOuter`, `StepBlock { tid, thread_ref, block_ptr, pc_before }` (block-cache step pending). + +3. **`SlotOutcome`** enum: `Continue`, `BreakOuter`. + +4. **`worker_prologue`** function: + - Calls `begin_slot_visit`. + - Halt-sentinel detect + restore. + - Import-thunk dispatch (calls `kernel.call_export`). + - Unmapped-PC fault check. + - Decides block-cache fast path vs per-instruction observation. + - For block-cache: looks up via `wc.block_cache.lookup_or_build`, returns `StepBlock`. + - For per-instruction (debugger / DB writer / `XENIA_FORCE_PER_INSTR=1`): runs `step_cached` with `wc.decode_cache`, db logging, post-step hook, decrement_quantum, should_break — entirely inline. + +5. **`worker_epilogue`** function: + - Charges `executed` quantum decrements. + - Applies `StepResult` (Continue/SystemCall/Unimplemented/Trap/Halted), with the `Halted` arm returning `BreakOuter` only when the halting thread is `INITIAL_GUEST_TID`. + - Debugger should_break check. + +`run_execution`'s outer-loop body now calls `worker_prologue`, dispatches based on the outcome, and on `StepBlock` runs `step_block(ctx, mem, &*block_ptr)` followed by `worker_epilogue`. The lockstep loop holds the kernel state borrowed straight through (no lock-release window — that's Step 03). + +## Why + +Step 02 of the M3 real-parallelism plan: the parallel scheduler (Step 04) needs the per-slot body decomposed so it can release the kernel mutex around `step_block` (Step 03's drop-and-reacquire pattern). With prologue and epilogue as separate functions, lockstep keeps a tight wrapping but parallel can interleave a `drop(g) ... step_block ... g = lock()` between them. + +Per-worker `BlockCache` and `DecodeCache` (instead of one shared `DecodeCache`) is the precondition for parallel mode: each worker owns its caches so peers don't contend on cache mutation. + +## Files touched + +- [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs): + - +`WorkerCtx` (struct + constructor) + - +`PrologueOutcome` + `SlotOutcome` enums + - +`worker_prologue` (~200 lines, owns the entire per-slot prologue + per-instruction inline path) + - +`worker_epilogue` (~70 lines, owns the block-cache StepResult handling) + - `run_execution`: replaced `[BlockCache; 6]` + shared `DecodeCache` with `[WorkerCtx; 6]`; replaced ~400-line per-slot body with ~50-line dispatch on `PrologueOutcome`. + - Removed locally-scoped `LR_HALT` constant + `BlockCache`/`DecodeCache`/`step_cached`/`StepResult`/`INITIAL_GUEST_TID`/`PpcOpcode` imports (now scoped inside the helpers). + +## Critical correctness invariants preserved + +- **Halt-sentinel restore** runs in `worker_prologue` before any step → still under the kernel lock when Step 03 lands. +- **Import-thunk path** stays fully under the kernel lock (call_export reads scheduler.current); never released around it. +- **Block-cache raw-pointer pattern**: prologue produces `*const DecodedBlock` from `wc.block_cache.lookup_or_build(pc, mem)`. The pointer remains valid through the step + epilogue because `wc.block_cache` is owned by the worker and isn't mutated again until the next prologue iteration. Documented in the safety comment on `worker_epilogue`. +- **`scheduler.current`** is set by `begin_slot_visit` inside the prologue; the lockstep path leaves it set through step + epilogue (because `end_slot_visit` is in `coord_post_round` after the slot loop). Step 04 will move `end_slot_visit` to clear `current` before unlock. + +## Verification + +- `cargo build --release -p xenia-app`: clean. +- `cargo test --workspace`: 430 passed, 0 failed. +- All 6 golden combos at -n 2M match: `default`, `--parallel`, `--reservations-table`, `--gpu-inline`, `--gpu-inline --reservations-table`, `--parallel --reservations-table`. + +## Regression-fix breadcrumbs + +If a regression appears after this step: + +1. **Golden mismatch in lockstep**: most likely a missed instruction-count or quantum-decrement. The block-cache path's stats update (`stats.instruction_count.wrapping_add(executed)`) and N-time `decrement_quantum` calls landed in `worker_epilogue`. The per-instruction path's `stats.instruction_count += 1` and single `decrement_quantum` landed in `worker_prologue`'s per-instr branch. +2. **Golden mismatch with `--parallel`**: same root cause — the parallel branch wraps `run_execution` unchanged in this step. If only `--parallel` mismatches but lockstep matches, suspect `wc.decode_cache` per-worker behavior. (Decode cache results are deterministic given PC + page_version + raw instruction; per-worker should never change behavior.) +3. **`scheduler.current` is None panic**: `worker_prologue` does `kernel.scheduler.current.expect("begin_slot_visit set scheduler.current to Some when slot has runnable thread")`. This panic would fire if `pick_runnable` returned None for a slot that `round_schedule` reported as runnable — a scheduler invariant violation, not a bug introduced by this step. +4. **Borrow checker errors after editing the per-slot loop**: `kernel.scheduler.ctx_mut_ref(thread_ref)` borrows `kernel.scheduler` mutably; the borrow must end before the next `kernel.scheduler.*` call. The current dispatch uses an explicit scope `{ let ctx = ...; step_block(ctx, mem, block) }` to bound the borrow. + +## What this does NOT yet do + +- No drop-and-reacquire of the kernel lock around `step_block` (Step 03). +- No N=6 worker spawn (Step 04). +- No `find_by_tid` on epilogue commit (Step 04 — needed only when peers can migrate during the unlock window; in lockstep there are no peers). + +## Test count: 430 (was 430 pre-step; pure motion refactor). diff --git a/migration/claude-memory/project_xenia_rs_m3_realpar_step_03.md b/migration/claude-memory/project_xenia_rs_m3_realpar_step_03.md new file mode 100644 index 0000000..ebe409e --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_m3_realpar_step_03.md @@ -0,0 +1,97 @@ +--- +name: M3 real-par Step 03 — drop-and-reacquire around step_block (N=1) (2026-04-26) +description: New run_execution_parallel function with per-round mutex acquire/release and a drop-guard window around step_block. --parallel branch now calls it instead of run_execution. Single worker, uncontended unlock window. 430 tests pass; all 6 golden combos match; sylpheed -n 30M --parallel reaches VdSwap=2, halts==0 in 3866ms. +type: project +originSessionId: 35b35eef-690b-4871-b2ed-f69a1d2145e2 +--- +## What landed + +A new function `run_execution_parallel(mem, &Arc>, ...)` in [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs) that does the per-round locking dance internally: + +```text +loop { + let mut guard = kernel_arc.lock(); + coord_pre_round(&mut *guard, ...); // budget / vsync / timers / IRQ + guard.scheduler.begin_round(); + let order = guard.scheduler.round_schedule(); + if order.is_empty() { + coord_idle_advance(&mut *guard, ...); + drop(guard); continue; + } + for hw_id in order { + match worker_prologue(...) { + Continue => continue, + BreakOuter => break 'outer, + StepBlock { tid, thread_ref, block_ptr, pc_before } => { + let mut ctx_taken = mem::replace( + guard.scheduler.ctx_mut_ref(thread_ref), + PpcContext::new(), + ); + let cycle_before = ctx_taken.cycle_count; + drop(guard); // ← unlock + let result = step_block(&mut ctx_taken, mem, &*block_ptr); + let executed = ctx_taken.cycle_count - cycle_before; + guard = kernel_arc.lock(); // ← relock + let target_ref = tid + .and_then(|t| guard.scheduler.find_by_tid(t)) + .unwrap_or(thread_ref); + *guard.scheduler.ctx_mut_ref(target_ref) = ctx_taken; + worker_epilogue(...); + } + } + } + coord_post_round(&mut *guard, ...); + drop(guard); +} +``` + +The `--parallel` branch in `cmd_exec_inner` now calls `run_execution_parallel` instead of locking once and calling `run_execution`. The worker thread's body shrank to just dispatching to the parallel function with `&kernel_for_worker` (Arc clone). + +## Why + +Step 03 of the M3 real-parallelism plan: establish the locking dance pattern with a single worker (uncontended unlock window) so that we can prove golden-stable before scaling to N=6. This is the foundation Step 04 builds on: it just spawns 6 workers calling the same function with a shared `Arc>`. + +Critical correctness pieces validated by golden: +- The unlock around `step_block` is observation-equivalent at N=1 because no peer holds the lock during the unlock window. +- `mem::replace` with `PpcContext::new()` gives the placeholder a valid (zeroed) ctx; the real ctx is owned by `ctx_taken` while step_block runs. +- After step_block, the relock + `find_by_tid(tid)` resolves the post-step ThreadRef. With N=1 it always equals `thread_ref` (no migration possible). +- Writing `ctx_taken` back via `*ctx_mut_ref(target_ref) = ctx_taken` overwrites the placeholder with the freshly-stepped state. + +## Files touched + +- [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs): + - +`run_execution_parallel` (~140 lines, sits right after `run_execution`). + - `cmd_exec_inner` parallel branch: changed worker spawn to call `run_execution_parallel(&*mem, &kernel_for_worker, ...)` instead of locking and calling `run_execution`. + +## Verification + +- `cargo build --release -p xenia-app`: clean. +- `cargo test --workspace`: 430 passed, 0 failed. +- All 6 golden combos at -n 2M: match. +- `xenia-rs exec sylpheed.iso -n 30_000_000 --parallel --halt-on-deadlock`: exit 0, VdSwap=1 + VdSwap=2 fire end-to-end, no deadlock_halts. Wall: 3866 ms (vs ~2950 ms for lockstep at -n 30M per the M3.8 memo, so single-worker locking overhead is ~30% — expected since every iteration acquires and releases the mutex; will improve when actual parallelism (N=6) lands). + +## Concurrency invariants observed + +- **Lock window**: held during prologue (begin_slot_visit through `mem::replace` ctx-out) and epilogue (write-back through StepResult handling). +- **Unlock window**: only around `step_block` on a local `PpcContext`. No kernel state touched. +- **`scheduler.current` discipline**: set by `begin_slot_visit` inside prologue; not cleared between unlock and epilogue (single worker; no peers to mislead). Step 04 will clear it before unlock when peers exist. +- **`find_by_tid`**: invoked unconditionally on relock to be future-proof under migration; with N=1 it always returns the original `thread_ref`. + +## Regression-fix breadcrumbs + +If a regression appears after this step: + +1. **Lockstep golden mismatch**: should not happen — `run_execution` is unchanged. If it does, the ctx_taken / find_by_tid path was accidentally introduced. +2. **`--parallel` golden mismatch**: most likely a `mem::replace` ordering issue. Verify `cycle_before` is captured BEFORE `drop(guard)`, and `executed` AFTER `step_block` returns. If `executed` is computed from `kernel.scheduler.ctx(thread_ref).cycle_count` instead of `ctx_taken.cycle_count`, that's the bug — kernel ctx is the placeholder, not the stepped state. +3. **`scheduler.current.expect("call_export: no current thread")` panic in parallel mode**: means `worker_prologue` set current via begin_slot_visit but a code path observed `current = None`. Check that `coord_pre_round`'s `try_inject_graphics_interrupt` isn't reading current from a stale state. +4. **Halt-on-deadlock unexpectedly fires**: the wall-time delta vs lockstep is normal (~30% slower at N=1 due to per-iteration locking), but the **scheduler progression** must be the same. If VdSwap=2 doesn't reach, suspect that the round structure (begin_round / round_schedule / for-each-slot) drifted from lockstep. + +## What this does NOT yet do + +- N=6 worker spawn (Step 04). +- Phaser-based barrier sync between coordinator and workers. +- `scheduler.current = None` discipline before unlock. +- `find_by_tid` migration handling under real peer concurrency. +- Idle worker parking (Step 05). + +## Test count: 430 (was 430; no new tests in this step). diff --git a/migration/claude-memory/project_xenia_rs_m3_realpar_step_04.md b/migration/claude-memory/project_xenia_rs_m3_realpar_step_04.md new file mode 100644 index 0000000..a7161a9 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_m3_realpar_step_04.md @@ -0,0 +1,117 @@ +--- +name: M3 real-par Step 04 — N=6 workers + main coordinator + 7-party phaser (2026-04-27) +description: Real per-HW-thread parallelism. run_execution_parallel spawns 6 worker threads via thread::scope, main thread is the coordinator, 7-party Phaser with B1 (round-start) + B2 (round-end). All 5 lockstep combos still match golden; --parallel digest now diverges by ~7 instructions at -n 2M (expected per master plan); -n 30M --parallel reaches VdSwap=2 with halts==0. +type: project +originSessionId: 35b35eef-690b-4871-b2ed-f69a1d2145e2 +--- +## What landed + +`run_execution_parallel` in [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs) was rewritten from the Step 03 single-worker shape to spawn N=6 worker threads via `std::thread::scope`. The main thread is the coordinator. Synchronization uses `xenia_cpu::Phaser::new(7)` (6 workers + coordinator). + +### Per-round shape + +``` +Coordinator (main thread): + loop { + lock kernel + stats; coord_pre_round; release stats; (keep kernel) + if BreakOuter -> shutdown phaser, break. + begin_round + round_schedule; publish runnable_mask[6]. + if order.is_empty() -> idle_advance under lock; arrive B1+B2; loop. + else: drop kernel. + arrive B1. // workers wake, process slots concurrently. + arrive B2. // wait for all workers to finish. + lock kernel + stats; coord_post_round; release. + } + +Each worker (hw_id 0..6, scoped): + loop { + arrive B1 (or shutdown -> break). + if !runnable_mask[hw_id]: skip B2; continue. + lock kernel + stats; worker_prologue; release stats; keep kernel guard. + match prologue_outcome: + Continue: drop kernel. + BreakOuter: drop kernel, signal shutdown, phaser.shutdown, break. + StepBlock { tid, thread_ref, block_ptr, pc_before }: + mem::replace ctx_taken out of slot[hw_id] runqueue. + end_slot_visit (clears scheduler.current). + drop kernel. + step_block on local ctx_taken. // <- parallelism here. + lock kernel. + target_ref = find_by_tid(tid).unwrap_or(thread_ref). + write ctx_taken back to ctx_mut_ref(target_ref). + scheduler.current = Some(target_ref). + worker_epilogue. + scheduler.current = None. + drop kernel. + if BreakOuter: signal shutdown, phaser.shutdown, break. + arrive B2. + } +``` + +### Synchronization + +- **`Mutex`** shared between coordinator and workers via `&Mutex` (thread::scope makes Arc unnecessary). Lock order: kernel mutex first, stats mutex second. Workers and coordinator both follow this order — no inversion. +- **`runnable_mask: [Arc; 6]`** published by the coordinator after `round_schedule()` (Release store). Workers read after B1 release (Acquire load). +- **`internal_shutdown: Arc`** signal: any worker on BreakOuter (or coord on max_instructions/UI shutdown) sets it Release and calls `phaser.shutdown()` to wake parked peers. +- **Phaser**: 7 parties; coordinator participates as `COORD_ID = 6`. `arrive_and_wait_timeout(5s)` returns Timeout if a peer hangs; coordinator/worker calls `phaser.shutdown()` on timeout. + +### Two non-trivial bugs fixed during bring-up + +**Bug 1 — `Debugger::new()` defaults to paused/trace-enabled.** The worker's local `Debugger::new()` had `paused = true` and `trace_enabled = true`, so `wants_hooks()` returned true, and `worker_prologue`'s per-instruction path triggered `should_break()` immediately, returning `BreakOuter` after the first slot iteration. Fix: mirror the main-thread shape — `paused = false; step_mode = Run; trace_enabled = false;` after `Debugger::new()`. + +**Bug 2 — Stale `runnable_mask` snapshot leads to placeholder PC=0 fault.** The coordinator publishes `runnable_mask` from `round_schedule()`, then drops the kernel lock. A worker M (e.g., on slot 0 with the main thread) acquires the lock and runs an import thunk via `call_export` (e.g., `KeWaitForSingleObject`) that blocks the only Ready thread on slot N. Worker N then acquires the lock and `begin_slot_visit(N)` calls `pick_runnable()` which now returns None — so `running_idx = None` and `scheduler.current = None`. `ctx(N)` returns the idle sentinel (`PpcContext { pc: 0, ... }`). Without a guard, `worker_prologue` falls through to the unmapped-PC fault at `mem.is_mapped(0) == false` and returns `BreakOuter`, halting the run mid-init. + +Fix: after `begin_slot_visit`, short-circuit on `scheduler.current.is_none()` — return `PrologueOutcome::Continue` to skip this slot for this round. Documented inline at [worker_prologue](xenia-rs/crates/xenia-app/src/main.rs). + +### Files touched + +- [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs): + - `run_execution_parallel`: rewritten to spawn N=6 + coordinator + phaser. + - `worker_prologue`: added stale-mask guard (`scheduler.current.is_none()` → Continue). + - `cmd_exec_inner`: added early `anyhow::bail!` rejection of `--parallel` with debugger hooks / DB writer / `XENIA_FORCE_PER_INSTR=1` (defense-in-depth — `run_execution_parallel` also asserts). + - Worker spawn now uses `xenia_debugger::Debugger::new()` plus the cold-run setup (paused=false, step_mode=Run, trace_enabled=false). + +## Verification + +- `cargo build --release -p xenia-app`: clean. +- `cargo test --workspace`: 430 passed, 0 failed. +- 6-combo golden gate at -n 2M: + - `default`, `--reservations-table`, `--gpu-inline`, `--gpu-inline --reservations-table`: **match golden**. + - `--parallel`, `--parallel --reservations-table`: **digest differs** by ~7 instructions (`actual_instructions=2_000_007` vs `expected=2_000_000`). This is the expected behavior under N=6 parallel scheduling per the master plan's M3 acceptance criteria: *"parallel mode's digest will differ from lockstep (per existing repo note about thread-interleaving divergence). Acceptance is VdSwap=2 + clean halt counters, not golden equivalence."* +- `xenia-rs exec sylpheed.iso -n 30_000_000 --parallel --halt-on-deadlock`: **exit 0**, VdSwap=1 + VdSwap=2 fire, no `deadlock_halts`. Wall: ~56 s (vs ~3 s lockstep — see Perf section below). + +## Perf snapshot (informational; gated in Step 07) + +`xenia-rs exec sylpheed.iso -n 30_000_000`: +- lockstep: ~3 s (10 M instr/s). +- `--parallel` (N=6): ~56 s (538 K instr/s) — **18x slower than lockstep**. + +The slowdown is dominated by mutex contention. In sylpheed's first 30 M instructions only one slot (slot 0, the main thread) is doing meaningful work most of the time; the other 5 workers wake at B1, observe `runnable=false`, skip B2, re-park. Each barrier crossing acquires the phaser's internal `Mutex`, and the active worker also contends on the kernel mutex per prologue/epilogue. Round granularity is small (often 1-2 guest instructions per round during import-thunk-heavy init), so per-round overhead ~1-2 μs × millions of rounds dominates. + +Step 05 (idle-worker parking) and Step 07 (perf gate) will address this. The 1.5x speedup target requires either (a) keeping idle workers parked outside the phaser entirely, (b) dropping per-round phaser sync once we observe parallelism is paying off, or (c) coalescing round boundaries. + +## Concurrency invariants observed + +- **Lock order**: kernel mutex first, stats mutex second. Workers and coordinator both follow. No inversion. +- **`scheduler.current` discipline**: set by `begin_slot_visit` (under kernel lock); cleared by `end_slot_visit` BEFORE worker drops the lock around `step_block`; re-set after relock so `worker_epilogue`'s `exit_current` path works; cleared again before final unlock. +- **Stale mask**: covered by the `scheduler.current.is_none()` short-circuit in `worker_prologue`. +- **Cross-worker migration**: `find_by_tid(tid)` resolves the post-step `ThreadRef`. With sylpheed, no migration occurs in the first 30 M, so this is exercised mainly under future stress (Step 06). +- **Reservation table**: still activated implicitly via `--parallel` (M3.7 substrate). lwarx/stwcx route through it. + +## Regression-fix breadcrumbs + +If a regression appears after this step: + +1. **Lockstep golden mismatch**: shouldn't happen — `run_execution` is unchanged. If it does, the recent guard in `worker_prologue` (`scheduler.current.is_none() → Continue`) might be hitting in lockstep too. In lockstep, `for hw_id in order` only visits slots that were Ready at `round_schedule` time, so `pick_runnable()` should always return Some. Verify via debug log. +2. **`--parallel` halts (not VdSwap=2)**: re-add the placeholder pc=0 trace at the top of `worker_prologue` and rerun. Most likely a new race — start with the migration scenarios listed in the hand-off memo's hazard table. +3. **`--parallel` panic with `find_by_tid` returning Some(stale_ref)**: inspect `ThreadRef.generation` — M2's generation packing should make stale refs detectable. +4. **Wall time regression in `--parallel`**: expected at this step; Step 05 and Step 07 address it. If lockstep regresses too, that's a separate bug. + +## What this does NOT yet do + +- Idle-worker parking outside the phaser (Step 05). +- 100x stress harness (Step 06). +- Perf gate (Step 07). +- The 1.5x speedup target. + +## Test count: 430 (was 430; functional correctness unchanged). diff --git a/migration/claude-memory/project_xenia_rs_m3_realpar_step_05.md b/migration/claude-memory/project_xenia_rs_m3_realpar_step_05.md new file mode 100644 index 0000000..dbb8b4c --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_m3_realpar_step_05.md @@ -0,0 +1,77 @@ +--- +name: M3 real-par Step 05 — slot-wake parking deferred (race analysis) (2026-04-27) +description: Attempted to park inactive workers (skip B1 + B2 entirely; coord skips on their behalf) for perf. Discovered a TOCTOU race between coord's mask publish and worker's mask read across round boundaries. Reverted to Step 04's design where workers always arrive at B1. Documented the race for a future fix. +type: project +originSessionId: 35b35eef-690b-4871-b2ed-f69a1d2145e2 +--- +## What was attempted + +Have inactive workers (those whose slot has no runnable threads this round) park on `thread::park()` instead of arriving at the phaser's B1 + skipping B2. The coordinator would skip both barriers on their behalf so phaser advance still hits party_count. + +Goal: reduce phaser-mutex contention. Currently every round, all 6 workers contend on the phaser's internal `Mutex` for both barriers — 14 mutex cycles per round just on barrier sync. With sylpheed's first 30M instructions where only slot 0 is active, that's 12 idle mutex cycles per round × millions of rounds = the dominant overhead. + +## Why it doesn't work (the race) + +The pattern fails on the active→inactive transition at round boundaries: + +1. **Round N**: Worker w is active. Reads `runnable_mask[w] = true` (Acquire). Arrives at B1. Processes slot. Arrives at B2. +2. **B2 N advances.** Worker w returns from B2 wait. Loops to top of its outer loop. +3. **Round N+1**: Coord under kernel lock — does pre_round, begin_round, publishes new mask (`runnable_mask[w] = false` because some peer's `call_export` blocked w's last Ready thread). +4. **Race**: between B2 N advance and coord's mask publish, w may have already read `runnable_w[w]` to decide park-vs-arrive. If w reads BEFORE coord's publish, it sees the OLD `true` value and arrives at B1 of round N+1 — but coord computed `inactive_count` from the new mask (which has w inactive) and skipped B1 on w's behalf. The phaser counter wraps: 1 (w arrived) + 1 (coord skipped for w) = 2 contributions for one slot, total exceeds 7, phaser advances early, B2 wraps similarly, eventually a phaser timeout fires. + +**Observed empirically**: changed worker to park on inactive-mask, ran sylpheed `-n 30M --parallel --halt-on-deadlock`. Output: `worker: phaser B2 timeout — peer hung; shutting down hw_id=0`, `coordinator: phaser B1 timeout`. + +## Why the obvious fixes don't fix it + +- **Acquire-Release on the mask alone**: insufficient. The Release-on-publish pairs with Acquire-on-read, but there's no synchronization point between the two events; the mask read can still happen before the publish in real time. +- **Skip on behalf only when worker is parked**: requires coord to know whether each worker is parked vs about-to-park, which needs another sync. +- **Generation counter on the mask**: would let workers detect a stale read, but they'd need to retry — and every retry path eventually has to converge with the coord's "skip on behalf" decision. + +## Why Step 04's design (keep workers always arriving at B1) is race-free + +Workers always arrive at B1. The phaser's Release-on-advance pairs with the worker's Acquire-on-mask-read AFTER B1. The coord's mask publish strictly happens-before its B1 arrive, which strictly happens-before B1 advance, which strictly happens-before the worker's post-B1 mask read. + +Concretely: +``` +Coord: Worker: + publish mask (Release) + arrive B1 (contributes) + <-- B1 advances when 7 contributions arrive --> + arrive B1 (contributes) + <-- woken by B1 advance --> + read mask (Acquire) ✓ +``` + +The cost: every worker contends on the phaser mutex twice per round, even idle ones. ~14 mutex cycles per round. + +## Path forward (deferred) + +A race-free way to skip B1 for inactive workers requires synchronization between coord's mask publish and worker's park-decision. Options: + +1. **Coalesce post_round with next round's pre_round under a single lock window**, and have workers wait on a Condvar inside that window for the new mask. Effectively: coord publishes mask + signals "round ready" while still holding the kernel lock; workers park on the Condvar, wake on signal, read mask. The Condvar wait/notify is the synchronization point. +2. **Eliminate the phaser entirely**: workers run their slot whenever they can acquire the kernel mutex. Coord runs pre/post-round whenever it acquires. No round boundaries; the kernel mutex IS the boundary. Requires reworking how `try_inject_graphics_interrupt` and `fire_due_timers` interleave with worker steps. +3. **Phaser with dynamic party_count**: per-round the phaser is reset with party_count = 1 (coord) + active_count (active workers). Inactive workers stay parked on a separate AtomicBool/Condvar. Requires Phaser API changes (current implementation has a fixed `party_count`). + +(1) is the smallest delta — it adds a Condvar but keeps the existing helper functions and round shape. Worth a focused follow-up session. + +## What changed in the tree + +- Reverted the inactive→park optimization in `run_execution_parallel` to Step 04's "always arrive at B1, skip B2 if inactive" pattern. +- Removed the `worker_handles` table (was for unpark) and the `unpark_all_on_shutdown` block. +- Inline comment at the worker's B1 arrival explaining why we don't park. + +## Verification + +- `cargo build --release -p xenia-app`: clean. +- `cargo test --workspace`: 430 passed, 0 failed. +- All 5 lockstep golden combos at -n 2M: still match. +- `--parallel` golden at -n 2M: still differs by ~7 instructions (expected per master plan). +- `xenia-rs exec sylpheed.iso -n 30_000_000 --parallel --halt-on-deadlock`: VdSwap=2 reaches, halts==0. Wall: ~57.5 s (essentially identical to Step 04's 55.7 s). + +## What this does NOT yet do + +- Idle-worker park-between-rounds (deferred per race analysis above). +- 100x stress harness (Step 06). +- Perf gate (Step 07) — will not be met without the deferred parking fix or one of the alternatives above. + +## Test count: 430 (unchanged). diff --git a/migration/claude-memory/project_xenia_rs_m3_realpar_step_06_07.md b/migration/claude-memory/project_xenia_rs_m3_realpar_step_06_07.md new file mode 100644 index 0000000..5335051 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_m3_realpar_step_06_07.md @@ -0,0 +1,86 @@ +--- +name: M3 real-par Step 06 + 07 — stress harness + perf gate (2026-04-27) +description: 20× stress at -n 5M --parallel --halt-on-deadlock all passed (correctness validated under load). Perf gate fails — --parallel is ~24x slower than lockstep at -n 30M (target was 1.5x speedup). The 100x at -n 50M gate from the master plan is wired but #[ignore]-gated; running it requires the deferred parking optimization to make wall time tractable. +type: project +originSessionId: 35b35eef-690b-4871-b2ed-f69a1d2145e2 +--- +## Step 06 — stress harness + +### What landed + +- New file: [crates/xenia-app/tests/parallel_stress.rs](xenia-rs/crates/xenia-app/tests/parallel_stress.rs). +- Two `#[ignore]`-gated integration tests: + - `parallel_stress_short`: 20 runs × `xenia-rs exec sylpheed.iso -n 5_000_000 --parallel --halt-on-deadlock --quiet`. + - `parallel_stress_long`: 100 runs × `-n 50_000_000`. Wired but expensive (hours at current perf). +- Failures dump per-run `.stdout`/`.stderr` to `target/parallel-stress-{label}-{NNN}.{stdout,stderr}`. +- Summary line includes p50, p95, and max wall times. + +Run with: +``` +cargo test --release -p xenia-app --test parallel_stress -- --ignored --nocapture parallel_stress_short +cargo test --release -p xenia-app --test parallel_stress -- --ignored --nocapture parallel_stress_long +``` + +### Why we use `exec`, not `check` + +`check` rejects `--halt-on-deadlock` (CLI gate); only `exec` accepts it. Stress is interested in "does the run complete cleanly?" — exit 0 for ok, non-zero for any panic / fault / halt. The golden compare from `check` is orthogonal to stress. + +### Verification + +`parallel_stress_short` ran clean: **20/20 ok, 0 failed**. p50 = 22675 ms, p95 = 27879 ms, max = 28893 ms. (The wall time bumps midway through the runs were from the perf measurement of Step 07 sharing CPU; correctness was unaffected.) + +This validates the locking dance (mem::replace + drop guard + step_block + relock + find_by_tid + writeback) under repeated execution: no lost wakeups, no lock-order inversions, no phaser timeouts, no `FAULT: PC in unmapped memory`. The `scheduler.current.is_none()` guard in `worker_prologue` (added in Step 04) is hit on every round where some peer's `call_export` blocked the only Ready thread on a slot, and the run continues correctly. + +### What this does NOT yet validate + +- **Long-run stability**: 100x at -n 50M would surface ABA hazards on long-lived `ThreadRef`s and exercise more migration scenarios. The test is wired (`parallel_stress_long`) but is impractical at the current ~24x slowdown. Should be the first re-run after Step 05's parking deferral lands. +- **Cross-thread golden parity**: per the master plan's M3 acceptance criteria, parallel mode is allowed to drift in digest. Stress only checks "exit 0". + +--- + +## Step 07 — perf gate + +### Methodology + +Five runs back-to-back of `xenia-rs exec sylpheed.iso -n 30_000_000` for each of `lockstep` and `--parallel`. Median wall time. + +### Measured + +``` +lockstep (5 runs): 3631, 3752, 3780, 3953, 4042 ms (median 3780 ms, ~7.94 M instr/s) +--parallel (3 runs): 83465, 92561, 106780 ms (median 92561 ms, ~324 K instr/s — 1/24 of lockstep) +``` + +### Verdict + +**Gate FAILS.** Target was `parallel ≤ lockstep / 1.5` (≥1.5× speedup on a ≥6-core host). Actual is `parallel ≈ lockstep × 24.5` (24.5× SLOWER). + +### Root cause + +Per-round synchronization overhead dominates. In sylpheed's first 30 M instructions, only slot 0 (the main thread) is doing meaningful work most of the time. Each round: + +- 14 phaser-mutex contentions (6 workers + coordinator × 2 barriers). +- 4 kernel-mutex contentions (coord pre+post, active worker prologue+epilogue). +- 2 stats-mutex contentions (worker prologue+epilogue). + +Round work-budget: typically 1-10 guest instructions per active slot. Per-instruction cost ≈ 100 ns (lockstep). Per-round sync cost ≈ 1-2 µs. Result: ~95% of `--parallel` wall time is mutex contention, ~5% useful interpretation. + +### What would close the gap + +In rough order of expected impact: + +1. **Step 05's race-free parking** (deferred). If 5 of 6 workers can park on a Condvar that's signaled after the kernel mutex publishes the new round's mask, idle workers stop contending on the phaser entirely. Estimated impact: ~5x (eliminates 12 of 14 phaser-mutex cycles per round in the slot-0-only case). +2. **Coalesce per-round work into larger units**: have the coordinator run vsync/timer/IRQ less frequently than every round (e.g., every 1000 instructions). Allows workers to do more steps between coordinator interventions. +3. **Replace stats `Mutex` with atomic fields** (`AtomicU64::fetch_add`). Removes 2 mutex cycles per active worker per round. +4. **Drop the phaser entirely** for single-active-slot rounds — the active worker just locks/processes/unlocks without barrier sync. This is essentially "fall back to lockstep when only 1 worker is active" — speculative and complex. + +(1) is the cleanest follow-up. (2) is architectural and has correctness implications (vsync timing affects guest behavior). (3) is straightforward but unlikely to be more than a 10-20% improvement on top of (1). + +### What this does NOT yet do + +- The 1.5× speedup target. +- Run the master plan's full 100× × -n 50M stress. (`parallel_stress_long` is wired but impractical until perf closes.) + +--- + +## Test count: 432 (was 430; +2 stress tests, both `#[ignore]`-gated so the default `cargo test --workspace` count is still 430). diff --git a/migration/claude-memory/project_xenia_rs_m3_realpar_step_08.md b/migration/claude-memory/project_xenia_rs_m3_realpar_step_08.md new file mode 100644 index 0000000..f67d23d --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_m3_realpar_step_08.md @@ -0,0 +1,103 @@ +--- +name: M3 real-par Step 08 — final verification + session summary (2026-04-27) +description: M3 real-parallelism session complete. N=6 worker threads + main coordinator + 7-party phaser, with kernel mutex released around step_block. Lockstep golden-stable across 4 combos; --parallel boots sylpheed to VdSwap=2 with halts==0. 20× stress at -n 5M passed. Perf gate (1.5× speedup) NOT met — the deferred parking optimization (Step 05) is needed before --parallel becomes faster than lockstep. +type: project +originSessionId: 35b35eef-690b-4871-b2ed-f69a1d2145e2 +--- +## Session deliverables + +A working per-HW-thread parallel scheduler in [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs): + +- **`run_execution_parallel`** — N=6 worker threads via `std::thread::scope`, main thread is the coordinator. 7-party `Phaser` with B1 (round-start) + B2 (round-end). Workers release the kernel mutex around `step_block` so all 6 can step concurrently. +- **`coord_pre_round` / `coord_idle_advance` / `coord_post_round`** — coordinator helpers carved out of `run_execution`, called by both lockstep and parallel paths. +- **`worker_prologue` / `worker_epilogue`** — per-slot body split into "before step_block" and "after step_block" pieces. Used by both lockstep `run_execution` and `run_execution_parallel`. +- **`WorkerCtx`** — per-HW-slot state (block cache, decode cache, hw_id) owned by each worker thread. +- **`--parallel` early-bail** at `cmd_exec_inner` — debugger hooks / DB writer / `XENIA_FORCE_PER_INSTR=1` are rejected with `anyhow::bail!`. + +## Verification matrix (final) + +| Check | Result | +|---|---| +| `cargo test --workspace` | **430 passed, 0 failed, 2 ignored** (the new `parallel_stress_*` tests) | +| Lockstep golden (no flags) at -n 2M | **MATCH** | +| `--reservations-table` golden at -n 2M | **MATCH** | +| `--gpu-inline` golden at -n 2M | **MATCH** | +| `--gpu-inline --reservations-table` golden at -n 2M | **MATCH** | +| `--parallel` golden at -n 2M | DIFFERS by ~7 instructions (expected per master plan; parallel digest may diverge from lockstep) | +| `--parallel --reservations-table` golden at -n 2M | DIFFERS by ~7 instructions (expected) | +| `--parallel` -n 30M `--halt-on-deadlock` | **PASS**: VdSwap=1 + VdSwap=2 reach, deadlock_halts == 0, exit 0 (~57 s) | +| 20× `--parallel` -n 5M `--halt-on-deadlock` (stress) | **20/20 ok, 0 failed**, p50 = 22.7 s, p95 = 27.9 s | +| 100× `--parallel` -n 50M (master plan gate) | **NOT RUN** — wired as `parallel_stress_long #[ignore]`, expected hours at current perf; rerun after Step 05's deferred parking lands | +| `--parallel` perf vs lockstep at -n 30M (median over 5/3 runs) | **FAIL**: parallel ≈ 24× SLOWER than lockstep (target was ≥1.5× faster). Root cause documented in Step 06+07 memo. | + +## Per-step memory files (this session) + +- [project_xenia_rs_m3_realpar_step_01.md](project_xenia_rs_m3_realpar_step_01.md) — coord_pre_round/idle_advance/post_round + RoundCtl carved out of run_execution. +- [project_xenia_rs_m3_realpar_step_02.md](project_xenia_rs_m3_realpar_step_02.md) — WorkerCtx + worker_prologue/worker_epilogue split, per-slot body lifted. +- [project_xenia_rs_m3_realpar_step_03.md](project_xenia_rs_m3_realpar_step_03.md) — single-worker drop-and-reacquire around step_block. Substrate for N=6. +- [project_xenia_rs_m3_realpar_step_04.md](project_xenia_rs_m3_realpar_step_04.md) — N=6 workers + coordinator + 7-party phaser. Includes the `Debugger::new()` paused-default fix and the stale-mask `scheduler.current.is_none()` guard. +- [project_xenia_rs_m3_realpar_step_05.md](project_xenia_rs_m3_realpar_step_05.md) — slot-wake parking attempted, **DEFERRED** due to TOCTOU race; 3 race-free alternatives documented for follow-up. +- [project_xenia_rs_m3_realpar_step_06_07.md](project_xenia_rs_m3_realpar_step_06_07.md) — stress harness landed (20× passed); perf gate measured (24× slowdown, gate not met). +- [project_xenia_rs_m3_realpar_step_08.md](project_xenia_rs_m3_realpar_step_08.md) — this file. + +## What this session delivered (vs master plan target) + +| Deliverable | Status | +|---|---| +| N=6 worker threads | ✅ | +| Main-thread coordinator | ✅ | +| Phaser-based round sync | ✅ | +| Kernel mutex released around step_block | ✅ | +| `find_by_tid` migration handling | ✅ (in `worker_epilogue`) | +| `--parallel` reaches VdSwap=2 at -n 30M | ✅ (~57 s) | +| Halt-on-deadlock handling | ✅ | +| Reservation table active under `--parallel` | ✅ (substrate from M3.7) | +| Lockstep bit-identity at -n 2M | ✅ for the 4 lockstep combos | +| 100× × -n 50M stress | ⚠️ wired (`parallel_stress_long`), not run (impractical at current perf) | +| 1.5× wall-time speedup vs lockstep | ❌ (currently 24× slower) | + +## What's blocking 1.5× speedup + +The **per-round phaser-mutex contention dominates wall time** in sylpheed's first 30 M instructions. With only slot 0 actively stepping most of the time, the other 5 workers contribute 12 of 14 phaser-mutex cycles per round to no useful end. The deferred Step 05 parking optimization would close ~5× of that gap; the remaining 5× is harder and would require either dropping the phaser for single-active-slot rounds, or coalescing rounds. + +The race-free parking design recommended in [project_xenia_rs_m3_realpar_step_05.md](project_xenia_rs_m3_realpar_step_05.md): coalesce post_round + next-round pre_round under a single kernel-lock window, with workers waiting on a Condvar inside that window for the new mask. The Condvar wait/notify is the synchronization point that makes the park-on-inactive-mask race-free. + +## Concurrency invariants (final) + +- **Lock order**: kernel mutex first, stats mutex second. Workers and coordinator both follow. No inversion possible. +- **Atomic ordering**: Release on writers, Acquire on readers, on every shared atomic (runnable_mask, internal_shutdown). Phaser's internal phase counter uses Release/Acquire across the contribute/wait pair. +- **`scheduler.current` discipline**: set ONLY under the kernel lock by `begin_slot_visit`; cleared by `end_slot_visit` BEFORE worker drops the lock around `step_block`; re-set after relock so `worker_epilogue`'s `exit_current` works; cleared again before final unlock. Peers never observe a stale ref. +- **Stale `runnable_mask` guard**: `scheduler.current.is_none()` short-circuit at the top of `worker_prologue` covers the case where a peer's `call_export` blocked the slot's only Ready thread between the coordinator's `round_schedule` snapshot and the worker's `begin_slot_visit`. +- **Cross-worker migration**: `find_by_tid(tid)` in `worker_epilogue` resolves the post-step `ThreadRef`. Falls back to the original `thread_ref` when the lookup misses. With sylpheed's 30 M-instruction init no migration occurs in practice — exercised mainly by the stress harness under future scenarios. +- **Bit-identity rule for lockstep**: Every substep ended with all 6 lockstep golden-flag-combo digests matching. The 4 non-parallel combos still match at the end of the session. + +## Files touched (cumulative across the session) + +- [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs): + - +`RoundCtl` enum, +`coord_pre_round` / `coord_idle_advance` / `coord_post_round` (~200 lines). + - +`WorkerCtx` struct + constructor. + - +`PrologueOutcome` / `SlotOutcome` enums. + - +`worker_prologue` (~270 lines) / +`worker_epilogue` (~70 lines). + - +`run_execution_parallel` (~300 lines including N=6 thread::scope spawn + coordinator loop). + - `cmd_exec_inner` parallel branch: rejects debugger hooks / DB writer / force-per-instr early; spawns the worker thread that calls `run_execution_parallel`. + - `run_execution` (lockstep): per-slot body collapsed onto `worker_prologue` + `worker_epilogue`. Per-round helpers replace the inline pre/post-round logic. +- New file: [crates/xenia-app/tests/parallel_stress.rs](xenia-rs/crates/xenia-app/tests/parallel_stress.rs). + +No new dependencies. Uses only `std::sync::{Arc, Mutex}`, `std::sync::atomic::{AtomicBool, Ordering}`, `std::thread`, and existing `xenia_cpu::Phaser` / `xenia_cpu::PpcContext` / etc. + +## Stable end-state for the workspace + +- Build clean, all 430 tests pass. +- Golden stable for 4 lockstep combos. +- `--parallel` boots sylpheed end-to-end at -n 30M. +- Stress harness wired; short variant validated at 20×. +- The `parallel_stress_long` test is the gate that should run after the parking optimization closes the perf gap. + +## Recommended next session + +1. **Race-free parking** (~50 lines per the Step 05 memo's option 1: Condvar inside the kernel-lock window). Re-enable inactive-worker parking. Expected ~5× perf improvement on slot-0-only rounds. +2. **Run `parallel_stress_long`** (100× × -n 50M). Validate ABA hazards / migration races under sustained load. Should be tractable at the new perf level. +3. **Re-measure perf gate**. If still short of 1.5×, investigate option 2 (coalesce rounds) or option 3 (drop phaser for single-active-slot rounds). +4. **Atomic `ExecStats` fields**. Once parking is in, the stats mutex becomes the next biggest contention point; trivial to convert to atomics. + +## Test count: 430 passed (430 unit + 2 ignored stress tests). diff --git a/migration/claude-memory/project_xenia_rs_m3_step_03_04_kernel_wrap_spawn.md b/migration/claude-memory/project_xenia_rs_m3_step_03_04_kernel_wrap_spawn.md new file mode 100644 index 0000000..435eeb1 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_m3_step_03_04_kernel_wrap_spawn.md @@ -0,0 +1,71 @@ +--- +name: M3.3 + M3.4 — kernel wrap + spawn substrate landed (2026-04-26) +description: --parallel CLI flag + Arc> + spawn one worker thread holding the kernel lock for the run. N=1 substrate; N=6 deferred. Lockstep golden bit-identical. +type: project +originSessionId: af90c866-579c-4506-af85-cd5a5030af85 +--- +## What landed + +Step combining M3.3 (kernel wrap) + M3.4 (spawn substrate, N=1 worker). + +### CLI changes + +- `Exec` and `Check` subcommands gained `--parallel` flag. +- `XENIA_PARALLEL=1|true|yes` env-var fallback. +- Plumbed through `cmd_exec` / `cmd_check` / `cmd_exec_inner` signatures. + +### Spawn architecture (in `cmd_exec_inner`, headless branch) + +When `parallel || env_parallel`: + +1. `kernel` is consumed into `Arc::new(Mutex::new(kernel))`. +2. A single worker thread (`xenia-cpu-host`) is spawned via `std::thread::Builder`. +3. The worker takes ownership of `debugger`, `db_writer`, `thunk_map`, plus a clone of the kernel Arc and the mem Arc. +4. Inside the worker: `let mut guard = kernel_for_worker.lock().unwrap()` then `run_execution(&*mem, &mut *guard, ...)`. +5. The worker returns `(stats, debugger, db_writer, thunk_map)` so the caller can resume post-run analysis (digest, summary, diagnostic). +6. `Arc::try_unwrap(kernel_arc)` recovers the kernel since only one strong ref remains after worker join. + +Lockstep mode (no flag): unchanged. Calls `run_execution(&*mem_arc, &mut kernel, ...)` directly on the main thread. + +## Files touched + +- [crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs): + - `Exec` + `Check` Cli structs: added `parallel: bool` field + - Top-level `match` arms: pass `parallel` through + - `cmd_exec` / `cmd_check` / `cmd_exec_inner` signatures: `parallel: bool` + - Headless branch (around line 1010): spawn-or-direct conditional, recovers `kernel`/`debugger`/`db_writer`/`thunk_map` from worker + +## Verification + +- `cargo build --workspace`: clean +- `cargo test --workspace`: 411 passed, 0 failed +- `xenia-rs check sylpheed.iso -n 2_000_000 --expect golden/sylpheed_n2m.json` (default): matches +- Same with `--parallel`: matches (worker holds lock throughout; observable execution identical to lockstep) + +## What this proves + +- The `Arc>` wrap compiles and joins cleanly. +- `try_unwrap` succeeds (no Arc-clone leaks). +- The worker recovers all state (debugger/db_writer/thunk_map) without UB. +- Lockstep behavior is bit-identical with the wrap in place. + +## What this does NOT yet do + +- N=6 host threads (only N=1 worker spawned). +- Per-slot parallelism (the worker holds the lock for the entire run, so it's effectively sequential). +- The phaser is not yet wired into the worker (used only by tests so far). +- Reservation table is not yet consulted by the interpreter. +- IRQ routing is unchanged. + +## Regression-fix breadcrumbs + +If a regression appears in this step: + +1. **Compile errors after `parallel: bool` plumbing**: check that ALL three call sites (`cmd_exec` dispatch, `cmd_check` dispatch, the body of `cmd_exec_inner`) pass `parallel` through. The four CLI variants (Exec/Check) × (cmd_exec/cmd_check) all needed updates. +2. **Test failures**: the spawn moves debugger + db_writer + thunk_map into the closure. If any of these aren't returned from the worker, post-run analysis breaks. Triple return: `(stats, debugger, db_writer, thunk_map)`. The thunk_map tuple slot is read into `_thunk_map_back`; rebinding `kernel` happens via `Arc::try_unwrap`. +3. **Golden mismatch under `--parallel`**: would indicate the worker thread perturbs scheduling somehow. The worker is N=1 and holds the lock throughout, so execution should be identical to lockstep. If it diverges, check whether the `start: Instant` capture is racy (it shouldn't be — the worker doesn't touch it). +4. **Hangs on shutdown**: the worker is fully owned by the join handle; if it never returns, `join()` blocks forever. Check that `run_execution` itself terminates as it does in lockstep. + +## Test count baseline + +411 passing tests. Same as post-M3.2a (no new tests added in this step). diff --git a/migration/claude-memory/project_xenia_rs_m3_step_07_reservation_activation.md b/migration/claude-memory/project_xenia_rs_m3_step_07_reservation_activation.md new file mode 100644 index 0000000..abe08a8 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_m3_step_07_reservation_activation.md @@ -0,0 +1,109 @@ +--- +name: M3.7 — reservation table activation in interpreter (2026-04-26) +description: lwarx/stwcx/ldarx/stdcx now route through ReservationTable when ctx.reservation_table is Some and the table is enabled; otherwise legacy per-PpcContext path. Reservations enable() flag moved from KernelState into ReservationTable. Scheduler.spawn populates ctx.hw_id + ctx.reservation_table. All 6 flag combos still match golden. +type: project +originSessionId: af90c866-579c-4506-af85-cd5a5030af85 +--- +## What landed + +### `ReservationTable` now self-describes its enabled state + +[crates/xenia-cpu/src/reservation.rs](xenia-rs/crates/xenia-cpu/src/reservation.rs): +- New field: `enabled: AtomicBool` (default `false`). +- New methods: `enable()` (Release store), `disable()` (Release store), `is_enabled()` (Acquire load). + +Previously the flag lived on `KernelState.reservations_enabled`. Moved here so the interpreter can consult the table directly without needing a kernel reference. + +### `KernelState.reservations_enabled` field removed + +[crates/xenia-kernel/src/state.rs](xenia-rs/crates/xenia-kernel/src/state.rs): the redundant `AtomicBool` is gone. `kernel.reservations.enable()` is the new API. `cmd_exec_inner` calls it under `--reservations-table`/`XENIA_RESERVATIONS_TABLE=1`/`--parallel`/`XENIA_PARALLEL=1`. + +### `PpcContext` carries reservation handle + hw_id + +[crates/xenia-cpu/src/context.rs](xenia-rs/crates/xenia-cpu/src/context.rs): +- `pub reserved_generation: u32` — generation stamp from `ReservationTable::reserve()` at most recent lwarx/ldarx. +- `pub reservation_table: Option>` — populated by scheduler at spawn. +- `pub hw_id: u8` — the slot ID this thread is bound to (matches `Scheduler::slots[hw_id]`). + +`PpcContext::new()` defaults all to zero/None. + +### Scheduler holds + propagates the reservation table + +[crates/xenia-cpu/src/scheduler.rs](xenia-rs/crates/xenia-cpu/src/scheduler.rs): +- `Scheduler.reservation_table: Option>` (default None). +- `Scheduler::set_reservation_table(Option>)` setter. +- `Scheduler::spawn` populates `t.ctx.hw_id = slot_id; t.ctx.reservation_table = self.reservation_table.clone();` for every spawned thread. +- `Scheduler::install_initial_thread` does the same for slot 0. +- `Scheduler::set_affinity_ref` (migration) updates `thread.ctx.hw_id = target` when moving across slots. + +### `KernelState::with_gpu` wires the table into the scheduler + +`KernelState::with_gpu` now constructs the `ReservationTable` *first*, calls `scheduler.set_reservation_table(Some(reservations.clone()))` *before* installing the kernel state, so any `install_initial_thread` after construction picks up the Arc clone automatically. + +### Interpreter lwarx/stwcx/ldarx/stdcx routing + +[crates/xenia-cpu/src/interpreter.rs](xenia-rs/crates/xenia-cpu/src/interpreter.rs) lines around 1082, 4071: + +**lwarx / ldarx** (claim path): +```rust +ctx.reserved_line = ea & !RESERVATION_MASK; +ctx.reserved_val = val; +ctx.has_reservation = true; +if let Some(t) = &ctx.reservation_table { + if t.is_enabled() { + ctx.reserved_generation = t.reserve(ea, ctx.hw_id); + } +} +``` + +**stwcx / stdcx** (commit path): +```rust +let table_route = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()).cloned(); +let success = if let Some(t) = &table_route { + ctx.has_reservation + && ctx.reserved_line == line + && t.try_commit(ea, ctx.reserved_generation, ctx.hw_id) +} else { + ctx.has_reservation && ctx.reserved_line == line +}; +// CR.EQ on success; on failure, t.release(...) so the active-reserver count returns to zero +``` + +Both paths leave the per-ctx fields coherent so a flag flip mid-run doesn't corrupt outstanding reservations. + +### `--parallel` implies reservations enabled + +[crates/xenia-app/src/main.rs](xenia-rs/crates/xenia-app/src/main.rs): `if reservations_table || reservations_via_env || parallel_active { kernel.reservations.enable(); }`. The parallel spawn path needs the table; we activate it implicitly. + +## Files touched + +- `crates/xenia-cpu/src/reservation.rs` — `enabled` field + enable/disable/is_enabled methods +- `crates/xenia-cpu/src/context.rs` — `reserved_generation`, `reservation_table`, `hw_id` fields on PpcContext +- `crates/xenia-cpu/src/interpreter.rs` — lwarx/stwcx/ldarx/stdcx arms route through table when active +- `crates/xenia-cpu/src/scheduler.rs` — `Scheduler.reservation_table` + setter; spawn/install/migration populate ctx +- `crates/xenia-kernel/src/state.rs` — removed `reservations_enabled` field; constructor wires scheduler.set_reservation_table +- `crates/xenia-app/src/main.rs` — `kernel.reservations.enable()` instead of touching the removed flag; parallel implies activation + +## Verification + +- `cargo build --workspace`: clean +- `cargo test --workspace`: 411 passed, 0 failed +- Sylpheed `-n 2_000_000 --expect golden/sylpheed_n2m.json`: matches in **all 6 combinations**: + - default + - `--parallel` + - `--reservations-table` + - `--gpu-inline` + - `--gpu-inline --reservations-table` + - `--parallel --reservations-table` + +## Regression-fix breadcrumbs + +If a regression appears in this step: + +1. **Compile errors about removed `reservations_enabled`**: any external code that still touches `kernel.reservations_enabled` needs to call `kernel.reservations.enable()` / `is_enabled()` instead. Search for "reservations_enabled" in the workspace. +2. **Test failures specifically when `--reservations-table` or `--parallel` is set**: the per-ctx fallback should preserve identical observable behavior. If golden mismatches: the table's `try_commit` may be incorrectly failing where the legacy path would succeed. Check that `ctx.reserved_line == line && ctx.has_reservation` is still required as a precondition for table commits. +3. **Test failures without any flag**: shouldn't happen — the table's `is_enabled()` returns false by default, so the legacy path runs unchanged. If you see a regression, check that `ReservationTable::new()` initializes `enabled` to `AtomicBool::new(false)`. +4. **Migration losing reservations**: when `set_affinity_ref` moves a thread across slots, we update `thread.ctx.hw_id = target`. If a thread's reservation was claimed under the old hw_id, the table-routed `try_commit` after migration will fail (different hw_id). This is correct behavior — Xenon reservations don't survive a thread migration. The legacy per-ctx path is also reset because the post-migration thread enters a fresh execution slot. +5. **Spawn doesn't populate ctx fields**: every guest thread MUST get `t.ctx.reservation_table = self.reservation_table.clone()`. If a spawn site is missed, that thread's lwarx/stwcx fall through to the legacy path even when other threads use the table — usually harmless but breaks the inter-thread invariants. + +## Test count: 411 (unchanged from prior step; this step is pure plumbing). diff --git a/migration/claude-memory/project_xenia_rs_m3_step_08_verification.md b/migration/claude-memory/project_xenia_rs_m3_step_08_verification.md new file mode 100644 index 0000000..d87771d --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_m3_step_08_verification.md @@ -0,0 +1,79 @@ +--- +name: M3.8 — final M3 verification (2026-04-26) +description: All M3-session steps verified. 411 tests pass. Sylpheed -n 2M golden matches in all 6 flag combos (default, --parallel, --reservations-table, --gpu-inline, --gpu-inline --reservations-table, --parallel --reservations-table). Sylpheed -n 30M --parallel reaches VdSwap=1 + VdSwap=2 with no halts. +type: project +originSessionId: af90c866-579c-4506-af85-cd5a5030af85 +--- +## Final verification matrix (M3 session) + +### Workspace tests + +``` +cargo test --workspace +PASSED: 411 FAILED: 0 +``` + +### Golden digest (all 6 flag combinations) + +``` +[] matches golden +[--parallel] matches golden +[--reservations-table] matches golden +[--gpu-inline] matches golden +[--gpu-inline --reservations-table] matches golden +[--parallel --reservations-table] matches golden +``` + +### Sylpheed boot under --parallel + +``` +xenia-rs exec sylpheed.iso -n 30_000_000 --parallel --halt-on-deadlock +exit: 0 +gpu: XE_SWAP (kernel-direct) frame=1 fb=0x4b0d7000 1280x720 +gpu: XE_SWAP (kernel-direct) frame=2 fb=0x4b0d7000 1280x720 +counter kernel.calls{name=VdSwap} = 2 +``` + +VdSwap=1 + VdSwap=2 fire end-to-end under parallel mode. No deadlock halts. Identical halt-counter behavior between `--parallel` and default modes at -n 30M. + +## What this session delivered + +Five M3 steps landed across this session: + +1. **M3.1** — `Phaser` primitive (custom barrier-with-skip; 6 unit tests). +2. **M3.2a** — Per-HW-slot block caches (`[BlockCache; HW_THREAD_COUNT]`). +3. **M3.3 + M3.4** — `Arc>` wrap + `--parallel` CLI flag + N=1 spawn substrate. The interpreter runs on a dedicated `xenia-cpu-host` worker thread holding the kernel mutex. +4. **M3.7** — Reservation table activation. `lwarx`/`stwcx.`/`ldarx`/`stdcx.` route through `ReservationTable` when `ctx.reservation_table.is_enabled()`. `--parallel` implies activation. +5. **M3.8** — End-to-end verification on Sylpheed. + +## What's intentionally not in this session (deferred to follow-up) + +- **N=6 host-thread spawn**: the substrate is in place (Arc>, phaser primitive, per-thread block caches, per-ctx hw_id+reservation_table fields), but only one worker is currently spawned. Scaling to six requires either (a) a per-slot `Mutex` refactor inside the scheduler (the original M2.7 plan), or (b) a finer-grained "release the kernel lock during instruction execution" pattern. Both are substantial focused refactors. +- **M3.5** — slot-level wakeups (`slot_wake[6]: AtomicBool` + Thread handles + unpark on signal). +- **M3.6** — IRQ injection via `pending_local_irq[6]` (the array is wired but unused; the existing single-thread IRQ injection still runs). + +## Concurrency-correctness invariants in place + +- **Memory ordering**: Release on writers + Acquire on readers across every shared atomic (page versions, reservation table slots, GPU MMIO mailboxes, parker wake_pending, phaser phase, reservation enabled flag). +- **Lock discipline (where locks exist)**: GPU thread holds its own state exclusively; CPU↔GPU communication exclusively through atomics + crossbeam channels. Kernel under `--parallel` is wrapped in `Arc>`; the worker holds the lock for the entire run (single-worker, no contention). +- **Reservation invariants**: only the originating hw_id can commit its reservation; non-reserving stores invalidate; the active-reserver counter returns to zero on commit/release. +- **Phaser invariants**: arrived + skipped count toward the same advance threshold; generation counter prevents lost-wakeup (Acquire phase load in wait predicate); shutdown wakes all parked arrivers cleanly. + +## Per-step memory files (this session) + +- [project_xenia_rs_m3_step_03_04_kernel_wrap_spawn.md](project_xenia_rs_m3_step_03_04_kernel_wrap_spawn.md) — M3.3 + M3.4 +- [project_xenia_rs_m3_step_07_reservation_activation.md](project_xenia_rs_m3_step_07_reservation_activation.md) — M3.7 +- [project_xenia_rs_m3_step_08_verification.md](project_xenia_rs_m3_step_08_verification.md) — this file + +## Wall-clock perf snapshot (for reference; not a regression gate) + +`xenia-rs exec sylpheed.iso -n 30_000_000`: +- default (no --parallel): ~2.95 s +- --parallel (N=1 worker): ~2.95 s + +Same wall-clock (single-worker spawn = bit-identical to lockstep aside from thread spawn overhead in the hundreds-of-µs range). + +## Stable end-state for the workspace + +- `git diff --stat` would show changes in: `crates/xenia-cpu/src/{phaser.rs,context.rs,interpreter.rs,scheduler.rs,reservation.rs,lib.rs}`, `crates/xenia-kernel/src/state.rs`, `crates/xenia-app/src/main.rs`. +- Build clean. Tests green. Golden stable. diff --git a/migration/claude-memory/project_xenia_rs_methodology_verification_2026_05_08.md b/migration/claude-memory/project_xenia_rs_methodology_verification_2026_05_08.md new file mode 100644 index 0000000..c3f581a --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_methodology_verification_2026_05_08.md @@ -0,0 +1,78 @@ +--- +name: methodology verification cluster (RECONCILE-A/B + VERIFY-A/B) 2026-05-08 +description: Four meta-audits ran in parallel pairs to test whether our analysis tooling has misled prior conclusions. Outcome — methodology is largely sound at the kernel level; user's "Linux black window" is host-presenter divergence; reading-error ledger (10 entries, mostly function-boundary mis-attribution) is the real motivating gap. +type: project +originSessionId: 3ac64eba-6d00-4c87-aed8-b1739c32b94d +--- +**🔍 META-AUDIT (2026-05-08, READ-ONLY, master `e061e21` unchanged)** — four parallel verifications + reconciliations triggered by user concerns about analysis-tool blind spots. + +## VERIFY-A: static reachability soundness + +Question: does our `xrefs.kind='call'` BFS correctly identify the audit-009 unreached cluster, or is it artifactually narrow (missing reachability via indirect dispatch)? + +Method: re-applied audit-030's `--log_lr_on_pc` canary patch with 12 cluster L1 PCs (the 6 narrow audit-009 entries + 6 sample from broader 112-set). Each probe ran ~35s while canary's audio loop was active. + +Result: **0/12 PCs fire in canary**. The audit-009 cluster is genuinely cold in canary too. Static reachability conclusion is corroborated. Bayesian 95% upper bound on cluster reach-rate ≈ 22% from this sample; full 112-PC sweep would tighten to <5% (~75 min). + +Implication: the audit-009/016/017/020/021/026/027/029 framing is sound. The cluster IS unreached in canary. + +## VERIFY-B: mem-watch coverage + +Question: does `--mem-watch` (audit-019) miss store classes beyond stb/sth/stw/std (e.g., DCBZ, VMX, multi-word, byte-reverse)? + +Method: read-only audit of xenia-cpu interpreter. For each PowerPC store-class instruction, confirm whether its handler calls `GuestMemory::write_*` (hooked) or bypasses (blind). + +Result: **12/12 store classes COVERED**. DCBZ uses 8× or 32× write_u32(0). VMX/VMX128 stores loop write_u8 via vmx.rs helpers. Byte-reverse swaps then writes via hooked path. stmw / stswx are per-element loops. stwcx./stdcx. atomic store-conditional hooked. FP stores route via trait default → write_u32/u64. + +Cross-check: the writer at sub_821701c8:0x82170234 (audit-017's only non-(-1) writer of `[0x828F40B0]`) is `stw r11, 64(r30)` — plain `stw`, fully hooked. **Audit-019's "1 fire only on init" finding is NOT a false negative.** + +Implication: no mem-watch blind spots. β-blocker observations stand. + +## RECONCILE-A: Windows-vs-Linux canary kernel-call diff + +Question (post user observation that Lutris-launched Windows shows intro video, Linux Debug shows black): is Linux Debug build itself partially stalled? If so, all our cross-runtime memory dumps + LR traces (audits 023/024A/026/027/029/030/031) reflect a partial canary, not a working one. + +Method: log-diff `/home/fabi/xenia_canary_windows/xenia.log` (Lutris Windows = visibly working) against `/home/fabi/RE Project Sylpheed/xenia-rs/audit-runs/audit-024a-canary-diff/canary.log` (Linux Debug). + +Result: **identical kernel-call trajectories**. Both reach `\dat\movie\ADV.wmv` + XMA decoder. Linux ended at frame 42/186, Windows at 72/186 — Linux log was simply terminated earlier. Top-25 call frequency profiles essentially identical. Linux even fires MORE `XamInputGetCapabilities` (3644 vs 3052). + +Set diff (Windows-only call NAMES): `XamEnumerate` ×1, `XamUserReadProfileSettings` ×2. **Both already fire in our impl** per audit-018 counters. + +Implication: Linux Debug build IS a faithful kernel-call-level oracle. The user's "Linux black window" observation is a presentation-layer phenomenon (next entry). + +## RECONCILE-B: Linux build presenter divergence (source-level) + +Question: why does Linux Debug build show black window where Windows shows intro video? + +Method: read-only audit of xenia-canary source for `XE_PLATFORM_WIN32` / `XE_PLATFORM_LINUX` conditionals. + +Result: ranked hypotheses, with concrete evidence: +- **H1 (HIGH)**: Vulkan presenter likely fails swapchain creation on certain display configs. `vulkan_presenter.cc:211-215` registers only `kTypeFlag_XcbWindow` (XCB). Wayland is TODO (`window_gtk.cc:289`). `windowed_app_main_posix.cc:23-24` forces `GDK_BACKEND=x11`. Under Wayland-only or odd Xwayland setups, surface returns `nullptr`. User confirmed Weston (Wayland compositor with Xwayland) ALSO shows black — possibly H1 in Xwayland mode too, or a deeper Vulkan-on-Linux issue. +- **H2 (MEDIUM)**: Vulkan backend feature gap vs D3D12 (Windows DXVK→Vulkan vs Linux native Vulkan = different code paths). Sylpheed uses 8_8_8_8_GAMMA / 2_10_10_10_FLOAT formats heavily during intro. +- **H3 (MEDIUM)**: Frame-limiter cadence mismatch (Linux unconditional MarkVblank per loop iter vs Windows period-gated). +- **H4 (LOW)**: `--disable_instruction_infocache=true` is cosmetic per `xex_module.cc:1356-1362`. + +Bug class: B-host-divergence. Guest-engine progression matches Windows; host-side rendering pipeline doesn't pump VdSwap completions to the visible surface. **The guest is working; the screen just doesn't show it on Linux.** This is irrelevant to our engine-level analysis since the GUEST is what we emulate. + +User confirmation post-RECONCILE: "I have run Canary + Sylpheed in Weston too, but it still shows black screen after the splash screen. You may continue using the Linux build for now, because you claimed that it is identical to the Windows build and the issue lies at the host level. However be careful about any claims and conclusions stemming from it." + +## Combined methodology verdict + +- Static reachability BFS: SOUND (VERIFY-A) +- Mem-watch coverage: SOUND (VERIFY-B) +- Linux Debug build as kernel-level oracle: FAITHFUL (RECONCILE-A) +- Visual divergence (Linux black, Windows shows intro): host-presenter only (RECONCILE-B), irrelevant to engine analysis +- Reading-error ledger: **10** (mostly function-boundary attribution errors). This is the real gap motivating the analysis-overhaul. +- Audit-032 added a methodology pattern note: host-side dispatchers (canary's WorkerThreadMain) invoke guest callbacks WITHOUT a guest call site. LR-traces and call-graph analysis cannot detect this; static analysis is necessary to identify host-pump-vs-guest-dispatch. + +## What this means strategically + +1. Past audit findings stand. No re-grading. +2. The renderer plateau (audit-009 cluster) remains the gate for `draws > 0` and is genuinely blocked on tooling we don't yet have. +3. The analysis-toolset overhaul is motivated primarily by: + - 10 function-boundary attribution errors (forces cross-validation overhead every session) + - Zero C++/MSVC support (vtable detection, RTTI extraction, class identification, name demangling) + - Indirect-dispatch reachability (would catch reachability via vtable/function-pointer) +4. Mem-watch extension and basic instruction coverage extension are LOWER priority (mem-watch is comprehensive; the static-side coverage gap is real but doesn't block). + +Master HEAD `e061e21` unchanged. Tests 605. swaps=2 draws=0 plateau intact. diff --git a/migration/claude-memory/project_xenia_rs_overhaul_rapid_survey_2026_05_08.md b/migration/claude-memory/project_xenia_rs_overhaul_rapid_survey_2026_05_08.md new file mode 100644 index 0000000..4cc2aac --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_overhaul_rapid_survey_2026_05_08.md @@ -0,0 +1,47 @@ +# RAPID-SURVEY: post-overhaul DB survey for audit-009 cluster + +Date: 2026-05-08. Master HEAD: 9028021. READ-ONLY (no source mod, no commit, no rebuild). DB: `/home/fabi/RE Project Sylpheed/xenia-rs/sylpheed.db` (post-overhaul, all M1-M12 landed). + +## Survey questions Q1-Q7 — results + +**Q1. Cluster L1 PCs in vtables / dispatch tables**: ZERO HITS. The 6 L1 PCs (0x822919C8, 0x82293448, 0x82288028, 0x82292D80, 0x822851E0, 0x82286BC8) appear in NEITHER `methods` NOR `function_pointer_array_entries`. They are not direct vtable slots and not direct dispatch-table entries. Consistent with audit-031/032: dispatched via `this->vptr` writes that M5 cannot trace (M5.5 territory). + +**Q2. Dispatch tables / vtables pointing INTO cluster**: 13 arrays found in `0x820A9B98-0x820AA024`, ALL OF THEM static-`.rdata` arrays in the data section preceding the cluster. Highlights: +- Vtable `0x820A9C28` (3 slots) → `82291740, 82291C90, 82291BD0` (cluster). Ctor candidate: `sub_8228F858`, also `sub_822917A0`. +- Vtable `0x820AA024` (8 slots, slot 0 = `82293ED8`, slots 1-7 = abort handler `sub_825ED990`). Ctor candidates: `sub_82293EC8` (4 callsites), `sub_82294110`, `sub_82294898`, `sub_822A0860`, `sub_822A0E90`. +- Dispatch tables `0x820A9B98/A4/B4/C4/D4/E4/F4/0x820A9C04/D74/D84/D94` — all 2-slot, sourced from cluster ctors (e.g. `sub_8228F858 +0x1AC, +0x260` writes to `0x820A9D74` & `0x820A9C28`). + +ALL ctor candidates that write these vtables are themselves in the cluster or in adjacent 0x822A0000 range — no external static call chain reaches them. + +**Q3. Cluster reachability (call BFS + indirect)**: 309 pdata-validated functions in the cluster. **309 unreachable** via `v_reachability_from_entry`. **309 unreachable** via `v_indirect_reachability_from_entry`. Indirect view added ZERO new reach. Audit-009's "42 unreachable" was a vast undercount — pdata correction added many missed functions; corrected count is 309 (~7x larger). Cluster is 100% cold. + +**Q4. Subsystem strings referenced from cluster** (93 references): `BASE_INFO`, `THUMBNAIL`, `LOAD_BASES`, `SAVE_BASES`, `LOAD_MENUS`, `SAVE_MENUS`, `AUTO_SAVED`, `CLEARED`, `MISSION_SELECT`, `STATE_STAND_BY`, `STATE_GAME_CLEAR`, `NOW_LOADING`, `EX_FONT`, `Points`, `FlightTime`, `ClearTimes`, `CompletionRate`, `Disk free space : %d bytes`, `Content request : %d bytes`, `vector too long`, `deque too long`. Cluster is **save-game / mission-select / front-end UI / HUD subsystem**, NOT raw renderer. `SilpheedSCS` strings live OUTSIDE the cluster (`820A54C0/5500/6F70`, referenced from `sub_821A6CF0+0xE6C`, `sub_821A8578+0xE0`, `sub_82239F00+0x3B4`). + +**Q5. Cluster fns with EH**: 68 functions (including all 6 L1 PCs). Cluster is heavily C++ EH-instrumented (try/catch around save-game I/O). `sub_82291C90` (2700B), `sub_82288E70` (2012B), `sub_82288028` (1896B), `sub_82286BC8` (1768B), `sub_82292D80` (1524B). High-leverage entries. + +**Q6. Mis-merge candidates** in `0x82200000-0x822F0000`: ZERO. .pdata correction is clean across the audit-relevant region; no past-audit findings need re-validation due to mis-named functions. + +**Q7. audit-031 boundary fix**: VERIFIED. `sub_824D23B0` (1224B), `sub_824D29F0` (280B), `sub_824D2BD8` (48B), `sub_824D2C08` (928B) — all four pdata-validated and present as separate rows. The pre-overhaul mis-merge is corrected. + +## Highest-leverage finding + +The audit-009 cluster is now CONFIRMED to be the **save-game/mission-select/UI subsystem**, not the active 3D renderer. The 309-function cluster is reachable only through `this->vptr` dispatches whose vtable writes (`stw rVtable, off(this)`) come from cluster-internal ctors (`sub_82293EC8`, `sub_82294898`, `sub_8228F858`, `sub_82284590`) that are themselves never statically reached. The Sylpheed front-end never enters this subsystem because the parent allocator/factory is never instantiated. + +This refines the gate: the missing trigger is whatever instantiates the front-end-UI controller object. Look UPSTREAM for what creates and dispatches into 0x822A0000 or earlier. + +## Is M5.5 mandatory? + +**Yes, for forward call-graph deepening within the cluster** — but **not strictly required to dispatch a useful next move**. The Q4 string evidence already names the subsystem clearly. A targeted runtime probe on the EXTERNAL entries (e.g. `sub_8228A628`, `sub_8228E138`, `sub_8228E498` — the 3 cluster fns with external static callers — and their parents `sub_82172524`, `sub_82175810`, `sub_8217EB78`) plus a canary-diff on those PCs would show whether canary's UI-init reaches them and ours doesn't. M5.5 would let us walk vptr writes back to the trigger ctor; that's the high-value follow-up. + +## Recommended next session + +1. **Pivot away from "renderer plateau" framing.** It's the front-end-UI/save-game subsystem, not draw-call code. Re-baseline the audit-009 framing in MEMORY.md. +2. **Run `--lr-trace` / `--pc-probe` against canary on `sub_8228A628`, `sub_8228E138`, `sub_8228E498`, `sub_82172524`, `sub_82175810`, `sub_8217EB78`.** If canary fires them and ours doesn't, walk back the canary-only LRs to find the missing kernel/import gate. +3. **Schedule M5.5** as next analyzer milestone — `this`-flow vtable resolution would resolve the 309 cluster fns into named caller chains and likely surface the UI-controller ctor PC. +4. **Cross-check `SilpheedSCS::CMessageBridge::Load/CreateDeviceObjects`** callsites (sub_821A6CF0, sub_821A8578) — both fired in past audits but result paths land in cluster ctors that never run. + +## Files (absolute paths) + +- DB: `/home/fabi/RE Project Sylpheed/xenia-rs/sylpheed.db` +- Schema doc: `/home/fabi/RE Project Sylpheed/xenia-rs/crates/xenia-analysis/SCHEMA.md` +- Memory: `/home/fabi/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/project_xenia_rs_overhaul_rapid_survey_2026_05_08.md` diff --git a/migration/claude-memory/project_xenia_rs_perf_tier4.md b/migration/claude-memory/project_xenia_rs_perf_tier4.md new file mode 100644 index 0000000..573af96 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_perf_tier4.md @@ -0,0 +1,146 @@ +--- +name: xenia-rs Tier-4 perf landed (2026-04-25) +description: MMIO fast-reject + basic-block cache + GPU pacer; Sylpheed boot 318→136ms (2.3×); 370 tests pass; thread-interleaving divergence at large -n is expected +type: project +originSessionId: b082ddb2-530b-45e9-a454-5dfa856fecf3 +--- +## What landed + +Three perf changes on top of the prior Tier-1–3 work: + +1. **MMIO fast-reject** — `xenia-memory/src/heap.rs` `find_mmio` now does a + single `(addr & mmio_aperture_mask) != mmio_aperture_value` compare + before falling through to the linear `iter().find` over registered + regions. Aperture pair recomputed in `add_mmio_region` via a + `fold_aperture` helper (greatest common bit-mask agreement). Fast + path is a *necessary* condition only — `contains()` still runs for + matching candidates, so MMIO semantics are unchanged. + +2. **Basic-block cache** — new `xenia-cpu/src/block_cache.rs`. 64 K + direct-mapped slots keyed by `(start_pc >> 2) & 0xFFFF`, each holding + a `DecodedBlock { start_pc, end_pc, page_version, instrs }`. Block + walk stops on `PpcOpcode::terminates_block()` (branch / sc / trap / + Invalid), at `MAX_BLOCK_INSTRS = 32`, or at a 4 KiB guest-page + boundary (so single-`page_version` invalidation suffices). New + `xenia-cpu::interpreter::step_block` dispatches each instruction in + the block via the existing match-based `execute`. + +3. **Hot-loop wiring + GPU pacer** — + `xenia-app/src/main.rs::run_execution` now branches on + `debugger.wants_hooks() || db_writer.is_some() || + force_per_instr` — only the per-instruction path runs when any of + those is true. A new env var `XENIA_FORCE_PER_INSTR=1` forces the + slow path for A/B testing. Post-round GPU dispatch was changed from + "1 `execute_one` per round" to + `gpu_runs = max(1, min(64, executed_this_round / HW_THREAD_COUNT))` + so block mode (which executes ~6× more instructions per outer + round) doesn't starve the GPU. + +## Why u32-narrowing and threaded-code dispatch were skipped + +- **u32-narrowing**: cmpi/cmpli/cmp/rlwinm arms already cast to u32/i32 + in their bodies. The remaining "obvious" target — addi/addis — runs + natively at u64 because Xenon GPRs are 64-bit. No measurable win + available without rewriting the ISA semantics. +- **Threaded-code dispatch**: extracting ~200 match arms into per-opcode + free functions for an uncertain LLVM-jump-table-vs-fn-ptr win was a + poor risk/reward. The basic-block cache benefit doesn't depend on + threaded dispatch (each instruction inside a block still goes through + the existing match), so this was the right phase to skip. + +Both decisions match the plan's bench-gated rule: "Phase 4 must not be +merged on principle alone — it merges only if numbers go up." + +## Numbers + +Baseline (pre-perf-track) → final (`xenia-rs check sylpheed.iso -n 2_000_000`): + +| metric | baseline | final | delta | +|------------------------|-----------|--------|--------| +| wall-time | 318 ms | 136 ms | 2.3× | +| `tight_alu_loop` bench | 96.9 MIPS | 114.8 | +18.5% | +| `loadstore_loop` bench | 78.3 MIPS | 91.8 | +17.2% | +| `mmio_storm` bench | 59.7 MIPS | 67.8 | +13.6% | +| workspace tests | 352 | 370 | +18 | + +Bench is `cargo bench -p xenia-cpu` against the new +`crates/xenia-cpu/benches/interpreter.rs` harness. No criterion dep — +custom `harness = false` `main()`. + +## Verification + +- **Golden digest at -n 2M** (`crates/xenia-app/tests/golden/sylpheed_n2m.json`): + byte-identical between block and per-instruction modes. +- **VdSwap fidelity**: frame=1 fires before -n 18M; frame=2 fires + between -n 18M–22M. Prior memory said "~28 M cycles" but that + predates the GPU pacer; the actual figure with current scheduling + shifts by mode (block mode is faster wall-time but identical + instruction-count behavior up to the point of first thread + divergence). +- **Deadlock counters**: 0 halts / 0 recoveries on every Sylpheed run. +- **All 370 workspace tests pass**, including new tests: + - `xenia-memory::heap`: 5 (mmio_fast_path_*, fold_aperture_*). + - `xenia-cpu::opcode`: 5 (terminates_block_*). + - `xenia-cpu::block_cache`: 6 (build, page boundary, max-len, invalid + terminator, invalidation, hit-returns-cached). + - `xenia-cpu::interpreter`: 2 parity tests + (block_dispatch_matches_per_instruction_alu_loop + + loadstore_loop) — bit-identical CPU state between paths on a + single-thread workload. + +## Important caveat: thread-interleaving divergence at large -n + +At -n 30M+, the `--expect` digest **differs** between block and +per-instruction modes: + +- imports diverge by ~10% (block lower) +- packets diverge by ~3.7× (block lower) + +This is **fundamental to any block-batching dispatcher** in a +multi-threaded scheduler. Per-instruction mode round-robins +instructions across HW threads (HW0 ← 1 instr, HW1 ← 1 instr, …); +block mode lets HW0 burst up to MAX_BLOCK_INSTRS before yielding. +Different valid interleavings of the same multi-threaded program +reach different relative-progress states at any given total +instruction count. Both produce correct Sylpheed boots — VdSwap=1 +and =2 fire, no deadlocks. Bit-identical comparison between modes +is only meaningful at -n 2M (before workers spawn) and that +remains the regression rail. + +## Files touched in 2026-04-25 perf-track session + +- `crates/xenia-cpu/Cargo.toml` — `[[bench]] name = "interpreter" harness = false`. +- `crates/xenia-cpu/benches/interpreter.rs` — new (3 benches). +- `crates/xenia-cpu/src/lib.rs` — `pub mod block_cache;`. +- `crates/xenia-cpu/src/block_cache.rs` — new file. +- `crates/xenia-cpu/src/interpreter.rs` — `step_block`, parity tests. +- `crates/xenia-cpu/src/opcode.rs` — `terminates_block` + tests. +- `crates/xenia-memory/src/heap.rs` — MMIO fast-reject + tests. +- `crates/xenia-app/src/main.rs` — block-cache wiring, GPU pacer, + `XENIA_FORCE_PER_INSTR` escape hatch. +- `crates/xenia-app/tests/golden/sylpheed_n2m.json` — golden digest. + +## How to A/B test in future sessions + +```bash +# block-cache mode (default) +./target/release/xenia-rs check -n 2_000_000 --expect crates/xenia-app/tests/golden/sylpheed_n2m.json + +# force per-instruction (debugging) +XENIA_FORCE_PER_INSTR=1 ./target/release/xenia-rs check -n 2_000_000 --expect ... + +# bench +cargo bench -p xenia-cpu +# or: cargo run --release --bench interpreter +``` + +## What's next on the perf track if needed + +If Sylpheed boot is still too slow after this lands: + +1. Profile with `--profile out.svg` to see where time goes now. +2. Threaded-code dispatch is still on the table — but only with a + bench showing >1.5× win on `tight_alu_loop` from a small-prototype + spike branch. +3. The `MAX_BLOCK_INSTRS = 32` cap could be tuned. Lower (16, 8) + reduces thread-interleaving divergence at the cost of dispatch wins. diff --git a/migration/claude-memory/project_xenia_rs_ppc_audit_2026_04_29.md b/migration/claude-memory/project_xenia_rs_ppc_audit_2026_04_29.md new file mode 100644 index 0000000..14d927c --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_ppc_audit_2026_04_29.md @@ -0,0 +1,196 @@ +--- +name: PPC instruction audit (2026-04-29) — addis-family bug hunt +description: Comprehensive audit of all PPC opcodes triggered by the addis 32-bit-ABI fix. Audit-only; no code changes. Findings tracker has stable PPCBUG-NNN IDs for fix-session reference. +type: project +originSessionId: a75558fe-b213-4ee1-8f1b-7b2ca696e638 +--- +**Audit triggered by**: the `addis` 32-bit-ABI sign-extension fix from earlier on 2026-04-29 (`project_xenia_rs_addis_signext_root_cause_2026_04_29.md`). Hypothesis: the addis bug is unlikely to be the only one of its kind among the 429 unique opcode arms in the interpreter. + +**Status**: in-flight, audit-only. **No code under `xenia-rs/crates/` has been modified.** This is explicit per the plan: the deliverable is a triaged report; a follow-up session applies fixes. + +**Where everything lives**: +- **Plan**: `/home/fabi/.claude/plans/we-just-found-and-stateful-sparrow.md` +- **Pre-pass red flags**: `RE Project Sylpheed/xenia-rs/audit-prepass-findings.md` +- **Per-group subagent reports**: `RE Project Sylpheed/xenia-rs/audit-out/group-NN-*.md` +- **Consolidated findings tracker (with stable PPCBUG-NNN IDs)**: `RE Project Sylpheed/xenia-rs/audit-findings.md` + +**Methodology**: orchestrator (this session) does a regex pre-pass, then fans out one Sonnet subagent per opcode group (~38 groups across 8 batches), then 3 dedicated Phase-C subagents for decoder/disasm. User asked the orchestrator to checkpoint and pause between batches. Each subagent writes its report to `audit-out/group-NN-*.md`. The orchestrator appends new IDs to `audit-findings.md` after each batch. + +**Phase B complete — all 8 batches done (groups 1-38, the entire interpreter)**: 224 PPCBUG IDs assigned through high-water-mark PPCBUG-519. ~50 HIGH, ~70 MEDIUM, rest LOW. The cross-cutting recommendation is **systematically truncate every GPR writeback in every integer ALU op via `as u32 as u64`**; the existing CA/CR0/OE helpers become correct without further changes once the writeback invariant is restored. Defensive secondary recommendation: `subfcx`/`subfex`/`addic`/`addicx`/`subficx` should additionally truncate their compare operands locally for resilience against future regressions. + +**Batch 4 surfaced the headline find**: **PPCBUG-107 — `ReservationTable::invalidate_for_write` is defined and unit-tested in `reservation.rs:234` but never called from any store instruction in `interpreter.rs`.** Under `--parallel` mode with the table enabled, a plain `stw` by thread B to an address reserved by thread A doesn't clear thread A's reservation slot — every cross-thread atomic via lwarx/stwcx. is silently broken. Spinlocks, atomic counters, condition-variable handshakes, producer-consumer queues all fail. **This is a plausible smoking gun for the Sylpheed renderer plateau** (4 worker threads parked on never-signaled events per `project_xenia_rs_sylpheed_stage3_2026_04_29.md`). Recommended: investigate this finding ahead of all the other batch-1-2-3 fixes — it's a single small fix that could unblock the renderer. + +**Other batch 2-4 highlights**: +- **PPCBUG-040** — DECODER bug: `sh64()` accessor wrong bit order for sradi (silent mis-decode). +- **PPCBUG-046** — DECODER bug: wrong MB[5] reconstruction in all 6 doubleword-rotate opcodes; `clrldi r3, r4, 32` (canonical zero-extend-to-32 idiom) becomes a no-op. +- **PPCBUG-053+054** — `bcx`/`bclrx` CTR zero-test is 64-bit; combined with PPCBUG-006 (negx) this makes `neg; mtctr; bdnz` loops run forever. +- **PPCBUG-065** — `twi 31, r0, IMM` (Xbox 360 typed-trap) drops the SIMM type code; relevant to the Sylpheed C++ throw investigation (exception-class discriminator lost). +- **PPCBUG-095..098** — pre-pass HIGH halfword-load suspects all confirmed: `lha`/`lhax`/`lhau`/`lhaux` sign-extend to 64 bits; one-line fix per opcode. +- **PPCBUG-029** — `norx` is the `not` simplified mnemonic; every `not` actively poisons GPR upper 32 bits. + +**Batch 5 (stores) findings**: +- **PPCBUG-107 cascade fully mapped**: 32 store opcodes total missing `invalidate_for_write`. Single-commit mechanical sweep. Specifically: PPCBUG-130 (9 byte/halfword stores), PPCBUG-140..144 (5 word stores, including `stw`/`stwu` which are HOT — every stack push), PPCBUG-150 (5 doubleword stores), PPCBUG-160 (3 multiple/string stores; `stmw` is in every function prologue/epilogue, multi-cache-line), PPCBUG-167 (9 FP stores). Fix-shape: `if t.has_active_reservers() { t.invalidate_for_write(ea) }` before each `mem.write_*`. +- **PPCBUG-151** (NEW concurrency bug, separate from PPCBUG-107) — `stwcx.`/`stdcx.` share the same reservation slot without a width discriminator. lwarx+stdcx and ldarx+stwcx cross-pairs silently succeed when ISA says fail. Requires adding `reservation_width: u8` to `PpcContext`. +- **PPCBUG-161** — `stswx` permanent no-op (mirror of PPCBUG-123 lswx); same XER TBC fix. +- **PPCBUG-165 / 166** — `stfs*` doesn't set FPSCR exception bits during double→single narrowing, and ignores FPSCR.RN (always uses host MXCSR). Games polling FPSCR get false negatives; directed rounding (truncate/ceil/floor) wrong by up to 1 ULP. Canary shares both. + +**Batch 6 (FPU) findings — better than expected**: significant FPSCR infrastructure has landed since the manual's frozen snapshots (`to_single` honors FPSCR.RN for single, `check_invalid_*` and `update_after_op` exist). PPCBUG-165/166 are specific to the store path; arithmetic ops are mostly wired up. +- **PPCBUG-184** — `fresx` computes full IEEE 1/b instead of ~12-bit hardware estimate; NR convergence sees different residuals. Canary shares. +- **PPCBUG-221** — `round_to_i64` NearestEven broken for `|v| > 2^52`; falls through to wrong rounding. Affects fctidx/fctiwx via PPCBUG-227. +- **PPCBUG-203** — fmsubx/fnmaddx/fnmsubx omit VXISI check; canonical `fnmsub`-based Newton-Raphson is hottest FPU path in graphics middleware. +- **PPCBUG-183 / 205** — fnmadd/fnmsub Rust unary `-` flips NaN sign bit; ISA says skip negation on NaN result. +- **PPCBUG-180 / 200** — XX/FR/FI inexact bits never set in any FPU op (single + double). +- **PPCBUG-201** — FPSCR.RN not honored for double arithmetic (only wired for single). +- **PPCBUG-185** — FPSCR.NI flush-to-zero not modeled (Xenon flushes subnormals). +- **PPCBUG-223** — fcmpo missing FPSCR[VXSNAN]/[VXVC] on NaN operands. +- **3 IDs retracted** by group-31 subagent after deeper analysis (PPCBUG-220/222/226). Audit methodology is self-correcting. + +**Frozen-snapshot drift count: 7 opcodes** (addicx, andisx, td/tdi/tw/twi, cmp/cmpi, ldarx, stwcx). Worth a separate audit-pass after this session: `find ppc-manual -name '*.md' | xargs grep -l 'frozen snapshot'` and diff each against live code. + +**Batch 7 (VMX integer add/sub + compare/min/max + logical/shift) findings**: +- **PPCBUG-275 (HIGH)** — DECODER bug: `rc_bit()` reads LSB bit 0 (correct for X/XO-form) but **VC-form vector compares put Rc at bit 10**. All 9 integer-vcmp.* opcodes have CR6 update unconditionally dead. Breaks the canonical AltiVec memchr/strncmp early-exit idiom. Fix: add `vc_rc_bit()` accessor. +- **PPCBUG-315 (HIGH)** — `vrlimi128` reads `z` field from bits 16-17 (low 2 bits of IMM) instead of bits 6-7, and `IMM` from bits 2-5 (VD128h ext + reserved) instead of 16-19. Field-extraction bug, not arithmetic. +- VMX integer add/sub (group 32) was structurally clean (only test gaps). +- VMX logical/shift (group 34) clean except vrlimi128. + +**Batch 8 (VMX permute/pack + float + multiply-sum + load/store) findings — heavy**: +- **PPCBUG-360 (HIGH)** — `vperm128` reads VC permute-control from `vd128()` instead of VX128_2 VC field at integer bits 6-8. **Every `vperm128` uses the wrong control vector.** Core of every D3D vertex shader's swizzle. +- **PPCBUG-361 (HIGH)** — `vsldoi128` SH MSB read from bit 4 (reserved) instead of bit 9. **All shifts ≥8 bytes silently become 0-7.** +- **PPCBUG-362 (HIGH)** — `vpermwi128` PERMh from VD128l bits 21-23 instead of bits 6-8. +- **PPCBUG-363 (HIGH)** — `vpkd3d128` post-pack permutation **entirely absent**; the most common D3D vertex-pack pattern (`pack=1`) is always wrong. Vertex/index buffer packing for D3D is broken. +- **PPCBUG-420/421 (HIGH)** — VMX float compares (vcmpeqfp./vcmpgefp./vcmpgtfp./vcmpbfp.) hit the same rc_bit-at-bit-0 bug as PPCBUG-275; **CR6 permanently dead for ALL VMX float compare dot forms**. +- **PPCBUG-422 (HIGH)** — VMX128_R form Rc at bit 4 (different from VC form's bit 10). +- **PPCBUG-423 (HIGH)** — `vcmp*fp128.` dot forms decode as `Invalid` (decode_op6 key4 table has only non-dot keys); any game using VMX128 float compare + CR6 feedback crashes. +- **PPCBUG-424 (HIGH)** — `vmaddfp128` operand swap: computes `VA×VB+VD` but ISA says `VA×VD+VB`. Existing denorm-flush test uses aliased vA=vD which hides the bug. +- **PPCBUG-425 (HIGH)** — `vmaddcfp128` similar operand swap. +- **PPCBUG-510 (HIGH)** — `stvewx128` aligns EA to 16 bytes and writes ALL 16 bytes; ISA says word-align EA, extract one word lane, write only 4 bytes. **Every execution corrupts 12 adjacent bytes.** Non-128 stvewx is correct; 128 variant was never updated. +- **PPCBUG-511..514 (HIGH ×4)** — all 16 VMX stores missing `invalidate_for_write` (PPCBUG-107 cascade extension). +- **PPCBUG-426..437 (MED)** — VMX float arithmetic gaps mirroring scalar FPU (no NJ subnormal flush, vnmsub double-rounding, vrefp/vrsqrtefp/vexptefp/vlogefp full IEEE precision instead of ~12-bit estimate, vrfin uses half-away-from-zero, vctsxs NaN returns 0). + +**DECODER/FIELD-EXTRACTION bug count: 8** (PPCBUG-040 sh64 for sradi, PPCBUG-046 mb_md for rld*, PPCBUG-275 rc_bit-VC-form, PPCBUG-315 vrlimi128 z/IMM, PPCBUG-360 vperm128 VC, PPCBUG-361 vsldoi128 SH, PPCBUG-362 vpermwi128 PERMh, PPCBUG-363 vpkd3d128 missing post-permute). All but PPCBUG-040/046 are in VX128_* forms. **Phase C decoder audit was the right call** — these field-extraction bugs were not on the pre-pass radar (which scanned interpreter.rs for arithmetic patterns only). + +**Notable findings from batch 1**: +- `negx` is **active** (not latent) — `!ra` flips upper 32 bits unconditionally. Every `neg` poisons GPR. +- `subfcx` CA computation is the exact 64-bit unsigned compare that the addis bug exploited — apply defensive 32-bit truncation here regardless of upstream cleanup, because a wrong CA is unrecoverable. +- `addicx` CR0 update is a **regression** vs. the frozen snapshot in `ppc-manual/alu/addicx.md` — the manual's snapshots are useful drift detectors for other opcodes too. +- `mulli`, `mullwx` write 64-bit signed products to GPR without truncation. +- `divwx` quotient sign-extension is a same-shape-as-addis bug (Canary explicitly zero-extends here). +- Pre-pass hints REFUTED: `divwux`, `mulhwx`, `mulhwux` are clean. + +**Phase C complete (decoder + disasm audit)**: 24 new IDs (PPCBUG-560..654 sparse). + +**C1 (field extractors) — structural diagnosis of Phase B's 8 field-extraction bugs**: +- **PPCBUG-560 (HIGH)** — Tests-mask-bug: `rldicl()` test helper encodes sh[5:1] and sh[0] opposite to ISA. The wrong `sh64()` formula correctly inverts the wrong test encoding, so tests passed. Fix of PPCBUG-040 sh64 MUST land together with fix of test helpers. +- **PPCBUG-561..565 (MEDIUM ×5)** — 5 accessors are missing from `decoder.rs`, forcing the interpreter to inline wrong formulas: `mb_md()` (already correct in disasm.rs:1256), `vc_rc_bit()`, `vx128r_rc_bit()`, `vx128_4_imm()`/`vx128_4_z()`, `vx128_p_perm()`, `vx128_5_sh()`. Each maps directly to a Phase B finding (PPCBUG-046, 275/420, 422, 315, 362, 361). **Structural fix pattern**: promote accessors to decoder.rs as a single sweep; interpreter consumes them. +- **PPCBUG-566 (LOW)** — `xer_tbc()` missing → blocks lswx/stswx (PPCBUG-123/124/161). +- **PPCBUG-567 (LOW)** — zero scalar accessor unit tests; Phase 4 only covers VMX128 register accessors. + +**C2 (opcode-lookup tables) — mostly clean**: +- **PPCBUG-600 (MEDIUM)** — formal cross-reference for PPCBUG-423: 5 VMX128 compare dot-forms missing from `decode_op6` key4 (vcmpeqfp128., vcmpgefp128., vcmpgtfp128., vcmpbfp128., vcmpequw128.). +- **PPCBUG-601 (MEDIUM)** — `decode_op6` overlapping windows; correctness depends on undocumented invariant. +- **PPCBUG-602..605 (LOW)** — undocumented dispatch quirks, test gaps. +- **All other dispatch tables match Canary entry-for-entry**. + +**C3 (disassembler) — analysis-DB only impact**: +- **PPCBUG-640 (HIGH)** — `fmt_bc` emits `bdnzge`/`bdzge` for pure `bdnz`/`bdz`; `uncond` bit not checked before appending CR-condition name. Every CTR-only loop misidentified in analysis DB. Fix already exists in fmt_bclr — port the pattern. +- **PPCBUG-641 (HIGH)** — re-assessment of PPCBUG-088: every `lwsync` stored with wrong mnemonic in DB. +- **PPCBUG-643 / 644 (MEDIUM)** — SIMM and D-form displacement displayed as decimal; Canary uses hex; misaligns DB queries. +- **PPCBUG-645..654 (LOW)** — extended-mnemonic gaps, test gaps. +- **Notable**: disassembler has correct LOCAL field-extraction (mb_md, vperm128 VC, vsldoi128 SH, vpermwi128 PERM) — interpreter could just port the disassembler's helpers. + +**Phase B + C — final tracker state**: ~248 PPCBUG IDs, ~55 HIGH, ~75 MEDIUM, ~110 LOW. + +**Phase D complete — AUDIT FINISHED**: triaged fix-order report at `xenia-rs/audit-report-2026-04-29.md` (project root). Every one of the 253 PPCBUG IDs is referenced — verified mechanically by `grep`-comparing entry headers in `audit-findings.md` against the report. Retracted IDs (220/222/226 never reached the tracker; 482/483 marked WITHDRAWN in tracker) are explicitly listed in the report's Notes. + +**Recommended fix order (8 phases) — abridged**: +- **P1 — PPCBUG-107 cascade** (cross-thread atomicity): single mechanical sweep adding `invalidate_for_write` to ~38 store sites. Likely Sylpheed renderer plateau cause. +- **P2 — Decoder/field-extraction sweep**: 6 missing decoder accessors (PPCBUG-561-565), `sh64()` fix + test helper fix (PPCBUG-040+560 must land together), `decode_op6` dot-form key entries (PPCBUG-423+600). +- **P3 — Other HIGH bugs**: stvewx128 corruption (PPCBUG-510), vmaddfp128/vmaddcfp128 operand swap (PPCBUG-424/425), bcx CTR 32-bit (PPCBUG-053+054), fmt_bc spurious suffix (PPCBUG-640), lwsync/sync (PPCBUG-641). +- **P4 — 32-bit ABI writeback truncation sweep**: ~30 IDs across active poisoning (negx/norx family), addis-shape (addi/addic/divwx etc.), latent writebacks, and CR0 width. +- **P5 — FPU correctness**: round_to_i64 near-2^52, FMA VXISI gaps, NaN sign preservation, FPSCR exception bits, subnormal flush, estimate precision. +- **P6 — Other MEDIUM**: trap pc-after-advance, sc LEV, twi typed-trap (PPCBUG-065 — relevant to Sylpheed C++ throw work), mtmsrd L=1, lswx/stswx XER TBC enabling. +- **P7 — Frozen-snapshot drift sweep** (8 opcodes). +- **P8 — Test gaps** (~50 IDs; bundle with each fix PR). + +**Coupled-fix matrix**: 14 must-land-together pairs documented in the report's Coupling matrix. + +## Fix session progress (2026-05-01) + +**P1 — Cross-thread atomicity sweep — MERGED (ca5b90b, 2026-05-01)** +- PPCBUGs fixed: 107, 108, 130, 140-144, 150, 151, 160, 167, 511-514 + review additions (dcbz, dcbz128 guards; stswi/stswx two-line guards) +- `cargo test --workspace --release`: 449 passed, 0 failed +- Acid test `-n 4B --parallel --reservations-table`: **swaps=2, draws=0** — renderer plateau NOT unblocked by P1. Atomics were broken, but that was not the direct cause of `draws=0`. + +**P2 — Decoder/field-extraction sweep — MERGED (52b05b1, 2026-05-01)** +- PPCBUGs fixed: 040, 046, 275, 276, 315, 360, 361, 362, 363, 369, 420, 421, 422, 423, 560, 561, 562, 563, 564, 565, 600 (21 IDs, 8 commits) +- Key fixes: sh64() bit order, mb_md() (clrldi no-op fixed), vc_rc_bit()/vx128r_rc_bit() Rc accessors + 13 vcmp sites, vrlimi128/vsldoi128/vpermwi128 field extraction, vperm128 vc field, vpkd3d128 post-pack permutation +- `cargo test --workspace --release`: 201+6+144+76+16+8+… passed, 0 failed (well above 506+ baseline) +- Independent code review: all 9 check items OK +- Acid test: **pending** (sylpheed.iso not available in CI; must run on user machine) +- Next: P3 — isolated HIGH bugs. Priority order: PPCBUG-510 (stvewx128 16-byte corruption), PPCBUG-424+425 (vmaddfp128/vmaddcfp128 operand swap), PPCBUG-053+054 (bcx CTR 32-bit + mtspr CTR truncation, **coupled**), PPCBUG-640 (fmt_bc spurious suffix), PPCBUG-641 (lwsync vs sync). + +**P3 — Isolated HIGH bugs — MERGED (f3ebaba, 2026-05-02)** +- PPCBUGs fixed: 053+054 (coupled CTR 32-bit), 424+425 (vmaddfp128/vmaddcfp128 operand swap), 510 (stvewx128 corruption), 640+650 (bdnz/bdz suffix), 641+649 (sync/lwsync), **700 (NEW)** (6 commits) +- **PPCBUG-700 — discovered during phase end-to-end review**: VMX128 register accessors (va128/vb128/vd128/vx128r_rc_bit) all disagreed with canary's bitfield struct (xenia-canary `ppc_decode_data.h:484-663`). Audit's line-2958 "confirmed-clean" assessment was based on miscounting LSB-first packed C++ bitfields. Per canary: VA = PPC[11-15] | PPC[26]<<5 | PPC[21]<<6 (3 fields, 7 bits); VB = PPC[16-20] | PPC[30-31]<<5; VD = PPC[6-10] | PPC[28-29]<<5; VX128_R Rc = PPC[25] (NOT PPC[27] as PPCBUG-422 said). Affects 30+ VMX128 opcodes; real game code with VR>=32 was silently mis-decoded. Now correct. +- `cargo test --workspace --release`: **470 passed, 0 failed** +- Acid test: **deferred to end of all phases** (per user direction). +- Next: P4 — 32-bit ABI writeback truncation sweep. ~30 IDs across 4a-4d. + +**P4 — 32-bit ABI writeback truncation sweep — MERGED (d945aea, 2026-05-02)** +- PPCBUGs fixed: ~43 IDs across 7 commits (6 batches + 1 review-fix). The big systemic sweep that restores the 32-bit ABI invariant: every GPR write zero-extends from u32, every CR0 update views the result as i32. +- Batches: 4a active poisoning NOT/SUB (006/008/018/019/028/029/030/031/033); 4a/4d coupled extsbx+extshx+CR0 (034-037); 4b immediate ALU (001-005/007); 4b mul/div + srawx coupled (009/010+011/041+042+043); 4b halfword + lwa loads (095-098/105); 4c latent + 4d CR0 catch-all (012-017/020/023-026/032/044). +- **Review-fix (49103bb)**: independent reviewer caught a blocking issue — subfx/subfcx OE handlers still used legacy `sum_overflow_64` (the helper assumes a 64-bit result; for 32-bit results with bit 31 set it spuriously flagged OV=1 on every legitimate i32::MIN). Now uses inline `true_diff != (result32 as i32) as i128`. Plus discriminating regression tests for both subfo and subfco. Reviewer also caught `mulli_overflow_wraps_to_32` rubber-stamping (test passed on both pre/post fix); rewritten with polluted upper bits. +- `cargo test --workspace --release`: **494 passed, 0 failed** +- Acid test: **deferred to end of all phases** per user direction. + +**P5 — FPU correctness — MERGED (d39d0ba, 2026-05-02)** +- PPCBUGs fixed: ~22 IDs across 7 commits (6 batches + review-fix nit). +- Batches: 5a round_to_i64 + vrfin round-to-even (221+227, 432); 5b FMA VXISI + NaN sign (181/182/183/202/203/205) — added new `check_invalid_fma_add` helper to fpscr.rs; 5c FPU XX-on-inexact (223/224/225/229/230); 5d VSCR.NJ subnormal flush (435/436/437); 5e fresx canary parity (184); 5f single-FMA vnmsubfp + vctsxs NaN sat (426/427/433). +- **Three deferred** (audit-findings status remains open): PPCBUG-201 (FPSCR.RN for double arithmetic — MXCSR set/restore wrappers), PPCBUG-185 (FPSCR.NI flush for scalar FPU — NI bit constant + post-op flush wrapper), PPCBUG-180/200 (XX/FR/FI in update_after_op — pre/post round comparison). Each requires substantial helper rework; planned as focused sub-batches. +- Review verdict: "MERGE-READY" — no blocking issues. One reviewer nit applied immediately (vrfin → stdlib `round_ties_even()`). +- `cargo test --workspace --release`: **498 passed, 0 failed** + +**P6 — Other MEDIUM correctness — MERGED (112202c, 2026-05-02)** +- PPCBUGs fixed: 13 IDs across 5 commits (4 batches + review nit). +- Batches: trap PC + sc LEV logging + typed-trap logging (063/064/065); XER TBC infrastructure enabling lswx/stswx + lswi/stswi nb fix + lmw RA-skip (123/124/125/126/161/162/566); mcrfs VX recompute + mtmsrd L=1 + mfvscr zero (068/078/080); mulld_ov verification + auto-resolved markers for 021/022/027/039. +- **Structural enum extensions deferred** (not yet needed by any consumer): `StepResult::HypervisorCall` (PPCBUG-064 sc 2 routing), `StepResult::Trap { type_code: u16 }` (PPCBUG-065 typed-trap routing — relevant if SEH dispatch added). +- **Cosmetic/test-coverage deferrals**: 642 (fmt_bcctr ISA-undefined), 643/644 (SIMM/D-form hex), 367/368 (vupkhpx/vpkpx channels), 487/495 (vsum naming), 515/516 (lvebx/lvsr docs), 601 (decode_op6 invariant doc). +- Review verdict: all 4 commits LGTM, one cosmetic nit applied (mcrfs uses fpscr::VX_ALL constant). No blocking issues. +- `cargo test --workspace --release`: **498 passed, 0 failed** + +**P7 — Frozen-snapshot drift — MERGED (a7155f4, 2026-05-02, manual regen — no xenia-rs code change)** +- PPCBUGs cleared: 3 IDs (066, 117, 145) — stale `ppc-manual//.md` snapshots. +- Methodology: ran `python3 ppc-manual/generator/generate_manual.py`. Existing idempotent generator scrapes xenia-rs + xenia-canary source for each opcode and emits 350 family pages + 598-key index.json. Verified post-regen: old "For now, just trace and continue" stubs gone; modern constructs (trap::evaluate, current reservation_line model) appear correctly. +- The `ppc-manual/` directory lives in `/home/fabi/RE Project Sylpheed/ppc-manual/` and is NOT versioned in xenia-rs/.git. Commit a7155f4 in xenia-rs is bookkeeping only (audit-findings + report). + +**P8 — Test gap closure — MERGED (4029041, 2026-05-02)** +- 38 IDs closed across branch/CR/SPR/sync, loads, stores, FPU, VMX (integer/float/permute/load-store). +- Batches: 4 batches + review-fix rename = 5 commits. 53 net new tests. +- Review verdict: LGTM, no blocking issues, no rubber-stamps. Every hand-encoded raw was mechanically cross-checked against canary's INSTRUCTION table. +- Load-bearing wins: `lswx_uses_xer_tbc_for_byte_count` and `stswx_uses_xer_tbc_for_byte_count` directly exercise the P6 XER-TBC infrastructure (these opcodes were permanent no-ops pre-P6). VX-form encoding nit caught + corrected mid-development (XO is at bit 0, not bit 1). +- ~12 LOW test-gap IDs remain Status: open (045, 047, 088, 117 already-applied, 145 already-applied, 279, 317, 322, 324, 325, 371-378, 491-494, 518, 519, 567) — non-blocking, can be closed incrementally. +- `cargo test --workspace --release`: **551 passed, 0 failed** (up 53 from 498 at P7 merge). +- Acid test: **deferred to end of all phases** per user direction. + +**ALL EIGHT PHASES COMPLETE.** Total ~161 PPCBUGs applied across 8 phases. + +**Post-P8 end-to-end review + acid test (2026-05-02)**: +- Reviewer caught one BLOCKING-LIKELY issue: my P4 batch 5 PPCBUG-105 fix changed `lwa`/`lwax`/`lwaux` from sign-extend to zero-extend, deviating from PowerISA. Hotfix at HEAD f1166d0 restored ISA-spec sign-extension. Rationale: ISA-deviation in a 64-bit-mode opcode could break any kernel-mode code; the audit's "32-bit-ABI hazard" concern was speculative. +- Cosmetic fix at HEAD 09c6c92: collapsed `fpscr.rs:289` duplicate-branch typo. +- Acid test `-n 4B --parallel --reservations-table`: **swaps=1, draws=0** (NOT swaps=2 like P1; possibly due to scheduler chatter slowing things or a minor regression in some cumulative fix). No panics, no errors, no RtlRaiseException. **Renderer plateau NOT unblocked** by the cumulative PPC correctness fixes. +- **Implication**: the Sylpheed renderer `draws=0` plateau has a non-PPC-correctness root cause. The audit caught real bugs (well-grounded against canary), but they're not the renderer-blocker. Next investigation tracks: graphics-pipeline (EDRAM resolve, RT readback), kernel HLE (event signaling, timers), or the unresolved BST-validation paradox (per `project_xenia_rs_sylpheed_event_chain_2026_04_29.md`). +- **551 tests passing**, **0 failures** at master HEAD (post-hotfixes). + +**Where everything lives**: +- **Fix plan**: `/home/fabi/.claude/plans/i-want-to-apply-delightful-lovelace.md` +- **Pre-pass triage**: `xenia-rs/audit-prepass-findings.md` +- **Per-group reports**: `xenia-rs/audit-out/group-NN-*.md` and `xenia-rs/audit-out/phase-cN-*.md` +- **Detailed findings tracker** (per-PPCBUG entries): `xenia-rs/audit-findings.md` +- **Triaged fix-order plan**: `xenia-rs/audit-report-2026-04-29.md` ← **start here for the fix session** + +**What the fix session should do next**: +1. Read `xenia-rs/audit-findings.md` end-to-end. Each PPCBUG-NNN has location, fix snippet, and notes. +2. Apply fixes in dependency order. PPCBUG-010 + PPCBUG-011 (divwx writeback + CR0) **must** land in the same commit. PPCBUG-012..-019 (latent writebacks) can land in any order but should land before PPCBUG-020 (CR0 catch-all sweep). +3. Add the proposed unit tests — particularly for `subfcx`, `addic`, `addicx`, `mulli`, `subficx`, `negx` which currently have zero coverage. +4. After each fix, run `cargo test --workspace --release` and `xenia-rs check sylpheed.iso -n 100M` to detect regressions. +5. The acid test is whether the Sylpheed renderer plateau (`swaps=2, draws=0`) breaks open after applying the high-priority fixes. The hypothesis is that one or more of these latent bugs is the next-domino-after-addis. diff --git a/migration/claude-memory/project_xenia_rs_producer_stack_trace_2026_05_03.md b/migration/claude-memory/project_xenia_rs_producer_stack_trace_2026_05_03.md new file mode 100644 index 0000000..619a313 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_producer_stack_trace_2026_05_03.md @@ -0,0 +1,120 @@ +--- +name: producer stack-trace diagnostic + parked-waiter creator IDs (KRNBUG-AUDIT-002) +description: 2026-05-03 — diagnostic-only multi-frame guest stack capture at NtCreateEvent; identified subsystems for handles 0x1004 / 0x100c / 0x15e0; corrected project_memory typo (0x15e4 → 0x15e0); 0x42450b5c is a guest-pointer wait, not a handle. +type: project +originSessionId: fc916bd6-940d-4d2f-9875-03af1d5a8493 +--- +**🔍 KRNBUG-AUDIT-002 (2026-05-03)** — diagnostic landed, producer not yet +fixed. Tests 576 → 581. Lockstep `sylpheed_n50m` golden BIT-IDENTICAL. +Master untouched (no commit yet — work sits in working tree). + +**Why:** the previous session's KRNBUG-AUDIT-001 told us the producer is +missing for the parked-waiter handles, but every handle's `created lr` +points at the same `silph::Event` ctor wrapper (sub_824A9F18, 83 +callers) — useless for subsystem ID. Needed multi-frame stack at +allocation time. + +**How to apply:** when chasing a producer for a parked Event/Semaphore +handle, add it to `--trace-handles-focus=…` and the next run dumps a +6-frame back-chain under `created stack`. Walker is in +`crates/xenia-kernel/src/state.rs::walk_guest_back_chain` (reads +`[r1]`/`[prev_sp - 8]`; gated on focus set; one `HashSet::contains` on +unfocused hot path; verified read-only — no determinism impact). + +------------------------------------------------------------------------ + +**Confirmed creator chains (every captured frame cross-checked against +sylpheed.db `instructions` — saved-LR matches a `bl` site exactly):** + +| Handle | Tid | Creator chain | Pool size | +|---|---|---|---| +| 0x1004 | 10 | static-ctor 0x8280F810 → sub_8217C850 → sub_821783D8 → sub_824A9F18 | **8-instance pool** (static ctor calls bridge sub_8217C850 8 times) | +| 0x100c | 2 | entry_point + 0x198 → sub_8216EA68 (= main) → sub_82181C20 → sub_821800D8 → sub_82181750 → sub_824A9F18 | singleton (called inside main()) | +| 0x15e0 | 16 | sub_82172BA0 → sub_821707C0 → sub_8216F618 → sub_821701C8 → sub_824A9F18 | singleton (different cluster from 0x100c) | + +All 3 ctors share **identical 4-callee shape**: +`RtlInitializeCriticalSectionAndSpinCount` + silph::Event ctor + 1-2 +silph internals. All 3 worker entries call `sub_824AA658(r3=-2, r4=5)` +first thing (silph::Thread::SetProcessor(CURRENT, 5)), then +spinlock + `RtlEnterCriticalSection` + check queue. **Canonical +work-queue worker pattern** — producer should `NtSetEvent(handle)` +under the CS but no such call ever fires in 500M instructions. + +------------------------------------------------------------------------ + +**Corrections to prior session memory:** + +- The 4-handle list `0x1004, 0x100c, 0x15e4, 0x42450b5c` had + `0x15e4` as a transcription error — actual handle is `0x15e0`. + Confirmed via `--halt-on-deadlock` thread diagnostic: + `tid=16 hw=4 idx=1 state=Blocked(WaitAny { handles: [5600] })` + (5600 = 0x15e0). +- `0x42450b5c` is **NOT a kernel handle** — `>= 0x40000000` is the + guest user heap range (handle table starts at 0x1000+4n). Tid=6 + is parking on a guest pointer (embedded `KEVENT`?) reached via a + non-`do_wait_single` wait path: audit shows + ` ` (waits=0 despite waiter_count=1). + Treat as separate bug class — needs hooks on the + `KeWaitForSingleObject(*PDISPATCHER_HEADER)` path. + +------------------------------------------------------------------------ + +**Wider parked-waiter set (-n 500M, halt-on-deadlock):** + +| Handle | Tid | Notes | +|---|---|---| +| 0x1004 | 10 | Event/Manual, 8-pool, sole waiter | +| 0x100c | 2 | Event/Manual, singleton | +| 0x15e0 | 16 | Event/Manual, singleton | +| 0x12f4 | 13, 14 | Semaphore, 2 waiters share it | +| 0x15f8 | 18 | Event/Auto, do_wait_multiple | +| 0x1038 | 4 | Event/Auto, in WaitAny[0x1038, 0x103c] | +| 0x10b0 | 5 | Event/Auto, in WaitAny[0x10b0, 0x10b4] | +| 0x42450b5c | 6 | guest-pointer wait (separate bug) | + +------------------------------------------------------------------------ + +**Next session — surgical producer hunt (do not pivot to fix yet):** + +1. **Per-handle vtable readout**: at the worker's wait point, the + `this` pointer is in `r3`/`r28` at function entry. Read + `*this[0]` (vftable addr), then vftable[-1] → MSVC RTTI + `TypeDescriptor` → class name string. Resolves the exact + `SilpheedSCS::*` class. RTTI candidates already located in PE: + `WorkHudThread2`, `WorkHudThreadTaskCaller`, `CTaskUpdater`, + `CRenderCommandQueue`, `CCollisionManager`, etc. +2. **Find producer**: once class is named, grep PE strings + + sylpheed.db for `Push*`/`Submit*`/`Enqueue*` methods on the + class. Their signal call (silph::Event::Set wrapper, not + sub_824A9F18 itself) → check whether it ever runs. +3. **Two failure modes:** + - **(A) KeSetEvent on embedded KEVENT bypasses handle waiter list** + — same family as 0x42450b5c. Smoking gun: metric + `kernel.calls{name=KeSetEvent}` is non-zero but audit shows + zero signals for the handle. + - **(B) Producer never reached** — UI/timer/vsync gate. Smoking + gun: `kernel.calls{KeSetEvent}` zero for the handle. +4. **0x42450b5c**: instrument the non-`do_wait_single` wait path + (PC=0x824cd4f4, function entry NOT in analyser's `functions` + table — likely a `KeWaitForSingleObject(*PDISPATCHER_HEADER)` + wrapper). Once audited, repeat steps 1-3. + +------------------------------------------------------------------------ + +**Build state:** working tree only — no commit. Files touched: +- `crates/xenia-kernel/src/audit.rs` (+`record_create_with_stack`, + +`created_stack` field, +2 tests) +- `crates/xenia-kernel/src/state.rs` (+`audit_create_with_ctx`, + +`walk_guest_back_chain`, +3 tests) +- `crates/xenia-kernel/src/exports.rs` (`nt_create_event`, + `nt_create_semaphore`, `nt_create_timer` switched to new helper) +- `crates/xenia-kernel/src/xam.rs` (`xam_task_schedule` switched; + removed dead `let lr = ctx.lr as u32`) +- `crates/xenia-app/src/main.rs` (focus dump prints + `created stack (N frames)` block) +- `audit-findings.md` (KRNBUG-AUDIT-002 entry + producer-trace + finding appended) + +Master HEAD per prior memory: `9d45efe`. Tests on this branch: 581 +(was 576). Goldens: `sylpheed_n50m.json` re-confirmed BIT-IDENTICAL +under `--stable-digest`. diff --git a/migration/claude-memory/project_xenia_rs_scheduler.md b/migration/claude-memory/project_xenia_rs_scheduler.md new file mode 100644 index 0000000..64b3d2e --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_scheduler.md @@ -0,0 +1,56 @@ +--- +name: xenia-rs scheduler architecture (post-Axis-1-to-5 refactor, 2026-04-23) +description: Canonical scheduler model — 6 HW slots × per-slot priority runqueues, single host thread, GuestThread as first-class, ThreadRef identity, bind-and-migrate affinity. Supersedes the old HwThread[32] one-thread-per-slot model. +type: project +originSessionId: a178fdd6-2965-4652-903a-f684cf80835d +--- +## Model in one paragraph + +Single host thread runs the interpreter (`GuestMemory` pinned). Scheduler has **6 `HwSlot`s** matching Xenon hardware. Each slot holds `runqueue: Vec` + `running_idx: Option`. A `GuestThread` owns its own `PpcContext` inline — the live register file is always the one on whichever thread the slot has pinned as running, so context switch is just a `running_idx` flip (no memcpy). Unlimited guest threads per slot. + +## Identity + +`ThreadRef { hw_id: u8, idx: u16 }` — 4-byte positional identity used across the boundary. Waiter lists in `KernelObject::{Event,Semaphore,Mutex,Thread}`, `state.cs_waiters`, `interrupts.injected_ref`, and `scheduler.timed_waits` all store `ThreadRef` (not raw hw_id). After `swap_remove` (Axis 4 migration), refs are fixed up via `MigrationFixup::apply`. + +## Compat accessors (how ~30 call-sites survived the data-model refactor) + +`scheduler.ctx(hw_id) / ctx_mut(hw_id) / ctx_mut_ref(r) / state(hw_id) / tid(hw_id) / thread_handle(hw_id) / suspend_count_mut(hw_id) / current_hw_id()` — each resolves through `slots[hw_id].running_idx`. Safe sentinel (`idle_ctx`) returned when running_idx is None. This let the refactor avoid rewriting every `hw_threads[i].ctx` site in [main.rs](xenia-rs/crates/xenia-app/src/main.rs) and [exports.rs](xenia-rs/crates/xenia-kernel/src/exports.rs). + +## Scheduling + +- **`HwSlot::pick_runnable`** — highest-priority Ready/ServicingIrq thread; tiebreak lowest idx. +- **`Scheduler::round_schedule`** — emits slot ids in rotating order starting from `rotation_cursor`, filtered by `non_empty_runnable: u8` bitset. Empty-slot fast path. `OrderMode::Seeded` layers Fisher-Yates on top of the filtered list. +- **`Scheduler::begin_slot_visit(hw_id)`** — called by main.rs at top of each slot iteration; picks runnable, sets `running_idx`, writes `self.current: Option`. +- **`Scheduler::decrement_quantum()`** — Axis 3 per-instruction tick; on hit-zero, reloads to `QUANTUM_DEFAULT = 50_000` and rotates within same-priority tier (observed next round, not mid-instruction). + +## Affinity + priority (Axis 4/5 wire-up) + +- **`KeSetAffinityThread(handle, mask) -> old_mask`** does real migration: `set_affinity_ref` finds the thread, updates mask, if current slot no longer allowed → `swap_remove` from source slot, push onto least-depth allowed slot, rewrite `PCR+0x2C`, return `MigrationFixup`. `KernelState::set_affinity` walks every waiter list and applies the fixup. +- **Self-migration handling**: if the migrating thread is `scheduler.current`, the ref is updated in place. `call_export`'s post-call ctx restore re-reads `current` (not the stashed entry ref) so ctx lands on the new slot. `main.rs`'s post-export `pc = lr` advance uses `post_ref = scheduler.current` for the same reason. +- **`KeSetBasePriorityThread` / `KeQueryBasePriorityThread`** store/read `GuestThread.priority: i32`. NT-style [-15..+15], default 0. Drives `pick_runnable`. +- **`KeSetIdealProcessor` / `KeQueryIdealProcessor` / `NtSetInformationThread`** (classes 2/3/13) wired; ideal is a spawn-placement hint (not migrate-on-change). + +## Lifecycle details + +- `exit_current` flips state to `Exited(code)` but does NOT `Vec::remove` (would invalidate peer ThreadRefs). Pruning happens at `spawn` time via `prune_exited_if_needed` when a slot reaches `PRUNE_DEPTH_THRESHOLD = 4`. +- `install_initial_thread` on `Scheduler` lives next to `spawn`; both write `PCR+0x2C = hw_id` via the `PcrWriter` trait (impl `GuestMemoryPcr` in [state.rs](xenia-rs/crates/xenia-kernel/src/state.rs)). +- `KernelObject::Thread.waiters: Vec` (not `Vec`) — necessary for correctness under per-slot runqueues. + +## Known caveat (2026-04-23) + +Axis 4's real migration distributes Sylpheed's workers across slots differently than the old 32-slot one-per-slot model. The resulting wait/signal chain trips a single `scheduler.deadlock_recoveries` event during boot; default force-wake recovery resolves it and the game progresses to **VdSwap=2** (up from pre-Axis-4's 1). Under `--halt-on-deadlock` this trips `scheduler.deadlock_halts = 1` at ~7.5M cycles. The issue is a latent HLE sync-primitive gap exposed by correct migration, not an Axis 4 defect. Root cause: one of tid=1/3/4/7's blocking events isn't being signaled by its expected source after thread layout changes. Track down by instrumenting the specific handle values (0x10FC, 0x1014, 0x1104, 0x10DC/0x10F0) in a future session. + +## Files + +- [xenia-cpu/src/scheduler.rs](xenia-rs/crates/xenia-cpu/src/scheduler.rs) — workhorse (~35 tests covering all 5 axes) +- [xenia-kernel/src/state.rs](xenia-rs/crates/xenia-kernel/src/state.rs) — `KernelState::set_affinity` orchestrator, `call_export` ctx swap via `ThreadRef` +- [xenia-kernel/src/exports.rs](xenia-rs/crates/xenia-kernel/src/exports.rs) — `ke_set_affinity_thread` (0x97), `ke_set_base_priority_thread` (0x99), `ke_query_base_priority_thread` (0x81), `ke_set_ideal_processor` (0x98), `ke_query_ideal_processor` (0x82), `nt_set_information_thread` (0xFB) +- [xenia-kernel/src/objects.rs](xenia-rs/crates/xenia-kernel/src/objects.rs) — waiter lists as `Vec` +- [xenia-kernel/src/interrupts.rs](xenia-rs/crates/xenia-kernel/src/interrupts.rs) — `injected_ref: Option` (not `injected_hw: u8`) + +## Metrics added + +- `scheduler.spawn.ok` — successful spawns +- `scheduler.spawn.rejected` — spawn failures (should stay 0) +- `scheduler.deadlock_recoveries` — force-wake events (non-zero post-Axis-4; see caveat) +- `scheduler.deadlock_halts` — halts under `--halt-on-deadlock` diff --git a/migration/claude-memory/project_xenia_rs_sylpheed_event_chain_2026_04_29.md b/migration/claude-memory/project_xenia_rs_sylpheed_event_chain_2026_04_29.md new file mode 100644 index 0000000..9d4b51d --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_sylpheed_event_chain_2026_04_29.md @@ -0,0 +1,158 @@ +--- +name: Sylpheed event 0x1004 producer trace (2026-04-29) +description: Identified the C++ singleton structure for tid=10 worker, found the destructor is the ONLY signaler of event 0x1004; the runtime "wake worker for work" path is unidentified. +type: project +originSessionId: c44cbfc2-438f-45c9-996c-06eddf9dcb93 +--- +## What was traced + +Goal: identify what should signal `Event/Manual` handle 0x1004 (waited on by tid=10, sub_82178950) so that the renderer worker can wake and progress. + +Used `--trace-handles` audit at -n 500M and DuckDB analysis DB. Key chain: + +### The singleton at 0x828F3EC0 + +- 4-byte struct, 180 bytes long, zero-initialized by [`sub_822C1D00`](xenia-rs/sylpheed.db). +- Initialized by [`sub_821783D8`](xenia-rs/sylpheed.db) which: + - Registers in BST via `sub_82454498` + `sub_82454580` (the same BST validator chain we hit the throw on) + - Calls Sylpheed's CreateEvent wrapper [`sub_824A9F18`](xenia-rs/sylpheed.db) at 0x821784F4 → returns event handle 0x1004 + - Stores handle at `singleton[+120]` (so `mem[0x828F3F38] = 0x1004`) +- Singleton accessor (init-on-first-use): [`sub_8217C850`](xenia-rs/sylpheed.db) — checks init flags at 0x828F4888 / 0x828F4898, calls initializer, registers DESTRUCTOR via atexit (`sub_825ED268`). + +### The thread-spawning chain + +- [`sub_82178E50`](xenia-rs/sylpheed.db) is the worker's "start" function. Called from [`sub_821737F0`](xenia-rs/sylpheed.db). +- It enters CS, sets `singleton[+132] |= 1` (running flag), calls `bl 0x8216DE98` (queues work via copy-into-buffer, NO event signal here), calls `sub_82178F60` (which itself calls `sub_82175F10` — yes, the same `sub_82175F10` from the throw investigation), and if successful calls `sub_82172370` (ExCreateThread wrapper) with entry=`sub_82178950` and start_ctx=`0x828F3EC0`. +- Stores returned thread handle at `singleton[+116]`. + +### The worker's wait pattern + +`sub_82178950` (tid=10's entry): +1. Init helpers (allocate workspace via `sub_82455730(0x7FFFFFFF)`) +2. **`bl sub_824AA330`** with `r3 = singleton[+120] = 0x1004`, `r4 = -1` → wait FOREVER on event 0x1004 +3. After waking, check `singleton[+132] & 0x10000` (bit 16): if set → exit; else → jump to work loop at 0x82178DB8 + +### The only known signaler — the destructor + +Function at `0x82178600` (no direct callers found in xrefs): +- Reads `singleton[+132] & 0x1` (running flag); if not set, skip. +- Else: set `singleton[+132] |= 0x10000` (bit 16 = "exit requested"), `bl sub_824AA2F0` (SetEvent wrapper) on `singleton[+120] = 0x1004`, then `bl sub_824AA330` to wait_single on `singleton[+116]` (join the thread), then enter CS and cleanup. + +This is registered as a destructor via atexit (`sub_825ED268`) inside `sub_8217C850` at instruction 0x8217C8B0 with arg=trampoline 0x8284C9E0 (which loads `0x828F3EC0` and tail-calls the destructor function). atexit destructors only fire at program exit — never during normal runtime. + +## The mystery + +**During normal runtime, NOTHING signals event 0x1004 to wake tid=10 for actual work.** The destructor only runs at program exit. Audit confirms: 0 signals on event 0x1004 at -n 500M. tid=10 sits in `bl sub_824AA330` forever, holding up the renderer chain. + +This is true for the same pattern with events 0x100c, 0x15e4, 0x42450b5c (all Manual reset, all ``). + +## What's NOT the answer + +- It's NOT a missing direct call: 89 callers of the SetEvent wrapper sub_824AA2F0 were enumerated; none signal these specific events. +- It's NOT a vtable dispatch we missed: searches for stores of 0x82178600/0x82178608 to static data find nothing. +- It's NOT in `sub_8216DE98` (the queue-work fn): it's a `memcpy`-into-ringbuffer-style copy, no SetEvent. +- It's NOT in `sub_82178F60` (the spawn-prep fn): it does string compares with config keys ("PATH"/"SETTINGS") and ends up calling `sub_82175F10` (the throw site) — but no SetEvent on 0x1004. + +## Hypothesis for next session + +The signaling must come via one of these mechanisms (not yet checked): + +1. **Indirect dispatch via a function pointer stored at runtime**. Sylpheed's BST registry (the same one that caused the throw) might store function pointers as part of "registered objects". When some event happens (e.g., a frame deadline), the registry's registered callbacks fire — and one of those callbacks signals the event. The fact that `sub_821783D8` registers `singleton[+56]` in the BST is highly suspicious. +2. **A timer-driven dispatch**. The 32-times-waited timer 0x15c0 in the audit is interesting: it's `Timer/Auto` with deadline driven by NtSetTimerEx. The timer fires periodically. Some periodic callback might signal the worker events. +3. **Polling vs event-driven mismatch**. Sylpheed might EXPECT the worker to wake periodically (e.g., from a timer or vsync interrupt) rather than from a discrete event. We may be holding the wait too tightly when the game expects spurious wakeups. + +## Concrete next steps for a future session + +### Step 1 (quick) — DONE: the BST callback walker doesn't exist as expected +Searched for it: in BST module 0x82454000..0x82455000 (20 functions, 4 KiB), only TWO indirect calls (`bcctrl`) exist — both are byte-level string transforms (`sub_82454278` iterates over a buffer of bytes, `sub_82454170` is a single-shot vtable call), NOT a "for each registered object call method" walker. Functions that call BOTH `sub_82454498` (BST getter) AND `sub_824AA2F0` (SetEvent wrapper) are only 4: `sub_821783D8` (DB-misattributed; actually the destructor at 0x82178600), `sub_8228D760` (lazy event create+signal for one specific object, not a walker), `sub_822AE1F0` (config processing), `sub_8280C2C0` (renderer-wide static initializer). NONE of them iterate the BST and dispatch callbacks. **The BST is purely a validity registry, not a callback registry.** + +### Step 2 — RUNTIME APPROACH (recommended next): Canary boot trace diff +With static analysis exhausted, this is the most likely productive direction. Run `xenia-canary` against the same ISO with verbose threading + event logging (`--log-debug=1` flag in canary). Capture: +- The sequence of NtCreateEvent / NtSetEvent / NtWaitForSingleObjectEx calls +- Which threads (by tid mapping) signal which handles +- The order of guest-LR addresses at each NtSetEvent call + +Diff against our audit. The first signal call on Canary that we never make is the missing piece. If Canary never signals 0x1004 either, then it must not block on it the same way — meaning some upstream HLE returns differently and the worker takes a different code path that doesn't reach the wait at sub_82178950+0x821789C4. + +### Step 3 — Targeted instrumentation: dump per-handle histogram from our run +Modify `nt_set_event` and `nt_wait_*` to log `(handle, lr, tid)` to a structured trace file. Add same to `wake_eligible_waiters`. Run for `-n 5_000_000_000` (~3 min). Cross-correlate per-handle. This gives a per-handle time series that the audit's bounded ring can't. + +### Step 4 — Hypothesis: the worker at 0x82178DB8 *should* be reachable WITHOUT signal +The post-wait code at sub_82178950+0x821789C8 reads `singleton[+132] & 0x10000`; if NOT set, jumps to 0x82178DB8 to do work. The wait at 0x821789C4 is INFINITE. So the worker requires SOME wake. But what if the "first wake" is supposed to come from the spawner thread completing some work BEFORE spawning the worker, then immediately signaling? Trace `sub_82178E50`'s exact return path with instrumentation: does it `bl 0x824AA2F0` on `singleton[+120]` after spawning that we missed? (The destructor signals; check carefully whether the START path in `sub_82178E50`'s "already_running" branch (0x82178E80..0x82178EB0) signals via `bl sub_8216DE98` — note bl 0x8216DE98 *queues* but doesn't signal. Look one level up at the caller `sub_821737F0`.) + +## Files unchanged this session + +No code changes. Investigation only. Throw-fix from [project_xenia_rs_sylpheed_throw_fix_2026_04_29.md](project_xenia_rs_sylpheed_throw_fix_2026_04_29.md) is the only modification active. + +## Open question for the user + +Step 1 was attempted in this session and ruled out: the BST is not a callback registry. The most promising next direction is Step 2 (Canary diff) — pure analytical work has exhausted what's tractable; we need a ground-truth comparison. + +## Final session findings (2026-04-29 update) + +Traced one level up: the caller of `sub_82178E50` (spawner) is `silph::Silph::Impl::OnInit` at 0x82173990 (no DB function entry — discovered via xref + `addi r5, r11, 6076` string load matching the 'silph::Silph::Impl::OnInit' message). The OnInit body runs: + +1. `bl 0x8217C850` — singleton accessor (returns thread_obj_ptr) +2. `bl sub_82178E50` — spawn tid=10 +3. Various config processing (string lookups, no event signals) +4. `bl 0x821835E0` — string-table lookup (29 entries at 0x820A_0000+5680), returns index or 30 if not found. **Returns 28 means specific string match → success path; else returns 30 → exit with failure**. +5. If success: printf "RenderDevice initialized. spend %d ms.\r\n" + more cleanup + +**No event 0x1004 signal anywhere in OnInit's execution after spawning tid=10.** The renderer init completes without ever waking its worker. + +`sub_82178F60` (the spawn-prep) ALWAYS returns 0 (literal `addi r3, r0, 0` at 0x8217919C). And `sub_82178E50`'s spawn condition is `bc 4, 4*cr6+eq` which means **branch on NOT-EQ → branch is NOT taken when r3=0** → fall through to spawn. So thread is always spawned. ✓ Matches the 18 ExCreateThread observation. + +## Significant new insight: the throw might be REQUIRED for proper init + +Sub_82178F60 (in both r4=0 and r4=1 paths) calls into `sub_82175F10` (the throw site we silenced). With proper C++ SEH, the throw at sub_82454770(lhs=0x828F3F68) would propagate up: sub_82454770 → sub_82175F10 → sub_82178F60 → sub_82178E50 → silph::OnInit → ... — looking for a `__try`/catch. + +Our current "fix" forces sub_82454600 to return valid, AVOIDING the throw entirely. But if the catch handler is responsible for **lazy registration** (creating the missing object AND signaling its event), our fix bypasses the registration logic. Result: object pretends valid → workers spawn → workers wait for signal that never comes. + +Canary "swallows" the throw similarly to us, but Canary also doesn't implement SEH dispatch (per the `xboxkrnl_debug.cc:131-151` comment "TODO(benvanik): unwinding. This is going to suck."). So Canary EITHER: +- (a) hits a different upstream HLE that pre-registers the object (no throw needed) +- (b) faces the same deadlock but progresses anyway (somehow) + +Given our audit shows no signal-fires-but-fails pattern, hypothesis (a) is most likely. The "missing pre-registration" must come from some HLE we implement differently. + +## REVISED recommendation + +The throw fix from [project_xenia_rs_sylpheed_throw_fix_2026_04_29.md](project_xenia_rs_sylpheed_throw_fix_2026_04_29.md) is now SUSPECT — it may mask the real bug. Two paths forward: + +### Path A: Roll back the throw fix and implement minimal SEH dispatch (Branch B from original plan) +This is the multi-day work originally deferred. Implement enough of `__CxxFrameHandler3` + .pdata/.xdata parsing to dispatch the catch. If the catch handler (when actually run) signals events and registers objects, this resolves the deadlock at root. + +### Path B: Find the upstream HLE difference (Stage 2 Branch A from original plan) +Identify what UPSTREAM HLE Canary returns differently such that the lhs=0x828F3F68 lookup succeeds WITHOUT needing the throw. The key test: in our environment, what populated the BST registry's static-data range at 0x828F3F68? Find the function that registers `0x828F3F68` in the BST. Trace its callers. Find where it WOULD have been called but wasn't. The missing call is the upstream HLE divergence. + +Path B is the original recommendation and remains the cleanest fix. We should: +1. Search xrefs for `bl sub_82454580` (BST insert) where the inserted address ends up being `0x828F3F68`. This is hard because the address is data-flow dependent. +2. Easier: search for code that loads `0x828F3F68` as an immediate/constant. If found, that's the "should-register-this" site. Trace why it doesn't run. + +## Decisive finding: 0x828F3F68 IS registered just before validation + +Searched xrefs to `0x828F3F68`. Found 5 references: +- `sub_82175E68` (twice, at 0x82175EA0 + 0x82175EC4) — **the BST REGISTRATION site** +- `sub_821766A0` (twice, at 0x821766C4 + 0x821767C8) — registers a list of 16 elements + 0x828F4068 +- 0x8284C9C4 — singleton trampoline data + +**`sub_82175E68` is called from sub_82178F60 at instruction `0x82179134`, EIGHT instructions before the throw site call to `sub_82175F10` at `0x82179144`.** Same function, same thread, sequential execution. So: + +``` +sub_82178F60: + ... + 0x82179134: bl sub_82175E68 ; REGISTERS 0x828F3F68 in BST + 0x82179144: bl sub_82175F10 ; VALIDATES 0x828F3F68 (PPC says: not in BST → throw!) +``` + +The throw shouldn't fire normally. The PPC's validator failing to find 0x828F3F68 in the just-populated BST is the PRIMARY bug. **Our throw fix is masking the primary bug, not fixing it.** And the same memory-coherence (or whatever-else) bug that prevents the validator from seeing the registration likely also prevents OTHER reads from seeing OTHER writes — which explains why event 0x1004 never wakes its waiter (the worker's poll-bit might not see the producer's set). + +This makes the **paradox from [project_xenia_rs_sylpheed_throw_2026_04_28.md](project_xenia_rs_sylpheed_throw_2026_04_28.md) the load-bearing problem**: PPC reads not seeing PPC writes from the same thread. + +## Strongly recommended next direction + +**Investigate the memory-coherence paradox at the emulator level**. Build a smaller reproducer: +1. In `step_block` (the interpreter loop), trace one specific BST node's `[+8]` slot for the failing thread. Log every WRITE to address `0x40249F68` and every READ from it, with PC + instruction. +2. Cross-check: did the PPC actually execute a write to `0x40249F68` between enter-CS and the validator's read of that slot? If no write, our Rust CEIL is wrong about it being there. If yes, there's a write-visibility bug. +3. If write-visibility bug: candidates are (a) the basic-block cache pre-decoding stale instructions, (b) memory MMU bug for that page, (c) instruction-cache vs data-memory aliasing. + +This is a TRACE/INSTRUMENTATION job, not pure static analysis. It's what should have been the focus of the previous session but was deferred. diff --git a/migration/claude-memory/project_xenia_rs_sylpheed_stage3_2026_04_29.md b/migration/claude-memory/project_xenia_rs_sylpheed_stage3_2026_04_29.md new file mode 100644 index 0000000..6f18798 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_sylpheed_stage3_2026_04_29.md @@ -0,0 +1,96 @@ +--- +name: Sylpheed post-throw-fix Stage 3 thread-state analysis (2026-04-29) +description: Detailed thread state at -n 4B post-throw-fix. 18 workers spawn, throw fully silenced, but renderer is deadlocked on unsignaled events. Stage 3 gate (draws > 0) NOT met. +type: project +originSessionId: c44cbfc2-438f-45c9-996c-06eddf9dcb93 +--- +## Run results post-throw-fix (-n 4B) + +``` +swaps=2, draws=0, resolves=0, packets=1.7B, imports=46M, RtlRaiseException=0 +VdSwap=2, gpu.interrupt.delivered{source=0}=26630 (vsync), source=1: 2 +NtSetEvent=3334, NtCreateEvent=394, NtWaitForSingleObjectEx=1.5M +ExCreateThread=18, scheduler.spawn.ok=18 +``` + +The single throw is fully silenced; only ONE `VALIDATOR forced` log line per run (call_n=32, lhs=0x828F3F68). Game progresses to spawning all 18 workers + opening renderer resource files (`hidden/Resource3D/ptc_pack.xpr`, `Common.xpr`). + +## Thread state at -n 500M (deadlock snapshot) + +3 threads `Ready`: tid=1 (main, polling loop), tid=12, tid=15, tid=17, tid=7, tid=19. **10 threads `Blocked` on events/semaphores. 2 suspended (tid=8, 9). 1 exited (tid=11)**. + +### Blocked-on-WaitAny worker map + +| tid | hw | entry func | waits on | object state | +|---|---|---|---|---| +| 2 | 5 | sub_82181830 | handle 0x100c | Event(sig=false, **mr=true**) | +| 3 | 3 | sub_8245A5D0 | handle 0x1014 | Semaphore(0/maxint) | +| 4 | 3 | sub_82450A28 | handles 0x1038, 0x103c (deadline=1600) | Event mr=false + Semaphore 0/max | +| 5 | 5 | sub_82457EF0 | handles 0x10b0, 0x10b4 (deadline=1600) | Event mr=false + Semaphore 0/max | +| 6 | 2 | sub_824CD458 | handle **0x42450b5c** (deadline=3000) | Event(sig=false, **mr=true**) — heap-pointer object | +| 10 | 5 | sub_82178950 | handle 0x1004 | Event(sig=false, **mr=true**) | +| 13 | 1 | sub_822C6870 | handle 0x12f8 | Semaphore(0/maxint) shared with tid=14 | +| 14 | 3 | sub_822C6870 | handle 0x12f8 | (same) | +| 16 | 4 | sub_82170430 | handle 0x15e4 | Event(sig=false, **mr=true**) | +| 18 | 3 | sub_823DDB50 | handles 0x15fc, 0x01000000 | Event mr=false + something | + +### What's most suspicious + +Four `mr=true` events with `sig=false` waiters that never get signaled: +- 0x1004 (tid=10, sub_82178950) +- 0x100c (tid=2, sub_82181830) +- 0x15e4 (tid=16, sub_82170430) +- 0x42450b5c (tid=6, sub_824CD458) — guest heap-ptr "kernel object" form + +These are in the 0x82170-0x82181 / 0x824CD address ranges — Sylpheed's renderer worker entries. + +## tid=1 (main) state analysis + +PC=0x822F1E00 in [sub_822F1AA8](xenia-rs/sylpheed.db) — a **frame-loop / poll loop**. The flow: +1. Check bit 3 of `r30[0]` (frame-ready flag). +2. If set, do timing work via thunk `bl 0x8284E45C` (KeQueryPerformanceCounter or similar). +3. Loop back to 0x822F1BCC. + +This is the game's normal main loop. Bit 3 of r30[0] is the "running" / "frame ready" flag, polled in a tight loop. It's healthy — main is just waiting for a frame to be ready. + +## Hypothesis: the Sylpheed renderer signal-chain has multiple breaks + +Even with the throw fixed, the workers can't progress because their wakeup events are tied to GPU/frame events that depend on: +1. Draws completing (won't happen until renderer init finishes) +2. Renderer init finishing (won't happen until workers process their queues) +3. Workers processing queues (won't happen until events are signaled) + +This is a cascade. The throw fix was necessary but not sufficient. + +## Side observation — `gpu.interrupt.delivered{source=1}=2` matches `VdSwap=2` + +Source-1 interrupts (likely "swap complete") fire only on VdSwap. They're a chicken-and-egg dependency on draws. + +## Recommended next directions (in priority order) + +### Option A — Find the missing producer for one specific event + +Pick the simplest cascading dependency. Suggestion: tid=10's wait on event 0x1004 (mr=true). Trace: +1. Identify guest function that calls `NtCreateEvent` to create handle 0x1004 (it's an early event since handle is small — likely first or second event created). +2. Find xrefs from same function to `NtSetEvent` — that's the SHOULD-signal site. +3. Compare which guest function should call NtSetEvent for handle 0x1004 vs what currently runs. + +### Option B — Trace per-handle NtSetEvent counts + +Add per-handle telemetry to `nt_set_event` to dump which events fire how many times. Cross-reference with the blocked-on list. Identify events that are CREATED but never SIGNALED. + +### Option C — Compare with Canary + +Run the same boot path under xenia-canary with logging enabled. See which NtSetEvent / KeSetEvent calls fire on Canary that don't fire on us. The diff is what we're missing. + +### Option D — Accept Stage 3 gate is unreachable without a multi-day investigation + +The cascade structure of Sylpheed's renderer init means small fixes won't help; we need to identify the FIRST event in the chain that needs to fire. That's not visible from a single-session diagnostic. Defer Stage 3 / 4 to a focused multi-day investigation. + +## Files unchanged this session + +The throw-fix from [project_xenia_rs_sylpheed_throw_fix_2026_04_29.md](project_xenia_rs_sylpheed_throw_fix_2026_04_29.md) remains in place. No code changes this session — only diagnostic runs. + +## Open question for the user + +Take Option A on tid=10/event 0x1004 as a focused next session, or step back and pursue Option C (Canary diff)? diff --git a/migration/claude-memory/project_xenia_rs_sylpheed_throw_2026_04_28.md b/migration/claude-memory/project_xenia_rs_sylpheed_throw_2026_04_28.md new file mode 100644 index 0000000..13d5208 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_sylpheed_throw_2026_04_28.md @@ -0,0 +1,106 @@ +--- +name: Sylpheed VdSwap=2 plateau — C++ throw diagnostic landed (2026-04-28) +description: One-shot RtlRaiseException stack-walk diagnostic + identified the std::runtime_error("lhs is not valid instance") that gates Sylpheed renderer init +type: project +originSessionId: c44cbfc2-438f-45c9-996c-06eddf9dcb93 +--- +## Problem + +Sylpheed boots cleanly to VdSwap=2 then plateaus indefinitely. At -n 100M / 500M: +- `swaps=2` unchanged across both +- `packets≈206 M` but `draws=0, resolves=0, render_targets=0, shader_blobs=0, texture_decodes=0` +- vsync ISRs scale linearly (628 / 3,295), `unimpl=0` +- 18 worker threads spawn; renderer-side workers park on `WaitAny` on per-thread events nobody signals + +## Stage-1 diagnostic landed + +[`xenia-rs/crates/xenia-kernel/src/exports.rs:1878`](xenia-rs/crates/xenia-kernel/src/exports.rs#L1878) `rtl_raise_exception` rewritten to: +1. Use the correct EXCEPTION_RECORD layout (`info[0..]` starts at `+0x14`, not `+0x18`; old comment was off by 4 — Canary parity at `xenia-canary/src/xenia/kernel/kernel.h:227-236`). +2. Read `info[0]` (magic, expected `0x19930520`). +3. On first fire of code `0xE06D7363`, walk the PPC frame chain ~6 levels using the back-chain convention (`prev_sp = mem[sp]`, `saved_lr = mem[prev_sp - 8]`). +4. Decode the runtime_error object's `_Mystr` (offset `+0x0C` per the destructor disasm at `sub_8216DBC0`). Layout: `vtbl(0), _Mywhat(4), _Mydofree(8), _Mystr(0xC)`. +5. One-shot via new `KernelState::cxx_throw_logged: bool` (initialized in `with_gpu`). + +Diagnostic added a `cxx_throw_logged` field to [`xenia-rs/crates/xenia-kernel/src/state.rs`](xenia-rs/crates/xenia-kernel/src/state.rs) struct (after `thunks_by_ordinal`). + +## What the diagnostic shows + +Single throw at ~1.2s on tid=1. Stack walk: + +``` +L0: fp=0x700ff2a0 lr=0x82612b50 → _CxxThrowException +0x70 (after bl RtlRaiseException) +L1: fp=0x700ff350 lr=0x825f2444 → __CxxThrow wrapper sub_825F23D8 +0x6C +L2: fp=0x700ff3e0 lr=0x824547e8 → THROW SITE: sub_82454770 +0x78 +L3: fp=0x700ff4c0 lr=0x82176134 → caller: sub_82175F10 +0x224 +L4: fp=0x700ff600 lr=0x82179148 → caller: sub_82178F60 +0x1E8 +L5: fp=0x700ff6e0 lr=0x82178ee4 → caller: sub_82178E50 +0x94 +L6: fp=0x700ff760 lr=0x82173a4c → caller: unnamed fn @ 0x82173990 +0xBC +``` + +Throw call at PC `0x824547e4` (`bl 0x825F23D8`). Throwinfo at `0x82117388`. Class is **`std::runtime_error`** (verified via `CatchableTypeArray @ 0x8211737c → TypeDescriptor @ 0x8289aed4` containing mangled `.?AVruntime_error@std@@`). + +**Message string literal**: `0x820B0000 + 22160 = 0x820B5690` = `"lhs is not valid instance"`. Adjacent at +22188 = `"rhs is not valid instance"`. Read directly from the .pe via Python (object decode in the diagnostic returned mysize=0/myres=25/heap_ptr=0x48bd64d1 — the basic_string layout differs from MSVC-default and needs re-investigation if precise decode is wanted; the literal is unambiguous from the disassembly). + +## Subsystem identified + +L6's enclosing function (no entry in `functions` table — gap; starts at `0x82173990` with `mfspr r12, LR; stwu r1, -288(r1)` prologue) processes config sections by name. Literal cluster around `0x820A1860..0x820A18B0`: + +- 6044, 6056=`SYSTEM`, 6064=`ENTRY_POINT`, 6076, 6088=`'silph::Silph::Impl::OnInit - RenderDevice initialized. spend %d ms.\r\n'` +- 6264=`SOUNDS`, 6272=`FILES`, 6280=`STAGES`, 6288=`PARAM`, 6296=`PATH`, 6304=`SETTINGS`, 6320=`'unnamed_namespaces::...::BankSlots::get_bgm_data_area - Failed : previous bgm is available. call release_bgm()'` + +So we're inside **`silph::Silph::Impl::OnInit`**'s config-tree walker. + +## What `sub_82454770` actually does + +It's a generic guarded list-swap helper. Pattern: `if (!is_valid_instance(lhs)) throw runtime_error("lhs is not valid instance"); if (!is_valid_instance(rhs)) throw "rhs ..."; if (*lhs != *rhs) swap_nodes(lhs, rhs);`. Has 29 guest callers across many subsystems (sub_82175F10, sub_82176880, sub_82180708, sub_82187...A0, sub_821A5F10, etc.) — it's a Sylpheed-internal utility, not a single subsystem. + +`sub_82454600` (the validator) walks an intrusive linked list of "registered instances" guarded by a critical section at `0x828F3DA8`. Each node has `is_valid` byte at `+17` and a key/range field at `+12`. The list is initialized lazily by `sub_82454498` (registry singleton getter) which Initializes the CS with spin=256 + creates a sentinel head + flips the init flag at `0x828F3EB0`. + +Reading the validator more carefully: it walks and compares `node[+12]` against `r30` (lhs) using `subfc/subfe` — a range-membership test. So **lhs being "invalid" means it's not inside any registered range**. + +## Canary parity + +`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_debug.cc:131-151` — Canary's `RtlRaiseException` is also a stub (only handles `0x406D1388` thread-name + `0xE06D7363` C++ exception, logs and returns). `RtlUnwind`/`RtlVirtualUnwind`/`RtlLookupFunctionEntry` are exported but have **no implementations**. Comment: `// TODO(benvanik): unwinding. This is going to suck.` So Canary's behavior is identical to ours on this code path. Either: +- (a) Canary doesn't trip this throw on its boot (some upstream HLE returns a value that keeps lhs in the registered range), OR +- (b) Canary trips the same throw, swallows it the same way, and the post-throw state is good enough that the renderer still progresses. + +We don't yet have a concrete data point that distinguishes (a) from (b). + +## Why current swallow plausibly breaks the renderer + +After our `RtlRaiseException` returns, `_CxxThrowException` returns to `sub_825F23D8` (the `__CxxThrow` wrapper), which returns to the throw-site function `sub_82454770` at PC `0x824547e8`. Post-throw code in the throw-site function then **runs as if the throw didn't happen**. In `sub_82175F10` specifically, the post-bl code at `0x82176134` is `stw r18, 0(r25)` — an unconditional store via `r25` whose validity was not yet established. With swallow, this store may corrupt memory at an arbitrary address. + +Real C++ behavior would be: `bl _CxxThrowException` never returns; control transfers to the matching catch handler via .pdata/.xdata-driven SEH dispatch. We don't implement that (Canary doesn't either). + +Only **one** throw fires (`kernel.calls{name=RtlRaiseException} = 1` at -n 100M and -n 500M). So the corruption happens once during early boot and persists. + +## Next-session candidates (in priority order) + +1. **Fix the runtime_error decoder.** The current diagnostic gets the message via `_Mystr` at `+0x0C` but the basic_string layout returned mysize=0/myres=25 — wrong field order or different size_t alignment. Dump 32 bytes of the object via `mem.read_bytes` and log as hex to nail the layout once. The literal is already known from disasm (0x820B5690) so this is just for completeness. +2. **Identify why lhs is "not in any registered range"**. Two angles: + - **Trace `sub_82454498`** (the registry singleton) — log every node added to the intrusive list (instrumented hook on the `bl 0x82454498` from any of its ~hundreds of callers). See what ranges it tracks, compare against the `r30` value at the throw moment (`r30 = 0x???` — wasn't dumped this session, add to L0 frame log). + - **Trace `sub_82454770` itself** — log each call's `lhs/rhs` and whether the validator passes. Identify the FIRST call that fails. Look at what allocated `lhs` upstream. +3. **Implement minimal SEH dispatch** as a backstop. Parse the XEX `.pdata` (RUNTIME_FUNCTION table) + `.xdata` (UNWIND_INFO + `__CxxFrameHandler3` blob with magic `0x19930520`). For the throw-site function (`sub_82454770`), find the calling function's try-block map; on match, restore non-volatile regs + transfer to catch funclet. This is multi-day work but is the proper fix for any future C++ throw. +4. **wgpu→ShadowEdram readback** is still deferred (no point until draws fire). + +## What stays + +- `cxx_throw_logged` field in `KernelState` and the corresponding init in `with_gpu`. +- Rewritten `rtl_raise_exception` with correct EXCEPTION_RECORD offsets, `info[0]` magic capture, 6-level frame walk, and runtime_error decoder (still slightly off — see point 1 above). + +## Verification command + +```bash +./target/release/xenia-rs check "../Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso" -n 100M 2>&1 | grep -E "cxx_throw|RtlRaiseException" +``` + +## Files touched + +- [xenia-rs/crates/xenia-kernel/src/state.rs](xenia-rs/crates/xenia-kernel/src/state.rs) — added `cxx_throw_logged: bool` field + init. +- [xenia-rs/crates/xenia-kernel/src/exports.rs:1878](xenia-rs/crates/xenia-kernel/src/exports.rs#L1878) — replaced `rtl_raise_exception` with diagnostic version. + +No regressions introduced; the new code only fires on the first MSVC C++ throw; everything else is unchanged. + +## Open question for the user + +Whether to invest in (a) tracing `sub_82454498`/`sub_82454770` to find the missing registration (one focused session), or (b) implementing minimal SEH dispatch (multi-session). The plan file at `/home/fabi/.claude/plans/yes-take-any-action-noble-dragon.md` covered both as Branch A vs Branch B. diff --git a/migration/claude-memory/project_xenia_rs_sylpheed_throw_fix_2026_04_29.md b/migration/claude-memory/project_xenia_rs_sylpheed_throw_fix_2026_04_29.md new file mode 100644 index 0000000..df3abf1 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_sylpheed_throw_fix_2026_04_29.md @@ -0,0 +1,89 @@ +--- +name: Sylpheed C++ throw silenced via r31 override (2026-04-29) +description: Force-corrected sub_82454600 return value at leave-CS HLE — throw eliminated but draws=0 plateau persists due to downstream blockers +type: project +originSessionId: c44cbfc2-438f-45c9-996c-06eddf9dcb93 +--- +## Outcome + +The single `std::runtime_error("lhs is not valid instance")` throw at PC `0x824547e4` is **fully silenced** by overriding `ctx.gpr[31]` from 32 → 14 in `rtl_leave_critical_section` whenever: +- `cs_ptr == 0x828F3DA8` (BST registry) +- `ctx.lr == 0x824546C8` (leave-CS return addr in sub_82454600) +- Our Rust CEIL search finds the lhs in the BST (`match_found=true`) +- BUT the PPC computed `r31=32` (no-match) + +**Verification command** (-n 100M, 500M, 1B all show same): +``` +./target/release/xenia-rs check sylpheed.iso -n 1000000000 --out /tmp/d.json +``` +Result: `kernel.calls{name=RtlRaiseException}` is **absent from metrics summary** (was 1 before). One `VALIDATOR forced` log line for call_n=32, lhs=0x828F3F68. + +## What unlocks vs. what doesn't + +**Unlocked** (post-fix, -n 200M..1B): +- Game now loads renderer resources: `NtReadFile` from `hidden/Resource3D/ptc_pack.xpr`, `Common.xpr` +- All 18 `ExCreateThread` calls spawn (was already spawning before fix, but now to-completion) +- `NtWaitForSingleObjectEx`/`NtWaitForMultipleObjectsEx` counts grow to ~500K+/~325K+ (heavy waiter activity) +- packets=445M at -n 1B (increasing) — ring traffic flows + +**Still blocked** (Stage 2 gate NOT met): +- `draws=0`, `swaps=2`, `resolves=0` — same as before the fix +- No PM4_DRAW_INDX* commands issued by guest +- All 18 workers eventually park; nobody signals their events + +## The unresolved paradox + +**Symptom**: At enter-CS time AND leave-CS time, our Rust CEIL search successfully finds node 0x4024A9C0 with key=0x828F3F68 in 17 steps. Yet between those two HLE invocations, the PPC's identical traversal returns `r31=32` (no match). + +**What was verified** (still doesn't explain it): +- Algorithm parity: PPC's CEIL (sub_82454600 disasm) and our Rust use identical child offsets `[+0]=left`, `[+8]=right`, `key=node[+12]`, with identical comparison directions confirmed against TreeInsert (sub_8235FC98) at addresses 0x8235FC98..0x8235FD40. +- BST root is consistent: `sentinel_ptr=0x402118A0`, `sentinel[+4]=0x4021DB60`, `root_b17=0` at both enter-CS and leave-CS. +- No memory writes between the two HLE calls touch any BST node addresses (heap 0x4021xxxx..0x4024xxxx). Our enter-CS HLE writes only to CS struct fields (0x828F3DB8/BC/C0). +- Lockstep mode (no `--parallel`) → no concurrency. `park_current` is lazy (sets `state=Blocked`), but `worker_prologue` checks state next round, so a parked thread can't execute more PPC instructions. +- Endianness: irrelevant since the same `read_u32` semantics produce consistent results for all calls 0..31 (which succeed). + +**Right-spine traversal at enter-CS for the failing call**: +``` +step 0: 0x4021DB60 key=0x828E6878 → right +step 1: 0x40229B60 key=0x828EA2D8 → right +... 11 more steps with keys monotonically increasing ... +step 12: 0x40249F60 key=0x828F2F68 → right (next is 0x40249F68 → 0x4024AA20) +step 13: 0x4024AA20 key=0x828F3F98 → LEFT (candidate=0x4024AA20) +step 14: 0x40211C80 key=0x828F3EF8 → right +step 15: 0x4024A9E0 key=0x828F3F78 → LEFT (candidate=0x4024A9E0) +step 16: 0x4024A9C0 key=0x828F3F68 → LEFT (candidate=0x4024A9C0, EXACT MATCH) +step 17: 0x402118A0 b17=1 → STOP. Final candidate=0x4024A9C0, key=0x828F3F68. +``` + +Both PPC and our Rust should follow this exact path. The PPC must be reading `mem[0x40249F68]` and getting something other than `0x4024AA20`, but no concurrent code touches that address. + +## sub_82454600 return-value computation + +After leave-CS at `0x824546C4`, the post-CS code at `0x824546C8..0x824546D4` derives `r3` (return value) from `r31` (= cntlzw of candidate-sentinel diff): +- `r31 in [0,31]` (match found): r3=1 (valid) +- `r31 = 32` (no match): r3=0 (invalid → triggers throw) + +Setting `r31=14` in our HLE makes r3=1, sub_82454600 returns valid, sub_82454770 proceeds past the lhs check. r31=14 is what `cntlzw(0x4024A9C0 - 0x402118A0) = cntlzw(0x39120) = 14` would produce naturally. + +## Files touched (2026-04-29) + +- [xenia-rs/crates/xenia-kernel/src/exports.rs](xenia-rs/crates/xenia-kernel/src/exports.rs) + - `rtl_enter_critical_section` (line 1634): added enter-CS diagnostic + Rust CEIL search for the failing lhs (verbose only when `lhs == 0x828F3F68`). + - `rtl_leave_critical_section` (line 1696): added r31 override when `match_found=true && ctx.gpr[31]==32`. + +## Stage status (per plan `yes-take-any-action-noble-dragon.md`) + +- **Stage 1 (diagnose throw)**: ✅ Complete (prior session). +- **Stage 2 (eliminate throw)**: ⚠️ Throw silenced, but `draws > 0` gate NOT met. The throw was real, but it was not the *single* load-bearing fix. Downstream renderer initialization still parks on unsignaled events. +- **Stages 3/4/5**: still pending. + +## Recommended next direction + +The throw fix is a workaround, not a root-cause fix. The PPC-vs-Rust BST traversal paradox is **unexplained** and worth one more session of focused debugging if a real explanation matters (it might reveal a memory-system bug that's causing OTHER subtle bugs). Options: + +1. **Accept the workaround, push forward to Stage 3**: With draws still 0, find the next blocker (likely a missing event signal or HLE divergence). Workers parking on `WaitAny` of per-thread events suggests an event-set HLE that's not firing in our impl but does on Canary. +2. **One last attempt at the paradox**: Add per-instruction PPC-level tracing for instructions 0x82454638..0x82454668 (the traversal loop) when r30=0x828F3F68, to log what the PPC actually reads at `lwz r11, 8(r11)` for the critical step 12 → step 13 transition (our Rust reads `mem[0x40249F68]=0x4024AA20`; PPC must be reading something else). + +## Open question for the user + +Continue Stage 3 (find next blocker) or take another shot at the paradox? The workaround is stable: only fires once, only for the specific known-good case (Rust CEIL confirms the node IS in the BST), no false-positive risk. We can ship it as-is and treat the paradox as a known-issue. diff --git a/migration/claude-memory/project_xenia_rs_ui.md b/migration/claude-memory/project_xenia_rs_ui.md new file mode 100644 index 0000000..32e8a3b --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_ui.md @@ -0,0 +1,69 @@ +--- +name: xenia-rs --ui architecture (stable facts) +description: Threading/bridge design, shader pipeline, GPU integration, HUD — stable across sessions. History + live state in `project_xenia_rs_current_state.md`. +type: project +originSessionId: 1e348be4-7f53-438a-9c1b-e0c2fcb7ec0d +--- + +## Threading & bridge + +`exec --ui` runs winit on the main thread and the scheduler/interpreter on a worker thread. Cross-thread communication: `Arc`-shared atomics + `EventLoopProxy` user events. `KernelState::ui: Option` carries closures that (a) read host gamepad and (b) post `SwapInfo` + frontbuffer bytes to the UI. `GuestMemory` stays pinned to the interpreter thread; only cooked bytes cross. + +**Why:** winit 0.30+ `ApplicationHandler` requires the main thread and wgpu's Surface is tied to `Window`. The interpreter is single-threaded (6 cooperative HW slots); making it multithread-safe would require `Arc>` on every guest instruction. + +**How to apply:** when adding cross-thread UI state, extend `SwapInfo` (post-swap) or add an atomic on `UiHandles` — don't reach across threads directly. + +## GPU pipeline (P2–P7 stable) + +- **`xenia-gpu::GpuSystem`** — one per `KernelState`. Owns the `RegisterFile`, the `RingBufferView` (+ IB stack for nested `PM4_INDIRECT_BUFFER`), the `TextureCache` / `RenderTargetCache` (P4/P5), and the `GpuMmio` atomic mailbox exposed via the `0x7FC8_0000` MMIO aperture (Canary `graphics_system.cc:141`). Per scheduler round: `sync_with_mmio()` then `execute_one()` of whatever's ready. +- **Type-3 packet coverage**: every non-draw Type-3 opcode is implemented (NOP, INDIRECT_BUFFER[_PFD], WAIT_REG_MEM, REG_RMW, REG_TO_MEM, MEM_WRITE, COND_WRITE, EVENT_WRITE[_SHD/_EXT/_ZPD], SET_CONSTANT[2], SET_SHADER_CONSTANTS, LOAD_ALU_CONSTANT, IM_LOAD[_IMMEDIATE], CONTEXT_UPDATE, INVALIDATE_STATE, VIZ_QUERY, ME_INIT, SET_BIN_MASK/SELECT, INTERRUPT, XE_SWAP). `DRAW_INDX*` captures `DrawState` + `ProcessedPrimitive` + metrics. +- **WGSL shader interpreter (P3b/c + P7)**: `xenia-gpu::ucode` decoder + `pack_for_wgsl` dense layout; `xenos_interp.wgsl` (~465 LOC) implements the CF walker + 13 vec ALU ops + 6 scalar ops + R32G32B32A32_FLOAT vertex fetch + texture sampling. `XenosPipeline::new` builds two bind groups; uploads shader+constants+vertex before each batch in `dispatch_xenos_draws`. P7 added a direct Xenos→WGSL translator for when shader-bug isolation is needed. +- **Texture cache (P5)**: page-version invalidated via `GuestMemory::page_version`. Formats supported: `K8888`, `K565`, `Dxt1`, `Dxt2_3`, `Dxt4_5` (M5). Host side `texture_cache_host.rs` maps each to `Rgba8Unorm`/`Bc{1,2,3}RgbaUnorm` with format-aware `bytes_per_row`. +- **Render target cache (P4)**: EDRAM resolve handler `handle_event_initiator` wired into all four `PM4_EVENT_WRITE*` variants. On event code 15 (`TILE_FLUSH`), snapshots `RB_COPY_*` into `last_resolve`, bumps `stats.resolves_total`. Actual EDRAM→memory byte copy still deferred. + +## MMIO aperture (stable) + +- Base `0x7FC8_0000`, mask `0xFFFF_0000`, size `0x0001_0000`. Install via `MmioRegion` on `GuestMemory`. +- Registers served (others trace+zero): `CP_RB_WPTR`, `CP_RB_RPTR`, `CP_INT_STATUS`, `CP_INT_ACK` (0x071D, write-echo), `D1MODE_VBLANK_VLINE_STATUS` (0x1951 / byte offset `0x6544`, W1TC on bit 0). +- Bit 0 of `D1MODE_VBLANK_VLINE_STATUS` is set by the app main loop on every synthetic vsync tick; Sylpheed's callback `rlwinm. r,r,0,31,31; bc 12,2,skip` gates all vsync work on it. + +## Scheduler + interrupts + +- **`HwState` variants**: `Idle`, `Ready`, `Blocked(BlockReason)`, `Exited(code)`, `ServicingIrq(BlockReason)`. `ServicingIrq` is used by the graphics-interrupt injector to stash a block reason while running the callback; `wake()` and `round_schedule` both treat `ServicingIrq` as runnable. +- **Graphics interrupt injection** (post-M8): `try_inject_graphics_interrupt` picks any non-`Idle`/`Exited` HW slot (prefers `Ready`, falls back to `Blocked`). `InterruptState::injected_hw` tracks which slot ran the callback. The LR-sentinel return path restores pre-injection ctx and re-blocks with the stashed reason (unless a `wake()` during the callback cleared it). +- **Deadlock recovery**: when all live threads are `Blocked/Idle/Exited` and no timer is pending, force-wake every blocked thread with `STATUS_TIMEOUT` in `gpr[3]`. `scheduler.deadlock_recoveries` counter tracks this. +- **Main thread exit is NOT a halt**: when `tid=1` hits `LR_HALT_SENTINEL` we mark it `Exited` and continue; the outer loop halts only when `has_live_thread()` is false. Sylpheed's design spawns workers then returns from main. + +## HLE primitives (stable) + +- **Pseudo-handle resolution** `resolve_pseudo_handle(state, h)`: `0xFFFFFFFE` → current thread handle, `0xFFFFFFFF` → 0, others pass through. Called at top of every `Ob*`/`Nt*Wait*` export. +- **PKEVENT shim** `ensure_dispatcher_object(state, mem, ptr)`: `Ke*` sync functions take `PKEVENT` pointers; first touch reads Xenon DISPATCHER_HEADER (type byte + SignalState at +4 + Limit at +0x10 for semaphores) and mints a shadow `KernelObject` keyed by the pointer. `refresh_pkevent_shadow_from_guest` re-syncs `SignalState` on each wait. +- **WaitAny handle-index return**: Canary's `WaitMultiple` returns `STATUS_WAIT_0 + index` for WaitAny. `do_wait_multiple` matches; `set_wake_status_for_waitany` updates `gpr[3]` on wake. +- **I/O completion signaling**: `signal_io_completion_event(state, event_handle)` fires at every completion path of `NtReadFile`/`NtWriteFile` (r4 = event). +- **Empty-path / root-device opens** (`NtCreateFile("game:\")` etc.): synth a zero-byte `KernelObject::File` with empty `path`. `NtQueryInformationFile` class 5 reports `Directory=1` for empty/`/`/`:`-tail paths; class 34 (`FileNetworkOpenInformation`, 56 B) reports `FILE_ATTRIBUTE_DIRECTORY` at offset +48. + +## HUD + +6 rows, well-spaced, cyan accents: +1. Title + uptime + instr/kIPS (live counter via `instructions_counter` atomic). +2. Swaps. +3. GPU stats (packets, draws_total, resolves_total, interrupts). +4. Last-draw prim/verts. +5. Pad state. +6. Render path: `xdispatch: xlated=N interp=M xlated-pipelines=P tex-cache=T fb=WxH`. + +One-shot `tracing::info!` latches: "first Xenos draw dispatched" and "first translator pipeline compiled". + +## Observability defaults + +Silences wgpu/winit/naga/gilrs at `warn` (wgpu at `error`). Override via `--log-filter='info,wgpu_core=trace'` during bring-up. `--trace-chrome PATH` captures Chrome/Perfetto trace; `--profile PATH.svg` emits a flamegraph. + +## Interpreter performance (post-Tier-3) + +~10 MIPS end-to-end on Sylpheed. Three wins stacked: de-hot-patted `metrics::counter!` per instruction; direct-mapped 64k `DecodeCache` keyed by PC with page-version invalidation; `Debugger::wants_hooks()` short-circuit + `trace_enabled = false` default (previous O(n²) `Vec::remove(0)` on the trace log was the real bottleneck, not `metrics`). + +**Deferred Tier 4** — threaded-code dispatch / JIT. Only worth doing after the shader translator + HLE coverage gaps narrow; fast-but-wrong produces fast-wrong output. + +## Phase history + +Complete roadmap P1–P8 + perf Tiers 1–3 + first-pixels M1–M9 all landed. Details deliberately elided here — they're in the individual commit messages and the `project_xenia_rs_current_state.md` next-steps file. This doc stays focused on stable facts a new session needs before touching the code. diff --git a/migration/claude-memory/project_xenia_rs_verify_A_canary_pc_trace_2026_05_08.md b/migration/claude-memory/project_xenia_rs_verify_A_canary_pc_trace_2026_05_08.md new file mode 100644 index 0000000..d2d3311 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_verify_A_canary_pc_trace_2026_05_08.md @@ -0,0 +1,147 @@ +# VERIFY-A — Static-reachability soundness check via canary PC trace + +**Date**: 2026-05-08 +**Mode**: READ-ONLY canary patch+rebuild+revert; xenia-rs untouched. +**Master HEAD**: `e061e21` (unchanged). +**Tests**: 605 (unchanged; no source modified). + +## Summary + +Probed 12 distinct PCs from the audit-009 unreachable cluster +`0x82285000-0x82294000` in canary using audit-030's `--log_lr_on_pc` +PPC instruction tracer. **0/12 fires across all 12 PCs** while canary's +audio loop is actively running (5,600-5,800 KeReleaseSemaphore calls per +35-sec probe window). One sanity-check PC (`0x824D28D0`, the audio +worker wait site) produced 5683 fires in the same envelope, confirming +the trace mechanism is functional. + +**Conclusion**: Outcome (i). The static-reachability claim from +audit-009/-016/-017/-020/-021/-029 is **SOUND** for this cluster +sample. Indirect-dispatch reachability is NOT being missed for the +12 PCs probed. + +## Setup + +- Canary build: Debug variant rebuilt with audit-030 patch (~30 LOC, + 4 files: `cpu_flags.{cc,h}`, `ppc_hir_builder.cc`, `x64_emitter.cc`). + Output `xenia-canary/build/bin/Linux/Debug/xenia_canary`. +- Patch source: `xenia-rs/audit-runs/audit-030-lr-trace/canary-patch.diff`. +- Per-probe args: `--log_level=3 --disable_instruction_infocache=true + --log_lr_on_pc=PC --headless=true`. +- Per-probe runtime: ~35 sec wall-clock, then SIGTERM+SIGKILL. + +## Probe set design + +Static analysis of `sylpheed.db` (DuckDB): + +```python +ENTRY = 0x824AB748 # from metadata table +# BFS over xrefs.kind='call' from ENTRY → 2263 reachable functions +# Cluster: 116 functions in 0x82285000..0x82294000 +# Reached: 4/116; unreached: 112/116 +``` + +PCs probed: +- 6 narrow L1 (audit-009 hypothesis): 0x822919C8, 0x82293448, 0x82288028, + 0x82292D80, 0x822851E0, 0x82286BC8 +- 6 broader cluster samples (10.7% coverage of 112 unreached): 0x82285C78, + 0x82285DD0, 0x82286118, 0x8228A140, 0x8228CAF8, 0x8228E688 +- 1 sanity-check (known-fired per audit-031): 0x824D28D0 + +## Results + +| PC | fires | KeRelSem in window | source | +|--------------|-------|--------------------|------------------| +| 0x822919C8 | 0 | (debug log lost) | audit-009 narrow | +| 0x82293448 | 0 | 5769 | audit-009 narrow | +| 0x82288028 | 0 | 5790 | audit-009 narrow | +| 0x82292D80 | 0 | 5768 | audit-009 narrow | +| 0x822851E0 | 0 | 5793 | audit-009 narrow | +| 0x82286BC8 | 0 | 5774 | audit-009 narrow | +| 0x82285C78 | 0 | 5747 | broader cluster | +| 0x82285DD0 | 0 | 5668 | broader cluster | +| 0x82286118 | 0 | 5618 | broader cluster | +| 0x8228A140 | 0 | 5757 | broader cluster | +| 0x8228CAF8 | 0 | 5814 | broader cluster | +| 0x8228E688 | 0 | 5765 | broader cluster | +| 0x824D28D0 | 5683 | 5684 | sanity-check | + +LR captures: N/A (no fires; LR captures only emitted on fire). +Sanity-check LR=0x824D28D0 (self-reference at wait-loop top, matching +audit-031's tid=F8000074 wake-loop pattern). + +## Cross-validation + +xenia-rs static analysis correctly classifies all 12 probed PCs as +unreached. Canary's runtime behavior agrees: 0 fires per probe across +35 sec each, while the audio worker (`0x824D28D0`) runs hot +(~180 fires/sec). Static call-BFS reachability is corroborated by +canary runtime trace for this sample. + +## What this rules out / does not rule out + +- **Rules out**: "audit-009 unreachables are reached in canary via + indirect dispatch missed by xrefs.kind='call'." This was the failure + mode tested. Would have manifested as ≥1 PC firing. None did. +- **Does not rule out**: that other unreached clusters elsewhere in the + binary might be reachable in canary; this verification is scoped to + the audit-009 cluster only. +- **Does not rule out**: that some genuinely cold (single-fire-per-boot) + function in the 112-set could fire in a window we missed. Mitigation: + each probe started fresh at canary launch, so boot-time fires are + caught. Cumulative hot-loop coverage is decisive. + +## Cumulative-coverage Bayes note + +Sample = 12/112 = 10.7%. With a uniform prior on "% of cluster reached +via indirect dispatch", 0/12 hits gives 95% upper bound on reach-rate +of ~22% (binomial). I.e. it's plausible up to ~25 of the 112 could +still be reachable. To harden to 95% upper bound <5%, would need +~58 probes (~34 min cumulative). Currently at ~7 min cumulative. + +## Reading-error ledger impact + +This verification result PASSES. The audit-009 reachability claim is +NOT one of the 10 documented reading errors. No re-attribution to the +ledger required. The motivation for the analysis-toolset overhaul +remains driven by the OTHER 10 errors (function-boundary +mis-attributions, mis-read PCs, vtable misreads), not by this one. + +## Bug-class implication + +**γ-class (deep, vtable-driven, audio dispatch unreached) framing +from audit-031 is reinforced**, not undermined. Audit-031 found the +real wake-source at `0x824D29F0`, registered as a callback at +`sub_824D2C08+0x374` via `0x824D6640` thunk. The dispatcher that +invokes that callback DOES live somewhere unreached in canary's +static call graph too — just NOT in the audit-009 cluster. So: + +- The cluster `0x82285000-0x82294000` is not the dispatcher home. +- Audit-031's "renderer/scenegraph cluster" hypothesis needs a + different code-region target, not this cluster. + +## Recommended next session + +- The user's earlier instruction (analysis-toolset overhaul) remains + the dominant track regardless of this verification's outcome. +- For the audio-gate γ-bug specifically: pivot to walking + `0x824D6640`'s indirect-dispatch chain (audit-031 sharp prediction + step 1: probe `0x824D6640` directly to capture LR = the dispatcher + PC). Per audit-031's plan; this verification rules out one + hypothesis (the dispatcher is in audit-009 cluster) but doesn't + weaken the audit-031 plan itself. +- Optional: full 112-PC sweep (~75 min cumulative) to harden upper + bound on reach-rate. Marginal value unless audit-031 step 1 turns + up an LR pointing into this cluster. + +## Trace artifacts + +- Audit dir: `xenia-rs/audit-runs/verify-A-static-reachability/` + - 13 `probe-*.log` files (12 cluster + 1 sanity) +- Patch source (re-applied + reverted): `audit-runs/audit-030-lr-trace/canary-patch.diff` + +## Cleanup + +- Canary repo `git status` clean (`git checkout -- src/`). +- xenia-rs `git rev-parse HEAD` = `e061e21` (unchanged). +- No commit. Pure verification. diff --git a/migration/claude-memory/project_xenia_rs_xam_avpack_hdmi_2026_05_04.md b/migration/claude-memory/project_xenia_rs_xam_avpack_hdmi_2026_05_04.md new file mode 100644 index 0000000..fe1c996 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_xam_avpack_hdmi_2026_05_04.md @@ -0,0 +1,136 @@ +--- +name: KRNBUG-XAM-001 — XGetAVPack returns 8 (HDMI) instead of 0x16 +description: 2026-05-04 one-line fix at xam.rs:383, mirroring canary's cvars::avpack default. Closes 1 of the 11 canary-only exports identified in AUDIT-005. Cascade went exactly one step; parked handles still unsignaled. +type: project +originSessionId: 026e7f46-7ffb-4e4d-b12b-6678e6a7ce4f +--- +# KRNBUG-XAM-001 — `XGetAVPack` 0x16 → 8 + +**Status (2026-05-04):** LANDED on master. Single feature branch +`xam-avpack-hdmi/p0-return-8`, no-ff merge. One-line fix. + +## Fact + +`crates/xenia-kernel/src/xam.rs:383` returned `0x16` (= 22). Canary +`xam_info.cc:35` defines `cvars::avpack = 8` and `xam_info.cc:250-251` +documents Sylpheed's accepted set: `{3, 4, 6, 8}` — values outside +this "explode with errors". 0x16 was outside the set. + +Fix: change literal `0x16` → `8`. No cvar layer added. + +## Why + +KRNBUG-XEX-001 flipped the priv-10 gate at `lr=0x824ab598`, so +`XGetAVPack` is now reached (was 0×, now 1×). With wrong return +value, Sylpheed's caller treated the response as invalid and +aborted the AV/crypto block before reaching `XeCryptSha`, +`XeKeysConsolePrivateKeySign`, `NtDeviceIoControlFile`, +`ObCreateSymbolicLink`, `XamTaskSchedule` — the 10 still-missing +canary-only exports. + +## How to apply + +This memory file is for **next-session context**, not future +behavior. Two next-session candidates documented below. + +## Numbers (before → after) + +| metric | KRNBUG-XEX-001 baseline | KRNBUG-XAM-001 | +|---|---|---| +| tests | 589 | 590 (+1 unit test) | +| lockstep `-n 100M` | `instructions=100000013, import_calls=407417, VdSwap=2` | `instructions=100000010, import_calls=987686, VdSwap=2` | +| lockstep `-n 50M` (golden) | `instructions=50000005, imports=407417, swaps=2` | `instructions=50000004, imports=407416, swaps=2` | +| 9-PC producer probe at -n 500M | 0× hits | **0× hits** (unchanged) | +| Parked handles `0x1004 / 0x100c / 0x15e0` | `signal_attempts=0` | **`signal_attempts=0`** (unchanged) | +| Canary-only export divergence | 11 missing | **10 missing** (XGetAVPack matched) | +| `draws` | 0 | 0 (unchanged) | +| `swaps` | 2 | 2 (unchanged) | + +The +2.4× import_calls jump (407K → 987K at -n 100M) is real +deterministic divergence into the canary-correct path inside +`sub_824AB578`'s AV/crypto block, not non-determinism (3 reruns +counter-set bit-equal). + +## What DID NOT cascade + +The 10 still-missing exports per the post-fix canary set-diff: + +``` + 2 ExTerminateThread + 268 KeReleaseSemaphore + 1 KeResetEvent + 2 NtDeviceIoControlFile + 1 ObCreateSymbolicLink + 1 XamTaskCloseHandle + 1 XamTaskSchedule + 2 XamUserReadProfileSettings + 1 XeCryptSha + 1 XeKeysConsolePrivateKeySign +``` + +So the AV/crypto block at `sub_824AB578` aborts somewhere between +`XGetAVPack` returning at `lr=0x824ab5a4` and `XeCryptSha`. + +## NEW telemetry signal — `lr=0x824a97e4` fires post-fix + +`RtlNtStatusToDosError(r3=0xc0000011 ...)` (`STATUS_NOT_IMPLEMENTED`) +fires from `lr=0x824a97e4` ~350µs after `XGetAVPack` returns. That +PC is inside `sub_824A9710` (the priv-11 site). So `sub_824A9710` +IS being entered now — but the priv-11 `XexCheckExecutablePrivilege` +call inside it never fires (only priv-10 call appears in the +trace), meaning there's a precondition check that exits the +function before the priv-11 query. + +This matches AUDIT-005's identification of `lr=0x824A97E4` as "the +error path inside sub_824A9710 *after* sub_824ABA98 returned +negative NTSTATUS". That call chain is now active. + +## Next-session candidates (ranked by confidence) + +1. **`sub_824ABA98` returning negative NTSTATUS.** AUDIT-005's + disasm shows `sub_824ABA98` is invoked between the AV/crypto + block and the `lr=0x824A97E4` error path. If we trace what + `sub_824ABA98` does (likely device-IO or a query), we'll find + the next stub/wrong-impl that needs canary-faithful behavior. + This is the strongest signal because we have a concrete PC. + +2. **`XeCryptSha` / `XeKeysConsolePrivateKeySign` are `stub_success`** + (`crates/xenia-kernel/src/exports.rs:188-189`). Returning + `STATUS_SUCCESS` without writing the output buffer may cause + the caller to read zero hashes and abort. Implementing real + crypto stubs (canary's are also stubs but with side-effect + buffer fills) might be the cause — but this is downstream of + `sub_824ABA98` so address #1 first. + +3. **The 4 sibling exports inside `sub_824AB578`'s tail** — + `NtDeviceIoControlFile ×2` (real impl, may return wrong status + for unknown IOCTLs), `ObCreateSymbolicLink` (registered but + may not be canary-faithful). Probe one at a time after #1. + +## Files touched this session + +- `crates/xenia-kernel/src/xam.rs` (-1/+1 line change, +6-line + unit test `xget_avpack_returns_hdmi`) +- `crates/xenia-app/tests/golden/sylpheed_n50m.json` (re-baselined: + `instructions: 50000005 → 50000004`, `imports: 407417 → 407416`) +- `crates/xenia-app/tests/golden/sylpheed_n2m.json` (re-baselined + for hygiene; orphan, no test references it but kept consistent + with current reality) +- `audit-findings.md` (KRNBUG-XAM-001 entry appended) +- `audit-runs/post-fix/ours-500m.log` (5.6M lines, 500M-instruction + trace artifact) + +## Stop conditions check + +- Determinism: ✅ 3 lockstep reruns at -n 100M and -n 50M + bit-identical. +- No new crash/hang vs 0x16: ✅ run finishes cleanly at -n 500M + (`EXIT=0`). +- Other 10 canary-only exports: ❌ unchanged. Per the hand-off + spec, this is the **save findings, hand back** branch — do + NOT pivot into a fix. + +## Master HEAD + +After commit + no-ff merge: see `git log --oneline -5` (one new +commit on master atop `33e49e7 Merge xex-check-privilege/p0-real-impl`). diff --git a/migration/claude-memory/project_xenia_rs_xam_task_schedule_2026_05_03.md b/migration/claude-memory/project_xenia_rs_xam_task_schedule_2026_05_03.md new file mode 100644 index 0000000..ee881e3 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_xam_task_schedule_2026_05_03.md @@ -0,0 +1,98 @@ +--- +name: xenia-rs XamTaskSchedule producer fix (2026-05-03) +description: XAMBUG-PRODUCER-001 — replaced XamTaskSchedule stub with real spawn; hypothesis falsified for parked handles 0x1004/0x100c/0x15e4 because call site is never reached within 500M. +type: project +originSessionId: 5345124a-4867-431b-8860-f7ef18fe8cbd +--- +🎯 **XAM PRODUCER HUNT — HYPOTHESIS FALSIFIED, FIX LANDED (2026-05-03)** + +Master HEAD: `38f78c8` (merge of `xam-task-schedule-producer/p0-spawn-real-thread`). + +**Why:** the audit-2026-05-followup-session memory file flagged +`XamTaskSchedule` as the highest-EV producer candidate for the 3 parked +Event/Manual handles 0x1004 / 0x100c / 0x15e4 (signal_attempts=0 after +500M). Sylpheed has exactly one XamTaskSchedule call site at PC +`0x824a9a10` inside `sub_824A9710`. The pre-fix kernel stub +(`xam.rs:204`) just allocated a handle and returned success — never +spawned a thread. + +**How to apply:** if a future session finds another stubbed kernel +export that *should* drive a missing producer, follow the same template: +allocate a `ThreadImage`, allocate a `KernelObject::Thread`, call +`Scheduler::spawn` with `start_context = ` whatever pointer canary +passes as the XThread constructor arg. + +## What landed + +- `crates/xenia-kernel/src/xam.rs:204` — real implementation mirroring + `xenia-canary/src/xenia/kernel/xam/xam_task.cc:43-80`. Stack sized + `max(0x4000, page-aligned 0x10_0000)`. `start_context = message_ptr` + (so the spawned thread enters its callback with `r3 = message_ptr`, + identical to canary's `XThread` ctor positional arg). +- New unit test `xam::tests::xam_task_schedule_spawns_real_thread` + asserts spawned thread's `pc == callback` and `gpr[3] == message_ptr`. + **Note:** the test uses an initial-thread handle of `0xF000_0001` to + avoid colliding with the first `alloc_handle()` (which starts at + `0x1000`). Calling `find_by_handle(handle)` would otherwise pick the + initial thread, not the spawn. +- `audit-findings.md` — new section "Producer-hunt session 2026-05-03" + with the closed XAMBUG-PRODUCER-001 entry and the next-candidate + recommendation. + +## Verification deltas + +| metric | baseline | after fix | +|--------|----------|-----------| +| cargo test | 561 | 562 | +| `--stable-digest -n 100M` instructions | 100000002 | 100000002 (unchanged ✓) | +| `--stable-digest -n 100M` import_calls | 987685 | 987685 (unchanged) | +| `--stable-digest -n 100M` VdSwap counter | 2 | 2 | +| 0x1004/0x100c/0x15e4 signal_attempts @ 500M | 0,0,0 | **0,0,0** | +| `kernel.calls{XamTaskSchedule}` @ 500M | absent | **absent** | + +## Decisive negative finding + +**XamTaskSchedule is never called within 500M instructions.** The metrics +counter never appears, so Sylpheed's only call site (`0x824a9a10`) is +unreached during the boot horizon we measure. That means +XamTaskSchedule cannot be the producer for handles 0x1004 / 0x100c / +0x15e4 in the current run — boot stalls earlier on those very handles. + +Engineering takeaway: **the canary-parity audit was correct that +XamTaskSchedule is the most natural-looking producer candidate based +on the wait-wrapper LR (`0x824ac578`)**, but the call to it is +downstream of the deadlock. Don't pivot to a deeper XAM rewrite; +instead pursue the next producer candidate. + +## Next session recommendations + +1. **`XAudioRegisterRenderDriverClient`** (highest EV next). Counter = + 1 in the trace (it IS being called). Currently a one-shot stub. + Audio buffer-complete callbacks are a documented kernel signal + source. Mirror canary in `xenia-canary/src/xenia/kernel/xam/xam_audio.cc` + (or wherever it lives in canary — confirm path before touching). +2. If audio is also falsified: walk callers of + `lr=0x824ac578` (the shared wait-wrapper) and identify which + producer each waiter handle expects. +3. Then file I/O completion (`signal_io_completion_event` is real + but may be mis-routed for these specific handles). +4. Timer DPC delivery — `KeSetTimer` real impl but APC delivery may + not target the right thread. + +## Engineering gotcha (test scaffolding) + +`exports::tests::fresh()` uses initial-thread handle `0x1000`, which +collides with the very first `alloc_handle()` return (next_handle +starts at 0x1000 incrementing by 4). For tests that subsequently +call `find_by_handle(allocated_handle)`, this returns the **initial +thread**, not the spawned one — silently testing the wrong thread. +The xam test sidesteps this with a high-bit handle (`0xF000_0001`). +If a future test fails with `pc=0` after spawn, this is the cause. + +## Constraints honored + +- No backwards-compat shims, no speculative abstractions. +- No comments narrating the task or the fix. +- Single focused commit on a feature branch. +- No-ff merge to master. +- Did not push to remote. diff --git a/migration/claude-memory/project_xenia_rs_xaudio_register_driver_2026_05_03.md b/migration/claude-memory/project_xenia_rs_xaudio_register_driver_2026_05_03.md new file mode 100644 index 0000000..6f4246c --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_xaudio_register_driver_2026_05_03.md @@ -0,0 +1,143 @@ +--- +name: xenia-rs APUBUG-PRODUCER-001 — XAudio register driver client + falsified producer hypothesis +description: Replaced XAudioRegisterRenderDriverClient stub with canary-faithful registration + ticker + injection (gated `--xaudio-tick`/`XENIA_XAUDIO_TICK=1`). 562→576 tests. Producer hypothesis FALSIFIED for handles 0x1004/0x100c/0x15e4. Next candidate: Timer DPC. +type: project +originSessionId: d5ac818e-23a0-4d07-90e9-eb180836cf06 +--- +# APUBUG-PRODUCER-001 — XAudio buffer-complete producer hypothesis falsified + +## Outcome + +**🎯 Hypothesis falsified for parked handles `0x1004` / `0x100c` / `0x15e4`.** At +`-n 500M --xaudio-tick` all 3 handles still show +`signal_attempts=0 (primary=0, ghost=0)`. The audio callback path is **not** the +missing producer (or, more weakly: not the first iteration of the callback). + +The implementation lands regardless — the prior stub was a real correctness gap +and the new module sets up the substrate for any future audio work. + +**Why:** rules out the strongest remaining candidate from the +`project_xenia_rs_audit_2026_05_followup_session.md` shortlist after XAM was +falsified. Producer hunt continues with Timer DPC next. + +**How to apply:** when picking the next producer candidate, do not retry +XAudio. Move on to Timer DPC delivery (`KeSetTimer` / `KeInsertQueueDpc`) or +file I/O completion event routing. + +## What changed (commit on `xaudio-register-driver/p0-real-callback-loop`) + +- New module `crates/xenia-kernel/src/xaudio.rs`: + `XAudioClient`, `XAudioState` (`clients[8]`, pending FIFO, dual-mode + ticker), `XAUDIO_INSTR_PERIOD = 48_000`, `XAUDIO_PERIOD = 5.333 ms`, + `INTERRUPT_SOURCE_AUDIO = 0x100`. +- `crates/xenia-kernel/src/exports.rs`: + - `xaudio_register_render_driver` reads `callback_ptr[0..1]`, + allocates a 4-byte guest heap buffer holding `callback_arg` + big-endian as `wrapped_callback_arg`, registers in `XAudioState`, + writes `0x4155_xxxx` to `*driver_ptr` — canary parity per + `xboxkrnl_audio.cc:56-82` + `audio_system.cc:202-237`. + - `xaudio_unregister_render_driver` clears the slot. + - `xaudio_submit_render_driver_frame` is `stub_success`. +- `crates/xenia-app/src/main.rs`: + - `try_inject_audio_callback` mirrors `try_inject_graphics_interrupt`, + sets r3 = `wrapped_callback_arg` per canary + `processor_->Execute(callback, args=[wrapped_callback_arg], 1)`. + - Uses the **same** `SavedCallbackCtx` machinery (per the user's + "don't invent a new mechanism" constraint). + - Round prologue calls `kernel.xaudio.tick_*` + injector only when + `kernel.xaudio_tick_enabled`. + - New `--xaudio-tick` CLI flag + `XENIA_XAUDIO_TICK=1` env var. + +## Why the ticker is gated default-off + +When the audio callback fires under Sylpheed: + +| metric | -n 100M default | -n 100M `--xaudio-tick` | +|-----------------------------------|-----------------|--------------------------| +| `instructions` | 100000002 | 100000001 (boundary shift) | +| `swaps` | 2 | **1 (REGRESSION)** | +| `imports` | 987,685 | 12,332,991 (12×) | +| `kernel.calls{KeWaitForSingleObject}` | small | 4,098,516 | +| `xaudio.callback.delivered` | 0 | 1 (fires once, never returns) | + +The callback is injected once via `SavedCallbackCtx` push/pop, but Sylpheed's +audio callback evidently waits on **something** the canary host worker would +have provided (likely a semaphore credit on `client_semaphore`, drained by +`OnBufferEnd` from the real XAudio2 driver). Without the corresponding +host-side fence, the callback enters an infinite `KeWaitForSingleObject` loop, +hijacking the chosen guest HW thread (which prevents the second `VdSwap` from +firing — hence `swaps=2 → 1`). + +Default-off preserves the lockstep `sylpheed_n*m.json` goldens +(`instructions=100000002`, `swaps=2` exactly). + +**Why:** firing the callback shifts the boot trajectory enough to break +existing goldens; producers that block need a different injection model. + +**How to apply:** any future session that wants to test audio-driven producers +must (a) flip `--xaudio-tick` / `XENIA_XAUDIO_TICK=1`, and (b) accept that +goldens won't match. To make the audio path safe-by-default, the next +implementer should switch from SavedCallbackCtx-hijack to a dedicated +spawned guest thread (canary's `XHostThread` audio worker). + +## Test/verify recipe + +```bash +cd xenia-rs + +# 1. Tests — 562 baseline + 14 new = 576 green +cargo test --release --workspace + +# 2. Lockstep golden preserved at default-off +cargo run --release --quiet -p xenia-app -- check 'sylpheed.iso' \ + --stable-digest -n 100000000 +# → instructions=100000002, swaps=2 (matches pre-change baseline) + +# 3. Audio firing path (deterministic, swap regression expected) +cargo run --release --quiet -p xenia-app -- check 'sylpheed.iso' \ + --stable-digest --xaudio-tick -n 100000000 +# → instructions=100000001 (boundary shift), swaps=1, xaudio.callback.delivered=1 + +# 4. Producer hunt run (the falsifier) +cargo run --release --quiet -p xenia-app -- exec 'sylpheed.iso' \ + --halt-on-deadlock --trace-handles-focus=0x1004,0x100c,0x15e4 \ + --xaudio-tick -n 500000000 +# → all 3 handles: signal_attempts=0 (primary=0, ghost=0) +``` + +## Engineering gotchas + +- **r3 = wrapped pointer, NOT raw callback_arg**: canary + `audio_system.cc:139-141` passes `args[0] = clients_[index].wrapped_callback_arg` + (which is the heap pointer holding `callback_arg` BE), not `callback_arg` + itself. Easy to get wrong. +- **`heap_alloc(4)` page-aligns to 4 KB**, wasting most of a page per client. + Trivial cost for max 8 clients (32 KB lifetime); fine. +- **`SavedCallbackCtx::source = 0x100`** for audio (vs 0/1 for VSYNC/CP) — + added `INTERRUPT_SOURCE_AUDIO` const in `xaudio.rs` to keep the audit/log + layer disambiguating cleanly. +- **The graphics-interrupt restore prologue at `main.rs:1769-1799` is + source-agnostic** — it restores any saved ctx regardless of `source`. So + audio callbacks reuse it for free. The log message still says "graphics + interrupt: callback returned" but `source=0x100` makes it clear. + +## Next session — strongest remaining producer candidates + +1. **Timer DPC delivery** — `KeSetTimer` / `KeInsertQueueDpc`. Stubs/partials + in `exports.rs`. Xbox 360 driver code commonly uses timer-driven DPCs to + drive audio/IO state machines; the parked Manual events on the 3 focus + handles look like classic DPC-fed signal targets. +2. **File I/O completion event routing** — `signal_io_completion_event` + exists; check whether the events are hooked to the right NTFS-style I/O + completion path the guest expects. +3. **(deferred)** Spawn a real `XHostThread`-equivalent for audio if Timer + DPC is also falsified. That would make `--xaudio-tick`-style firing + safe-by-default and re-open the audio-producer question without the + SavedCallbackCtx-hijack regression. + +## Master HEAD + +- Pre-session: `38f78c8` (Merge xam-task-schedule-producer/p0-spawn-real-thread). +- Post-session: `9d45efe` (Merge xaudio-register-driver/p0-real-callback-loop, APUBUG-PRODUCER-001). +- Feature branch: `xaudio-register-driver/p0-real-callback-loop` (commit + `07068e7`), no-ff merged. diff --git a/migration/claude-memory/project_xenia_rs_xex_priv_fix_2026_05_04.md b/migration/claude-memory/project_xenia_rs_xex_priv_fix_2026_05_04.md new file mode 100644 index 0000000..9371010 --- /dev/null +++ b/migration/claude-memory/project_xenia_rs_xex_priv_fix_2026_05_04.md @@ -0,0 +1,117 @@ +--- +name: KRNBUG-XEX-001 — XexCheckExecutablePrivilege real impl (P0 fix landed) +description: 2026-05-04 follow-up to AUDIT-005 — replaced stub_return_zero with XEX-header-driven real impl; flipped one of two priv gates; identified next-frontier blocker (XGetAVPack return value) +type: project +originSessionId: cd1f4cb8-9601-49fa-84fb-eb68353dabd7 +--- +# KRNBUG-XEX-001 (2026-05-04, branch `xex-check-privilege/p0-real-impl`) + +**Why:** AUDIT-005 (master `451b3b2`) identified +`XexCheckExecutablePrivilege` as `stub_return_zero` while canary uses +the real privilege bitmap from `XEX_HEADER_SYSTEM_FLAGS = 0x00030000`. +Two Sylpheed call sites query priv 10 and priv 11 with opposite branch +polarities — both stub-returns-zero put guest on the wrong arm of every +priv-gated branch, gating the entire init flow that populates the +dispatcher fields the parked-handle producers read. + +**How to apply:** This memory describes a landed fix and its +next-frontier hand-off. Future sessions starting from where this leaves +off should read the full chain-of-effects in +`xenia-rs/audit-findings.md` under `KRNBUG-XEX-001`. + +## What changed + +5 files, ~25 LOC of plumbing + 22 LOC of impl + 1 unit test: + +- `xenia-xex/src/header.rs`: `header_keys::SYSTEM_FLAGS = 0x00030000`. +- `xenia-xex/src/loader.rs`: `get_system_flags(&Xex2Header) -> u32`. +- `xenia-kernel/src/state.rs`: `pub xex_system_flags: u32` field + + `xex_priv_logged: HashSet` one-shot log gate. +- `xenia-kernel/src/exports.rs`: real `xex_check_executable_privilege` + body returning `(state.xex_system_flags >> priv) & 1` for priv < 32, + else 0. Mirrors canary `xboxkrnl_modules.cc:22-39`. +- `xenia-app/src/main.rs`: wired + `kernel.xex_system_flags = xenia_xex::loader::get_system_flags(&header)`. + +Sylpheed's bitmap is `0x00000400` (bit 10 only, +`XEX_SYSTEM_PAL50_INCOMPATIBLE`). + +## Validation chain (key data) + +| Verification | Result | +|---|---| +| `cargo test --workspace --release` | 588 → 589 green; new test `xex_check_executable_privilege_reads_system_flags_bitmap` covers priv 10/11/0/64 + zero-flags case | +| `--stable-digest -n 100M` lockstep | `instructions=100000013` (was 100000002 — +11 deterministic shift, repro x2) | +| `--stable-digest -n 50M` lockstep | `instructions=50000005, imports=407417, swaps=2, draws=0` (was 50000008, 407415, 2, 0) — repro x3 identical | +| `sylpheed_oracles` (n50m, n2m) | re-baselined; oracle test green | +| AUDIT-005 9-PC probe at -n 500M | 9/9 still 0×. **BUT** `XGetAVPack: 0 → 1` proves the priv-10 gate flipped. | +| `--trace-handles-focus=0x1004,0x100c,0x15e0` -n 500M | all 3 still `signal_attempts=0` | + +## Decisive interpretation + +The fix is **correct but partial**. The priv-10 gate at +`lr=0x824ab598` (in `sub_824AB578`) now lets the AV/crypto block +execute (`XGetAVPack` reached). The priv-11 gate at `lr=0x824a99a4` +lives inside `sub_824A9710`, which boot does NOT reach because +something in the AV/crypto block aborts early. + +Of the 11 canary-only kernel calls AUDIT-005 identified, only +`XGetAVPack` is now firing. The other 10 (`XeCryptSha`, +`XeKeysConsolePrivateKeySign`, `ObCreateSymbolicLink`, +`XamUserReadProfileSettings`, `XamTaskSchedule`, `XamTaskCloseHandle`, +`ExTerminateThread`, `KeReleaseSemaphore`, `KeResetEvent`, +`NtDeviceIoControlFile`) remain absent. + +`XexCheckExecutablePrivilege` itself is called exactly once at +`-n 500M` (priv=10, lr=0x824ab598). The priv-11 site never fires. + +## Next-frontier bug (single most likely candidate) + +**`XGetAVPack` return value mismatch.** Our impl +(`xenia-kernel/src/xam.rs:382-384`) returns `0x16`. Canary returns +`cvars::avpack` (default 8 = HDMI). Per canary's comment at +`xam_info.cc:250-251`: "if the result is not 3/4/6/8 they explode with +errors if not in PAL mode." `0x16` is outside the accepted set → +Sylpheed likely aborts the AV/crypto block when our `XGetAVPack` +returns 0x16, which would explain why nothing past `XGetAVPack` +fires. + +**Recommended next session, single change:** in +`crates/xenia-kernel/src/xam.rs:383`, change `ctx.gpr[3] = 0x16` → +`ctx.gpr[3] = 8`. Re-run AUDIT-005 PC-probe + handle-trace at -n 500M. +If the AV/crypto block now completes, the priv-11 site +`sub_824A9710` should be reached, the second priv gate flips, and the +producer init flow should run. + +If `XGetAVPack=8` doesn't help, `XeCryptSha` / +`XeKeysConsolePrivateKeySign` (both currently `stub_success`) are the +next candidates — replace with real implementations matching canary's +behavior in `XeCrypt.cc`. + +## Engineering gotchas captured + +1. **Lockstep instructions can shift even when only return values + change.** The author's hand-off prediction + ("instructions field MUST stay at 100000002") was incorrect — + changing the priv return value flips an early branch, which shifts + what the guest is doing at the cap-cross. The right + determinism-check is `repro x2` (or x3) returning identical digests, + not "did the value change". Verified deterministic across 3 runs. + +2. **`XEX_HEADER_SYSTEM_FLAGS` (key 0x00030000) is inline.** Low byte + of the optional-header key being 0x00 means the `value` field IS + the u32 directly (canary `xex_module.cc:103-108`). No extra + indirection; just read `optional_headers.find(key=0x00030000).value`. + +3. **One-shot log gate via HashSet on KernelState** is the cleanest + pattern for per-priv tracing without spamming. `state.xex_priv_logged.insert(priv)` + returns true on first insert. + +## Trace artifacts + +- `xenia-rs/audit-runs/post-priv-fix/ours.log` (5.6M lines, 500M-insn + run with PC-probe + handle-focus; XGetAVPack: 1 confirmed). +- Goldens regenerated: `crates/xenia-app/tests/golden/sylpheed_n50m.json`, + `crates/xenia-app/tests/golden/sylpheed_n2m.json`. +- Full chain-of-effects in `xenia-rs/audit-findings.md` under + `KRNBUG-XEX-001`. diff --git a/migration/project-root/dot-claude/settings.json b/migration/project-root/dot-claude/settings.json new file mode 100644 index 0000000..051f0c0 --- /dev/null +++ b/migration/project-root/dot-claude/settings.json @@ -0,0 +1,72 @@ +{ + "permissions": { + "allow": [ + "Bash", + "Bash(cmake *)", + "Bash(apt-cache *)", + "Bash(cargo build *)", + "Bash(cargo clippy *)", + "Bash(cargo check *)" + ], + "deny": [ + "Bash(git push*)", + "Bash(git push -f*)", + "Bash(git commit --no-verify*)", + "Bash(git commit -n*)", + "Bash(git config --global*)", + "Bash(sudo)", + "Bash(sudo *)", + "Bash(su)", + "Bash(su *)", + "Bash(doas *)", + "Bash(curl *)", + "Bash(wget *)", + "Bash(ssh *)", + "Bash(scp *)", + "Bash(rsync *)", + "Bash(nc *)", + "Bash(ncat *)", + "Bash(telnet *)", + "Bash(ftp *)", + "Bash(sftp *)", + "Bash(ping *)", + "Bash(rm -rf /)", + "Bash(rm -rf /*)", + "Bash(rm -rf ~)", + "Bash(rm -rf ~/*)", + "Bash(rm -rf $HOME*)", + "Bash(rm -rf .)", + "Bash(rm -rf ..)", + "Bash(rm -rf *)", + "Bash(dd *)", + "Bash(mkfs*)", + "Bash(fdisk *)", + "Bash(parted *)", + "Bash(mount *)", + "Bash(umount *)", + "Bash(shutdown*)", + "Bash(reboot*)", + "Bash(poweroff*)", + "Bash(halt*)", + "Bash(systemctl *)", + "Bash(service *)", + "Bash(crontab *)", + "Bash(iptables *)", + "Bash(chmod -R 777 *)", + "Bash(chown -R *)" + ] + }, + "hooks": { + "Stop": [ + { + "hooks": [ + { + "type": "command", + "command": "n=0; for name in xenia_canary xenia-rs; do pids=$(pgrep -x \"$name\" 2>/dev/null || true); if [ -n \"$pids\" ]; then cnt=$(echo \"$pids\" | wc -l); n=$((n + cnt)); kill $pids 2>/dev/null; sleep 0.2; kill -9 $pids 2>/dev/null || true; fi; done; if [ \"$n\" -gt 0 ]; then printf '{\"systemMessage\":\"Stop hook killed %d stale xenia process(es)\"}' \"$n\"; fi", + "timeout": 5 + } + ] + } + ] + } +} diff --git a/migration/project-root/ppc-manual/README.md b/migration/project-root/ppc-manual/README.md new file mode 100644 index 0000000..f74f2b4 --- /dev/null +++ b/migration/project-root/ppc-manual/README.md @@ -0,0 +1,115 @@ +# PowerPC Instruction Manual (Xenia Xbox 360 Subset) + +A reference for the **Xenon** PowerPC dialect used by the Xbox 360. Its +primary audience is an AI agent translating PPC assembly functions into +equivalent C. The content is derived from the two authoritative sources in +this repository — **xenia-canary** (C++ emulator) and **xenia-rs** (Rust +rewrite) — and may be deepened with the IBM AIX PowerPC reference. + +- **455** distinct XML-level instructions (one page each). +- **350** instruction family pages (VMX128 siblings folded). +- **598** assembly mnemonics once runtime `Rc`/`OE`/`LK` variants are expanded — all resolvable through `index.json`. + +## How to use this manual (translation agent) + +1. Parse the 32-bit instruction word and identify the mnemonic. Resolve it + through [`index.json`](index.json): every assembly form (including + `add.`, `addo.`, `bclrl`, …) is a top-level key pointing at a page. +2. Open the page referenced by `index.json[mnem].page`. The page is in a + fixed format — see the "Page anatomy" section below. +3. Emit a C translation consistent with the page's pseudocode, the + registers-affected list, and the status-register effects. + +## Page anatomy + +Every instruction page has the same sections, in this order: + +| Section | Purpose | +| --- | --- | +| **Assembler Mnemonics** | Table of every runtime variant (Rc/OE/LK) the base XML entry covers, plus VMX128 siblings. | +| **Syntax** | Canonical assembly template with `[OE]`/`[Rc]`/`[LK]` bracketed-modifier notation. | +| **Encoding** | Form name, opcode word, primary/extended opcodes, and bit-layout table. | +| **Operands** | Every bit-field operand, its role per variant, and its meaning. | +| **Register Effects** | Unconditional vs. conditional reads and writes, per variant. | +| **Status-Register Effects** | CR0/CR1/CR6, XER[CA/OV/SO], FPSCR, VSCR updates. | +| **Operation** | PPC-style pseudocode (`RT <- …`, `EXTS(…)`, `MEM(EA, n)`). | +| **C Translation Example** | Minimal idiomatic C rendering a translator could emit. | +| **Implementation References** | Direct links into `xenia-canary/` and `xenia-rs/` with line numbers. | +| **Special Cases & Edge Conditions** | RA=0, alignment, endian byte-reverse, reservation, SPR remapping, VMX128 fusion. | +| **Related Instructions** | Sibling cross-links. | +| **IBM Reference** | Optional link to IBM AIX PPC reference for canonical pseudocode. | + +Sections between the `` and `` +sentinels are produced by [`generator/generate_manual.py`](generator/generate_manual.py) +and re-generated on every run. Sections outside the sentinels are +hand-written and preserved across re-runs. + +## Conventions + +- **Bit numbering** follows PowerPC (big-endian, bit 0 = MSB). +- **GPRs** are 64-bit. 32-bit operations operate on bits `[32:63]` and + conventionally write the low 32 bits with zero- or sign-extension into + the high 32 bits. Page pseudocode makes this explicit when it matters. +- **Vector registers** are 128-bit with **lane 0 at the most-significant + byte** (big-endian lane indexing). On x86 hosts byte-swap is applied at + load/store to preserve this invariant. +- **CR** is 8 × 4-bit fields `CR0..CR7`, each `{LT, GT, EQ, SO}`. The record + form of arithmetic instructions writes CR0 (integer) or CR1 (FPU); the + record form of vector compare writes CR6 = `{all-true, 0, all-false, 0}`. +- **XER** holds `SO`, `OV`, and `CA` at bits 32, 33, 34 respectively + (PPC bit numbering), plus a 7-bit string length used by `lswi`/`stswi`. + +## Categories + +| Category | Families | XML entries | Description | +| --- | --- | --- | --- | +| [Integer ALU](categories/alu.md) | 70 | 70 | Fixed-point add/sub/multiply/divide, logical, rotate, shift, compare, count-leading-zeros, sign-extension, trap-on-condition. | +| [Branch & System](categories/branch.md) | 9 | 9 | Unconditional / conditional branches, branch to LR/CTR, traps, system call. | +| [Control / CR / SPR](categories/control.md) | 26 | 26 | Condition-register logical ops, CR field moves, mfspr/mtspr/mtcrf, time-base reads, synchronisation (sync, isync, eieio). | +| [Floating-Point](categories/fpu.md) | 33 | 33 | IEEE-754 add/sub/mul/div/sqrt, fused multiply-add, conversions, compares, FPSCR moves. | +| [Memory](categories/memory.md) | 56 | 112 | Loads/stores for byte, half, word, doubleword, float, multiple and string; cache management (dcbt, dcbf, dcbz); reservation pair lwarx/stwcx. | +| [VMX (Altivec)](categories/vmx.md) | 144 | 193 | 128-bit SIMD over 32 registers V0–V31. Integer/float arithmetic, logical, compare, permute/merge, pack/unpack, saturation helpers. | +| [VMX128](categories/vmx128.md) | 12 | 12 | Xbox-360-specific Altivec extension that widens the vector register file to 128 registers (V0–V127). Register IDs are encoded with bit-fusion across non-contiguous fields. | + +## Forms + +| Form | Count | Page | +| --- | --- | --- | +| `A` | 21 | [forms/A.md](forms/A.md) | +| `B` | 1 | [forms/B.md](forms/B.md) | +| `D` | 40 | [forms/D.md](forms/D.md) | +| `DCBZ` | 2 | [forms/DCBZ.md](forms/DCBZ.md) | +| `DS` | 5 | [forms/DS.md](forms/DS.md) | +| `I` | 1 | [forms/I.md](forms/I.md) | +| `M` | 3 | [forms/M.md](forms/M.md) | +| `MD` | 4 | [forms/MD.md](forms/MD.md) | +| `MDS` | 2 | [forms/MDS.md](forms/MDS.md) | +| `SC` | 1 | [forms/SC.md](forms/SC.md) | +| `VA` | 14 | [forms/VA.md](forms/VA.md) | +| `VC` | 13 | [forms/VC.md](forms/VC.md) | +| `VX` | 117 | [forms/VX.md](forms/VX.md) | +| `VX128` | 34 | [forms/VX128.md](forms/VX128.md) | +| `VX128_1` | 16 | [forms/VX128_1.md](forms/VX128_1.md) | +| `VX128_2` | 1 | [forms/VX128_2.md](forms/VX128_2.md) | +| `VX128_3` | 15 | [forms/VX128_3.md](forms/VX128_3.md) | +| `VX128_4` | 2 | [forms/VX128_4.md](forms/VX128_4.md) | +| `VX128_5` | 1 | [forms/VX128_5.md](forms/VX128_5.md) | +| `VX128_P` | 1 | [forms/VX128_P.md](forms/VX128_P.md) | +| `VX128_R` | 5 | [forms/VX128_R.md](forms/VX128_R.md) | +| `X` | 117 | [forms/X.md](forms/X.md) | +| `XFL` | 1 | [forms/XFL.md](forms/XFL.md) | +| `XFX` | 4 | [forms/XFX.md](forms/XFX.md) | +| `XL` | 12 | [forms/XL.md](forms/XL.md) | +| `XO` | 21 | [forms/XO.md](forms/XO.md) | +| `XS` | 1 | [forms/XS.md](forms/XS.md) | + +## Regenerating this manual + +```bash +python3 generator/generate_manual.py +``` + +Re-running the generator is safe — it only rewrites sections between +`` / `` sentinels. Add +your hand-written content below the `END` marker and it will be +preserved. diff --git a/migration/project-root/ppc-manual/TEMPLATE.md b/migration/project-root/ppc-manual/TEMPLATE.md new file mode 100644 index 0000000..8010c00 --- /dev/null +++ b/migration/project-root/ppc-manual/TEMPLATE.md @@ -0,0 +1,159 @@ +# Page Template — Canonical Structure + +This file is the **reference for every instruction page** in the manual. +It documents the section order, what each section is for, and the +formatting conventions that the Phase 1 generator emits and Phase 2 +reviewers enhance. + +Do **not** copy this file to create a new page. Instruction pages are +produced by `generator/generate_manual.py` and should stay under the +generator's control. + +--- + +## Page anatomy (section order) + +Every page follows this skeleton: + +```markdown +# `` — + +> **Category:** [](../categories/.md) · **Form:** [
](../forms/.md) · **Opcode:** `0x........` + + + +## Assembler Mnemonics + + +## Syntax + + +## Encoding + + +## Operands + + +## Register Effects + + +## Status-Register Effects + + +## Operation (pseudocode) + + +## C Translation Example + + +## Implementation References + block with the frozen interpreter body snapshot> + + + +## Special Cases & Edge Conditions + + +## Related Instructions + + +## IBM Reference + +``` + +### Sentinel markers + +The `` / `` pair +separates machine-generated content from hand-written enhancement. + +- **Inside the sentinels:** rewritten on every generator run. Do not + edit — your changes will be overwritten. +- **Outside the sentinels:** preserved across regenerations. Put all + Phase 2 enhancements there. + +If a page's generated section is missing the `END` sentinel the +generator assumes a human has fully taken over and leaves the file +untouched. + +--- + +## Writing conventions + +### Pseudocode + +Follow IBM's AIX reference style: + +``` +EA <- (RA|0) + EXTS(d) +RT <- ZEXT32_to_64(MEM(EA, 4)) +``` + +- `<-` for assignment, `(X)` for "value of register X". +- `(RA|0)` for RA0 fields — literal 0 when the encoded register is 0. +- `EXTS(x)` = sign-extend, `ZEXT/SEXT_to_(x)` explicit when helpful. +- `||` for bit concatenation. +- `MEM(EA, n)` for an n-byte big-endian memory read. +- `CR[BF] <- ...` for CR field updates. +- Register bit numbering in pseudocode is **PowerPC big-endian** (bit 0 + is the MSB). `(RS)[56:63]` is the low byte of RS. + +### C translation + +- Use a pseudo-context of `r[]` (GPRs), `f[]` (FPRs), `v[]` (128-bit + vectors), `cr[]` (CR fields), and scalars `xer`, `lr`, `ctr`, `pc`. +- Prefer `int64_t`/`uint64_t` for 64-bit ops; cast to `int32_t` for + 32-bit sub-word ops and explicitly sign/zero-extend. +- Use `mem_read_u32_be` / `mem_write_vec128_be` style helpers to make + the big-endian memory model explicit. +- Show the base form always; add one annotated variant (e.g. `if + (insn.Rc) ...`) only when it genuinely changes the translation. + +### Bit ordering + +PowerPC big-endian bit numbering is used throughout the manual (bit 0 +is MSB). The same convention is used by IBM's reference and by the XML +source. Do not switch to Intel-style LSB-first numbering. + +### Vector lane indexing + +Altivec / VMX uses **big-endian lane indexing**: lane 0 is the +most-significant 16 bytes (or 4 words / 2 doublewords) of the vector. +On little-endian hosts (x86-64) byte-swap is applied at load/store to +preserve this invariant — call that out in the "Special Cases" section +when relevant. + +--- + +## Example: fully-reviewed page + +See `alu/addx.md` after Phase 2 review for the canonical look. It +demonstrates: + +- Complete operand descriptions (not TODO stubs). +- Pseudocode with explicit CR/XER updates. +- C translation covering the base form plus Rc and OE variants. +- "Special Cases" calling out 32-bit vs 64-bit overflow tracking. +- Cross-links to `addcx`, `addex`, `subfx`. + +--- + +## Golden-path spot-check list (Phase 2 review order) + +These pages are reviewed first; they anchor the expected quality bar: + +| Page | Reason | +| --- | --- | +| `alu/addx.md` | XO form with Rc+OE — representative ALU | +| `alu/addi.md` | D form with RA0 semantics — representative immediate | +| `memory/lwz.md` | D form load family (lwz/lwzu/lwzx/lwzux) | +| `memory/stvx.md` | Vector store with alignment mask | +| `memory/lvsl.md` | Permute-control generator | +| `branch/bclrx.md` | BO/BI conditional with LK | +| `branch/bx.md` | Absolute/relative branch | +| `control/mfspr.md` | SPR halves-swap encoding | +| `control/mtcrf.md` | CR field-mask update | +| `fpu/faddx.md` | Double-precision FPU + CR1 via Rc | +| `vmx/vaddfp.md` | VMX + VMX128 sibling on one page | +| `vmx/vperm.md` | Byte permute — lane-indexing discipline | +| `vmx128/vpkd3d128.md` | VMX128 orphan (no non-128 sibling) | diff --git a/migration/project-root/ppc-manual/alu/addcx.md b/migration/project-root/ppc-manual/alu/addcx.md new file mode 100644 index 0000000..617a2a2 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/addcx.md @@ -0,0 +1,138 @@ +# `addcx` — Add Carrying + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c000014` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `addc` | `addcx` | — | Add Carrying | +| `addco` | `addcx` | OE=1 | Add Carrying | +| `addc.` | `addcx` | Rc=1 | Add Carrying | +| `addco.` | `addcx` | OE=1, Rc=1 | Add Carrying | + +## Syntax + +```asm +addc[OE][Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `addcx` — form `XO` + +- **Opcode word:** `0x7c000014` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `10` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | addcx: read | Source GPR (`r0`–`r31`). | +| `RB` | addcx: read | Source GPR. | +| `RD` | addcx: write | Destination GPR. | +| `CA` | addcx: write | XER[CA] carry bit. Read by add-with-carry/subtract-with-borrow instructions, written by carrying instructions. | +| `OE` | addcx: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | +| `CR` | addcx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `addcx` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `CA` +- **Writes (conditional):** `OE`, `CR` + +## Status-Register Effects + +- `addcx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`.; **XER[CA]** ← carry-out of the add / borrow-in of the subtract (always). + +## Operation (pseudocode) + +``` +RT <- (RA) + (RB) +CA <- carry_out_of_32_or_64_bit_add((RA), (RB)) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`addcx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="addcx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:64`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L64) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:8`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L8) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:861`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L861) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:190-205`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L190-L205) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::addcx => { + // PPCBUG-013+020: 32-bit truncation; CA from u32 unsigned compare. + let ra32 = ctx.gpr[instr.ra()] as u32; + let rb32 = ctx.gpr[instr.rb()] as u32; + let result32 = ra32.wrapping_add(rb32); + ctx.xer_ca = if result32 < ra32 { 1 } else { 0 }; + ctx.gpr[instr.rd()] = result32 as u64; + if instr.oe() { + let true_sum = (ra32 as i32 as i128) + (rb32 as i32 as i128); + overflow::apply(ctx, true_sum != (result32 as i32) as i128); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, result32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Carry-out is mandatory.** `XER[CA]` is updated unconditionally — `addcx` exists *to* produce the carry. It seeds a multi-word add chain that continues with [`addex`](addex.md) for middle words and [`addzex`](addzex.md)/[`addmex`](addmex.md) for the final word. +- **Carry detection by overflow comparison.** Xenia computes `CA = (result < RA)` — the standard unsigned-add overflow test. Equivalent to `CA = (RA + RB) >> 64` mathematically. This is correct for the 64-bit operand width that the Xenon implements; the spec also allows a 32-bit width selected by the implementation but the 970/Xenon use 64-bit add throughout. +- **No trap on signed overflow.** `addco`/`addco.` only set `XER[OV]` and sticky `XER[SO]`; they do not raise an exception. Xenia-rs leaves the `OE` branch as a `// TODO` (see [`addx`](addx.md) for the same gap). +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** The `Rc=1` CR0 compare reads `result as i32 as i64` in [`interpreter.rs:97`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L97); spec demands the full 64-bit signed compare. Flag this as a xenia-rs quirk if you need bit-exact behaviour. +- **`XER[SO]` is sticky** — only `mcrxr` clears it. The `Rc=1` form folds it into `CR0[SO]`. +- **Operand aliasing is legal**, just like [`addx`](addx.md). `addc r3, r3, r3` simply doubles `r3` and records whether the result wrapped. + +## Related Instructions + +- [`addx`](addx.md) — same operation, but does **not** update `XER[CA]`. +- [`addex`](addex.md) — `RA + RB + XER[CA]`; chains a multi-word add after `addcx`. +- [`addmex`](addmex.md), [`addzex`](addzex.md) — terminate a carry chain by adding `−1` or `0` to `XER[CA]`. +- [`addic`](addic.md), [`addicx`](addicx.md) — D-form immediate variants that also write `XER[CA]`. +- [`subfcx`](subfcx.md) — the dual: produces a borrow-out in `XER[CA]`. + +## IBM Reference + +- [AIX 7.3 — `addc` (Add Carrying)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-addc-add-carrying-instruction) +- PowerISA v2.07B, Book I, §3.3.8 — Fixed-Point Add with Carry; defines `XER[CA]` semantics independent of operand width. diff --git a/migration/project-root/ppc-manual/alu/addex.md b/migration/project-root/ppc-manual/alu/addex.md new file mode 100644 index 0000000..94eeaaf --- /dev/null +++ b/migration/project-root/ppc-manual/alu/addex.md @@ -0,0 +1,139 @@ +# `addex` — Add Extended + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c000114` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `adde` | `addex` | — | Add Extended | +| `addeo` | `addex` | OE=1 | Add Extended | +| `adde.` | `addex` | Rc=1 | Add Extended | +| `addeo.` | `addex` | OE=1, Rc=1 | Add Extended | + +## Syntax + +```asm +adde[OE][Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `addex` — form `XO` + +- **Opcode word:** `0x7c000114` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `138` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | addex: read | Source GPR (`r0`–`r31`). | +| `RB` | addex: read | Source GPR. | +| `CA` | addex: read | XER[CA] carry bit. Read by add-with-carry/subtract-with-borrow instructions, written by carrying instructions. | +| `RD` | addex: write | Destination GPR. | +| `OE` | addex: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | +| `CR` | addex: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `addex` + +- **Reads (always):** `RA`, `RB`, `CA` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `OE`, `CR` + +## Status-Register Effects + +- `addex`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`. + +## Operation (pseudocode) + +``` +RT <- (RA) + (RB) + CA +CA <- carry_out_of_the_add +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`addex`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="addex"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:83`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L83) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:8`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L8) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:868`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L868) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:206-222`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L206-L222) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::addex => { + // PPCBUG-014+020: 32-bit truncation; CA from u32 unsigned compare. + let ra32 = ctx.gpr[instr.ra()] as u32; + let rb32 = ctx.gpr[instr.rb()] as u32; + let ca = ctx.xer_ca as u32; + let result32 = ra32.wrapping_add(rb32).wrapping_add(ca); + ctx.xer_ca = if result32 < ra32 || (ca != 0 && result32 == ra32) { 1 } else { 0 }; + ctx.gpr[instr.rd()] = result32 as u64; + if instr.oe() { + let true_sum = (ra32 as i32 as i128) + (rb32 as i32 as i128) + (ca as i128); + overflow::apply(ctx, true_sum != (result32 as i32) as i128); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, result32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Carry-in is consumed and carry-out is produced.** `addex` is the middle link of a multi-word add chain seeded by [`addcx`](addcx.md): `RT ← RA + RB + XER[CA]`, then `XER[CA] ← carry_out`. +- **Carry-out detection handles both edges.** Xenia checks `result < ra OR (ca != 0 && result == ra)` — that second clause covers the case where adding the carry-in alone causes the result to *exactly equal* `RA` (i.e. `RB == ~0 && CA == 1`), which still constitutes overflow. The naive `result < ra` test misses it. +- **No trap on signed overflow.** `addeo`/`addeo.` only update `XER[OV]` and sticky `XER[SO]`; xenia-rs leaves the `OE` branch unimplemented. +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** The `Rc=1` arm uses `result as i32 as i64`. For multi-word adds whose final word is the high 32 bits of a 64-bit value, this distinction matters; see [`addx`](addx.md). +- **`XER[CA]` must be initialised** by an earlier [`addcx`](addcx.md), [`subfcx`](subfcx.md), or `mtspr` to XER. Reading stale `CA` from an unrelated instruction is the most common bug in hand-written multi-word arithmetic. +- **`XER[SO]` is sticky** until cleared by `mcrxr`; `Rc=1` copies it into `CR0[SO]`. + +## Related Instructions + +- [`addcx`](addcx.md) — seeds the carry chain (no `CA` read, sets `CA`). +- [`addmex`](addmex.md), [`addzex`](addzex.md) — terminate the chain (`RA + −1 + CA`, `RA + 0 + CA`). +- [`addx`](addx.md) — plain add without `XER[CA]`. +- [`subfex`](subfex.md) — dual: `~RA + RB + XER[CA]`, used for multi-word subtract. +- [`addic`](addic.md) / [`addicx`](addicx.md) — immediate carrying adds that *initialise* a chain. + +## IBM Reference + +- [AIX 7.3 — `adde` (Add Extended)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-adde-add-extended-instruction) +- PowerISA v2.07B, Book I, §3.3.8 — defines the `RA + RB + CA` carry-chain composition. diff --git a/migration/project-root/ppc-manual/alu/addi.md b/migration/project-root/ppc-manual/alu/addi.md new file mode 100644 index 0000000..b5e8390 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/addi.md @@ -0,0 +1,116 @@ +# `addi` — Add Immediate + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x38000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `addi` | `addi` | — | Add Immediate | + +## Syntax + +```asm +addi [RD], [RA0], [SIMM] +``` + +## Encoding + +### `addi` — form `D` + +- **Opcode word:** `0x38000000` +- **Primary opcode (bits 0–5):** `14` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | addi: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `SIMM` | addi: read | 16-bit signed immediate. Sign-extended to 64 bits before use. | +| `RD` | addi: write | Destination GPR. | + +## Register Effects + +### `addi` + +- **Reads (always):** `RA0`, `SIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +if RA = 0 then RT <- EXTS(SIMM) +else RT <- (RA) + EXTS(SIMM) +``` + +## C Translation Example + +```c +/* addi RT, RA, SIMM — RA=0 means literal 0 */ +uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA]; +r[insn.RT] = base + (uint64_t)(int64_t)(int16_t)insn.SIMM; +``` + +## Implementation References + +**`addi`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="addi"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:103`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L103) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:8`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L8) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:338`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L338) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:114-120`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L114-L120) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::addi => { + // PPCBUG-001: 32-bit ABI. `li rT, -1` (= addi rT, r0, -1) must produce + // 0x00000000_FFFFFFFF, not 0xFFFFFFFF_FFFFFFFF (sign-extended simm16). + let ra_val = if instr.ra() == 0 { 0 } else { ctx.gpr[instr.ra()] }; + ctx.gpr[instr.rd()] = ra_val.wrapping_add(instr.simm16() as i64 as u64) as u32 as u64; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA0` semantics.** When the encoded `RA` field is `0` the operand is the literal constant `0`, **not** the value of `r0`. This lets `addi rT, 0, SIMM` load a constant (the `li rT, SIMM` simplified mnemonic). To use `r0`'s value you must use a register-register add (`add RT, r0, RB` through a temp) or an instruction without `RA0` semantics. +- **No flags written.** Unlike `add`, `addi` cannot be `Rc` or `OE` — no CR or XER update. Use [`addic`](addic.md) if you need `XER[CA]`, or [`addicx`](addicx.md) (`addic.`) if you need both `XER[CA]` and a CR0 update. +- **Immediate is 16-bit signed** (`SIMM`, range `−32768 … +32767`), sign-extended to 64 bits before the add. No carry/overflow is produced regardless of the result. +- **Simplified mnemonics.** Assemblers recognise several aliases that all assemble to `addi`: + - `li RT, SIMM` ≡ `addi RT, 0, SIMM` (load immediate; relies on `RA0`). + - `la RT, D(RA)` ≡ `addi RT, RA, D` (load address; purely syntactic). + - `subi RT, RA, SIMM` ≡ `addi RT, RA, −SIMM`. +- **PC-relative idiom.** `addi RT, RA, D` is the low-half completion of a two-instruction address load preceded by [`addis`](addis.md) `RT, 0, HI`. The assembler emits `@ha`/`@l` relocations so the low half can be negative without corrupting the high half (add-compensation). + +## Related Instructions + +- [`addis`](addis.md) — same encoding family but the immediate is shifted left by 16 bits. Together they build any 32-bit constant or PC-relative address. +- [`addic`](addic.md), [`addicx`](addicx.md) — D-form adds that **do** set `XER[CA]` (and CR0 for the record form). +- [`addx`](addx.md) — the register-register form. +- [`subfic`](subfic.md) — reverse-subtract immediate (`imm − RA`) with carry. +- [`ori`](ori.md), [`oris`](oris.md) — the alternative D-form constant-building instructions (but these don't add, they OR). + +## IBM Reference + +- [AIX 7.3 — `addi` (Add Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-addi-add-immediate-instruction) +- [AIX 7.3 — `li` (Load Immediate, simplified mnemonic)](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-li-load-immediate) diff --git a/migration/project-root/ppc-manual/alu/addic.md b/migration/project-root/ppc-manual/alu/addic.md new file mode 100644 index 0000000..288c4eb --- /dev/null +++ b/migration/project-root/ppc-manual/alu/addic.md @@ -0,0 +1,123 @@ +# `addic` — Add Immediate Carrying + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x30000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `addic` | `addic` | — | Add Immediate Carrying | + +## Syntax + +```asm +addic [RD], [RA], [SIMM] +``` + +## Encoding + +### `addic` — form `D` + +- **Opcode word:** `0x30000000` +- **Primary opcode (bits 0–5):** `12` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | addic: read | Source GPR (`r0`–`r31`). | +| `SIMM` | addic: read | 16-bit signed immediate. Sign-extended to 64 bits before use. | +| `RD` | addic: write | Destination GPR. | +| `CA` | addic: write | XER[CA] carry bit. Read by add-with-carry/subtract-with-borrow instructions, written by carrying instructions. | + +## Register Effects + +### `addic` + +- **Reads (always):** `RA`, `SIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `CA` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `addic`: **XER[CA]** ← carry-out of the add / borrow-in of the subtract (always). + +## Operation (pseudocode) + +``` +RT <- (RA) + EXTS(SIMM) +CA <- carry_out +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`addic`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="addic"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:117`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L117) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:8`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L8) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:336`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L336) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:135-144`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L135-L144) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::addic => { + // PPCBUG-002: 32-bit ABI. CA must be from a 32-bit unsigned compare; + // canary's `AddDidCarry` truncates both operands to int32 first. + let ra32 = ctx.gpr[instr.ra()] as u32; + let imm32 = instr.simm16() as i32 as u32; + let result32 = ra32.wrapping_add(imm32); + ctx.xer_ca = if result32 < ra32 { 1 } else { 0 }; + ctx.gpr[instr.rd()] = result32 as u64; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Immediate is sign-extended.** `SIMM` is a 16-bit signed value extended to 64 bits before the add. So `addic r3, r4, -1` adds `0xFFFFFFFFFFFFFFFF` to `r4` — it does not zero-extend. +- **`XER[CA]` always written.** Unlike [`addi`](addi.md), this instruction exists to seed a multi-word add chain with an immediate. Carry-out is computed with the same `result < ra` unsigned-overflow check as [`addcx`](addcx.md). +- **No `Rc` bit available.** This is the *non-record* form. For a record-form variant that also updates `CR0`, use [`addicx`](addicx.md) (`addic.`). +- **No `OE` bit either.** `addic` cannot raise / observe signed overflow — only the carry. If you need `XER[OV]` you must use the XO-form [`addcx`](addcx.md) with `OE=1`. +- **`RA = 0` reads register r0.** Unlike [`addi`](addi.md), `addic` does **not** treat the `RA` field of zero as a literal zero. The PowerISA gives this instruction the regular `RA` semantics, not `RA0`. +- **Subtract immediate carrying via negation.** There is no `subic` mnemonic; assemblers synthesise `subic RT, RA, value` as `addic RT, RA, -value` (when `value` fits in 16 bits signed). + +## Related Instructions + +- [`addicx`](addicx.md) — same operation plus `Rc=1` CR0 update. +- [`addi`](addi.md) — D-form add immediate without `XER[CA]`. +- [`addis`](addis.md) — shifted form (immediate << 16). +- [`addcx`](addcx.md) — XO-form: register operands, sets `XER[CA]`. +- [`subfic`](subfic.md) — D-form: `RT ← SIMM − RA` with `XER[CA]`. + +## IBM Reference + +- [AIX 7.3 — `addic` (Add Immediate Carrying)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-addic-add-immediate-carrying-instruction) diff --git a/migration/project-root/ppc-manual/alu/addicx.md b/migration/project-root/ppc-manual/alu/addicx.md new file mode 100644 index 0000000..972971a --- /dev/null +++ b/migration/project-root/ppc-manual/alu/addicx.md @@ -0,0 +1,132 @@ +# `addic.` — Add Immediate Carrying and Record + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x34000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `addic.` | `addic.` | — | Add Immediate Carrying and Record | + +## Syntax + +```asm +addic. [RD], [RA], [SIMM] +``` + +## Encoding + +### `addic.` — form `D` + +- **Opcode word:** `0x34000000` +- **Primary opcode (bits 0–5):** `13` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | addic.: read | Source GPR (`r0`–`r31`). | +| `SIMM` | addic.: read | 16-bit signed immediate. Sign-extended to 64 bits before use. | +| `RD` | addic.: write | Destination GPR. | +| `CA` | addic.: write | XER[CA] carry bit. Read by add-with-carry/subtract-with-borrow instructions, written by carrying instructions. | +| `CR` | addic.: write | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `addic.` + +- **Reads (always):** `RA`, `SIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `CA`, `CR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `addic.`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]` (always).; **XER[CA]** ← carry-out of the add / borrow-in of the subtract (always). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`addic.`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="addic."`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:127`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L127) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:8`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L8) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:337`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L337) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:145-154`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L145-L154) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::addicx => { + // PPCBUG-003: same fix as addic plus CR0 i32 view. + let ra32 = ctx.gpr[instr.ra()] as u32; + let imm32 = instr.simm16() as i32 as u32; + let result32 = ra32.wrapping_add(imm32); + ctx.xer_ca = if result32 < ra32 { 1 } else { 0 }; + ctx.gpr[instr.rd()] = result32 as u64; + ctx.update_cr_signed(0, result32 as i32 as i64); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`Rc` bit is implicit, not encoded.** `addic.` has its *own* primary opcode (13) distinct from `addic`'s (12); there is no `Rc` field to set. The two forms are sibling D-form instructions, not flag variants of one encoding. +- **CR0 update is unconditional.** Unlike XO-form `Rc=1` instructions, `addic.` always updates `CR0` from the result; the `.` is part of the mnemonic itself. +- **Common idiom: `addic. rN, rN, -1`** — decrements `rN` and sets `CR0[EQ]` when it reaches zero, in a single instruction. Frequently used as a loop counter (often paired with `bne+ loop`). +- **`XER[CA]` written same as [`addic`](addic.md).** The carry-out from the unsigned 64-bit add is recorded; the `.` only adds the CR update on top. +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** [`interpreter.rs:65`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L65) computes `result as i32 as i64`; spec demands a full 64-bit compare-to-zero. The truncation is a documented xenia-rs quirk shared with the rest of the carrying-add family. +- **`SIMM` is sign-extended** to 64 bits before the add — `addic. r3, r4, -1` adds `~0` and never sets `CR0[EQ]` unless `r4 == 1`. + +## Related Instructions + +- [`addic`](addic.md) — same op without the CR0 update. +- [`addi`](addi.md), [`addis`](addis.md) — immediate adds without `XER[CA]`. +- [`addcx`](addcx.md) — XO-form register equivalent. +- [`subfic`](subfic.md) — `RT ← SIMM − RA` with `XER[CA]` (no record form exists). +- [`cmpi`](cmpi.md) — explicit immediate compare when the carry side-effect would be unwanted. + +## IBM Reference + +- [AIX 7.3 — `addic.` (Add Immediate Carrying and Record)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-addic-add-immediate-carrying-record-instruction) diff --git a/migration/project-root/ppc-manual/alu/addis.md b/migration/project-root/ppc-manual/alu/addis.md new file mode 100644 index 0000000..ae14a45 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/addis.md @@ -0,0 +1,121 @@ +# `addis` — Add Immediate Shifted + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x3c000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `addis` | `addis` | — | Add Immediate Shifted | + +## Syntax + +```asm +addis [RD], [RA0], [SIMM] +``` + +## Encoding + +### `addis` — form `D` + +- **Opcode word:** `0x3c000000` +- **Primary opcode (bits 0–5):** `15` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | addis: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `SIMM` | addis: read | 16-bit signed immediate. Sign-extended to 64 bits before use. | +| `RD` | addis: write | Destination GPR. | + +## Register Effects + +### `addis` + +- **Reads (always):** `RA0`, `SIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +if RA = 0 then RT <- EXTS(SIMM) << 16 +else RT <- (RA) + (EXTS(SIMM) << 16) +``` + +## C Translation Example + +```c +/* addis RT, RA, SIMM — RA=0 means literal 0 */ +uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA]; +r[insn.RT] = base + ((uint64_t)(int64_t)(int16_t)insn.SIMM << 16); +``` + +## Implementation References + +**`addis`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="addis"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:138`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L138) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:8`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L8) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:339`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L339) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:121-134`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L121-L134) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::addis => { + // Xbox 360 user mode is 32-bit ABI (MSR.SF=0), so addis must + // produce a value whose upper 32 bits don't pollute downstream + // 64-bit arithmetic. The PPC ISA in 64-bit mode sign-extends + // simm16 before the shift, producing 0xFFFFFFFF_xxxx0000 for + // negative simm16 (high bit set). When this value flows into + // a 64-bit subfc against a zero-extended lwz value, the unsigned + // 64-bit comparison yields wrong CA. Truncate to 32 bits to + // simulate 32-bit ABI behavior. + let ra_val = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let result = ra_val.wrapping_add((instr.simm16() as i64 as u64) << 16); + ctx.gpr[instr.rd()] = result as u32 as u64; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA0` semantics.** When the `RA` field encodes 0, the operand is the literal 64-bit zero, **not** `r0`. This makes `addis RT, 0, hi16` the canonical "load high half" idiom. To use `r0`'s actual value as a base, copy it via `mr` first or use a different opcode. +- **Immediate is sign-extended *then* shifted left 16.** So `addis r3, 0, 0x8000` writes `0xFFFFFFFF80000000`, not `0x000000008000_0000`. The 32-bit sign extension surprise is the most common bug in hand-written PPC assembly. +- **Forms the high half of a 32-bit immediate.** The classic `lis rT, hi; ori rT, rT, lo` (or `lis`/`addi`) sequence builds a full 32-bit constant. `lis rT, val` is a simplified mnemonic for `addis rT, 0, val`. +- **No `XER[CA]`, no `XER[OV]`, no `Rc`.** This instruction has no status side-effects whatsoever. Use [`addic`](addic.md) or [`addcx`](addcx.md) if a carry is required. +- **64-bit `RA` operand.** The shift-and-add is 64-bit on the Xenon; the immediate's sign-extension fills the high 48 bits. So `addis r3, r4, -1` adds `0xFFFFFFFFFFFF0000` to a 64-bit `r4`. +- **No overflow detection.** `lis r3, 0x7FFF; addis r3, r3, 0x7FFF` happily wraps without comment. + +## Related Instructions + +- [`addi`](addi.md) — D-form add immediate, no shift; same `RA0` rule. +- [`addic`](addic.md), [`addicx`](addicx.md) — immediate adds that also write `XER[CA]`. +- [`oris`](oris.md), [`ori`](ori.md) — pair with `addis`/`lis` to build 32-bit constants without affecting CR or XER. +- [`addx`](addx.md), [`addcx`](addcx.md) — XO-form register adds. +- `lis` (simplified) — assembler shorthand for `addis RT, 0, value`. + +## IBM Reference + +- [AIX 7.3 — `addis` (Add Immediate Shifted)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-addis-add-immediate-shifted-instruction) +- [AIX 7.3 — `lis` (Load Immediate Shifted, simplified mnemonic)](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-li-lis-load-immediate-load-immediate-shifted) diff --git a/migration/project-root/ppc-manual/alu/addmex.md b/migration/project-root/ppc-manual/alu/addmex.md new file mode 100644 index 0000000..d476a79 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/addmex.md @@ -0,0 +1,136 @@ +# `addmex` — Add to Minus One Extended + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c0001d4` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `addme` | `addmex` | — | Add to Minus One Extended | +| `addmeo` | `addmex` | OE=1 | Add to Minus One Extended | +| `addme.` | `addmex` | Rc=1 | Add to Minus One Extended | +| `addmeo.` | `addmex` | OE=1, Rc=1 | Add to Minus One Extended | + +## Syntax + +```asm +addme[OE][Rc] [RD], [RA] +``` + +## Encoding + +### `addmex` — form `XO` + +- **Opcode word:** `0x7c0001d4` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `234` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | addmex: read | Source GPR (`r0`–`r31`). | +| `CA` | addmex: read; addmex: write | XER[CA] carry bit. Read by add-with-carry/subtract-with-borrow instructions, written by carrying instructions. | +| `RD` | addmex: write | Destination GPR. | +| `OE` | addmex: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | +| `CR` | addmex: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `addmex` + +- **Reads (always):** `RA`, `CA` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `CA` +- **Writes (conditional):** `OE`, `CR` + +## Status-Register Effects + +- `addmex`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`.; **XER[CA]** ← carry-out of the add / borrow-in of the subtract (always). + +## Operation (pseudocode) + +``` +RT <- (RA) + CA + 0xFFFF_FFFF_FFFF_FFFF +CA <- carry_out +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`addmex`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="addmex"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:152`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L152) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:8`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L8) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:873`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L873) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:239-254`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L239-L254) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::addmex => { + // PPCBUG-016+020: 32-bit truncation. RT = RA + CA - 1. + let ra32 = ctx.gpr[instr.ra()] as u32; + let ca = ctx.xer_ca as u32; + let result32 = ra32.wrapping_add(ca).wrapping_sub(1); + ctx.xer_ca = if ra32 != 0 || ca != 0 { 1 } else { 0 }; + ctx.gpr[instr.rd()] = result32 as u64; + if instr.oe() { + let true_sum = (ra32 as i32 as i128) + (ca as i128) - 1; + overflow::apply(ctx, true_sum != (result32 as i32) as i128); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, result32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **No `RB` field used.** `addmex` is encoded in XO-form but ignores the `RB` slot — assemblers must still emit a value (typically zero). Disassemblers that parse a non-zero `RB` should not flag it as illegal; it is simply unused. +- **Operation is `RA + CA + (−1)`**, i.e. `RA - 1 + CA`. Used to terminate a multi-word *subtract* chain when the high source word is implicitly all-ones (e.g. computing `-x` as `~x + 1` across 128 bits). +- **Carry-out predicate is `RA != 0 OR CA != 0`.** Equivalently, `CA' = NOT(RA == 0 AND CA == 0)`. Adding `−1` to anything except a zero-with-no-carry produces a carry-out (no borrow needed). This terse form in xenia-rs is correct but easy to misread. +- **Overflow not implemented in xenia-rs.** The `OE=1` path is silently a no-op; spec says set `XER[OV]` if the signed result wraps. +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** [`interpreter.rs:139`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L139) — same quirk as the rest of the add family. +- **`XER[CA]` must be initialised** by an earlier carrying instruction. `addme` is a *terminator*, not a seed. + +## Related Instructions + +- [`addzex`](addzex.md) — terminate a chain with `RA + 0 + CA` (no `−1`). +- [`addex`](addex.md) — middle-of-chain `RA + RB + CA`. +- [`addcx`](addcx.md) — seeds a chain. +- [`subfmex`](subfmex.md) — subtract dual: `~RA + (−1) + CA`. +- [`negx`](negx.md) — single-instruction two's-complement negate. + +## IBM Reference + +- [AIX 7.3 — `addme` (Add to Minus One Extended)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-addme-add-minus-one-extended-instruction) diff --git a/migration/project-root/ppc-manual/alu/addx.md b/migration/project-root/ppc-manual/alu/addx.md new file mode 100644 index 0000000..0e0003f --- /dev/null +++ b/migration/project-root/ppc-manual/alu/addx.md @@ -0,0 +1,148 @@ +# `addx` — Add + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c000214` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `add` | `addx` | — | Add | +| `addo` | `addx` | OE=1 | Add | +| `add.` | `addx` | Rc=1 | Add | +| `addo.` | `addx` | OE=1, Rc=1 | Add | + +## Syntax + +```asm +add[OE][Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `addx` — form `XO` + +- **Opcode word:** `0x7c000214` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `266` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | addx: read | Source GPR (`r0`–`r31`). | +| `RB` | addx: read | Source GPR. | +| `RD` | addx: write | Destination GPR. | +| `OE` | addx: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | +| `CR` | addx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `addx` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `OE`, `CR` + +## Status-Register Effects + +- `addx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`. + +## Operation (pseudocode) + +``` +RT <- (RA) + (RB) +``` + +## C Translation Example + +```c +/* add / add. / addo / addo. (XO-form) */ +uint64_t a = r[insn.RA], b = r[insn.RB]; +uint64_t result = a + b; +r[insn.RT] = result; +if (insn.OE) { bool ov = (~(a ^ b) & (a ^ result)) >> 63; + if (ov) { xer.OV = 1; xer.SO = 1; } else xer.OV = 0; } +if (insn.Rc) update_cr0_signed((int64_t)result); +``` + +## Implementation References + +**`addx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="addx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:50`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L50) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:8`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L8) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:875`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L875) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:175-189`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L175-L189) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::addx => { + // PPCBUG-012+020: 32-bit ABI writeback truncation + CR0 i32 view. + let ra32 = ctx.gpr[instr.ra()] as u32; + let rb32 = ctx.gpr[instr.rb()] as u32; + let result32 = ra32.wrapping_add(rb32); + ctx.gpr[instr.rd()] = result32 as u64; + if instr.oe() { + let true_sum = (ra32 as i32 as i128) + (rb32 as i32 as i128); + overflow::apply(ctx, true_sum != (result32 as i32) as i128); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, result32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Extended Pseudocode + +``` +RT <- (RA) + (RB) ; modulo 2^64, carry discarded +if OE then + XER[OV] <- (~(RA ^ RB) & (RA ^ RT))[0] ; signed overflow: same-sign inputs, opposite-sign result + XER[SO] <- XER[SO] | XER[OV] +if Rc then + CR0[LT,GT,EQ] <- signed_compare(RT, 0) ; 64-bit comparison on the Xenon + CR0[SO] <- XER[SO] +``` + +## Special Cases & Edge Conditions + +- **No trap on overflow.** `addo` / `addo.` record overflow in `XER[OV]` and sticky-set `XER[SO]`. A trap can only be produced by a separate `td`/`tw` instruction examining the result. +- **Signed-overflow predicate.** Overflow occurs iff both addends share a sign bit and the result has the opposite sign bit: `OV = ((~(a ^ b)) & (a ^ rt)) >> 63`. Unsigned carry is *not* tracked — use [`addcx`](addcx.md) when you need `XER[CA]`. +- **`XER[SO]` is sticky.** Once set, it remains set until cleared by `mcrxr`. The `.` record forms copy it into `CR0[SO]`. +- **64-bit CR update on Xenon.** The Xbox 360 Xenon CPU is 64-bit, so `add.` compares the full 64-bit result against zero. **Xenia-rs presently truncates to 32 bits** before the CR update (`result as i32 as i64` in [`interpreter.rs:95`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L95)). If your translator must match xenia bit-for-bit, emit a 32-bit compare; if it must be spec-correct, emit a 64-bit compare. Most Xbox 360 object code works either way because results that overflow 32 bits are rare outside of explicit 64-bit math. +- **OE overflow detection not emulated in xenia-rs.** The `addo` / `addo.` branch in `interpreter.rs` is a `TODO` stub. A faithful translator should still emit the overflow check — titles rarely observe `XER[OV]`, but it's occasionally used by profiling / sanity-checking code paths. +- **Operand aliasing.** `add r3, r3, r3`, `add r3, r3, r4`, `add r3, r4, r3` are all legal. The addition reads both source operands before writing `RT`. +- **No immediate form.** For `RT = RA + imm` use [`addi`](addi.md) / [`addis`](addis.md). Those are distinct opcodes, not a flag on `add`. + +## Related Instructions + +- [`addcx`](addcx.md) — produces the carry-out in `XER[CA]`. +- [`addex`](addex.md) — sums `(RA) + (RB) + XER[CA]` (carry-in chain). +- [`addmex`](addmex.md), [`addzex`](addzex.md) — add to `−1` or `0` with carry-in (used to propagate borrows across multiword subtracts). +- [`addi`](addi.md), [`addis`](addis.md) — D-form immediate adds; no `Rc`/`OE`. +- [`addic`](addic.md), [`addicx`](addicx.md) — D-form adds that set `XER[CA]`. +- [`subfx`](subfx.md) — the dual: `RT ← (RB) − (RA)`. +- [`negx`](negx.md) — two's-complement negate. + +## IBM Reference + +- [AIX 7.3 — `add` (Add)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-add-instruction) +- [PowerISA v2.07B, Book I, §3.3.8 — Fixed-Point Arithmetic Instructions](https://openpowerfoundation.org/specifications/isa/) (overflow predicate, CR0 / `XER[SO]` semantics). diff --git a/migration/project-root/ppc-manual/alu/addzex.md b/migration/project-root/ppc-manual/alu/addzex.md new file mode 100644 index 0000000..49f32e7 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/addzex.md @@ -0,0 +1,135 @@ +# `addzex` — Add to Zero Extended + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c000194` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `addze` | `addzex` | — | Add to Zero Extended | +| `addzeo` | `addzex` | OE=1 | Add to Zero Extended | +| `addze.` | `addzex` | Rc=1 | Add to Zero Extended | +| `addzeo.` | `addzex` | OE=1, Rc=1 | Add to Zero Extended | + +## Syntax + +```asm +addze[OE][Rc] [RD], [RA] +``` + +## Encoding + +### `addzex` — form `XO` + +- **Opcode word:** `0x7c000194` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `202` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | addzex: read | Source GPR (`r0`–`r31`). | +| `CA` | addzex: read; addzex: write | XER[CA] carry bit. Read by add-with-carry/subtract-with-borrow instructions, written by carrying instructions. | +| `RD` | addzex: write | Destination GPR. | +| `OE` | addzex: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | +| `CR` | addzex: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `addzex` + +- **Reads (always):** `RA`, `CA` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `CA` +- **Writes (conditional):** `OE`, `CR` + +## Status-Register Effects + +- `addzex`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`.; **XER[CA]** ← carry-out of the add / borrow-in of the subtract (always). + +## Operation (pseudocode) + +``` +RT <- (RA) + CA +CA <- carry_out +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`addzex`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="addzex"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:172`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L172) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:8`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L8) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:870`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L870) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:223-238`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L223-L238) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::addzex => { + // PPCBUG-015+020: 32-bit truncation. + let ra32 = ctx.gpr[instr.ra()] as u32; + let ca = ctx.xer_ca as u32; + let result32 = ra32.wrapping_add(ca); + ctx.xer_ca = if result32 < ra32 { 1 } else { 0 }; + ctx.gpr[instr.rd()] = result32 as u64; + if instr.oe() { + let true_sum = (ra32 as i32 as i128) + (ca as i128); + overflow::apply(ctx, true_sum != (result32 as i32) as i128); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, result32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **No `RB` field used.** Like [`addmex`](addmex.md), this XO-form instruction ignores the `RB` slot. Assemblers emit zero there. +- **Operation is `RA + 0 + CA` ≡ `RA + CA`.** Used to terminate the *high word* of a multi-word add chain seeded by [`addcx`](addcx.md). After the low-word `addc` produces the carry, all middle words use [`addex`](addex.md), and the final word that has no register operand uses `addze`. +- **Carry-out is the simple unsigned overflow test** `result < ra` — same predicate as [`addcx`](addcx.md). `CA' = 1` only if `RA == ~0 && CA == 1`. +- **`OE=1` not implemented in xenia-rs.** The interpreter has no overflow branch at all; spec asks for the standard signed-overflow detect. +- **64-bit CR update on Xenon, 32-bit in xenia-rs** (truncation in [`interpreter.rs:128`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L128) — see [`addx`](addx.md) for context). +- **Common idiom: extracting a carry as a 0/1.** `addze rT, 0` (or `addze rT, rN` where `rN == 0`) materialises `XER[CA]` into `rT` as a plain integer. + +## Related Instructions + +- [`addmex`](addmex.md) — terminate with `RA + (−1) + CA` instead of `+0`. +- [`addex`](addex.md) — middle of a multi-word add chain. +- [`addcx`](addcx.md) — seeds the chain. +- [`subfzex`](subfzex.md) — subtract dual: `~RA + 0 + CA`. + +## IBM Reference + +- [AIX 7.3 — `addze` (Add to Zero Extended)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-addze-add-zero-extended-instruction) diff --git a/migration/project-root/ppc-manual/alu/andcx.md b/migration/project-root/ppc-manual/alu/andcx.md new file mode 100644 index 0000000..7b1ce28 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/andcx.md @@ -0,0 +1,123 @@ +# `andcx` — AND with Complement + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000078` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `andc` | `andcx` | — | AND with Complement | +| `andc.` | `andcx` | Rc=1 | AND with Complement | + +## Syntax + +```asm +andc[Rc] [RA], [RS], [RB] +``` + +## Encoding + +### `andcx` — form `X` + +- **Opcode word:** `0x7c000078` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `60` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | andcx: read | Source GPR (alias for RD in some stores). | +| `RB` | andcx: read | Source GPR. | +| `RA` | andcx: write | Source GPR (`r0`–`r31`). | +| `CR` | andcx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `andcx` + +- **Reads (always):** `RS`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `andcx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +RA <- (RS) & ~(RB) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`andcx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="andcx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:647`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L647) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:9`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L9) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:768`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L768) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:534-541`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L534-L541) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::andcx => { + // PPCBUG-033: !rb on u64 flips upper 32 bits — active poisoning. + let rs32 = ctx.gpr[instr.rs()] as u32; + let rb32 = ctx.gpr[instr.rb()] as u32; + ctx.gpr[instr.ra()] = (rs32 & !rb32) as u64; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`andc RA, RS, RB` computes `RS AND (NOT RB)`.** The complement is applied to `RB`, not `RS`. Useful for clearing a bitmask: `andc r3, r3, r4` clears in `r3` every bit set in `r4`. +- **Common idiom: `andc r3, r3, r3`** zeroes `r3` (every bit ANDed with its own complement). Cheaper-looking than `xor r3, r3, r3` on some pipelines but functionally identical; the assembler often prefers the `xor` idiom. +- **Operand convention is the X-form one** (`RA` is the destination, `RS` and `RB` are sources). Same gotcha as [`andx`](andx.md). +- **No `OE`/`XER` side effects.** Only `CR0` is updated when `Rc=1`. +- **64-bit operation** on Xenon; the AND is computed across all 64 bits of `RS` and `~RB`. Xenia-rs uses Rust's bitwise `!` on `u64`, which is the correct full-width complement. +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** [`interpreter.rs:352`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L352) — same truncation pattern. + +## Related Instructions + +- [`andx`](andx.md) — plain AND (no complement). +- [`nandx`](nandx.md) — NAND (`~(RS & RB)`). +- [`orcx`](orcx.md) — OR with complement; sister `c` form. +- [`eqvx`](eqvx.md) — `~(RS ^ RB)` (NXOR / equivalence). +- [`norx`](norx.md) — NOR. + +## IBM Reference + +- [AIX 7.3 — `andc` (AND with Complement)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-andc-complement-instruction) diff --git a/migration/project-root/ppc-manual/alu/andisx.md b/migration/project-root/ppc-manual/alu/andisx.md new file mode 100644 index 0000000..f57fd43 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/andisx.md @@ -0,0 +1,127 @@ +# `andis.` — AND Immediate Shifted + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x74000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `andis.` | `andis.` | — | AND Immediate Shifted | + +## Syntax + +```asm +andis. [RA], [RS], [UIMM] +``` + +## Encoding + +### `andis.` — form `D` + +- **Opcode word:** `0x74000000` +- **Primary opcode (bits 0–5):** `29` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | andis.: read | Source GPR (alias for RD in some stores). | +| `UIMM` | andis.: read | 16-bit unsigned immediate. Zero-extended. | +| `RA` | andis.: write | Source GPR (`r0`–`r31`). | +| `CR` | andis.: write | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `andis.` + +- **Reads (always):** `RS`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA`, `CR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `andis.`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]` (always). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`andis.`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="andis."`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:665`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L665) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:9`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L9) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:352`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L352) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:505-511`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L505-L511) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::andisx => { + // PPCBUG-023: 32-bit ABI CR0 view. `andis. rA, rS, 0x8000` to test + // sign bit of a 32-bit word now correctly classifies bit 31 = 1 as LT. + ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] & ((instr.uimm16() as u64) << 16); + ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Always `Rc=1`.** Like [`andix`](andix.md), the dot is part of the mnemonic; no plain `andis` exists. +- **Immediate is shifted left 16, zero-extended.** Effective mask is `(UIMM << 16) & 0xFFFFFFFF`, so the only bits that can survive in `RA` are bits 32–47 (in PowerISA bit numbering, equivalent to bits 16–31 of the low 32 bits) of `RS`. Bits 0–31 and bits 48–63 of `RA` are forced to zero. +- **Together with `andi.` covers the entire low 32 bits.** Any 32-bit mask can be applied with `andis. + andi.` (two instructions). Larger masks need `rlwinm` or a constructed register operand to [`andx`](andx.md). +- **High 32 bits of result are always zero.** Because the immediate is at bits 32–47, no information from `RS[0:31]` survives. Useful as a quick "extract bits 32–47, zero the rest" primitive. +- **CR0 update is unconditional** and uses the standard signed-compare-to-zero semantics with `XER[SO]` folded into `SO`. +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** The `result as i32 as i64` truncation in [`interpreter.rs:326`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L326) is harmless: the result is bounded by `0x00000000_FFFF0000`, which fits the 32-bit window exactly. + +## Related Instructions + +- [`andix`](andix.md) — companion (immediate not shifted). +- [`andx`](andx.md), [`andcx`](andcx.md) — register AND. +- [`oris`](oris.md), [`xoris`](xoris.md) — sister immediate-shifted logicals. +- [`rlwinmx`](rlwinmx.md) — full mask-and-rotate when the bits of interest aren't aligned to a 16-bit boundary. + +## IBM Reference + +- [AIX 7.3 — `andis.` (AND Immediate Shifted)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-andis-immediate-shifted-instruction) diff --git a/migration/project-root/ppc-manual/alu/andix.md b/migration/project-root/ppc-manual/alu/andix.md new file mode 100644 index 0000000..0b40aa2 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/andix.md @@ -0,0 +1,126 @@ +# `andi.` — AND Immediate + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x70000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `andi.` | `andi.` | — | AND Immediate | + +## Syntax + +```asm +andi. [RA], [RS], [UIMM] +``` + +## Encoding + +### `andi.` — form `D` + +- **Opcode word:** `0x70000000` +- **Primary opcode (bits 0–5):** `28` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | andi.: read | Source GPR (alias for RD in some stores). | +| `UIMM` | andi.: read | 16-bit unsigned immediate. Zero-extended. | +| `RA` | andi.: write | Source GPR (`r0`–`r31`). | +| `CR` | andi.: write | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `andi.` + +- **Reads (always):** `RS`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA`, `CR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `andi.`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]` (always). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`andi.`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="andi."`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:657`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L657) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:9`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L9) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:351`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L351) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:499-504`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L499-L504) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::andix => { + // PPCBUG-020: 32-bit ABI CR0 view. + ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] & (instr.uimm16() as u64); + ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Always `Rc=1`.** There is no `andi` (without the dot). The mnemonic is `andi.` and the encoding always updates `CR0`. If you need a non-record AND-with-immediate, you have to materialise the immediate first (e.g. with `li`/`lis`) and use [`andx`](andx.md). +- **Immediate is zero-extended.** The 16-bit `UIMM` is widened with zeros, so `andi. rA, rS, 0xFFFF` masks `rS` to its low 16 bits — the high 48 bits of the 64-bit register are forced to zero. +- **Cannot mask the high half of a register in one instruction.** The immediate covers bits 48–63 only; for higher bits use [`andisx`](andisx.md) (covers bits 32–47) or compose with `rlwinm`/`rldicl`. +- **CR0 update is unconditional.** This is part of the encoding, not a flag — the primary opcode (28) *is* `andi.`. +- **Common idiom: `andi. r0, rN, mask`** to test bits without disturbing the source — but note `r0` is overwritten and `CR0` is set. If you only need the CR result, prefer `extrwi`/`rlwinm.` for arbitrary masks. +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** Since the AND result has zeros in bits 0–47, the low-32 truncation in [`interpreter.rs:321`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L321) is harmless here — the result fits in 16 bits, so spec and xenia agree. + +## Related Instructions + +- [`andisx`](andisx.md) — same op with immediate shifted left 16 (covers bits 32–47). +- [`andx`](andx.md), [`andcx`](andcx.md) — register AND family. +- [`ori`](ori.md), [`oris`](oris.md), [`xori`](xori.md), [`xoris`](xoris.md) — sister immediate logicals (notably *without* a record form). +- [`rlwinmx`](rlwinmx.md) — for masks that don't fit into a 16-bit immediate. + +## IBM Reference + +- [AIX 7.3 — `andi.` (AND Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-andi-immediate-instruction) diff --git a/migration/project-root/ppc-manual/alu/andx.md b/migration/project-root/ppc-manual/alu/andx.md new file mode 100644 index 0000000..84936a9 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/andx.md @@ -0,0 +1,122 @@ +# `andx` — AND + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000038` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `and` | `andx` | — | AND | +| `and.` | `andx` | Rc=1 | AND | + +## Syntax + +```asm +and[Rc] [RA], [RS], [RB] +``` + +## Encoding + +### `andx` — form `X` + +- **Opcode word:** `0x7c000038` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `28` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | andx: read | Source GPR (alias for RD in some stores). | +| `RB` | andx: read | Source GPR. | +| `RA` | andx: write | Source GPR (`r0`–`r31`). | +| `CR` | andx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `andx` + +- **Reads (always):** `RS`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `andx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +RA <- (RS) & (RB) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`andx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="andx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:637`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L637) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:9`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L9) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:760`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L760) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:528-533`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L528-L533) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::andx => { + // PPCBUG-032+020: 32-bit ABI CR0 view (latent under clean inputs). + ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] & ctx.gpr[instr.rb()]; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Operand convention is reversed.** Unlike the arithmetic XO-form (`add RT, RA, RB`), the logical X-form writes `RA` and reads `RS`/`RB`: `and RA, RS, RB`. The destination is the **second** operand encoded. This convention applies to the entire and/or/xor family; mixing them up is a frequent disassembly error. +- **No `OE`, no `XER[CA]`, no `XER[OV]`.** Logical operations never affect XER. Only `Rc=1` updates `CR0` (signed compare against zero, with `SO ← XER[SO]`). +- **64-bit AND on Xenon.** Both inputs are 64-bit GPRs; the result is the bitwise AND of all 64 bits. +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** The interpreter's `Rc=1` path in [`interpreter.rs:347`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L347) compares `result as i32 as i64`. For an AND whose high 32 bits are non-zero but low 32 bits are zero (e.g. `r3 = 0x1_0000_0000`, `and. r4, r3, r3`), spec sets CR0 to GT but xenia would set EQ. Flag this if reproducing CR-sensitive behaviour. +- **Operand aliasing.** `and RA, RA, RA` is a no-op except for the optional CR0 update — this is the canonical "test register against zero" pattern when no `cmpwi` is desired (though `cmpwi` is more typical). +- **No simplified mnemonic for AND-immediate.** Use [`andix`](andix.md) (`andi.`) or [`andisx`](andisx.md) (`andis.`) for immediate operands; both are *always* record forms (no plain `andi`). + +## Related Instructions + +- [`andcx`](andcx.md) — AND with complement: `RA ← RS & ~RB`. +- [`andix`](andix.md), [`andisx`](andisx.md) — D-form AND immediate (always `Rc=1`). +- [`nandx`](nandx.md) — NAND. +- [`orx`](orx.md), [`orcx`](orcx.md), [`xorx`](xorx.md), [`eqvx`](eqvx.md), [`norx`](norx.md) — sister logical instructions. +- [`cmp`](cmp.md), [`cmpi`](cmpi.md) — explicit zero/value test when CR-only effect is wanted without overwriting `RA`. + +## IBM Reference + +- [AIX 7.3 — `and` (AND)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-instruction) +- [AIX 7.3 — Reference: PowerPC instruction set](https://www.ibm.com/docs/en/aix/7.3.0?topic=reference-instruction-set) diff --git a/migration/project-root/ppc-manual/alu/cmp.md b/migration/project-root/ppc-manual/alu/cmp.md new file mode 100644 index 0000000..2432066 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/cmp.md @@ -0,0 +1,153 @@ +# `cmp` — Compare + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `cmp` | `cmp` | — | Compare | + +## Syntax + +```asm +cmp [CRFD], [L], [RA], [RB] +``` + +## Encoding + +### `cmp` — form `X` + +- **Opcode word:** `0x7c000000` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `0` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `L` | cmp: read | Operand-length bit for compare instructions (`0 ⇒ 32-bit`, `1 ⇒ 64-bit`). | +| `RA` | cmp: read | Source GPR (`r0`–`r31`). | +| `RB` | cmp: read | Source GPR. | +| `CRFD` | cmp: write | CR destination field (`crf`, 0–7). | + +## Register Effects + +### `cmp` + +- **Reads (always):** `L`, `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRFD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +if L = 0 then a,b <- EXTS((RA)[32:63]), EXTS((RB)[32:63]) +else a,b <- (RA), (RB) +CR[BF] <- signed_compare(a, b) || XER[SO] +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`cmp`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="cmp"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:523`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L523) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:13`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L13) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:749`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L749) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:863-885`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L863-L885) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::cmp => { + let bf = instr.crfd(); + if instr.l() { + let ra = ctx.gpr[instr.ra()] as i64; + let rb = ctx.gpr[instr.rb()] as i64; + ctx.cr[bf] = crate::context::CrField { + lt: ra < rb, + gt: ra > rb, + eq: ra == rb, + so: ctx.xer_so != 0, + }; + } else { + let ra = ctx.gpr[instr.ra()] as i32; + let rb = ctx.gpr[instr.rb()] as i32; + ctx.cr[bf] = crate::context::CrField { + lt: ra < rb, + gt: ra > rb, + eq: ra == rb, + so: ctx.xer_so != 0, + }; + } + ctx.pc += 4; + } +``` +
+ + + +## Extended Pseudocode + +``` +if L = 0 then ; 32-bit compare + a <- EXTS((RA)[32:63]) ; sign-extend low word to 64 + b <- EXTS((RB)[32:63]) +else ; 64-bit compare + a <- (RA) + b <- (RB) +CR[BF] <- { LT: a s b, EQ: a = b, SO: XER[SO] } ; signed +``` + +## Special Cases & Edge Conditions + +- **`BF` is a CR field (0–7), not a bit.** The `crfD` operand encodes which of the eight 4-bit CR fields is updated. Assemblers write it as `crN` where `N ∈ 0..7`. The simplified mnemonic `cmpw RA, RB` ≡ `cmp cr0, 0, RA, RB` is universal in Xbox 360 code. +- **`L` bit selects width.** `L = 0` (the usual `cmpw` / `cmpd`-is-rare path) performs a *32-bit* signed compare of `RA[32:63]` and `RB[32:63]`, both sign-extended to 64 bits. `L = 1` (`cmpd`) performs a full 64-bit signed compare. +- **Signed.** Use [`cmpl`](cmpl.md) / [`cmpli`](cmpli.md) for unsigned comparisons. Confusing signed/unsigned is the most common compare-family bug in hand-written asm. +- **SO is always copied from `XER[SO]`.** This makes overflow observable across arithmetic/compare sequences: an `addo.` followed by `beq` can branch on the record-form flag while `bso` can inspect the sticky overflow. +- **`cr0` is the default for record-form ALU**; by convention assemblers and generators reserve `cr0` for the chain of `Rc=1` instructions and use `cr1..cr7` (or `cmp` to an explicit field) for standalone compares. Don't assume `cmp` writes `cr0` unless the `BF` operand says so. +- **No register is written** beyond the 4-bit CR field. `cmp` has no `Rc` or `OE` bit. +- **Xenia-rs quirk.** The interpreter recomputes `EQ` after the signed compare to guard against a subtract-cancellation edge case; this is a defensive belt-and-braces against the 32-bit narrowing path. Functionally equivalent to the spec. + +## Related Instructions + +- [`cmpi`](cmpi.md) — signed compare against a 16-bit immediate. +- [`cmpl`](cmpl.md), [`cmpli`](cmpli.md) — unsigned versions. +- [`cmpw`](cmp.md), [`cmpd`](cmp.md) — simplified mnemonics selecting `L`. +- [`mcrxr`](mcrxr.md) — move `XER[SO..CA]` into a CR field and clear them; used to reset sticky overflow. +- Every `Rc=1` ALU instruction ([`addx`](addx.md), [`subfx`](subfx.md), [`andx`](andx.md), …) — these implicitly perform a signed-compare-to-zero into `cr0`; use explicit `cmp` only when comparing two non-zero values or using a non-zero CR field. + +## IBM Reference + +- [AIX 7.3 — `cmp` (Compare)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-cmp-compare-instruction) +- [AIX 7.3 — `cmpw` / `cmpd` (simplified mnemonics)](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-cmpw-compare-word) diff --git a/migration/project-root/ppc-manual/alu/cmpi.md b/migration/project-root/ppc-manual/alu/cmpi.md new file mode 100644 index 0000000..b74c758 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/cmpi.md @@ -0,0 +1,143 @@ +# `cmpi` — Compare Immediate + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x2c000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `cmpi` | `cmpi` | — | Compare Immediate | + +## Syntax + +```asm +cmpi [CRFD], [L], [RA], [SIMM] +``` + +## Encoding + +### `cmpi` — form `D` + +- **Opcode word:** `0x2c000000` +- **Primary opcode (bits 0–5):** `11` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `L` | cmpi: read | Operand-length bit for compare instructions (`0 ⇒ 32-bit`, `1 ⇒ 64-bit`). | +| `RA` | cmpi: read | Source GPR (`r0`–`r31`). | +| `SIMM` | cmpi: read | 16-bit signed immediate. Sign-extended to 64 bits before use. | +| `RD` | cmpi: write | Destination GPR. | +| `CRFD` | cmpi: write | CR destination field (`crf`, 0–7). | + +## Register Effects + +### `cmpi` + +- **Reads (always):** `L`, `RA`, `SIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `CRFD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +if L = 0 then a,b <- EXTS((RA)[32:63]), EXTS(SIMM) +else a,b <- (RA), EXTS(SIMM) +CR[BF] <- signed_compare(a, b) || XER[SO] +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`cmpi`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="cmpi"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:552`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L552) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:13`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L13) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:335`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L335) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:824-849`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L824-L849) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::cmpi => { + let bf = instr.crfd(); + if instr.l() { + // 64-bit compare. Compare directly so boundary i64 values + // (e.g. ra=i64::MIN, imm=1) don't mis-sign through a + // wrapped subtract. + let ra = ctx.gpr[instr.ra()] as i64; + let imm = instr.simm16() as i64; + ctx.cr[bf] = crate::context::CrField { + lt: ra < imm, + gt: ra > imm, + eq: ra == imm, + so: ctx.xer_so != 0, + }; + } else { + let ra = ctx.gpr[instr.ra()] as i32; + let imm = instr.simm16() as i32; + ctx.cr[bf] = crate::context::CrField { + lt: ra < imm, + gt: ra > imm, + eq: ra == imm, + so: ctx.xer_so != 0, + }; + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Immediate is sign-extended.** `SIMM` is treated as a signed 16-bit value in the range `[-32768, 32767]` and sign-extended to the operand width. Use [`cmpli`](cmpli.md) for unsigned comparisons against a 16-bit value. +- **`L` bit selects width.** `L = 0` (the usual `cmpwi`) compares the low 32 bits of `RA` (sign-extended) against `EXTS(SIMM)`; `L = 1` (`cmpdi`) does a full 64-bit signed compare. Most Xbox 360 code uses `cmpwi` because pointers and counters are 32-bit ABI. +- **Simplified mnemonics dominate disassembly.** `cmpwi crN, RA, SIMM` ≡ `cmpi crN, 0, RA, SIMM` and `cmpdi crN, RA, SIMM` ≡ `cmpi crN, 1, RA, SIMM`. The default CR field is `cr0` if omitted. +- **`BF` is a CR field (0–7), not a bit.** Same convention as [`cmp`](cmp.md). Distinct standalone compares should target `cr1..cr7` to avoid clobbering the implicit `cr0` chain set up by `Rc=1` arithmetic. +- **SO is copied from `XER[SO]`.** This makes overflow observable downstream of an `addo.` / `mulo.` etc. via `bso`/`bns`. +- **Xenia-rs quirk.** The interpreter recomputes `EQ` after the signed subtract, defending against the same 32-bit narrowing edge case noted in [`cmp`](cmp.md). Functionally equivalent to spec. +- **No register written** other than the 4-bit CR field — there is no `Rc` or `OE` bit. + +## Related Instructions + +- [`cmp`](cmp.md) — register-register signed compare. +- [`cmpli`](cmpli.md) — unsigned compare against immediate (zero-extended). +- [`cmpl`](cmpl.md) — unsigned register compare. +- `cmpwi`, `cmpdi` (simplified mnemonics) — select `L=0` / `L=1`. +- [`mcrxr`](mcrxr.md) — clear sticky overflow before a fresh compare sequence. + +## IBM Reference + +- [AIX 7.3 — `cmpi` (Compare Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-cmpi-compare-immediate-instruction) +- [AIX 7.3 — `cmpwi` / `cmpdi` (simplified mnemonics)](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-cmpwi-compare-word-immediate) diff --git a/migration/project-root/ppc-manual/alu/cmpl.md b/migration/project-root/ppc-manual/alu/cmpl.md new file mode 100644 index 0000000..82ee3a5 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/cmpl.md @@ -0,0 +1,127 @@ +# `cmpl` — Compare Logical + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000040` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `cmpl` | `cmpl` | — | Compare Logical | + +## Syntax + +```asm +cmpl [CRFD], [L], [RA], [RB] +``` + +## Encoding + +### `cmpl` — form `X` + +- **Opcode word:** `0x7c000040` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `32` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `L` | cmpl: read | Operand-length bit for compare instructions (`0 ⇒ 32-bit`, `1 ⇒ 64-bit`). | +| `RA` | cmpl: read | Source GPR (`r0`–`r31`). | +| `RB` | cmpl: read | Source GPR. | +| `CRFD` | cmpl: write | CR destination field (`crf`, 0–7). | + +## Register Effects + +### `cmpl` + +- **Reads (always):** `L`, `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRFD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +if L = 0 then a,b <- (RA)[32:63], (RB)[32:63] +else a,b <- (RA), (RB) +CR[BF] <- unsigned_compare(a, b) || XER[SO] +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`cmpl`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="cmpl"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:579`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L579) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:13`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L13) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:761`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L761) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:886-894`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L886-L894) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::cmpl => { + let bf = instr.crfd(); + if instr.l() { + ctx.update_cr_unsigned(bf, ctx.gpr[instr.ra()], ctx.gpr[instr.rb()]); + } else { + ctx.update_cr_unsigned(bf, ctx.gpr[instr.ra()] as u32 as u64, ctx.gpr[instr.rb()] as u32 as u64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unsigned compare.** Treats both operands as unsigned magnitudes. The simplified mnemonics are `cmplw` (`L=0`) and `cmpld` (`L=1`). +- **`L = 0`: 32-bit operands.** Xenia narrows both registers via `as u32 as u64` so the high 32 bits of `RA`/`RB` are ignored — this matches spec `(RA)[32:63]` semantics. Most Xbox 360 code uses this mode. +- **`L = 1`: full 64-bit unsigned compare.** Used in 64-bit pointer arithmetic; rare in game code but appears in kernel-side helpers. +- **SO is copied from `XER[SO]`.** `cmpl` does not clear or set sticky overflow; it just exposes the current `SO` in the destination CR field's `SO` slot. +- **`BF` is a CR field 0–7.** Same convention as [`cmp`](cmp.md). Two consecutive `cmpl` instructions with the same `BF` simply overwrite the previous result. +- **Common signed/unsigned bug.** Misusing `cmp` instead of `cmpl` (or vice versa) for pointer comparisons is the canonical bug in PPC porting; pointers are always unsigned in C semantics. Always cross-check the comparison polarity in disassembly. +- **No `Rc`/`OE`** and no GPR write — purely a CR-field producer. + +## Related Instructions + +- [`cmp`](cmp.md) — signed register compare. +- [`cmpli`](cmpli.md) — unsigned compare against a 16-bit immediate. +- [`cmpi`](cmpi.md) — signed immediate compare. +- `cmplw`, `cmpld` (simplified) — preferred forms in disassembly. +- [`mcrxr`](mcrxr.md) — clear sticky overflow. + +## IBM Reference + +- [AIX 7.3 — `cmpl` (Compare Logical)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-cmpl-compare-logical-instruction) +- [AIX 7.3 — `cmplw` / `cmpld` (simplified mnemonics)](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-cmplw-compare-logical-word) diff --git a/migration/project-root/ppc-manual/alu/cmpli.md b/migration/project-root/ppc-manual/alu/cmpli.md new file mode 100644 index 0000000..526a675 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/cmpli.md @@ -0,0 +1,127 @@ +# `cmpli` — Compare Logical Immediate + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x28000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `cmpli` | `cmpli` | — | Compare Logical Immediate | + +## Syntax + +```asm +cmpli [CRFD], [L], [RA], [UIMM] +``` + +## Encoding + +### `cmpli` — form `D` + +- **Opcode word:** `0x28000000` +- **Primary opcode (bits 0–5):** `10` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `L` | cmpli: read | Operand-length bit for compare instructions (`0 ⇒ 32-bit`, `1 ⇒ 64-bit`). | +| `RA` | cmpli: read | Source GPR (`r0`–`r31`). | +| `UIMM` | cmpli: read | 16-bit unsigned immediate. Zero-extended. | +| `CRFD` | cmpli: write | CR destination field (`crf`, 0–7). | + +## Register Effects + +### `cmpli` + +- **Reads (always):** `L`, `RA`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRFD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +if L = 0 then a,b <- (RA)[32:63], UIMM +else a,b <- (RA), (0 || UIMM) +CR[BF] <- unsigned_compare(a, b) || XER[SO] +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`cmpli`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="cmpli"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:608`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L608) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:13`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L13) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:334`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L334) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:850-862`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L850-L862) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::cmpli => { + let bf = instr.crfd(); + if instr.l() { + let ra = ctx.gpr[instr.ra()]; + let imm = instr.uimm16() as u64; + ctx.update_cr_unsigned(bf, ra, imm); + } else { + let ra = ctx.gpr[instr.ra()] as u32 as u64; + let imm = instr.uimm16() as u64; + ctx.update_cr_unsigned(bf, ra, imm); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Immediate is zero-extended.** `UIMM` is a 16-bit value extended with zeros, so the comparable range is `[0, 65535]`. To compare against a value with high bits set, materialise it in a register with `lis`/`ori` and use [`cmpl`](cmpl.md). +- **`L` bit selects width.** `L = 0` (`cmplwi`) zero-extends `RA[32:63]` to 64 bits and compares against the 16-bit immediate (also zero-extended). `L = 1` (`cmpldi`) compares the full 64-bit `RA` against the immediate. +- **Simplified mnemonics dominate.** `cmplwi cr0, RA, UIMM` ≡ `cmpli cr0, 0, RA, UIMM`; the assembler injects the `L` bit automatically. +- **No sign-extension surprises.** Unlike [`cmpi`](cmpi.md), the immediate cannot be negative; `cmpli` always tests an unsigned magnitude. +- **Common idiom: `cmplwi rA, 0`** to test a register for zero — slightly clearer in disassembly than `cmpwi rA, 0` because it doesn't suggest signed semantics. Both produce the same `EQ` result for a zero argument. +- **`BF` chooses one of 8 CR fields**; same convention as `cmp`. + +## Related Instructions + +- [`cmpl`](cmpl.md) — register-register unsigned compare. +- [`cmpi`](cmpi.md) — signed compare against a 16-bit immediate. +- [`cmp`](cmp.md) — register-register signed compare. +- `cmplwi`, `cmpldi` (simplified) — most common form seen in disassembly. + +## IBM Reference + +- [AIX 7.3 — `cmpli` (Compare Logical Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-cmpli-compare-logical-immediate-instruction) +- [AIX 7.3 — `cmplwi` / `cmpldi` (simplified mnemonics)](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-cmplwi-compare-logical-word-immediate) diff --git a/migration/project-root/ppc-manual/alu/cntlzdx.md b/migration/project-root/ppc-manual/alu/cntlzdx.md new file mode 100644 index 0000000..68f3767 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/cntlzdx.md @@ -0,0 +1,119 @@ +# `cntlzdx` — Count Leading Zeros Doubleword + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000074` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `cntlzd` | `cntlzdx` | — | Count Leading Zeros Doubleword | +| `cntlzd.` | `cntlzdx` | Rc=1 | Count Leading Zeros Doubleword | + +## Syntax + +```asm +cntlzd [RA], [RS] +``` + +## Encoding + +### `cntlzdx` — form `X` + +- **Opcode word:** `0x7c000074` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `58` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | cntlzdx: read | Source GPR (alias for RD in some stores). | +| `RA` | cntlzdx: write | Source GPR (`r0`–`r31`). | +| `CR` | cntlzdx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `cntlzdx` + +- **Reads (always):** `RS` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `cntlzdx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +n <- number_of_leading_zero_bits((RS)) ; n in 0..64 +RA <- zero_extend(n) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`cntlzdx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="cntlzdx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:674`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L674) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:15`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L15) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:767`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L767) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:615-619`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L615-L619) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::cntlzdx => { + ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()].leading_zeros() as u64; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Result range is `0..=64`.** When `RS == 0`, every bit is a leading zero and `RA = 64`. When the high bit (`RS[0]`) is set, `RA = 0`. Any intermediate value yields the count of high-order zeros before the first 1-bit. +- **Counts across the full 64 bits.** Use [`cntlzwx`](cntlzwx.md) when you only want to count the low 32 bits. +- **Useful as `floor(log2(x)) = 63 − cntlzd(x)`** for nonzero `x`. Frequently used in fast normalization, priority encoders, and bit-vector operations. +- **`RB` field unused.** This is X-form but only `RS` is read; `RB` is a placeholder slot. +- **`Rc=1` quirk.** `update_cr_signed(0, RA as i64)` is correct in xenia-rs because the result fits in 7 bits and is non-negative. The CR0 result will always be `EQ` (when `RS != 0` and `RA != 0`? — actually `EQ` only when `RS[0] == 1`, i.e. `RA == 0`) or `GT` (when `RS != 0` so `RA > 0`); never `LT`. `EQ` corresponds to "high bit set in `RS`", a useful one-instruction sign test for negative-as-signed values. +- **No `XER` side effects.** Counts neither overflow nor carry. + +## Related Instructions + +- [`cntlzwx`](cntlzwx.md) — 32-bit version (counts only `RS[32:63]`). +- [`sldx`](sldx.md), [`srdx`](srdx.md) — pair with `cntlzd` for normalisation. +- [`rldiclx`](rldiclx.md) — for extracting individual bit positions once the leading-zero count is known. +- [`cmpi`](cmpi.md) / [`cmp`](cmp.md) — alternative for testing the high bit. + +## IBM Reference + +- [AIX 7.3 — `cntlzd` (Count Leading Zeros Doubleword)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-cntlzd-count-leading-zeros-double-word-instruction) diff --git a/migration/project-root/ppc-manual/alu/cntlzwx.md b/migration/project-root/ppc-manual/alu/cntlzwx.md new file mode 100644 index 0000000..d764154 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/cntlzwx.md @@ -0,0 +1,121 @@ +# `cntlzwx` — Count Leading Zeros Word + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000034` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `cntlzw` | `cntlzwx` | — | Count Leading Zeros Word | +| `cntlzw.` | `cntlzwx` | Rc=1 | Count Leading Zeros Word | + +## Syntax + +```asm +cntlzw[Rc] [RA], [RS] +``` + +## Encoding + +### `cntlzwx` — form `X` + +- **Opcode word:** `0x7c000034` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `26` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | cntlzwx: read | Source GPR (alias for RD in some stores). | +| `RA` | cntlzwx: write | Source GPR (`r0`–`r31`). | +| `CR` | cntlzwx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `cntlzwx` + +- **Reads (always):** `RS` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `cntlzwx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +n <- number_of_leading_zero_bits((RS)[32:63]) ; n in 0..32 +RA <- zero_extend(n) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`cntlzwx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="cntlzwx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:689`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L689) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:15`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L15) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:758`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L758) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:608-614`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L608-L614) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::cntlzwx => { + // Result is 0..=32, fits in u32 with bit 31 always zero, so the + // CR0 view is benign — use the catch-all 32-bit form for consistency. + ctx.gpr[instr.ra()] = (ctx.gpr[instr.rs()] as u32).leading_zeros() as u64; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Operates only on the low 32 bits.** `RS[0:31]` is ignored; the count is taken on `RS[32:63]`. Result range is `0..=32`. +- **`RA = 32` when the low 32 bits are zero**, regardless of the high 32 bits. A common pitfall: `cntlzw` after computing a 64-bit value can give a counter-intuitive result when the leading 1-bit lives in the high half. +- **`RA = 0` when bit 32 (the sign bit of the low word) is set.** This makes `cntlzw RA, RS; cmpwi RA, 0` a one-instruction-pair "is the low half negative" test, though `srawi RA, RS, 31` is more idiomatic. +- **High 32 bits of the result are zero.** `RA[0:31] = 0`, `RA[32:63] = count`. +- **`Rc=1` CR0 update is small-positive-only.** Result fits in 6 bits; xenia's `as i32 as i64` truncation in [`interpreter.rs:404`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L404) is harmless. CR0 will be `EQ` only when `RS[32]` (sign bit of low word) is 1, otherwise `GT`. +- **Useful for `floor(log2)` of a 32-bit value.** `31 - cntlzw(x)` for nonzero `x`. + +## Related Instructions + +- [`cntlzdx`](cntlzdx.md) — 64-bit version (counts the full register). +- [`slwx`](slwx.md), [`srwx`](srwx.md), [`srawx`](srawx.md) — 32-bit shifts often paired with `cntlzw` for normalisation. +- [`rlwinmx`](rlwinmx.md) — to mask off bits before counting. +- [`cmpi`](cmpi.md), [`cmpl`](cmpl.md) — alternative ways to detect zero or sign. + +## IBM Reference + +- [AIX 7.3 — `cntlzw` (Count Leading Zeros Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-cntlzw-count-leading-zeros-word-instruction) diff --git a/migration/project-root/ppc-manual/alu/divdux.md b/migration/project-root/ppc-manual/alu/divdux.md new file mode 100644 index 0000000..1ea26a1 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/divdux.md @@ -0,0 +1,135 @@ +# `divdux` — Divide Doubleword Unsigned + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c000392` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `divdu` | `divdux` | — | Divide Doubleword Unsigned | +| `divduo` | `divdux` | OE=1 | Divide Doubleword Unsigned | +| `divdu.` | `divdux` | Rc=1 | Divide Doubleword Unsigned | +| `divduo.` | `divdux` | OE=1, Rc=1 | Divide Doubleword Unsigned | + +## Syntax + +```asm +divdu[OE][Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `divdux` — form `XO` + +- **Opcode word:** `0x7c000392` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `457` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | divdux: read | Source GPR (`r0`–`r31`). | +| `RB` | divdux: read | Source GPR. | +| `RD` | divdux: write | Destination GPR. | +| `OE` | divdux: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | +| `CR` | divdux: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `divdux` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `OE`, `CR` + +## Status-Register Effects + +- `divdux`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`. + +## Operation (pseudocode) + +``` +RT <- (RA) /u (RB) ; undefined if RB=0 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`divdux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="divdux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:217`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L217) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:21`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L21) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:876`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L876) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:480-496`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L480-L496) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::divdux => { + let ra = ctx.gpr[instr.ra()]; + let rb = ctx.gpr[instr.rb()]; + let ov = overflow::divd_ov_unsigned(rb); + if ov { + ctx.gpr[instr.rd()] = 0; + } else { + ctx.gpr[instr.rd()] = ra / rb; + } + if instr.oe() { + overflow::apply(ctx, ov); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single undefined case.** Division by zero (`RB == 0`). There is no `INT_MIN/−1` overflow because both operands are unsigned. Xenia-rs returns 0 for the divide-by-zero case ([`interpreter.rs:306`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L306)); spec leaves `RT` boundedly undefined. +- **No trap on Xenon.** As with [`divdx`](divdx.md), the processor does not raise an exception; consuming code must guard `RB` first (typically `cmpdi rb, 0; beq skip`). +- **`OE=1` should set `XER[OV]`** on `RB == 0`; xenia-rs ignores `OE` here. +- **`Rc=1` CR0 update is correctly 64-bit.** [`interpreter.rs:311`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L311) uses `as i64` directly, so the CR0 sign comparison reflects the full 64-bit unsigned quotient cast to signed. For very large unsigned quotients (`> INT64_MAX`) this CR0 will report `LT` even though the unsigned interpretation is positive — a rare but real source of CR-misuse bugs. +- **Slow.** Same ~70-cycle non-pipelined cost as the signed variant; consider reciprocal multiply for hot loops. +- **Truncating quotient.** Same C-style toward-zero rounding (trivially equal to floor for unsigned). + +## Related Instructions + +- [`divdx`](divdx.md) — signed 64-bit divide. +- [`divwux`](divwux.md), [`divwx`](divwx.md) — 32-bit unsigned/signed. +- [`mulldx`](mulldx.md), [`mulhdux`](mulhdux.md) — multiply pair for remainder calculation. +- [`cmpli`](cmpli.md), [`cmpl`](cmpl.md) — guard the divisor. + +## IBM Reference + +- [AIX 7.3 — `divdu` (Divide Doubleword Unsigned)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-divdu-divide-double-word-unsigned-instruction) diff --git a/migration/project-root/ppc-manual/alu/divdx.md b/migration/project-root/ppc-manual/alu/divdx.md new file mode 100644 index 0000000..581b428 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/divdx.md @@ -0,0 +1,136 @@ +# `divdx` — Divide Doubleword + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c0003d2` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `divd` | `divdx` | — | Divide Doubleword | +| `divdo` | `divdx` | OE=1 | Divide Doubleword | +| `divd.` | `divdx` | Rc=1 | Divide Doubleword | +| `divdo.` | `divdx` | OE=1, Rc=1 | Divide Doubleword | + +## Syntax + +```asm +divd[OE][Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `divdx` — form `XO` + +- **Opcode word:** `0x7c0003d2` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `489` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | divdx: read | Source GPR (`r0`–`r31`). | +| `RB` | divdx: read | Source GPR. | +| `RD` | divdx: write | Destination GPR. | +| `OE` | divdx: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | +| `CR` | divdx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `divdx` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `OE`, `CR` + +## Status-Register Effects + +- `divdx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`. + +## Operation (pseudocode) + +``` +RT <- (RA) /s (RB) ; undefined if RB=0 or (RA=-2^63 and RB=-1) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`divdx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="divdx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:192`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L192) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:21`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L21) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:878`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L878) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:463-479`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L463-L479) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::divdx => { + let ra = ctx.gpr[instr.ra()] as i64; + let rb = ctx.gpr[instr.rb()] as i64; + let ov = overflow::divd_ov_signed(ra, rb); + if ov { + ctx.gpr[instr.rd()] = 0; + } else { + ctx.gpr[instr.rd()] = (ra / rb) as u64; + } + if instr.oe() { + overflow::apply(ctx, ov); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Two undefined-behaviour cases.** Division by zero (`RB == 0`) and signed-min divided by negative-one (`RA == INT64_MIN && RB == -1`, which would mathematically produce `2^63`, unrepresentable in `i64`). PowerISA leaves `RT` *boundedly undefined* in both cases; **xenia-rs returns 0** ([`interpreter.rs:293`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L293)). Matching this behaviour bit-for-bit is a defacto-spec on Xenon. +- **No exception raised.** Xenon does not trap on either undefined case; the consuming code is expected to have validated `RB` first, e.g. with `cmpdi`/`bne`. If you need a trap, follow the divide with [`tw`](../control/tw.md)/`twi` (these live outside the ALU page set). +- **`OE=1` should set `XER[OV]`** for both undefined cases plus any operand triggering overflow; xenia-rs does not implement the `OE` branch. +- **`Rc=1` CR0 update is correctly 64-bit here.** Unlike most ALU pages, [`interpreter.rs:298`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L298) uses `as i64` (no `as i32` truncation) — divide is one of the few xenia-rs instructions that already matches Xenon spec for the CR0 width. +- **Latency.** Integer divide is the slowest ALU instruction on Xenon — 70+ cycles, non-pipelined. Hot inner loops avoid it via reciprocal-multiply or shift; expect to see `mulhwu`-based reciprocals in optimised disassembly. +- **Rounds toward zero.** The signed quotient truncates toward zero, matching C99/C++11 `/` semantics. Use [`mulldx`](mulldx.md) and a subtract to recover the remainder; there is no `divmod` instruction. + +## Related Instructions + +- [`divdux`](divdux.md) — unsigned 64-bit divide. +- [`divwx`](divwx.md), [`divwux`](divwux.md) — 32-bit signed/unsigned variants. +- [`mulldx`](mulldx.md), [`mulhdx`](mulhdx.md) — multiply pair used to compute the remainder. +- [`cmpi`](cmpi.md), [`cmp`](cmp.md) — guard the divisor before invoking divide. + +## IBM Reference + +- [AIX 7.3 — `divd` (Divide Doubleword)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-divd-divide-double-word-instruction) +- PowerISA v2.07B, Book I, §3.3.9 — defines the boundedly-undefined behaviour for `RB=0` and `INT_MIN/−1`. diff --git a/migration/project-root/ppc-manual/alu/divwux.md b/migration/project-root/ppc-manual/alu/divwux.md new file mode 100644 index 0000000..713d7f0 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/divwux.md @@ -0,0 +1,137 @@ +# `divwux` — Divide Word Unsigned + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c000396` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `divwu` | `divwux` | — | Divide Word Unsigned | +| `divwuo` | `divwux` | OE=1 | Divide Word Unsigned | +| `divwu.` | `divwux` | Rc=1 | Divide Word Unsigned | +| `divwuo.` | `divwux` | OE=1, Rc=1 | Divide Word Unsigned | + +## Syntax + +```asm +divwu[OE][Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `divwux` — form `XO` + +- **Opcode word:** `0x7c000396` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `459` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | divwux: read | Source GPR (`r0`–`r31`). | +| `RB` | divwux: read | Source GPR. | +| `RD` | divwux: write | Destination GPR. | +| `OE` | divwux: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | +| `CR` | divwux: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `divwux` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `OE`, `CR` + +## Status-Register Effects + +- `divwux`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`. + +## Operation (pseudocode) + +``` +RT <- ((RA)[32:63] /u (RB)[32:63]) zero-extended to 64 ; undefined if RB=0 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`divwux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="divwux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:269`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L269) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:21`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L21) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:877`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L877) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:413-430`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L413-L430) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::divwux => { + // PPCBUG-020: 32-bit ABI CR0 view. + let ra = ctx.gpr[instr.ra()] as u32; + let rb = ctx.gpr[instr.rb()] as u32; + let ov = overflow::divw_ov_unsigned(rb); + if ov { + ctx.gpr[instr.rd()] = 0; + } else { + ctx.gpr[instr.rd()] = (ra / rb) as u64; + } + if instr.oe() { + overflow::apply(ctx, ov); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as u32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **32-bit operands, zero-extended result.** Both `RA` and `RB` are read as their low 32 bits, unsigned (`as u32`); the quotient is computed as `u32`, then *zero-extended* to 64 bits. The high 32 bits of `RA`/`RB` are ignored on input and the high 32 bits of `RT` are zero on output. +- **Single undefined case.** Division by zero (`RB == 0`); xenia-rs returns 0 ([`interpreter.rs:251`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L251)). No `INT_MIN/-1` case because the operands are unsigned. +- **No trap on Xenon.** Same as [`divdx`](divdx.md) — silent undefined result. +- **`OE=1` should set `XER[OV]` on `RB == 0`**; xenia-rs ignores this. +- **`Rc=1` CR0 update.** [`interpreter.rs:256`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L256) uses `as i32 as i64` — for an unsigned 32-bit quotient stored in the low 32 bits with high zeros, this matches spec exactly; the i32 view will be negative iff the unsigned quotient ≥ 2^31. Worth flagging when comparing CR0 against zero after a large `divwu`. +- **Truncating quotient.** Floor division for non-negative integers; matches C `unsigned` semantics. +- **Same slow non-pipelined latency** as `divw`. + +## Related Instructions + +- [`divwx`](divwx.md) — signed 32-bit divide. +- [`divdux`](divdux.md), [`divdx`](divdx.md) — 64-bit variants. +- [`mullwx`](mullwx.md) — pair to recover the remainder. +- [`cmplwi`](cmpli.md) (simplified) — guard the divisor. + +## IBM Reference + +- [AIX 7.3 — `divwu` (Divide Word Unsigned)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-divwu-divide-word-unsigned-instruction) diff --git a/migration/project-root/ppc-manual/alu/divwx.md b/migration/project-root/ppc-manual/alu/divwx.md new file mode 100644 index 0000000..61eea80 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/divwx.md @@ -0,0 +1,138 @@ +# `divwx` — Divide Word + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c0003d6` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `divw` | `divwx` | — | Divide Word | +| `divwo` | `divwx` | OE=1 | Divide Word | +| `divw.` | `divwx` | Rc=1 | Divide Word | +| `divwo.` | `divwx` | OE=1, Rc=1 | Divide Word | + +## Syntax + +```asm +divw[OE][Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `divwx` — form `XO` + +- **Opcode word:** `0x7c0003d6` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `491` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | divwx: read | Source GPR (`r0`–`r31`). | +| `RB` | divwx: read | Source GPR. | +| `RD` | divwx: write | Destination GPR. | +| `OE` | divwx: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | +| `CR` | divwx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `divwx` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `OE`, `CR` + +## Status-Register Effects + +- `divwx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`. + +## Operation (pseudocode) + +``` +RT <- ((RA)[32:63] /s (RB)[32:63]) sign-extended to 64 ; undefined if RB=0 or overflow +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`divwx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="divwx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:242`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L242) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:21`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L21) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:879`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L879) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:394-412`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L394-L412) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::divwx => { + // PPCBUG-010+011 coupled: 32-bit ABI. Quotient zero-extended to u64 + // (canary explicitly uses ZeroExtend(v, INT64_TYPE)). CR0 view via i32. + let ra = ctx.gpr[instr.ra()] as i32; + let rb = ctx.gpr[instr.rb()] as i32; + let ov = overflow::divw_ov_signed(ra, rb); + if ov { + ctx.gpr[instr.rd()] = 0; + } else { + ctx.gpr[instr.rd()] = (ra / rb) as u32 as u64; + } + if instr.oe() { + overflow::apply(ctx, ov); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as u32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **32-bit operands, sign-extended result.** Both `RA` and `RB` are read as their low 32 bits, signed; the quotient is computed as `i32`, then sign-extended to 64 bits before being stored in `RT`. The high 32 bits of `RA`/`RB` are *ignored*. +- **Two undefined cases:** `RB == 0` and `RA == INT32_MIN && RB == -1` (quotient `2^31` is unrepresentable). Xenia-rs returns 0 for both ([`interpreter.rs:238`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L238)); PowerISA leaves `RT` boundedly undefined. +- **No trap on Xenon.** Like [`divdx`](divdx.md), the processor silently produces an undefined value instead of raising an exception. +- **`OE=1` should set `XER[OV]`** in both undefined cases; xenia-rs does not implement this. +- **`Rc=1` CR0 update truncates to 32 bits in xenia-rs.** [`interpreter.rs:243`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L243) uses `as i32 as i64`. This is *correct* for `divw` because the result is already a sign-extended 32-bit value — high bits agree with the low-32 sign extension. So spec and xenia agree for this instruction. +- **Truncating quotient.** Rounds toward zero, matching C `/` for `int32_t`. +- **Slow.** Same ~30-cycle non-pipelined cost as 64-bit divide; faster than `divd` because the underlying datapath is narrower but still much slower than multiply-then-shift reciprocal sequences. + +## Related Instructions + +- [`divwux`](divwux.md) — unsigned 32-bit divide. +- [`divdx`](divdx.md), [`divdux`](divdux.md) — 64-bit variants. +- [`mullwx`](mullwx.md) — pair with subtract to obtain the remainder. +- [`extsw`](extswx.md) — manual sign-extend if you only have a 64-bit value but want 32-bit divide semantics. + +## IBM Reference + +- [AIX 7.3 — `divw` (Divide Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-divw-divide-word-instruction) diff --git a/migration/project-root/ppc-manual/alu/eieio.md b/migration/project-root/ppc-manual/alu/eieio.md new file mode 100644 index 0000000..4e5dac4 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/eieio.md @@ -0,0 +1,111 @@ +# `eieio` — Enforce In-Order Execution of I/O + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0006ac` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `eieio` | `eieio` | — | Enforce In-Order Execution of I/O | + +## Syntax + +```asm +eieio +``` + +## Encoding + +### `eieio` — form `X` + +- **Opcode word:** `0x7c0006ac` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `854` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | + +## Register Effects + +### `eieio` + +- **Reads (always):** _none_ +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +enforce in-order execution of I/O +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`eieio`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="eieio"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:749`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L749) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:23`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L23) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:844`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L844) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1691-1693`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1691-L1693) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::sync | PpcOpcode::eieio | PpcOpcode::isync => { + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Memory-ordering barrier for caching-inhibited / guarded storage.** `eieio` ensures all preceding loads/stores to caching-inhibited or guarded memory complete before any subsequent such accesses begin. It is *weaker* than [`sync`](sync.md): it does not order cacheable storage and does not flush the store queue. +- **No register or CR effects.** Every operand field is unused; assemblers emit the canonical `0x7c0006ac` word. +- **Used at MMIO boundaries.** Driver code touching device registers (e.g. the GPU command processor on Xenon) typically pairs writes with `eieio` to enforce write ordering at the bus. +- **Xenia-rs is a no-op.** The interpreter trivially advances PC ([`interpreter.rs:1267`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1267)). Because xenia-rs is a single-threaded interpreter targeting userland Xbox 360 binaries — which never see real MMIO — this is correct: the host's natural program order suffices. +- **Categorised under ALU here**, but operationally it's a memory ordering primitive (xenia-canary places it in `ppc_emit_memory.cc`). Disassembly tools may bin it differently. +- **Distinct from `sync` and `isync`.** All three share xenia's no-op arm in [`interpreter.rs:1266`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1266); on real hardware they have very different semantics and latencies. + +## Related Instructions + +- [`sync`](sync.md) — heavy memory barrier (orders *all* storage). +- [`isync`](isync.md) — instruction-fetch barrier; refetches and re-executes after the boundary. +- `lwsync` — lighter weight than `sync`; not in this page set. + +## IBM Reference + +- [AIX 7.3 — `eieio` (Enforce In-Order Execution of I/O)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-eieio-enforce-in-order-execution-i-o-instruction) diff --git a/migration/project-root/ppc-manual/alu/eqvx.md b/migration/project-root/ppc-manual/alu/eqvx.md new file mode 100644 index 0000000..a3800fd --- /dev/null +++ b/migration/project-root/ppc-manual/alu/eqvx.md @@ -0,0 +1,122 @@ +# `eqvx` — Equivalent + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000238` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `eqv` | `eqvx` | — | Equivalent | +| `eqv.` | `eqvx` | Rc=1 | Equivalent | + +## Syntax + +```asm +eqv[Rc] [RA], [RS], [RB] +``` + +## Encoding + +### `eqvx` — form `X` + +- **Opcode word:** `0x7c000238` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `284` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | eqvx: read | Source GPR (alias for RD in some stores). | +| `RB` | eqvx: read | Source GPR. | +| `RA` | eqvx: write | Source GPR (`r0`–`r31`). | +| `CR` | eqvx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `eqvx` + +- **Reads (always):** `RS`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `eqvx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +RA <- ~((RS) ^ (RB)) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`eqvx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="eqvx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:704`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L704) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:25`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L25) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:796`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L796) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:578-586`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L578-L586) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::eqvx => { + // PPCBUG-031: `eqv rA, rA, rA` is a common "set to all-ones" idiom; + // 64-bit form gave 0xFFFFFFFFFFFFFFFF but 32-bit ABI expects 0x00000000FFFFFFFF. + let rs32 = ctx.gpr[instr.rs()] as u32; + let rb32 = ctx.gpr[instr.rb()] as u32; + ctx.gpr[instr.ra()] = (!(rs32 ^ rb32)) as u64; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **NXOR / equivalence.** `RA ← ~(RS XOR RB)`. A bit in `RA` is 1 iff the corresponding bits of `RS` and `RB` are equal. Useful as a per-bit equality test feeding into a `cntlzw` for run-length analysis. +- **Idiom: `eqv RA, RS, RS`** sets every bit to 1 — a one-instruction `RA = -1`. Cheaper than `li RA, -1` followed by `oris`/`ori` for full 64-bit `-1`. +- **Operand convention is X-form** (`RA` is destination; `RS`, `RB` are sources). +- **64-bit operation** on Xenon; `~` is full 64-bit on `u64`. +- **No `OE`, no `XER` side effects.** Only `Rc=1` updates `CR0`. +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** [`interpreter.rs:382`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L382) truncates with `as i32 as i64`. Significant when the high 32 bits of the result differ from the low 32 — e.g. `eqv. RA, RS, RB` with `RS == 0x1_0000_0000`, `RB == 0`: spec sees `0xFFFFFFFE_FFFFFFFF` (`LT`), xenia sees `0xFFFFFFFFFFFFFFFF` (`LT`) — actually both negative here, but the *exact* CR contents differ for finer cases. + +## Related Instructions + +- [`xorx`](xorx.md) — base XOR (`eqv` is `xor` then `not`). +- [`andx`](andx.md), [`orx`](orx.md), [`nandx`](nandx.md), [`norx`](norx.md), [`andcx`](andcx.md), [`orcx`](orcx.md) — full logical family. +- [`xori`](xori.md), [`xoris`](xoris.md) — immediate XOR (no immediate `eqv` exists). + +## IBM Reference + +- [AIX 7.3 — `eqv` (Equivalent)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-eqv-equivalent-instruction) diff --git a/migration/project-root/ppc-manual/alu/extsbx.md b/migration/project-root/ppc-manual/alu/extsbx.md new file mode 100644 index 0000000..5e736e7 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/extsbx.md @@ -0,0 +1,121 @@ +# `extsbx` — Extend Sign Byte + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000774` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `extsb` | `extsbx` | — | Extend Sign Byte | +| `extsb.` | `extsbx` | Rc=1 | Extend Sign Byte | + +## Syntax + +```asm +extsb[Rc] [RA], [RS] +``` + +## Encoding + +### `extsbx` — form `X` + +- **Opcode word:** `0x7c000774` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `954` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | extsbx: read | Source GPR (alias for RD in some stores). | +| `RA` | extsbx: write | Source GPR (`r0`–`r31`). | +| `CR` | extsbx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `extsbx` + +- **Reads (always):** `RS` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `extsbx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +RA <- EXTS_8_to_64((RS)[56:63]) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`extsbx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="extsbx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:714`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L714) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:25`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L25) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:849`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L849) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:589-595`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L589-L595) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::extsbx => { + // PPCBUG-034: 32-bit ABI — sign-extend byte to i32, write zero-extended. + // PPCBUG-036 (coupled): CR0 must view result as i32, not i64. + ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i8 as i32 as u32 as u64; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Sign-extends the low 8 bits of `RS` to 64 bits.** Bit 56 of `RS` (the sign bit of the byte) becomes bits 0–55 of `RA`; bits 56–63 are copied verbatim. +- **Common after a byte load.** `lbz` zero-extends from memory; `extsb` converts the result to a signed-byte view. Many compilers emit this pair; the recent ISA `lba`/`lbau` family is *not* available on the Xenon, so this two-instruction sequence is the canonical pattern. +- **`Rc=1` updates CR0 from the full 64-bit signed value** — but xenia-rs truncates to 32 bits in [`interpreter.rs:384`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L384). For `extsb.` this is harmless because the result fits in 8 bits sign-extended; the low 32 bits already encode the sign correctly. +- **Operand convention** is the X-form one (`RA` destination, `RS` source). Same as the rest of the logical family. +- **No `XER` side effects.** +- **`RB` field unused.** Set to 0 by assemblers; ignored on decode. +- **Aliasing is fine.** `extsb r3, r3` rewrites `r3` in place. + +## Related Instructions + +- [`extshx`](extshx.md) — sign-extend half-word (16 bits). +- [`extswx`](extswx.md) — sign-extend word (32 bits). +- [`rlwinmx`](rlwinmx.md), [`rldiclx`](rldiclx.md) — for *zero*-extending or extracting non-byte-aligned fields. +- `lbz`, `lha` (memory ops, outside this page set) — pair with this for signed byte loads. + +## IBM Reference + +- [AIX 7.3 — `extsb` (Extend Sign Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-extsb-extend-sign-byte-instruction) diff --git a/migration/project-root/ppc-manual/alu/extshx.md b/migration/project-root/ppc-manual/alu/extshx.md new file mode 100644 index 0000000..509cdc8 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/extshx.md @@ -0,0 +1,121 @@ +# `extshx` — Extend Sign Half Word + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000734` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `extsh` | `extshx` | — | Extend Sign Half Word | +| `extsh.` | `extshx` | Rc=1 | Extend Sign Half Word | + +## Syntax + +```asm +extsh[Rc] [RA], [RS] +``` + +## Encoding + +### `extshx` — form `X` + +- **Opcode word:** `0x7c000734` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `922` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | extshx: read | Source GPR (alias for RD in some stores). | +| `RA` | extshx: write | Source GPR (`r0`–`r31`). | +| `CR` | extshx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `extshx` + +- **Reads (always):** `RS` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `extshx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +RA <- EXTS_16_to_64((RS)[48:63]) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`extshx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="extshx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:727`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L727) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:25`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L25) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:847`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L847) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:596-602`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L596-L602) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::extshx => { + // PPCBUG-035: same shape as extsbx for halfwords. + // PPCBUG-037 (coupled): CR0 i32 view. + ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i16 as i32 as u32 as u64; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Sign-extends the low 16 bits of `RS` to 64 bits.** Bit 48 (the sign bit of the half-word) is replicated through bits 0–47 of `RA`. +- **Pairs with `lhz`** to convert an unsigned half-word load into a signed half-word value. Note that `lha` already does the sign extension on load — `extsh` is mostly emitted when the half-word is computed in a register first. +- **`Rc=1` CR0 update.** Xenia-rs uses `as i32 as i64` ([`interpreter.rs:389`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L389)) — harmless here because the sign-extended 16-bit value fits in 32 bits exactly. +- **Operand convention** is the X-form one (`RA` destination, `RS` source). +- **No `XER` side effects.** +- **`RB` field unused.** +- **Aliasing is fine.** `extsh r3, r3` is the standard "promote `r3`'s low 16 bits to a signed 64-bit value" sequence. + +## Related Instructions + +- [`extsbx`](extsbx.md) — sign-extend byte. +- [`extswx`](extswx.md) — sign-extend word (32 bits). +- [`rlwinmx`](rlwinmx.md) — when masking/zero-extending without sign-extension. +- `lha` (memory op, outside this set) — combined load + sign-extend. + +## IBM Reference + +- [AIX 7.3 — `extsh` (Extend Sign Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-extsh-extend-sign-half-word-instruction) diff --git a/migration/project-root/ppc-manual/alu/extswx.md b/migration/project-root/ppc-manual/alu/extswx.md new file mode 100644 index 0000000..18aaba2 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/extswx.md @@ -0,0 +1,120 @@ +# `extswx` — Extend Sign Word + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0007b4` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `extsw` | `extswx` | — | Extend Sign Word | +| `extsw.` | `extswx` | Rc=1 | Extend Sign Word | + +## Syntax + +```asm +extsw[Rc] [RA], [RS] +``` + +## Encoding + +### `extswx` — form `X` + +- **Opcode word:** `0x7c0007b4` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `986` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | extswx: read | Source GPR (alias for RD in some stores). | +| `RA` | extswx: write | Source GPR (`r0`–`r31`). | +| `CR` | extswx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `extswx` + +- **Reads (always):** `RS` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `extswx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +RA <- EXTS_32_to_64((RS)[32:63]) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`extswx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="extswx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:740`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L740) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:25`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L25) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:852`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L852) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:603-607`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L603-L607) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::extswx => { + ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] as i32 as i64 as u64; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Sign-extends the low 32 bits of `RS` to 64 bits.** Bit 32 (sign bit of the word) is replicated through bits 0–31 of `RA`. +- **Used heavily in 32-to-64-bit promotion.** Most Xbox 360 ABI parameters are 32-bit; promoting a 32-bit `int` to a 64-bit GPR requires this instruction. Many functions emit it on entry to canonicalise their argument registers. +- **`Rc=1` CR0 update is correctly 64-bit.** [`interpreter.rs:399`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L399) uses `as i64` (no truncation) — one of the few xenia-rs sites where the spec width is honoured. The signed compare in CR0 reflects the full sign-extended value. +- **Operand convention** is the X-form one (`RA` destination, `RS` source). +- **No `XER` side effects.** +- **`RB` field unused.** +- **Aliasing is fine.** `extsw r3, r3` is the canonical "sign-extend in place" idiom. +- **Distinct from `srawi RA, RS, 31`**, which produces the *sign mask* (`-1` if negative else `0`) rather than the sign-extended value. + +## Related Instructions + +- [`extsbx`](extsbx.md), [`extshx`](extshx.md) — narrower sign extensions. +- [`srawix`](srawix.md) — to derive a sign mask instead. +- [`rldiclx`](rldiclx.md) — to *zero*-extend the low 32 bits. +- `lwa` / `lwax` (memory ops) — combined load-and-sign-extend; lives outside this set. + +## IBM Reference + +- [AIX 7.3 — `extsw` (Extend Sign Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-extsw-extend-sign-word-instruction) diff --git a/migration/project-root/ppc-manual/alu/isync.md b/migration/project-root/ppc-manual/alu/isync.md new file mode 100644 index 0000000..7559242 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/isync.md @@ -0,0 +1,111 @@ +# `isync` — Instruction Synchronize + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XL](../forms/XL.md) · **Opcode:** `0x4c00012c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `isync` | `isync` | — | Instruction Synchronize | + +## Syntax + +```asm +isync +``` + +## Encoding + +### `isync` — form `XL` + +- **Opcode word:** `0x4c00012c` +- **Primary opcode (bits 0–5):** `19` +- **Extended opcode:** `150` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (19) | +| 6–10 | `BT/BO` | target / branch options | +| 11–15 | `BA/BI` | source A / CR bit to test | +| 16–20 | `BB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `LK` | link flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | + +## Register Effects + +### `isync` + +- **Reads (always):** _none_ +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +instruction-stream synchronisation — discards speculative state. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`isync`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="isync"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:759`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L759) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:32`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L32) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:714`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L714) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1691-1693`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1691-L1693) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::sync | PpcOpcode::eieio | PpcOpcode::isync => { + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Instruction-fetch barrier.** Discards any speculatively fetched/decoded instructions and forces all subsequent ones to be re-fetched after preceding instructions complete. Required after self-modifying code, JIT-emitted code, and after MMU/page-table changes. +- **Stronger than [`sync`](sync.md) for instruction stream**, weaker for memory stream — `isync` does not order stores against later loads. It only forces a fetch refresh. +- **Common idiom: `dcbf` / `icbi` / `sync` / `isync`** — flush data cache, invalidate instruction cache, drain memory, refetch — used by JITs and self-modifying loaders. +- **No operands.** Encoded as a fixed-form `XL` instruction; assemblers always emit `0x4c00012c`. +- **Xenia-rs is a no-op.** [`interpreter.rs:1267`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1267) handles `sync`/`eieio`/`isync` together. Because xenia interprets in straight-line program order without any speculative instruction cache, no barrier behaviour is needed for correctness. +- **Privilege level: user.** Unlike most cache management ops, `isync` is unprivileged and frequently appears in userland trampolines. + +## Related Instructions + +- [`sync`](sync.md) — heavy memory barrier. +- [`eieio`](eieio.md) — I/O ordering for caching-inhibited storage. +- `icbi`, `dcbf`, `dcbst` — cache management ops (outside this page set) usually paired with `isync`. + +## IBM Reference + +- [AIX 7.3 — `isync` (Instruction Synchronize)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-isync-instruction-synchronize-instruction) diff --git a/migration/project-root/ppc-manual/alu/mulhdux.md b/migration/project-root/ppc-manual/alu/mulhdux.md new file mode 100644 index 0000000..10d9358 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/mulhdux.md @@ -0,0 +1,124 @@ +# `mulhdux` — Multiply High Doubleword Unsigned + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c000012` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mulhdu` | `mulhdux` | — | Multiply High Doubleword Unsigned | +| `mulhdu.` | `mulhdux` | Rc=1 | Multiply High Doubleword Unsigned | + +## Syntax + +```asm +mulhdu[Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `mulhdux` — form `XO` + +- **Opcode word:** `0x7c000012` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `9` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | mulhdux: read | Source GPR (`r0`–`r31`). | +| `RB` | mulhdux: read | Source GPR. | +| `RD` | mulhdux: write | Destination GPR. | +| `CR` | mulhdux: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `mulhdux` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `mulhdux`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +RT <- ((RA) * (RB))[0:63] ; high 64 of unsigned 64×64 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mulhdux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mulhdux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:311`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L311) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:57`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L57) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:860`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L860) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:454-462`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L454-L462) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mulhdux => { + let ra = ctx.gpr[instr.ra()] as u128; + let rb = ctx.gpr[instr.rb()] as u128; + ctx.gpr[instr.rd()] = (ra.wrapping_mul(rb) >> 64) as u64; + if instr.rc_bit() { + ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Returns the high 64 bits of an unsigned 64×64 product.** Operands are zero-extended (treated as unsigned) before multiplication. Pair with [`mulldx`](mulldx.md) for the low 64 bits to form a full 128-bit unsigned product. +- **No `OE` bit.** No overflow signal — the high half is a defined function of the inputs even when the product fills 128 bits. +- **Xenia uses native `u128`.** [`interpreter.rs:284`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L284) widens both operands then shifts. Note `as u128` *zero*-extends, in contrast to [`mulhdx`](mulhdx.md)'s `as i64 as i128` which sign-extends — this is the entire semantic difference. +- **`Rc=1` CR0 update is correctly 64-bit.** Uses `as i64` directly. Because the high half is unsigned, treating it as signed for CR0 means very large unsigned values appear `LT` — keep this in mind when interpreting the CR0 bits after `mulhdu.`. +- **Used in reciprocal-multiply division strategies.** Compilers may strength-reduce divide-by-constant into `mulhdu` plus a shift; appears in optimised disassembly. +- **Slow.** Same multi-cycle cost as the signed variant. + +## Related Instructions + +- [`mulldx`](mulldx.md) — low 64 bits of the same unsigned product. +- [`mulhdx`](mulhdx.md) — signed high half. +- [`mulhwux`](mulhwux.md) — 32-bit unsigned high half. +- [`divdux`](divdux.md) — 64-bit unsigned divide; sometimes replaced by reciprocal `mulhdu`. + +## IBM Reference + +- [AIX 7.3 — `mulhdu` (Multiply High Doubleword Unsigned)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mulhdu-multiply-high-double-word-unsigned-instruction) diff --git a/migration/project-root/ppc-manual/alu/mulhdx.md b/migration/project-root/ppc-manual/alu/mulhdx.md new file mode 100644 index 0000000..b9b49d6 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/mulhdx.md @@ -0,0 +1,124 @@ +# `mulhdx` — Multiply High Doubleword + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c000092` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mulhd` | `mulhdx` | — | Multiply High Doubleword | +| `mulhd.` | `mulhdx` | Rc=1 | Multiply High Doubleword | + +## Syntax + +```asm +mulhd[Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `mulhdx` — form `XO` + +- **Opcode word:** `0x7c000092` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `73` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | mulhdx: read | Source GPR (`r0`–`r31`). | +| `RB` | mulhdx: read | Source GPR. | +| `RD` | mulhdx: write | Destination GPR. | +| `CR` | mulhdx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `mulhdx` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `mulhdx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +RT <- ((RA) * (RB))[0:63] ; high 64 of signed 64×64 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mulhdx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mulhdx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:297`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L297) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:57`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L57) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:864`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L864) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:445-453`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L445-L453) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mulhdx => { + let ra = ctx.gpr[instr.ra()] as i64 as i128; + let rb = ctx.gpr[instr.rb()] as i64 as i128; + ctx.gpr[instr.rd()] = (ra.wrapping_mul(rb) >> 64) as u64; + if instr.rc_bit() { + ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Returns the high 64 bits of a signed 64×64 product.** Pair with [`mulldx`](mulldx.md) (which returns the low 64 bits) to obtain the full 128-bit product. Both must be issued separately; PowerPC has no fused multiply-double-wide instruction. +- **No `OE` bit.** This XO-form instruction has no overflow-enable variant — there is no "high half overflow" because the high half is always defined. +- **Xenia widens to `i128` natively.** [`interpreter.rs:275`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L275) does the multiply in 128 bits then extracts the high 64. The `i64 as i128` casts ensure signed extension on both sides. +- **`Rc=1` CR0 update is correctly 64-bit.** [`interpreter.rs:278`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L278) uses `as i64` directly. CR0 reflects the sign of the high half: `LT` if the product is negative, `GT` if positive and large enough to overflow into the high half, `EQ` if the product fits in 64 bits *signed* (so the high half is the sign-extension of the low half — but xenia's check uses raw signed-zero compare, which equates only when the high half is exactly zero, i.e. the product is in `[0, 2^63)`). +- **Use [`mulhdux`](mulhdux.md) for the unsigned high half.** The two instructions differ in whether the operands are sign- or zero-extended before the multiply. +- **Slow.** 64-bit multiply is multi-cycle on Xenon; combining `mulhd` with `mulld` for a full 128-bit product roughly doubles the cost. + +## Related Instructions + +- [`mulldx`](mulldx.md) — low 64 bits of the same signed product. +- [`mulhdux`](mulhdux.md) — high 64 bits, unsigned interpretation. +- [`mullwx`](mullwx.md), [`mulhwx`](mulhwx.md), [`mulhwux`](mulhwux.md) — 32-bit family. +- [`divdx`](divdx.md), [`divdux`](divdux.md) — 64-bit division. + +## IBM Reference + +- [AIX 7.3 — `mulhd` (Multiply High Doubleword)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mulhd-multiply-high-double-word-instruction) diff --git a/migration/project-root/ppc-manual/alu/mulhwux.md b/migration/project-root/ppc-manual/alu/mulhwux.md new file mode 100644 index 0000000..4dce5e5 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/mulhwux.md @@ -0,0 +1,126 @@ +# `mulhwux` — Multiply High Word Unsigned + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c000016` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mulhwu` | `mulhwux` | — | Multiply High Word Unsigned | +| `mulhwu.` | `mulhwux` | Rc=1 | Multiply High Word Unsigned | + +## Syntax + +```asm +mulhwu[Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `mulhwux` — form `XO` + +- **Opcode word:** `0x7c000016` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `11` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | mulhwux: read | Source GPR (`r0`–`r31`). | +| `RB` | mulhwux: read | Source GPR. | +| `RD` | mulhwux: write | Destination GPR. | +| `CR` | mulhwux: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `mulhwux` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `mulhwux`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +RT <- high_32_of_unsigned_multiply((RA)[32:63], (RB)[32:63]) zero-extended to 64 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mulhwux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mulhwux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:347`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L347) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:57`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L57) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:862`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L862) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:383-393`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L383-L393) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mulhwux => { + // PPCBUG-020: 32-bit ABI CR0 view. + let ra = ctx.gpr[instr.ra()] as u32 as u64; + let rb = ctx.gpr[instr.rb()] as u32 as u64; + let result = ra.wrapping_mul(rb); + ctx.gpr[instr.rd()] = (result >> 32) & 0xFFFF_FFFF; + if instr.rc_bit() { + ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as u32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Inputs are the low 32 bits, zero-extended.** `RA[32:63]` and `RB[32:63]` are treated as unsigned, widened to 64-bit `u64`, multiplied; the high 32 bits of the 64-bit product land in `RT[32:63]`. Xenia masks the high 32 bits of `RT` to zero ([`interpreter.rs:231`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L231)). +- **Pair with [`mullwx`](mullwx.md) for the full 64-bit unsigned product.** `mullw` returns the low 32 sign-extended; for unsigned use, pair `mulhwu` with `rlwinm` to mask the low half. Xbox 360 compilers commonly emit this combination. +- **No `OE` bit.** Same family rule. +- **`Rc=1` CR0 update.** Uses `as i32 as i64` ([`interpreter.rs:234`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L234)). Because the result is bounded by `0xFFFFFFFF` and stored only in the low 32 bits, this CR0 will report `LT` for any unsigned high half ≥ `0x80000000` — a known signed/unsigned interpretation pitfall when `Rc=1` is used with `mulhwu`. +- **Common idiom for multi-precision arithmetic.** `mulhwu` + `mullw` + `addc` chains build extended-precision multiplies entirely in 32-bit ops; useful for cryptographic code that targets the Xenon's 32-bit ABI. +- **Multi-cycle latency** like the rest of the multiply family. + +## Related Instructions + +- [`mullwx`](mullwx.md) — low 32 bits of the same product (sign-extended). +- [`mulhwx`](mulhwx.md) — signed high half. +- [`mulhdux`](mulhdux.md) — 64-bit unsigned high half. +- [`addcx`](addcx.md), [`addex`](addex.md) — used to chain 32-bit products into wider precision. + +## IBM Reference + +- [AIX 7.3 — `mulhwu` (Multiply High Word Unsigned)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mulhwu-multiply-high-word-unsigned-instruction) diff --git a/migration/project-root/ppc-manual/alu/mulhwx.md b/migration/project-root/ppc-manual/alu/mulhwx.md new file mode 100644 index 0000000..7d0f40f --- /dev/null +++ b/migration/project-root/ppc-manual/alu/mulhwx.md @@ -0,0 +1,126 @@ +# `mulhwx` — Multiply High Word + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c000096` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mulhw` | `mulhwx` | — | Multiply High Word | +| `mulhw.` | `mulhwx` | Rc=1 | Multiply High Word | + +## Syntax + +```asm +mulhw[Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `mulhwx` — form `XO` + +- **Opcode word:** `0x7c000096` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `75` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | mulhwx: read | Source GPR (`r0`–`r31`). | +| `RB` | mulhwx: read | Source GPR. | +| `RD` | mulhwx: write | Destination GPR. | +| `CR` | mulhwx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `mulhwx` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `mulhwx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +RT <- high_32_of_signed_multiply((RA)[32:63], (RB)[32:63]) sign-extended to 64 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mulhwx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mulhwx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:326`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L326) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:57`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L57) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:865`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L865) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:372-382`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L372-L382) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mulhwx => { + // PPCBUG-020: 32-bit ABI CR0 view. + let ra = ctx.gpr[instr.ra()] as i32 as i64; + let rb = ctx.gpr[instr.rb()] as i32 as i64; + let result = ra.wrapping_mul(rb); + ctx.gpr[instr.rd()] = ((result >> 32) as u32) as u64; + if instr.rc_bit() { + ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as u32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Inputs are the low 32 bits, signed-extended.** `RA[32:63]` and `RB[32:63]` are sign-extended to 64-bit signed values, multiplied, and the *high* 32 bits of the 64-bit product are returned in `RT[32:63]`. The high 32 bits of `RT` are *implementation-defined* per spec but xenia masks them to zero ([`interpreter.rs:222`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L222) `& 0xFFFF_FFFF`). +- **Pair with [`mullwx`](mullwx.md) for the full 64-bit product.** Both can issue independently — no fused 32×32→64 instruction. +- **No `OE` bit.** Like all `mulh*` instructions, no overflow flag is produced; the high half is by definition defined. +- **Xenia-rs quirk: high 32 bits zeroed.** Because spec says they're "undefined", legitimately matching either zero, sign-extension, or garbage. Xenia chooses zero, which differs from the literal Xenon behaviour (which sign-extends in some microarchitecture cases). For game code that doesn't read those bits, the difference is invisible. +- **`Rc=1` CR0 update.** [`interpreter.rs:225`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L225) uses `as i32 as i64` — operates on the truncated low 32 bits, which is correct for the *defined* portion of the result. +- **Multi-cycle latency.** Multiply is the slowest pipelined ALU op; `mulhw` shares the divider/multiplier unit. + +## Related Instructions + +- [`mullwx`](mullwx.md) — low 32 bits of a signed 32×32 product. +- [`mulhwux`](mulhwux.md) — high 32 bits, unsigned interpretation. +- [`mulhdx`](mulhdx.md), [`mulhdux`](mulhdux.md), [`mulldx`](mulldx.md) — 64-bit family. +- [`divwx`](divwx.md) — sometimes replaced by reciprocal-mul-then-shift in compiled code. + +## IBM Reference + +- [AIX 7.3 — `mulhw` (Multiply High Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mulhw-multiply-high-word-instruction) diff --git a/migration/project-root/ppc-manual/alu/mulldx.md b/migration/project-root/ppc-manual/alu/mulldx.md new file mode 100644 index 0000000..b4cffb8 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/mulldx.md @@ -0,0 +1,130 @@ +# `mulldx` — Multiply Low Doubleword + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c0001d2` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mulld` | `mulldx` | — | Multiply Low Doubleword | +| `mulldo` | `mulldx` | OE=1 | Multiply Low Doubleword | +| `mulld.` | `mulldx` | Rc=1 | Multiply Low Doubleword | +| `mulldo.` | `mulldx` | OE=1, Rc=1 | Multiply Low Doubleword | + +## Syntax + +```asm +mulld[OE][Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `mulldx` — form `XO` + +- **Opcode word:** `0x7c0001d2` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `233` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | mulldx: read | Source GPR (`r0`–`r31`). | +| `RB` | mulldx: read | Source GPR. | +| `RD` | mulldx: write | Destination GPR. | +| `CR` | mulldx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `OE` | mulldx: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | + +## Register Effects + +### `mulldx` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `CR`, `OE` + +## Status-Register Effects + +- `mulldx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`. + +## Operation (pseudocode) + +``` +RT <- ((RA) * (RB))[64:127] ; low 64 of signed 64×64 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mulldx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mulldx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:368`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L368) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:57`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L57) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:872`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L872) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:433-444`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L433-L444) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mulldx => { + let ra = ctx.gpr[instr.ra()] as i64; + let rb = ctx.gpr[instr.rb()] as i64; + ctx.gpr[instr.rd()] = ra.wrapping_mul(rb) as u64; + if instr.oe() { + overflow::apply(ctx, overflow::mulld_ov(ra, rb)); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Returns the low 64 bits of a signed 64×64 product.** Equivalent to `(int64_t)(RA * RB)` modulo `2^64`. Both operands are full 64-bit signed; no truncation on input. +- **High bits silently lost.** The high 64 bits of the true product are discarded; pair with [`mulhdx`](mulhdx.md) (signed) or [`mulhdux`](mulhdux.md) (unsigned) to recover them. +- **`OE=1` should set `XER[OV]`** when the 128-bit signed product cannot be represented in 64 bits — i.e. when `mulhd RA, RB` is not the sign-extension of `mulld RA, RB`. **Xenia-rs does not implement** `OE` ([`interpreter.rs:264`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L264) has no `oe()` branch). +- **`Rc=1` CR0 update is correctly 64-bit.** [`interpreter.rs:269`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L269) uses `as i64` — full 64-bit signed compare. One of the few non-truncating CR0 sites in xenia-rs; means `mulld.` gives spec-correct CR0 even when the result has non-zero high 32 bits. +- **Same instruction for signed and unsigned low halves.** Modular arithmetic is identical; only the high half (`mulhd` vs `mulhdu`) distinguishes the interpretations. +- **Multi-cycle latency** — slowest of the ALU pipelines after divide. + +## Related Instructions + +- [`mulhdx`](mulhdx.md), [`mulhdux`](mulhdux.md) — signed/unsigned high halves of the same multiply. +- [`mullwx`](mullwx.md) — 32-bit signed multiply (low 64). +- [`mulli`](mulli.md) — D-form: `RT ← (RA[32:63]) × SIMM`. +- [`divdx`](divdx.md), [`divdux`](divdux.md) — 64-bit divide, often paired with `mulld` for remainders. + +## IBM Reference + +- [AIX 7.3 — `mulld` (Multiply Low Doubleword)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mulld-multiply-low-double-word-instruction) diff --git a/migration/project-root/ppc-manual/alu/mulli.md b/migration/project-root/ppc-manual/alu/mulli.md new file mode 100644 index 0000000..315ec27 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/mulli.md @@ -0,0 +1,120 @@ +# `mulli` — Multiply Low Immediate + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x1c000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mulli` | `mulli` | — | Multiply Low Immediate | + +## Syntax + +```asm +mulli [RD], [RA], [SIMM] +``` + +## Encoding + +### `mulli` — form `D` + +- **Opcode word:** `0x1c000000` +- **Primary opcode (bits 0–5):** `7` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | mulli: read | Source GPR (`r0`–`r31`). | +| `SIMM` | mulli: read | 16-bit signed immediate. Sign-extended to 64 bits before use. | +| `RD` | mulli: write | Destination GPR. | + +## Register Effects + +### `mulli` + +- **Reads (always):** `RA`, `SIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +RT <- ((RA) * EXTS(SIMM))[64:127] +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mulli`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mulli"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:382`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L382) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:57`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L57) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:332`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L332) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:165-172`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L165-L172) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mulli => { + // PPCBUG-004: 32-bit ABI. Read RA as i32 (low 32, sign-extended for + // multiply), product fits in 32 bits per ISA (overflow wraps). + let ra = ctx.gpr[instr.ra()] as i32 as i64; + let imm = instr.simm16() as i64; + ctx.gpr[instr.rd()] = (ra.wrapping_mul(imm) as u32) as u64; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **64-bit operand, sign-extended 16-bit immediate.** Xenia reads the full 64-bit `RA` as `i64` and the immediate as a sign-extended `i64` ([`interpreter.rs:80-81`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L80-L81)) — note this differs from the PPC pseudocode header which writes `(RA) * EXTS(SIMM)` as a 64-bit operation but other implementations sometimes treat it as 32×32. On the Xenon (and in xenia-rs), it is genuinely 64-bit. +- **Returns the low 64 bits.** No high half is produced — equivalent to `(int64_t)RA * (int64_t)SIMM` modulo `2^64`. There is no `mulhi`-immediate instruction. +- **No `Rc`, no `OE`.** This D-form has no flag bits — strictly `RT ← RA * SIMM`. To check overflow, compare the result to `(int32_t)RA * SIMM` after the fact, or use [`mulldx`](mulldx.md) with `OE=1` after materialising the immediate. +- **Common compiler idiom.** `mulli` is heavily used for fixed-stride array indexing (`r3 *= sizeof_struct`) when the size is a small signed constant. +- **No carry.** `XER[CA]` is untouched. +- **Same multi-cycle latency** as `mullw` / `mulld`. Compilers strength-reduce `mulli rD, rA, 2^k` to a left shift and `mulli rD, rA, 3` to `add+shift` when the immediate has cheap structure. +- **Aliasing fine.** `mulli r3, r3, 5` rewrites in place. + +## Related Instructions + +- [`mullwx`](mullwx.md) — register-register low 32 (signed). +- [`mulldx`](mulldx.md) — register-register low 64 (signed). +- [`mulhdx`](mulhdx.md), [`mulhdux`](mulhdux.md) — high halves (no immediate variant). +- [`addi`](addi.md) — add immediate; sometimes substituted by compilers when the multiplier is `2^k+1` etc. +- [`slwx`](slwx.md), [`sldx`](sldx.md) — shifts often replace `mulli` for power-of-two multipliers. + +## IBM Reference + +- [AIX 7.3 — `mulli` (Multiply Low Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mulli-multiply-low-immediate-instruction) diff --git a/migration/project-root/ppc-manual/alu/mullwx.md b/migration/project-root/ppc-manual/alu/mullwx.md new file mode 100644 index 0000000..0ade3cc --- /dev/null +++ b/migration/project-root/ppc-manual/alu/mullwx.md @@ -0,0 +1,146 @@ +# `mullwx` — Multiply Low Word + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c0001d6` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mullw` | `mullwx` | — | Multiply Low Word | +| `mullwo` | `mullwx` | OE=1 | Multiply Low Word | +| `mullw.` | `mullwx` | Rc=1 | Multiply Low Word | +| `mullwo.` | `mullwx` | OE=1, Rc=1 | Multiply Low Word | + +## Syntax + +```asm +mullw[OE][Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `mullwx` — form `XO` + +- **Opcode word:** `0x7c0001d6` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `235` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | mullwx: read | Source GPR (`r0`–`r31`). | +| `RB` | mullwx: read | Source GPR. | +| `RD` | mullwx: write | Destination GPR. | +| `CR` | mullwx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `OE` | mullwx: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | + +## Register Effects + +### `mullwx` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `CR`, `OE` + +## Status-Register Effects + +- `mullwx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`. + +## Operation (pseudocode) + +``` +RT <- ((RA)[32:63]) * ((RB)[32:63]) ; signed 32×32 → 64 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mullwx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mullwx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:390`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L390) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:57`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L57) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:874`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L874) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:357-371`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L357-L371) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mullwx => { + // PPCBUG-009: 32-bit ABI. Truncate product to u32 — overflow detection + // (mullw_ov) still uses the full i64 product to catch the overflow. + let ra = ctx.gpr[instr.ra()] as i32 as i64; + let rb = ctx.gpr[instr.rb()] as i32 as i64; + let product = ra.wrapping_mul(rb); + ctx.gpr[instr.rd()] = product as u32 as u64; + if instr.oe() { + overflow::apply(ctx, overflow::mullw_ov(product)); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, ctx.gpr[instr.rd()] as u32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Extended Pseudocode + +``` +prod64 <- sign_extend_32_to_64((RA)[32:63]) *s sign_extend_32_to_64((RB)[32:63]) +RT <- prod64 ; 64-bit result +if OE then + XER[OV] <- (prod64 ≠ sign_extend_32_to_64(prod64[32:63])) ; set when product doesn't fit in 32 bits + XER[SO] <- XER[SO] | XER[OV] +if Rc then + CR0 <- signed_compare(RT, 0) || XER[SO] +``` + +## Special Cases & Edge Conditions + +- **Inputs are the low 32 bits.** `mullw` only looks at `RA[32:63]` and `RB[32:63]`; the high 32 bits of each source are ignored. This is a 32-bit × 32-bit → 64-bit signed multiply. For full 64-bit operands use [`mulldx`](mulldx.md). +- **Result is sign-extended to 64 bits.** The 64-bit product fits into a 64-bit GPR without loss. Subsequent 32-bit consumers see `RT[32:63]` (the low 32 bits of the product); use [`mulhwx`](mulhwx.md) for the signed high 32 bits or [`mulhwux`](mulhwux.md) for the unsigned high 32 bits, computed in parallel without this instruction. +- **`OE` overflow test is 32-bit.** `XER[OV]` is set iff the 64-bit signed product cannot be represented in 32 bits — equivalently, iff `RT[32] ≠ RT[33] = … = RT[63]` (sign bit disagrees with the next 32 bits). Xenia-rs does **not** implement this; `OE` on `mullwo` is a no-op in the interpreter. +- **Xenia-rs CR0 update bug footprint.** The interpreter computes CR0 from `result as i32 as i64` — the low 32 bits sign-extended. For a 32×32→64 multiply the high 32 bits may be non-zero even when the low 32 bits are zero, so xenia's CR0 can differ from the spec's (which compares the full 64-bit product to zero). In practice this matters only for code that relies on `mullw.` to detect overflow via CR0 — extremely rare. +- **Latency.** On the Xenon, `mullw` has higher latency than add/sub; many hot inner loops avoid it by strength-reduction or shift-add chains. This is irrelevant for correctness but sometimes explains surprising instruction sequences in disassembly. + +## Related Instructions + +- [`mulhwx`](mulhwx.md) — signed high 32 bits of the same 32×32 product. +- [`mulhwux`](mulhwux.md) — unsigned high 32 bits of a 32×32 product. +- [`mulli`](mulli.md) — D-form: `RT ← (RA[32:63]) × SIMM` (low 64 bits, signed). +- [`mulldx`](mulldx.md), [`mulhdx`](mulhdx.md), [`mulhdux`](mulhdux.md) — 64-bit multiplies (low/high, signed/unsigned). +- [`divwx`](divwx.md), [`divwux`](divwux.md) — 32-bit signed / unsigned division. + +## IBM Reference + +- [AIX 7.3 — `mullw` (Multiply Low Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mullw-multiply-low-word-instruction) +- [AIX 7.3 — `mulli` (Multiply Low Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mulli-multiply-low-immediate-instruction) diff --git a/migration/project-root/ppc-manual/alu/nandx.md b/migration/project-root/ppc-manual/alu/nandx.md new file mode 100644 index 0000000..668dbf5 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/nandx.md @@ -0,0 +1,122 @@ +# `nandx` — NAND + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0003b8` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `nand` | `nandx` | — | NAND | +| `nand.` | `nandx` | Rc=1 | NAND | + +## Syntax + +```asm +nand[Rc] [RA], [RS], [RB] +``` + +## Encoding + +### `nandx` — form `X` + +- **Opcode word:** `0x7c0003b8` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `476` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | nandx: read | Source GPR (alias for RD in some stores). | +| `RB` | nandx: read | Source GPR. | +| `RA` | nandx: write | Source GPR (`r0`–`r31`). | +| `CR` | nandx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `nandx` + +- **Reads (always):** `RS`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `nandx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +RA <- ~((RS) & (RB)) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`nandx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="nandx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:753`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L753) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:59`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L59) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:812`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L812) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:570-577`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L570-L577) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::nandx => { + // PPCBUG-030: same shape — operate in u32. + let rs32 = ctx.gpr[instr.rs()] as u32; + let rb32 = ctx.gpr[instr.rb()] as u32; + ctx.gpr[instr.ra()] = (!(rs32 & rb32)) as u64; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA ← ~(RS AND RB)`.** Bit-wise NAND. Since `nand RA, RS, RS = ~RS` (NOT-OR-self), the simplified mnemonic `not RA, RS` assembles to `nor RA, RS, RS` (note: NOR, not NAND). NAND-self is equivalent — both produce `~RS` — but the assembler prefers the NOR form by convention. +- **Operand convention is X-form** (`RA` destination, `RS`/`RB` sources). +- **64-bit operation** on Xenon; `~` operates on the full `u64`. +- **No `OE` or `XER` side effects.** Only `Rc=1` updates `CR0` (signed compare to zero). +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** [`interpreter.rs:377`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L377) truncates with `as i32 as i64`. NAND results frequently have all-ones high bits when the low half AND is non-saturating, so the truncation can change CR0 semantics in subtle ways — call out as a quirk if reproducing CR-sensitive behaviour. +- **Idiom: NAND of two equal values produces NOT.** `nand. RA, RS, RS` ≡ `~RS` with CR0 update. Sometimes used by compilers when `not.` is unavailable in their tablegen. + +## Related Instructions + +- [`andx`](andx.md), [`andcx`](andcx.md) — base AND family. +- [`norx`](norx.md) — assembler-preferred form for "NOT" via `nor RA, RS, RS`. +- [`eqvx`](eqvx.md) — NXOR. +- [`orx`](orx.md), [`orcx`](orcx.md), [`xorx`](xorx.md) — full logical family. + +## IBM Reference + +- [AIX 7.3 — `nand` (NAND)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-nand-instruction) diff --git a/migration/project-root/ppc-manual/alu/negx.md b/migration/project-root/ppc-manual/alu/negx.md new file mode 100644 index 0000000..1b0316d --- /dev/null +++ b/migration/project-root/ppc-manual/alu/negx.md @@ -0,0 +1,132 @@ +# `negx` — Negate + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c0000d0` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `neg` | `negx` | — | Negate | +| `nego` | `negx` | OE=1 | Negate | +| `neg.` | `negx` | Rc=1 | Negate | +| `nego.` | `negx` | OE=1, Rc=1 | Negate | + +## Syntax + +```asm +neg[OE][Rc] [RD], [RA] +``` + +## Encoding + +### `negx` — form `XO` + +- **Opcode word:** `0x7c0000d0` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `104` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | negx: read | Source GPR (`r0`–`r31`). | +| `RD` | negx: write | Destination GPR. | +| `CR` | negx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `OE` | negx: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | + +## Register Effects + +### `negx` + +- **Reads (always):** `RA` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `CR`, `OE` + +## Status-Register Effects + +- `negx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`. + +## Operation (pseudocode) + +``` +RT <- ~(RA) + 1 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`negx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="negx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:406`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L406) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:59`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L59) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:866`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L866) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:342-356`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L342-L356) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::negx => { + // PPCBUG-006: 32-bit ABI. `(!ra).wrapping_add(1)` on u64 always + // sets upper 32 bits — every neg poisoned the GPR. neg_ov also + // checks at 64-bit INT_MIN; should be 32-bit INT_MIN. + let ra32 = ctx.gpr[instr.ra()] as u32; + let result32 = (!ra32).wrapping_add(1); + ctx.gpr[instr.rd()] = result32 as u64; + if instr.oe() { + overflow::apply(ctx, ra32 == 0x8000_0000); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, result32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Two's-complement negate.** `RT ← ~RA + 1`, equivalent to `0 − RA`. A specialisation of [`subfx`](subfx.md) where `RB` is implicit zero. +- **`INT64_MIN` is its own negation.** `neg(0x8000000000000000) = 0x8000000000000000` — the only fixed point. `nego` should set `XER[OV]` in this case (it is the canonical signed-overflow trigger), but **xenia-rs does not implement `OE`** ([`interpreter.rs:201`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L201) has no `oe()` branch). +- **`RB` field unused.** Set to 0 by assemblers; ignored. +- **`Rc=1` CR0 update truncates to 32 bits in xenia-rs.** [`interpreter.rs:204`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L204). Important: `neg.` of a 64-bit value with high bits set will give a CR0 that doesn't match spec (which compares the full 64-bit `~RA + 1` to zero). +- **No carry produced.** Use [`subfic`](subficx.md) `RT, RA, 0` (`RT ← 0 − RA` with carry) when you need the borrow. +- **Latency: single cycle.** Negate is the cheapest XO-form ALU operation (cheaper than `subf` despite being a special case, because there's no `RB` operand fetch). + +## Related Instructions + +- [`subfx`](subfx.md) — generalisation: `neg RT, RA` ≡ `subf RT, RA, 0` (but the latter requires materialising 0 in a register). +- [`subfic`](subficx.md) — `RT ← SIMM − RA` with `XER[CA]`; `subfic RT, RA, 0` produces a borrow. +- [`addx`](addx.md), [`addmex`](addmex.md), [`addzex`](addzex.md) — for chained negation (multi-word two's complement). +- `not` (simplified) — bit-wise complement via `nor RA, RS, RS`; distinct from negate. + +## IBM Reference + +- [AIX 7.3 — `neg` (Negate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-neg-negate-instruction) diff --git a/migration/project-root/ppc-manual/alu/norx.md b/migration/project-root/ppc-manual/alu/norx.md new file mode 100644 index 0000000..20ebec2 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/norx.md @@ -0,0 +1,124 @@ +# `norx` — NOR + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0000f8` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `nor` | `norx` | — | NOR | +| `nor.` | `norx` | Rc=1 | NOR | + +## Syntax + +```asm +nor[Rc] [RA], [RS], [RB] +``` + +## Encoding + +### `norx` — form `X` + +- **Opcode word:** `0x7c0000f8` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `124` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | norx: read | Source GPR (alias for RD in some stores). | +| `RB` | norx: read | Source GPR. | +| `RA` | norx: write | Source GPR (`r0`–`r31`). | +| `CR` | norx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `norx` + +- **Reads (always):** `RS`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `norx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +RA <- ~((RS) | (RB)) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`norx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="norx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:763`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L763) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:59`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L59) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:777`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L777) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:562-569`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L562-L569) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::norx => { + // PPCBUG-029: `not` simplified mnemonic — every `not` poisoned the GPR. + let rs32 = ctx.gpr[instr.rs()] as u32; + let rb32 = ctx.gpr[instr.rb()] as u32; + ctx.gpr[instr.ra()] = (!(rs32 | rb32)) as u64; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA ← ~(RS OR RB)`.** Bit-wise NOR. The canonical idiom for one-instruction NOT: **`not RA, RS` is the simplified mnemonic for `nor RA, RS, RS`** — both source operands the same yields `~RS`. Almost every disassembly contains this pattern. +- **Operand convention** is X-form (`RA` destination, `RS`/`RB` sources). +- **64-bit operation** on Xenon; full 64-bit complement via `!` on `u64`. +- **No `OE` or `XER` side effects.** +- **`Rc=1` CR0 update truncates to 32 bits in xenia-rs.** [`interpreter.rs:372`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L372) uses `as i32 as i64`. Note that NOR almost always produces results with high bits set (since the OR rarely covers all 64 bits), so the truncated CR0 is usually `LT` (negative low half) where spec might give a different signed compare for the full 64-bit value. +- **`nor.` after a clear-low operation is a common pattern** for testing whether some high-bit mask is empty. + +## Related Instructions + +- [`orx`](orx.md), [`orcx`](orcx.md) — base OR family. +- [`andx`](andx.md), [`andcx`](andcx.md), [`nandx`](nandx.md) — AND family. +- [`eqvx`](eqvx.md) — NXOR. +- [`xorx`](xorx.md) — XOR. +- `not` (simplified mnemonic for `nor RA, RS, RS`). + +## IBM Reference + +- [AIX 7.3 — `nor` (NOR)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-nor-instruction) +- [AIX 7.3 — `not` (simplified mnemonic)](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-not-complement-register) diff --git a/migration/project-root/ppc-manual/alu/orcx.md b/migration/project-root/ppc-manual/alu/orcx.md new file mode 100644 index 0000000..3ad2f96 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/orcx.md @@ -0,0 +1,122 @@ +# `orcx` — OR with Complement + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000338` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `orc` | `orcx` | — | OR with Complement | +| `orc.` | `orcx` | Rc=1 | OR with Complement | + +## Syntax + +```asm +orc[Rc] [RA], [RS], [RB] +``` + +## Encoding + +### `orcx` — form `X` + +- **Opcode word:** `0x7c000338` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `412` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | orcx: read | Source GPR (alias for RD in some stores). | +| `RB` | orcx: read | Source GPR. | +| `RA` | orcx: write | Source GPR (`r0`–`r31`). | +| `CR` | orcx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `orcx` + +- **Reads (always):** `RS`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `orcx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +RA <- (RS) | ~(RB) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`orcx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="orcx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:800`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L800) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:59`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L59) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:807`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L807) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:548-555`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L548-L555) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::orcx => { + // PPCBUG-028: same shape as andcx — operate in u32. + let rs32 = ctx.gpr[instr.rs()] as u32; + let rb32 = ctx.gpr[instr.rb()] as u32; + ctx.gpr[instr.ra()] = (rs32 | !rb32) as u64; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA ← RS OR (NOT RB)`.** The complement is on `RB`. Useful for setting bits *outside* a mask — e.g. `orc r3, r3, r4` sets in `r3` every bit *not* set in `r4`. +- **Idiom: `orc RA, RS, RS`** = `RS | ~RS` = `-1` (all ones). Cheaper-looking than constructing `−1` via `lis`+`ori`, but the assembler usually prefers `li RA, -1` or `eqv RA, RS, RS`. +- **Operand convention** is X-form (`RA` destination, `RS`/`RB` sources). +- **64-bit operation** on Xenon; xenia uses Rust's `!` on `u64` for full-width complement. +- **No `OE` or `XER` side effects.** +- **`Rc=1` CR0 update truncates to 32 bits in xenia-rs.** [`interpreter.rs:362`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L362). Because `~RB` typically has high bits set, `orc.` results often appear `LT` in the truncated CR0. + +## Related Instructions + +- [`orx`](orx.md) — base OR (no complement). +- [`andcx`](andcx.md) — AND-with-complement; sister `c` form. +- [`norx`](norx.md), [`nandx`](nandx.md) — full-result complements. +- [`eqvx`](eqvx.md) — `~(RS XOR RB)`. + +## IBM Reference + +- [AIX 7.3 — `orc` (OR with Complement)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-orc-complement-instruction) diff --git a/migration/project-root/ppc-manual/alu/ori.md b/migration/project-root/ppc-manual/alu/ori.md new file mode 100644 index 0000000..8cafd10 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/ori.md @@ -0,0 +1,115 @@ +# `ori` — OR Immediate + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x60000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `ori` | `ori` | — | OR Immediate | + +## Syntax + +```asm +ori [RA], [RS], [UIMM] +``` + +## Encoding + +### `ori` — form `D` + +- **Opcode word:** `0x60000000` +- **Primary opcode (bits 0–5):** `24` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | ori: read | Source GPR (alias for RD in some stores). | +| `UIMM` | ori: read | 16-bit unsigned immediate. Zero-extended. | +| `RA` | ori: write | Source GPR (`r0`–`r31`). | + +## Register Effects + +### `ori` + +- **Reads (always):** `RS`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +RA <- (RS) | (0x0000 || UIMM) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`ori`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="ori"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:810`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L810) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:59`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L59) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:347`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L347) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:512-515`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L512-L515) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::ori => { + ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] | (instr.uimm16() as u64); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **No record form.** Unlike [`andix`](andix.md), `ori` does **not** update `CR0` — there is no `ori.`. If you need a CR update after OR-immediate, follow it with `cmpwi` or use [`orx`](orx.md) with `Rc=1`. +- **Immediate is zero-extended.** Only the low 16 bits of `RA` can be affected; the high 48 bits are passed through from `RS` unchanged. +- **`ori 0, 0, 0` is the canonical NOP.** All PowerPC NOPs assemble to this encoding (`0x60000000`). Disassemblers usually display this as `nop`. +- **Common idiom: build a 32-bit constant via `lis` + `ori`.** `lis r3, hi16; ori r3, r3, lo16` materialises any 32-bit immediate with no CR or XER disturbance. +- **64-bit operation in xenia-rs.** [`interpreter.rs:330`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L330) — full `u64` OR; high bits unchanged from `RS`. +- **`RA = 0` reads `r0`** (not the literal zero). Different from `addi`'s `RA0` semantics; `ori` uses the regular `RA` interpretation. + +## Related Instructions + +- [`oris`](oris.md) — same op with the immediate shifted left 16. +- [`orx`](orx.md) — register-register; supports `Rc=1`. +- [`xori`](xori.md), [`xoris`](xoris.md), [`andix`](andix.md), [`andisx`](andisx.md) — sister immediate logicals. +- `nop` (simplified) — `ori 0, 0, 0`. + +## IBM Reference + +- [AIX 7.3 — `ori` (OR Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-ori-immediate-instruction) +- [AIX 7.3 — `nop` (simplified)](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-nop-no-operation) diff --git a/migration/project-root/ppc-manual/alu/oris.md b/migration/project-root/ppc-manual/alu/oris.md new file mode 100644 index 0000000..b659c45 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/oris.md @@ -0,0 +1,113 @@ +# `oris` — OR Immediate Shifted + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x64000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `oris` | `oris` | — | OR Immediate Shifted | + +## Syntax + +```asm +oris [RA], [RS], [UIMM] +``` + +## Encoding + +### `oris` — form `D` + +- **Opcode word:** `0x64000000` +- **Primary opcode (bits 0–5):** `25` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | oris: read | Source GPR (alias for RD in some stores). | +| `UIMM` | oris: read | 16-bit unsigned immediate. Zero-extended. | +| `RA` | oris: write | Source GPR (`r0`–`r31`). | + +## Register Effects + +### `oris` + +- **Reads (always):** `RS`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +RA <- (RS) | (UIMM || 0x0000) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`oris`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="oris"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:821`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L821) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:59`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L59) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:348`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L348) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:516-519`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L516-L519) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::oris => { + ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] | ((instr.uimm16() as u64) << 16); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **No record form.** No `oris.` — same as [`ori`](ori.md). For CR0 updates use [`orx`](orx.md) with `Rc=1`. +- **Immediate is zero-extended *then* shifted left 16.** Only bits 32–47 of `RA` (in PowerISA bit numbering) can be affected; the high 32 bits and low 16 bits of `RA` come from `RS` unchanged. +- **Common pair with `lis`** to load a 32-bit constant: `lis r3, hi16` (= `addis r3, 0, hi16`), then `ori r3, r3, lo16`. **For unsigned constants whose low half has the high bit set**, `lis` followed by `ori` works cleanly because `ori` is zero-extending; using `addi` instead would sign-extend `lo16` and corrupt the constant. +- **64-bit operation in xenia-rs.** [`interpreter.rs:334`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L334). +- **No `XER`, no `CR` effect.** Pure register OR. +- **`RA = 0` reads `r0`** (not literal zero); see [`ori`](ori.md). + +## Related Instructions + +- [`ori`](ori.md) — companion (immediate not shifted). +- [`addis`](addis.md) — D-form add-immediate-shifted; pairs with `ori` to build constants. +- [`xoris`](xoris.md), [`andisx`](andisx.md) — sister immediate-shifted logicals. + +## IBM Reference + +- [AIX 7.3 — `oris` (OR Immediate Shifted)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-oris-immediate-shifted-instruction) diff --git a/migration/project-root/ppc-manual/alu/orx.md b/migration/project-root/ppc-manual/alu/orx.md new file mode 100644 index 0000000..6d15a85 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/orx.md @@ -0,0 +1,122 @@ +# `orx` — OR + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000378` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `or` | `orx` | — | OR | +| `or.` | `orx` | Rc=1 | OR | + +## Syntax + +```asm +or[Rc] [RA], [RS], [RB] +``` + +## Encoding + +### `orx` — form `X` + +- **Opcode word:** `0x7c000378` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `444` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | orx: read | Source GPR (alias for RD in some stores). | +| `RB` | orx: read | Source GPR. | +| `RA` | orx: write | Source GPR (`r0`–`r31`). | +| `CR` | orx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `orx` + +- **Reads (always):** `RS`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `orx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +RA <- (RS) | (RB) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`orx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="orx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:773`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L773) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:59`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L59) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:809`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L809) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:542-547`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L542-L547) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::orx => { + // PPCBUG-032+020: 32-bit ABI CR0 view. + ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] | ctx.gpr[instr.rb()]; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Canonical "register move".** `or RA, RS, RS` copies `RS` to `RA` — assemblers expose this as the simplified mnemonic **`mr RA, RS`** (move register). It is the single most common instruction in PPC disassembly after loads/stores. +- **Operand convention** is X-form (`RA` destination, `RS`/`RB` sources). +- **64-bit operation** on Xenon; full bitwise OR across 64 bits. +- **No `OE` or `XER` side effects.** Only `Rc=1` updates `CR0`. +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** [`interpreter.rs:357`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L357) truncates with `as i32 as i64`. For `or. RA, RS, RS` (i.e. `mr.`), this means CR0 reflects the low 32 bits of `RS` only — distinguishable from spec only when the high 32 bits are non-zero with all-zero low 32. +- **`or 26, 26, 26` is the Xbox 360 NOP variant** historically used to mark cache lines or signal the dispatch unit (alongside `nop` ≡ `ori 0,0,0`). Disassembly may show this — it has no architectural effect. + +## Related Instructions + +- [`orcx`](orcx.md) — OR with complement. +- [`norx`](norx.md) — NOR (and the basis for `not`). +- [`andx`](andx.md), [`xorx`](xorx.md), [`eqvx`](eqvx.md) — sister logicals. +- [`ori`](ori.md), [`oris`](oris.md) — D-form immediate variants (no record form). +- `mr` (simplified) — `or RA, RS, RS`. + +## IBM Reference + +- [AIX 7.3 — `or` (OR)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-instruction-1) +- [AIX 7.3 — `mr` (Move Register, simplified)](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-mr-move-register) diff --git a/migration/project-root/ppc-manual/alu/rldclx.md b/migration/project-root/ppc-manual/alu/rldclx.md new file mode 100644 index 0000000..067175b --- /dev/null +++ b/migration/project-root/ppc-manual/alu/rldclx.md @@ -0,0 +1,137 @@ +# `rldclx` — Rotate Left Doubleword then Clear Left + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [MDS](../forms/MDS.md) · **Opcode:** `0x78000010` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `rldcl` | `rldclx` | — | Rotate Left Doubleword then Clear Left | +| `rldcl.` | `rldclx` | Rc=1 | Rotate Left Doubleword then Clear Left | + +## Syntax + +```asm +rldcl[Rc] [RA], [RS], [RB], [MB] +``` + +## Encoding + +### `rldclx` — form `MDS` + +- **Opcode word:** `0x78000010` +- **Primary opcode (bits 0–5):** `30` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (30) | +| 6–10 | `RS` | source GPR | +| 11–15 | `RA` | destination GPR | +| 16–20 | `RB` | source B GPR | +| 21–26 | `mb/me` | 6-bit mask field (swapped halves) | +| 27–30 | `XO` | extended opcode | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | rldclx: read | Source GPR (alias for RD in some stores). | +| `RB` | rldclx: read | Source GPR. | +| `MB` | rldclx: read | Mask begin bit. | +| `RA` | rldclx: write | Source GPR (`r0`–`r31`). | +| `CR` | rldclx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `rldclx` + +- **Reads (always):** `RS`, `RB`, `MB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `rldclx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`rldclx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="rldclx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:856`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L856) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:61`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L61) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:733`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L733) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:802-811`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L802-L811) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::rldclx => { + let rs = ctx.gpr[instr.rs()]; + let sh = ctx.gpr[instr.rb()] & 0x3F; + let mb = instr.mb_md(); + let rotated = rs.rotate_left(sh as u32); + let mask = rld_mask_left(mb); + ctx.gpr[instr.ra()] = rotated & mask; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA ← ROTL64(RS, RB[58:63]) & MASK(MB, 63)`.** Rotate `RS` left by `RB & 0x3F`, then *clear* bits to the left of `MB` — i.e. keep bits `MB..63`, force bits `0..MB-1` to zero. +- **Shift comes from a register.** Unlike [`rldiclx`](rldiclx.md), the rotate amount is dynamic. Only the low 6 bits of `RB` are used (`& 0x3F`); the upper 58 bits are silently ignored. +- **`MB` is a split 6-bit field.** Bit 5 of the encoded `mb/me` is *swapped* into bit position 5 (raw bit 30) — xenia decodes via `(instr.mb() << 1) | ((raw >> 1) & 1)` ([`interpreter.rs:587`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L587)). This MDS form is unusual; if you write a decoder, follow this exact bit assembly. +- **Mask generation.** `rld_mask_left(MB)` is `(1 << (64 - MB)) - 1` — i.e. clear bits `0..MB-1`, keep bits `MB..63`. When `MB = 0` the mask is all ones; when `MB = 63` only bit 63 survives. +- **`Rc=1` CR0 is correctly 64-bit.** [`interpreter.rs:592`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L592) uses `as i64` directly — no truncation. The rotate-and-mask family is one of the few xenia-rs instruction groups that already does the spec-correct 64-bit CR0 compare. +- **No `XER` effect.** +- **Use over [`rldiclx`](rldiclx.md)** when the shift amount is computed at runtime (e.g. via `cntlzd` for normalisation). + +## Related Instructions + +- [`rldcrx`](rldcrx.md) — sister: clear *right* instead of left. +- [`rldiclx`](rldiclx.md), [`rldicrx`](rldicrx.md), [`rldicx`](rldicx.md) — immediate-shift variants. +- [`rldimix`](rldimix.md) — rotate and mask insert. +- [`rlwnmx`](rlwnmx.md), [`rlwinmx`](rlwinmx.md) — 32-bit cousins. +- [`sldx`](sldx.md), [`srdx`](srdx.md) — preferred for plain 64-bit shifts. + +## IBM Reference + +- [AIX 7.3 — `rldcl` (Rotate Left Doubleword then Clear Left)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-rldcl-rotate-left-double-word-then-clear-left-instruction) diff --git a/migration/project-root/ppc-manual/alu/rldcrx.md b/migration/project-root/ppc-manual/alu/rldcrx.md new file mode 100644 index 0000000..f0caf25 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/rldcrx.md @@ -0,0 +1,136 @@ +# `rldcrx` — Rotate Left Doubleword then Clear Right + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [MDS](../forms/MDS.md) · **Opcode:** `0x78000012` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `rldcr` | `rldcrx` | — | Rotate Left Doubleword then Clear Right | +| `rldcr.` | `rldcrx` | Rc=1 | Rotate Left Doubleword then Clear Right | + +## Syntax + +```asm +rldcr[Rc] [RA], [RS], [RB], [ME] +``` + +## Encoding + +### `rldcrx` — form `MDS` + +- **Opcode word:** `0x78000012` +- **Primary opcode (bits 0–5):** `30` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (30) | +| 6–10 | `RS` | source GPR | +| 11–15 | `RA` | destination GPR | +| 16–20 | `RB` | source B GPR | +| 21–26 | `mb/me` | 6-bit mask field (swapped halves) | +| 27–30 | `XO` | extended opcode | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | rldcrx: read | Source GPR (alias for RD in some stores). | +| `RB` | rldcrx: read | Source GPR. | +| `ME` | rldcrx: read | Mask end bit. | +| `RA` | rldcrx: write | Source GPR (`r0`–`r31`). | +| `CR` | rldcrx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `rldcrx` + +- **Reads (always):** `RS`, `RB`, `ME` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `rldcrx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`rldcrx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="rldcrx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:881`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L881) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:61`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L61) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:734`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L734) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:812-821`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L812-L821) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::rldcrx => { + let rs = ctx.gpr[instr.rs()]; + let sh = ctx.gpr[instr.rb()] & 0x3F; + let me = instr.mb_md(); + let rotated = rs.rotate_left(sh as u32); + let mask = rld_mask_right(me); + ctx.gpr[instr.ra()] = rotated & mask; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA ← ROTL64(RS, RB[58:63]) & MASK(0, ME)`.** Rotate `RS` left by `RB & 0x3F`, then *clear* bits to the right of `ME` — keep bits `0..ME`, force bits `ME+1..63` to zero. +- **Shift from register.** Same as [`rldclx`](rldclx.md): only the low 6 bits of `RB` count. +- **`ME` is a split 6-bit field.** Same swap-decoded layout as `MB` in `rldclx`: `(instr.mb() << 1) | ((raw >> 1) & 1)` ([`interpreter.rs:597`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L597)). Note that even though it represents `ME` here, xenia reads it from `instr.mb()` because the field shares the same encoding slot. +- **Mask generation.** `rld_mask_right(ME)` = `~((1 << (63 - ME)) - 1)` keeping bits `0..ME`. When `ME = 63` the mask is all ones; when `ME = 0` only bit 0 survives. +- **`Rc=1` CR0 is correctly 64-bit.** Uses `as i64` directly ([`interpreter.rs:602`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L602)). +- **No `XER` effect.** +- **Useful for left-shift with arbitrary discard.** `rldcr RA, RS, RB, 63 - n` is functionally close to a left-shift-and-mask sequence, with the rotate variant additionally allowing wrap-around. + +## Related Instructions + +- [`rldclx`](rldclx.md) — sister: clear *left* instead of right. +- [`rldicrx`](rldicrx.md) — immediate-shift form. +- [`rldicx`](rldicx.md), [`rldiclx`](rldiclx.md), [`rldimix`](rldimix.md) — full immediate rotate-and-mask family. +- [`sldx`](sldx.md) — plain 64-bit logical left shift (often a strength-reduced equivalent). + +## IBM Reference + +- [AIX 7.3 — `rldcr` (Rotate Left Doubleword then Clear Right)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-rldcr-rotate-left-double-word-then-clear-right-instruction) diff --git a/migration/project-root/ppc-manual/alu/rldiclx.md b/migration/project-root/ppc-manual/alu/rldiclx.md new file mode 100644 index 0000000..e688ecd --- /dev/null +++ b/migration/project-root/ppc-manual/alu/rldiclx.md @@ -0,0 +1,142 @@ +# `rldiclx` — Rotate Left Doubleword Immediate then Clear Left + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [MD](../forms/MD.md) · **Opcode:** `0x78000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `rldicl` | `rldiclx` | — | Rotate Left Doubleword Immediate then Clear Left | +| `rldicl.` | `rldiclx` | Rc=1 | Rotate Left Doubleword Immediate then Clear Left | + +## Syntax + +```asm +rldicl[Rc] [RA], [RS], [SH], [MB] +``` + +## Encoding + +### `rldiclx` — form `MD` + +- **Opcode word:** `0x78000000` +- **Primary opcode (bits 0–5):** `30` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (30) | +| 6–10 | `RS` | source GPR | +| 11–15 | `RA` | destination GPR | +| 16–20 | `sh` | shift amount low 5 bits | +| 21–26 | `mb/me` | 6-bit mask field (swapped halves) | +| 27–29 | `XO` | extended opcode | +| 30 | `sh5` | shift amount high bit | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | rldiclx: read | Source GPR (alias for RD in some stores). | +| `SH` | rldiclx: read | Shift amount. | +| `MB` | rldiclx: read | Mask begin bit. | +| `RA` | rldiclx: write | Source GPR (`r0`–`r31`). | +| `CR` | rldiclx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `rldiclx` + +- **Reads (always):** `RS`, `SH`, `MB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `rldiclx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`rldiclx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="rldiclx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:929`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L929) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:61`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L61) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:728`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L728) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:762-771`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L762-L771) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::rldiclx => { + let rs = ctx.gpr[instr.rs()]; + let sh = instr.sh64(); + let mb = instr.mb_md(); + let rotated = rs.rotate_left(sh); + let mask = rld_mask_left(mb); + ctx.gpr[instr.ra()] = rotated & mask; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA ← ROTL64(RS, SH) & MASK(MB, 63)`.** Rotate left by `SH`, then clear bits 0 through `MB-1`. The mask retains bits `MB..63`. +- **The Swiss-army knife of bit extraction.** Many assembler shorthands lower to this single instruction: + - `srdi RA, RS, n` ≡ `rldicl RA, RS, 64-n, n` — logical right shift by `n`. + - `clrldi RA, RS, n` ≡ `rldicl RA, RS, 0, n` — clear top `n` bits. + - `extrdi RA, RS, n, b` ≡ `rldicl RA, RS, b+n, 64-n` — extract `n` bits starting at `b`. +- **`SH` is 6 bits, immediate** (bits 16–20 + bit 30). Xenia uses `instr.sh64()` to assemble them. +- **`MB` is 6 bits, split-encoded** (`(instr.mb() << 1) | ((raw >> 1) & 1)`). +- **`Rc=1` CR0 is correctly 64-bit.** Uses `as i64` directly ([`interpreter.rs:551`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L551)). +- **No `XER` effect.** +- **Often appears in compiled disassembly** as a generic 64-bit shift. Decoding back to the simplified mnemonic above makes the intent obvious. + +## Related Instructions + +- [`rldicrx`](rldicrx.md) — clear-right counterpart (`MASK(0, ME)`). +- [`rldicx`](rldicx.md) — clear both ends. +- [`rldclx`](rldclx.md) — register-shift version. +- [`rlwinmx`](rlwinmx.md) — 32-bit cousin. +- `srdi`, `clrldi`, `extrdi` (simplified mnemonics). + +## IBM Reference + +- [AIX 7.3 — `rldicl` (Rotate Left Doubleword Immediate then Clear Left)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-rldicl-rotate-left-double-word-immediate-then-clear-left-instruction) +- [AIX 7.3 — Simplified shift/extract mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-rotate-shift) diff --git a/migration/project-root/ppc-manual/alu/rldicrx.md b/migration/project-root/ppc-manual/alu/rldicrx.md new file mode 100644 index 0000000..d416089 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/rldicrx.md @@ -0,0 +1,142 @@ +# `rldicrx` — Rotate Left Doubleword Immediate then Clear Right + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [MD](../forms/MD.md) · **Opcode:** `0x78000004` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `rldicr` | `rldicrx` | — | Rotate Left Doubleword Immediate then Clear Right | +| `rldicr.` | `rldicrx` | Rc=1 | Rotate Left Doubleword Immediate then Clear Right | + +## Syntax + +```asm +rldicr[Rc] [RA], [RS], [SH], [ME] +``` + +## Encoding + +### `rldicrx` — form `MD` + +- **Opcode word:** `0x78000004` +- **Primary opcode (bits 0–5):** `30` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (30) | +| 6–10 | `RS` | source GPR | +| 11–15 | `RA` | destination GPR | +| 16–20 | `sh` | shift amount low 5 bits | +| 21–26 | `mb/me` | 6-bit mask field (swapped halves) | +| 27–29 | `XO` | extended opcode | +| 30 | `sh5` | shift amount high bit | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | rldicrx: read | Source GPR (alias for RD in some stores). | +| `SH` | rldicrx: read | Shift amount. | +| `ME` | rldicrx: read | Mask end bit. | +| `RA` | rldicrx: write | Source GPR (`r0`–`r31`). | +| `CR` | rldicrx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `rldicrx` + +- **Reads (always):** `RS`, `SH`, `ME` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `rldicrx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`rldicrx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="rldicrx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:957`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L957) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:61`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L61) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:729`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L729) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:772-781`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L772-L781) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::rldicrx => { + let rs = ctx.gpr[instr.rs()]; + let sh = instr.sh64(); + let me = instr.mb_md(); + let rotated = rs.rotate_left(sh); + let mask = rld_mask_right(me); + ctx.gpr[instr.ra()] = rotated & mask; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA ← ROTL64(RS, SH) & MASK(0, ME)`.** Rotate left by `SH`, then clear bits `ME+1..63`. +- **Common simplified mnemonics:** + - `sldi RA, RS, n` ≡ `rldicr RA, RS, n, 63-n` — logical left shift by `n`. + - `clrrdi RA, RS, n` ≡ `rldicr RA, RS, 0, 63-n` — clear low `n` bits. + - `extldi RA, RS, n, b` ≡ `rldicr RA, RS, b, n-1` — extract `n` bits from position `b` left-aligned. +- **`SH` is 6 bits, immediate.** Same `instr.sh64()` decode as the rest of the family. +- **`ME` is 6 bits, split-encoded.** Xenia stores it via `instr.mb()` — the field shares the slot with `MB` from sister instructions; the operation just interprets it as the right edge. +- **`Rc=1` CR0 is correctly 64-bit.** Uses `as i64` directly ([`interpreter.rs:561`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L561)). +- **No `XER` effect.** +- **Heavily emitted by 64-bit code generators** for left-shift-and-clear sequences. Recognising the simplified mnemonics aids disassembly. + +## Related Instructions + +- [`rldiclx`](rldiclx.md) — clear-left counterpart (`MASK(MB, 63)`). +- [`rldicx`](rldicx.md) — clear both ends. +- [`rldcrx`](rldcrx.md) — register-shift version. +- [`rldimix`](rldimix.md) — insert under mask. +- [`rlwinmx`](rlwinmx.md) — 32-bit cousin. +- `sldi`, `clrrdi`, `extldi` (simplified mnemonics). + +## IBM Reference + +- [AIX 7.3 — `rldicr` (Rotate Left Doubleword Immediate then Clear Right)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-rldicr-rotate-left-double-word-immediate-then-clear-right-instruction) diff --git a/migration/project-root/ppc-manual/alu/rldicx.md b/migration/project-root/ppc-manual/alu/rldicx.md new file mode 100644 index 0000000..da8dca4 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/rldicx.md @@ -0,0 +1,138 @@ +# `rldicx` — Rotate Left Doubleword Immediate then Clear + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [MD](../forms/MD.md) · **Opcode:** `0x78000008` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `rldic` | `rldicx` | — | Rotate Left Doubleword Immediate then Clear | +| `rldic.` | `rldicx` | Rc=1 | Rotate Left Doubleword Immediate then Clear | + +## Syntax + +```asm +rldic[Rc] [RA], [RS], [SH], [MB] +``` + +## Encoding + +### `rldicx` — form `MD` + +- **Opcode word:** `0x78000008` +- **Primary opcode (bits 0–5):** `30` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (30) | +| 6–10 | `RS` | source GPR | +| 11–15 | `RA` | destination GPR | +| 16–20 | `sh` | shift amount low 5 bits | +| 21–26 | `mb/me` | 6-bit mask field (swapped halves) | +| 27–29 | `XO` | extended opcode | +| 30 | `sh5` | shift amount high bit | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | rldicx: read | Source GPR (alias for RD in some stores). | +| `SH` | rldicx: read | Shift amount. | +| `MB` | rldicx: read | Mask begin bit. | +| `RA` | rldicx: write | Source GPR (`r0`–`r31`). | +| `CR` | rldicx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `rldicx` + +- **Reads (always):** `RS`, `SH`, `MB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `rldicx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`rldicx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="rldicx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:906`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L906) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:61`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L61) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:730`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L730) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:782-791`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L782-L791) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::rldicx => { + let rs = ctx.gpr[instr.rs()]; + let sh = instr.sh64(); + let mb = instr.mb_md(); + let rotated = rs.rotate_left(sh); + let mask = rld_mask_left(mb) & rld_mask_right(63 - sh); + ctx.gpr[instr.ra()] = rotated & mask; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA ← ROTL64(RS, SH) & MASK(MB, 63 - SH)`.** Rotate `RS` left by `SH` bits, then mask off both ends: clear bits `0..MB-1` *and* clear bits `64-SH..63`. This is the "clear at both edges" variant — useful for inserting a field into an otherwise-zero register. +- **`SH` is a 6-bit immediate** spanning bits 16–20 plus bit 30 of the instruction word. Xenia uses the helper `instr.sh64()` ([`interpreter.rs:566`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L566)) to assemble the 6 bits. +- **`MB` is also 6-bit, split-encoded** like the rest of the `rld*` family: `(instr.mb() << 1) | ((raw >> 1) & 1)`. +- **Mask is computed as `MASK_LEFT(MB) AND MASK_RIGHT(63 - SH)`.** This produces the equivalent of "left-shift `RS` by `SH` then clear high bits above bit `MB`" — a common pattern when `MB ≤ 63 - SH`. +- **Equivalent to a logical shift when `MB = 0`.** `rldic RA, RS, SH, 0` ≡ `sldi RA, RS, SH` (an alias the assembler may prefer). +- **`Rc=1` CR0 is correctly 64-bit.** [`interpreter.rs:571`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L571) uses `as i64` directly. +- **No `XER` effect.** + +## Related Instructions + +- [`rldiclx`](rldiclx.md), [`rldicrx`](rldicrx.md) — clear-only-one-side variants. +- [`rldclx`](rldclx.md), [`rldcrx`](rldcrx.md) — register-shift forms. +- [`rldimix`](rldimix.md) — insert under mask. +- [`rlwinmx`](rlwinmx.md) — 32-bit cousin. +- `sldi` (simplified) — `rldic RA, RS, n, 0`; assemblers prefer this for plain logical left shifts. + +## IBM Reference + +- [AIX 7.3 — `rldic` (Rotate Left Doubleword Immediate then Clear)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-rldic-rotate-left-double-word-immediate-then-clear-instruction) diff --git a/migration/project-root/ppc-manual/alu/rldimix.md b/migration/project-root/ppc-manual/alu/rldimix.md new file mode 100644 index 0000000..678e091 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/rldimix.md @@ -0,0 +1,136 @@ +# `rldimix` — Rotate Left Doubleword Immediate then Mask Insert + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [MD](../forms/MD.md) · **Opcode:** `0x7800000c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `rldimi` | `rldimix` | — | Rotate Left Doubleword Immediate then Mask Insert | +| `rldimi.` | `rldimix` | Rc=1 | Rotate Left Doubleword Immediate then Mask Insert | + +## Syntax + +```asm +rldimi[Rc] [RA], [RS], [SH], [MB] +``` + +## Encoding + +### `rldimix` — form `MD` + +- **Opcode word:** `0x7800000c` +- **Primary opcode (bits 0–5):** `30` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (30) | +| 6–10 | `RS` | source GPR | +| 11–15 | `RA` | destination GPR | +| 16–20 | `sh` | shift amount low 5 bits | +| 21–26 | `mb/me` | 6-bit mask field (swapped halves) | +| 27–29 | `XO` | extended opcode | +| 30 | `sh5` | shift amount high bit | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | rldimix: read | Source GPR (alias for RD in some stores). | +| `SH` | rldimix: read | Shift amount. | +| `MB` | rldimix: read | Mask begin bit. | +| `RA` | rldimix: write | Source GPR (`r0`–`r31`). | +| `CR` | rldimix: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `rldimix` + +- **Reads (always):** `RS`, `SH`, `MB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `rldimix`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`rldimix`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="rldimix"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:985`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L985) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:61`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L61) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:731`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L731) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:792-801`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L792-L801) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::rldimix => { + let rs = ctx.gpr[instr.rs()]; + let sh = instr.sh64(); + let mb = instr.mb_md(); + let rotated = rs.rotate_left(sh); + let mask = rld_mask_left(mb) & rld_mask_right(63 - sh); + ctx.gpr[instr.ra()] = (rotated & mask) | (ctx.gpr[instr.ra()] & !mask); + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA ← (ROTL64(RS, SH) & MASK) | (RA & ~MASK)`.** *Reads* the prior `RA` so it can preserve the bits outside the mask — this is the only `rld*` instruction with `RA` as both source and destination. +- **Mask is `MASK_LEFT(MB) AND MASK_RIGHT(63 - SH)`** ([`interpreter.rs:578`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L578)) — same span as [`rldicx`](rldicx.md), but the un-masked region is preserved in the destination instead of being zeroed. +- **Use to insert a bit-field.** Common idiom: `rldimi RA, RS, b, mask_start` writes `RS`'s low (`64 - mask_start`) bits into `RA` starting at bit `b`. +- **`SH` and `MB` decoding** is identical to the rest of the family (6-bit `sh` via `instr.sh64()`, 6-bit `mb` via the swap layout). +- **`Rc=1` CR0 is correctly 64-bit.** Uses `as i64` directly. +- **No `XER` effect.** +- **Compile-time pattern.** When you see `rldimi r3, r4, n, m`, the compiler is splicing a value into `r3`; recover the meaning by computing the mask `MASK(m, 63 - n)`. + +## Related Instructions + +- [`rldicx`](rldicx.md), [`rldiclx`](rldiclx.md), [`rldicrx`](rldicrx.md) — same form family, but they zero outside the mask instead of preserving. +- [`rlwimix`](rlwimix.md) — 32-bit insert cousin. +- [`rldclx`](rldclx.md), [`rldcrx`](rldcrx.md) — register-shift forms (no insert variant). + +## IBM Reference + +- [AIX 7.3 — `rldimi` (Rotate Left Doubleword Immediate then Mask Insert)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-rldimi-rotate-left-double-word-immediate-then-mask-insert-instruction) diff --git a/migration/project-root/ppc-manual/alu/rlwimix.md b/migration/project-root/ppc-manual/alu/rlwimix.md new file mode 100644 index 0000000..7032262 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/rlwimix.md @@ -0,0 +1,139 @@ +# `rlwimix` — Rotate Left Word Immediate then Mask Insert + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [M](../forms/M.md) · **Opcode:** `0x50000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `rlwimi` | `rlwimix` | — | Rotate Left Word Immediate then Mask Insert | +| `rlwimi.` | `rlwimix` | Rc=1 | Rotate Left Word Immediate then Mask Insert | + +## Syntax + +```asm +rlwimi[Rc] [RA], [RS], [SH], [MB], [ME] +``` + +## Encoding + +### `rlwimix` — form `M` + +- **Opcode word:** `0x50000000` +- **Primary opcode (bits 0–5):** `20` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RS` | source GPR | +| 11–15 | `RA` | destination GPR | +| 16–20 | `SH/RB` | shift amount or source B | +| 21–25 | `MB` | mask begin | +| 26–30 | `ME` | mask end | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | rlwimix: read | Source GPR (alias for RD in some stores). | +| `SH` | rlwimix: read | Shift amount. | +| `MB` | rlwimix: read | Mask begin bit. | +| `ME` | rlwimix: read | Mask end bit. | +| `RA` | rlwimix: write | Source GPR (`r0`–`r31`). | +| `CR` | rlwimix: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `rlwimix` + +- **Reads (always):** `RS`, `SH`, `MB`, `ME` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `rlwimix`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`rlwimix`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="rlwimix"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:1010`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L1010) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:61`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L61) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:344`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L344) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:737-749`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L737-L749) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::rlwimix => { + let rs = ctx.gpr[instr.rs()] as u32; + let sh = instr.sh(); + let mb = instr.mb(); + let me = instr.me(); + let rotated = rs.rotate_left(sh); + let mask = rlw_mask(mb, me); + let ra = ctx.gpr[instr.ra()] as u32; + ctx.gpr[instr.ra()] = ((rotated & mask) | (ra & !mask)) as u64; + // PPCBUG-025: 32-bit ABI CR0 view. + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA ← (ROTL32(RS[32:63], SH) & MASK) | (RA[32:63] & ~MASK)`.** Reads the low 32 bits of `RS`, rotates them, then *inserts* under the mask back into the low 32 bits of `RA`. The high 32 bits of `RA` are *implementation-defined* per spec; **xenia-rs zeroes them** (the `as u32` cast at [`interpreter.rs:529`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L529) discards them on read, then `as u64` zero-extends on write). +- **Mask follows the standard `MB..ME` PPC convention.** Both `MB` and `ME` are 5-bit fields; the mask is contiguous when `MB <= ME`, and *wraps* around (a "donut" mask: bits `MB..31` and `0..ME`) when `MB > ME`. Xenia's `rlw_mask(mb, me)` helper handles both cases. +- **`SH` is 5 bits.** Rotate amount is `SH mod 32`; values `≥ 32` are not encodable in this M-form. +- **Used for bit-field insertion** (`insrwi RA, RS, n, b` ≡ `rlwimi RA, RS, 32-(b+n), b, b+n-1`). Compilers emit `rlwimi` extensively for struct-bitfield writes. +- **`Rc=1` CR0 update truncates to 32 bits in xenia-rs.** [`interpreter.rs:531`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L531). Since the high 32 bits of the result are zero, this matches spec's compare on the (defined) low half — but if a real Xenon left high bits non-zero, behaviour would diverge. +- **No `XER` effect.** + +## Related Instructions + +- [`rlwinmx`](rlwinmx.md) — same mask family but zeroes outside (no read-modify-write). +- [`rlwnmx`](rlwnmx.md) — register-shift variant of `rlwinm`. +- [`rldimix`](rldimix.md) — 64-bit insert cousin. +- `insrwi`, `inslwi` (simplified mnemonics for common insert patterns). + +## IBM Reference + +- [AIX 7.3 — `rlwimi` (Rotate Left Word Immediate then Mask Insert)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-rlwimi-rotate-left-word-immediate-then-mask-insert-instruction) diff --git a/migration/project-root/ppc-manual/alu/rlwinmx.md b/migration/project-root/ppc-manual/alu/rlwinmx.md new file mode 100644 index 0000000..da4bbcf --- /dev/null +++ b/migration/project-root/ppc-manual/alu/rlwinmx.md @@ -0,0 +1,145 @@ +# `rlwinmx` — Rotate Left Word Immediate then AND with Mask + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [M](../forms/M.md) · **Opcode:** `0x54000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `rlwinm` | `rlwinmx` | — | Rotate Left Word Immediate then AND with Mask | +| `rlwinm.` | `rlwinmx` | Rc=1 | Rotate Left Word Immediate then AND with Mask | + +## Syntax + +```asm +rlwinm[Rc] [RA], [RS], [SH], [MB], [ME] +``` + +## Encoding + +### `rlwinmx` — form `M` + +- **Opcode word:** `0x54000000` +- **Primary opcode (bits 0–5):** `21` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RS` | source GPR | +| 11–15 | `RA` | destination GPR | +| 16–20 | `SH/RB` | shift amount or source B | +| 21–25 | `MB` | mask begin | +| 26–30 | `ME` | mask end | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | rlwinmx: read | Source GPR (alias for RD in some stores). | +| `SH` | rlwinmx: read | Shift amount. | +| `MB` | rlwinmx: read | Mask begin bit. | +| `ME` | rlwinmx: read | Mask end bit. | +| `RA` | rlwinmx: write | Source GPR (`r0`–`r31`). | +| `CR` | rlwinmx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `rlwinmx` + +- **Reads (always):** `RS`, `SH`, `MB`, `ME` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `rlwinmx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`rlwinmx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="rlwinmx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:1046`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L1046) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:61`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L61) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:345`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L345) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:725-736`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L725-L736) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::rlwinmx => { + let rs = ctx.gpr[instr.rs()] as u32; + let sh = instr.sh(); + let mb = instr.mb(); + let me = instr.me(); + let rotated = rs.rotate_left(sh); + let mask = rlw_mask(mb, me); + ctx.gpr[instr.ra()] = (rotated & mask) as u64; + // PPCBUG-024: 32-bit ABI CR0 view. + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA ← ROTL32(RS[32:63], SH) & MASK(MB, ME)`.** Take the low 32 bits of `RS`, rotate them left by `SH`, AND with a 32-bit mask. The high 32 bits of `RA` are zero (`as u64` zero-extension on the result). +- **The 32-bit Swiss army knife.** Most 32-bit shift/extract simplified mnemonics expand to this single instruction: + - `slwi RA, RS, n` ≡ `rlwinm RA, RS, n, 0, 31-n` — logical left shift. + - `srwi RA, RS, n` ≡ `rlwinm RA, RS, 32-n, n, 31` — logical right shift. + - `clrlwi RA, RS, n` ≡ `rlwinm RA, RS, 0, n, 31` — clear high `n` bits. + - `clrrwi RA, RS, n` ≡ `rlwinm RA, RS, 0, 0, 31-n` — clear low `n` bits. + - `extlwi`, `extrwi`, `clrlslwi` — full mnemonic family in PowerISA appendix. +- **Mask convention `MB..ME`** is contiguous when `MB ≤ ME`. When `MB > ME`, the mask is the *complement* of bits `ME+1..MB-1` — a donut/wrap mask. Xenia's `rlw_mask` handles both. +- **`SH` is 5 bits**, rotate amount `0..31`. +- **`Rc=1` CR0 update truncates to 32 bits in xenia-rs.** [`interpreter.rs:518`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L518). Since the result fits in 32 bits, the truncation matches spec exactly. +- **No `XER` effect.** + +## Related Instructions + +- [`rlwimix`](rlwimix.md) — same mask family with read-modify-write insert. +- [`rlwnmx`](rlwnmx.md) — register-shift version. +- [`rldiclx`](rldiclx.md), [`rldicrx`](rldicrx.md) — 64-bit cousins. +- [`slwx`](slwx.md), [`srwx`](srwx.md), [`srawix`](srawix.md) — straight 32-bit shift instructions. +- `slwi`, `srwi`, `clrlwi`, `clrrwi`, `extlwi`, `extrwi` (simplified mnemonics). + +## IBM Reference + +- [AIX 7.3 — `rlwinm` (Rotate Left Word Immediate then AND with Mask)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-rlwinm-rotate-left-word-immediate-then-mask-instruction) +- [AIX 7.3 — Rotate / shift simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-rotate-shift) diff --git a/migration/project-root/ppc-manual/alu/rlwnmx.md b/migration/project-root/ppc-manual/alu/rlwnmx.md new file mode 100644 index 0000000..e5c96ff --- /dev/null +++ b/migration/project-root/ppc-manual/alu/rlwnmx.md @@ -0,0 +1,139 @@ +# `rlwnmx` — Rotate Left Word then AND with Mask + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [M](../forms/M.md) · **Opcode:** `0x5c000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `rlwnm` | `rlwnmx` | — | Rotate Left Word then AND with Mask | +| `rlwnm.` | `rlwnmx` | Rc=1 | Rotate Left Word then AND with Mask | + +## Syntax + +```asm +rlwnm[Rc] [RA], [RS], [RB], [MB], [ME] +``` + +## Encoding + +### `rlwnmx` — form `M` + +- **Opcode word:** `0x5c000000` +- **Primary opcode (bits 0–5):** `23` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RS` | source GPR | +| 11–15 | `RA` | destination GPR | +| 16–20 | `SH/RB` | shift amount or source B | +| 21–25 | `MB` | mask begin | +| 26–30 | `ME` | mask end | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | rlwnmx: read | Source GPR (alias for RD in some stores). | +| `RB` | rlwnmx: read | Source GPR. | +| `MB` | rlwnmx: read | Mask begin bit. | +| `ME` | rlwnmx: read | Mask end bit. | +| `RA` | rlwnmx: write | Source GPR (`r0`–`r31`). | +| `CR` | rlwnmx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `rlwnmx` + +- **Reads (always):** `RS`, `RB`, `MB`, `ME` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `rlwnmx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`rlwnmx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="rlwnmx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:1101`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L1101) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:61`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L61) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:346`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L346) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:750-761`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L750-L761) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::rlwnmx => { + let rs = ctx.gpr[instr.rs()] as u32; + let sh = ctx.gpr[instr.rb()] as u32 & 0x1F; + let mb = instr.mb(); + let me = instr.me(); + let rotated = rs.rotate_left(sh); + let mask = rlw_mask(mb, me); + ctx.gpr[instr.ra()] = (rotated & mask) as u64; + // PPCBUG-026: 32-bit ABI CR0 view. + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA ← ROTL32(RS[32:63], RB[59:63]) & MASK(MB, ME)`.** Identical to [`rlwinmx`](rlwinmx.md) except the rotate amount comes from the low 5 bits of `RB`. Xenia masks with `& 0x1F` ([`interpreter.rs:535`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L535)). +- **Use over `rlwinm`** when the rotate amount is dynamic (e.g. computed from a `cntlzw` for normalisation, or read from a parameter). +- **Mask is still 5+5 bits immediate** — `MB` and `ME` are not register-sourced; only the shift is. This is the M-form's quirk: only one of (`SH`, `MB`, `ME`) is variable across the family. +- **Donut masks supported.** `MB > ME` produces a wraparound mask, same as `rlwinm`. +- **High 32 bits of `RA` are zero** (32-bit operation, then `as u64` zero-extends). +- **`Rc=1` CR0 update truncates to 32 bits in xenia-rs.** [`interpreter.rs:540`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L540) — harmless because the result already fits in 32 bits. +- **No `XER` effect.** + +## Related Instructions + +- [`rlwinmx`](rlwinmx.md) — same op, immediate shift. +- [`rlwimix`](rlwimix.md) — insert variant (no register-shift form exists for insert). +- [`rldclx`](rldclx.md), [`rldcrx`](rldcrx.md) — 64-bit register-shift cousins. +- [`slwx`](slwx.md), [`srwx`](srwx.md) — straight 32-bit shifts. + +## IBM Reference + +- [AIX 7.3 — `rlwnm` (Rotate Left Word then AND with Mask)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-rlwnm-rotate-left-word-then-mask-instruction) diff --git a/migration/project-root/ppc-manual/alu/sldx.md b/migration/project-root/ppc-manual/alu/sldx.md new file mode 100644 index 0000000..c77837c --- /dev/null +++ b/migration/project-root/ppc-manual/alu/sldx.md @@ -0,0 +1,124 @@ +# `sldx` — Shift Left Doubleword + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000036` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `sld` | `sldx` | — | Shift Left Doubleword | +| `sld.` | `sldx` | Rc=1 | Shift Left Doubleword | + +## Syntax + +```asm +sld[Rc] [RA], [RS], [RB] +``` + +## Encoding + +### `sldx` — form `X` + +- **Opcode word:** `0x7c000036` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `27` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | sldx: read | Source GPR (alias for RD in some stores). | +| `RB` | sldx: read | Source GPR. | +| `RA` | sldx: write | Source GPR (`r0`–`r31`). | +| `CR` | sldx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `sldx` + +- **Reads (always):** `RS`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `sldx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +n <- (RB)[57:63] +RA <- ((RS) << n) if n < 64 else 0 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`sldx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="sldx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:1122`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L1122) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:65`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L65) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:759`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L759) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:676-683`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L676-L683) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::sldx => { + let sh = ctx.gpr[instr.rb()] & 0x7F; + ctx.gpr[instr.ra()] = if sh < 64 { + ctx.gpr[instr.rs()] << sh + } else { 0 }; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **64-bit logical left shift.** `RA ← RS << (RB & 0x7F)` if the shift count is `< 64`, otherwise `RA = 0`. Bits shifted past bit 0 are discarded. +- **Critical: shift count is *7 bits*, not 6.** PowerISA reads `RB[57:63]` (7 bits, `0..127`). Counts in `[64, 127]` produce zero, *not* `RS << (count mod 64)`. Xenia respects this with `& 0x7F` and an explicit `if sh < 64` check ([`interpreter.rs:464`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L464)). C semantics' undefined behaviour for `<<` with a count `>= width` is a spec-violation source if you naïvely translate. +- **No `XER[CA]` produced** by left shifts. Logical right [`srdx`](srdx.md) and arithmetic right [`sradx`](sradx.md) differ here — arithmetic right *does* set `CA`. +- **`Rc=1` CR0 is correctly 64-bit.** [`interpreter.rs:467`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L467) uses `as i64` directly. CR0 reflects the sign of the full 64-bit shifted value (which is 0 for shifts ≥ 64, otherwise either `LT`/`GT`/`EQ`). +- **Strength-reduced from `mulli` for power-of-two multipliers.** +- **No `OE` bit.** + +## Related Instructions + +- [`slwx`](slwx.md) — 32-bit logical left shift. +- [`srdx`](srdx.md) — 64-bit logical right shift. +- [`sradx`](sradx.md), [`sradix`](sradix.md) — 64-bit arithmetic right shifts. +- [`rldicrx`](rldicrx.md) — `sldi` simplified mnemonic uses this. +- [`mulli`](mulli.md) — for non-power-of-two multipliers. + +## IBM Reference + +- [AIX 7.3 — `sld` (Shift Left Doubleword)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-sld-shift-left-double-word-instruction) diff --git a/migration/project-root/ppc-manual/alu/slwx.md b/migration/project-root/ppc-manual/alu/slwx.md new file mode 100644 index 0000000..287d9b7 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/slwx.md @@ -0,0 +1,124 @@ +# `slwx` — Shift Left Word + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000030` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `slw` | `slwx` | — | Shift Left Word | +| `slw.` | `slwx` | Rc=1 | Shift Left Word | + +## Syntax + +```asm +slw[Rc] [RA], [RS], [RB] +``` + +## Encoding + +### `slwx` — form `X` + +- **Opcode word:** `0x7c000030` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `24` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | slwx: read | Source GPR (alias for RD in some stores). | +| `RB` | slwx: read | Source GPR. | +| `RA` | slwx: write | Source GPR (`r0`–`r31`). | +| `CR` | slwx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `slwx` + +- **Reads (always):** `RS`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `slwx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +n <- (RB)[58:63] +RA <- ((RS) << n) & 0x0000_0000_FFFF_FFFF if n < 32 else 0 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`slwx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="slwx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:1141`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L1141) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:65`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L65) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:757`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L757) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:622-631`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L622-L631) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::slwx => { + // PPCBUG-044: 32-bit ABI CR0 view. A result with bit 31 set + // (e.g. 0x80000000) is negative in i32 view but positive in i64. + let sh = ctx.gpr[instr.rb()] as u32; + ctx.gpr[instr.ra()] = if sh < 32 { + ((ctx.gpr[instr.rs()] as u32) << sh) as u64 + } else { 0 }; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **32-bit logical left shift, zero-extended to 64.** `RA ← (RS[32:63] << (RB & 0x3F))[32:63]` if `(RB & 0x3F) < 32`, else `RA = 0`. The high 32 bits of `RA` are always zero (zero-extension of the 32-bit result). +- **Shift count is 6 bits**, `RB[58:63]` — not 7 like [`sldx`](sldx.md). Counts in `[32, 63]` produce zero. Xenia reads the full register but the explicit `if sh < 32` guard in [`interpreter.rs:417`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L417) prevents Rust UB. +- **Spec quirk worth flagging:** xenia reads `ctx.gpr[instr.rb()] as u32`, which uses the low 32 bits of `RB`, not the spec's `RB & 0x3F`. For ordinary code these agree (counts ≤ 63), but a maliciously high `RB` could in principle differ. In practice this is a non-issue. +- **No `XER[CA]` for left shifts.** +- **`Rc=1` CR0 update truncates to 32 bits** ([`interpreter.rs:420`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L420)). Since the high 32 bits are zero, this matches spec exactly. +- **No `OE` bit.** + +## Related Instructions + +- [`sldx`](sldx.md) — 64-bit logical left shift. +- [`srwx`](srwx.md), [`srawx`](srawx.md), [`srawix`](srawix.md) — 32-bit right shifts. +- [`rlwinmx`](rlwinmx.md) — `slwi` simplified mnemonic uses this. + +## IBM Reference + +- [AIX 7.3 — `slw` (Shift Left Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-slw-shift-left-word-instruction) diff --git a/migration/project-root/ppc-manual/alu/sradix.md b/migration/project-root/ppc-manual/alu/sradix.md new file mode 100644 index 0000000..a6a336e --- /dev/null +++ b/migration/project-root/ppc-manual/alu/sradix.md @@ -0,0 +1,132 @@ +# `sradix` — Shift Right Algebraic Doubleword Immediate + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XS](../forms/XS.md) · **Opcode:** `0x7c000674` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `sradi` | `sradix` | — | Shift Right Algebraic Doubleword Immediate | +| `sradi.` | `sradix` | Rc=1 | Shift Right Algebraic Doubleword Immediate | + +## Syntax + +```asm +sradi[Rc] [RA], [RS], [SH] +``` + +## Encoding + +### `sradix` — form `XS` + +- **Opcode word:** `0x7c000674` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `826` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RS` | source GPR | +| 11–15 | `RA` | destination GPR | +| 16–20 | `sh` | shift amount low 5 bits | +| 21–29 | `XO` | extended opcode (9 bits) | +| 30 | `sh5` | shift amount high bit | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | sradix: read | Source GPR (alias for RD in some stores). | +| `SH` | sradix: read | Shift amount. | +| `RA` | sradix: write | Source GPR (`r0`–`r31`). | +| `CR` | sradix: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `CA` | sradix: write | XER[CA] carry bit. Read by add-with-carry/subtract-with-borrow instructions, written by carrying instructions. | + +## Register Effects + +### `sradix` + +- **Reads (always):** `RS`, `SH` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA`, `CA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `sradix`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[CA]** ← carry-out of the add / borrow-in of the subtract (always). + +## Operation (pseudocode) + +``` +RA <- ((RS) >>a SH) sign-extended +CA <- (RS signed < 0) && any_bit_shifted_out +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`sradix`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="sradix"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:1230`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L1230) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:65`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L65) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:743`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L743) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:709-722`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L709-L722) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::sradix => { + let rs = ctx.gpr[instr.rs()] as i64; + let sh = instr.sh64(); + if sh == 0 { + ctx.gpr[instr.ra()] = rs as u64; + ctx.xer_ca = 0; + } else { + let result = rs >> sh; + ctx.xer_ca = if rs < 0 && (rs as u64) << (64 - sh) != 0 { 1 } else { 0 }; + ctx.gpr[instr.ra()] = result as u64; + } + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA ← (i64)RS >> SH`**, with `XER[CA]` set when `RS` is negative AND any one-bit was shifted out. +- **`SH` is 6 bits.** Encoded in bits 16–20 (`sh`) plus bit 30 (`sh5`); xenia uses `instr.sh64()` to assemble the 6 bits ([`interpreter.rs:496`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L496)). Range `0..63`. +- **`SH = 0`** is a no-op (sign-extends `RS` to itself), and explicitly clears `XER[CA]` ([`interpreter.rs:498`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L498)). This matches spec. +- **Spec divergence: 6-bit immediate, no saturation arm.** Unlike [`sradx`](sradx.md) which has a 7-bit register count and saturates at `≥ 64`, `sradi` always uses a count `< 64` so no special saturation case is needed. +- **`Rc=1` CR0 is correctly 64-bit.** [`interpreter.rs:506`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L506). +- **Idiom: `sradi rA, rS, n; addze rA, rA`** — signed integer divide by `2^n` rounded toward zero (the textbook PPC sequence). +- **No `OE` bit.** + +## Related Instructions + +- [`sradx`](sradx.md) — register-shift form. +- [`srawix`](srawix.md), [`srawx`](srawx.md) — 32-bit arithmetic right. +- [`addzex`](addzex.md) — pair for signed-divide-rounding. +- [`rldiclx`](rldiclx.md) — when arithmetic semantics not required (logical shift), `srdi` simplified mnemonic. + +## IBM Reference + +- [AIX 7.3 — `sradi` (Shift Right Algebraic Doubleword Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-sradi-shift-right-algebraic-double-word-immediate-instruction) diff --git a/migration/project-root/ppc-manual/alu/sradx.md b/migration/project-root/ppc-manual/alu/sradx.md new file mode 100644 index 0000000..07cf0c9 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/sradx.md @@ -0,0 +1,136 @@ +# `sradx` — Shift Right Algebraic Doubleword + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000634` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `srad` | `sradx` | — | Shift Right Algebraic Doubleword | +| `srad.` | `sradx` | Rc=1 | Shift Right Algebraic Doubleword | + +## Syntax + +```asm +srad[Rc] [RA], [RS], [RB] +``` + +## Encoding + +### `sradx` — form `X` + +- **Opcode word:** `0x7c000634` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `794` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | sradx: read | Source GPR (alias for RD in some stores). | +| `RB` | sradx: read | Source GPR. | +| `RA` | sradx: write | Source GPR (`r0`–`r31`). | +| `CR` | sradx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `CA` | sradx: write | XER[CA] carry bit. Read by add-with-carry/subtract-with-borrow instructions, written by carrying instructions. | + +## Register Effects + +### `sradx` + +- **Reads (always):** `RS`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA`, `CA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `sradx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[CA]** ← carry-out of the add / borrow-in of the subtract (always). + +## Operation (pseudocode) + +``` +n <- (RB)[57:63] +RA <- ((RS) >>a n) sign-extended if n < 64 +CA <- (RS signed < 0) && any_bit_shifted_out +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`sradx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="sradx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:1201`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L1201) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:65`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L65) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:841`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L841) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:692-708`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L692-L708) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::sradx => { + let rs = ctx.gpr[instr.rs()] as i64; + let sh = ctx.gpr[instr.rb()] & 0x7F; + if sh == 0 { + ctx.gpr[instr.ra()] = rs as u64; + ctx.xer_ca = 0; + } else if sh < 64 { + let result = rs >> sh; + ctx.xer_ca = if rs < 0 && (rs as u64) << (64 - sh) != 0 { 1 } else { 0 }; + ctx.gpr[instr.ra()] = result as u64; + } else { + ctx.gpr[instr.ra()] = if rs < 0 { u64::MAX } else { 0 }; + ctx.xer_ca = if rs < 0 { 1 } else { 0 }; + } + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **64-bit arithmetic (sign-propagating) right shift.** `RA ← (i64)RS >> (RB & 0x7F)` with bits shifted in matching the sign bit of `RS`. Counts ≥ 64 saturate: `RA` becomes all-ones if `RS < 0`, else zero. +- **`XER[CA]` is the "lost-ones" indicator.** `CA = 1` iff `RS` is negative AND any of the bits shifted out were `1`. This makes `srad` / `sradi` the standard idiom for "divide negative integer by power of 2 with round-toward-zero" — followed by `addze` to compensate when `CA = 1`. +- **Three branches in xenia.** `sh == 0` (no shift, `CA=0`), `sh < 64` (normal shift, `CA` per spec), and `sh ≥ 64` (saturate to `0` or `−1`, `CA` reflects sign). The `(rs as u64) << (64 - sh) != 0` check at [`interpreter.rs:486`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L486) extracts whether any non-zero bit was shifted out. +- **Shift count is 7 bits.** Same as [`sldx`](sldx.md): `RB[57:63]`. +- **`Rc=1` CR0 is correctly 64-bit.** [`interpreter.rs:489`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L489). +- **No `OE` bit.** +- **Used by signed-divide-by-power-of-2 idiom:** `srad rA, rS, n; addze rA, rA` produces `rS / 2^n` with truncation toward zero rather than toward `-∞`. + +## Related Instructions + +- [`sradix`](sradix.md) — immediate-shift form (`SH` 6-bit immediate). +- [`srawx`](srawx.md), [`srawix`](srawix.md) — 32-bit arithmetic right shifts. +- [`srdx`](srdx.md) — 64-bit *logical* right shift (no `XER[CA]`). +- [`addzex`](addzex.md) — companion for the divide-rounding idiom. +- [`sldx`](sldx.md) — left shift. + +## IBM Reference + +- [AIX 7.3 — `srad` (Shift Right Algebraic Doubleword)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-srad-shift-right-algebraic-double-word-instruction) diff --git a/migration/project-root/ppc-manual/alu/srawix.md b/migration/project-root/ppc-manual/alu/srawix.md new file mode 100644 index 0000000..baae2ab --- /dev/null +++ b/migration/project-root/ppc-manual/alu/srawix.md @@ -0,0 +1,132 @@ +# `srawix` — Shift Right Algebraic Word Immediate + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000670` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `srawi` | `srawix` | — | Shift Right Algebraic Word Immediate | +| `srawi.` | `srawix` | Rc=1 | Shift Right Algebraic Word Immediate | + +## Syntax + +```asm +srawi[Rc] [RA], [RS], [SH] +``` + +## Encoding + +### `srawix` — form `X` + +- **Opcode word:** `0x7c000670` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `824` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | srawix: read | Source GPR (alias for RD in some stores). | +| `SH` | srawix: read | Shift amount. | +| `RA` | srawix: write | Source GPR (`r0`–`r31`). | +| `CR` | srawix: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `CA` | srawix: write | XER[CA] carry bit. Read by add-with-carry/subtract-with-borrow instructions, written by carrying instructions. | + +## Register Effects + +### `srawix` + +- **Reads (always):** `RS`, `SH` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA`, `CA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `srawix`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[CA]** ← carry-out of the add / borrow-in of the subtract (always). + +## Operation (pseudocode) + +``` +RA <- ((RS)[32:63] >>a SH) sign-extended +CA <- (RS[32] signed) && any_low_bit_shifted_out +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`srawix`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="srawix"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:1291`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L1291) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:65`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L65) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:843`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L843) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:661-675`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L661-L675) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::srawix => { + // PPCBUG-042+043 coupled: same shape as srawx for the sh-immediate form. + let rs = ctx.gpr[instr.rs()] as i32; + let sh = instr.sh(); + if sh == 0 { + ctx.gpr[instr.ra()] = rs as u32 as u64; + ctx.xer_ca = 0; + } else { + let result = rs >> sh; + ctx.xer_ca = if rs < 0 && (rs as u32) << (32 - sh) != 0 { 1 } else { 0 }; + ctx.gpr[instr.ra()] = result as u32 as u64; + } + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA ← ((i32)RS >> SH) sign-extended`** with `XER[CA]` set when `RS` is negative AND any low bit was shifted out. +- **`SH` is 5 bits** (immediate, range `0..31`). Unlike [`srawx`](srawx.md), there is no saturation case because the count cannot exceed 31. Xenia reads it via `instr.sh()`. +- **`SH = 0`** sign-extends `RS[32:63]` to 64 bits and clears `CA`. This is *not* a no-op when `RS`'s high 32 bits differ from the sign extension of bit 32. +- **Common idiom: `srawi rA, rS, 31`** materialises the 32-bit sign of `rS` as `0` or `−1` — the canonical "sign mask" pattern. Often used for branchless `abs` or conditional negation. +- **Idiom: `srawi rA, rS, n; addze rA, rA`** — divide signed by `2^n` rounding toward zero. +- **`Rc=1` CR0 update truncates to 32 bits in xenia-rs.** [`interpreter.rs:457`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L457) — matches spec because the sign-extended result has consistent low/high 32-bit signs. +- **No `OE` bit.** + +## Related Instructions + +- [`srawx`](srawx.md) — register-shift form. +- [`sradix`](sradix.md), [`sradx`](sradx.md) — 64-bit arithmetic right. +- [`addzex`](addzex.md) — divide-rounding companion. +- [`extswx`](extswx.md) — `srawi rA, rS, 0` is functionally a sign-extend-32-to-64 plus `CA = 0` clear; `extsw` is preferred when CA isn't wanted. + +## IBM Reference + +- [AIX 7.3 — `srawi` (Shift Right Algebraic Word Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-srawi-shift-right-algebraic-word-immediate-instruction) diff --git a/migration/project-root/ppc-manual/alu/srawx.md b/migration/project-root/ppc-manual/alu/srawx.md new file mode 100644 index 0000000..f07950a --- /dev/null +++ b/migration/project-root/ppc-manual/alu/srawx.md @@ -0,0 +1,138 @@ +# `srawx` — Shift Right Algebraic Word + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000630` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `sraw` | `srawx` | — | Shift Right Algebraic Word | +| `sraw.` | `srawx` | Rc=1 | Shift Right Algebraic Word | + +## Syntax + +```asm +sraw[Rc] [RA], [RS], [RB] +``` + +## Encoding + +### `srawx` — form `X` + +- **Opcode word:** `0x7c000630` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `792` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | srawx: read | Source GPR (alias for RD in some stores). | +| `RB` | srawx: read | Source GPR. | +| `RA` | srawx: write | Source GPR (`r0`–`r31`). | +| `CR` | srawx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `CA` | srawx: write | XER[CA] carry bit. Read by add-with-carry/subtract-with-borrow instructions, written by carrying instructions. | + +## Register Effects + +### `srawx` + +- **Reads (always):** `RS`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA`, `CA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `srawx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[CA]** ← carry-out of the add / borrow-in of the subtract (always). + +## Operation (pseudocode) + +``` +n <- (RB)[58:63] +RA <- ((RS)[32:63] >>a n) sign-extended +CA <- 1 if (signed RS < 0) && any_bit_shifted_out else 0 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`srawx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="srawx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:1262`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L1262) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:65`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L65) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:840`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L840) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:642-660`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L642-L660) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::srawx => { + // PPCBUG-041+043 coupled: 32-bit ABI writeback truncation + CR0 i32. + // CA logic is independently correct (uses u32 shifted-out test). + let rs = ctx.gpr[instr.rs()] as i32; + let sh = ctx.gpr[instr.rb()] as u32 & 0x3F; + if sh == 0 { + ctx.gpr[instr.ra()] = rs as u32 as u64; + ctx.xer_ca = 0; + } else if sh < 32 { + let result = rs >> sh; + ctx.xer_ca = if rs < 0 && (rs as u32) << (32 - sh) != 0 { 1 } else { 0 }; + ctx.gpr[instr.ra()] = result as u32 as u64; + } else { + ctx.gpr[instr.ra()] = if rs < 0 { 0xFFFF_FFFFu64 } else { 0 }; + ctx.xer_ca = if rs < 0 { 1 } else { 0 }; + } + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **32-bit arithmetic right shift, sign-extended to 64.** `RA ← ((i32)RS >> n) sign-extended`, with `XER[CA]` set when `RS[32] = 1` (negative) AND any low bit was shifted out. +- **Shift count is 6 bits**, `RB[58:63]`. Counts `≥ 32` saturate: `RA = -1` (all-ones, sign-extended) if `RS < 0`, else `0`. Xenia handles this in three branches ([`interpreter.rs:432-444`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L432-L444)). +- **`SH = 0`** sign-extends `RS` to 64 bits and clears `XER[CA]` — like `extsw`, but additionally writing CA. +- **Result is always sign-extended to 64 bits.** `RA[0:31]` matches the sign of `RA[32]`. This is the key difference from [`srwx`](srwx.md) (zero-extension). +- **`Rc=1` CR0 update truncates to 32 bits in xenia-rs.** [`interpreter.rs:443`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L443) — but since the result is sign-extended, the low 32 bits' sign matches the full 64-bit sign, so spec and xenia agree here. +- **Used with [`addzex`](addzex.md)** for signed divide by `2^n` rounding toward zero. +- **No `OE` bit.** + +## Related Instructions + +- [`srawix`](srawix.md) — immediate-shift form. +- [`sradx`](sradx.md), [`sradix`](sradix.md) — 64-bit arithmetic right shifts. +- [`srwx`](srwx.md) — 32-bit *logical* right shift (no `XER[CA]`). +- [`addzex`](addzex.md) — companion for divide-rounding idiom. +- [`slwx`](slwx.md) — left shift. + +## IBM Reference + +- [AIX 7.3 — `sraw` (Shift Right Algebraic Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-sraw-shift-right-algebraic-word-instruction) diff --git a/migration/project-root/ppc-manual/alu/srdx.md b/migration/project-root/ppc-manual/alu/srdx.md new file mode 100644 index 0000000..3496596 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/srdx.md @@ -0,0 +1,123 @@ +# `srdx` — Shift Right Doubleword + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000436` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `srd` | `srdx` | — | Shift Right Doubleword | +| `srd.` | `srdx` | Rc=1 | Shift Right Doubleword | + +## Syntax + +```asm +srd[Rc] [RA], [RS], [RB] +``` + +## Encoding + +### `srdx` — form `X` + +- **Opcode word:** `0x7c000436` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `539` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | srdx: read | Source GPR (alias for RD in some stores). | +| `RB` | srdx: read | Source GPR. | +| `RA` | srdx: write | Source GPR (`r0`–`r31`). | +| `CR` | srdx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `srdx` + +- **Reads (always):** `RS`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `srdx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +n <- (RB)[57:63] +RA <- ((RS) >> n) if n < 64 else 0 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`srdx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="srdx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:1161`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L1161) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:65`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L65) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:821`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L821) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:684-691`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L684-L691) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::srdx => { + let sh = ctx.gpr[instr.rb()] & 0x7F; + ctx.gpr[instr.ra()] = if sh < 64 { + ctx.gpr[instr.rs()] >> sh + } else { 0 }; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **64-bit logical right shift.** `RA ← RS >> (RB & 0x7F)` if the count is `< 64`, else `RA = 0`. Bits shifted in from the high end are zero (no sign extension). +- **Shift count is 7 bits** (`RB[57:63]`). Counts `64..127` produce zero, not `RS >> (count mod 64)`. Xenia respects this with `& 0x7F` and an explicit `if sh < 64` check ([`interpreter.rs:472`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L472)). +- **No `XER[CA]` produced.** This is the logical right shift; for arithmetic shift with `XER[CA]` use [`sradx`](sradx.md). +- **`Rc=1` CR0 is correctly 64-bit.** [`interpreter.rs:475`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L475). Result is non-negative as a signed value (high bit is always cleared by the shift), so CR0 will only ever be `EQ` or `GT`. +- **No `OE` bit.** +- **The `srdi` simplified mnemonic** uses [`rldiclx`](rldiclx.md) instead — `rldicl rA, rS, 64-n, n` — because it can be combined with masking. `srd` is for runtime-variable counts. + +## Related Instructions + +- [`srwx`](srwx.md) — 32-bit logical right shift. +- [`sradx`](sradx.md), [`sradix`](sradix.md) — 64-bit arithmetic right. +- [`sldx`](sldx.md) — 64-bit left shift. +- [`rldiclx`](rldiclx.md) — `srdi` immediate-shift expansion. + +## IBM Reference + +- [AIX 7.3 — `srd` (Shift Right Doubleword)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-srd-shift-right-double-word-instruction) diff --git a/migration/project-root/ppc-manual/alu/srwx.md b/migration/project-root/ppc-manual/alu/srwx.md new file mode 100644 index 0000000..86f4799 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/srwx.md @@ -0,0 +1,125 @@ +# `srwx` — Shift Right Word + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000430` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `srw` | `srwx` | — | Shift Right Word | +| `srw.` | `srwx` | Rc=1 | Shift Right Word | + +## Syntax + +```asm +srw[Rc] [RA], [RS], [RB] +``` + +## Encoding + +### `srwx` — form `X` + +- **Opcode word:** `0x7c000430` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `536` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | srwx: read | Source GPR (alias for RD in some stores). | +| `RB` | srwx: read | Source GPR. | +| `RA` | srwx: write | Source GPR (`r0`–`r31`). | +| `CR` | srwx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `srwx` + +- **Reads (always):** `RS`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `srwx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +n <- (RB)[58:63] +RA <- ((RS)[32:63] >> n) zero-extended if n < 32 else 0 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`srwx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="srwx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:1180`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L1180) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:65`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L65) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:820`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L820) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:632-641`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L632-L641) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::srwx => { + // PPCBUG-044: 32-bit ABI CR0 view (zero-extended right shift can never + // have bit 31 set, but use the canonical form for consistency). + let sh = ctx.gpr[instr.rb()] as u32; + ctx.gpr[instr.ra()] = if sh < 32 { + ((ctx.gpr[instr.rs()] as u32) >> sh) as u64 + } else { 0 }; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **32-bit logical right shift, zero-extended to 64.** `RA ← (u32)RS >> (RB & 0x3F)` if count `< 32`, else `RA = 0`. The high 32 bits of `RA` are always zero. +- **Shift count is 6 bits**, `RB[58:63]`. Counts `[32, 63]` produce zero (not `RS >> (count mod 32)`); xenia's explicit `if sh < 32` guards against Rust UB ([`interpreter.rs:425`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L425)). +- **No `XER[CA]` produced.** For arithmetic shift with `XER[CA]` use [`srawx`](srawx.md). +- **`Rc=1` CR0 update truncates to 32 bits in xenia-rs.** [`interpreter.rs:428`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L428). Since the result has zeroed high 32 bits and zeroed sign bit (high bit of the 32-bit result is always 0 after a non-zero shift), CR0 will be `EQ` or `GT`. +- **No `OE` bit.** +- **`srwi` simplified mnemonic** uses [`rlwinmx`](rlwinmx.md), not this instruction. `srw` is for runtime-variable counts. + +## Related Instructions + +- [`srdx`](srdx.md) — 64-bit logical right shift. +- [`srawx`](srawx.md), [`srawix`](srawix.md) — 32-bit arithmetic right. +- [`slwx`](slwx.md) — 32-bit left shift. +- [`rlwinmx`](rlwinmx.md) — `srwi` immediate expansion. + +## IBM Reference + +- [AIX 7.3 — `srw` (Shift Right Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-srw-shift-right-word-instruction) diff --git a/migration/project-root/ppc-manual/alu/subfcx.md b/migration/project-root/ppc-manual/alu/subfcx.md new file mode 100644 index 0000000..0b6255a --- /dev/null +++ b/migration/project-root/ppc-manual/alu/subfcx.md @@ -0,0 +1,138 @@ +# `subfcx` — Subtract From Carrying + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c000010` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `subfc` | `subfcx` | — | Subtract From Carrying | +| `subfco` | `subfcx` | OE=1 | Subtract From Carrying | +| `subfc.` | `subfcx` | Rc=1 | Subtract From Carrying | +| `subfco.` | `subfcx` | OE=1, Rc=1 | Subtract From Carrying | + +## Syntax + +```asm +subfc[OE][Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `subfcx` — form `XO` + +- **Opcode word:** `0x7c000010` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `8` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | subfcx: read | Source GPR (`r0`–`r31`). | +| `RB` | subfcx: read | Source GPR. | +| `RD` | subfcx: write | Destination GPR. | +| `CR` | subfcx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `OE` | subfcx: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | + +## Register Effects + +### `subfcx` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `CR`, `OE` + +## Status-Register Effects + +- `subfcx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`. + +## Operation (pseudocode) + +``` +RT <- ~(RA) + (RB) + 1 +CA <- carry_out +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`subfcx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="subfcx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:441`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L441) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:83`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L83) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:859`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L859) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:270-287`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L270-L287) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::subfcx => { + // PPCBUG-007: 32-bit ABI. The `rb >= ra` u64 unsigned compare is + // exactly the shape that broke addis. Defensive 32-bit truncation + // is required for correct CA even after upstream cleanup. + let ra32 = ctx.gpr[instr.ra()] as u32; + let rb32 = ctx.gpr[instr.rb()] as u32; + let result32 = rb32.wrapping_sub(ra32); + ctx.xer_ca = if rb32 >= ra32 { 1 } else { 0 }; + ctx.gpr[instr.rd()] = result32 as u64; + if instr.oe() { + let true_diff = (rb32 as i32 as i128) - (ra32 as i32 as i128); + overflow::apply(ctx, true_diff != (result32 as i32) as i128); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, result32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RT ← RB − RA` with `XER[CA]` set on no-borrow.** Same operand-order convention as [`subfx`](subfx.md): the *first* source is subtracted *from* the second. +- **`XER[CA] = 1` means *no borrow occurred*** — i.e. `RB >= RA` as unsigned. PowerISA encodes this as the carry-out of `~RA + RB + 1`, not as a borrow flag. Xenia's `if rb >= ra` test ([`interpreter.rs:157`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L157)) is the correct boolean encoding. +- **No trap on signed overflow.** `subfco` / `subfco.` set `XER[OV]` and sticky `XER[SO]`; xenia-rs leaves the `OE` arm unimplemented. +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** [`interpreter.rs:160`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L160) truncates with `as i32 as i64`. Spec demands a full 64-bit signed compare for `subfc.`. +- **Seeds a multi-word subtract chain.** Use as the low-word op; continue with [`subfex`](subfex.md) for middle words and [`subfmex`](subfmex.md)/[`subfzex`](subfzex.md) for the high word. +- **Operand aliasing fine.** `subfc r3, r3, r3` always yields `0` with `CA = 1`. + +## Related Instructions + +- [`subfx`](subfx.md) — same op without `XER[CA]`. +- [`subfex`](subfex.md) — `~RA + RB + CA` (chain continuation). +- [`subfmex`](subfmex.md), [`subfzex`](subfzex.md) — chain terminators. +- [`subfic`](subficx.md) — D-form: `RT ← SIMM − RA` with `XER[CA]`. +- [`addcx`](addcx.md) — dual: addition seed-of-chain. + +## IBM Reference + +- [AIX 7.3 — `subfc` (Subtract From Carrying)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-subfc-subtract-from-carrying-instruction) diff --git a/migration/project-root/ppc-manual/alu/subfex.md b/migration/project-root/ppc-manual/alu/subfex.md new file mode 100644 index 0000000..40d2cdc --- /dev/null +++ b/migration/project-root/ppc-manual/alu/subfex.md @@ -0,0 +1,138 @@ +# `subfex` — Subtract From Extended + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c000110` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `subfe` | `subfex` | — | Subtract From Extended | +| `subfeo` | `subfex` | OE=1 | Subtract From Extended | +| `subfe.` | `subfex` | Rc=1 | Subtract From Extended | +| `subfeo.` | `subfex` | OE=1, Rc=1 | Subtract From Extended | + +## Syntax + +```asm +subfe[OE][Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `subfex` — form `XO` + +- **Opcode word:** `0x7c000110` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `136` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | subfex: read | Source GPR (`r0`–`r31`). | +| `RB` | subfex: read | Source GPR. | +| `RD` | subfex: write | Destination GPR. | +| `CR` | subfex: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `OE` | subfex: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | + +## Register Effects + +### `subfex` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `CR`, `OE` + +## Status-Register Effects + +- `subfex`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`. + +## Operation (pseudocode) + +``` +RT <- ~(RA) + (RB) + CA +CA <- carry_out +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`subfex`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="subfex"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:468`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L468) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:83`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L83) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:867`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L867) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:288-306`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L288-L306) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::subfex => { + // PPCBUG-008: 32-bit ABI. Compute in u32 space — `!ra` on u64 always + // pollutes the upper 32 bits, making this an active poisoner. + let ra32 = ctx.gpr[instr.ra()] as u32; + let rb32 = ctx.gpr[instr.rb()] as u32; + let ca = ctx.xer_ca as u32; + let result32 = (!ra32).wrapping_add(rb32).wrapping_add(ca); + ctx.xer_ca = if rb32 > ra32 || (rb32 == ra32 && ca != 0) { 1 } else { 0 }; + ctx.gpr[instr.rd()] = result32 as u64; + if instr.oe() { + // RT <- !RA + RB + CA == RB - RA - 1 + CA (32-bit semantics). + let true_sum = (rb32 as i32 as i128) - (ra32 as i32 as i128) - 1 + (ca as i128); + overflow::apply(ctx, true_sum != (result32 as i32) as i128); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, result32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RT ← ~RA + RB + XER[CA]`.** The middle link of a multi-word subtract chain seeded by [`subfcx`](subfcx.md). `XER[CA]` propagates the borrow from the previous word. +- **Carry-out predicate handles the boundary case.** Xenia computes `CA' = (rb > ra) || (rb == ra && CA != 0)` ([`interpreter.rs:170`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L170)). The second clause covers when `RB == RA` and the previous chain added a `+1` from the input carry — without it, the carry-out would be wrong. +- **`OE=1`** should set `XER[OV]` on signed overflow; xenia-rs ignores it. +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** [`interpreter.rs:173`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L173). +- **`XER[CA]` must be initialised** (typically by [`subfcx`](subfcx.md)). Reading stale `CA` is a frequent multi-word-subtract bug. +- **Symmetry with [`addex`](addex.md).** `subfe RT, RA, RB` ≡ `adde RT, ~RA, RB` (with the implicit complement). + +## Related Instructions + +- [`subfcx`](subfcx.md) — seeds the chain (no `CA` read). +- [`subfmex`](subfmex.md), [`subfzex`](subfzex.md) — terminate the chain (`~RA + −1 + CA`, `~RA + 0 + CA`). +- [`subfx`](subfx.md) — plain subtract. +- [`addex`](addex.md) — dual. + +## IBM Reference + +- [AIX 7.3 — `subfe` (Subtract From Extended)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-subfe-subtract-from-extended-instruction) diff --git a/migration/project-root/ppc-manual/alu/subficx.md b/migration/project-root/ppc-manual/alu/subficx.md new file mode 100644 index 0000000..3ab4de0 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/subficx.md @@ -0,0 +1,131 @@ +# `subficx` — Subtract From Immediate Carrying + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x20000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `subfic` | `subficx` | — | Subtract From Immediate Carrying | + +## Syntax + +```asm +subfic [RD], [RA], [SIMM] +``` + +## Encoding + +### `subficx` — form `D` + +- **Opcode word:** `0x20000000` +- **Primary opcode (bits 0–5):** `8` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | subficx: read | Source GPR (`r0`–`r31`). | +| `SIMM` | subficx: read | 16-bit signed immediate. Sign-extended to 64 bits before use. | +| `RD` | subficx: write | Destination GPR. | +| `CA` | subficx: write | XER[CA] carry bit. Read by add-with-carry/subtract-with-borrow instructions, written by carrying instructions. | + +## Register Effects + +### `subficx` + +- **Reads (always):** `RA`, `SIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `CA` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `subficx`: **XER[CA]** ← carry-out of the add / borrow-in of the subtract (always). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`subficx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="subficx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:459`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L459) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:83`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L83) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:333`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L333) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:155-164`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L155-L164) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::subficx => { + // PPCBUG-005: 32-bit ABI. Sign-extended imm has bits 32-63 set for + // negative SIMM, poisoning the writeback. Canary uses 32-bit form. + let ra32 = ctx.gpr[instr.ra()] as u32; + let imm32 = instr.simm16() as i32 as u32; + let result32 = imm32.wrapping_sub(ra32); + ctx.xer_ca = if imm32 >= ra32 { 1 } else { 0 }; + ctx.gpr[instr.rd()] = result32 as u64; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RT ← SIMM − RA` with `XER[CA]` always set.** Note the operand order: the *immediate* is the minuend, not the subtrahend. `subfic rD, rA, 1` computes `1 - rA`, useful for negation-plus-one or one's-complement-style operations. +- **Immediate is sign-extended** to 64 bits before the subtract. So `subfic rD, rA, -1` computes `-1 - rA`, equivalent to `~rA`. +- **`XER[CA] = 1` when `SIMM >= RA`** (no borrow). Computed in xenia as `if imm >= ra` ([`interpreter.rs:73`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L73)) — comparing the sign-extended unsigned representations. +- **No `Rc`, no `OE`.** This D-form has no flag bits beyond the implicit `CA` write. +- **No record-form variant.** There is no `subfic.` in the ISA; if you need CR0 to also reflect the result, follow with a `cmpwi`. +- **Synthesised "subtract immediate"**. Assemblers sometimes accept `subi rD, rA, value` as a shorthand for `addi rD, rA, -value`, but for the carry-producing variant you must use `subfic` explicitly. +- **Common idiom: `subfic rD, rA, 0`** computes `-rA` and sets `CA` according to whether `rA == 0` (no borrow) or `rA != 0` (borrow). Equivalent to [`negx`](negx.md) when `CA` doesn't matter. + +## Related Instructions + +- [`subfcx`](subfcx.md) — XO-form register version. +- [`subfx`](subfx.md) — register subtract without `XER[CA]`. +- [`addic`](addic.md), [`addicx`](addicx.md) — dual: add immediate carrying. +- [`negx`](negx.md) — equivalent to `subfic rD, rA, 0` when `CA` is unwanted. + +## IBM Reference + +- [AIX 7.3 — `subfic` (Subtract From Immediate Carrying)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-subfic-subtract-from-immediate-carrying-instruction) diff --git a/migration/project-root/ppc-manual/alu/subfmex.md b/migration/project-root/ppc-manual/alu/subfmex.md new file mode 100644 index 0000000..aa4611a --- /dev/null +++ b/migration/project-root/ppc-manual/alu/subfmex.md @@ -0,0 +1,145 @@ +# `subfmex` — Subtract From Minus One Extended + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c0001d0` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `subfme` | `subfmex` | — | Subtract From Minus One Extended | +| `subfmeo` | `subfmex` | OE=1 | Subtract From Minus One Extended | +| `subfme.` | `subfmex` | Rc=1 | Subtract From Minus One Extended | +| `subfmeo.` | `subfmex` | OE=1, Rc=1 | Subtract From Minus One Extended | + +## Syntax + +```asm +subfme[OE][Rc] [RD], [RA] +``` + +## Encoding + +### `subfmex` — form `XO` + +- **Opcode word:** `0x7c0001d0` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `232` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | subfmex: read | Source GPR (`r0`–`r31`). | +| `CA` | subfmex: read; subfmex: write | XER[CA] carry bit. Read by add-with-carry/subtract-with-borrow instructions, written by carrying instructions. | +| `RD` | subfmex: write | Destination GPR. | +| `CR` | subfmex: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `OE` | subfmex: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | + +## Register Effects + +### `subfmex` + +- **Reads (always):** `RA`, `CA` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `CA` +- **Writes (conditional):** `CR`, `OE` + +## Status-Register Effects + +- `subfmex`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`.; **XER[CA]** ← carry-out of the add / borrow-in of the subtract (always). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`subfmex`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="subfmex"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:486`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L486) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:83`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L83) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:871`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L871) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:325-341`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L325-L341) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::subfmex => { + // PPCBUG-019: also fixes the always-true CA edge — `!ra` on u64 + // is non-zero when ra32==0xFFFFFFFF and ca==0, so CA was stuck at 1. + let ra32 = ctx.gpr[instr.ra()] as u32; + let ca = ctx.xer_ca as u32; + let result32 = (!ra32).wrapping_add(ca).wrapping_sub(1); + ctx.xer_ca = if (!ra32) != 0 || ca != 0 { 1 } else { 0 }; + ctx.gpr[instr.rd()] = result32 as u64; + if instr.oe() { + let true_sum = -(ra32 as i32 as i128) - 2 + (ca as i128); + overflow::apply(ctx, true_sum != (result32 as i32) as i128); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, result32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RT ← ~RA + (−1) + XER[CA]` ≡ `~RA − 1 + CA`.** Terminator for a multi-word subtract chain when the high "minuend" word is implicitly all-ones (e.g. when computing `~x` style negation across many words). +- **`RB` field unused.** XO-form but only `RA` is read. +- **Carry-out predicate.** `CA' = (~RA != 0) || (CA != 0)` ([`interpreter.rs:191`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L191)). Only when `RA == ~0` (all ones) AND `CA == 0` does `CA'` become 0 — every other case produces no borrow on this final word. +- **`OE=1`** should set `XER[OV]` on signed overflow; xenia-rs ignores it. +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** [`interpreter.rs:194`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L194). +- **`XER[CA]` must be initialised** (typically by a [`subfcx`](subfcx.md) or [`subfex`](subfex.md) earlier in the chain). +- **Symmetric with [`addmex`](addmex.md)**, the add-side terminator. + +## Related Instructions + +- [`subfzex`](subfzex.md) — terminate with `~RA + 0 + CA` instead. +- [`subfex`](subfex.md) — middle-of-chain. +- [`subfcx`](subfcx.md) — chain seed. +- [`addmex`](addmex.md) — dual. + +## IBM Reference + +- [AIX 7.3 — `subfme` (Subtract From Minus One Extended)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-subfme-subtract-from-minus-one-extended-instruction) diff --git a/migration/project-root/ppc-manual/alu/subfx.md b/migration/project-root/ppc-manual/alu/subfx.md new file mode 100644 index 0000000..98bf740 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/subfx.md @@ -0,0 +1,148 @@ +# `subfx` — Subtract From + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c000050` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `subf` | `subfx` | — | Subtract From | +| `subfo` | `subfx` | OE=1 | Subtract From | +| `subf.` | `subfx` | Rc=1 | Subtract From | +| `subfo.` | `subfx` | OE=1, Rc=1 | Subtract From | + +## Syntax + +```asm +subf[OE][Rc] [RD], [RA], [RB] +``` + +## Encoding + +### `subfx` — form `XO` + +- **Opcode word:** `0x7c000050` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `40` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | subfx: read | Source GPR (`r0`–`r31`). | +| `RB` | subfx: read | Source GPR. | +| `RD` | subfx: write | Destination GPR. | +| `CR` | subfx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `OE` | subfx: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | + +## Register Effects + +### `subfx` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** `CR`, `OE` + +## Status-Register Effects + +- `subfx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`. + +## Operation (pseudocode) + +``` +RT <- ~(RA) + (RB) + 1 ; = (RB) − (RA) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`subfx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="subfx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:427`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L427) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:83`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L83) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:863`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L863) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:255-269`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L255-L269) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::subfx => { + // PPCBUG-017+020: 32-bit truncation. + let ra32 = ctx.gpr[instr.ra()] as u32; + let rb32 = ctx.gpr[instr.rb()] as u32; + let result32 = rb32.wrapping_sub(ra32); + ctx.gpr[instr.rd()] = result32 as u64; + if instr.oe() { + let true_diff = (rb32 as i32 as i128) - (ra32 as i32 as i128); + overflow::apply(ctx, true_diff != (result32 as i32) as i128); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, result32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Extended Pseudocode + +``` +RT <- ~(RA) + (RB) + 1 ; = (RB) − (RA) +if OE then + XER[OV] <- signed_overflow_of_subtract((RB), (RA), RT) + XER[SO] <- XER[SO] | XER[OV] +if Rc then + CR0[LT,GT,EQ] <- signed_compare(RT, 0) + CR0[SO] <- XER[SO] +``` + +## Special Cases & Edge Conditions + +- **Operand order gotcha.** `subf RT, RA, RB` computes `RT ← RB − RA`, **not** `RA − RB`. This reverses the intuitive ordering seen in x86/ARM. The assembler exposes a simplified mnemonic `sub RT, RX, RY` ≡ `subf RT, RY, RX` that restores the natural order — watch for both forms in disassembly. +- **Implemented as add-with-complement.** Hardware (and xenia) compute `~RA + RB + 1`. All overflow/CR semantics are the same as [`addx`](addx.md) with one operand complemented. +- **No `XER[CA]` update** — use [`subfcx`](subfcx.md) if you need a borrow-out bit. +- **No trap on overflow.** `subfo` / `subfo.` only record the event in `XER[OV]` and sticky-set `XER[SO]`. +- **Signed-overflow predicate.** `OV = ((RA ^ RB) & (RB ^ RT)) >> 63` — set when operands have different signs and the result's sign differs from `RB`'s. +- **64-bit CR update on Xenon** (xenia-rs truncates to 32 bits; see [`addx`](addx.md) note). + +## Related Instructions + +- [`subfcx`](subfcx.md) — subtract-from producing `XER[CA]` (borrow-out). +- [`subfex`](subfex.md) — `~RA + RB + XER[CA]` (subtract-with-borrow chain). +- [`subfmex`](subfmex.md), [`subfzex`](subfzex.md) — subtract-from `−1` / `0` with carry-in (propagates borrows). +- [`subfic`](subfic.md) — D-form: `RT ← SIMM − RA` with `XER[CA]`. +- [`negx`](negx.md) — specialises to `0 − RA`. +- [`addx`](addx.md) — inverse; shares overflow machinery. + +## IBM Reference + +- [AIX 7.3 — `subf` (Subtract From)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-subf-subtract-from-instruction) +- [AIX 7.3 — `sub` (simplified mnemonic)](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-sub-subtract) diff --git a/migration/project-root/ppc-manual/alu/subfzex.md b/migration/project-root/ppc-manual/alu/subfzex.md new file mode 100644 index 0000000..53ef64d --- /dev/null +++ b/migration/project-root/ppc-manual/alu/subfzex.md @@ -0,0 +1,146 @@ +# `subfzex` — Subtract From Zero Extended + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [XO](../forms/XO.md) · **Opcode:** `0x7c000190` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `subfze` | `subfzex` | — | Subtract From Zero Extended | +| `subfzeo` | `subfzex` | OE=1 | Subtract From Zero Extended | +| `subfze.` | `subfzex` | Rc=1 | Subtract From Zero Extended | +| `subfzeo.` | `subfzex` | OE=1, Rc=1 | Subtract From Zero Extended | + +## Syntax + +```asm +subfze[OE][Rc] [RD], [RA] +``` + +## Encoding + +### `subfzex` — form `XO` + +- **Opcode word:** `0x7c000190` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `200` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA` | subfzex: read | Source GPR (`r0`–`r31`). | +| `CA` | subfzex: read; subfzex: write | XER[CA] carry bit. Read by add-with-carry/subtract-with-borrow instructions, written by carrying instructions. | +| `RD` | subfzex: write | Destination GPR. | +| `CR` | subfzex: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `OE` | subfzex: write (conditional) | Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow. | + +## Register Effects + +### `subfzex` + +- **Reads (always):** `RA`, `CA` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `CA` +- **Writes (conditional):** `CR`, `OE` + +## Status-Register Effects + +- `subfzex`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`.; **XER[CA]** ← carry-out of the add / borrow-in of the subtract (always). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`subfzex`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="subfzex"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:504`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L504) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:83`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L83) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:869`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L869) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:307-324`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L307-L324) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::subfzex => { + // PPCBUG-018: same active-poisoning shape as subfex; operate in u32. + let ra32 = ctx.gpr[instr.ra()] as u32; + let ca = ctx.xer_ca as u32; + let result32 = (!ra32).wrapping_add(ca); + // RT <- !RA + CA (no -1 term). 32-bit carry-out only when + // !ra32 = u32::MAX (i.e. ra32 = 0) AND ca = 1. + ctx.xer_ca = if ra32 == 0 && ca != 0 { 1 } else { 0 }; + ctx.gpr[instr.rd()] = result32 as u64; + if instr.oe() { + let true_sum = -(ra32 as i32 as i128) - 1 + (ca as i128); + overflow::apply(ctx, true_sum != (result32 as i32) as i128); + } + if instr.rc_bit() { + ctx.update_cr_signed(0, result32 as i32 as i64); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RT ← ~RA + 0 + XER[CA]` ≡ `~RA + CA`.** The subtract-side high-word terminator for a multi-word subtract chain. Implements `0 - (...) - borrow` for the high word. +- **`RB` field unused.** +- **Carry-out predicate.** `CA' = (~RA != 0) || (CA != 0)` ([`interpreter.rs:180`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L180)). Only `RA == ~0 && CA == 0` produces `CA' = 0`; every other case gives no-borrow. +- **`OE=1`** should set `XER[OV]` on signed overflow; xenia-rs ignores. +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** [`interpreter.rs:183`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L183). +- **`XER[CA]` must be initialised** by an earlier carrying instruction. +- **Common idiom: extracting `XER[CA]` as 0/-1.** `subfze rT, rN` (where `rN == 0`) materialises `XER[CA]` to `0` or `-1` (`-1 = ~0 + CA = -1 + CA`); pair with [`addzex`](addzex.md) for `0/1` instead. + +## Related Instructions + +- [`subfmex`](subfmex.md) — terminator with `~RA + (−1) + CA`. +- [`subfex`](subfex.md), [`subfcx`](subfcx.md) — chain middle / seed. +- [`addzex`](addzex.md) — dual; produces 0/1 from `CA`. +- [`negx`](negx.md) — `subfze` with `RA = 0` and `CA = 1` is functionally similar. + +## IBM Reference + +- [AIX 7.3 — `subfze` (Subtract From Zero Extended)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-subfze-subtract-from-zero-extended-instruction) diff --git a/migration/project-root/ppc-manual/alu/sync.md b/migration/project-root/ppc-manual/alu/sync.md new file mode 100644 index 0000000..4c29bc9 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/sync.md @@ -0,0 +1,113 @@ +# `sync` — Synchronize + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0004ac` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `sync` | `sync` | — | Synchronize | + +## Syntax + +```asm +sync +``` + +## Encoding + +### `sync` — form `X` + +- **Opcode word:** `0x7c0004ac` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `598` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | + +## Register Effects + +### `sync` + +- **Reads (always):** _none_ +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +multi-thread memory barrier (heavy). L=0 full sync; L=1 lightweight sync. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`sync`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="sync"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:754`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L754) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:85`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L85) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:825`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L825) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1691-1693`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1691-L1693) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::sync | PpcOpcode::eieio | PpcOpcode::isync => { + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Heavy multi-thread memory barrier.** All memory accesses (loads and stores, cacheable and not) issued by this thread before `sync` complete with respect to all other threads/processors before any subsequent memory access begins. Drains the store queue. +- **`L` field selects sync class.** `L=0` is full *hwsync* (the default). `L=1` is `lwsync` — orders only loads-after-loads, loads-after-stores, and stores-after-stores (not stores-after-loads). The Xenon implements both via the same encoding with `L` (bit 9) selecting variant. Most disassembly shows the unsuffixed `sync` mnemonic, which assembles to `L=0`. +- **No register or CR effects.** Pure ordering primitive. +- **Used to implement release semantics.** A typical lock-release sequence is `sync; stw r0, lock`. Acquire side uses `lwsync` after the load. +- **Xenia-rs is a no-op.** [`interpreter.rs:1267`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1267) collapses `sync`, `eieio`, `isync` into PC-advance. Since xenia is single-threaded interpretation, host program order subsumes all PPC ordering. +- **Distinct from [`isync`](isync.md)**, which orders the *instruction* stream — `sync` does not refetch instructions. +- **Slow on real hardware.** Hundreds of cycles when the store queue is full; hot paths avoid `sync` and use `lwsync` or no barrier when only single-thread ordering is needed. + +## Related Instructions + +- [`isync`](isync.md) — instruction-fetch barrier. +- [`eieio`](eieio.md) — lighter I/O barrier for caching-inhibited storage. +- `lwsync` — same encoding, `L=1`; not separately enumerated in this page set. + +## IBM Reference + +- [AIX 7.3 — `sync` (Synchronize)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-sync-synchronize-instruction) +- PowerISA v2.07B, Book II, §1.7 — defines `hwsync`/`lwsync`/`ptesync` semantics. diff --git a/migration/project-root/ppc-manual/alu/xori.md b/migration/project-root/ppc-manual/alu/xori.md new file mode 100644 index 0000000..369dfdf --- /dev/null +++ b/migration/project-root/ppc-manual/alu/xori.md @@ -0,0 +1,115 @@ +# `xori` — XOR Immediate + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x68000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `xori` | `xori` | — | XOR Immediate | + +## Syntax + +```asm +xori [RA], [RS], [UIMM] +``` + +## Encoding + +### `xori` — form `D` + +- **Opcode word:** `0x68000000` +- **Primary opcode (bits 0–5):** `26` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | xori: read | Source GPR (alias for RD in some stores). | +| `UIMM` | xori: read | 16-bit unsigned immediate. Zero-extended. | +| `RA` | xori: write | Source GPR (`r0`–`r31`). | + +## Register Effects + +### `xori` + +- **Reads (always):** `RS`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +RA <- (RS) ^ (0x0000 || UIMM) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`xori`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="xori"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:839`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L839) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:132`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L132) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:349`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L349) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:520-523`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L520-L523) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::xori => { + ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] ^ (instr.uimm16() as u64); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **No record form.** Like [`ori`](ori.md), there is no `xori.`. To get a CR0 update follow with `cmpwi` or use [`xorx`](xorx.md) with `Rc=1`. +- **Immediate is zero-extended** to 64 bits. Only the low 16 bits of `RA` can be flipped; the high 48 bits are passed through from `RS` unchanged. +- **`xori 0, 0, 0` is a valid NOP encoding** but the canonical NOP is `ori 0, 0, 0`. Disassemblers should still display this as `xori r0, r0, 0` or recognise it as a no-op. +- **64-bit operation in xenia-rs.** [`interpreter.rs:338`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L338) — full `u64` XOR with the immediate. +- **No `XER`, no `CR`** side effects. +- **`RA = 0` reads `r0`** (not literal zero); see [`ori`](ori.md). +- **Useful for masked toggle.** `xori rA, rS, mask` flips the bits of `rS` indicated by `mask` (low 16 bits only). + +## Related Instructions + +- [`xoris`](xoris.md) — companion (immediate shifted left 16). +- [`xorx`](xorx.md) — register-register XOR. +- [`eqvx`](eqvx.md) — `~(RS ^ RB)`. +- [`ori`](ori.md), [`andix`](andix.md) — sister immediate logicals. + +## IBM Reference + +- [AIX 7.3 — `xori` (XOR Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-xori-immediate-instruction) diff --git a/migration/project-root/ppc-manual/alu/xoris.md b/migration/project-root/ppc-manual/alu/xoris.md new file mode 100644 index 0000000..afd2a9e --- /dev/null +++ b/migration/project-root/ppc-manual/alu/xoris.md @@ -0,0 +1,114 @@ +# `xoris` — XOR Immediate Shifted + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x6c000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `xoris` | `xoris` | — | XOR Immediate Shifted | + +## Syntax + +```asm +xoris [RA], [RS], [UIMM] +``` + +## Encoding + +### `xoris` — form `D` + +- **Opcode word:** `0x6c000000` +- **Primary opcode (bits 0–5):** `27` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | xoris: read | Source GPR (alias for RD in some stores). | +| `UIMM` | xoris: read | 16-bit unsigned immediate. Zero-extended. | +| `RA` | xoris: write | Source GPR (`r0`–`r31`). | + +## Register Effects + +### `xoris` + +- **Reads (always):** `RS`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +RA <- (RS) ^ (UIMM || 0x0000) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`xoris`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="xoris"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:846`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L846) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:132`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L132) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:350`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L350) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:524-527`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L524-L527) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::xoris => { + ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] ^ ((instr.uimm16() as u64) << 16); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **No record form.** Like all immediate logicals other than `andi.`/`andis.`, `xoris` does not update CR0. +- **Immediate is zero-extended *then* shifted left 16.** Only bits 32–47 of `RA` (PowerISA bit numbering) can be flipped; the high 32 bits and low 16 bits of `RA` come from `RS` unchanged. +- **Common pattern with [`xori`](xori.md)** to flip arbitrary 32-bit bitmasks: `xoris RA, RS, hi16; xori RA, RA, lo16`. +- **64-bit operation.** [`interpreter.rs:342`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L342). +- **No `XER`, no `CR`.** +- **`RA = 0` reads `r0`** (not literal zero). +- **Used to toggle the high half of a 32-bit word**, e.g. `xoris r3, r3, 0x8000` flips bit 32 (the sign bit of the low word) — a one-instruction sign-flip on a 32-bit value. + +## Related Instructions + +- [`xori`](xori.md) — companion (immediate not shifted). +- [`xorx`](xorx.md) — register-register XOR. +- [`oris`](oris.md), [`andisx`](andisx.md) — sister immediate-shifted logicals. + +## IBM Reference + +- [AIX 7.3 — `xoris` (XOR Immediate Shifted)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-xoris-immediate-shifted-instruction) diff --git a/migration/project-root/ppc-manual/alu/xorx.md b/migration/project-root/ppc-manual/alu/xorx.md new file mode 100644 index 0000000..4e80934 --- /dev/null +++ b/migration/project-root/ppc-manual/alu/xorx.md @@ -0,0 +1,121 @@ +# `xorx` — XOR + +> **Category:** [Integer ALU](../categories/alu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000278` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `xor` | `xorx` | — | XOR | +| `xor.` | `xorx` | Rc=1 | XOR | + +## Syntax + +```asm +xor[Rc] [RA], [RS], [RB] +``` + +## Encoding + +### `xorx` — form `X` + +- **Opcode word:** `0x7c000278` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `316` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | xorx: read | Source GPR (alias for RD in some stores). | +| `RB` | xorx: read | Source GPR. | +| `RA` | xorx: write | Source GPR (`r0`–`r31`). | +| `CR` | xorx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `xorx` + +- **Reads (always):** `RS`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `xorx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +RA <- (RS) ^ (RB) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`xorx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="xorx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_alu.cc:829`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_alu.cc#L829) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:132`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L132) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:798`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L798) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:556-561`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L556-L561) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::xorx => { + // PPCBUG-032+020: 32-bit ABI CR0 view. + ctx.gpr[instr.ra()] = ctx.gpr[instr.rs()] ^ ctx.gpr[instr.rb()]; + if instr.rc_bit() { ctx.update_cr_signed(0, ctx.gpr[instr.ra()] as u32 as i32 as i64); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`RA ← RS XOR RB`.** Bit-wise XOR. +- **Idiom: `xor RA, RS, RS`** zeroes `RA` — the canonical "clear register" instruction. Cheaper than `li RA, 0` because no immediate-extraction stage is involved. +- **Operand convention** is X-form (`RA` destination, `RS`/`RB` sources). +- **64-bit operation** on Xenon. +- **No `OE` or `XER` side effects.** Only `Rc=1` updates `CR0`. +- **64-bit CR update on Xenon, 32-bit in xenia-rs.** [`interpreter.rs:367`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L367) truncates with `as i32 as i64`. For `xor.` whose result has differing high/low halves, spec and xenia diverge; `xor. RA, RS, RS` gives `EQ` either way. +- **Useful as bitmask toggle.** `xor r3, r3, r4` flips in `r3` every bit set in `r4`. +- **No `XER[CA]`.** + +## Related Instructions + +- [`xori`](xori.md), [`xoris`](xoris.md) — D-form immediate variants. +- [`eqvx`](eqvx.md) — NXOR (`~(RS ^ RB)`). +- [`andx`](andx.md), [`orx`](orx.md), [`norx`](norx.md), [`nandx`](nandx.md) — sister logicals. + +## IBM Reference + +- [AIX 7.3 — `xor` (XOR)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-xor-instruction) diff --git a/migration/project-root/ppc-manual/branch/bcctrx.md b/migration/project-root/ppc-manual/branch/bcctrx.md new file mode 100644 index 0000000..af2a2c3 --- /dev/null +++ b/migration/project-root/ppc-manual/branch/bcctrx.md @@ -0,0 +1,164 @@ +# `bcctrx` — Branch Conditional to Count Register + +> **Category:** [Branch & System](../categories/branch.md) · **Form:** [XL](../forms/XL.md) · **Opcode:** `0x4c000420` · _sync_ + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `bcctr` | `bcctrx` | — | Branch Conditional to Count Register | +| `bcctrl` | `bcctrx` | LK=1 | Branch Conditional to Count Register | + +## Syntax + +```asm +bcctr[LK] [BO], [BI] +``` + +## Encoding + +### `bcctrx` — form `XL` + +- **Opcode word:** `0x4c000420` +- **Primary opcode (bits 0–5):** `19` +- **Extended opcode:** `528` +- **Synchronising:** yes + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (19) | +| 6–10 | `BT/BO` | target / branch options | +| 11–15 | `BA/BI` | source A / CR bit to test | +| 16–20 | `BB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `LK` | link flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `LK` | bcctrx: read | Link bit. When 1, LR ← address-of-next-instruction before the branch is taken. | +| `BO` | bcctrx: read | 5-bit branch options — selects CTR decrement, CTR test polarity, and CR bit test polarity. See `forms/XL.md`. | +| `BI` | bcctrx: read | CR bit index (0–31) selected by BO's condition test. | +| `CR` | bcctrx: read | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `CTR` | bcctrx: read | Count register. Decremented and optionally tested by conditional branches when `BO[2]=0`. | +| `LR` | bcctrx: write (conditional) | Link register. Written by `bl`/`bla`/`bcl`/`bclrl`/`bcctrl`; read by `bclr`/`bclrl`. | + +## Register Effects + +### `bcctrx` + +- **Reads (always):** `LK`, `BO`, `BI`, `CR`, `CTR` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** `LR` + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +cond_ok <- BO[0] | (CR[BI] ≡ BO[1]) +if cond_ok then NIA <- CTR[0:61] || 0b00 +if LK then LR <- CIA + 4 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`bcctrx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="bcctrx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:250`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L250) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:11`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L11) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:721`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L721) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:962-981`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L962-L981) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::bcctrx => { + let bo = instr.bo(); + let bi = instr.bi(); + + let cond_ok = (bo & 0b10000) != 0 + || (ctx.get_cr_bit(bi) == ((bo & 0b01000) != 0)); + + if cond_ok { + let next_pc = ctx.pc + 4; + ctx.pc = (ctx.ctr as u32) & !3; + if instr.lk() { + ctx.lr = next_pc as u64; + } + } else { + if instr.lk() { + ctx.lr = (ctx.pc + 4) as u64; + } + ctx.pc += 4; + } + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **No CTR decrement.** Unlike [`bcx`](bcx.md) and [`bclrx`](bclrx.md), `bcctr` cannot decrement CTR (the CTR is the *target*). The PowerISA reserves `BO[2] = 0` encodings — they are *invalid* on `bcctrx`. xenia silently ignores `BO[2]`/`BO[3]` and treats every `bcctr` as a pure CR-conditional branch, which matches both the canary emit and real Xenon hardware behaviour. +- **CTR alignment mask.** The target is `CTR & ~3`. Like `bclr`, the low two bits are stripped — a misaligned CTR is silently rounded down rather than trapping. +- **BO encoding (CR-only subset).** Because CTR-test bits are unused, only four `BO` patterns are meaningful: + + | BO (binary) | Meaning | + | --- | --- | + | `0100z` | branch if `CR[BI]` false | + | `0101z` | branch if `CR[BI]` true | + | `1z1zz` | branch always (`bctr`) | + | `0000z`/`001at`/etc. | reserved — implementation-defined | + +- **Indirect call/dispatch idiom.** `mtctr rN; bctrl` is the canonical PPC indirect call: load function pointer into CTR, call. The xenia interpreter writes `LR ← CIA + 4` only when the branch is taken — this matches the PowerISA, but contrast with `bcx` where `LK` always writes LR (even if the branch is not taken). The C-translation reference handles this asymmetry explicitly. +- **`bctr` for switch tables.** Compilers emit `bctr` (not `bctrl`) for jump-table dispatch, with CTR loaded from a base + (index*4) lookup. Xenia honours this by simply jumping to `CTR & ~3`. +- **Synchronisation.** Marked `sync` in xenia's XML — context-synchronising. JIT backends must ensure prior side effects have committed before the indirect transfer. +- **No prediction hint sensitivity.** Xenon predicts indirect branches via a separate target cache; the `BO[4]` hint is mostly cosmetic for `bcctr`. + +## Related Instructions + +- [`bclrx`](bclrx.md) — branch conditional to **LR** (function returns). +- [`bcx`](bcx.md) — branch conditional to displacement (B-form). +- [`bx`](bx.md) — unconditional displacement branch (I-form). +- [`mtctr`](../control/mtspr.md), [`mfctr`](../control/mfspr.md) — load/read CTR via `mtspr 9` / `mfspr 9`. +- [`sc`](sc.md) — alternative control-flow exit (system call). + +### Simplified Mnemonics + +| Simplified | Expansion | +| --- | --- | +| `bctr` | `bcctr BO=0b10100, BI=0` — unconditional indirect branch | +| `bctrl` | `bcctrl BO=0b10100, BI=0` — unconditional indirect call | +| `beqctr crN` | `bcctr BO=0b01100, BI=4·N+2` — call CTR if `crN.EQ` | +| `bnectr crN` | `bcctr BO=0b00100, BI=4·N+2` — call CTR if `crN.NE` | +| `bltctr crN` | `bcctr BO=0b01100, BI=4·N+0` — call CTR if `crN.LT` | +| `bgectr crN` | `bcctr BO=0b00100, BI=4·N+0` — call CTR if `crN.GE` | +| `bgtctr crN` | `bcctr BO=0b01100, BI=4·N+1` — call CTR if `crN.GT` | +| `blectr crN` | `bcctr BO=0b00100, BI=4·N+1` — call CTR if `crN.LE` | + +The unconditional `bctr`/`bctrl` are by far the most common in Xbox 360 disassembly (compiler-emitted indirect calls and switch dispatch). + +## IBM Reference + +- [AIX 7.3 — `bcctr` (Branch Conditional to Count Register)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-bcctr-bcctrl-branch-conditional-count-register-instruction) +- [AIX 7.3 — Branch simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-branch-simplified) diff --git a/migration/project-root/ppc-manual/branch/bclrx.md b/migration/project-root/ppc-manual/branch/bclrx.md new file mode 100644 index 0000000..3279453 --- /dev/null +++ b/migration/project-root/ppc-manual/branch/bclrx.md @@ -0,0 +1,180 @@ +# `bclrx` — Branch Conditional to Link Register + +> **Category:** [Branch & System](../categories/branch.md) · **Form:** [XL](../forms/XL.md) · **Opcode:** `0x4c000020` · _sync_ + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `bclr` | `bclrx` | — | Branch Conditional to Link Register | +| `bclrl` | `bclrx` | LK=1 | Branch Conditional to Link Register | + +## Syntax + +```asm +bclr[LK] [BO], [BI] +``` + +## Encoding + +### `bclrx` — form `XL` + +- **Opcode word:** `0x4c000020` +- **Primary opcode (bits 0–5):** `19` +- **Extended opcode:** `16` +- **Synchronising:** yes + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (19) | +| 6–10 | `BT/BO` | target / branch options | +| 11–15 | `BA/BI` | source A / CR bit to test | +| 16–20 | `BB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `LK` | link flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `LK` | bclrx: read | Link bit. When 1, LR ← address-of-next-instruction before the branch is taken. | +| `BO` | bclrx: read | 5-bit branch options — selects CTR decrement, CTR test polarity, and CR bit test polarity. See `forms/XL.md`. | +| `BI` | bclrx: read | CR bit index (0–31) selected by BO's condition test. | +| `CR` | bclrx: read (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `CTR` | bclrx: read (conditional); bclrx: write (conditional) | Count register. Decremented and optionally tested by conditional branches when `BO[2]=0`. | +| `LR` | bclrx: write (conditional) | Link register. Written by `bl`/`bla`/`bcl`/`bclrl`/`bcctrl`; read by `bclr`/`bclrl`. | + +## Register Effects + +### `bclrx` + +- **Reads (always):** `LK`, `BO`, `BI` +- **Reads (conditional):** `CR`, `CTR` +- **Writes (always):** _none_ +- **Writes (conditional):** `CTR`, `LR` + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +if ¬BO[2] then CTR <- CTR − 1 +ctr_ok <- BO[2] | ((CTR ≠ 0) XOR BO[3]) +cond_ok <- BO[0] | (CR[BI] ≡ BO[1]) +if ctr_ok & cond_ok then NIA <- LR[0:61] || 0b00 +if LK then LR <- CIA + 4 +``` + +## C Translation Example + +```c +/* bclr/bclrl — branch conditional to LR */ +if (!(insn.BO & 4)) ctr -= 1; +bool ctr_ok = (insn.BO & 4) || ((ctr != 0) ^ !!(insn.BO & 2)); +bool cond_ok = (insn.BO & 16) || (cr_bit(insn.BI) == !!(insn.BO & 8)); +uint32_t next = pc + 4; +if (ctr_ok && cond_ok) pc = lr & ~3u; else pc = next; +if (insn.LK) lr = next; +``` + +## Implementation References + +**`bclrx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="bclrx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:282`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L282) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:11`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L11) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:711`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L711) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:939-961`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L939-L961) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::bclrx => { + let bo = instr.bo(); + let bi = instr.bi(); + + if bo & 0b00100 == 0 { + ctx.ctr = ctx.ctr.wrapping_sub(1); + } + + let ctr_ok = (bo & 0b00100) != 0 + || (((ctx.ctr as u32) != 0) ^ ((bo & 0b00010) != 0)); + let cond_ok = (bo & 0b10000) != 0 + || (ctx.get_cr_bit(bi) == ((bo & 0b01000) != 0)); + + let next_pc = ctx.pc + 4; + if ctr_ok && cond_ok { + ctx.pc = (ctx.lr as u32) & !3; + } else { + ctx.pc = next_pc; + } + if instr.lk() { + ctx.lr = next_pc as u64; + } + } +``` +
+ + + +## BO Encoding (5 bits) + +`BO` controls two independent tests and two "hints". Bit 0 is the MSB. + +| BO (binary) | CTR decrement? | CTR test | CR test | Meaning | +| --- | --- | --- | --- | --- | +| `0000z` | yes | `CTR ≠ 0` | `¬CR[BI]` | decrement, branch if CTR ≠ 0 **and** CR[BI] false | +| `0001z` | yes | `CTR = 0` | `¬CR[BI]` | decrement, branch if CTR = 0 **and** CR[BI] false | +| `001at` | yes | `CTR ≠ 0` / `CTR = 0` | — | decrement, branch on CTR only | +| `0100z` | no | — | `¬CR[BI]` | branch if CR[BI] false | +| `0101z` | no | — | `CR[BI]` | branch if CR[BI] true | +| `011at` | no | — | — | branch always (`z` and `t` are prediction hints) | +| `1z00z` | yes | `CTR ≠ 0` | — | decrement, branch if CTR ≠ 0 | +| `1z01z` | yes | `CTR = 0` | — | decrement, branch if CTR = 0 | +| `1z1zz` | no | — | — | branch always | + +Bit **`BO[0]` = 1** disables the CR test; **`BO[2]` = 1** disables the CTR decrement/test. `BO[1]` and `BO[3]` select the polarity of each test. `BO[4]` is a branch-prediction hint (0 = not taken, 1 = taken; ignored on the Xenon in most cases). + +The most common bclr instance in Xbox 360 disassembly is `BO = 0b10100` → `blr` (branch always to LR), the function epilogue. `BO = 0b01100, BI = 2` → `beqlr` (return if `cr0.EQ`), also common. + +## Special Cases & Edge Conditions + +- **LR alignment mask.** The target address is `LR & ~3` — the low 2 bits are cleared. This silently ignores a misaligned LR; incoming code should always produce 4-byte-aligned LR values. +- **Ordering of CTR decrement and branch.** The CTR is decremented **first**, then compared to zero **after** the decrement. So after `bdnz` at `CTR = 1`, the CTR becomes `0` and the branch is not taken. +- **Self-referential LR write.** `bclrl` writes `LR ← CIA + 4` **before** reading `LR` to set `NIA`. Per the PowerISA, `bclrl` reads the *old* `LR` for the branch target and writes the *new* `LR` with the return address, atomically from software's perspective. Xenia implements it this way (`next_pc` captured first, then `lr` written). +- **Branch prediction hints (`BO[4]`).** The Xenon does static prediction on the basis of these hints, but behaviour is architecturally unobservable. Translators may ignore them. +- **Synchronisation.** `bclr` is **context-synchronising** (hence the `sync` flag in xenia's XML). Translators must ensure side-effecting instructions preceding the branch have committed — trivial in a sequential C translation but relevant for JIT backends. +- **xenia's `LR_HALT_SENTINEL`.** Xenia sets `LR` to `0xBCBCBCBC` at thread start; when the top-level guest function returns via `blr`, the interpreter loop halts cleanly. Translators replicating guest behaviour don't need this — but if you generate a test harness, the sentinel is a convenient "function returned" signal. + +## Related Instructions + +- [`bcctrx`](bcctrx.md) — branch conditional to **CTR** (used by indirect calls / vtables). +- [`bcx`](bcx.md) — branch conditional to an immediate displacement (D-form). +- [`bx`](bx.md) — unconditional branch (I-form). +- [`mtlr`](../control/mtspr.md), [`mflr`](../control/mfspr.md) — set/get LR via `mtspr 8, …` / `mfspr …, 8`. +- [`sc`](sc.md) — system call (alternative control-flow exit). + +## Simplified Mnemonics + +Assemblers fold common `BO`/`BI` patterns to single mnemonics: + +| Simplified | Expansion | +| --- | --- | +| `blr` | `bclr BO=0b10100, BI=0` — branch always to LR | +| `blrl` | `bclrl BO=0b10100, BI=0` — branch always to LR with link (tail-call trampoline) | +| `beqlr crN` | `bclr BO=0b01100, BI=4·N+2` — return if `crN.EQ` | +| `bnelr crN` | `bclr BO=0b00100, BI=4·N+2` — return if `crN.NE` | +| `bltlr crN` | `bclr BO=0b01100, BI=4·N+0` — return if `crN.LT` | +| `bgelr crN` | `bclr BO=0b00100, BI=4·N+0` — return if `crN.GE` | +| `bgtlr crN` | `bclr BO=0b01100, BI=4·N+1` — return if `crN.GT` | +| `blelr crN` | `bclr BO=0b00100, BI=4·N+1` — return if `crN.LE` | + +Xbox 360 disassemblers almost always emit the simplified form; the translation agent should learn to recognise them. + +## IBM Reference + +- [AIX 7.3 — `bclr` (Branch Conditional to Link Register)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-bclr-bclrl-branch-conditional-link-register-instruction) +- [AIX 7.3 — Branch simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-branch-simplified) diff --git a/migration/project-root/ppc-manual/branch/bcx.md b/migration/project-root/ppc-manual/branch/bcx.md new file mode 100644 index 0000000..0fc880e --- /dev/null +++ b/migration/project-root/ppc-manual/branch/bcx.md @@ -0,0 +1,192 @@ +# `bcx` — Branch Conditional + +> **Category:** [Branch & System](../categories/branch.md) · **Form:** [B](../forms/B.md) · **Opcode:** `0x40000000` · _sync_ + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `bc` | `bcx` | — | Branch Conditional | +| `bcl` | `bcx` | LK=1 | Branch Conditional | + +## Syntax + +```asm +bc[LK][AA] [BO], [BI], [ADDR] +``` + +## Encoding + +### `bcx` — form `B` + +- **Opcode word:** `0x40000000` +- **Primary opcode (bits 0–5):** `16` +- **Extended opcode:** — +- **Synchronising:** yes + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `BO` | branch options | +| 11–15 | `BI` | CR bit to test | +| 16–29 | `BD` | signed 14-bit word-offset target | +| 30 | `AA` | absolute-address flag | +| 31 | `LK` | link flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `LK` | bcx: read | Link bit. When 1, LR ← address-of-next-instruction before the branch is taken. | +| `AA` | bcx: read | Absolute-address bit. When 1, the branch target is the sign-extended displacement itself; when 0, it is added to the current instruction address. | +| `BO` | bcx: read | 5-bit branch options — selects CTR decrement, CTR test polarity, and CR bit test polarity. See `forms/XL.md`. | +| `BI` | bcx: read | CR bit index (0–31) selected by BO's condition test. | +| `ADDR` | bcx: read | Encoded branch target displacement (24-bit for I-form, 14-bit for B-form, word-shifted). | +| `CR` | bcx: read (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `CTR` | bcx: read (conditional); bcx: write (conditional) | Count register. Decremented and optionally tested by conditional branches when `BO[2]=0`. | +| `LR` | bcx: write (conditional) | Link register. Written by `bl`/`bla`/`bcl`/`bclrl`/`bcctrl`; read by `bclr`/`bclrl`. | + +## Register Effects + +### `bcx` + +- **Reads (always):** `LK`, `AA`, `BO`, `BI`, `ADDR` +- **Reads (conditional):** `CR`, `CTR` +- **Writes (always):** _none_ +- **Writes (conditional):** `CTR`, `LR` + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +if ¬BO[2] then CTR <- CTR − 1 +ctr_ok <- BO[2] | ((CTR ≠ 0) XOR BO[3]) +cond_ok <- BO[0] | (CR[BI] ≡ BO[1]) +if ctr_ok & cond_ok then NIA <- CIA + EXTS(BD || 0b00) (AA=0) + EXTS(BD || 0b00) (AA=1) +if LK then LR <- CIA + 4 +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`bcx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="bcx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:173`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L173) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:11`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L11) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:340`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L340) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:908-938`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L908-L938) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::bcx => { + let bo = instr.bo(); + let bi = instr.bi(); + + // Decrement CTR if needed + if bo & 0b00100 == 0 { + ctx.ctr = ctx.ctr.wrapping_sub(1); + } + + let ctr_ok = (bo & 0b00100) != 0 + || (((ctx.ctr as u32) != 0) ^ ((bo & 0b00010) != 0)); + let cond_ok = (bo & 0b10000) != 0 + || (ctx.get_cr_bit(bi) == ((bo & 0b01000) != 0)); + + if ctr_ok && cond_ok { + let target = if instr.aa() { + instr.bd() as u32 + } else { + ctx.pc.wrapping_add(instr.bd() as u32) + }; + if instr.lk() { + ctx.lr = (ctx.pc + 4) as u64; + } + ctx.pc = target; + } else { + if instr.lk() { + ctx.lr = (ctx.pc + 4) as u64; + } + ctx.pc += 4; + } + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **14-bit signed displacement.** `BD` is a 14-bit signed word-count, scaled by 4 — yielding a ±32 KiB byte range (`−2^15 … +2^15 − 4`). For longer-range conditional control flow, compilers emit a short `bc` over an unconditional `b`. +- **CTR decrement happens before the test.** `BO[2]=0` decrements CTR *first*, then `ctr_ok` evaluates against the new value. The classic `bdnz loop` loops `N` times when CTR is initialised to `N`. +- **LR write is unconditional in xenia.** Xenia writes `LR ← CIA + 4` whenever `LK=1`, even on the not-taken path. This matches the PowerISA: `bcl` always sets `LR` regardless of branch outcome — exploited by `bcl 20, 31, $+4` as a self-PC capture (PIC trick). +- **`BO` encoding** — see `bclrx.md` for the full 5-bit table. `bcx` supports the full set, including CTR-only branches (`bdnz`, `bdz`). +- **Branch hint encoding.** PPC overloads `BO[4]` as a static prediction hint: 0 = "predict not taken", 1 = "predict taken". The Xenon honours it for forward branches; backwards conditional branches are predicted taken regardless. Translators may ignore the hint. +- **Synchronisation.** Marked `sync` — like all branches, `bcx` is context-synchronising. Trivial in interpretation; matters for JIT reorder windows. +- **No `Rc`.** B-form has no record bit; the apparent `Rc` operand-table entry under "Status-Register Effects" is N/A here. + +### BO/BI encoding (compact table) + +| BO | Effect | Common simplified | +| --- | --- | --- | +| `0000z` | dec CTR, branch if `CTR≠0` & `¬CR[BI]` | `bdnzf BI, addr` | +| `0001z` | dec CTR, branch if `CTR=0` & `¬CR[BI]` | `bdzf BI, addr` | +| `0010y` | dec CTR, branch if `CTR≠0` | `bdnz addr` | +| `0011y` | dec CTR, branch if `CTR=0` | `bdz addr` | +| `0100z` | branch if `¬CR[BI]` | `bf BI, addr` (or `bne`/`bge`/...) | +| `0101z` | branch if `CR[BI]` | `bt BI, addr` (or `beq`/`blt`/...) | +| `1z1zz` | branch always | `b addr` (prefer plain `b` though) | + +Bit `z` is the prediction hint (`0` = not taken, `1` = taken). + +## Related Instructions + +- [`bx`](bx.md) — unconditional displacement branch (24-bit range). +- [`bclrx`](bclrx.md) — branch conditional to LR (function return). +- [`bcctrx`](bcctrx.md) — branch conditional to CTR (indirect call / dispatch). +- [`crand`](../control/crand.md), [`cror`](../control/cror.md), … — combine multiple CR bits before a single `bc`. +- [`mtctr`](../control/mtspr.md), [`mfctr`](../control/mfspr.md) — set/get loop counter for `bdnz`/`bdz`. +- [`sc`](sc.md) — alternative control-flow exit. + +### Simplified Mnemonics + +The `bc` mnemonic is rarely written directly; assemblers fold most uses into form-specific aliases: + +| Simplified | Expansion | +| --- | --- | +| `beq crN, addr` | `bc 0b01100, 4·N+2, addr` — branch if `crN.EQ` | +| `bne crN, addr` | `bc 0b00100, 4·N+2, addr` — branch if `crN.NE` | +| `blt crN, addr` | `bc 0b01100, 4·N+0, addr` — branch if `crN.LT` | +| `bge crN, addr` | `bc 0b00100, 4·N+0, addr` — branch if `crN.GE` | +| `bgt crN, addr` | `bc 0b01100, 4·N+1, addr` — branch if `crN.GT` | +| `ble crN, addr` | `bc 0b00100, 4·N+1, addr` — branch if `crN.LE` | +| `bso crN, addr` | `bc 0b01100, 4·N+3, addr` — branch on summary overflow | +| `bns crN, addr` | `bc 0b00100, 4·N+3, addr` — branch on no SO | +| `bdnz addr` | `bc 0b10000, 0, addr` — decrement CTR, branch if non-zero | +| `bdz addr` | `bc 0b10010, 0, addr` — decrement CTR, branch if zero | +| `bdnzt BI, addr` | combined CTR + CR test (rare) | + +When `crN` is omitted in disassembly, `cr0` is implied. + +## IBM Reference + +- [AIX 7.3 — `bc` (Branch Conditional)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-bc-branch-conditional-instruction) +- [AIX 7.3 — Branch simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-branch-simplified) diff --git a/migration/project-root/ppc-manual/branch/bx.md b/migration/project-root/ppc-manual/branch/bx.md new file mode 100644 index 0000000..fd781a5 --- /dev/null +++ b/migration/project-root/ppc-manual/branch/bx.md @@ -0,0 +1,133 @@ +# `bx` — Branch + +> **Category:** [Branch & System](../categories/branch.md) · **Form:** [I](../forms/I.md) · **Opcode:** `0x48000000` · _sync_ + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `b` | `bx` | — | Branch | +| `bl` | `bx` | LK=1 | Branch | + +## Syntax + +```asm +b[LK][AA] [ADDR] +``` + +## Encoding + +### `bx` — form `I` + +- **Opcode word:** `0x48000000` +- **Primary opcode (bits 0–5):** `18` +- **Extended opcode:** — +- **Synchronising:** yes + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–29 | `LI` | signed 24-bit word-offset target | +| 30 | `AA` | absolute-address flag | +| 31 | `LK` | link flag (bl/ba/bla) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `LK` | bx: read | Link bit. When 1, LR ← address-of-next-instruction before the branch is taken. | +| `AA` | bx: read | Absolute-address bit. When 1, the branch target is the sign-extended displacement itself; when 0, it is added to the current instruction address. | +| `ADDR` | bx: read | Encoded branch target displacement (24-bit for I-form, 14-bit for B-form, word-shifted). | +| `LR` | bx: write (conditional) | Link register. Written by `bl`/`bla`/`bcl`/`bclrl`/`bcctrl`; read by `bclr`/`bclrl`. | + +## Register Effects + +### `bx` + +- **Reads (always):** `LK`, `AA`, `ADDR` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** `LR` + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +NIA <- (CIA + EXTS(LI || 0b00)) if AA=0 + <- EXTS(LI || 0b00) if AA=1 +if LK then LR <- CIA + 4 +``` + +## C Translation Example + +```c +/* b / bl / ba / bla — unconditional branch (I-form, primary 18) */ +int32_t li = (int32_t)(insn.LI << 2); /* sign-extended word-offset */ +uint32_t target = insn.AA ? (uint32_t)li : (uint32_t)(pc + li); +uint32_t next = pc + 4; +if (insn.LK) lr = next; /* bl / bla save return addr */ +pc = target; +``` + +## Implementation References + +**`bx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="bx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:154`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L154) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:11`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L11) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:342`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L342) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:897-907`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L897-L907) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::bx => { + let target = if instr.aa() { + instr.li() as u32 + } else { + ctx.pc.wrapping_add(instr.li() as u32) + }; + if instr.lk() { + ctx.lr = (ctx.pc + 4) as u64; + } + ctx.pc = target; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **24-bit word-aligned target.** `LI` is a 24-bit signed word-count. Hardware concatenates `LI || 0b00` (adds the implicit two low zero bits) and sign-extends to 64 bits before using it as an address. The displacement range is therefore ±32 MiB in bytes (`−2^25 … +2^25 − 4`). +- **Four mnemonics, one opcode.** The four runtime variants selected by `AA` and `LK`: + - `b` — `AA = 0, LK = 0` — PC-relative, no link. + - `bl` — `AA = 0, LK = 1` — PC-relative, `LR = CIA + 4` (the ubiquitous function-call primitive). + - `ba` — `AA = 1, LK = 0` — absolute, no link. + - `bla` — `AA = 1, LK = 1` — absolute, link. + Xbox 360 code almost exclusively uses `b` and `bl`; `ba` / `bla` appear only in kernel / firmware stubs. +- **Target alignment.** `LI` is scaled by 4, so all targets are 4-byte aligned by construction. There is no low-bit encoding of ARM-style Thumb — PPC has one instruction width. +- **LR write is *before* the branch.** In `bl`, `LR` receives `CIA + 4` (the address of the instruction *after* the branch) before execution transfers to the target. Nested calls naturally overwrite LR; callees must spill it ([`mflr`](../control/mfspr.md) + `std`) before making their own `bl`. +- **Indirect tail calls.** A tail call to an indirect target is encoded as `mtctr` + `bctr` (see [`bcctrx`](bcctrx.md)), not `bx` — `bx` has no register-based form. +- **No condition test.** Use [`bcx`](bcx.md) for conditional displacement branches or [`bclrx`](bclrx.md) / [`bcctrx`](bcctrx.md) for conditional LR/CTR jumps. +- **Speculative execution.** The Xenon fetches past `bx`; translators that mask control flow must treat the target as a single-destination control transfer. + +## Related Instructions + +- [`bcx`](bcx.md) — conditional displacement branch (B-form, ±32 KiB range). +- [`bclrx`](bclrx.md), [`bcctrx`](bcctrx.md) — branch to LR / CTR, conditional and unconditional. +- [`mtlr`](../control/mtspr.md), [`mflr`](../control/mfspr.md) — LR save/restore for nested `bl` calls. +- [`sc`](sc.md) — system call; an alternative control-flow exit to the kernel. + +## Simplified Mnemonics + +Assemblers emit `b`, `bl`, `ba`, `bla` for the four runtime combinations. There is no further simplification. + +## IBM Reference + +- [AIX 7.3 — `b` (Branch)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-b-branch-instruction) +- [AIX 7.3 — `bl` (Branch with Link, simplified mnemonic)](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-branch-simplified) diff --git a/migration/project-root/ppc-manual/branch/sc.md b/migration/project-root/ppc-manual/branch/sc.md new file mode 100644 index 0000000..1ec660a --- /dev/null +++ b/migration/project-root/ppc-manual/branch/sc.md @@ -0,0 +1,129 @@ +# `sc` — System Call + +> **Category:** [Branch & System](../categories/branch.md) · **Form:** [SC](../forms/SC.md) · **Opcode:** `0x44000002` · _sync_ + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `sc` | `sc` | — | System Call | + +## Syntax + +```asm +sc [LEV] +``` + +## Encoding + +### `sc` — form `SC` + +- **Opcode word:** `0x44000002` +- **Primary opcode (bits 0–5):** `17` +- **Extended opcode:** — +- **Synchronising:** yes + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (17) | +| 6–19 | `—` | reserved | +| 20–26 | `LEV` | exception level | +| 27–29 | `—` | reserved | +| 30 | `1` | fixed 1 | +| 31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `LEV` | sc: read | System-call exception level (for `sc`). | + +## Register Effects + +### `sc` + +- **Reads (always):** `LEV` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +system_call_exception(LEV) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`sc`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="sc"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:455`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L455) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:63`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L63) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:341`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L341) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:984-997`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L984-L997) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::sc => { + // PPCBUG-064: log non-zero LEV (`sc 2` is the Xbox 360 hypervisor-call + // convention; canary dispatches it to a different handler than `sc 0`). + // Routing LEV=2 requires a StepResult variant extension; deferred. + let lev = (instr.raw >> 5) & 0x7F; + if lev != 0 { + tracing::warn!( + "sc with LEV={} at {:#010x}: dispatched as plain SystemCall (HVcall routing not implemented)", + lev, ctx.pc + ); + } + ctx.pc += 4; + return StepResult::SystemCall; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`LEV` field — kernel vs hypervisor.** The 7-bit `LEV` operand selects the privilege level of the syscall: + - `LEV = 0` — supervisor (kernel) syscall. Standard application → kernel transition; targets the `0xC00` system-call vector. + - `LEV = 1` — reserved. + - `LEV = 2` — **hypervisor syscall** (`HVcall`). On the Xenon, `sc 2` traps to the Xbox 360 hypervisor; this is how the kernel itself talks to the supervisor below it (e.g., for security operations, encrypted-memory accesses, page table updates). +- **`sc` as written by titles.** Almost all guest game code uses `LEV = 0` to call `XboxKrnl.exe`. Game disassembly will show large jump tables of small thunks each ending in `li r0, syscall_no; sc; blr`. +- **No condition or status side effects.** `sc` updates *no* general-purpose register on entry — neither LR nor CR. The kernel sees the GPR/FPR snapshot as-is and reads the syscall number out of `r0` (Xbox 360 ABI convention, not architectural). +- **Return path.** Hardware returns from `sc` via [`rfid`](../control/mtmsrd.md)-class instructions in the kernel handler; from the application's perspective execution resumes at `CIA + 4`. Xenia's interpreter realises this by simply pre-incrementing `pc` then returning `StepResult::SystemCall` — the host driver dispatches the syscall and re-enters the loop. +- **xenia divergence vs hardware.** xenia-rs *does not* model the `0xC00` exception vector or save SRR0/SRR1; the `LEV` operand is currently ignored. All `sc` instructions are treated identically and serviced by the host. This is sufficient because Xbox 360 titles don't observe SRR registers and the host kernel is implemented natively. +- **Synchronisation.** Marked `sync` in xenia's XML — `sc` is context-synchronising (hardware completes all prior instructions before raising the exception). JITs must flush pending state before emitting the host call. +- **Reserved bits.** Bit 30 is fixed `1`; bits 6–19 and 27–29 are reserved (must be 0). The 1-bit field at position 30 distinguishes the `sc` encoding from `scv` (later PowerISA addition, not present on the Xenon). + +## Related Instructions + +- [`bx`](bx.md), [`bcx`](bcx.md), [`bclrx`](bclrx.md), [`bcctrx`](bcctrx.md) — ordinary control flow alternatives. +- [`tw`](tw.md), [`twi`](twi.md), [`td`](td.md), [`tdi`](tdi.md) — synchronous *trap* exceptions; another way to enter the kernel. +- [`mtmsr`](../control/mtmsr.md), [`mtmsrd`](../control/mtmsrd.md) — machine-state changes used by the kernel's `sc` handler on return (`rfid`/`hrfid` chain not separately documented in this manual). +- [`isync`](../control/mtmsr.md) — context-synchronising sibling; `sc` itself implies an isync-like fence. + +## IBM Reference + +- [AIX 7.3 — `sc` (System Call)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-sc-system-call-instruction) +- PowerISA v2.07B, Book III §7 — System Linkage interrupt definitions and `LEV` field semantics. diff --git a/migration/project-root/ppc-manual/branch/td.md b/migration/project-root/ppc-manual/branch/td.md new file mode 100644 index 0000000..8a23ccd --- /dev/null +++ b/migration/project-root/ppc-manual/branch/td.md @@ -0,0 +1,184 @@ +# `td` — Trap Doubleword + +> **Category:** [Branch & System](../categories/branch.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000088` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `td` | `td` | — | Trap Doubleword | + +## Syntax + +```asm +td [TO], [RA], [RB] +``` + +## Encoding + +### `td` — form `X` + +- **Opcode word:** `0x7c000088` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `68` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `TO` | td: read | Trap-on condition mask (5 bits) — LT, GT, EQ, LGT, LLT bits. | +| `RA` | td: read | Source GPR (`r0`–`r31`). | +| `RB` | td: read | Source GPR. | + +## Register Effects + +### `td` + +- **Reads (always):** `TO`, `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`td`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="td"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:552`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L552) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:87`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L87) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:769`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L769) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1762-1796`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1762-L1796) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::tw | PpcOpcode::twi | PpcOpcode::td | PpcOpcode::tdi => { + // PPCBUG-063: save CIA before incrementing so a trap handler reads + // the faulting instruction address, not CIA+4. + // PPCBUG-065: log the SIMM type code on `twi 31, r0, IMM` (Xbox 360 + // typed-trap convention used by the CRT/kernel for C++ exception + // class dispatch). The audit notes this is relevant to the Sylpheed + // throw investigation; routing the type code via a payload requires + // a StepResult enum extension that's deferred for now. + let trap_pc = ctx.pc; + let a = ctx.gpr[instr.ra()]; + let b = match instr.opcode { + PpcOpcode::twi | PpcOpcode::tdi => instr.simm16() as i64 as u64, + _ => ctx.gpr[instr.rb()], + }; + let width = match instr.opcode { + PpcOpcode::tw | PpcOpcode::twi => trap::TrapWidth::Word, + _ => trap::TrapWidth::Doubleword, + }; + let fired = trap::evaluate(instr.to(), a, b, width); + if fired { + let typed_trap_simm = if matches!(instr.opcode, PpcOpcode::twi) + && instr.to() == 31 && instr.ra() == 0 { + Some(instr.simm16() as u16) + } else { None }; + tracing::warn!( + "Trap fired at {:#010x}: {:?} TO={} a={:#x} b={:#x}{}", + trap_pc, instr.opcode, instr.to(), a, b, + typed_trap_simm.map_or(String::new(), |t| format!(" typed_trap_simm={:#06x}", t)) + ); + // Leave ctx.pc at CIA (NOT NIA) so trap handlers / SEH delivery + // can read the faulting instruction address from ctx.pc. + return StepResult::Trap; + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`TO` mask encoding (5 bits, MSB-first).** Each bit selects one comparison; the trap fires if *any* selected condition is true. Both operands are treated as 64-bit doublewords: + + | Bit | Mnemonic | Triggered when | + | --- | --- | --- | + | `TO[0]` (16) | LT | `(int64) RA < (int64) RB` | + | `TO[1]` (8) | GT | `(int64) RA > (int64) RB` | + | `TO[2]` (4) | EQ | `RA == RB` | + | `TO[3]` (2) | LGT | `(uint64) RA < (uint64) RB` (logical less) | + | `TO[4]` (1) | LLT | `(uint64) RA > (uint64) RB` (logical greater — historical naming) | + +- **`TO = 31` is unconditional trap.** All five bits set ⇒ the trap fires regardless of operand values; the simplified mnemonic `trap` is `tw 31, 0, 0` for words and **`td 31, 0, 0`** for doublewords. The PowerISA also uses `tdi 31, 0, 0` (or `twi`) as a debugger break. +- **64-bit comparison.** Unlike [`tw`](tw.md), `td` always compares the full 64-bit GPRs. On the Xenon (64-bit) this is meaningful; PPC32 implementations don't have `td`. +- **No register effects.** Only the side effect is the trap. No CR/LR/CTR/XER updates. +- **Hardware behaviour.** When the trap fires, hardware raises a Program interrupt with `SRR1[TRAP]` set and vectors to `0x700`. The Xbox 360 hypervisor / kernel handles it (assertion failure, debugger trap, etc.). +- **xenia simplification.** xenia-rs collapses all four trap variants (`td`, `tdi`, `tw`, `twi`) into one match arm that *unconditionally* logs and returns `StepResult::Trap` — it does **not** evaluate `TO` against the operands. This is a material divergence from the spec: in xenia every trap fires even if the condition is false. Real Xenon code rarely uses non-trivial `TO` masks (typical use is the unconditional `trap` for `__assert` / debugger break), so the divergence is normally invisible. +- **Distinguishing assert vs. break.** Compilers commonly emit `tdne r3, r3` (impossible) or `tdi 0, r0, 0` patterns that *cannot* trap as inert markers. Xenia's blanket trap would mis-fire on these — a small known bug; track it if you see spurious traps. + +## Related Instructions + +- [`tdi`](tdi.md) — same condition test against a 16-bit signed immediate (D-form). +- [`tw`](tw.md) / [`twi`](twi.md) — 32-bit (word) variants for comparing low halves of GPRs. +- [`sc`](sc.md) — synchronous kernel entry via system-call exception (different vector, different intent). +- [`mtmsr`](../control/mtmsr.md) — kernel returns from the trap handler via `rfid`-family instructions. + +### Simplified Mnemonics + +| Simplified | Expansion | Triggered when | +| --- | --- | --- | +| `td RA, RB` (=`tdu`) | `td 31, RA, RB` | unconditional trap | +| `tdeq RA, RB` | `td 4, RA, RB` | `RA == RB` | +| `tdne RA, RB` | `td 24, RA, RB` | `RA != RB` | +| `tdlt RA, RB` | `td 16, RA, RB` | signed less than | +| `tdle RA, RB` | `td 20, RA, RB` | signed less or equal | +| `tdgt RA, RB` | `td 8, RA, RB` | signed greater than | +| `tdge RA, RB` | `td 12, RA, RB` | signed greater or equal | +| `tdllt RA, RB` | `td 2, RA, RB` | unsigned less than | +| `tdlge RA, RB` | `td 5, RA, RB` | unsigned greater or equal | +| `tdlgt RA, RB` | `td 1, RA, RB` | unsigned greater than | +| `tdlle RA, RB` | `td 6, RA, RB` | unsigned less or equal | + +## IBM Reference + +- [AIX 7.3 — `td` (Trap Doubleword)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-td-trap-doubleword-instruction) +- [AIX 7.3 — Trap simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-trap-simplified) +- PowerISA v2.07B, Book I §3.3.11 — fixed-point trap instructions (`TO` semantics). diff --git a/migration/project-root/ppc-manual/branch/tdi.md b/migration/project-root/ppc-manual/branch/tdi.md new file mode 100644 index 0000000..77f10da --- /dev/null +++ b/migration/project-root/ppc-manual/branch/tdi.md @@ -0,0 +1,173 @@ +# `tdi` — Trap Doubleword Immediate + +> **Category:** [Branch & System](../categories/branch.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x08000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `tdi` | `tdi` | — | Trap Doubleword Immediate | + +## Syntax + +```asm +tdi [TO], [RA], [SIMM] +``` + +## Encoding + +### `tdi` — form `D` + +- **Opcode word:** `0x08000000` +- **Primary opcode (bits 0–5):** `2` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `TO` | tdi: read | Trap-on condition mask (5 bits) — LT, GT, EQ, LGT, LLT bits. | +| `RA` | tdi: read | Source GPR (`r0`–`r31`). | +| `SIMM` | tdi: read | 16-bit signed immediate. Sign-extended to 64 bits before use. | + +## Register Effects + +### `tdi` + +- **Reads (always):** `TO`, `RA`, `SIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`tdi`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="tdi"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:568`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L568) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:87`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L87) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:327`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L327) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1762-1796`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1762-L1796) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::tw | PpcOpcode::twi | PpcOpcode::td | PpcOpcode::tdi => { + // PPCBUG-063: save CIA before incrementing so a trap handler reads + // the faulting instruction address, not CIA+4. + // PPCBUG-065: log the SIMM type code on `twi 31, r0, IMM` (Xbox 360 + // typed-trap convention used by the CRT/kernel for C++ exception + // class dispatch). The audit notes this is relevant to the Sylpheed + // throw investigation; routing the type code via a payload requires + // a StepResult enum extension that's deferred for now. + let trap_pc = ctx.pc; + let a = ctx.gpr[instr.ra()]; + let b = match instr.opcode { + PpcOpcode::twi | PpcOpcode::tdi => instr.simm16() as i64 as u64, + _ => ctx.gpr[instr.rb()], + }; + let width = match instr.opcode { + PpcOpcode::tw | PpcOpcode::twi => trap::TrapWidth::Word, + _ => trap::TrapWidth::Doubleword, + }; + let fired = trap::evaluate(instr.to(), a, b, width); + if fired { + let typed_trap_simm = if matches!(instr.opcode, PpcOpcode::twi) + && instr.to() == 31 && instr.ra() == 0 { + Some(instr.simm16() as u16) + } else { None }; + tracing::warn!( + "Trap fired at {:#010x}: {:?} TO={} a={:#x} b={:#x}{}", + trap_pc, instr.opcode, instr.to(), a, b, + typed_trap_simm.map_or(String::new(), |t| format!(" typed_trap_simm={:#06x}", t)) + ); + // Leave ctx.pc at CIA (NOT NIA) so trap handlers / SEH delivery + // can read the faulting instruction address from ctx.pc. + return StepResult::Trap; + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Immediate is sign-extended.** `SIMM` is treated as a 16-bit signed value, then sign-extended to 64 bits before comparison. To trap against a small unsigned constant, the same encoding works because both signed and unsigned interpretations agree for `SIMM ∈ [0, 0x7FFF]`. +- **`TO` mask.** Identical bit layout to [`td`](td.md): bit 0 = signed LT, 1 = signed GT, 2 = EQ, 3 = unsigned LT (LGT), 4 = unsigned GT (LLT). Trap fires if any selected bit's condition is true. +- **`TO = 31` is unconditional.** `tdi 31, 0, 0` is a debugger / assert trap. Compilers sometimes use it as a "should not reach" marker. +- **64-bit comparison only.** Unlike [`twi`](twi.md), `tdi` always compares the full 64-bit GPR — it has no PPC32 analogue. The Xenon's PPC64 mode makes this meaningful. +- **No register effects.** Pure side effect on success: Program interrupt → vector `0x700` with `SRR1[TRAP]=1`. +- **xenia simplification.** xenia-rs unconditionally treats `tdi` as a fired trap, regardless of `TO`/`RA`/`SIMM` values. This diverges from the spec — real hardware would silently fall through when no `TO` bit's condition holds. Most title code uses only the unconditional `trap` form, so the divergence is normally invisible; non-firing assertion patterns (e.g. `tdi 0, r0, 0`) will mis-fire under xenia. +- **Reserved bits.** Bits 6–10 carry the `TO` field; there is no `Rc` / `OE` on D-form trap immediates. + +## Related Instructions + +- [`td`](td.md) — register-register doubleword trap (X-form). +- [`twi`](twi.md) / [`tw`](tw.md) — 32-bit-comparison siblings. +- [`sc`](sc.md) — kernel-entry counterpart via system call exception. +- [`mtmsrd`](mtmsrd.md) (control category) — kernel `rfid`-style return path after handling. + +### Simplified Mnemonics + +| Simplified | Expansion | Triggered when | +| --- | --- | --- | +| `tdi RA, value` | `tdi 31, RA, value` | unconditional trap | +| `tdeqi RA, value` | `tdi 4, RA, value` | `RA == EXTS(value)` | +| `tdnei RA, value` | `tdi 24, RA, value` | `RA != EXTS(value)` | +| `tdlti RA, value` | `tdi 16, RA, value` | signed less than | +| `tdlei RA, value` | `tdi 20, RA, value` | signed less or equal | +| `tdgti RA, value` | `tdi 8, RA, value` | signed greater than | +| `tdgei RA, value` | `tdi 12, RA, value` | signed greater or equal | +| `tdllti RA, value` | `tdi 2, RA, value` | unsigned less than | +| `tdlgei RA, value` | `tdi 5, RA, value` | unsigned greater or equal | +| `tdlgti RA, value` | `tdi 1, RA, value` | unsigned greater than | +| `tdllei RA, value` | `tdi 6, RA, value` | unsigned less or equal | + +## IBM Reference + +- [AIX 7.3 — `tdi` (Trap Doubleword Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-tdi-trap-doubleword-immediate-instruction) +- [AIX 7.3 — Trap simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-trap-simplified) +- PowerISA v2.07B, Book I §3.3.11 — fixed-point trap instructions. diff --git a/migration/project-root/ppc-manual/branch/tw.md b/migration/project-root/ppc-manual/branch/tw.md new file mode 100644 index 0000000..60b6281 --- /dev/null +++ b/migration/project-root/ppc-manual/branch/tw.md @@ -0,0 +1,184 @@ +# `tw` — Trap Word + +> **Category:** [Branch & System](../categories/branch.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000008` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `tw` | `tw` | — | Trap Word | + +## Syntax + +```asm +tw [TO], [RA], [RB] +``` + +## Encoding + +### `tw` — form `X` + +- **Opcode word:** `0x7c000008` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `4` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `TO` | tw: read | Trap-on condition mask (5 bits) — LT, GT, EQ, LGT, LLT bits. | +| `RA` | tw: read | Source GPR (`r0`–`r31`). | +| `RB` | tw: read | Source GPR. | + +## Register Effects + +### `tw` + +- **Reads (always):** `TO`, `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`tw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="tw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:583`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L583) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:87`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L87) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:750`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L750) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1762-1796`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1762-L1796) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::tw | PpcOpcode::twi | PpcOpcode::td | PpcOpcode::tdi => { + // PPCBUG-063: save CIA before incrementing so a trap handler reads + // the faulting instruction address, not CIA+4. + // PPCBUG-065: log the SIMM type code on `twi 31, r0, IMM` (Xbox 360 + // typed-trap convention used by the CRT/kernel for C++ exception + // class dispatch). The audit notes this is relevant to the Sylpheed + // throw investigation; routing the type code via a payload requires + // a StepResult enum extension that's deferred for now. + let trap_pc = ctx.pc; + let a = ctx.gpr[instr.ra()]; + let b = match instr.opcode { + PpcOpcode::twi | PpcOpcode::tdi => instr.simm16() as i64 as u64, + _ => ctx.gpr[instr.rb()], + }; + let width = match instr.opcode { + PpcOpcode::tw | PpcOpcode::twi => trap::TrapWidth::Word, + _ => trap::TrapWidth::Doubleword, + }; + let fired = trap::evaluate(instr.to(), a, b, width); + if fired { + let typed_trap_simm = if matches!(instr.opcode, PpcOpcode::twi) + && instr.to() == 31 && instr.ra() == 0 { + Some(instr.simm16() as u16) + } else { None }; + tracing::warn!( + "Trap fired at {:#010x}: {:?} TO={} a={:#x} b={:#x}{}", + trap_pc, instr.opcode, instr.to(), a, b, + typed_trap_simm.map_or(String::new(), |t| format!(" typed_trap_simm={:#06x}", t)) + ); + // Leave ctx.pc at CIA (NOT NIA) so trap handlers / SEH delivery + // can read the faulting instruction address from ctx.pc. + return StepResult::Trap; + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **32-bit comparison only.** `tw` compares the *low 32 bits* of `RA` and `RB`. The high halves of the 64-bit GPRs on the Xenon are ignored. Use [`td`](td.md) for full 64-bit comparisons. +- **`TO` mask (5 bits, MSB-first).** Same encoding as the doubleword form: + + | Bit | Mnemonic | Triggered when | + | --- | --- | --- | + | `TO[0]` (16) | LT | `(int32) RA < (int32) RB` | + | `TO[1]` (8) | GT | `(int32) RA > (int32) RB` | + | `TO[2]` (4) | EQ | `(uint32) RA == (uint32) RB` | + | `TO[3]` (2) | LGT | `(uint32) RA < (uint32) RB` | + | `TO[4]` (1) | LLT | `(uint32) RA > (uint32) RB` | + +- **`tw 31, 0, 0` is `trap`.** The simplified mnemonic `trap` expands to `tw 31, r0, r0` — all five `TO` bits set ⇒ unconditional trap. Compilers and the kernel use this as the assertion / debugger break primitive; it appears as `0x7FE00008` in raw bytes. +- **Conditional asserts.** GCC's `__builtin_trap` and MSVC's `__assert` macros emit `tw` variants like `twge`/`twlt` to fault on bound-check failures. +- **No register effects.** Side effect only: Program interrupt (`0x700`) with `SRR1[TRAP]=1`. +- **xenia simplification.** xenia-rs collapses `td/tdi/tw/twi` into a single arm that *unconditionally* logs and returns `StepResult::Trap` — the `TO` operand is **not evaluated**. Real hardware would silently fall through when no `TO` bit's condition holds. In practice titles use mostly the unconditional `trap`, so the divergence rarely manifests, but inert-marker patterns like `tw 0, r0, r0` will fire under xenia. +- **Inert encoding.** `tw 0, r0, r0` (no `TO` bits set) can never trap on real hardware. It encodes as `0x7C000008` — sometimes used as a structured-NOP marker. Watch for it in xenia traces. + +## Related Instructions + +- [`twi`](twi.md) — same 32-bit comparison against a 16-bit signed immediate. +- [`td`](td.md) / [`tdi`](tdi.md) — 64-bit (doubleword) variants. +- [`sc`](sc.md) — kernel entry via system-call exception (different vector). +- [`mtmsr`](../control/mtmsr.md) — kernel returns from `0x700` via `rfid`. + +### Simplified Mnemonics + +| Simplified | Expansion | Triggered when | +| --- | --- | --- | +| `trap` | `tw 31, 0, 0` | unconditional | +| `tweq RA, RB` | `tw 4, RA, RB` | `RA == RB` | +| `twne RA, RB` | `tw 24, RA, RB` | `RA != RB` | +| `twlt RA, RB` | `tw 16, RA, RB` | signed less than | +| `twle RA, RB` | `tw 20, RA, RB` | signed less or equal | +| `twgt RA, RB` | `tw 8, RA, RB` | signed greater than | +| `twge RA, RB` | `tw 12, RA, RB` | signed greater or equal | +| `twllt RA, RB` | `tw 2, RA, RB` | unsigned less than | +| `twlge RA, RB` | `tw 5, RA, RB` | unsigned greater or equal | +| `twlgt RA, RB` | `tw 1, RA, RB` | unsigned greater than | +| `twlle RA, RB` | `tw 6, RA, RB` | unsigned less or equal | + +## IBM Reference + +- [AIX 7.3 — `tw` (Trap Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-tw-trap-word-instruction) +- [AIX 7.3 — Trap simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-trap-simplified) +- PowerISA v2.07B, Book I §3.3.11 — fixed-point trap instructions. diff --git a/migration/project-root/ppc-manual/branch/twi.md b/migration/project-root/ppc-manual/branch/twi.md new file mode 100644 index 0000000..7a793fb --- /dev/null +++ b/migration/project-root/ppc-manual/branch/twi.md @@ -0,0 +1,172 @@ +# `twi` — Trap Word Immediate + +> **Category:** [Branch & System](../categories/branch.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x0c000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `twi` | `twi` | — | Trap Word Immediate | + +## Syntax + +```asm +tw [TO], [RA], [SIMM] +``` + +## Encoding + +### `twi` — form `D` + +- **Opcode word:** `0x0c000000` +- **Primary opcode (bits 0–5):** `3` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `TO` | twi: read | Trap-on condition mask (5 bits) — LT, GT, EQ, LGT, LLT bits. | +| `RA` | twi: read | Source GPR (`r0`–`r31`). | +| `SIMM` | twi: read | 16-bit signed immediate. Sign-extended to 64 bits before use. | + +## Register Effects + +### `twi` + +- **Reads (always):** `TO`, `RA`, `SIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`twi`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="twi"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:601`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L601) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:87`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L87) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:328`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L328) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1762-1796`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1762-L1796) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::tw | PpcOpcode::twi | PpcOpcode::td | PpcOpcode::tdi => { + // PPCBUG-063: save CIA before incrementing so a trap handler reads + // the faulting instruction address, not CIA+4. + // PPCBUG-065: log the SIMM type code on `twi 31, r0, IMM` (Xbox 360 + // typed-trap convention used by the CRT/kernel for C++ exception + // class dispatch). The audit notes this is relevant to the Sylpheed + // throw investigation; routing the type code via a payload requires + // a StepResult enum extension that's deferred for now. + let trap_pc = ctx.pc; + let a = ctx.gpr[instr.ra()]; + let b = match instr.opcode { + PpcOpcode::twi | PpcOpcode::tdi => instr.simm16() as i64 as u64, + _ => ctx.gpr[instr.rb()], + }; + let width = match instr.opcode { + PpcOpcode::tw | PpcOpcode::twi => trap::TrapWidth::Word, + _ => trap::TrapWidth::Doubleword, + }; + let fired = trap::evaluate(instr.to(), a, b, width); + if fired { + let typed_trap_simm = if matches!(instr.opcode, PpcOpcode::twi) + && instr.to() == 31 && instr.ra() == 0 { + Some(instr.simm16() as u16) + } else { None }; + tracing::warn!( + "Trap fired at {:#010x}: {:?} TO={} a={:#x} b={:#x}{}", + trap_pc, instr.opcode, instr.to(), a, b, + typed_trap_simm.map_or(String::new(), |t| format!(" typed_trap_simm={:#06x}", t)) + ); + // Leave ctx.pc at CIA (NOT NIA) so trap handlers / SEH delivery + // can read the faulting instruction address from ctx.pc. + return StepResult::Trap; + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **32-bit comparison against sign-extended immediate.** `SIMM` is a 16-bit signed value, sign-extended to 32 bits, then compared against the low 32 bits of `RA`. The high half of `RA` is *ignored*. +- **`TO` mask.** Identical to [`tw`](tw.md): bit 0 = signed LT, 1 = signed GT, 2 = EQ, 3 = unsigned LT (LGT), 4 = unsigned GT (LLT). Trap fires if any selected bit's condition is true. +- **`twi 31, 0, 0` is unconditional trap.** All `TO` bits set ⇒ guaranteed trap. The simplified mnemonic family (`twnei`, `twgei`, …) is much more common in real code: bound checks, null checks, integer-divide-by-zero pre-checks. +- **Compiler usage.** Xbox 360 GCC emits `twnei rN, -1` and similar to validate handle-style return values; the kernel handler turns the trap into an exception delivered to the title. +- **No register effects.** Side effect: Program interrupt → vector `0x700` with `SRR1[TRAP]=1`. +- **xenia simplification.** Same as the other three trap forms — xenia-rs unconditionally returns `StepResult::Trap` whenever it decodes any of `tdi`/`twi`/`td`/`tw`, regardless of the `TO` mask or operands. This means `twi 0, r0, 0` (architecturally a guaranteed-no-trap encoding) will spuriously fire under xenia. Keep this in mind when triaging unexpected trap signals. +- **No `Rc` / `OE`.** D-form trap immediates have neither. + +## Related Instructions + +- [`tw`](tw.md) — register-register 32-bit trap (X-form). +- [`tdi`](tdi.md) / [`td`](td.md) — 64-bit (doubleword) siblings. +- [`sc`](sc.md) — alternative synchronous kernel entry. +- [`cmpi`](../alu/cmpi.md), [`cmpli`](../alu/cmpli.md) — set CR for a subsequent [`bcx`](bcx.md) when you want a regular branch instead of a trap. + +### Simplified Mnemonics + +| Simplified | Expansion | Triggered when | +| --- | --- | --- | +| `tweqi RA, value` | `twi 4, RA, value` | `RA == EXTS(value)` | +| `twnei RA, value` | `twi 24, RA, value` | `RA != EXTS(value)` | +| `twlti RA, value` | `twi 16, RA, value` | signed less than | +| `twlei RA, value` | `twi 20, RA, value` | signed less or equal | +| `twgti RA, value` | `twi 8, RA, value` | signed greater than | +| `twgei RA, value` | `twi 12, RA, value` | signed greater or equal | +| `twllti RA, value` | `twi 2, RA, value` | unsigned less than | +| `twlgei RA, value` | `twi 5, RA, value` | unsigned greater or equal | +| `twlgti RA, value` | `twi 1, RA, value` | unsigned greater than | +| `twllei RA, value` | `twi 6, RA, value` | unsigned less or equal | + +## IBM Reference + +- [AIX 7.3 — `twi` (Trap Word Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-twi-trap-word-immediate-instruction) +- [AIX 7.3 — Trap simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-trap-simplified) +- PowerISA v2.07B, Book I §3.3.11 — fixed-point trap instructions. diff --git a/migration/project-root/ppc-manual/categories/alu.md b/migration/project-root/ppc-manual/categories/alu.md new file mode 100644 index 0000000..e730e3b --- /dev/null +++ b/migration/project-root/ppc-manual/categories/alu.md @@ -0,0 +1,82 @@ +# Integer ALU + +Fixed-point add/sub/multiply/divide, logical, rotate, shift, compare, count-leading-zeros, sign-extension, trap-on-condition. + +**70 families** · **70 XML entries**. + + + +| Family | Form | Description | Members | +| --- | --- | --- | --- | +| [`addcx`](addcx.md) | `XO` | Add Carrying | `addcx` | +| [`addex`](addex.md) | `XO` | Add Extended | `addex` | +| [`addi`](addi.md) | `D` | Add Immediate | `addi` | +| [`addic`](addic.md) | `D` | Add Immediate Carrying | `addic` | +| [`addic.`](addicx.md) | `D` | Add Immediate Carrying and Record | `addic.` | +| [`addis`](addis.md) | `D` | Add Immediate Shifted | `addis` | +| [`addmex`](addmex.md) | `XO` | Add to Minus One Extended | `addmex` | +| [`addx`](addx.md) | `XO` | Add | `addx` | +| [`addzex`](addzex.md) | `XO` | Add to Zero Extended | `addzex` | +| [`andcx`](andcx.md) | `X` | AND with Complement | `andcx` | +| [`andi.`](andix.md) | `D` | AND Immediate | `andi.` | +| [`andis.`](andisx.md) | `D` | AND Immediate Shifted | `andis.` | +| [`andx`](andx.md) | `X` | AND | `andx` | +| [`cmp`](cmp.md) | `X` | Compare | `cmp` | +| [`cmpi`](cmpi.md) | `D` | Compare Immediate | `cmpi` | +| [`cmpl`](cmpl.md) | `X` | Compare Logical | `cmpl` | +| [`cmpli`](cmpli.md) | `D` | Compare Logical Immediate | `cmpli` | +| [`cntlzdx`](cntlzdx.md) | `X` | Count Leading Zeros Doubleword | `cntlzdx` | +| [`cntlzwx`](cntlzwx.md) | `X` | Count Leading Zeros Word | `cntlzwx` | +| [`divdux`](divdux.md) | `XO` | Divide Doubleword Unsigned | `divdux` | +| [`divdx`](divdx.md) | `XO` | Divide Doubleword | `divdx` | +| [`divwux`](divwux.md) | `XO` | Divide Word Unsigned | `divwux` | +| [`divwx`](divwx.md) | `XO` | Divide Word | `divwx` | +| [`eieio`](eieio.md) | `X` | Enforce In-Order Execution of I/O | `eieio` | +| [`eqvx`](eqvx.md) | `X` | Equivalent | `eqvx` | +| [`extsbx`](extsbx.md) | `X` | Extend Sign Byte | `extsbx` | +| [`extshx`](extshx.md) | `X` | Extend Sign Half Word | `extshx` | +| [`extswx`](extswx.md) | `X` | Extend Sign Word | `extswx` | +| [`isync`](isync.md) | `XL` | Instruction Synchronize | `isync` | +| [`mulhdux`](mulhdux.md) | `XO` | Multiply High Doubleword Unsigned | `mulhdux` | +| [`mulhdx`](mulhdx.md) | `XO` | Multiply High Doubleword | `mulhdx` | +| [`mulhwux`](mulhwux.md) | `XO` | Multiply High Word Unsigned | `mulhwux` | +| [`mulhwx`](mulhwx.md) | `XO` | Multiply High Word | `mulhwx` | +| [`mulldx`](mulldx.md) | `XO` | Multiply Low Doubleword | `mulldx` | +| [`mulli`](mulli.md) | `D` | Multiply Low Immediate | `mulli` | +| [`mullwx`](mullwx.md) | `XO` | Multiply Low Word | `mullwx` | +| [`nandx`](nandx.md) | `X` | NAND | `nandx` | +| [`negx`](negx.md) | `XO` | Negate | `negx` | +| [`norx`](norx.md) | `X` | NOR | `norx` | +| [`orcx`](orcx.md) | `X` | OR with Complement | `orcx` | +| [`ori`](ori.md) | `D` | OR Immediate | `ori` | +| [`oris`](oris.md) | `D` | OR Immediate Shifted | `oris` | +| [`orx`](orx.md) | `X` | OR | `orx` | +| [`rldclx`](rldclx.md) | `MDS` | Rotate Left Doubleword then Clear Left | `rldclx` | +| [`rldcrx`](rldcrx.md) | `MDS` | Rotate Left Doubleword then Clear Right | `rldcrx` | +| [`rldiclx`](rldiclx.md) | `MD` | Rotate Left Doubleword Immediate then Clear Left | `rldiclx` | +| [`rldicrx`](rldicrx.md) | `MD` | Rotate Left Doubleword Immediate then Clear Right | `rldicrx` | +| [`rldicx`](rldicx.md) | `MD` | Rotate Left Doubleword Immediate then Clear | `rldicx` | +| [`rldimix`](rldimix.md) | `MD` | Rotate Left Doubleword Immediate then Mask Insert | `rldimix` | +| [`rlwimix`](rlwimix.md) | `M` | Rotate Left Word Immediate then Mask Insert | `rlwimix` | +| [`rlwinmx`](rlwinmx.md) | `M` | Rotate Left Word Immediate then AND with Mask | `rlwinmx` | +| [`rlwnmx`](rlwnmx.md) | `M` | Rotate Left Word then AND with Mask | `rlwnmx` | +| [`sldx`](sldx.md) | `X` | Shift Left Doubleword | `sldx` | +| [`slwx`](slwx.md) | `X` | Shift Left Word | `slwx` | +| [`sradix`](sradix.md) | `XS` | Shift Right Algebraic Doubleword Immediate | `sradix` | +| [`sradx`](sradx.md) | `X` | Shift Right Algebraic Doubleword | `sradx` | +| [`srawix`](srawix.md) | `X` | Shift Right Algebraic Word Immediate | `srawix` | +| [`srawx`](srawx.md) | `X` | Shift Right Algebraic Word | `srawx` | +| [`srdx`](srdx.md) | `X` | Shift Right Doubleword | `srdx` | +| [`srwx`](srwx.md) | `X` | Shift Right Word | `srwx` | +| [`subfcx`](subfcx.md) | `XO` | Subtract From Carrying | `subfcx` | +| [`subfex`](subfex.md) | `XO` | Subtract From Extended | `subfex` | +| [`subficx`](subficx.md) | `D` | Subtract From Immediate Carrying | `subficx` | +| [`subfmex`](subfmex.md) | `XO` | Subtract From Minus One Extended | `subfmex` | +| [`subfx`](subfx.md) | `XO` | Subtract From | `subfx` | +| [`subfzex`](subfzex.md) | `XO` | Subtract From Zero Extended | `subfzex` | +| [`sync`](sync.md) | `X` | Synchronize | `sync` | +| [`xori`](xori.md) | `D` | XOR Immediate | `xori` | +| [`xoris`](xoris.md) | `D` | XOR Immediate Shifted | `xoris` | +| [`xorx`](xorx.md) | `X` | XOR | `xorx` | + + diff --git a/migration/project-root/ppc-manual/categories/branch.md b/migration/project-root/ppc-manual/categories/branch.md new file mode 100644 index 0000000..9f2eba1 --- /dev/null +++ b/migration/project-root/ppc-manual/categories/branch.md @@ -0,0 +1,21 @@ +# Branch & System + +Unconditional / conditional branches, branch to LR/CTR, traps, system call. + +**9 families** · **9 XML entries**. + + + +| Family | Form | Description | Members | +| --- | --- | --- | --- | +| [`bcctrx`](bcctrx.md) | `XL` | Branch Conditional to Count Register | `bcctrx` | +| [`bclrx`](bclrx.md) | `XL` | Branch Conditional to Link Register | `bclrx` | +| [`bcx`](bcx.md) | `B` | Branch Conditional | `bcx` | +| [`bx`](bx.md) | `I` | Branch | `bx` | +| [`sc`](sc.md) | `SC` | System Call | `sc` | +| [`td`](td.md) | `X` | Trap Doubleword | `td` | +| [`tdi`](tdi.md) | `D` | Trap Doubleword Immediate | `tdi` | +| [`tw`](tw.md) | `X` | Trap Word | `tw` | +| [`twi`](twi.md) | `D` | Trap Word Immediate | `twi` | + + diff --git a/migration/project-root/ppc-manual/categories/control.md b/migration/project-root/ppc-manual/categories/control.md new file mode 100644 index 0000000..3f5cf57 --- /dev/null +++ b/migration/project-root/ppc-manual/categories/control.md @@ -0,0 +1,38 @@ +# Control / CR / SPR + +Condition-register logical ops, CR field moves, mfspr/mtspr/mtcrf, time-base reads, synchronisation (sync, isync, eieio). + +**26 families** · **26 XML entries**. + + + +| Family | Form | Description | Members | +| --- | --- | --- | --- | +| [`crand`](crand.md) | `XL` | Condition Register AND | `crand` | +| [`crandc`](crandc.md) | `XL` | Condition Register AND with Complement | `crandc` | +| [`creqv`](creqv.md) | `XL` | Condition Register Equivalent | `creqv` | +| [`crnand`](crnand.md) | `XL` | Condition Register NAND | `crnand` | +| [`crnor`](crnor.md) | `XL` | Condition Register NOR | `crnor` | +| [`cror`](cror.md) | `XL` | Condition Register OR | `cror` | +| [`crorc`](crorc.md) | `XL` | Condition Register OR with Complement | `crorc` | +| [`crxor`](crxor.md) | `XL` | Condition Register XOR | `crxor` | +| [`mcrf`](mcrf.md) | `XL` | Move Condition Register Field | `mcrf` | +| [`mcrfs`](mcrfs.md) | `X` | Move to Condition Register from FPSCR | `mcrfs` | +| [`mcrxr`](mcrxr.md) | `X` | Move to Condition Register from XER | `mcrxr` | +| [`mfcr`](mfcr.md) | `X` | Move from Condition Register | `mfcr` | +| [`mffsx`](mffsx.md) | `X` | Move from FPSCR | `mffsx` | +| [`mfmsr`](mfmsr.md) | `X` | Move from Machine State Register | `mfmsr` | +| [`mfspr`](mfspr.md) | `XFX` | Move from Special-Purpose Register | `mfspr` | +| [`mftb`](mftb.md) | `XFX` | Move from Time Base | `mftb` | +| [`mfvscr`](mfvscr.md) | `VX` | Move from VSCR | `mfvscr` | +| [`mtcrf`](mtcrf.md) | `XFX` | Move to Condition Register Fields | `mtcrf` | +| [`mtfsb0x`](mtfsb0x.md) | `X` | Move to FPSCR Bit 0 | `mtfsb0x` | +| [`mtfsb1x`](mtfsb1x.md) | `X` | Move to FPSCR Bit 1 | `mtfsb1x` | +| [`mtfsfix`](mtfsfix.md) | `X` | Move to FPSCR Field Immediate | `mtfsfix` | +| [`mtfsfx`](mtfsfx.md) | `XFL` | Move to FPSCR Fields | `mtfsfx` | +| [`mtmsr`](mtmsr.md) | `X` | Move to Machine State Register | `mtmsr` | +| [`mtmsrd`](mtmsrd.md) | `X` | Move to Machine State Register Doubleword | `mtmsrd` | +| [`mtspr`](mtspr.md) | `XFX` | Move to Special-Purpose Register | `mtspr` | +| [`mtvscr`](mtvscr.md) | `VX` | Move to VSCR | `mtvscr` | + + diff --git a/migration/project-root/ppc-manual/categories/fpu.md b/migration/project-root/ppc-manual/categories/fpu.md new file mode 100644 index 0000000..363d9f8 --- /dev/null +++ b/migration/project-root/ppc-manual/categories/fpu.md @@ -0,0 +1,45 @@ +# Floating-Point + +IEEE-754 add/sub/mul/div/sqrt, fused multiply-add, conversions, compares, FPSCR moves. + +**33 families** · **33 XML entries**. + + + +| Family | Form | Description | Members | +| --- | --- | --- | --- | +| [`fabsx`](fabsx.md) | `X` | Floating Absolute Value | `fabsx` | +| [`faddsx`](faddsx.md) | `A` | Floating Add Single | `faddsx` | +| [`faddx`](faddx.md) | `A` | Floating Add | `faddx` | +| [`fcfidx`](fcfidx.md) | `X` | Floating Convert From Integer Doubleword | `fcfidx` | +| [`fcmpo`](fcmpo.md) | `X` | Floating Compare Ordered | `fcmpo` | +| [`fcmpu`](fcmpu.md) | `X` | Floating Compare Unordered | `fcmpu` | +| [`fctidx`](fctidx.md) | `X` | Floating Convert to Integer Doubleword | `fctidx` | +| [`fctidzx`](fctidzx.md) | `X` | Floating Convert to Integer Doubleword with Round Toward Zero | `fctidzx` | +| [`fctiwx`](fctiwx.md) | `X` | Floating Convert to Integer Word | `fctiwx` | +| [`fctiwzx`](fctiwzx.md) | `X` | Floating Convert to Integer Word with Round Toward Zero | `fctiwzx` | +| [`fdivsx`](fdivsx.md) | `A` | Floating Divide Single | `fdivsx` | +| [`fdivx`](fdivx.md) | `A` | Floating Divide | `fdivx` | +| [`fmaddsx`](fmaddsx.md) | `A` | Floating Multiply-Add Single | `fmaddsx` | +| [`fmaddx`](fmaddx.md) | `A` | Floating Multiply-Add | `fmaddx` | +| [`fmrx`](fmrx.md) | `X` | Floating Move Register | `fmrx` | +| [`fmsubsx`](fmsubsx.md) | `A` | Floating Multiply-Subtract Single | `fmsubsx` | +| [`fmsubx`](fmsubx.md) | `A` | Floating Multiply-Subtract | `fmsubx` | +| [`fmulsx`](fmulsx.md) | `A` | Floating Multiply Single | `fmulsx` | +| [`fmulx`](fmulx.md) | `A` | Floating Multiply | `fmulx` | +| [`fnabsx`](fnabsx.md) | `X` | Floating Negative Absolute Value | `fnabsx` | +| [`fnegx`](fnegx.md) | `X` | Floating Negate | `fnegx` | +| [`fnmaddsx`](fnmaddsx.md) | `A` | Floating Negative Multiply-Add Single | `fnmaddsx` | +| [`fnmaddx`](fnmaddx.md) | `A` | Floating Negative Multiply-Add | `fnmaddx` | +| [`fnmsubsx`](fnmsubsx.md) | `A` | Floating Negative Multiply-Subtract Single | `fnmsubsx` | +| [`fnmsubx`](fnmsubx.md) | `A` | Floating Negative Multiply-Subtract | `fnmsubx` | +| [`fresx`](fresx.md) | `A` | Floating Reciprocal Estimate Single | `fresx` | +| [`frspx`](frspx.md) | `X` | Floating Round to Single | `frspx` | +| [`frsqrtex`](frsqrtex.md) | `A` | Floating Reciprocal Square Root Estimate | `frsqrtex` | +| [`fselx`](fselx.md) | `A` | Floating Select | `fselx` | +| [`fsqrtsx`](fsqrtsx.md) | `A` | Floating Square Root Single | `fsqrtsx` | +| [`fsqrtx`](fsqrtx.md) | `A` | Floating Square Root | `fsqrtx` | +| [`fsubsx`](fsubsx.md) | `A` | Floating Subtract Single | `fsubsx` | +| [`fsubx`](fsubx.md) | `A` | Floating Subtract | `fsubx` | + + diff --git a/migration/project-root/ppc-manual/categories/memory.md b/migration/project-root/ppc-manual/categories/memory.md new file mode 100644 index 0000000..fb8273e --- /dev/null +++ b/migration/project-root/ppc-manual/categories/memory.md @@ -0,0 +1,68 @@ +# Memory + +Loads/stores for byte, half, word, doubleword, float, multiple and string; cache management (dcbt, dcbf, dcbz); reservation pair lwarx/stwcx. + +**56 families** · **112 XML entries**. + + + +| Family | Form | Description | Members | +| --- | --- | --- | --- | +| [`dcbf`](dcbf.md) | `X` | Data Cache Block Flush | `dcbf` | +| [`dcbi`](dcbi.md) | `X` | Data Cache Block Invalidate | `dcbi` | +| [`dcbst`](dcbst.md) | `X` | Data Cache Block Store | `dcbst` | +| [`dcbt`](dcbt.md) | `X` | Data Cache Block Touch | `dcbt` | +| [`dcbtst`](dcbtst.md) | `X` | Data Cache Block Touch for Store | `dcbtst` | +| [`dcbz`](dcbz.md) | `DCBZ` | Data Cache Block Clear to Zero | `dcbz`, `dcbz128` | +| [`icbi`](icbi.md) | `X` | Instruction Cache Block Invalidate | `icbi` | +| [`lbz`](lbz.md) | `D` | Load Byte and Zero | `lbz`, `lbzu`, `lbzux`, `lbzx` | +| [`ld`](ld.md) | `DS` | Load Doubleword | `ld`, `ldu`, `ldux`, `ldx` | +| [`ldarx`](ldarx.md) | `X` | Load Doubleword and Reserve Indexed | `ldarx` | +| [`ldbrx`](ldbrx.md) | `X` | Load Doubleword Byte-Reverse Indexed | `ldbrx` | +| [`lfd`](lfd.md) | `D` | Load Floating-Point Double | `lfd`, `lfdu`, `lfdux`, `lfdx` | +| [`lfs`](lfs.md) | `D` | Load Floating-Point Single | `lfs`, `lfsu`, `lfsux`, `lfsx` | +| [`lha`](lha.md) | `D` | Load Half Word Algebraic | `lha`, `lhau`, `lhaux`, `lhax` | +| [`lhbrx`](lhbrx.md) | `X` | Load Half Word Byte-Reverse Indexed | `lhbrx` | +| [`lhz`](lhz.md) | `D` | Load Half Word and Zero | `lhz`, `lhzu`, `lhzux`, `lhzx` | +| [`lmw`](lmw.md) | `D` | Load Multiple Word | `lmw` | +| [`lswi`](lswi.md) | `X` | Load String Word Immediate | `lswi` | +| [`lswx`](lswx.md) | `X` | Load String Word Indexed | `lswx` | +| [`lvebx`](lvebx.md) | `X` | Load Vector Element Byte Indexed | `lvebx` | +| [`lvehx`](lvehx.md) | `X` | Load Vector Element Half Word Indexed | `lvehx` | +| [`lvewx`](lvewx.md) | `X` | Load Vector Element Word Indexed | `lvewx`, `lvewx128` | +| [`lvlx`](lvlx.md) | `X` | Load Vector Left Indexed | `lvlx`, `lvlx128` | +| [`lvlxl`](lvlxl.md) | `X` | Load Vector Left Indexed LRU | `lvlxl`, `lvlxl128` | +| [`lvrx`](lvrx.md) | `X` | Load Vector Right Indexed | `lvrx`, `lvrx128` | +| [`lvrxl`](lvrxl.md) | `X` | Load Vector Right Indexed LRU | `lvrxl`, `lvrxl128` | +| [`lvx`](lvx.md) | `X` | Load Vector Indexed | `lvx`, `lvx128` | +| [`lvxl`](lvxl.md) | `X` | Load Vector Indexed LRU | `lvxl`, `lvxl128` | +| [`lwa`](lwa.md) | `DS` | Load Word Algebraic | `lwa`, `lwaux`, `lwax` | +| [`lwarx`](lwarx.md) | `X` | Load Word and Reserve Indexed | `lwarx` | +| [`lwbrx`](lwbrx.md) | `X` | Load Word Byte-Reverse Indexed | `lwbrx` | +| [`lwz`](lwz.md) | `D` | Load Word and Zero | `lwz`, `lwzu`, `lwzux`, `lwzx` | +| [`stb`](stb.md) | `D` | Store Byte | `stb`, `stbu`, `stbux`, `stbx` | +| [`std`](std.md) | `DS` | Store Doubleword | `std`, `stdu`, `stdux`, `stdx` | +| [`stdbrx`](stdbrx.md) | `X` | Store Doubleword Byte-Reverse Indexed | `stdbrx` | +| [`stdcx`](stdcx.md) | `X` | Store Doubleword Conditional Indexed | `stdcx` | +| [`stfd`](stfd.md) | `D` | Store Floating-Point Double | `stfd`, `stfdu`, `stfdux`, `stfdx` | +| [`stfiwx`](stfiwx.md) | `X` | Store Floating-Point as Integer Word Indexed | `stfiwx` | +| [`stfs`](stfs.md) | `D` | Store Floating-Point Single | `stfs`, `stfsu`, `stfsux`, `stfsx` | +| [`sth`](sth.md) | `D` | Store Half Word | `sth`, `sthu`, `sthux`, `sthx` | +| [`sthbrx`](sthbrx.md) | `X` | Store Half Word Byte-Reverse Indexed | `sthbrx` | +| [`stmw`](stmw.md) | `D` | Store Multiple Word | `stmw` | +| [`stswi`](stswi.md) | `X` | Store String Word Immediate | `stswi` | +| [`stswx`](stswx.md) | `X` | Store String Word Indexed | `stswx` | +| [`stvebx`](stvebx.md) | `X` | Store Vector Element Byte Indexed | `stvebx` | +| [`stvehx`](stvehx.md) | `X` | Store Vector Element Half Word Indexed | `stvehx` | +| [`stvewx`](stvewx.md) | `X` | Store Vector Element Word Indexed | `stvewx`, `stvewx128` | +| [`stvlx`](stvlx.md) | `X` | Store Vector Left Indexed | `stvlx`, `stvlx128` | +| [`stvlxl`](stvlxl.md) | `X` | Store Vector Left Indexed LRU | `stvlxl`, `stvlxl128` | +| [`stvrx`](stvrx.md) | `X` | Store Vector Right Indexed | `stvrx`, `stvrx128` | +| [`stvrxl`](stvrxl.md) | `X` | Store Vector Right Indexed LRU | `stvrxl`, `stvrxl128` | +| [`stvx`](stvx.md) | `X` | Store Vector Indexed | `stvx`, `stvx128` | +| [`stvxl`](stvxl.md) | `X` | Store Vector Indexed LRU | `stvxl`, `stvxl128` | +| [`stw`](stw.md) | `D` | Store Word | `stw`, `stwu`, `stwux`, `stwx` | +| [`stwbrx`](stwbrx.md) | `X` | Store Word Byte-Reverse Indexed | `stwbrx` | +| [`stwcx`](stwcx.md) | `X` | Store Word Conditional Indexed | `stwcx` | + + diff --git a/migration/project-root/ppc-manual/categories/vmx.md b/migration/project-root/ppc-manual/categories/vmx.md new file mode 100644 index 0000000..288e9d9 --- /dev/null +++ b/migration/project-root/ppc-manual/categories/vmx.md @@ -0,0 +1,156 @@ +# VMX (Altivec) + +128-bit SIMD over 32 registers V0–V31. Integer/float arithmetic, logical, compare, permute/merge, pack/unpack, saturation helpers. + +**144 families** · **193 XML entries**. + + + +| Family | Form | Description | Members | +| --- | --- | --- | --- | +| [`lvsl`](lvsl.md) | `X` | Load Vector for Shift Left Indexed | `lvsl`, `lvsl128` | +| [`lvsr`](lvsr.md) | `X` | Load Vector for Shift Right Indexed | `lvsr`, `lvsr128` | +| [`vaddcuw`](vaddcuw.md) | `VX` | Vector Add Carryout Unsigned Word | `vaddcuw` | +| [`vaddfp`](vaddfp.md) | `VX` | Vector Add Floating Point | `vaddfp`, `vaddfp128` | +| [`vaddsbs`](vaddsbs.md) | `VX` | Vector Add Signed Byte Saturate | `vaddsbs` | +| [`vaddshs`](vaddshs.md) | `VX` | Vector Add Signed Half Word Saturate | `vaddshs` | +| [`vaddsws`](vaddsws.md) | `VX` | Vector Add Signed Word Saturate | `vaddsws` | +| [`vaddubm`](vaddubm.md) | `VX` | Vector Add Unsigned Byte Modulo | `vaddubm` | +| [`vaddubs`](vaddubs.md) | `VX` | Vector Add Unsigned Byte Saturate | `vaddubs` | +| [`vadduhm`](vadduhm.md) | `VX` | Vector Add Unsigned Half Word Modulo | `vadduhm` | +| [`vadduhs`](vadduhs.md) | `VX` | Vector Add Unsigned Half Word Saturate | `vadduhs` | +| [`vadduwm`](vadduwm.md) | `VX` | Vector Add Unsigned Word Modulo | `vadduwm` | +| [`vadduws`](vadduws.md) | `VX` | Vector Add Unsigned Word Saturate | `vadduws` | +| [`vand`](vand.md) | `VX` | Vector Logical AND | `vand`, `vand128` | +| [`vandc`](vandc.md) | `VX` | Vector Logical AND with Complement | `vandc`, `vandc128` | +| [`vavgsb`](vavgsb.md) | `VX` | Vector Average Signed Byte | `vavgsb` | +| [`vavgsh`](vavgsh.md) | `VX` | Vector Average Signed Half Word | `vavgsh` | +| [`vavgsw`](vavgsw.md) | `VX` | Vector Average Signed Word | `vavgsw` | +| [`vavgub`](vavgub.md) | `VX` | Vector Average Unsigned Byte | `vavgub` | +| [`vavguh`](vavguh.md) | `VX` | Vector Average Unsigned Half Word | `vavguh` | +| [`vavguw`](vavguw.md) | `VX` | Vector Average Unsigned Word | `vavguw` | +| [`vcfsx`](vcfsx.md) | `VX` | Vector Convert from Signed Fixed-Point Word | `vcfsx` | +| [`vcfux`](vcfux.md) | `VX` | Vector Convert from Unsigned Fixed-Point Word | `vcfux` | +| [`vcmpbfp`](vcmpbfp.md) | `VC` | Vector Compare Bounds Floating Point | `vcmpbfp`, `vcmpbfp128` | +| [`vcmpeqfp`](vcmpeqfp.md) | `VC` | Vector Compare Equal-to Floating Point | `vcmpeqfp`, `vcmpeqfp128` | +| [`vcmpequb`](vcmpequb.md) | `VC` | Vector Compare Equal-to Unsigned Byte | `vcmpequb` | +| [`vcmpequh`](vcmpequh.md) | `VC` | Vector Compare Equal-to Unsigned Half Word | `vcmpequh` | +| [`vcmpequw`](vcmpequw.md) | `VC` | Vector Compare Equal-to Unsigned Word | `vcmpequw`, `vcmpequw128` | +| [`vcmpgefp`](vcmpgefp.md) | `VC` | Vector Compare Greater-Than-or-Equal-to Floating Point | `vcmpgefp`, `vcmpgefp128` | +| [`vcmpgtfp`](vcmpgtfp.md) | `VC` | Vector Compare Greater-Than Floating Point | `vcmpgtfp`, `vcmpgtfp128` | +| [`vcmpgtsb`](vcmpgtsb.md) | `VC` | Vector Compare Greater-Than Signed Byte | `vcmpgtsb` | +| [`vcmpgtsh`](vcmpgtsh.md) | `VC` | Vector Compare Greater-Than Signed Half Word | `vcmpgtsh` | +| [`vcmpgtsw`](vcmpgtsw.md) | `VC` | Vector Compare Greater-Than Signed Word | `vcmpgtsw` | +| [`vcmpgtub`](vcmpgtub.md) | `VC` | Vector Compare Greater-Than Unsigned Byte | `vcmpgtub` | +| [`vcmpgtuh`](vcmpgtuh.md) | `VC` | Vector Compare Greater-Than Unsigned Half Word | `vcmpgtuh` | +| [`vcmpgtuw`](vcmpgtuw.md) | `VC` | Vector Compare Greater-Than Unsigned Word | `vcmpgtuw` | +| [`vctsxs`](vctsxs.md) | `VX` | Vector Convert to Signed Fixed-Point Word Saturate | `vctsxs` | +| [`vctuxs`](vctuxs.md) | `VX` | Vector Convert to Unsigned Fixed-Point Word Saturate | `vctuxs` | +| [`vexptefp`](vexptefp.md) | `VX` | Vector 2 Raised to the Exponent Estimate Floating Point | `vexptefp`, `vexptefp128` | +| [`vlogefp`](vlogefp.md) | `VX` | Vector Log2 Estimate Floating Point | `vlogefp`, `vlogefp128` | +| [`vmaddfp`](vmaddfp.md) | `VA` | Vector Multiply-Add Floating Point | `vmaddfp`, `vmaddfp128` | +| [`vmaxfp`](vmaxfp.md) | `VX` | Vector Maximum Floating Point | `vmaxfp`, `vmaxfp128` | +| [`vmaxsb`](vmaxsb.md) | `VX` | Vector Maximum Signed Byte | `vmaxsb` | +| [`vmaxsh`](vmaxsh.md) | `VX` | Vector Maximum Signed Half Word | `vmaxsh` | +| [`vmaxsw`](vmaxsw.md) | `VX` | Vector Maximum Signed Word | `vmaxsw` | +| [`vmaxub`](vmaxub.md) | `VX` | Vector Maximum Unsigned Byte | `vmaxub` | +| [`vmaxuh`](vmaxuh.md) | `VX` | Vector Maximum Unsigned Half Word | `vmaxuh` | +| [`vmaxuw`](vmaxuw.md) | `VX` | Vector Maximum Unsigned Word | `vmaxuw` | +| [`vmhaddshs`](vmhaddshs.md) | `VA` | Vector Multiply-High and Add Signed Signed Half Word Saturate | `vmhaddshs` | +| [`vmhraddshs`](vmhraddshs.md) | `VA` | Vector Multiply-High Round and Add Signed Signed Half Word Saturate | `vmhraddshs` | +| [`vminfp`](vminfp.md) | `VX` | Vector Minimum Floating Point | `vminfp`, `vminfp128` | +| [`vminsb`](vminsb.md) | `VX` | Vector Minimum Signed Byte | `vminsb` | +| [`vminsh`](vminsh.md) | `VX` | Vector Minimum Signed Half Word | `vminsh` | +| [`vminsw`](vminsw.md) | `VX` | Vector Minimum Signed Word | `vminsw` | +| [`vminub`](vminub.md) | `VX` | Vector Minimum Unsigned Byte | `vminub` | +| [`vminuh`](vminuh.md) | `VX` | Vector Minimum Unsigned Half Word | `vminuh` | +| [`vminuw`](vminuw.md) | `VX` | Vector Minimum Unsigned Word | `vminuw` | +| [`vmladduhm`](vmladduhm.md) | `VA` | Vector Multiply-Low and Add Unsigned Half Word Modulo | `vmladduhm` | +| [`vmrghb`](vmrghb.md) | `VX` | Vector Merge High Byte | `vmrghb` | +| [`vmrghh`](vmrghh.md) | `VX` | Vector Merge High Half Word | `vmrghh` | +| [`vmrghw`](vmrghw.md) | `VX` | Vector Merge High Word | `vmrghw`, `vmrghw128` | +| [`vmrglb`](vmrglb.md) | `VX` | Vector Merge Low Byte | `vmrglb` | +| [`vmrglh`](vmrglh.md) | `VX` | Vector Merge Low Half Word | `vmrglh` | +| [`vmrglw`](vmrglw.md) | `VX` | Vector Merge Low Word | `vmrglw`, `vmrglw128` | +| [`vmsummbm`](vmsummbm.md) | `VA` | Vector Multiply-Sum Mixed-Sign Byte Modulo | `vmsummbm` | +| [`vmsumshm`](vmsumshm.md) | `VA` | Vector Multiply-Sum Signed Half Word Modulo | `vmsumshm` | +| [`vmsumshs`](vmsumshs.md) | `VA` | Vector Multiply-Sum Signed Half Word Saturate | `vmsumshs` | +| [`vmsumubm`](vmsumubm.md) | `VA` | Vector Multiply-Sum Unsigned Byte Modulo | `vmsumubm` | +| [`vmsumuhm`](vmsumuhm.md) | `VA` | Vector Multiply-Sum Unsigned Half Word Modulo | `vmsumuhm` | +| [`vmsumuhs`](vmsumuhs.md) | `VA` | Vector Multiply-Sum Unsigned Half Word Saturate | `vmsumuhs` | +| [`vmulesb`](vmulesb.md) | `VX` | Vector Multiply Even Signed Byte | `vmulesb` | +| [`vmulesh`](vmulesh.md) | `VX` | Vector Multiply Even Signed Half Word | `vmulesh` | +| [`vmuleub`](vmuleub.md) | `VX` | Vector Multiply Even Unsigned Byte | `vmuleub` | +| [`vmuleuh`](vmuleuh.md) | `VX` | Vector Multiply Even Unsigned Half Word | `vmuleuh` | +| [`vmulosb`](vmulosb.md) | `VX` | Vector Multiply Odd Signed Byte | `vmulosb` | +| [`vmulosh`](vmulosh.md) | `VX` | Vector Multiply Odd Signed Half Word | `vmulosh` | +| [`vmuloub`](vmuloub.md) | `VX` | Vector Multiply Odd Unsigned Byte | `vmuloub` | +| [`vmulouh`](vmulouh.md) | `VX` | Vector Multiply Odd Unsigned Half Word | `vmulouh` | +| [`vnmsubfp`](vnmsubfp.md) | `VA` | Vector Negative Multiply-Subtract Floating Point | `vnmsubfp`, `vnmsubfp128` | +| [`vnor`](vnor.md) | `VX` | Vector Logical NOR | `vnor`, `vnor128` | +| [`vor`](vor.md) | `VX` | Vector Logical OR | `vor`, `vor128` | +| [`vperm`](vperm.md) | `VA` | Vector Permute | `vperm`, `vperm128` | +| [`vpkpx`](vpkpx.md) | `VX` | Vector Pack Pixel | `vpkpx` | +| [`vpkshss`](vpkshss.md) | `VX` | Vector Pack Signed Half Word Signed Saturate | `vpkshss`, `vpkshss128` | +| [`vpkshus`](vpkshus.md) | `VX` | Vector Pack Signed Half Word Unsigned Saturate | `vpkshus`, `vpkshus128` | +| [`vpkswss`](vpkswss.md) | `VX` | Vector Pack Signed Word Signed Saturate | `vpkswss`, `vpkswss128` | +| [`vpkswus`](vpkswus.md) | `VX` | Vector Pack Signed Word Unsigned Saturate | `vpkswus`, `vpkswus128` | +| [`vpkuhum`](vpkuhum.md) | `VX` | Vector Pack Unsigned Half Word Unsigned Modulo | `vpkuhum`, `vpkuhum128` | +| [`vpkuhus`](vpkuhus.md) | `VX` | Vector Pack Unsigned Half Word Unsigned Saturate | `vpkuhus`, `vpkuhus128` | +| [`vpkuwum`](vpkuwum.md) | `VX` | Vector Pack Unsigned Word Unsigned Modulo | `vpkuwum`, `vpkuwum128` | +| [`vpkuwus`](vpkuwus.md) | `VX` | Vector Pack Unsigned Word Unsigned Saturate | `vpkuwus`, `vpkuwus128` | +| [`vrefp`](vrefp.md) | `VX` | Vector Reciprocal Estimate Floating Point | `vrefp`, `vrefp128` | +| [`vrfim`](vrfim.md) | `VX` | Vector Round to Floating-Point Integer toward -Infinity | `vrfim`, `vrfim128` | +| [`vrfin`](vrfin.md) | `VX` | Vector Round to Floating-Point Integer Nearest | `vrfin`, `vrfin128` | +| [`vrfip`](vrfip.md) | `VX` | Vector Round to Floating-Point Integer toward +Infinity | `vrfip`, `vrfip128` | +| [`vrfiz`](vrfiz.md) | `VX` | Vector Round to Floating-Point Integer toward Zero | `vrfiz`, `vrfiz128` | +| [`vrlb`](vrlb.md) | `VX` | Vector Rotate Left Integer Byte | `vrlb` | +| [`vrlh`](vrlh.md) | `VX` | Vector Rotate Left Integer Half Word | `vrlh` | +| [`vrlw`](vrlw.md) | `VX` | Vector Rotate Left Integer Word | `vrlw`, `vrlw128` | +| [`vrsqrtefp`](vrsqrtefp.md) | `VX` | Vector Reciprocal Square Root Estimate Floating Point | `vrsqrtefp`, `vrsqrtefp128` | +| [`vsel`](vsel.md) | `VA` | Vector Conditional Select | `vsel`, `vsel128` | +| [`vsl`](vsl.md) | `VX` | Vector Shift Left | `vsl` | +| [`vslb`](vslb.md) | `VX` | Vector Shift Left Integer Byte | `vslb` | +| [`vsldoi`](vsldoi.md) | `VA` | Vector Shift Left Double by Octet Immediate | `vsldoi`, `vsldoi128` | +| [`vslh`](vslh.md) | `VX` | Vector Shift Left Integer Half Word | `vslh` | +| [`vslo`](vslo.md) | `VX` | Vector Shift Left by Octet | `vslo`, `vslo128` | +| [`vslw`](vslw.md) | `VX` | Vector Shift Left Integer Word | `vslw`, `vslw128` | +| [`vspltb`](vspltb.md) | `VX` | Vector Splat Byte | `vspltb` | +| [`vsplth`](vsplth.md) | `VX` | Vector Splat Half Word | `vsplth` | +| [`vspltisb`](vspltisb.md) | `VX` | Vector Splat Immediate Signed Byte | `vspltisb` | +| [`vspltish`](vspltish.md) | `VX` | Vector Splat Immediate Signed Half Word | `vspltish` | +| [`vspltisw`](vspltisw.md) | `VX` | Vector Splat Immediate Signed Word | `vspltisw`, `vspltisw128` | +| [`vspltw`](vspltw.md) | `VX` | Vector Splat Word | `vspltw`, `vspltw128` | +| [`vsr`](vsr.md) | `VX` | Vector Shift Right | `vsr` | +| [`vsrab`](vsrab.md) | `VX` | Vector Shift Right Algebraic Byte | `vsrab` | +| [`vsrah`](vsrah.md) | `VX` | Vector Shift Right Algebraic Half Word | `vsrah` | +| [`vsraw`](vsraw.md) | `VX` | Vector Shift Right Algebraic Word | `vsraw`, `vsraw128` | +| [`vsrb`](vsrb.md) | `VX` | Vector Shift Right Byte | `vsrb` | +| [`vsrh`](vsrh.md) | `VX` | Vector Shift Right Half Word | `vsrh` | +| [`vsro`](vsro.md) | `VX` | Vector Shift Right Octet | `vsro`, `vsro128` | +| [`vsrw`](vsrw.md) | `VX` | Vector Shift Right Word | `vsrw`, `vsrw128` | +| [`vsubcuw`](vsubcuw.md) | `VX` | Vector Subtract Carryout Unsigned Word | `vsubcuw` | +| [`vsubfp`](vsubfp.md) | `VX` | Vector Subtract Floating Point | `vsubfp`, `vsubfp128` | +| [`vsubsbs`](vsubsbs.md) | `VX` | Vector Subtract Signed Byte Saturate | `vsubsbs` | +| [`vsubshs`](vsubshs.md) | `VX` | Vector Subtract Signed Half Word Saturate | `vsubshs` | +| [`vsubsws`](vsubsws.md) | `VX` | Vector Subtract Signed Word Saturate | `vsubsws` | +| [`vsububm`](vsububm.md) | `VX` | Vector Subtract Unsigned Byte Modulo | `vsububm` | +| [`vsububs`](vsububs.md) | `VX` | Vector Subtract Unsigned Byte Saturate | `vsububs` | +| [`vsubuhm`](vsubuhm.md) | `VX` | Vector Subtract Unsigned Half Word Modulo | `vsubuhm` | +| [`vsubuhs`](vsubuhs.md) | `VX` | Vector Subtract Unsigned Half Word Saturate | `vsubuhs` | +| [`vsubuwm`](vsubuwm.md) | `VX` | Vector Subtract Unsigned Word Modulo | `vsubuwm` | +| [`vsubuws`](vsubuws.md) | `VX` | Vector Subtract Unsigned Word Saturate | `vsubuws` | +| [`vsum2sws`](vsum2sws.md) | `VX` | Vector Sum Across Partial (1/2) Signed Word Saturate | `vsum2sws` | +| [`vsum4sbs`](vsum4sbs.md) | `VX` | Vector Sum Across Partial (1/4) Signed Byte Saturate | `vsum4sbs` | +| [`vsum4shs`](vsum4shs.md) | `VX` | Vector Sum Across Partial (1/4) Signed Half Word Saturate | `vsum4shs` | +| [`vsum4ubs`](vsum4ubs.md) | `VX` | Vector Sum Across Partial (1/4) Unsigned Byte Saturate | `vsum4ubs` | +| [`vsumsws`](vsumsws.md) | `VX` | Vector Sum Across Signed Word Saturate | `vsumsws` | +| [`vupkhpx`](vupkhpx.md) | `VX` | Vector Unpack High Pixel | `vupkhpx` | +| [`vupkhsb`](vupkhsb.md) | `VX` | Vector Unpack High Signed Byte | `vupkhsb`, `vupkhsb128` | +| [`vupkhsh`](vupkhsh.md) | `VX` | Vector Unpack High Signed Half Word | `vupkhsh` | +| [`vupklpx`](vupklpx.md) | `VX` | Vector Unpack Low Pixel | `vupklpx` | +| [`vupklsb`](vupklsb.md) | `VX` | Vector Unpack Low Signed Byte | `vupklsb`, `vupklsb128` | +| [`vupklsh`](vupklsh.md) | `VX` | Vector Unpack Low Signed Half Word | `vupklsh` | +| [`vxor`](vxor.md) | `VX` | Vector Logical XOR | `vxor`, `vxor128` | + + diff --git a/migration/project-root/ppc-manual/categories/vmx128.md b/migration/project-root/ppc-manual/categories/vmx128.md new file mode 100644 index 0000000..9526fa0 --- /dev/null +++ b/migration/project-root/ppc-manual/categories/vmx128.md @@ -0,0 +1,24 @@ +# VMX128 + +Xbox-360-specific Altivec extension that widens the vector register file to 128 registers (V0–V127). Register IDs are encoded with bit-fusion across non-contiguous fields. + +**12 families** · **12 XML entries**. + + + +| Family | Form | Description | Members | +| --- | --- | --- | --- | +| [`vcfpsxws128`](vcfpsxws128.md) | `VX128_3` | Vector128 Convert From Floating-Point to Signed Fixed-Point Word Saturate | `vcfpsxws128` | +| [`vcfpuxws128`](vcfpuxws128.md) | `VX128_3` | Vector128 Convert From Floating-Point to Unsigned Fixed-Point Word Saturate | `vcfpuxws128` | +| [`vcsxwfp128`](vcsxwfp128.md) | `VX128_3` | Vector128 Convert From Signed Fixed-Point Word to Floating-Point | `vcsxwfp128` | +| [`vcuxwfp128`](vcuxwfp128.md) | `VX128_3` | Vector128 Convert From Unsigned Fixed-Point Word to Floating-Point | `vcuxwfp128` | +| [`vmaddcfp128`](vmaddcfp128.md) | `VX128` | Vector128 Multiply Add Floating Point | `vmaddcfp128` | +| [`vmsum3fp128`](vmsum3fp128.md) | `VX128` | Vector128 Multiply Sum 3-way Floating Point | `vmsum3fp128` | +| [`vmsum4fp128`](vmsum4fp128.md) | `VX128` | Vector128 Multiply Sum 4-way Floating-Point | `vmsum4fp128` | +| [`vmulfp128`](vmulfp128.md) | `VX128` | Vector128 Multiply Floating-Point | `vmulfp128` | +| [`vpermwi128`](vpermwi128.md) | `VX128_P` | Vector128 Permutate Word Immediate | `vpermwi128` | +| [`vpkd3d128`](vpkd3d128.md) | `VX128_4` | Vector128 Pack D3Dtype, Rotate Left Immediate and Mask Insert | `vpkd3d128` | +| [`vrlimi128`](vrlimi128.md) | `VX128_4` | Vector128 Rotate Left Immediate and Mask Insert | `vrlimi128` | +| [`vupkd3d128`](vupkd3d128.md) | `VX128_3` | Vector128 Unpack D3Dtype | `vupkd3d128` | + + diff --git a/migration/project-root/ppc-manual/control/crand.md b/migration/project-root/ppc-manual/control/crand.md new file mode 100644 index 0000000..0ae3769 --- /dev/null +++ b/migration/project-root/ppc-manual/control/crand.md @@ -0,0 +1,127 @@ +# `crand` — Condition Register AND + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [XL](../forms/XL.md) · **Opcode:** `0x4c000202` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `crand` | `crand` | — | Condition Register AND | + +## Syntax + +```asm +crand [CRBD], [CRBA], [CRBB] +``` + +## Encoding + +### `crand` — form `XL` + +- **Opcode word:** `0x4c000202` +- **Primary opcode (bits 0–5):** `19` +- **Extended opcode:** `257` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (19) | +| 6–10 | `BT/BO` | target / branch options | +| 11–15 | `BA/BI` | source A / CR bit to test | +| 16–20 | `BB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `LK` | link flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `CRBA` | crand: read | CR source bit A (0–31). | +| `CRBB` | crand: read | CR source bit B (0–31). | +| `CRBD` | crand: write | CR destination bit (0–31). | + +## Register Effects + +### `crand` + +- **Reads (always):** `CRBA`, `CRBB` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRBD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`crand`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="crand"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:352`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L352) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:17`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L17) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:717`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L717) + + + +## Special Cases & Edge Conditions + +- **Bit-level granularity.** All eight CR-logical instructions operate on **single CR bits**, not whole 4-bit fields. `CRBD`, `CRBA`, `CRBB` are 5-bit absolute indices into the 32-bit CR register: `BI = 4·field + bit-within-field`, where bit 0 = LT, 1 = GT, 2 = EQ, 3 = SO. +- **Operation.** `CR[CRBD] ← CR[CRBA] AND CR[CRBB]`. All other CR bits are preserved. +- **Same-source / same-destination quirks.** Identical sources and destinations are legal: `crand 6, 6, 6` is a NOP-style "force CR bit 6 to itself"; `crand bt, bt, bt` reads-then-writes the same bit (no observable change). Compilers exploit `crxor BT,BT,BT` ("clear bit") and `creqv BT,BT,BT` ("set bit") for similar tricks — see those pages. +- **Combining branch conditions.** The classic use: synthesise complex branch conditions from multiple compare results. Example: `cmpw cr0, r3, r4; cmpw cr1, r5, r6; crand 4*cr0+2, 4*cr0+2, 4*cr1+2; beq cr0, label` branches if `r3==r4 AND r5==r6` using a single conditional branch. +- **No `Rc` / `OE`.** XL-form CR-logical ops never set CR0 or XER; they only update the named CR bit. +- **Not synchronising.** Pure data-flow on CR; freely reorderable. +- **xenia status.** xenia-rs decodes `crand` (decoder slot 540) but the interpreter snapshot is not embedded on this page — implementation lives in [`crates/xenia-cpu/src/interpreter.rs`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs). xenia-canary's `InstrEmit_crand` emits the equivalent host AND of the two CR bits. + +## Related Instructions + +- [`crandc`](crandc.md) — AND with complement: `CR[BT] ← CR[BA] AND ¬CR[BB]`. +- [`cror`](cror.md), [`crorc`](crorc.md) — OR / OR-with-complement. +- [`crnand`](crnand.md), [`crnor`](crnor.md) — negated AND / OR. +- [`crxor`](crxor.md), [`creqv`](creqv.md) — XOR and equivalence (XNOR). +- [`mcrf`](mcrf.md) — copy a whole 4-bit CR field. +- [`bcx`](../branch/bcx.md) consumers — the typical reason to compute composite CR bits. + +### Simplified Mnemonics + +- `crmove BT, BA` ≡ `cror BT, BA, BA` (use `cror`, not `crand`). +- `crset BT` ≡ `creqv BT, BT, BT` (set to 1). +- `crclr BT` ≡ `crxor BT, BT, BT` (clear to 0). + +`crand` itself has no dedicated simplified mnemonic. + +## IBM Reference + +- [AIX 7.3 — `crand` (Condition Register AND)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-crand-condition-register-instruction) +- [AIX 7.3 — Condition register simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-condition-register-logical-simplified) diff --git a/migration/project-root/ppc-manual/control/crandc.md b/migration/project-root/ppc-manual/control/crandc.md new file mode 100644 index 0000000..cc395b5 --- /dev/null +++ b/migration/project-root/ppc-manual/control/crandc.md @@ -0,0 +1,119 @@ +# `crandc` — Condition Register AND with Complement + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [XL](../forms/XL.md) · **Opcode:** `0x4c000102` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `crandc` | `crandc` | — | Condition Register AND with Complement | + +## Syntax + +```asm +crandc [CRBD], [CRBA], [CRBB] +``` + +## Encoding + +### `crandc` — form `XL` + +- **Opcode word:** `0x4c000102` +- **Primary opcode (bits 0–5):** `19` +- **Extended opcode:** `129` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (19) | +| 6–10 | `BT/BO` | target / branch options | +| 11–15 | `BA/BI` | source A / CR bit to test | +| 16–20 | `BB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `LK` | link flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `CRBA` | crandc: read | CR source bit A (0–31). | +| `CRBB` | crandc: read | CR source bit B (0–31). | +| `CRBD` | crandc: write | CR destination bit (0–31). | + +## Register Effects + +### `crandc` + +- **Reads (always):** `CRBA`, `CRBB` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRBD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`crandc`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="crandc"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:361`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L361) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:17`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L17) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:713`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L713) + + + +## Special Cases & Edge Conditions + +- **Operation.** `CR[CRBD] ← CR[CRBA] AND ¬CR[CRBB]` — a one-instruction "A and not B" that would otherwise need a complement-then-AND sequence. All other CR bits are preserved. +- **Bit-level operands.** `CRBD`, `CRBA`, `CRBB` are 5-bit indices into the 32 CR bits (0=CR0.LT, 1=CR0.GT, 2=CR0.EQ, 3=CR0.SO, 4=CR1.LT, …, 31=CR7.SO). They need not lie in the same CR field. +- **Identity / corner cases.** `crandc BT, BA, BA` always yields 0 (a clear-bit idiom, equivalent to but slower than `crxor BT, BT, BT`). `crandc BT, BA, BB` with `BB` always 0 reduces to `crmove BT, BA`. +- **Use case.** Synthesises "branch if A *and not* B" predicates without a dedicated `cmp` of `B`. Example: branch only if `cr0.EQ` *and not* `cr1.SO` — `crandc 2, 2, 7` then `beq` on `cr0`. +- **No `Rc` / `OE`.** Pure CR-bit dataflow; doesn't update CR0 or XER. +- **Not synchronising.** Reorderable. +- **xenia status.** Interpreter handles via the generic CR-logical helper. xenia-canary's `InstrEmit_crandc` emits a host AND of `A` and bitwise-NOT of `B`. + +## Related Instructions + +- [`crand`](crand.md), [`cror`](cror.md), [`crorc`](crorc.md) — closest siblings. +- [`crnand`](crnand.md), [`crnor`](crnor.md), [`crxor`](crxor.md), [`creqv`](creqv.md) — full set of CR Boolean ops. +- [`mcrf`](mcrf.md) — copy entire CR field; complementary "broad" CR move. +- [`bcx`](../branch/bcx.md) — typical consumer of synthesised CR bits. + +`crandc` has no dedicated simplified mnemonic. See [`crand`](crand.md) for the standard `crmove` / `crset` / `crclr` family. + +## IBM Reference + +- [AIX 7.3 — `crandc` (Condition Register AND with Complement)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-crandc-condition-register-complement-instruction) +- [AIX 7.3 — Condition register simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-condition-register-logical-simplified) diff --git a/migration/project-root/ppc-manual/control/creqv.md b/migration/project-root/ppc-manual/control/creqv.md new file mode 100644 index 0000000..03f6a05 --- /dev/null +++ b/migration/project-root/ppc-manual/control/creqv.md @@ -0,0 +1,124 @@ +# `creqv` — Condition Register Equivalent + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [XL](../forms/XL.md) · **Opcode:** `0x4c000242` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `creqv` | `creqv` | — | Condition Register Equivalent | + +## Syntax + +```asm +creqv [CRBD], [CRBA], [CRBB] +``` + +## Encoding + +### `creqv` — form `XL` + +- **Opcode word:** `0x4c000242` +- **Primary opcode (bits 0–5):** `19` +- **Extended opcode:** `289` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (19) | +| 6–10 | `BT/BO` | target / branch options | +| 11–15 | `BA/BI` | source A / CR bit to test | +| 16–20 | `BB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `LK` | link flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `CRBA` | creqv: read | CR source bit A (0–31). | +| `CRBB` | creqv: read | CR source bit B (0–31). | +| `CRBD` | creqv: write | CR destination bit (0–31). | + +## Register Effects + +### `creqv` + +- **Reads (always):** `CRBA`, `CRBB` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRBD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`creqv`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="creqv"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:370`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L370) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:17`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L17) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:718`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L718) + + + +## Special Cases & Edge Conditions + +- **Operation.** `CR[CRBD] ← ¬(CR[CRBA] XOR CR[CRBB])` — i.e. logical equivalence (XNOR). Result is 1 iff `CRBA` and `CRBB` agree. +- **`crset BT` idiom.** With identical operands, `creqv BT, BT, BT` always yields 1 (any bit XNOR'd with itself is 1). This is the canonical PowerPC **set-to-1** for a single CR bit; assemblers recognise the simplified mnemonic `crset BT`. +- **Bit-level operands.** Like all CR-logical ops, the three operands are 5-bit absolute CR-bit indices (0..31). Mixing CR fields is fine. +- **Use case.** Branch on "A == B" of two prior compare results. Example: `crxor` of CR0.SO and CR1.SO gives "differ"; `creqv` gives "agree". +- **No `Rc` / `OE`.** Doesn't touch CR0, XER, or any other state beyond the named bit. +- **Not synchronising.** Reorderable. +- **xenia status.** Interpreter dispatches through the generic CR-logical helper; canary emits the host XNOR equivalent. The `crset` simplified form is the most common occurrence in real Xbox 360 code. + +## Related Instructions + +- [`crand`](crand.md), [`crandc`](crandc.md) — AND family. +- [`cror`](cror.md), [`crorc`](crorc.md), [`crnor`](crnor.md), [`crnand`](crnand.md) — OR family. +- [`crxor`](crxor.md) — the dual; `crxor BT, BT, BT` is the standard **clear-to-0** idiom. +- [`mcrf`](mcrf.md) — bulk CR-field move. +- [`bcx`](../branch/bcx.md) — consumes the synthesised bit. + +### Simplified Mnemonics + +| Simplified | Expansion | Effect | +| --- | --- | --- | +| `crset BT` | `creqv BT, BT, BT` | force `CR[BT] ← 1` | + +## IBM Reference + +- [AIX 7.3 — `creqv` (Condition Register Equivalent)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-creqv-condition-register-equivalent-instruction) +- [AIX 7.3 — Condition register simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-condition-register-logical-simplified) diff --git a/migration/project-root/ppc-manual/control/crnand.md b/migration/project-root/ppc-manual/control/crnand.md new file mode 100644 index 0000000..7b0ea7c --- /dev/null +++ b/migration/project-root/ppc-manual/control/crnand.md @@ -0,0 +1,121 @@ +# `crnand` — Condition Register NAND + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [XL](../forms/XL.md) · **Opcode:** `0x4c0001c2` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `crnand` | `crnand` | — | Condition Register NAND | + +## Syntax + +```asm +crnand [CRBD], [CRBA], [CRBB] +``` + +## Encoding + +### `crnand` — form `XL` + +- **Opcode word:** `0x4c0001c2` +- **Primary opcode (bits 0–5):** `19` +- **Extended opcode:** `225` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (19) | +| 6–10 | `BT/BO` | target / branch options | +| 11–15 | `BA/BI` | source A / CR bit to test | +| 16–20 | `BB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `LK` | link flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `CRBA` | crnand: read | CR source bit A (0–31). | +| `CRBB` | crnand: read | CR source bit B (0–31). | +| `CRBD` | crnand: write | CR destination bit (0–31). | + +## Register Effects + +### `crnand` + +- **Reads (always):** `CRBA`, `CRBB` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRBD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`crnand`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="crnand"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:379`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L379) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:17`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L17) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:716`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L716) + + + +## Special Cases & Edge Conditions + +- **Operation.** `CR[CRBD] ← ¬(CR[CRBA] AND CR[CRBB])`. Result is 0 only when both source bits are 1; otherwise 1. +- **Bit-level operands.** 5-bit absolute CR-bit indices, identical to the rest of the CR-logical family. Source and destination bits may all be in different CR fields. +- **Identity case.** `crnand BT, BA, BA` ≡ `¬CR[BA]` — a one-instruction CR-bit invert. Less common than `crxor` against a `crset`-prepared 1-bit, but legal. +- **Use case.** Branch on "NOT (A AND B)". Less common than the De Morgan equivalent (`cror BT, ¬A, ¬B`), but saves an extra `crnot` step. +- **No `Rc` / `OE`.** No CR0 / XER side effects. +- **Not synchronising.** Reorderable. +- **xenia status.** Decoded by the generic XL-form CR-logical handler; the interpreter snapshot is shared with `crand`/`cror`/etc. xenia-canary's `InstrEmit_crnand` emits a host AND followed by NOT. + +## Related Instructions + +- [`crand`](crand.md), [`crandc`](crandc.md) — non-negated AND siblings. +- [`crnor`](crnor.md) — negated OR (the De Morgan dual). +- [`cror`](cror.md), [`crorc`](crorc.md) — OR family. +- [`crxor`](crxor.md), [`creqv`](creqv.md) — XOR / XNOR. +- [`mcrf`](mcrf.md) — copy a 4-bit CR field wholesale. +- [`bcx`](../branch/bcx.md) — typical consumer. + +`crnand` has no dedicated simplified mnemonic. Use `crand` + a separate complement, or use `crnand BT, BA, BA` to invert a single bit. + +## IBM Reference + +- [AIX 7.3 — `crnand` (Condition Register NAND)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-crnand-condition-register-nand-instruction) +- [AIX 7.3 — Condition register simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-condition-register-logical-simplified) diff --git a/migration/project-root/ppc-manual/control/crnor.md b/migration/project-root/ppc-manual/control/crnor.md new file mode 100644 index 0000000..d495ac4 --- /dev/null +++ b/migration/project-root/ppc-manual/control/crnor.md @@ -0,0 +1,125 @@ +# `crnor` — Condition Register NOR + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [XL](../forms/XL.md) · **Opcode:** `0x4c000042` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `crnor` | `crnor` | — | Condition Register NOR | + +## Syntax + +```asm +crnor [CRBD], [CRBA], [CRBB] +``` + +## Encoding + +### `crnor` — form `XL` + +- **Opcode word:** `0x4c000042` +- **Primary opcode (bits 0–5):** `19` +- **Extended opcode:** `33` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (19) | +| 6–10 | `BT/BO` | target / branch options | +| 11–15 | `BA/BI` | source A / CR bit to test | +| 16–20 | `BB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `LK` | link flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `CRBA` | crnor: read | CR source bit A (0–31). | +| `CRBB` | crnor: read | CR source bit B (0–31). | +| `CRBD` | crnor: write | CR destination bit (0–31). | + +## Register Effects + +### `crnor` + +- **Reads (always):** `CRBA`, `CRBB` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRBD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`crnor`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="crnor"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:388`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L388) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:17`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L17) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:712`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L712) + + + +## Special Cases & Edge Conditions + +- **Operation.** `CR[CRBD] ← ¬(CR[CRBA] OR CR[CRBB])`. Result is 1 only when both source bits are 0; otherwise 0. +- **`crnot BT, BA` idiom.** With `BA == BB`, `crnor BT, BA, BA` ≡ `¬CR[BA]` — this is the canonical PowerPC **single-bit invert**, recognised by assemblers as the simplified mnemonic `crnot BT, BA`. +- **Bit-level operands.** 5-bit absolute CR-bit indices (0..31). The three bits may live in any combination of the eight CR fields. +- **Use case.** Branch on "neither A nor B"; or, with `crnot`, simply complement a CR bit before consuming it in a `bcx`. +- **No `Rc` / `OE`.** Pure CR-bit dataflow; CR0/XER untouched. +- **Not synchronising.** Reorderable. +- **xenia status.** Decoded via the generic CR-logical handler. xenia-canary's `InstrEmit_crnor` emits a host OR followed by NOT. + +## Related Instructions + +- [`cror`](cror.md), [`crorc`](crorc.md) — non-negated OR siblings. +- [`crnand`](crnand.md) — negated AND (the De Morgan dual). +- [`crand`](crand.md), [`crandc`](crandc.md) — AND family. +- [`crxor`](crxor.md), [`creqv`](creqv.md) — XOR / XNOR. +- [`mcrf`](mcrf.md) — bulk CR-field copy. +- [`bcx`](../branch/bcx.md) — typical consumer of synthesised CR bits. + +### Simplified Mnemonics + +| Simplified | Expansion | Effect | +| --- | --- | --- | +| `crnot BT, BA` | `crnor BT, BA, BA` | `CR[BT] ← ¬CR[BA]` (invert single bit) | + +## IBM Reference + +- [AIX 7.3 — `crnor` (Condition Register NOR)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-crnor-condition-register-nor-instruction) +- [AIX 7.3 — Condition register simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-condition-register-logical-simplified) diff --git a/migration/project-root/ppc-manual/control/cror.md b/migration/project-root/ppc-manual/control/cror.md new file mode 100644 index 0000000..5ed28c5 --- /dev/null +++ b/migration/project-root/ppc-manual/control/cror.md @@ -0,0 +1,125 @@ +# `cror` — Condition Register OR + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [XL](../forms/XL.md) · **Opcode:** `0x4c000382` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `cror` | `cror` | — | Condition Register OR | + +## Syntax + +```asm +cror [CRBD], [CRBA], [CRBB] +``` + +## Encoding + +### `cror` — form `XL` + +- **Opcode word:** `0x4c000382` +- **Primary opcode (bits 0–5):** `19` +- **Extended opcode:** `449` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (19) | +| 6–10 | `BT/BO` | target / branch options | +| 11–15 | `BA/BI` | source A / CR bit to test | +| 16–20 | `BB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `LK` | link flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `CRBA` | cror: read | CR source bit A (0–31). | +| `CRBB` | cror: read | CR source bit B (0–31). | +| `CRBD` | cror: write | CR destination bit (0–31). | + +## Register Effects + +### `cror` + +- **Reads (always):** `CRBA`, `CRBB` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRBD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`cror`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="cror"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:397`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L397) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:17`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L17) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:720`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L720) + + + +## Special Cases & Edge Conditions + +- **Operation.** `CR[CRBD] ← CR[CRBA] OR CR[CRBB]`. All other CR bits are preserved. +- **`crmove BT, BA` idiom.** With `BA == BB`, `cror BT, BA, BA` is the canonical PowerPC **single-bit copy** between CR bits, recognised by assemblers as the simplified mnemonic `crmove BT, BA`. This is the standard way to relocate a CR bit (e.g., promote `cr1.EQ` to `cr0.EQ` so a default-`cr0` branch can consume it). +- **Bit-level operands.** Three independent 5-bit CR-bit indices; mixing CR fields is the whole point of this family. +- **Use case.** Branch on "A OR B" of two prior compare results — saves an extra branch by collapsing two conditions. +- **No `Rc` / `OE`.** Pure CR-bit dataflow. +- **Not synchronising.** Reorderable. +- **xenia status.** Most-used CR-logical instruction in real code (almost always as `crmove`). Decoded by the generic XL-form CR-logical handler; canary emits a host OR. + +## Related Instructions + +- [`crand`](crand.md), [`crandc`](crandc.md) — AND family. +- [`crorc`](crorc.md) — OR with complement. +- [`crnor`](crnor.md), [`crnand`](crnand.md) — negated forms. +- [`crxor`](crxor.md), [`creqv`](creqv.md) — XOR / XNOR. +- [`mcrf`](mcrf.md) — copy a 4-bit CR field wholesale. +- [`bcx`](../branch/bcx.md), [`bclrx`](../branch/bclrx.md), [`bcctrx`](../branch/bcctrx.md) — typical consumers. + +### Simplified Mnemonics + +| Simplified | Expansion | Effect | +| --- | --- | --- | +| `crmove BT, BA` | `cror BT, BA, BA` | `CR[BT] ← CR[BA]` (single-bit copy) | + +## IBM Reference + +- [AIX 7.3 — `cror` (Condition Register OR)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-cror-condition-register-instruction) +- [AIX 7.3 — Condition register simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-condition-register-logical-simplified) diff --git a/migration/project-root/ppc-manual/control/crorc.md b/migration/project-root/ppc-manual/control/crorc.md new file mode 100644 index 0000000..abb3c17 --- /dev/null +++ b/migration/project-root/ppc-manual/control/crorc.md @@ -0,0 +1,121 @@ +# `crorc` — Condition Register OR with Complement + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [XL](../forms/XL.md) · **Opcode:** `0x4c000342` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `crorc` | `crorc` | — | Condition Register OR with Complement | + +## Syntax + +```asm +crorc [CRBD], [CRBA], [CRBB] +``` + +## Encoding + +### `crorc` — form `XL` + +- **Opcode word:** `0x4c000342` +- **Primary opcode (bits 0–5):** `19` +- **Extended opcode:** `417` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (19) | +| 6–10 | `BT/BO` | target / branch options | +| 11–15 | `BA/BI` | source A / CR bit to test | +| 16–20 | `BB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `LK` | link flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `CRBA` | crorc: read | CR source bit A (0–31). | +| `CRBB` | crorc: read | CR source bit B (0–31). | +| `CRBD` | crorc: write | CR destination bit (0–31). | + +## Register Effects + +### `crorc` + +- **Reads (always):** `CRBA`, `CRBB` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRBD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`crorc`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="crorc"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:406`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L406) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:17`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L17) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:719`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L719) + + + +## Special Cases & Edge Conditions + +- **Operation.** `CR[CRBD] ← CR[CRBA] OR ¬CR[CRBB]` — a one-instruction "A or not B" implication. Result is 0 only when `A=0` and `B=1`. +- **Logical implication.** `B → A` ≡ `¬B ∨ A`, which is exactly `crorc BT, BA, BB`. Useful for predicates of the form "if B holds, then A must hold". +- **Identity case.** `crorc BT, BA, BA` always yields 1 (`A ∨ ¬A`), an alternative to `creqv` for setting a bit. +- **Bit-level operands.** 5-bit absolute CR-bit indices; sources and destination may all be in different CR fields. +- **Use case.** Compose "if B then A" guards without a separate complement step. +- **No `Rc` / `OE`.** Doesn't update CR0 or XER. +- **Not synchronising.** Reorderable. +- **xenia status.** Decoded by the generic CR-logical handler; canary emits OR-with-NOT directly. + +## Related Instructions + +- [`cror`](cror.md), [`crnor`](crnor.md) — non-complement OR / NOR. +- [`crand`](crand.md), [`crandc`](crandc.md), [`crnand`](crnand.md) — AND family. +- [`crxor`](crxor.md), [`creqv`](creqv.md) — XOR / XNOR. +- [`mcrf`](mcrf.md) — bulk CR-field copy. +- [`bcx`](../branch/bcx.md) — typical consumer. + +`crorc` has no dedicated simplified mnemonic. See [`cror`](cror.md) for `crmove`. + +## IBM Reference + +- [AIX 7.3 — `crorc` (Condition Register OR with Complement)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-crorc-condition-register-complement-instruction) +- [AIX 7.3 — Condition register simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-condition-register-logical-simplified) diff --git a/migration/project-root/ppc-manual/control/crxor.md b/migration/project-root/ppc-manual/control/crxor.md new file mode 100644 index 0000000..35f79e5 --- /dev/null +++ b/migration/project-root/ppc-manual/control/crxor.md @@ -0,0 +1,124 @@ +# `crxor` — Condition Register XOR + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [XL](../forms/XL.md) · **Opcode:** `0x4c000182` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `crxor` | `crxor` | — | Condition Register XOR | + +## Syntax + +```asm +crxor [CRBD], [CRBA], [CRBB] +``` + +## Encoding + +### `crxor` — form `XL` + +- **Opcode word:** `0x4c000182` +- **Primary opcode (bits 0–5):** `19` +- **Extended opcode:** `193` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (19) | +| 6–10 | `BT/BO` | target / branch options | +| 11–15 | `BA/BI` | source A / CR bit to test | +| 16–20 | `BB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `LK` | link flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `CRBA` | crxor: read | CR source bit A (0–31). | +| `CRBB` | crxor: read | CR source bit B (0–31). | +| `CRBD` | crxor: write | CR destination bit (0–31). | + +## Register Effects + +### `crxor` + +- **Reads (always):** `CRBA`, `CRBB` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRBD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`crxor`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="crxor"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:415`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L415) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:17`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L17) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:715`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L715) + + + +## Special Cases & Edge Conditions + +- **Operation.** `CR[CRBD] ← CR[CRBA] XOR CR[CRBB]`. Result is 1 iff the two source bits differ. +- **`crclr BT` idiom.** With identical operands, `crxor BT, BT, BT` always yields 0 (any bit XOR'd with itself is 0). This is the canonical PowerPC **clear-to-0** for a single CR bit; assemblers recognise the simplified mnemonic `crclr BT`. Compilers emit it before variadic-argument calls (PPC ABI uses `cr1.SO` to flag presence of FP arguments). +- **Bit-level operands.** Three independent 5-bit absolute CR-bit indices (0..31). +- **Use case.** Branch on "A != B"; or, with the `crclr` idiom, zero a CR bit before fall-through CR computation. +- **No `Rc` / `OE`.** No CR0 / XER side effects. +- **Not synchronising.** Reorderable. +- **xenia status.** Common enough in real code (typically as `crclr 6` for the variadic-FP marker) that translators often special-case the `crclr` pattern. xenia-canary's `InstrEmit_crxor` emits a host XOR; xenia-rs decodes via the generic CR-logical handler. + +## Related Instructions + +- [`creqv`](creqv.md) — the dual; `creqv BT, BT, BT` is the standard **set-to-1** idiom. +- [`crand`](crand.md), [`crandc`](crandc.md), [`crnand`](crnand.md) — AND family. +- [`cror`](cror.md), [`crorc`](crorc.md), [`crnor`](crnor.md) — OR family. +- [`mcrf`](mcrf.md) — bulk CR-field copy. +- [`bcx`](../branch/bcx.md), [`bclrx`](../branch/bclrx.md) — typical consumers of synthesised CR bits. + +### Simplified Mnemonics + +| Simplified | Expansion | Effect | +| --- | --- | --- | +| `crclr BT` | `crxor BT, BT, BT` | force `CR[BT] ← 0` | + +## IBM Reference + +- [AIX 7.3 — `crxor` (Condition Register XOR)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-crxor-condition-register-xor-instruction) +- [AIX 7.3 — Condition register simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-condition-register-logical-simplified) diff --git a/migration/project-root/ppc-manual/control/mcrf.md b/migration/project-root/ppc-manual/control/mcrf.md new file mode 100644 index 0000000..b40b114 --- /dev/null +++ b/migration/project-root/ppc-manual/control/mcrf.md @@ -0,0 +1,129 @@ +# `mcrf` — Move Condition Register Field + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [XL](../forms/XL.md) · **Opcode:** `0x4c000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mcrf` | `mcrf` | — | Move Condition Register Field | + +## Syntax + +```asm +mcrf [CRFD], [CRFS] +``` + +## Encoding + +### `mcrf` — form `XL` + +- **Opcode word:** `0x4c000000` +- **Primary opcode (bits 0–5):** `19` +- **Extended opcode:** `0` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (19) | +| 6–10 | `BT/BO` | target / branch options | +| 11–15 | `BA/BI` | source A / CR bit to test | +| 16–20 | `BB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `LK` | link flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `CRFS` | mcrf: read | CR source field. | +| `CRFD` | mcrf: write | CR destination field (`crf`, 0–7). | + +## Register Effects + +### `mcrf` + +- **Reads (always):** `CRFS` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRFD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mcrf`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mcrf"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:424`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L424) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:51`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L51) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:710`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L710) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1683-1686`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1683-L1686) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mcrf => { + ctx.cr[instr.crfd()] = ctx.cr[instr.crfs()]; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Field-level (4-bit) move.** Unlike the bit-level CR-logical family ([`crand`](crand.md), …, [`crxor`](crxor.md)), `mcrf` copies *all four* bits of a CR field (LT, GT, EQ, SO) in one instruction. `CRFD` and `CRFS` are 3-bit field indices (0..7), each naming a 4-bit slice of the 32-bit CR. +- **No source-field clobber.** The source field is read, not modified — `mcrf 0, 1` copies CR1 into CR0 leaving CR1 intact. +- **Same-field is a NOP.** `mcrf cr0, cr0` reads-then-writes the same field; xenia's interpreter still does the assignment but the architectural state is unchanged. +- **Use case.** Promote a non-default compare result into `cr0` so a default-`cr0` simplified branch (`beq label`) can consume it without spelling out `cr1`/`cr2`/etc. The alternative — `crmove` — would require four `cror` instructions to move all four bits. +- **No CR0/XER side effects.** Pure CR-field dataflow. +- **Not synchronising.** Reorderable. +- **xenia exact match.** xenia-rs models the CR as an array of eight 4-bit fields, so `mcrf` is a single struct copy (`ctx.cr[crfd] = ctx.cr[crfs]`). Matches PowerISA semantics exactly. + +## Related Instructions + +- [`mfcr`](mfcr.md) — read the entire 32-bit CR into a GPR. +- [`mtcrf`](mtcrf.md) — write selected CR fields from a GPR (uses an 8-bit field-mask). +- [`mcrxr`](mcrxr.md) — copy `XER[SO..CA]` into a CR field and clear them. +- [`mcrfs`](mcrfs.md) — copy an FPSCR field into a CR field. +- [`crand`](crand.md), [`crandc`](crandc.md), [`creqv`](creqv.md), [`crnand`](crnand.md), [`crnor`](crnor.md), [`cror`](cror.md), [`crorc`](crorc.md), [`crxor`](crxor.md) — bit-level alternatives. +- [`bcx`](../branch/bcx.md) — typical consumer. + +`mcrf` has no simplified mnemonics. + +## IBM Reference + +- [AIX 7.3 — `mcrf` (Move Condition Register Field)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mcrf-move-condition-register-field-instruction) diff --git a/migration/project-root/ppc-manual/control/mcrfs.md b/migration/project-root/ppc-manual/control/mcrfs.md new file mode 100644 index 0000000..36ee430 --- /dev/null +++ b/migration/project-root/ppc-manual/control/mcrfs.md @@ -0,0 +1,155 @@ +# `mcrfs` — Move to Condition Register from FPSCR + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc000080` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mcrfs` | `mcrfs` | — | Move to Condition Register from FPSCR | + +## Syntax + +```asm +mcrfs [CRFD], [CRFS] +``` + +## Encoding + +### `mcrfs` — form `X` + +- **Opcode word:** `0xfc000080` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `64` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `CRFS` | mcrfs: read | CR source field. | +| `FPSCR` | mcrfs: read; mcrfs: write | Floating-Point Status and Control Register. | +| `CRFD` | mcrfs: write | CR destination field (`crf`, 0–7). | + +## Register Effects + +### `mcrfs` + +- **Reads (always):** `CRFS`, `FPSCR` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRFD`, `FPSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `mcrfs`: **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mcrfs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mcrfs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:371`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L371) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:51`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L51) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:904`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L904) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4716-4745`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4716-L4745) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mcrfs => { + let crfd = instr.crfd(); + let crfs = instr.crfs(); + let shift = 28 - (crfs as u32 * 4); + let nibble = ((ctx.fpscr >> shift) & 0xF) as u8; + ctx.cr[crfd] = crate::context::CrField::from_u8(nibble); + // Clearable exception bits: 0 (FX), 3 (OX), 4 (UX), 5 (ZX), + // 6 (XX), 7 (VXSNAN), 8 (VXISI), 9 (VXIDI), 10 (VXZDZ), + // 11 (VXIMZ), 12 (VXVC), 21 (VXSOFT), 22 (VXSQRT), 23 (VXCVI). + // (Bit positions are PowerISA MSB-0; here 'FPSCR bit n' means + // the bit at (31-n) in our little-endian u32.) + const CLEARABLE_MASK: u32 = + (1 << 31) | (1 << (31 - 3)) | (1 << (31 - 4)) | + (1 << (31 - 5)) | (1 << (31 - 6)) | (1 << (31 - 7)) | + (1 << (31 - 8)) | (1 << (31 - 9)) | (1 << (31 - 10)) | + (1 << (31 - 11)) | (1 << (31 - 12)) | + (1 << (31 - 21)) | (1 << (31 - 22)) | (1 << (31 - 23)); + let nibble_mask = 0xFu32 << shift; + ctx.fpscr &= !(nibble_mask & CLEARABLE_MASK); + // PPCBUG-068: recompute the VX summary bit. If any VX* exception + // bit remains set, VX must remain set; if all are cleared, VX + // must clear. (FEX recomputation omitted — xenia doesn't model + // enabled-exception dispatch.) + if ctx.fpscr & fpscr::VX_ALL != 0 { + ctx.fpscr |= fpscr::VX; + } else { + ctx.fpscr &= !fpscr::VX; + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Operation.** Copies one 4-bit FPSCR field into the chosen CR field, then **clears the source FPSCR exception-status bits** (sticky-bit reset). The non-exception status bits (FPRF, etc.) are *not* cleared. +- **Bits cleared in FPSCR.** The architectural rule is: any bit in the source FPSCR field that is one of {FX, OX, UX, ZX, XX, VXSNAN, VXISI, VXIDI, VXZDZ, VXIMZ, VXVC, VXSOFT, VXSQRT, VXCVI} is reset to 0 after the copy. FEX and VX (summary bits) are subsequently re-derived. Many other FPSCR bits (rounding mode, FPRF, FR/FI) are not affected — even if they fall in `CRFS`. +- **CR field destination.** `CRFD` is a 3-bit field index (0..7); the four bits land in their natural positions (LT, GT, EQ, SO) of the chosen CR field. After `mcrfs`, `crf` can be tested with the usual conditional branches. +- **Use case.** Inspect a particular FPSCR exception group, then act on it with a `bc` — e.g. test FPSCR[24..27] (the FI / FR / VXSNAN / VXISI cluster) and branch. +- **Privilege.** Non-privileged on the Xenon — application-visible. +- **xenia status.** Decoded (decoder slot 727), but the interpreter does **not** ship a body in the snapshot on this page — `mcrfs` is rare in title code. xenia's FPSCR model is incomplete (most exception bits are stubbed), so even when implemented, the cleared bits typically have no observable effect. +- **No `Rc`.** X-form, but the `Rc` bit position is unused (reserved 0). + +## Related Instructions + +- [`mffsx`](mffsx.md) — read entire FPSCR into an FPR. +- [`mtfsfx`](mtfsfx.md), [`mtfsb0x`](mtfsb0x.md), [`mtfsb1x`](mtfsb1x.md), [`mtfsfix`](mtfsfix.md) — write FPSCR bits/fields. +- [`mcrf`](mcrf.md) — copy a CR field to another CR field. +- [`mcrxr`](mcrxr.md) — analogous copy from XER (also clears). + +`mcrfs` has no simplified mnemonics. + +## IBM Reference + +- [AIX 7.3 — `mcrfs` (Move to Condition Register from FPSCR)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mcrfs-move-condition-register-from-fpscr-instruction) +- PowerISA v2.07B, Book I §4.6 — FPSCR layout (sticky exception bits and which clear semantics apply). diff --git a/migration/project-root/ppc-manual/control/mcrxr.md b/migration/project-root/ppc-manual/control/mcrxr.md new file mode 100644 index 0000000..ebfaae3 --- /dev/null +++ b/migration/project-root/ppc-manual/control/mcrxr.md @@ -0,0 +1,145 @@ +# `mcrxr` — Move to Condition Register from XER + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000400` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mcrxr` | `mcrxr` | — | Move to Condition Register from XER | + +## Syntax + +```asm +mcrxr [CRFD] +``` + +## Encoding + +### `mcrxr` — form `X` + +- **Opcode word:** `0x7c000400` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `512` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `CR` | mcrxr: read | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `CRFD` | mcrxr: write | CR destination field (`crf`, 0–7). | + +## Register Effects + +### `mcrxr` + +- **Reads (always):** `CR` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRFD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mcrxr`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mcrxr"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:433`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L433) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:51`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L51) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:814`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L814) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4694-4706`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4694-L4706) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mcrxr => { + let crfd = instr.crfd(); + ctx.cr[crfd] = crate::context::CrField { + lt: ctx.xer_so != 0, + gt: ctx.xer_ov != 0, + eq: ctx.xer_ca != 0, + so: false, + }; + ctx.xer_so = 0; + ctx.xer_ov = 0; + ctx.xer_ca = 0; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Operation.** Copies XER's top 4 status bits into a CR field, **then atomically clears those XER bits**. Layout in the destination CR field after the move: + + | CR bit | Source XER bit | Meaning | + | --- | --- | --- | + | LT | XER[SO] | summary overflow (sticky) | + | GT | XER[OV] | overflow (last `OE=1` op) | + | EQ | XER[CA] | carry | + | SO | 0 (cleared) | — | + +- **Sticky-bit reset.** XER[SO], XER[OV], and XER[CA] are all *zeroed* after the copy. This is the **only** architecturally clean way to sample-then-clear XER's overflow/carry state — `mfxer` reads but does not clear. +- **Use case.** Saturating-arithmetic loops sample XER[OV] periodically; `mcrxr cr0; bso cr0, overflow` is the canonical "did overflow happen since last check?" idiom. +- **CR field destination.** `CRFD` is a 3-bit index (0..7). All other CR fields are preserved. +- **No reads of GPRs.** `mcrxr` reads only XER, writes only the chosen CR field and XER. +- **xenia exact match.** xenia-rs implements the full sample-and-clear semantics: writes `lt = SO`, `gt = OV`, `eq = CA`, `so = false`, then zeroes `xer_so`, `xer_ov`, `xer_ca`. Matches PowerISA exactly. +- **Deprecated in newer PowerISA.** PowerISA v2.06+ marked `mcrxr` deprecated in favour of `mcrxrx` and explicit `mfxer`/`mtxer` patterns, but the Xenon predates that; titles still emit it freely. + +## Related Instructions + +- [`mfcr`](mfcr.md), [`mtcrf`](mtcrf.md) — bulk CR <-> GPR moves. +- [`mcrf`](mcrf.md) — CR-field copy. +- [`mcrfs`](mcrfs.md) — analogous copy from FPSCR (also clears certain bits). +- [`mfspr`](mfspr.md) (with `SPR=1`, i.e. `mfxer`) — non-clearing read of XER into a GPR. +- [`mtspr`](mtspr.md) (with `SPR=1`, i.e. `mtxer`) — explicit XER write. + +`mcrxr` has no simplified mnemonics. + +## IBM Reference + +- [AIX 7.3 — `mcrxr` (Move to Condition Register from XER)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mcrxr-move-condition-register-from-xer-instruction) diff --git a/migration/project-root/ppc-manual/control/mfcr.md b/migration/project-root/ppc-manual/control/mfcr.md new file mode 100644 index 0000000..9cccdca --- /dev/null +++ b/migration/project-root/ppc-manual/control/mfcr.md @@ -0,0 +1,117 @@ +# `mfcr` — Move from Condition Register + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000026` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mfcr` | `mfcr` | — | Move from Condition Register | + +## Syntax + +```asm +mfcr [RD] +``` + +## Encoding + +### `mfcr` — form `X` + +- **Opcode word:** `0x7c000026` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `19` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `CR` | mfcr: read | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `RD` | mfcr: write | Destination GPR. | + +## Register Effects + +### `mfcr` + +- **Reads (always):** `CR` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +RT <- 0x00000000 || CR +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mfcr`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mfcr"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:625`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L625) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:53`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L53) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:753`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L753) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1627-1630`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1627-L1630) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mfcr => { + ctx.gpr[instr.rd()] = ctx.cr() as u64; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Operation.** Packs all 32 CR bits into the low half of `RD`; the upper 32 bits of `RD` are zeroed. CR field 0 ends up in bits 32..35 of `RD` (i.e. bits 0..3 of the 32-bit packed value), CR field 7 in bits 60..63 (bits 28..31). +- **No CR side effect.** `mfcr` is a read; CR is unmodified. The XL-form's nominal `Rc` bit is unused on this opcode. +- **Saving CR across calls.** The Xbox 360 / SysV ABI requires non-volatile CR fields (CR2..CR4) to be preserved across calls. Standard prologue: `mfcr r12; stw r12, 8(r1)`. Epilogue restores via [`mtcrf`](mtcrf.md). +- **Bit ordering.** PowerPC numbers bits big-endian (bit 0 = MSB). The encoding into the GPR follows the same convention: CR0.LT lands in bit 32 of the doubleword (the MSB of the low word). C-side translations should mask with `0xFFFFFFFFu` before consuming. +- **`mfocrf` variant.** PowerISA defines `mfocrf` (one CR field), encoded as `mfcr` with the high bit of FXM set. xenia-rs decodes both as the same opcode and ignores the FXM hint, returning the entire CR. This is benign — the spec says implementations may treat `mfocrf` as `mfcr`. +- **Not synchronising.** Reorderable. +- **xenia exact match.** xenia-rs packs its eight `CrField` structs into a `u64` via `ctx.cr()`, mirroring spec semantics. + +## Related Instructions + +- [`mtcrf`](mtcrf.md) — inverse: write selected CR fields from a GPR. +- [`mcrf`](mcrf.md), [`mcrxr`](mcrxr.md), [`mcrfs`](mcrfs.md) — narrower CR-field moves. +- [`mfspr`](mfspr.md), [`mtspr`](mtspr.md) — generic SPR moves; CR is *not* an SPR (it has its own opcode). + +`mfcr` has no simplified mnemonics. `mfocrf RT, FXM` is a related encoding handled by the same xenia-rs slot. + +## IBM Reference + +- [AIX 7.3 — `mfcr` (Move from Condition Register)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mfcr-move-from-condition-register-instruction) diff --git a/migration/project-root/ppc-manual/control/mffsx.md b/migration/project-root/ppc-manual/control/mffsx.md new file mode 100644 index 0000000..344fb6e --- /dev/null +++ b/migration/project-root/ppc-manual/control/mffsx.md @@ -0,0 +1,132 @@ +# `mffsx` — Move from FPSCR + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc00048e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mffs` | `mffsx` | — | Move from FPSCR | +| `mffs.` | `mffsx` | Rc=1 | Move from FPSCR | + +## Syntax + +```asm +mffs[Rc] [RD] +``` + +## Encoding + +### `mffsx` — form `X` + +- **Opcode word:** `0xfc00048e` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `583` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FPSCR` | mffsx: read | Floating-Point Status and Control Register. | +| `FD` | mffsx: write | Destination floating-point register. | +| `CR` | mffsx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `mffsx` + +- **Reads (always):** `FPSCR` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `mffsx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mffsx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mffsx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:397`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L397) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:53`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L53) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:910`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L910) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3035-3040`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3035-L3040) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mffsx => { + // Move from FPSCR: frD = FPSCR as double (low 32 bits) + ctx.fpr[instr.rd()] = f64::from_bits(ctx.fpscr as u64); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Operation.** Reads the 32-bit FPSCR and places it in the **low 32 bits** of `FRT`. The high 32 bits of the destination FPR are architecturally undefined; xenia leaves them as the bit-pattern of the FPSCR cast to `u64` (i.e. the high bits are zero, since FPSCR is 32-bit). PowerISA explicitly permits implementations to leave anything there. +- **Destination is an FPR, not a GPR.** Use [`stfd`](../memory/stfd.md) to spill the FPR to memory and reload via a GPR if the value is needed in the integer file. +- **`mffs.` (`Rc=1`) updates CR1.** The `Rc` bit copies the high four FPSCR bits (FX, FEX, VX, OX) into CR1's LT/GT/EQ/SO. xenia-rs implements this via `update_cr1_from_fpscr`. +- **No FPSCR side effect.** Pure read; FPSCR is not modified (unlike [`mcrfs`](mcrfs.md), which clears sticky exception bits). +- **xenia simplification.** xenia-rs models FPSCR as a `u32` field but **does not actively maintain** most of the IEEE-754 sticky bits — the FPU paths typically leave FPSCR untouched. So `mffs` will return whatever was last explicitly set (often 0 / boot defaults). Real titles use it mostly to save/restore the rounding-mode field around library calls, which xenia happens to handle correctly. +- **Not synchronising.** Reorderable with non-FPU instructions. + +## Related Instructions + +- [`mtfsfx`](mtfsfx.md) — write fields of FPSCR from an FPR (the inverse). +- [`mtfsb0x`](mtfsb0x.md), [`mtfsb1x`](mtfsb1x.md) — set/clear individual FPSCR bits. +- [`mtfsfix`](mtfsfix.md) — load a 4-bit immediate into one FPSCR field. +- [`mcrfs`](mcrfs.md) — copy an FPSCR field into a CR field (and clear sticky bits). +- [`mfspr`](mfspr.md) — for non-FPSCR special registers. + +`mffs` is the simplified mnemonic for the base form (`Rc=0`); `mffs.` is the recording variant. + +## IBM Reference + +- [AIX 7.3 — `mffs` (Move from FPSCR)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mffs-move-from-fpscr-instruction) +- PowerISA v2.07B, Book I §4.6 — FPSCR layout and the high-half-undefined rule. diff --git a/migration/project-root/ppc-manual/control/mfmsr.md b/migration/project-root/ppc-manual/control/mfmsr.md new file mode 100644 index 0000000..0efaa82 --- /dev/null +++ b/migration/project-root/ppc-manual/control/mfmsr.md @@ -0,0 +1,139 @@ +# `mfmsr` — Move from Machine State Register + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0000a6` · _sync_ + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mfmsr` | `mfmsr` | — | Move from Machine State Register | + +## Syntax + +```asm +mfmsr [RD] +``` + +## Encoding + +### `mfmsr` — form `X` + +- **Opcode word:** `0x7c0000a6` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `83` +- **Synchronising:** yes + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `MSR` | mfmsr: read | Machine State Register. | +| `RD` | mfmsr: write | Destination GPR. | + +## Register Effects + +### `mfmsr` + +- **Reads (always):** `MSR` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mfmsr`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mfmsr"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:814`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L814) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:53`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L53) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:771`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L771) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1645-1648`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1645-L1648) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mfmsr => { + ctx.gpr[instr.rd()] = ctx.msr; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Privileged.** `mfmsr` is supervisor-only; executing it from problem state on real hardware raises a Privileged Instruction interrupt. Xbox 360 game code never executes it directly — it appears only in the kernel image (`xboxkrnl.exe`) and in xenia's HLE bridge. +- **MSR layout (Xenon-relevant fields, big-endian bit numbering).** + + | Bit | Name | Meaning | + | --- | --- | --- | + | 32 | EE | external interrupts enabled | + | 33 | PR | problem state (1 = user) | + | 34 | FP | floating-point available | + | 35 | ME | machine-check enable | + | 38 | DR | data address translation | + | 39 | IR | instruction address translation | + | 50 | LE | little-endian (always 0 on Xenon) | + | 63 | RI | recoverable interrupt | + + The Xenon also exposes `MSR[SF]` (bit 0) = 1 for 64-bit mode; `MSR[HV]` (bit 3) for hypervisor. See PowerISA Book III for the full table. +- **Synchronisation.** Marked `sync` in xenia's XML — `mfmsr` is execution-synchronising on real hardware (drains the pipeline before sampling MSR). +- **xenia model.** xenia-rs stores MSR as a flat `u64` and returns it raw. No real bit semantics are modelled — the kernel HLE never observes individual MSR fields. The interpreter ignores privilege. +- **Read of an undocumented field returns 0.** Most of the MSR is zero in xenia because no path explicitly initialises it. + +## Related Instructions + +- [`mtmsr`](mtmsr.md) — write MSR from a GPR (32-bit form). +- [`mtmsrd`](mtmsrd.md) — write the full 64-bit MSR (PPC64 form). +- [`mfspr`](mfspr.md) — for non-MSR special registers; MSR has its own dedicated opcode. +- [`sc`](../branch/sc.md) — kernel entry where MSR transitions occur via `rfid`/`hrfid`. + +`mfmsr` has no simplified mnemonics. + +## IBM Reference + +- [AIX 7.3 — `mfmsr` (Move from Machine State Register)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mfmsr-move-from-machine-state-register-instruction) +- PowerISA v2.07B, Book III §4.3 — MSR field definitions. diff --git a/migration/project-root/ppc-manual/control/mfspr.md b/migration/project-root/ppc-manual/control/mfspr.md new file mode 100644 index 0000000..57ca1d4 --- /dev/null +++ b/migration/project-root/ppc-manual/control/mfspr.md @@ -0,0 +1,179 @@ +# `mfspr` — Move from Special-Purpose Register + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [XFX](../forms/XFX.md) · **Opcode:** `0x7c0002a6` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mfspr` | `mfspr` | — | Move from Special-Purpose Register | + +## Syntax + +```asm +mfspr [RD], [SPR] +``` + +## Encoding + +### `mfspr` — form `XFX` + +- **Opcode word:** `0x7c0002a6` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `339` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination / source GPR | +| 11–20 | `spr/tbr/FXM` | SPR/TBR number (byte-swapped halves) or CR field mask | +| 21–30 | `XO` | extended opcode | +| 31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `SPR` | mfspr: read | Special-Purpose-Register number. Encoded with the two 5-bit halves swapped (bits 11-15 become the high half, bits 16-20 the low half). | +| `RD` | mfspr: write | Destination GPR. | + +## Register Effects + +### `mfspr` + +- **Reads (always):** `SPR` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +n <- spr_number(SPR) ; SPR field has its two 5-bit halves swapped +RT <- SPR(n) +``` + +## C Translation Example + +```c +/* mfspr RT, SPR — SPR field has swapped halves */ +uint32_t n = ((insn.SPR & 0x1F) << 5) | ((insn.SPR >> 5) & 0x1F); +switch (n) { + case 1: r[insn.RT] = xer_pack(); break; /* XER */ + case 8: r[insn.RT] = lr; break; /* LR */ + case 9: r[insn.RT] = ctr; break; /* CTR */ + case 256: r[insn.RT] = vrsave; break; /* VRSAVE*/ + case 268: r[insn.RT] = tb & 0xFFFFFFFFu; break; /* TBL */ + case 269: r[insn.RT] = tb >> 32; break; /* TBU */ + default: r[insn.RT] = 0; break; +} +``` + +## Implementation References + +**`mfspr`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mfspr"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:666`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L666) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:53`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L53) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:799`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L799) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1567-1595`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1567-L1595) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mfspr => { + let spr = instr.spr(); + ctx.gpr[instr.rd()] = match spr { + crate::context::spr::XER => ctx.xer() as u64, + crate::context::spr::LR => ctx.lr, + crate::context::spr::CTR => ctx.ctr, + crate::context::spr::DEC => ctx.dec as u64, + crate::context::spr::TBL => ctx.timebase & 0xFFFF_FFFF, + crate::context::spr::TBU => ctx.timebase >> 32, + crate::context::spr::VRSAVE => ctx.vrsave as u64, + // Xbox 360 Xenon processor signature (from canary). + crate::context::spr::PVR => 0x0071_0800, + // Benign SPRs — titles read these but we don't model them. + crate::context::spr::SPRG0 + | crate::context::spr::SPRG1 + | crate::context::spr::SPRG2 + | crate::context::spr::SPRG3 + | crate::context::spr::HID0 + | crate::context::spr::HID1 + | crate::context::spr::DAR + | crate::context::spr::DSISR + | crate::context::spr::PIR => 0, + _ => { + tracing::warn!("mfspr: unimplemented SPR {}", spr); + 0 + } + }; + ctx.pc += 4; + } +``` +
+ + + +## SPR Number Encoding — the "halves swap" + +The 10-bit `spr` field in the XFX form is **stored in a transposed order**: the bits that software names the *high* half (bits 5..9 of the SPR number) occupy instruction bits **16..20**, and the *low* half (bits 0..4) occupies instruction bits **11..15**. Software (and this manual) always refers to the logical, unswapped SPR number. + +``` +decoded_spr = ((field & 0x1F) << 5) | ((field >> 5) & 0x1F) +``` + +So a programmer writing `mfspr RT, 8` (read LR) encodes `spr-field = 0x100` — *not* `8`. Assemblers handle this transparently; disassemblers reverse it. When writing a translator that parses raw instruction words, swap the halves explicitly. + +## SPR Map (Xenon subset modelled by xenia) + +| Decoded # | Name | Meaning | xenia-rs behaviour | +| --- | --- | --- | --- | +| 1 | `XER` | Fixed-point exception register (CA / OV / SO + length field) | packed with `ctx.xer()` | +| 8 | `LR` | Link register | `ctx.lr` | +| 9 | `CTR` | Count register | `ctx.ctr` | +| 18 | `DSISR` | Data-storage interrupt syndrome | returns 0 (stubbed) | +| 19 | `DAR` | Data-access register | returns 0 (stubbed) | +| 256 | `VRSAVE` | Vector-register save mask | `ctx.vrsave` | +| 268 | `TBL` | Time-base lower 32 bits | `ctx.timebase & 0xFFFFFFFF` | +| 269 | `TBU` | Time-base upper 32 bits | `ctx.timebase >> 32` | +| 272–275 | `SPRG0..3` | Software scratch registers (kernel) | returns 0 (stubbed) | +| 287 | `PVR` | Processor-version register | `0x00710800` (Xenon signature) | +| 1008–1009 | `HID0/1` | Hardware implementation registers | returns 0 (stubbed) | +| 1023 | `PIR` | Processor-ID register | returns 0 (stubbed) | + +Unrecognised SPRs return 0 and log a warning. Games rarely read unmodelled SPRs; when they do it's usually clock-skew or sanity checks. + +## Special Cases & Edge Conditions + +- **Privilege.** Some SPRs are privileged on real hardware (MSR, HID0/1, SPRG0..3, DSISR, DAR, PIR). Xbox 360 titles run in a mixed privilege model under the hypervisor; xenia exposes all SPRs without a privilege check because the captured title binaries never contain a real privileged read that should trap. +- **`LR` and `CTR` have dedicated simplified mnemonics.** Assemblers recognise `mflr RT` ≡ `mfspr RT, 8` and `mfctr RT` ≡ `mfspr RT, 9`. Similarly `mfxer RT` ≡ `mfspr RT, 1`. Disassemblers emit the simplified forms; the translation agent should map both forms to the same abstract operation. +- **`mftb` vs. `mfspr TBL/TBU`.** Reading the time-base has a dedicated X-form variant [`mftb`](mftb.md) that uses a separate opcode. Post-Xbox-360 PowerISA deprecated `mfspr TBL/TBU`, but xenia accepts both. Prefer `mftb` in new translations. +- **Side-effect-free.** `mfspr` has no effect on any register beyond `RT`. It can be freely reordered with non-SPR-touching instructions. +- **No `Rc` / `OE`.** This is an XFX-form instruction; bit 31 is reserved (0). + +## Related Instructions + +- [`mtspr`](mtspr.md) — the inverse; write a GPR to an SPR. +- [`mftb`](mftb.md) — read time-base (preferred over `mfspr TBL/TBU`). +- [`mflr`](mfspr.md), [`mfctr`](mfspr.md), [`mfxer`](mfspr.md) — simplified mnemonics of this instruction. +- [`mcrxr`](mcrxr.md) — move `XER[SO..CA]` to a CR field and clear them. + +## Simplified Mnemonics + +| Simplified | Expansion | +| --- | --- | +| `mfxer RT` | `mfspr RT, 1` | +| `mflr RT` | `mfspr RT, 8` | +| `mfctr RT` | `mfspr RT, 9` | + +## IBM Reference + +- [AIX 7.3 — `mfspr` (Move from Special Purpose Register)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mfspr-move-from-special-purpose-register-instruction) +- [PowerISA v2.07B — SPR number table and privilege rules](https://openpowerfoundation.org/specifications/isa/) diff --git a/migration/project-root/ppc-manual/control/mftb.md b/migration/project-root/ppc-manual/control/mftb.md new file mode 100644 index 0000000..dd0298d --- /dev/null +++ b/migration/project-root/ppc-manual/control/mftb.md @@ -0,0 +1,150 @@ +# `mftb` — Move from Time Base + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [XFX](../forms/XFX.md) · **Opcode:** `0x7c0002e6` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mftb` | `mftb` | — | Move from Time Base | + +## Syntax + +```asm +mftb [RD], [TBR] +``` + +## Encoding + +### `mftb` — form `XFX` + +- **Opcode word:** `0x7c0002e6` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `371` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination / source GPR | +| 11–20 | `spr/tbr/FXM` | SPR/TBR number (byte-swapped halves) or CR field mask | +| 21–30 | `XO` | extended opcode | +| 31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `TBR` | mftb: read | Time-Base Register selector for `mftb`. | +| `RD` | mftb: write | Destination GPR. | + +## Register Effects + +### `mftb` + +- **Reads (always):** `TBR` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mftb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mftb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:719`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L719) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:53`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L53) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:803`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L803) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1664-1672`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1664-L1672) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mftb => { + let tbr = instr.spr(); + ctx.gpr[instr.rd()] = match tbr { + 268 => ctx.timebase & 0xFFFF_FFFF, + 269 => ctx.timebase >> 32, + _ => 0, + }; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Time-base register selectors.** The 10-bit `tbr` field encodes the same way as `mfspr`'s `spr` field (two halves swapped). The two values defined for the Xenon: + + | Decoded | Name | Meaning | + | --- | --- | --- | + | 268 | TBL | Time Base, lower 32 bits | + | 269 | TBU | Time Base, upper 32 bits | + + Other selectors return 0 in xenia and are not used by titles. +- **Atomic 64-bit read pattern.** Because `mftb` reads only 32 bits at a time, software performs the canonical retry loop to avoid TBL→TBU rollover skew: + ```asm + retry: + mftbu rH ; read upper + mftb rL ; read lower (TBR=268) + mftbu rH2 ; read upper again + cmpw rH, rH2 + bne retry + ``` +- **Xenon clock rate.** Real hardware ticks the time base at ~3.2 GHz (one tick per CPU clock divided by the architectural ratio). The PVR signature the kernel exposes (`0x00710800`) and the kernel-reported tick rate jointly let titles convert TB ticks to seconds. +- **xenia behaviour.** xenia-rs stores `ctx.timebase` as a `u64` and **increments it once per interpreted instruction**, not per real-time wall clock. This guarantees deterministic replay (same trace ⇒ same TB readings) at the cost of decoupling guest time from host time. Games that rely on TB for real-time sync will run faster or slower depending on host throughput. +- **`mftb RT` (no operand)** is the simplified mnemonic for `mftb RT, 268` — read the lower half. `mftbu RT` ≡ `mftb RT, 269`. +- **Deprecated alternative.** `mfspr RT, 268`/`269` works on the Xenon (xenia accepts both) but post-PowerISA v2.06 deprecated reading TB through `mfspr`. Prefer `mftb`. + +## Related Instructions + +- [`mfspr`](mfspr.md) — generic SPR read; can also read TBL/TBU on Xenon (deprecated). +- [`mtspr`](mtspr.md) — TBL/TBU writes are privileged; not user-accessible. +- [`isync`](mtmsr.md) — context-synchronising fence sometimes paired with `mftb` for tight measurement loops. + +### Simplified Mnemonics + +| Simplified | Expansion | Notes | +| --- | --- | --- | +| `mftb RT` | `mftb RT, 268` | read TBL | +| `mftbu RT` | `mftb RT, 269` | read TBU | + +## IBM Reference + +- [AIX 7.3 — `mftb` (Move from Time Base)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mftb-move-from-time-base-instruction) +- PowerISA v2.07B, Book II §6.1 — Time Base description and the canonical 64-bit read sequence. diff --git a/migration/project-root/ppc-manual/control/mfvscr.md b/migration/project-root/ppc-manual/control/mfvscr.md new file mode 100644 index 0000000..bccfa37 --- /dev/null +++ b/migration/project-root/ppc-manual/control/mfvscr.md @@ -0,0 +1,137 @@ +# `mfvscr` — Move from VSCR + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000604` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mfvscr` | `mfvscr` | — | Move from VSCR | + +## Syntax + +```asm +(no disassembly template) +``` + +## Encoding + +### `mfvscr` — form `VX` + +- **Opcode word:** `0x10000604` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1540` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VSCR` | mfvscr: read | Vector Status and Control Register (NJ/SAT bits). | +| `VD` | mfvscr: write | Destination vector register. | + +## Register Effects + +### `mfvscr` + +- **Reads (always):** `VSCR` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mfvscr`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mfvscr"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:303`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L303) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:53`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L53) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:539`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L539) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2506-2513`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2506-L2513) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mfvscr => { + // PPCBUG-080: ISA places VSCR in the rightmost word of VD with + // bytes 0-11 zeroed. Previously the full 128-bit ctx.vscr was + // copied (leaking stale upper data to guest). + let vscr_word = ctx.vscr.as_u32x4()[3]; + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array([0, 0, 0, vscr_word]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Operation.** Reads the 32-bit Vector Status and Control Register (VSCR) into the **low 32 bits of the rightmost word** of `VD` (the 128-bit vector register). The other 96 bits of `VD` are zeroed. PowerISA places the result at byte offset 12..15 (big-endian within the 128-bit register). +- **VSCR contents (Xenon-relevant).** + + | Bit | Name | Meaning | + | --- | --- | --- | + | 16 | NJ | Non-Java mode (denormal handling for IEEE-754 single-prec vector ops) | + | 31 | SAT | Saturation — sticky; set whenever a saturating vector op clamps | + + All other bits are reserved (zero). +- **`SAT` is sticky.** Once a saturating vector instruction clamps a result, `VSCR[SAT]` becomes 1 and stays 1 until explicitly cleared via [`mtvscr`](mtvscr.md). Software polls it after a vector batch to detect overflow. +- **`NJ` controls denormals.** When `NJ=1` (the Xenon's default), AltiVec single-precision ops flush denormal inputs/outputs to zero (non-IEEE behaviour); `NJ=0` enforces full IEEE. +- **VRSAVE.** Writing the entire 128-bit `VD` consumes a vector register slot; software wishing to honour [`VRSAVE`](mtspr.md) bookkeeping should ensure the chosen `VD` is in the live mask. +- **xenia simplification.** xenia-rs stores VSCR as a single value of the same `vr` type (effectively a u128) and copies it directly into `ctx.vr[VD]`. Saturating ops in xenia-rs **do** maintain SAT correctly for the vector ops that are implemented; NJ is honoured for the denormal-flush paths but its effect is small in practice. +- **Not synchronising.** + +## Related Instructions + +- [`mtvscr`](mtvscr.md) — write VSCR from a vector register (the inverse). +- [`mfspr`](mfspr.md) — for non-vector status registers; VSCR has its own opcode. +- AltiVec saturating-arithmetic ops (e.g., `vaddubs`, `vsubuhs`) — primary writers of `VSCR[SAT]`. + +`mfvscr` has no simplified mnemonics. + +## IBM Reference + +- [AIX 7.3 — `mfvscr` (Move from VSCR)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mfvscr-move-from-vector-status-control-register-instruction) +- PowerISA v2.07B, Book I §6.6 — VSCR layout and the SAT / NJ definitions. diff --git a/migration/project-root/ppc-manual/control/mtcrf.md b/migration/project-root/ppc-manual/control/mtcrf.md new file mode 100644 index 0000000..e3b92b6 --- /dev/null +++ b/migration/project-root/ppc-manual/control/mtcrf.md @@ -0,0 +1,136 @@ +# `mtcrf` — Move to Condition Register Fields + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [XFX](../forms/XFX.md) · **Opcode:** `0x7c000120` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mtcrf` | `mtcrf` | — | Move to Condition Register Fields | + +## Syntax + +```asm +mtcrf [CRM], [RS] +``` + +## Encoding + +### `mtcrf` — form `XFX` + +- **Opcode word:** `0x7c000120` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `144` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination / source GPR | +| 11–20 | `spr/tbr/FXM` | SPR/TBR number (byte-swapped halves) or CR field mask | +| 21–30 | `XO` | extended opcode | +| 31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | mtcrf: read | Source GPR (alias for RD in some stores). | +| `CRM` | mtcrf: write | 8-bit CR field mask used by `mtcrf` — one bit per CR field. | + +## Register Effects + +### `mtcrf` + +- **Reads (always):** `RS` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRM` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +for i in 0..7: + if CRM[i] then CR[i] <- (RS)[32+i*4 : 35+i*4] +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mtcrf`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mtcrf"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:732`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L732) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:55`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L55) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:779`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L779) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1631-1644`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1631-L1644) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mtcrf => { + let crm = instr.crm(); + let val = ctx.gpr[instr.rs()] as u32; + let old = ctx.cr(); + let mut new = old; + for i in 0..8u32 { + if crm & (1 << (7 - i)) != 0 { + let mask = 0xF << (28 - i * 4); + new = (new & !mask) | (val & mask); + } + } + ctx.set_cr(new); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`CRM` is an 8-bit field-mask, MSB-first.** Each bit of `CRM` corresponds to one CR field: `CRM[0]` (mask bit `0x80`) selects CR0, `CRM[1]` (`0x40`) selects CR1, …, `CRM[7]` (`0x01`) selects CR7. Each *set* mask bit causes the corresponding 4-bit slice of `RS[32:63]` to overwrite that CR field; clear mask bits leave the field untouched. +- **Slice positions inside `RS`.** Big-endian: bits 32..35 of `RS` map to CR0, bits 36..39 to CR1, …, bits 60..63 to CR7. The high 32 bits of `RS` are ignored. +- **`mtcr RS` simplified mnemonic.** When `CRM = 0xFF`, all eight CR fields are written; assemblers fold this into `mtcr RS`. This is the dominant form (function epilogue restoring the saved CR). +- **`mtocrf` variant.** PowerISA defines `mtocrf` as the single-field variant — encoded with the high bit of FXM set and exactly one CRM bit set. xenia-rs treats both as the same opcode and processes whatever `CRM` mask is present, so `mtocrf` works correctly without special handling. +- **Use case in ABI.** Save/restore non-volatile CR fields (CR2, CR3, CR4 on the Xbox 360 ABI). The standard restore is `lwz r12, 8(r1); mtcrf 0x38, r12` — `0x38` = bits for CR2|CR3|CR4 — preserving the volatile fields the callee may have already updated. +- **No CR0 / XER side effects.** `mtcrf` does not record into CR0; XER is untouched. +- **xenia exact match.** xenia-rs decomposes the CR into a `u32`, applies a per-field mask, and reassembles via `set_cr`. The 8-bit `CRM` walk matches the spec exactly. +- **Not synchronising.** Reorderable. + +## Related Instructions + +- [`mfcr`](mfcr.md) — read the entire CR into a GPR. +- [`mcrf`](mcrf.md) — copy one CR field to another (no GPR involved). +- [`mcrxr`](mcrxr.md), [`mcrfs`](mcrfs.md) — narrower CR-field moves from XER / FPSCR. +- [`crand`](crand.md), [`cror`](cror.md), … — bit-level CR manipulation. + +### Simplified Mnemonics + +| Simplified | Expansion | Notes | +| --- | --- | --- | +| `mtcr RS` | `mtcrf 0xFF, RS` | write all eight CR fields from low half of RS | + +`mtocrf RS, FXM` is a related encoding handled by the same xenia-rs slot. + +## IBM Reference + +- [AIX 7.3 — `mtcrf` (Move to Condition Register Fields)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mtcrf-move-condition-register-fields-instruction) +- [AIX 7.3 — Condition register simplified mnemonics](https://www.ibm.com/docs/en/aix/7.3.0?topic=mnemonics-condition-register-logical-simplified) diff --git a/migration/project-root/ppc-manual/control/mtfsb0x.md b/migration/project-root/ppc-manual/control/mtfsb0x.md new file mode 100644 index 0000000..2c562af --- /dev/null +++ b/migration/project-root/ppc-manual/control/mtfsb0x.md @@ -0,0 +1,133 @@ +# `mtfsb0x` — Move to FPSCR Bit 0 + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc00008c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mtfsb0` | `mtfsb0x` | — | Move to FPSCR Bit 0 | +| `mtfsb0.` | `mtfsb0x` | Rc=1 | Move to FPSCR Bit 0 | + +## Syntax + +```asm +mtfsb0[Rc] [FPSCRD] +``` + +## Encoding + +### `mtfsb0x` — form `X` + +- **Opcode word:** `0xfc00008c` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `70` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FPSCRD` | mtfsb0x: write | FPSCR destination field. | +| `CR` | mtfsb0x: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `mtfsb0x` + +- **Reads (always):** _none_ +- **Reads (conditional):** _none_ +- **Writes (always):** `FPSCRD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `mtfsb0x`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mtfsb0x`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mtfsb0x"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:406`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L406) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:55`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L55) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:905`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L905) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3055-3061`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3055-L3061) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mtfsb0x => { + // Clear FPSCR bit crbd + let bit = instr.crbd(); + ctx.fpscr &= !(1 << (31 - bit)); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Operation.** Clears (sets to 0) **a single named bit** of the 32-bit FPSCR. The bit is selected by `FPSCRD` (a 5-bit absolute index 0..31, big-endian: 0 = MSB = FX). +- **The mnemonic name "Bit 0" is misleading.** "0" refers to the *value being written*, not to bit position 0. Pair with [`mtfsb1x`](mtfsb1x.md) which writes a 1. +- **Restricted bits.** Per PowerISA, `mtfsb0` cannot clear bits 1 (FEX) or 2 (VX) — those are summary bits, derived from other FPSCR bits. xenia-rs does **not** enforce this restriction; it will happily flip any bit. In practice no Xbox 360 title relies on the restriction's enforcement. +- **`Rc=1`.** `mtfsb0.` (`Rc=1`) updates **CR1** with the high four FPSCR bits (FX, FEX, VX, OX) after the clear. This is the FPU's record-form analogue. +- **Common use.** Reset a sticky exception bit ahead of a sequence of FP ops you want to monitor (e.g. `mtfsb0 5` to clear ZX before a divide series, then read it back). +- **xenia simplification.** xenia-rs maintains FPSCR as a `u32` and does the bit clear correctly, but most downstream FP instructions in xenia **do not update** FPSCR exception bits — so monitoring them after `mtfsb0` will see the bits stay at their seed value. Acceptable for titles that use FPSCR only to manage rounding / non-exception state. +- **Not synchronising.** Reorderable. + +## Related Instructions + +- [`mtfsb1x`](mtfsb1x.md) — set a single FPSCR bit to 1. +- [`mtfsfx`](mtfsfx.md) — write fields of FPSCR from an FPR. +- [`mtfsfix`](mtfsfix.md) — write a 4-bit immediate into one FPSCR field. +- [`mffsx`](mffsx.md) — read FPSCR. +- [`mcrfs`](mcrfs.md) — copy FPSCR field → CR field (and clear sticky bits). + +`mtfsb0` is itself the simplified form (`Rc=0`); `mtfsb0.` is the recording variant. + +## IBM Reference + +- [AIX 7.3 — `mtfsb0` (Move to FPSCR Bit 0)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mtfsb0-move-fpscr-bit-0-instruction) +- PowerISA v2.07B, Book I §4.6 — FPSCR bit definitions and the FX/FEX/VX restriction. diff --git a/migration/project-root/ppc-manual/control/mtfsb1x.md b/migration/project-root/ppc-manual/control/mtfsb1x.md new file mode 100644 index 0000000..4b936a1 --- /dev/null +++ b/migration/project-root/ppc-manual/control/mtfsb1x.md @@ -0,0 +1,133 @@ +# `mtfsb1x` — Move to FPSCR Bit 1 + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc00004c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mtfsb1` | `mtfsb1x` | — | Move to FPSCR Bit 1 | +| `mtfsb1.` | `mtfsb1x` | Rc=1 | Move to FPSCR Bit 1 | + +## Syntax + +```asm +mtfsb1[Rc] [FPSCRD] +``` + +## Encoding + +### `mtfsb1x` — form `X` + +- **Opcode word:** `0xfc00004c` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `38` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FPSCRD` | mtfsb1x: write | FPSCR destination field. | +| `CR` | mtfsb1x: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `mtfsb1x` + +- **Reads (always):** _none_ +- **Reads (conditional):** _none_ +- **Writes (always):** `FPSCRD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `mtfsb1x`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mtfsb1x`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mtfsb1x"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:411`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L411) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:55`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L55) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:902`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L902) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3062-3068`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3062-L3068) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mtfsb1x => { + // Set FPSCR bit crbd + let bit = instr.crbd(); + ctx.fpscr |= 1 << (31 - bit); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Operation.** Sets (writes 1 to) **a single named bit** of FPSCR. `FPSCRD` is a 5-bit absolute index (0..31), big-endian (0 = MSB = FX). +- **Mnemonic name.** "1" denotes the *value written*, not bit position. Pair with [`mtfsb0x`](mtfsb0x.md) for clears. +- **Restricted bits.** PowerISA forbids `mtfsb1` from setting FEX (bit 1) or VX (bit 2) directly — both are summary bits derived from other state. `mtfsb1` *can* set FX (bit 0), which is itself a sticky summary; this is occasionally used to force a `Program` interrupt for testing. xenia-rs does **not** enforce the restriction; setting summary bits will stick until cleared explicitly. +- **`Rc=1`.** `mtfsb1.` (`Rc=1`) updates CR1 with the high four FPSCR bits (FX, FEX, VX, OX) after the set. +- **Common use.** Force-set a sticky exception bit to test exception-handling code paths. Also seen in floating-point library setup that wants a known FPSCR seed. +- **xenia simplification.** Same caveat as `mtfsb0`: xenia maintains FPSCR but most FP paths don't read it, so the set has limited downstream effect. The bit will read back correctly via [`mffsx`](mffsx.md). +- **Not synchronising.** Reorderable. + +## Related Instructions + +- [`mtfsb0x`](mtfsb0x.md) — clear a single FPSCR bit. +- [`mtfsfx`](mtfsfx.md) — write 4-bit FPSCR fields from an FPR. +- [`mtfsfix`](mtfsfix.md) — write 4-bit immediate into a single FPSCR field. +- [`mffsx`](mffsx.md) — read FPSCR. +- [`mcrfs`](mcrfs.md) — FPSCR field → CR field (clears sticky bits). + +`mtfsb1` is the simplified form (`Rc=0`); `mtfsb1.` is the recording variant. + +## IBM Reference + +- [AIX 7.3 — `mtfsb1` (Move to FPSCR Bit 1)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mtfsb1-move-fpscr-bit-1-instruction) +- PowerISA v2.07B, Book I §4.6 — FPSCR bit definitions and FX/FEX/VX restriction. diff --git a/migration/project-root/ppc-manual/control/mtfsfix.md b/migration/project-root/ppc-manual/control/mtfsfix.md new file mode 100644 index 0000000..9e258d5 --- /dev/null +++ b/migration/project-root/ppc-manual/control/mtfsfix.md @@ -0,0 +1,135 @@ +# `mtfsfix` — Move to FPSCR Field Immediate + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc00010c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mtfsfi` | `mtfsfix` | — | Move to FPSCR Field Immediate | +| `mtfsfi.` | `mtfsfix` | Rc=1 | Move to FPSCR Field Immediate | + +## Syntax + +```asm +mtfsfi[Rc] [CRFD], [IMM] +``` + +## Encoding + +### `mtfsfix` — form `X` + +- **Opcode word:** `0xfc00010c` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `134` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `IMM` | mtfsfix: read | Generic immediate field. | +| `CRFD` | mtfsfix: write | CR destination field (`crf`, 0–7). | +| `CR` | mtfsfix: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `mtfsfix` + +- **Reads (always):** `IMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRFD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `mtfsfix`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mtfsfix`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mtfsfix"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:452`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L452) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:55`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L55) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:907`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L907) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3069-3077`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3069-L3077) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mtfsfix => { + // Move to FPSCR field immediate: crfD = IMM (4 bits) + let crfd = instr.crfd(); + let imm = (instr.raw >> 12) & 0xF; + let shift = 28 - crfd as u32 * 4; + ctx.fpscr = (ctx.fpscr & !(0xF << shift)) | (imm << shift); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Operation.** Loads a **4-bit immediate** (`IMM`, encoded in instruction bits 16..19) into a single FPSCR field selected by `CRFD` (3-bit, 0..7). All other FPSCR fields are preserved. +- **Most common use: rounding-mode set.** `mtfsfi 7, 0` selects round-to-nearest, `mtfsfi 7, 1` round-toward-zero, `mtfsfi 7, 2` round-toward-+∞, `mtfsfi 7, 3` round-toward-−∞. The four immediate values map to RN per IEEE-754. Compilers emit this when transitioning into and out of strict-IEEE regions. +- **No FPR source.** Unlike [`mtfsfx`](mtfsfx.md), `mtfsfi` doesn't need an FPR — it carries its 4-bit value in the instruction word, making it cheaper for constant updates. +- **`Rc=1`.** `mtfsfi.` copies FPSCR's top 4 bits (FX, FEX, VX, OX) into CR1 after the write. +- **Restrictions in newer PowerISA.** v2.05+ disallows writing FEX/VX (summary bits) via `mtfsfi`. xenia-rs does **not** enforce this — the immediate goes straight into the chosen field. +- **xenia simplification.** xenia stores FPSCR as a `u32` and applies the field-shift correctly. Same caveat as `mtfsf`: most xenia FP paths don't honour FPSCR, so the rounding-mode change is architecturally visible (via [`mffsx`](mffsx.md)) but typically does not change subsequent FP results. +- **Not synchronising.** PowerISA recommends `isync` after rounding-mode changes if subsequent FP correctness depends on the new mode. + +## Related Instructions + +- [`mtfsfx`](mtfsfx.md) — write FPSCR fields from an FPR (uses 8-bit field-mask). +- [`mtfsb0x`](mtfsb0x.md), [`mtfsb1x`](mtfsb1x.md) — clear / set a single FPSCR bit. +- [`mffsx`](mffsx.md) — read FPSCR. +- [`mcrfs`](mcrfs.md) — FPSCR field → CR field (also clears sticky bits). + +`mtfsfi` is the simplified form (`Rc=0`); `mtfsfi.` is the recording variant. + +## IBM Reference + +- [AIX 7.3 — `mtfsfi` (Move to FPSCR Field Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mtfsfi-move-fpscr-field-immediate-instruction) +- PowerISA v2.07B, Book I §4.6.6 — FPSCR-field write semantics and the FEX/VX restriction. diff --git a/migration/project-root/ppc-manual/control/mtfsfx.md b/migration/project-root/ppc-manual/control/mtfsfx.md new file mode 100644 index 0000000..e0d6fd6 --- /dev/null +++ b/migration/project-root/ppc-manual/control/mtfsfx.md @@ -0,0 +1,142 @@ +# `mtfsfx` — Move to FPSCR Fields + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [XFL](../forms/XFL.md) · **Opcode:** `0xfc00058e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mtfsf` | `mtfsfx` | — | Move to FPSCR Fields | +| `mtfsf.` | `mtfsfx` | Rc=1 | Move to FPSCR Fields | + +## Syntax + +```asm +mtfsf[Rc] [FM], [FB] +``` + +## Encoding + +### `mtfsfx` — form `XFL` + +- **Opcode word:** `0xfc00058e` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `711` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (63) | +| 6 | `L` | field-select behaviour | +| 7–14 | `FM` | FPSCR field mask | +| 15 | `W` | immediate-value flag | +| 16–20 | `FRB` | source FPR | +| 21–30 | `XO` | extended opcode | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FM` | mtfsfx: read | 8-bit FPSCR field-mask used by `mtfsf`. | +| `FB` | mtfsfx: read | Source B floating-point register. | +| `FPSCR` | mtfsfx: write | Floating-Point Status and Control Register. | +| `CR` | mtfsfx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `mtfsfx` + +- **Reads (always):** `FM`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `mtfsfx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mtfsfx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mtfsfx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:416`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L416) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:55`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L55) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:911`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L911) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3041-3054`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3041-L3054) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mtfsfx => { + // Move to FPSCR fields: fm mask in bits 7-14, frB value + let fm = (instr.raw >> 17) & 0xFF; + let val = ctx.fpr[instr.rb()].to_bits() as u32; + let mut mask = 0u32; + for i in 0..8 { + if fm & (1 << (7 - i)) != 0 { + mask |= 0xF << (28 - i * 4); + } + } + ctx.fpscr = (ctx.fpscr & !mask) | (val & mask); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **`FM` is an 8-bit field-mask.** Each of the eight bits selects one 4-bit FPSCR field (0..7). `FM[0]` (`0x80`) selects FPSCR field 0 (bits 0..3 = FX, FEX, VX, OX), …, `FM[7]` (`0x01`) selects FPSCR field 7 (rounding-mode bits RN). Set bits cause that field of `FRB`'s low 32 bits to overwrite the corresponding FPSCR field; clear bits leave the field unchanged. +- **Source is the LOW 32 bits of `FRB`.** The high 32 bits are ignored. Software that wants to write a 32-bit pattern typically constructs it in a GPR, stores to memory, and reloads as a double via [`lfd`](../memory/lfd.md). +- **Most common use: setting rounding mode.** Compilers wrap calls to ``-style functions with `mtfsf 1, fX` to update only the rounding-mode field (RN, FPSCR field 7). +- **`Rc=1` updates CR1.** `mtfsf.` copies FPSCR's top 4 bits (FX, FEX, VX, OX) into CR1 after the write. +- **`L`/`W` bits.** PowerISA v2.05+ adds `L=1` to mean "write all FPSCR bits regardless of FM" and `W=1` to select the upper or lower 32 bits. xenia-rs **ignores** `L` and `W` (always treats `L=0, W=0`), which matches every real Xbox 360 use. +- **xenia simplification.** xenia maintains FPSCR as a `u32` and applies the field mask correctly. However, most FP instructions in xenia don't *read* FPSCR (e.g., divides ignore the rounding mode), so the architecturally-set rounding mode often has no actual effect on results. Acceptable for the title set xenia targets. +- **Not synchronising.** Reorderable; PowerISA recommends an `isync` after FPSCR changes that affect subsequent FP behaviour. + +## Related Instructions + +- [`mtfsfix`](mtfsfix.md) — write a 4-bit immediate into one FPSCR field (no FPR source needed). +- [`mtfsb0x`](mtfsb0x.md), [`mtfsb1x`](mtfsb1x.md) — single-bit FPSCR write. +- [`mffsx`](mffsx.md) — read FPSCR into an FPR. +- [`mcrfs`](mcrfs.md) — copy FPSCR field → CR field. + +`mtfsf` is the simplified form (`Rc=0`); `mtfsf.` is the recording variant. + +## IBM Reference + +- [AIX 7.3 — `mtfsf` (Move to FPSCR Fields)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mtfsf-move-fpscr-fields-instruction) +- PowerISA v2.07B, Book I §4.6.6 — `mtfsf` definition and `L`/`W` extensions. diff --git a/migration/project-root/ppc-manual/control/mtmsr.md b/migration/project-root/ppc-manual/control/mtmsr.md new file mode 100644 index 0000000..2c597af --- /dev/null +++ b/migration/project-root/ppc-manual/control/mtmsr.md @@ -0,0 +1,139 @@ +# `mtmsr` — Move to Machine State Register + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000124` · _sync_ + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mtmsr` | `mtmsr` | — | Move to Machine State Register | + +## Syntax + +```asm +mtmsr [RS] +``` + +## Encoding + +### `mtmsr` — form `X` + +- **Opcode word:** `0x7c000124` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `146` +- **Synchronising:** yes + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | mtmsr: read | Source GPR (alias for RD in some stores). | +| `MSR` | mtmsr: write | Machine State Register. | + +## Register Effects + +### `mtmsr` + +- **Reads (always):** `RS` +- **Reads (conditional):** _none_ +- **Writes (always):** `MSR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mtmsr`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mtmsr"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:822`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L822) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:55`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L55) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:780`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L780) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1649-1663`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1649-L1663) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mtmsr | PpcOpcode::mtmsrd => { + // PPCBUG-078: mtmsrd L=1 is a partial-MSR-write — only MSR[EE] + // (u64 bit 15) and MSR[RI] (u64 bit 0) are modified; all other + // MSR bits preserved. Used by kernel code to re-enable external + // interrupts without disturbing the rest of the MSR. + let l = (instr.raw >> (31 - 15)) & 1; + let rs = ctx.gpr[instr.rs()]; + if matches!(instr.opcode, PpcOpcode::mtmsrd) && l == 1 { + let mask: u64 = (1u64 << 15) | 1u64; + ctx.msr = (ctx.msr & !mask) | (rs & mask); + } else { + ctx.msr = rs; + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Privileged.** `mtmsr` is supervisor-only on real hardware. Executing it from problem state raises a Privileged Instruction interrupt. Game code never emits it; only the kernel and exception-return paths use it. +- **32-bit form.** `mtmsr` writes the **low 32 bits** of MSR (legacy PPC32 form). On the Xenon (a PPC64 implementation), use [`mtmsrd`](mtmsrd.md) for the full 64-bit MSR. Some Xenon kernel sequences still use `mtmsr` to leave the high half untouched while flipping low-half flags like EE/PR. +- **Synchronisation.** Marked `sync` — `mtmsr` is **execution-synchronising**. The Xenon must drain all preceding instructions before the new MSR takes effect, and PowerISA recommends a following `isync` to guarantee subsequent instructions execute under the new MSR. +- **`L` operand.** Modern PowerISA defines an `L` bit selecting "EE/RI only" (`L=1`) versus "all" (`L=0`); xenia-rs ignores `L` and writes the entire MSR. Real Xbox 360 kernel code uses both `L=0` and `L=1`. +- **xenia model.** Treats MSR as a flat `u64` field. Both `mtmsr` and `mtmsrd` execute the same body — `ctx.msr = ctx.gpr[rs]`. No privilege or atomicity is enforced; no side effects on TLB / interrupt mask / endianness are simulated. +- **No CR / XER side effects.** +- **Caveat for translators.** Because the host kernel runs natively in xenia, the guest MSR has no architectural meaning beyond storage. Code that reads it back via [`mfmsr`](mfmsr.md) will see exactly what was last written. + +## Related Instructions + +- [`mfmsr`](mfmsr.md) — read MSR. +- [`mtmsrd`](mtmsrd.md) — 64-bit form (writes the entire MSR). +- [`sc`](../branch/sc.md) — kernel entry; the kernel handler typically uses `mtmsr`/`rfid` to return. +- [`isync`](mtmsr.md) — companion fence after MSR writes. + +`mtmsr` has no simplified mnemonics. + +## IBM Reference + +- [AIX 7.3 — `mtmsr` (Move to Machine State Register)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mtmsr-move-machine-state-register-instruction) +- PowerISA v2.07B, Book III §4.3.1 — MSR field definitions and `L`-bit semantics. diff --git a/migration/project-root/ppc-manual/control/mtmsrd.md b/migration/project-root/ppc-manual/control/mtmsrd.md new file mode 100644 index 0000000..47f3c06 --- /dev/null +++ b/migration/project-root/ppc-manual/control/mtmsrd.md @@ -0,0 +1,138 @@ +# `mtmsrd` — Move to Machine State Register Doubleword + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000164` · _sync_ + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mtmsrd` | `mtmsrd` | — | Move to Machine State Register Doubleword | + +## Syntax + +```asm +mtmsrd [RS] +``` + +## Encoding + +### `mtmsrd` — form `X` + +- **Opcode word:** `0x7c000164` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `178` +- **Synchronising:** yes + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | mtmsrd: read | Source GPR (alias for RD in some stores). | +| `MSR` | mtmsrd: write | Machine State Register. | + +## Register Effects + +### `mtmsrd` + +- **Reads (always):** `RS` +- **Reads (conditional):** _none_ +- **Writes (always):** `MSR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mtmsrd`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mtmsrd"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:827`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L827) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:55`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L55) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:785`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L785) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1649-1663`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1649-L1663) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mtmsr | PpcOpcode::mtmsrd => { + // PPCBUG-078: mtmsrd L=1 is a partial-MSR-write — only MSR[EE] + // (u64 bit 15) and MSR[RI] (u64 bit 0) are modified; all other + // MSR bits preserved. Used by kernel code to re-enable external + // interrupts without disturbing the rest of the MSR. + let l = (instr.raw >> (31 - 15)) & 1; + let rs = ctx.gpr[instr.rs()]; + if matches!(instr.opcode, PpcOpcode::mtmsrd) && l == 1 { + let mask: u64 = (1u64 << 15) | 1u64; + ctx.msr = (ctx.msr & !mask) | (rs & mask); + } else { + ctx.msr = rs; + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Privileged.** Like [`mtmsr`](mtmsr.md), supervisor-only. Game code never emits it. +- **64-bit form.** Writes all 64 MSR bits — including `MSR[SF]` (bit 0) which selects 64-bit mode, `MSR[HV]` (bit 3, hypervisor), `MSR[EE]` (32, external interrupts), `MSR[PR]` (33, problem state), `MSR[FP]` (34), `MSR[ME]` (35, machine-check enable), `MSR[DR]`/`MSR[IR]` (data/instruction translation, 38/39), `MSR[RI]` (63, recoverable interrupt). On the Xenon kernel this is the canonical MSR-write instruction. +- **`L` operand.** Same `L`-bit selector as `mtmsr`: `L=1` updates only `MSR[EE]` and `MSR[RI]`; `L=0` updates the full register. xenia-rs ignores `L` and always writes the full doubleword (matching the typical kernel use). +- **Synchronisation.** Marked `sync` — execution-synchronising. PowerISA recommends `isync` afterwards if subsequent fetch / data semantics depend on the new MSR. +- **xenia model.** Shares one interpreter arm with `mtmsr`: `ctx.msr = ctx.gpr[rs]`. No architectural side effects beyond writing the storage; no privilege check. +- **No CR / XER updates.** +- **Used in interrupt return paths.** Kernel handlers commonly write SRR1 (saved MSR) into MSR via `mtmsrd` followed by `rfid` to atomically restore state and jump to SRR0. + +## Related Instructions + +- [`mfmsr`](mfmsr.md) — read MSR. +- [`mtmsr`](mtmsr.md) — 32-bit form (low half only). +- [`sc`](../branch/sc.md) — kernel entry; the kernel typically pairs `mtmsrd` + `rfid` to return. + +`mtmsrd` has no simplified mnemonics. + +## IBM Reference + +- [AIX 7.3 — `mtmsrd` (Move to Machine State Register Doubleword)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mtmsrd-move-machine-state-register-doubleword-instruction) +- PowerISA v2.07B, Book III §4.3.1 — MSR field definitions, `L`-bit semantics, and 64-bit-mode rules. diff --git a/migration/project-root/ppc-manual/control/mtspr.md b/migration/project-root/ppc-manual/control/mtspr.md new file mode 100644 index 0000000..6103cde --- /dev/null +++ b/migration/project-root/ppc-manual/control/mtspr.md @@ -0,0 +1,161 @@ +# `mtspr` — Move to Special-Purpose Register + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [XFX](../forms/XFX.md) · **Opcode:** `0x7c0003a6` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mtspr` | `mtspr` | — | Move to Special-Purpose Register | + +## Syntax + +```asm +mtspr [SPR], [RS] +``` + +## Encoding + +### `mtspr` — form `XFX` + +- **Opcode word:** `0x7c0003a6` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `467` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination / source GPR | +| 11–20 | `spr/tbr/FXM` | SPR/TBR number (byte-swapped halves) or CR field mask | +| 21–30 | `XO` | extended opcode | +| 31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | mtspr: read | Source GPR (alias for RD in some stores). | +| `SPR` | mtspr: write | Special-Purpose-Register number. Encoded with the two 5-bit halves swapped (bits 11-15 become the high half, bits 16-20 the low half). | + +## Register Effects + +### `mtspr` + +- **Reads (always):** `RS` +- **Reads (conditional):** _none_ +- **Writes (always):** `SPR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +n <- spr_number(SPR) +SPR(n) <- (RS) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mtspr`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mtspr"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_control.cc:771`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_control.cc#L771) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:55`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L55) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:810`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L810) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1596-1626`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1596-L1626) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mtspr => { + let spr = instr.spr(); + let val = ctx.gpr[instr.rs()]; + match spr { + crate::context::spr::XER => ctx.set_xer(val as u32), + crate::context::spr::LR => ctx.lr = val, + crate::context::spr::CTR => ctx.ctr = val as u32 as u64, + crate::context::spr::DEC => ctx.dec = val as u32, + crate::context::spr::TBL_WRITE => { + ctx.timebase = (ctx.timebase & 0xFFFF_FFFF_0000_0000) | (val & 0xFFFF_FFFF); + } + crate::context::spr::TBU_WRITE => { + ctx.timebase = (ctx.timebase & 0x0000_0000_FFFF_FFFF) | ((val & 0xFFFF_FFFF) << 32); + } + crate::context::spr::VRSAVE => ctx.vrsave = val as u32, + // Benign writes — swallow silently to avoid false Unimplemented + // warnings on SPRs that have no observable effect in userspace. + crate::context::spr::SPRG0 + | crate::context::spr::SPRG1 + | crate::context::spr::SPRG2 + | crate::context::spr::SPRG3 + | crate::context::spr::HID0 + | crate::context::spr::HID1 + | crate::context::spr::DAR + | crate::context::spr::DSISR => {} + _ => { + tracing::warn!("mtspr: unimplemented SPR {}", spr); + } + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **SPR halves are swapped in the encoding.** As with [`mfspr`](mfspr.md), the 10-bit `spr` field stores the two 5-bit halves transposed. Software always names the *logical* SPR number; assemblers handle the swap. Decoded number `n = ((field & 0x1F) << 5) | ((field >> 5) & 0x1F)`. +- **SPRs writable from userspace (Xenon, modelled by xenia).** + + | Decoded # | Name | Effect | + | --- | --- | --- | + | 1 | XER | unpacked into `ctx.xer_so/xer_ov/xer_ca` and length field | + | 8 | LR | `ctx.lr ← RS` | + | 9 | CTR | `ctx.ctr ← RS` | + | 256 | VRSAVE | `ctx.vrsave ← RS & 0xFFFFFFFF` | + +- **SPRs xenia silently swallows (no observable effect).** SPRG0..3, HID0, HID1, DAR, DSISR — these are kernel/diagnostic registers; xenia accepts the write to avoid spurious "unimplemented SPR" warnings, but the value is discarded. +- **Privileged SPRs.** On real hardware, writes to MSR-visible kernel SPRs (SPRG0..3, HID0/1, DSISR, DAR, PIR, etc.) require supervisor mode and trap from problem state. xenia does **not** enforce privilege. +- **Time-base writes are privileged.** `mtspr 268/269` (TBL/TBU) only works in supervisor mode on real hardware. xenia will warn `mtspr: unimplemented SPR` for these — do **not** assume the time base can be guest-written. +- **Simplified mnemonics.** `mtxer RS` ≡ `mtspr 1, RS`, `mtlr RS` ≡ `mtspr 8, RS`, `mtctr RS` ≡ `mtspr 9, RS`. These dominate Xbox 360 disassembly. +- **No CR / XER side effects.** `mtspr` itself doesn't record (the *target* SPR may itself be XER, in which case XER is being directly overwritten). +- **Not synchronising.** xenia's XML omits the `sync` flag; PowerISA does require some `mtspr` cases (e.g. SDR1, MMU regs) to be context-synchronising — none of them appear in title binaries. + +## Related Instructions + +- [`mfspr`](mfspr.md) — inverse: read an SPR into a GPR. +- [`mftb`](mftb.md) — read time-base (preferred over `mfspr TBL/TBU`). +- [`mtmsr`](mtmsr.md), [`mtmsrd`](mtmsrd.md) — write MSR (separate opcode). +- [`mcrxr`](mcrxr.md) — sample-and-clear XER's overflow/carry bits. + +### Simplified Mnemonics + +| Simplified | Expansion | +| --- | --- | +| `mtxer RS` | `mtspr 1, RS` | +| `mtlr RS` | `mtspr 8, RS` | +| `mtctr RS` | `mtspr 9, RS` | + +## IBM Reference + +- [AIX 7.3 — `mtspr` (Move to Special Purpose Register)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mtspr-move-special-purpose-register-instruction) +- PowerISA v2.07B, Book III §4 — SPR number table and privilege rules. diff --git a/migration/project-root/ppc-manual/control/mtvscr.md b/migration/project-root/ppc-manual/control/mtvscr.md new file mode 100644 index 0000000..ea991d3 --- /dev/null +++ b/migration/project-root/ppc-manual/control/mtvscr.md @@ -0,0 +1,125 @@ +# `mtvscr` — Move to VSCR + +> **Category:** [Control / CR / SPR](../categories/control.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000644` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `mtvscr` | `mtvscr` | — | Move to VSCR | + +## Syntax + +```asm +(no disassembly template) +``` + +## Encoding + +### `mtvscr` — form `VX` + +- **Opcode word:** `0x10000644` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1604` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | mtvscr: read | Source B vector register. | +| `VSCR` | mtvscr: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `mtvscr` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `mtvscr`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`mtvscr`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="mtvscr"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:310`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L310) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:55`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L55) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:542`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L542) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2514-2517`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2514-L2517) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::mtvscr => { + ctx.vscr = ctx.vr[instr.rb()]; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Operation.** Reads the **low 32 bits of the rightmost word** of `VB` (bytes 12..15 in big-endian) and stores them into VSCR. Other bits of `VB` are ignored. +- **Bits actually significant.** Of the 32 source bits, only **NJ (bit 16)** and **SAT (bit 31)** are architecturally meaningful on the Xenon. All other bits should be written as zero; behaviour for non-zero values is implementation-defined. +- **Clearing SAT.** The dominant use is `mtvscr vN` with `vN` zeroed via `vxor vN, vN, vN`, which writes VSCR=0 and thereby clears the sticky SAT bit before a fresh batch of saturating vector ops. +- **Setting NJ.** Switching to/from "Java mode" (`NJ=0`, full IEEE denormal handling) versus "Non-Java mode" (`NJ=1`, flush-to-zero) is the other meaningful use. Game audio / DSP code occasionally toggles this to match a precise IEEE expectation. +- **xenia simplification.** xenia-rs stores VSCR identically to a vector register and copies the source straight in: `ctx.vscr = ctx.vr[VB]`. Subsequent xenia AltiVec ops do consult `VSCR[SAT]` for sticky updates, so the architecturally-relevant behaviour is preserved. NJ's flush-to-zero semantics are honoured by xenia's vector denormal paths. +- **Not synchronising.** PowerISA does not require `isync` after `mtvscr`, but library code occasionally pairs them as a defensive measure. + +## Related Instructions + +- [`mfvscr`](mfvscr.md) — read VSCR into a vector register (the inverse). +- AltiVec saturating ops (`vaddubs`, `vsubuhs`, …) — primary writers of `VSCR[SAT]`; `mtvscr` is the only way to clear it. +- [`mtspr`](mtspr.md) — for non-vector control registers; VSCR has its own opcode. + +`mtvscr` has no simplified mnemonics. + +## IBM Reference + +- [AIX 7.3 — `mtvscr` (Move to VSCR)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-mtvscr-move-vector-status-control-register-instruction) +- PowerISA v2.07B, Book I §6.6 — VSCR layout, SAT / NJ semantics. diff --git a/migration/project-root/ppc-manual/forms/A.md b/migration/project-root/ppc-manual/forms/A.md new file mode 100644 index 0000000..f74d0cf --- /dev/null +++ b/migration/project-root/ppc-manual/forms/A.md @@ -0,0 +1,43 @@ +# Form `A` — A — Arithmetic (three-source FPU) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`fdivsx`](../fpu/fdivsx.md) | `0xec000024` | fpu | Floating Divide Single | +| [`fsubsx`](../fpu/fsubsx.md) | `0xec000028` | fpu | Floating Subtract Single | +| [`faddsx`](../fpu/faddsx.md) | `0xec00002a` | fpu | Floating Add Single | +| [`fsqrtsx`](../fpu/fsqrtsx.md) | `0xec00002c` | fpu | Floating Square Root Single | +| [`fresx`](../fpu/fresx.md) | `0xec000030` | fpu | Floating Reciprocal Estimate Single | +| [`fmulsx`](../fpu/fmulsx.md) | `0xec000032` | fpu | Floating Multiply Single | +| [`fmsubsx`](../fpu/fmsubsx.md) | `0xec000038` | fpu | Floating Multiply-Subtract Single | +| [`fmaddsx`](../fpu/fmaddsx.md) | `0xec00003a` | fpu | Floating Multiply-Add Single | +| [`fnmsubsx`](../fpu/fnmsubsx.md) | `0xec00003c` | fpu | Floating Negative Multiply-Subtract Single | +| [`fnmaddsx`](../fpu/fnmaddsx.md) | `0xec00003e` | fpu | Floating Negative Multiply-Add Single | +| [`fdivx`](../fpu/fdivx.md) | `0xfc000024` | fpu | Floating Divide | +| [`fsubx`](../fpu/fsubx.md) | `0xfc000028` | fpu | Floating Subtract | +| [`faddx`](../fpu/faddx.md) | `0xfc00002a` | fpu | Floating Add | +| [`fsqrtx`](../fpu/fsqrtx.md) | `0xfc00002c` | fpu | Floating Square Root | +| [`fselx`](../fpu/fselx.md) | `0xfc00002e` | fpu | Floating Select | +| [`fmulx`](../fpu/fmulx.md) | `0xfc000032` | fpu | Floating Multiply | +| [`frsqrtex`](../fpu/frsqrtex.md) | `0xfc000034` | fpu | Floating Reciprocal Square Root Estimate | +| [`fmsubx`](../fpu/fmsubx.md) | `0xfc000038` | fpu | Floating Multiply-Subtract | +| [`fmaddx`](../fpu/fmaddx.md) | `0xfc00003a` | fpu | Floating Multiply-Add | +| [`fnmsubx`](../fpu/fnmsubx.md) | `0xfc00003c` | fpu | Floating Negative Multiply-Subtract | +| [`fnmaddx`](../fpu/fnmaddx.md) | `0xfc00003e` | fpu | Floating Negative Multiply-Add | + + diff --git a/migration/project-root/ppc-manual/forms/B.md b/migration/project-root/ppc-manual/forms/B.md new file mode 100644 index 0000000..423a44b --- /dev/null +++ b/migration/project-root/ppc-manual/forms/B.md @@ -0,0 +1,22 @@ +# Form `B` — B — Conditional Branch + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `BO` | branch options | +| 11–15 | `BI` | CR bit to test | +| 16–29 | `BD` | signed 14-bit word-offset target | +| 30 | `AA` | absolute-address flag | +| 31 | `LK` | link flag | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`bcx`](../branch/bcx.md) | `0x40000000` | branch | Branch Conditional | + + diff --git a/migration/project-root/ppc-manual/forms/D.md b/migration/project-root/ppc-manual/forms/D.md new file mode 100644 index 0000000..182c44c --- /dev/null +++ b/migration/project-root/ppc-manual/forms/D.md @@ -0,0 +1,59 @@ +# Form `D` — D — Displacement (load/store and immediate ALU) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`tdi`](../branch/tdi.md) | `0x08000000` | branch | Trap Doubleword Immediate | +| [`twi`](../branch/twi.md) | `0x0c000000` | branch | Trap Word Immediate | +| [`mulli`](../alu/mulli.md) | `0x1c000000` | integer | Multiply Low Immediate | +| [`subficx`](../alu/subficx.md) | `0x20000000` | integer | Subtract From Immediate Carrying | +| [`cmpli`](../alu/cmpli.md) | `0x28000000` | integer | Compare Logical Immediate | +| [`cmpi`](../alu/cmpi.md) | `0x2c000000` | integer | Compare Immediate | +| [`addic`](../alu/addic.md) | `0x30000000` | integer | Add Immediate Carrying | +| [`addic.`](../alu/addicx.md) | `0x34000000` | integer | Add Immediate Carrying and Record | +| [`addi`](../alu/addi.md) | `0x38000000` | integer | Add Immediate | +| [`addis`](../alu/addis.md) | `0x3c000000` | integer | Add Immediate Shifted | +| [`ori`](../alu/ori.md) | `0x60000000` | integer | OR Immediate | +| [`oris`](../alu/oris.md) | `0x64000000` | integer | OR Immediate Shifted | +| [`xori`](../alu/xori.md) | `0x68000000` | integer | XOR Immediate | +| [`xoris`](../alu/xoris.md) | `0x6c000000` | integer | XOR Immediate Shifted | +| [`andi.`](../alu/andix.md) | `0x70000000` | integer | AND Immediate | +| [`andis.`](../alu/andisx.md) | `0x74000000` | integer | AND Immediate Shifted | +| [`lwz`](../memory/lwz.md) | `0x80000000` | memory | Load Word and Zero | +| [`lwzu`](../memory/lwz.md) | `0x84000000` | memory | Load Word and Zero with Update | +| [`lbz`](../memory/lbz.md) | `0x88000000` | memory | Load Byte and Zero | +| [`lbzu`](../memory/lbz.md) | `0x8c000000` | memory | Load Byte and Zero with Update | +| [`stw`](../memory/stw.md) | `0x90000000` | memory | Store Word | +| [`stwu`](../memory/stw.md) | `0x94000000` | memory | Store Word with Update | +| [`stb`](../memory/stb.md) | `0x98000000` | memory | Store Byte | +| [`stbu`](../memory/stb.md) | `0x9c000000` | memory | Store Byte with Update | +| [`lhz`](../memory/lhz.md) | `0xa0000000` | memory | Load Half Word and Zero | +| [`lhzu`](../memory/lhz.md) | `0xa4000000` | memory | Load Half Word and Zero with Update | +| [`lha`](../memory/lha.md) | `0xa8000000` | memory | Load Half Word Algebraic | +| [`lhau`](../memory/lha.md) | `0xac000000` | memory | Load Half Word Algebraic with Update | +| [`sth`](../memory/sth.md) | `0xb0000000` | memory | Store Half Word | +| [`sthu`](../memory/sth.md) | `0xb4000000` | memory | Store Half Word with Update | +| [`lmw`](../memory/lmw.md) | `0xb8000000` | memory | Load Multiple Word | +| [`stmw`](../memory/stmw.md) | `0xbc000000` | memory | Store Multiple Word | +| [`lfs`](../memory/lfs.md) | `0xc0000000` | memory | Load Floating-Point Single | +| [`lfsu`](../memory/lfs.md) | `0xc4000000` | memory | Load Floating-Point Single with Update | +| [`lfd`](../memory/lfd.md) | `0xc8000000` | memory | Load Floating-Point Double | +| [`lfdu`](../memory/lfd.md) | `0xcc000000` | memory | Load Floating-Point Double with Update | +| [`stfs`](../memory/stfs.md) | `0xd0000000` | memory | Store Floating-Point Single | +| [`stfsu`](../memory/stfs.md) | `0xd4000000` | memory | Store Floating-Point Single with Update | +| [`stfd`](../memory/stfd.md) | `0xd8000000` | memory | Store Floating-Point Double | +| [`stfdu`](../memory/stfd.md) | `0xdc000000` | memory | Store Floating-Point Double with Update | + + diff --git a/migration/project-root/ppc-manual/forms/DCBZ.md b/migration/project-root/ppc-manual/forms/DCBZ.md new file mode 100644 index 0000000..42e3e3f --- /dev/null +++ b/migration/project-root/ppc-manual/forms/DCBZ.md @@ -0,0 +1,23 @@ +# Form `DCBZ` — DCBZ — Cache Block Zeroing (special X variant) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `—` | reserved | +| 11–15 | `RA` | base register (0 ⇒ literal 0) | +| 16–20 | `RB` | offset register | +| 21–30 | `XO` | extended opcode (1014 for dcbz / 1010 for dcbz128) | +| 31 | `—` | reserved | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`dcbz`](../memory/dcbz.md) | `0x7c0007ec` | memory | Data Cache Block Clear to Zero | +| [`dcbz128`](../memory/dcbz.md) | `0x7c2007ec` | memory | Data Cache Block Clear to Zero 128 | + + diff --git a/migration/project-root/ppc-manual/forms/DS.md b/migration/project-root/ppc-manual/forms/DS.md new file mode 100644 index 0000000..3046292 --- /dev/null +++ b/migration/project-root/ppc-manual/forms/DS.md @@ -0,0 +1,25 @@ +# Form `DS` — DS — Doubleword Shift (word-scaled displacement) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0) | +| 16–29 | `DS` | 14-bit signed word-scaled displacement | +| 30–31 | `XO` | extended opcode | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`ld`](../memory/ld.md) | `0xe8000000` | memory | Load Doubleword | +| [`ldu`](../memory/ld.md) | `0xe8000001` | memory | Load Doubleword with Update | +| [`lwa`](../memory/lwa.md) | `0xe8000002` | memory | Load Word Algebraic | +| [`std`](../memory/std.md) | `0xf8000000` | memory | Store Doubleword | +| [`stdu`](../memory/std.md) | `0xf8000001` | memory | Store Doubleword with Update | + + diff --git a/migration/project-root/ppc-manual/forms/I.md b/migration/project-root/ppc-manual/forms/I.md new file mode 100644 index 0000000..2597a6d --- /dev/null +++ b/migration/project-root/ppc-manual/forms/I.md @@ -0,0 +1,20 @@ +# Form `I` — I — Immediate Branch + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–29 | `LI` | signed 24-bit word-offset target | +| 30 | `AA` | absolute-address flag | +| 31 | `LK` | link flag (bl/ba/bla) | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`bx`](../branch/bx.md) | `0x48000000` | branch | Branch | + + diff --git a/migration/project-root/ppc-manual/forms/M.md b/migration/project-root/ppc-manual/forms/M.md new file mode 100644 index 0000000..3ad42ee --- /dev/null +++ b/migration/project-root/ppc-manual/forms/M.md @@ -0,0 +1,25 @@ +# Form `M` — M — Mask (rlwinm/rlwimi/rlwnm) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RS` | source GPR | +| 11–15 | `RA` | destination GPR | +| 16–20 | `SH/RB` | shift amount or source B | +| 21–25 | `MB` | mask begin | +| 26–30 | `ME` | mask end | +| 31 | `Rc` | record-form flag | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`rlwimix`](../alu/rlwimix.md) | `0x50000000` | integer | Rotate Left Word Immediate then Mask Insert | +| [`rlwinmx`](../alu/rlwinmx.md) | `0x54000000` | integer | Rotate Left Word Immediate then AND with Mask | +| [`rlwnmx`](../alu/rlwnmx.md) | `0x5c000000` | integer | Rotate Left Word then AND with Mask | + + diff --git a/migration/project-root/ppc-manual/forms/MD.md b/migration/project-root/ppc-manual/forms/MD.md new file mode 100644 index 0000000..a2f43ec --- /dev/null +++ b/migration/project-root/ppc-manual/forms/MD.md @@ -0,0 +1,27 @@ +# Form `MD` — MD — Mask Double (rldicr/rldicl/rldic/rldimi) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (30) | +| 6–10 | `RS` | source GPR | +| 11–15 | `RA` | destination GPR | +| 16–20 | `sh` | shift amount low 5 bits | +| 21–26 | `mb/me` | 6-bit mask field (swapped halves) | +| 27–29 | `XO` | extended opcode | +| 30 | `sh5` | shift amount high bit | +| 31 | `Rc` | record-form flag | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`rldiclx`](../alu/rldiclx.md) | `0x78000000` | integer | Rotate Left Doubleword Immediate then Clear Left | +| [`rldicrx`](../alu/rldicrx.md) | `0x78000004` | integer | Rotate Left Doubleword Immediate then Clear Right | +| [`rldicx`](../alu/rldicx.md) | `0x78000008` | integer | Rotate Left Doubleword Immediate then Clear | +| [`rldimix`](../alu/rldimix.md) | `0x7800000c` | integer | Rotate Left Doubleword Immediate then Mask Insert | + + diff --git a/migration/project-root/ppc-manual/forms/MDS.md b/migration/project-root/ppc-manual/forms/MDS.md new file mode 100644 index 0000000..c596582 --- /dev/null +++ b/migration/project-root/ppc-manual/forms/MDS.md @@ -0,0 +1,24 @@ +# Form `MDS` — MDS — Mask Double, Shift-by-register (rldcl/rldcr) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (30) | +| 6–10 | `RS` | source GPR | +| 11–15 | `RA` | destination GPR | +| 16–20 | `RB` | source B GPR | +| 21–26 | `mb/me` | 6-bit mask field (swapped halves) | +| 27–30 | `XO` | extended opcode | +| 31 | `Rc` | record-form flag | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`rldclx`](../alu/rldclx.md) | `0x78000010` | integer | Rotate Left Doubleword then Clear Left | +| [`rldcrx`](../alu/rldcrx.md) | `0x78000012` | integer | Rotate Left Doubleword then Clear Right | + + diff --git a/migration/project-root/ppc-manual/forms/SC.md b/migration/project-root/ppc-manual/forms/SC.md new file mode 100644 index 0000000..d291b91 --- /dev/null +++ b/migration/project-root/ppc-manual/forms/SC.md @@ -0,0 +1,22 @@ +# Form `SC` — SC — System Call + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (17) | +| 6–19 | `—` | reserved | +| 20–26 | `LEV` | exception level | +| 27–29 | `—` | reserved | +| 30 | `1` | fixed 1 | +| 31 | `—` | reserved | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`sc`](../branch/sc.md) | `0x44000002` | branch | System Call | + + diff --git a/migration/project-root/ppc-manual/forms/VA.md b/migration/project-root/ppc-manual/forms/VA.md new file mode 100644 index 0000000..c8466d2 --- /dev/null +++ b/migration/project-root/ppc-manual/forms/VA.md @@ -0,0 +1,35 @@ +# Form `VA` — VA — Vector Arithmetic (4-operand, madd-style) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21–25 | `VRC` | source C / shift | +| 26–31 | `XO` | extended opcode (6 bits) | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`vmhaddshs`](../vmx/vmhaddshs.md) | `0x10000020` | vmx | Vector Multiply-High and Add Signed Signed Half Word Saturate | +| [`vmhraddshs`](../vmx/vmhraddshs.md) | `0x10000021` | vmx | Vector Multiply-High Round and Add Signed Signed Half Word Saturate | +| [`vmladduhm`](../vmx/vmladduhm.md) | `0x10000022` | vmx | Vector Multiply-Low and Add Unsigned Half Word Modulo | +| [`vmsumubm`](../vmx/vmsumubm.md) | `0x10000024` | vmx | Vector Multiply-Sum Unsigned Byte Modulo | +| [`vmsummbm`](../vmx/vmsummbm.md) | `0x10000025` | vmx | Vector Multiply-Sum Mixed-Sign Byte Modulo | +| [`vmsumuhm`](../vmx/vmsumuhm.md) | `0x10000026` | vmx | Vector Multiply-Sum Unsigned Half Word Modulo | +| [`vmsumuhs`](../vmx/vmsumuhs.md) | `0x10000027` | vmx | Vector Multiply-Sum Unsigned Half Word Saturate | +| [`vmsumshm`](../vmx/vmsumshm.md) | `0x10000028` | vmx | Vector Multiply-Sum Signed Half Word Modulo | +| [`vmsumshs`](../vmx/vmsumshs.md) | `0x10000029` | vmx | Vector Multiply-Sum Signed Half Word Saturate | +| [`vsel`](../vmx/vsel.md) | `0x1000002a` | vmx | Vector Conditional Select | +| [`vperm`](../vmx/vperm.md) | `0x1000002b` | vmx | Vector Permute | +| [`vsldoi`](../vmx/vsldoi.md) | `0x1000002c` | vmx | Vector Shift Left Double by Octet Immediate | +| [`vmaddfp`](../vmx/vmaddfp.md) | `0x1000002e` | vmx | Vector Multiply-Add Floating Point | +| [`vnmsubfp`](../vmx/vnmsubfp.md) | `0x1000002f` | vmx | Vector Negative Multiply-Subtract Floating Point | + + diff --git a/migration/project-root/ppc-manual/forms/VC.md b/migration/project-root/ppc-manual/forms/VC.md new file mode 100644 index 0000000..4e7fd0d --- /dev/null +++ b/migration/project-root/ppc-manual/forms/VC.md @@ -0,0 +1,34 @@ +# Form `VC` — VC — Vector Compare (with Rc → CR6) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21 | `Rc` | record-form flag (updates CR6) | +| 22–31 | `XO` | extended opcode (10 bits) | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`vcmpequb`](../vmx/vcmpequb.md) | `0x10000006` | vmx | Vector Compare Equal-to Unsigned Byte | +| [`vcmpequh`](../vmx/vcmpequh.md) | `0x10000046` | vmx | Vector Compare Equal-to Unsigned Half Word | +| [`vcmpequw`](../vmx/vcmpequw.md) | `0x10000086` | vmx | Vector Compare Equal-to Unsigned Word | +| [`vcmpeqfp`](../vmx/vcmpeqfp.md) | `0x100000c6` | vmx | Vector Compare Equal-to Floating Point | +| [`vcmpgefp`](../vmx/vcmpgefp.md) | `0x100001c6` | vmx | Vector Compare Greater-Than-or-Equal-to Floating Point | +| [`vcmpgtub`](../vmx/vcmpgtub.md) | `0x10000206` | vmx | Vector Compare Greater-Than Unsigned Byte | +| [`vcmpgtuh`](../vmx/vcmpgtuh.md) | `0x10000246` | vmx | Vector Compare Greater-Than Unsigned Half Word | +| [`vcmpgtuw`](../vmx/vcmpgtuw.md) | `0x10000286` | vmx | Vector Compare Greater-Than Unsigned Word | +| [`vcmpgtfp`](../vmx/vcmpgtfp.md) | `0x100002c6` | vmx | Vector Compare Greater-Than Floating Point | +| [`vcmpgtsb`](../vmx/vcmpgtsb.md) | `0x10000306` | vmx | Vector Compare Greater-Than Signed Byte | +| [`vcmpgtsh`](../vmx/vcmpgtsh.md) | `0x10000346` | vmx | Vector Compare Greater-Than Signed Half Word | +| [`vcmpgtsw`](../vmx/vcmpgtsw.md) | `0x10000386` | vmx | Vector Compare Greater-Than Signed Word | +| [`vcmpbfp`](../vmx/vcmpbfp.md) | `0x100003c6` | vmx | Vector Compare Bounds Floating Point | + + diff --git a/migration/project-root/ppc-manual/forms/VX.md b/migration/project-root/ppc-manual/forms/VX.md new file mode 100644 index 0000000..f6b7365 --- /dev/null +++ b/migration/project-root/ppc-manual/forms/VX.md @@ -0,0 +1,137 @@ +# Form `VX` — VX — Vector (3-operand Altivec) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`vaddubm`](../vmx/vaddubm.md) | `0x10000000` | vmx | Vector Add Unsigned Byte Modulo | +| [`vmaxub`](../vmx/vmaxub.md) | `0x10000002` | vmx | Vector Maximum Unsigned Byte | +| [`vrlb`](../vmx/vrlb.md) | `0x10000004` | vmx | Vector Rotate Left Integer Byte | +| [`vmuloub`](../vmx/vmuloub.md) | `0x10000008` | vmx | Vector Multiply Odd Unsigned Byte | +| [`vaddfp`](../vmx/vaddfp.md) | `0x1000000a` | vmx | Vector Add Floating Point | +| [`vmrghb`](../vmx/vmrghb.md) | `0x1000000c` | vmx | Vector Merge High Byte | +| [`vpkuhum`](../vmx/vpkuhum.md) | `0x1000000e` | vmx | Vector Pack Unsigned Half Word Unsigned Modulo | +| [`vadduhm`](../vmx/vadduhm.md) | `0x10000040` | vmx | Vector Add Unsigned Half Word Modulo | +| [`vmaxuh`](../vmx/vmaxuh.md) | `0x10000042` | vmx | Vector Maximum Unsigned Half Word | +| [`vrlh`](../vmx/vrlh.md) | `0x10000044` | vmx | Vector Rotate Left Integer Half Word | +| [`vmulouh`](../vmx/vmulouh.md) | `0x10000048` | vmx | Vector Multiply Odd Unsigned Half Word | +| [`vsubfp`](../vmx/vsubfp.md) | `0x1000004a` | vmx | Vector Subtract Floating Point | +| [`vmrghh`](../vmx/vmrghh.md) | `0x1000004c` | vmx | Vector Merge High Half Word | +| [`vpkuwum`](../vmx/vpkuwum.md) | `0x1000004e` | vmx | Vector Pack Unsigned Word Unsigned Modulo | +| [`vadduwm`](../vmx/vadduwm.md) | `0x10000080` | vmx | Vector Add Unsigned Word Modulo | +| [`vmaxuw`](../vmx/vmaxuw.md) | `0x10000082` | vmx | Vector Maximum Unsigned Word | +| [`vrlw`](../vmx/vrlw.md) | `0x10000084` | vmx | Vector Rotate Left Integer Word | +| [`vmrghw`](../vmx/vmrghw.md) | `0x1000008c` | vmx | Vector Merge High Word | +| [`vpkuhus`](../vmx/vpkuhus.md) | `0x1000008e` | vmx | Vector Pack Unsigned Half Word Unsigned Saturate | +| [`vpkuwus`](../vmx/vpkuwus.md) | `0x100000ce` | vmx | Vector Pack Unsigned Word Unsigned Saturate | +| [`vmaxsb`](../vmx/vmaxsb.md) | `0x10000102` | vmx | Vector Maximum Signed Byte | +| [`vslb`](../vmx/vslb.md) | `0x10000104` | vmx | Vector Shift Left Integer Byte | +| [`vmulosb`](../vmx/vmulosb.md) | `0x10000108` | vmx | Vector Multiply Odd Signed Byte | +| [`vrefp`](../vmx/vrefp.md) | `0x1000010a` | vmx | Vector Reciprocal Estimate Floating Point | +| [`vmrglb`](../vmx/vmrglb.md) | `0x1000010c` | vmx | Vector Merge Low Byte | +| [`vpkshus`](../vmx/vpkshus.md) | `0x1000010e` | vmx | Vector Pack Signed Half Word Unsigned Saturate | +| [`vmaxsh`](../vmx/vmaxsh.md) | `0x10000142` | vmx | Vector Maximum Signed Half Word | +| [`vslh`](../vmx/vslh.md) | `0x10000144` | vmx | Vector Shift Left Integer Half Word | +| [`vmulosh`](../vmx/vmulosh.md) | `0x10000148` | vmx | Vector Multiply Odd Signed Half Word | +| [`vrsqrtefp`](../vmx/vrsqrtefp.md) | `0x1000014a` | vmx | Vector Reciprocal Square Root Estimate Floating Point | +| [`vmrglh`](../vmx/vmrglh.md) | `0x1000014c` | vmx | Vector Merge Low Half Word | +| [`vpkswus`](../vmx/vpkswus.md) | `0x1000014e` | vmx | Vector Pack Signed Word Unsigned Saturate | +| [`vaddcuw`](../vmx/vaddcuw.md) | `0x10000180` | vmx | Vector Add Carryout Unsigned Word | +| [`vmaxsw`](../vmx/vmaxsw.md) | `0x10000182` | vmx | Vector Maximum Signed Word | +| [`vslw`](../vmx/vslw.md) | `0x10000184` | vmx | Vector Shift Left Integer Word | +| [`vexptefp`](../vmx/vexptefp.md) | `0x1000018a` | vmx | Vector 2 Raised to the Exponent Estimate Floating Point | +| [`vmrglw`](../vmx/vmrglw.md) | `0x1000018c` | vmx | Vector Merge Low Word | +| [`vpkshss`](../vmx/vpkshss.md) | `0x1000018e` | vmx | Vector Pack Signed Half Word Signed Saturate | +| [`vsl`](../vmx/vsl.md) | `0x100001c4` | vmx | Vector Shift Left | +| [`vlogefp`](../vmx/vlogefp.md) | `0x100001ca` | vmx | Vector Log2 Estimate Floating Point | +| [`vpkswss`](../vmx/vpkswss.md) | `0x100001ce` | vmx | Vector Pack Signed Word Signed Saturate | +| [`vaddubs`](../vmx/vaddubs.md) | `0x10000200` | vmx | Vector Add Unsigned Byte Saturate | +| [`vminub`](../vmx/vminub.md) | `0x10000202` | vmx | Vector Minimum Unsigned Byte | +| [`vsrb`](../vmx/vsrb.md) | `0x10000204` | vmx | Vector Shift Right Byte | +| [`vmuleub`](../vmx/vmuleub.md) | `0x10000208` | vmx | Vector Multiply Even Unsigned Byte | +| [`vrfin`](../vmx/vrfin.md) | `0x1000020a` | vmx | Vector Round to Floating-Point Integer Nearest | +| [`vspltb`](../vmx/vspltb.md) | `0x1000020c` | vmx | Vector Splat Byte | +| [`vupkhsb`](../vmx/vupkhsb.md) | `0x1000020e` | vmx | Vector Unpack High Signed Byte | +| [`vadduhs`](../vmx/vadduhs.md) | `0x10000240` | vmx | Vector Add Unsigned Half Word Saturate | +| [`vminuh`](../vmx/vminuh.md) | `0x10000242` | vmx | Vector Minimum Unsigned Half Word | +| [`vsrh`](../vmx/vsrh.md) | `0x10000244` | vmx | Vector Shift Right Half Word | +| [`vmuleuh`](../vmx/vmuleuh.md) | `0x10000248` | vmx | Vector Multiply Even Unsigned Half Word | +| [`vrfiz`](../vmx/vrfiz.md) | `0x1000024a` | vmx | Vector Round to Floating-Point Integer toward Zero | +| [`vsplth`](../vmx/vsplth.md) | `0x1000024c` | vmx | Vector Splat Half Word | +| [`vupkhsh`](../vmx/vupkhsh.md) | `0x1000024e` | vmx | Vector Unpack High Signed Half Word | +| [`vadduws`](../vmx/vadduws.md) | `0x10000280` | vmx | Vector Add Unsigned Word Saturate | +| [`vminuw`](../vmx/vminuw.md) | `0x10000282` | vmx | Vector Minimum Unsigned Word | +| [`vsrw`](../vmx/vsrw.md) | `0x10000284` | vmx | Vector Shift Right Word | +| [`vrfip`](../vmx/vrfip.md) | `0x1000028a` | vmx | Vector Round to Floating-Point Integer toward +Infinity | +| [`vspltw`](../vmx/vspltw.md) | `0x1000028c` | vmx | Vector Splat Word | +| [`vupklsb`](../vmx/vupklsb.md) | `0x1000028e` | vmx | Vector Unpack Low Signed Byte | +| [`vsr`](../vmx/vsr.md) | `0x100002c4` | vmx | Vector Shift Right | +| [`vrfim`](../vmx/vrfim.md) | `0x100002ca` | vmx | Vector Round to Floating-Point Integer toward -Infinity | +| [`vupklsh`](../vmx/vupklsh.md) | `0x100002ce` | vmx | Vector Unpack Low Signed Half Word | +| [`vaddsbs`](../vmx/vaddsbs.md) | `0x10000300` | vmx | Vector Add Signed Byte Saturate | +| [`vminsb`](../vmx/vminsb.md) | `0x10000302` | vmx | Vector Minimum Signed Byte | +| [`vsrab`](../vmx/vsrab.md) | `0x10000304` | vmx | Vector Shift Right Algebraic Byte | +| [`vmulesb`](../vmx/vmulesb.md) | `0x10000308` | vmx | Vector Multiply Even Signed Byte | +| [`vcfux`](../vmx/vcfux.md) | `0x1000030a` | vmx | Vector Convert from Unsigned Fixed-Point Word | +| [`vspltisb`](../vmx/vspltisb.md) | `0x1000030c` | vmx | Vector Splat Immediate Signed Byte | +| [`vpkpx`](../vmx/vpkpx.md) | `0x1000030e` | vmx | Vector Pack Pixel | +| [`vaddshs`](../vmx/vaddshs.md) | `0x10000340` | vmx | Vector Add Signed Half Word Saturate | +| [`vminsh`](../vmx/vminsh.md) | `0x10000342` | vmx | Vector Minimum Signed Half Word | +| [`vsrah`](../vmx/vsrah.md) | `0x10000344` | vmx | Vector Shift Right Algebraic Half Word | +| [`vmulesh`](../vmx/vmulesh.md) | `0x10000348` | vmx | Vector Multiply Even Signed Half Word | +| [`vcfsx`](../vmx/vcfsx.md) | `0x1000034a` | vmx | Vector Convert from Signed Fixed-Point Word | +| [`vspltish`](../vmx/vspltish.md) | `0x1000034c` | vmx | Vector Splat Immediate Signed Half Word | +| [`vupkhpx`](../vmx/vupkhpx.md) | `0x1000034e` | vmx | Vector Unpack High Pixel | +| [`vaddsws`](../vmx/vaddsws.md) | `0x10000380` | vmx | Vector Add Signed Word Saturate | +| [`vminsw`](../vmx/vminsw.md) | `0x10000382` | vmx | Vector Minimum Signed Word | +| [`vsraw`](../vmx/vsraw.md) | `0x10000384` | vmx | Vector Shift Right Algebraic Word | +| [`vctuxs`](../vmx/vctuxs.md) | `0x1000038a` | vmx | Vector Convert to Unsigned Fixed-Point Word Saturate | +| [`vspltisw`](../vmx/vspltisw.md) | `0x1000038c` | vmx | Vector Splat Immediate Signed Word | +| [`vctsxs`](../vmx/vctsxs.md) | `0x100003ca` | vmx | Vector Convert to Signed Fixed-Point Word Saturate | +| [`vupklpx`](../vmx/vupklpx.md) | `0x100003ce` | vmx | Vector Unpack Low Pixel | +| [`vsububm`](../vmx/vsububm.md) | `0x10000400` | vmx | Vector Subtract Unsigned Byte Modulo | +| [`vavgub`](../vmx/vavgub.md) | `0x10000402` | vmx | Vector Average Unsigned Byte | +| [`vand`](../vmx/vand.md) | `0x10000404` | vmx | Vector Logical AND | +| [`vmaxfp`](../vmx/vmaxfp.md) | `0x1000040a` | vmx | Vector Maximum Floating Point | +| [`vslo`](../vmx/vslo.md) | `0x1000040c` | vmx | Vector Shift Left by Octet | +| [`vsubuhm`](../vmx/vsubuhm.md) | `0x10000440` | vmx | Vector Subtract Unsigned Half Word Modulo | +| [`vavguh`](../vmx/vavguh.md) | `0x10000442` | vmx | Vector Average Unsigned Half Word | +| [`vandc`](../vmx/vandc.md) | `0x10000444` | vmx | Vector Logical AND with Complement | +| [`vminfp`](../vmx/vminfp.md) | `0x1000044a` | vmx | Vector Minimum Floating Point | +| [`vsro`](../vmx/vsro.md) | `0x1000044c` | vmx | Vector Shift Right Octet | +| [`vsubuwm`](../vmx/vsubuwm.md) | `0x10000480` | vmx | Vector Subtract Unsigned Word Modulo | +| [`vavguw`](../vmx/vavguw.md) | `0x10000482` | vmx | Vector Average Unsigned Word | +| [`vor`](../vmx/vor.md) | `0x10000484` | vmx | Vector Logical OR | +| [`vxor`](../vmx/vxor.md) | `0x100004c4` | vmx | Vector Logical XOR | +| [`vavgsb`](../vmx/vavgsb.md) | `0x10000502` | vmx | Vector Average Signed Byte | +| [`vnor`](../vmx/vnor.md) | `0x10000504` | vmx | Vector Logical NOR | +| [`vavgsh`](../vmx/vavgsh.md) | `0x10000542` | vmx | Vector Average Signed Half Word | +| [`vsubcuw`](../vmx/vsubcuw.md) | `0x10000580` | vmx | Vector Subtract Carryout Unsigned Word | +| [`vavgsw`](../vmx/vavgsw.md) | `0x10000582` | vmx | Vector Average Signed Word | +| [`vsububs`](../vmx/vsububs.md) | `0x10000600` | vmx | Vector Subtract Unsigned Byte Saturate | +| [`mfvscr`](../control/mfvscr.md) | `0x10000604` | control | Move from VSCR | +| [`vsum4ubs`](../vmx/vsum4ubs.md) | `0x10000608` | vmx | Vector Sum Across Partial (1/4) Unsigned Byte Saturate | +| [`vsubuhs`](../vmx/vsubuhs.md) | `0x10000640` | vmx | Vector Subtract Unsigned Half Word Saturate | +| [`mtvscr`](../control/mtvscr.md) | `0x10000644` | control | Move to VSCR | +| [`vsum4shs`](../vmx/vsum4shs.md) | `0x10000648` | vmx | Vector Sum Across Partial (1/4) Signed Half Word Saturate | +| [`vsubuws`](../vmx/vsubuws.md) | `0x10000680` | vmx | Vector Subtract Unsigned Word Saturate | +| [`vsum2sws`](../vmx/vsum2sws.md) | `0x10000688` | vmx | Vector Sum Across Partial (1/2) Signed Word Saturate | +| [`vsubsbs`](../vmx/vsubsbs.md) | `0x10000700` | vmx | Vector Subtract Signed Byte Saturate | +| [`vsum4sbs`](../vmx/vsum4sbs.md) | `0x10000708` | vmx | Vector Sum Across Partial (1/4) Signed Byte Saturate | +| [`vsubshs`](../vmx/vsubshs.md) | `0x10000740` | vmx | Vector Subtract Signed Half Word Saturate | +| [`vsubsws`](../vmx/vsubsws.md) | `0x10000780` | vmx | Vector Subtract Signed Word Saturate | +| [`vsumsws`](../vmx/vsumsws.md) | `0x10000788` | vmx | Vector Sum Across Signed Word Saturate | + + diff --git a/migration/project-root/ppc-manual/forms/VX128.md b/migration/project-root/ppc-manual/forms/VX128.md new file mode 100644 index 0000000..e22b37e --- /dev/null +++ b/migration/project-root/ppc-manual/forms/VX128.md @@ -0,0 +1,60 @@ +# Form `VX128` — VX128 — VMX128 3-operand (register-fused) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`vaddfp128`](../vmx128/vaddfp.md) | `0x14000010` | vmx | Vector128 Add Floating Point | +| [`vsubfp128`](../vmx128/vsubfp.md) | `0x14000050` | vmx | Vector128 Subtract Floating Point | +| [`vmulfp128`](../vmx128/vmulfp128.md) | `0x14000090` | vmx | Vector128 Multiply Floating-Point | +| [`vmaddfp128`](../vmx128/vmaddfp.md) | `0x140000d0` | vmx | Vector128 Multiply Add Floating Point | +| [`vmaddcfp128`](../vmx128/vmaddcfp128.md) | `0x14000110` | vmx | Vector128 Multiply Add Floating Point | +| [`vnmsubfp128`](../vmx128/vnmsubfp.md) | `0x14000150` | vmx | Vector128 Negative Multiply-Subtract Floating Point | +| [`vmsum3fp128`](../vmx128/vmsum3fp128.md) | `0x14000190` | vmx | Vector128 Multiply Sum 3-way Floating Point | +| [`vmsum4fp128`](../vmx128/vmsum4fp128.md) | `0x140001d0` | vmx | Vector128 Multiply Sum 4-way Floating-Point | +| [`vpkshss128`](../vmx128/vpkshss.md) | `0x14000200` | vmx | Vector128 Pack Signed Half Word Signed Saturate | +| [`vand128`](../vmx128/vand.md) | `0x14000210` | vmx | Vector128 Logical AND | +| [`vpkshus128`](../vmx128/vpkshus.md) | `0x14000240` | vmx | Vector128 Pack Signed Half Word Unsigned Saturate | +| [`vandc128`](../vmx128/vandc.md) | `0x14000250` | vmx | Vector128 Logical AND with Complement | +| [`vpkswss128`](../vmx128/vpkswss.md) | `0x14000280` | vmx | Vector128 Pack Signed Word Signed Saturate | +| [`vnor128`](../vmx128/vnor.md) | `0x14000290` | vmx | Vector128 Logical NOR | +| [`vpkswus128`](../vmx128/vpkswus.md) | `0x140002c0` | vmx | Vector128 Pack Signed Word Unsigned Saturate | +| [`vor128`](../vmx128/vor.md) | `0x140002d0` | vmx | Vector128 Logical OR | +| [`vpkuhum128`](../vmx128/vpkuhum.md) | `0x14000300` | vmx | Vector128 Pack Unsigned Half Word Unsigned Modulo | +| [`vxor128`](../vmx128/vxor.md) | `0x14000310` | vmx | Vector128 Logical XOR | +| [`vpkuhus128`](../vmx128/vpkuhus.md) | `0x14000340` | vmx | Vector128 Pack Unsigned Half Word Unsigned Saturate | +| [`vsel128`](../vmx128/vsel.md) | `0x14000350` | vmx | Vector128 Conditional Select | +| [`vpkuwum128`](../vmx128/vpkuwum.md) | `0x14000380` | vmx | Vector128 Pack Unsigned Word Unsigned Modulo | +| [`vslo128`](../vmx128/vslo.md) | `0x14000390` | vmx | Vector128 Shift Left Octet | +| [`vpkuwus128`](../vmx128/vpkuwus.md) | `0x140003c0` | vmx | Vector128 Pack Unsigned Word Unsigned Saturate | +| [`vsro128`](../vmx128/vsro.md) | `0x140003d0` | vmx | Vector128 Shift Right Octet | +| [`vrlw128`](../vmx128/vrlw.md) | `0x18000050` | vmx | Vector128 Rotate Left Word | +| [`vslw128`](../vmx128/vslw.md) | `0x180000d0` | vmx | Vector128 Shift Left Integer Word | +| [`vsraw128`](../vmx128/vsraw.md) | `0x18000150` | vmx | Vector128 Shift Right Arithmetic Word | +| [`vsrw128`](../vmx128/vsrw.md) | `0x180001d0` | vmx | Vector128 Shift Right Word | +| [`vmaxfp128`](../vmx128/vmaxfp.md) | `0x18000280` | vmx | Vector128 Maximum Floating Point | +| [`vminfp128`](../vmx128/vminfp.md) | `0x180002c0` | vmx | Vector128 Minimum Floating Point | +| [`vmrghw128`](../vmx128/vmrghw.md) | `0x18000300` | vmx | Vector128 Merge High Word | +| [`vmrglw128`](../vmx128/vmrglw.md) | `0x18000340` | vmx | Vector128 Merge Low Word | +| [`vupkhsb128`](../vmx128/vupkhsb.md) | `0x18000380` | vmx | Vector128 Unpack High Signed Byte | +| [`vupklsb128`](../vmx128/vupklsb.md) | `0x180003c0` | vmx | Vector128 Unpack Low Signed Byte | + + diff --git a/migration/project-root/ppc-manual/forms/VX128_1.md b/migration/project-root/ppc-manual/forms/VX128_1.md new file mode 100644 index 0000000..b356cf5 --- /dev/null +++ b/migration/project-root/ppc-manual/forms/VX128_1.md @@ -0,0 +1,38 @@ +# Form `VX128_1` — VX128_1 — VMX128 vector load/store + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`lvsl128`](../vmx128/lvsl.md) | `0x10000003` | vmx | Load Vector for Shift Left Indexed 128 | +| [`lvsr128`](../vmx128/lvsr.md) | `0x10000043` | vmx | Load Vector for Shift Right Indexed 128 | +| [`lvewx128`](../memory/lvewx.md) | `0x10000083` | memory | Load Vector Element Word Indexed 128 | +| [`lvx128`](../memory/lvx.md) | `0x100000c3` | memory | Load Vector Indexed 128 | +| [`stvewx128`](../memory/stvewx.md) | `0x10000183` | memory | Store Vector Element Word Indexed 128 | +| [`stvx128`](../memory/stvx.md) | `0x100001c3` | memory | Store Vector Indexed 128 | +| [`lvxl128`](../memory/lvxl.md) | `0x100002c3` | memory | Load Vector Indexed LRU 128 | +| [`stvxl128`](../memory/stvxl.md) | `0x100003c3` | memory | Store Vector Indexed LRU 128 | +| [`lvlx128`](../memory/lvlx.md) | `0x10000403` | memory | Load Vector Left Indexed 128 | +| [`lvrx128`](../memory/lvrx.md) | `0x10000443` | memory | Load Vector Right Indexed 128 | +| [`stvlx128`](../memory/stvlx.md) | `0x10000503` | memory | Store Vector Left Indexed 128 | +| [`stvrx128`](../memory/stvrx.md) | `0x10000543` | memory | Store Vector Right Indexed 128 | +| [`lvlxl128`](../memory/lvlxl.md) | `0x10000603` | memory | Load Vector Left Indexed LRU 128 | +| [`lvrxl128`](../memory/lvrxl.md) | `0x10000643` | memory | Load Vector Right Indexed LRU 128 | +| [`stvlxl128`](../memory/stvlxl.md) | `0x10000703` | memory | Store Vector Left Indexed LRU 128 | +| [`stvrxl128`](../memory/stvrxl.md) | `0x10000743` | memory | Store Vector Right Indexed LRU 128 | + + diff --git a/migration/project-root/ppc-manual/forms/VX128_2.md b/migration/project-root/ppc-manual/forms/VX128_2.md new file mode 100644 index 0000000..22cfa4b --- /dev/null +++ b/migration/project-root/ppc-manual/forms/VX128_2.md @@ -0,0 +1,25 @@ +# Form `VX128_2` — VX128_2 — VMX128 3-operand arithmetic + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 23–25 | `VC` | source C 3-bit field | +| 26 | `VA128h` | source A middle bit | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`vperm128`](../vmx128/vperm.md) | `0x14000000` | vmx | Vector128 Permute | + + diff --git a/migration/project-root/ppc-manual/forms/VX128_3.md b/migration/project-root/ppc-manual/forms/VX128_3.md new file mode 100644 index 0000000..98270d6 --- /dev/null +++ b/migration/project-root/ppc-manual/forms/VX128_3.md @@ -0,0 +1,37 @@ +# Form `VX128_3` — VX128_3 — VMX128 unary with immediate + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`vcfpsxws128`](../vmx128/vcfpsxws128.md) | `0x18000230` | vmx | Vector128 Convert From Floating-Point to Signed Fixed-Point Word Saturate | +| [`vcfpuxws128`](../vmx128/vcfpuxws128.md) | `0x18000270` | vmx | Vector128 Convert From Floating-Point to Unsigned Fixed-Point Word Saturate | +| [`vcsxwfp128`](../vmx128/vcsxwfp128.md) | `0x180002b0` | vmx | Vector128 Convert From Signed Fixed-Point Word to Floating-Point | +| [`vcuxwfp128`](../vmx128/vcuxwfp128.md) | `0x180002f0` | vmx | Vector128 Convert From Unsigned Fixed-Point Word to Floating-Point | +| [`vrfim128`](../vmx128/vrfim.md) | `0x18000330` | vmx | Vector128 Round to Floating-Point Integer toward -Infinity | +| [`vrfin128`](../vmx128/vrfin.md) | `0x18000370` | vmx | Vector128 Round to Floating-Point Integer Nearest | +| [`vrfip128`](../vmx128/vrfip.md) | `0x180003b0` | vmx | Vector128 Round to Floating-Point Integer toward +Infinity | +| [`vrfiz128`](../vmx128/vrfiz.md) | `0x180003f0` | vmx | Vector128 Round to Floating-Point Integer toward Zero | +| [`vrefp128`](../vmx128/vrefp.md) | `0x18000630` | vmx | Vector128 Reciprocal Estimate Floating Point | +| [`vrsqrtefp128`](../vmx128/vrsqrtefp.md) | `0x18000670` | vmx | Vector128 Reciprocal Square Root Estimate Floating Point | +| [`vexptefp128`](../vmx128/vexptefp.md) | `0x180006b0` | vmx | Vector128 Log2 Estimate Floating Point | +| [`vlogefp128`](../vmx128/vlogefp.md) | `0x180006f0` | vmx | Vector128 Log2 Estimate Floating Point | +| [`vspltw128`](../vmx128/vspltw.md) | `0x18000730` | vmx | Vector128 Splat Word | +| [`vspltisw128`](../vmx128/vspltisw.md) | `0x18000770` | vmx | Vector128 Splat Immediate Signed Word | +| [`vupkd3d128`](../vmx128/vupkd3d128.md) | `0x180007f0` | vmx | Vector128 Unpack D3Dtype | + + diff --git a/migration/project-root/ppc-manual/forms/VX128_4.md b/migration/project-root/ppc-manual/forms/VX128_4.md new file mode 100644 index 0000000..242c0d4 --- /dev/null +++ b/migration/project-root/ppc-manual/forms/VX128_4.md @@ -0,0 +1,25 @@ +# Form `VX128_4` — VX128_4 — VMX128 with sub-opcode selector + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–23 | `XO` | extended opcode | +| 24–25 | `z` | sub-operation selector | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`vpkd3d128`](../vmx128/vpkd3d128.md) | `0x18000610` | vmx | Vector128 Pack D3Dtype, Rotate Left Immediate and Mask Insert | +| [`vrlimi128`](../vmx128/vrlimi128.md) | `0x18000710` | vmx | Vector128 Rotate Left Immediate and Mask Insert | + + diff --git a/migration/project-root/ppc-manual/forms/VX128_5.md b/migration/project-root/ppc-manual/forms/VX128_5.md new file mode 100644 index 0000000..27845bf --- /dev/null +++ b/migration/project-root/ppc-manual/forms/VX128_5.md @@ -0,0 +1,25 @@ +# Form `VX128_5` — VX128_5 — VMX128 with shift field + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22–25 | `SH` | 4-bit shift amount | +| 26 | `VA128h` | source A middle bit | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`vsldoi128`](../vmx128/vsldoi.md) | `0x10000010` | vmx | Vector128 Shift Left Double by Octet Immediate | + + diff --git a/migration/project-root/ppc-manual/forms/VX128_P.md b/migration/project-root/ppc-manual/forms/VX128_P.md new file mode 100644 index 0000000..8270e21 --- /dev/null +++ b/migration/project-root/ppc-manual/forms/VX128_P.md @@ -0,0 +1,24 @@ +# Form `VX128_P` — VX128_P — VMX128 permute + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `PERMl` | permute selector low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–22 | `—` | reserved | +| 23–25 | `PERMh` | permute selector high 3 bits | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`vpermwi128`](../vmx128/vpermwi128.md) | `0x18000210` | vmx | Vector128 Permutate Word Immediate | + + diff --git a/migration/project-root/ppc-manual/forms/VX128_R.md b/migration/project-root/ppc-manual/forms/VX128_R.md new file mode 100644 index 0000000..18dc2a7 --- /dev/null +++ b/migration/project-root/ppc-manual/forms/VX128_R.md @@ -0,0 +1,30 @@ +# Form `VX128_R` — VX128_R — VMX128 compare (with Rc → CR6) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22–25 | `XO` | extended opcode (compare) | +| 26 | `VA128h` | source A middle bit | +| 27 | `Rc` | record-form flag (updates CR6) | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`vcmpeqfp128`](../vmx128/vcmpeqfp.md) | `0x18000000` | vmx | Vector128 Compare Equal-to Floating Point | +| [`vcmpgefp128`](../vmx128/vcmpgefp.md) | `0x18000080` | vmx | Vector128 Compare Greater-Than-or-Equal-to Floating Point | +| [`vcmpgtfp128`](../vmx128/vcmpgtfp.md) | `0x18000100` | vmx | Vector128 Compare Greater-Than Floating-Point | +| [`vcmpbfp128`](../vmx128/vcmpbfp.md) | `0x18000180` | vmx | Vector128 Compare Bounds Floating Point | +| [`vcmpequw128`](../vmx128/vcmpequw.md) | `0x18000200` | vmx | Vector128 Compare Equal-to Unsigned Word | + + diff --git a/migration/project-root/ppc-manual/forms/X.md b/migration/project-root/ppc-manual/forms/X.md new file mode 100644 index 0000000..472d081 --- /dev/null +++ b/migration/project-root/ppc-manual/forms/X.md @@ -0,0 +1,138 @@ +# Form `X` — X — Extended (10-bit extended opcode) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`cmp`](../alu/cmp.md) | `0x7c000000` | integer | Compare | +| [`tw`](../branch/tw.md) | `0x7c000008` | branch | Trap Word | +| [`lvsl`](../vmx/lvsl.md) | `0x7c00000c` | vmx | Load Vector for Shift Left Indexed | +| [`lvebx`](../memory/lvebx.md) | `0x7c00000e` | memory | Load Vector Element Byte Indexed | +| [`mfcr`](../control/mfcr.md) | `0x7c000026` | control | Move from Condition Register | +| [`lwarx`](../memory/lwarx.md) | `0x7c000028` | memory | Load Word and Reserve Indexed | +| [`ldx`](../memory/ld.md) | `0x7c00002a` | memory | Load Doubleword Indexed | +| [`lwzx`](../memory/lwz.md) | `0x7c00002e` | memory | Load Word and Zero Indexed | +| [`slwx`](../alu/slwx.md) | `0x7c000030` | integer | Shift Left Word | +| [`cntlzwx`](../alu/cntlzwx.md) | `0x7c000034` | integer | Count Leading Zeros Word | +| [`sldx`](../alu/sldx.md) | `0x7c000036` | integer | Shift Left Doubleword | +| [`andx`](../alu/andx.md) | `0x7c000038` | integer | AND | +| [`cmpl`](../alu/cmpl.md) | `0x7c000040` | integer | Compare Logical | +| [`lvsr`](../vmx/lvsr.md) | `0x7c00004c` | vmx | Load Vector for Shift Right Indexed | +| [`lvehx`](../memory/lvehx.md) | `0x7c00004e` | memory | Load Vector Element Half Word Indexed | +| [`ldux`](../memory/ld.md) | `0x7c00006a` | memory | Load Doubleword with Update Indexed | +| [`dcbst`](../memory/dcbst.md) | `0x7c00006c` | memory | Data Cache Block Store | +| [`lwzux`](../memory/lwz.md) | `0x7c00006e` | memory | Load Word and Zero with Update Indexed | +| [`cntlzdx`](../alu/cntlzdx.md) | `0x7c000074` | integer | Count Leading Zeros Doubleword | +| [`andcx`](../alu/andcx.md) | `0x7c000078` | integer | AND with Complement | +| [`td`](../branch/td.md) | `0x7c000088` | branch | Trap Doubleword | +| [`lvewx`](../memory/lvewx.md) | `0x7c00008e` | memory | Load Vector Element Word Indexed | +| [`mfmsr`](../control/mfmsr.md) | `0x7c0000a6` | control | Move from Machine State Register | +| [`ldarx`](../memory/ldarx.md) | `0x7c0000a8` | memory | Load Doubleword and Reserve Indexed | +| [`dcbf`](../memory/dcbf.md) | `0x7c0000ac` | memory | Data Cache Block Flush | +| [`lbzx`](../memory/lbz.md) | `0x7c0000ae` | memory | Load Byte and Zero Indexed | +| [`lvx`](../memory/lvx.md) | `0x7c0000ce` | memory | Load Vector Indexed | +| [`lbzux`](../memory/lbz.md) | `0x7c0000ee` | memory | Load Byte and Zero with Update Indexed | +| [`norx`](../alu/norx.md) | `0x7c0000f8` | integer | NOR | +| [`stvebx`](../memory/stvebx.md) | `0x7c00010e` | memory | Store Vector Element Byte Indexed | +| [`mtmsr`](../control/mtmsr.md) | `0x7c000124` | control | Move to Machine State Register | +| [`stdx`](../memory/std.md) | `0x7c00012a` | memory | Store Doubleword Indexed | +| [`stwcx`](../memory/stwcx.md) | `0x7c00012d` | memory | Store Word Conditional Indexed | +| [`stwx`](../memory/stw.md) | `0x7c00012e` | memory | Store Word Indexed | +| [`stvehx`](../memory/stvehx.md) | `0x7c00014e` | memory | Store Vector Element Half Word Indexed | +| [`mtmsrd`](../control/mtmsrd.md) | `0x7c000164` | control | Move to Machine State Register Doubleword | +| [`stdux`](../memory/std.md) | `0x7c00016a` | memory | Store Doubleword with Update Indexed | +| [`stwux`](../memory/stw.md) | `0x7c00016e` | memory | Store Word with Update Indexed | +| [`stvewx`](../memory/stvewx.md) | `0x7c00018e` | memory | Store Vector Element Word Indexed | +| [`stdcx`](../memory/stdcx.md) | `0x7c0001ad` | memory | Store Doubleword Conditional Indexed | +| [`stbx`](../memory/stb.md) | `0x7c0001ae` | memory | Store Byte Indexed | +| [`stvx`](../memory/stvx.md) | `0x7c0001ce` | memory | Store Vector Indexed | +| [`dcbtst`](../memory/dcbtst.md) | `0x7c0001ec` | memory | Data Cache Block Touch for Store | +| [`stbux`](../memory/stb.md) | `0x7c0001ee` | memory | Store Byte with Update Indexed | +| [`dcbt`](../memory/dcbt.md) | `0x7c00022c` | memory | Data Cache Block Touch | +| [`lhzx`](../memory/lhz.md) | `0x7c00022e` | memory | Load Half Word and Zero Indexed | +| [`eqvx`](../alu/eqvx.md) | `0x7c000238` | integer | Equivalent | +| [`lhzux`](../memory/lhz.md) | `0x7c00026e` | memory | Load Half Word and Zero with Update Indexed | +| [`xorx`](../alu/xorx.md) | `0x7c000278` | integer | XOR | +| [`lwax`](../memory/lwa.md) | `0x7c0002aa` | memory | Load Word Algebraic Indexed | +| [`lhax`](../memory/lha.md) | `0x7c0002ae` | memory | Load Half Word Algebraic Indexed | +| [`lvxl`](../memory/lvxl.md) | `0x7c0002ce` | memory | Load Vector Indexed LRU | +| [`lwaux`](../memory/lwa.md) | `0x7c0002ea` | memory | Load Word Algebraic with Update Indexed | +| [`lhaux`](../memory/lha.md) | `0x7c0002ee` | memory | Load Half Word Algebraic with Update Indexed | +| [`sthx`](../memory/sth.md) | `0x7c00032e` | memory | Store Half Word Indexed | +| [`orcx`](../alu/orcx.md) | `0x7c000338` | integer | OR with Complement | +| [`sthux`](../memory/sth.md) | `0x7c00036e` | memory | Store Half Word with Update Indexed | +| [`orx`](../alu/orx.md) | `0x7c000378` | integer | OR | +| [`dcbi`](../memory/dcbi.md) | `0x7c0003ac` | memory | Data Cache Block Invalidate | +| [`nandx`](../alu/nandx.md) | `0x7c0003b8` | integer | NAND | +| [`stvxl`](../memory/stvxl.md) | `0x7c0003ce` | memory | Store Vector Indexed LRU | +| [`mcrxr`](../control/mcrxr.md) | `0x7c000400` | control | Move to Condition Register from XER | +| [`lvlx`](../memory/lvlx.md) | `0x7c00040e` | memory | Load Vector Left Indexed | +| [`ldbrx`](../memory/ldbrx.md) | `0x7c000428` | memory | Load Doubleword Byte-Reverse Indexed | +| [`lswx`](../memory/lswx.md) | `0x7c00042a` | memory | Load String Word Indexed | +| [`lwbrx`](../memory/lwbrx.md) | `0x7c00042c` | memory | Load Word Byte-Reverse Indexed | +| [`lfsx`](../memory/lfs.md) | `0x7c00042e` | memory | Load Floating-Point Single Indexed | +| [`srwx`](../alu/srwx.md) | `0x7c000430` | integer | Shift Right Word | +| [`srdx`](../alu/srdx.md) | `0x7c000436` | integer | Shift Right Doubleword | +| [`lvrx`](../memory/lvrx.md) | `0x7c00044e` | memory | Load Vector Right Indexed | +| [`lfsux`](../memory/lfs.md) | `0x7c00046e` | memory | Load Floating-Point Single with Update Indexed | +| [`lswi`](../memory/lswi.md) | `0x7c0004aa` | memory | Load String Word Immediate | +| [`sync`](../alu/sync.md) | `0x7c0004ac` | integer | Synchronize | +| [`lfdx`](../memory/lfd.md) | `0x7c0004ae` | memory | Load Floating-Point Double Indexed | +| [`lfdux`](../memory/lfd.md) | `0x7c0004ee` | memory | Load Floating-Point Double with Update Indexed | +| [`stvlx`](../memory/stvlx.md) | `0x7c00050e` | memory | Store Vector Left Indexed | +| [`stdbrx`](../memory/stdbrx.md) | `0x7c000528` | memory | Store Doubleword Byte-Reverse Indexed | +| [`stswx`](../memory/stswx.md) | `0x7c00052a` | memory | Store String Word Indexed | +| [`stwbrx`](../memory/stwbrx.md) | `0x7c00052c` | memory | Store Word Byte-Reverse Indexed | +| [`stfsx`](../memory/stfs.md) | `0x7c00052e` | memory | Store Floating-Point Single Indexed | +| [`stvrx`](../memory/stvrx.md) | `0x7c00054e` | memory | Store Vector Right Indexed | +| [`stfsux`](../memory/stfs.md) | `0x7c00056e` | memory | Store Floating-Point Single with Update Indexed | +| [`stswi`](../memory/stswi.md) | `0x7c0005aa` | memory | Store String Word Immediate | +| [`stfdx`](../memory/stfd.md) | `0x7c0005ae` | memory | Store Floating-Point Double Indexed | +| [`stfdux`](../memory/stfd.md) | `0x7c0005ee` | memory | Store Floating-Point Double with Update Indexed | +| [`lvlxl`](../memory/lvlxl.md) | `0x7c00060e` | memory | Load Vector Left Indexed LRU | +| [`lhbrx`](../memory/lhbrx.md) | `0x7c00062c` | memory | Load Half Word Byte-Reverse Indexed | +| [`srawx`](../alu/srawx.md) | `0x7c000630` | integer | Shift Right Algebraic Word | +| [`sradx`](../alu/sradx.md) | `0x7c000634` | integer | Shift Right Algebraic Doubleword | +| [`lvrxl`](../memory/lvrxl.md) | `0x7c00064e` | memory | Load Vector Right Indexed LRU | +| [`srawix`](../alu/srawix.md) | `0x7c000670` | integer | Shift Right Algebraic Word Immediate | +| [`eieio`](../alu/eieio.md) | `0x7c0006ac` | integer | Enforce In-Order Execution of I/O | +| [`stvlxl`](../memory/stvlxl.md) | `0x7c00070e` | memory | Store Vector Left Indexed LRU | +| [`sthbrx`](../memory/sthbrx.md) | `0x7c00072c` | memory | Store Half Word Byte-Reverse Indexed | +| [`extshx`](../alu/extshx.md) | `0x7c000734` | integer | Extend Sign Half Word | +| [`stvrxl`](../memory/stvrxl.md) | `0x7c00074e` | memory | Store Vector Right Indexed LRU | +| [`extsbx`](../alu/extsbx.md) | `0x7c000774` | integer | Extend Sign Byte | +| [`icbi`](../memory/icbi.md) | `0x7c0007ac` | memory | Instruction Cache Block Invalidate | +| [`stfiwx`](../memory/stfiwx.md) | `0x7c0007ae` | memory | Store Floating-Point as Integer Word Indexed | +| [`extswx`](../alu/extswx.md) | `0x7c0007b4` | integer | Extend Sign Word | +| [`fcmpu`](../fpu/fcmpu.md) | `0xfc000000` | fpu | Floating Compare Unordered | +| [`frspx`](../fpu/frspx.md) | `0xfc000018` | fpu | Floating Round to Single | +| [`fctiwx`](../fpu/fctiwx.md) | `0xfc00001c` | fpu | Floating Convert to Integer Word | +| [`fctiwzx`](../fpu/fctiwzx.md) | `0xfc00001e` | fpu | Floating Convert to Integer Word with Round Toward Zero | +| [`fcmpo`](../fpu/fcmpo.md) | `0xfc000040` | fpu | Floating Compare Ordered | +| [`mtfsb1x`](../control/mtfsb1x.md) | `0xfc00004c` | control | Move to FPSCR Bit 1 | +| [`fnegx`](../fpu/fnegx.md) | `0xfc000050` | fpu | Floating Negate | +| [`mcrfs`](../control/mcrfs.md) | `0xfc000080` | control | Move to Condition Register from FPSCR | +| [`mtfsb0x`](../control/mtfsb0x.md) | `0xfc00008c` | control | Move to FPSCR Bit 0 | +| [`fmrx`](../fpu/fmrx.md) | `0xfc000090` | fpu | Floating Move Register | +| [`mtfsfix`](../control/mtfsfix.md) | `0xfc00010c` | control | Move to FPSCR Field Immediate | +| [`fnabsx`](../fpu/fnabsx.md) | `0xfc000110` | fpu | Floating Negative Absolute Value | +| [`fabsx`](../fpu/fabsx.md) | `0xfc000210` | fpu | Floating Absolute Value | +| [`mffsx`](../control/mffsx.md) | `0xfc00048e` | control | Move from FPSCR | +| [`fctidx`](../fpu/fctidx.md) | `0xfc00065c` | fpu | Floating Convert to Integer Doubleword | +| [`fctidzx`](../fpu/fctidzx.md) | `0xfc00065e` | fpu | Floating Convert to Integer Doubleword with Round Toward Zero | +| [`fcfidx`](../fpu/fcfidx.md) | `0xfc00069c` | fpu | Floating Convert From Integer Doubleword | + + diff --git a/migration/project-root/ppc-manual/forms/XFL.md b/migration/project-root/ppc-manual/forms/XFL.md new file mode 100644 index 0000000..50d0b38 --- /dev/null +++ b/migration/project-root/ppc-manual/forms/XFL.md @@ -0,0 +1,23 @@ +# Form `XFL` — XFL — Floating Fields (mtfsf) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (63) | +| 6 | `L` | field-select behaviour | +| 7–14 | `FM` | FPSCR field mask | +| 15 | `W` | immediate-value flag | +| 16–20 | `FRB` | source FPR | +| 21–30 | `XO` | extended opcode | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`mtfsfx`](../control/mtfsfx.md) | `0xfc00058e` | control | Move to FPSCR Fields | + + diff --git a/migration/project-root/ppc-manual/forms/XFX.md b/migration/project-root/ppc-manual/forms/XFX.md new file mode 100644 index 0000000..09c94e6 --- /dev/null +++ b/migration/project-root/ppc-manual/forms/XFX.md @@ -0,0 +1,24 @@ +# Form `XFX` — XFX — Fixed (SPR/TBR/CR-field access) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination / source GPR | +| 11–20 | `spr/tbr/FXM` | SPR/TBR number (byte-swapped halves) or CR field mask | +| 21–30 | `XO` | extended opcode | +| 31 | `—` | reserved | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`mtcrf`](../control/mtcrf.md) | `0x7c000120` | control | Move to Condition Register Fields | +| [`mfspr`](../control/mfspr.md) | `0x7c0002a6` | control | Move from Special-Purpose Register | +| [`mftb`](../control/mftb.md) | `0x7c0002e6` | control | Move from Time Base | +| [`mtspr`](../control/mtspr.md) | `0x7c0003a6` | control | Move to Special-Purpose Register | + + diff --git a/migration/project-root/ppc-manual/forms/XL.md b/migration/project-root/ppc-manual/forms/XL.md new file mode 100644 index 0000000..819bdae --- /dev/null +++ b/migration/project-root/ppc-manual/forms/XL.md @@ -0,0 +1,33 @@ +# Form `XL` — XL — Extended, Link (branch-to-LR/CTR, CR logical) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (19) | +| 6–10 | `BT/BO` | target / branch options | +| 11–15 | `BA/BI` | source A / CR bit to test | +| 16–20 | `BB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `LK` | link flag | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`mcrf`](../control/mcrf.md) | `0x4c000000` | control | Move Condition Register Field | +| [`bclrx`](../branch/bclrx.md) | `0x4c000020` | branch | Branch Conditional to Link Register | +| [`crnor`](../control/crnor.md) | `0x4c000042` | control | Condition Register NOR | +| [`crandc`](../control/crandc.md) | `0x4c000102` | control | Condition Register AND with Complement | +| [`isync`](../alu/isync.md) | `0x4c00012c` | integer | Instruction Synchronize | +| [`crxor`](../control/crxor.md) | `0x4c000182` | control | Condition Register XOR | +| [`crnand`](../control/crnand.md) | `0x4c0001c2` | control | Condition Register NAND | +| [`crand`](../control/crand.md) | `0x4c000202` | control | Condition Register AND | +| [`creqv`](../control/creqv.md) | `0x4c000242` | control | Condition Register Equivalent | +| [`crorc`](../control/crorc.md) | `0x4c000342` | control | Condition Register OR with Complement | +| [`cror`](../control/cror.md) | `0x4c000382` | control | Condition Register OR | +| [`bcctrx`](../branch/bcctrx.md) | `0x4c000420` | branch | Branch Conditional to Count Register | + + diff --git a/migration/project-root/ppc-manual/forms/XO.md b/migration/project-root/ppc-manual/forms/XO.md new file mode 100644 index 0000000..4b72d71 --- /dev/null +++ b/migration/project-root/ppc-manual/forms/XO.md @@ -0,0 +1,43 @@ +# Form `XO` — XO — Extended, Overflow (ALU with OE/Rc) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RT` | destination GPR | +| 11–15 | `RA` | source A | +| 16–20 | `RB` | source B | +| 21 | `OE` | overflow-enable flag | +| 22–30 | `XO` | extended opcode (9 bits) | +| 31 | `Rc` | record-form flag | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`subfcx`](../alu/subfcx.md) | `0x7c000010` | integer | Subtract From Carrying | +| [`mulhdux`](../alu/mulhdux.md) | `0x7c000012` | integer | Multiply High Doubleword Unsigned | +| [`addcx`](../alu/addcx.md) | `0x7c000014` | integer | Add Carrying | +| [`mulhwux`](../alu/mulhwux.md) | `0x7c000016` | integer | Multiply High Word Unsigned | +| [`subfx`](../alu/subfx.md) | `0x7c000050` | integer | Subtract From | +| [`mulhdx`](../alu/mulhdx.md) | `0x7c000092` | integer | Multiply High Doubleword | +| [`mulhwx`](../alu/mulhwx.md) | `0x7c000096` | integer | Multiply High Word | +| [`negx`](../alu/negx.md) | `0x7c0000d0` | integer | Negate | +| [`subfex`](../alu/subfex.md) | `0x7c000110` | integer | Subtract From Extended | +| [`addex`](../alu/addex.md) | `0x7c000114` | integer | Add Extended | +| [`subfzex`](../alu/subfzex.md) | `0x7c000190` | integer | Subtract From Zero Extended | +| [`addzex`](../alu/addzex.md) | `0x7c000194` | integer | Add to Zero Extended | +| [`subfmex`](../alu/subfmex.md) | `0x7c0001d0` | integer | Subtract From Minus One Extended | +| [`mulldx`](../alu/mulldx.md) | `0x7c0001d2` | integer | Multiply Low Doubleword | +| [`addmex`](../alu/addmex.md) | `0x7c0001d4` | integer | Add to Minus One Extended | +| [`mullwx`](../alu/mullwx.md) | `0x7c0001d6` | integer | Multiply Low Word | +| [`addx`](../alu/addx.md) | `0x7c000214` | integer | Add | +| [`divdux`](../alu/divdux.md) | `0x7c000392` | integer | Divide Doubleword Unsigned | +| [`divwux`](../alu/divwux.md) | `0x7c000396` | integer | Divide Word Unsigned | +| [`divdx`](../alu/divdx.md) | `0x7c0003d2` | integer | Divide Doubleword | +| [`divwx`](../alu/divwx.md) | `0x7c0003d6` | integer | Divide Word | + + diff --git a/migration/project-root/ppc-manual/forms/XS.md b/migration/project-root/ppc-manual/forms/XS.md new file mode 100644 index 0000000..dd79c03 --- /dev/null +++ b/migration/project-root/ppc-manual/forms/XS.md @@ -0,0 +1,23 @@ +# Form `XS` — XS — Extended, Shift (64-bit sradi) + +## Bit Layout + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `RS` | source GPR | +| 11–15 | `RA` | destination GPR | +| 16–20 | `sh` | shift amount low 5 bits | +| 21–29 | `XO` | extended opcode (9 bits) | +| 30 | `sh5` | shift amount high bit | +| 31 | `Rc` | record-form flag | + +## Instructions Using This Form + + + +| Mnemonic | Opcode | Group | Description | +| --- | --- | --- | --- | +| [`sradix`](../alu/sradix.md) | `0x7c000674` | integer | Shift Right Algebraic Doubleword Immediate | + + diff --git a/migration/project-root/ppc-manual/fpu/fabsx.md b/migration/project-root/ppc-manual/fpu/fabsx.md new file mode 100644 index 0000000..a349397 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fabsx.md @@ -0,0 +1,121 @@ +# `fabsx` — Floating Absolute Value + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc000210` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fabs` | `fabsx` | — | Floating Absolute Value | +| `fabs.` | `fabsx` | Rc=1 | Floating Absolute Value | + +## Syntax + +```asm +fabs[Rc] [FD], [FB] +``` + +## Encoding + +### `fabsx` — form `X` + +- **Opcode word:** `0xfc000210` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `264` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FB` | fabsx: read | Source B floating-point register. | +| `FD` | fabsx: write | Destination floating-point register. | +| `CR` | fabsx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `fabsx` + +- **Reads (always):** `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fabsx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +FRT <- clear_sign(FRB) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fabsx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fabsx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:478`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L478) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:27`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L27) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:909`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L909) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2757-2761`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2757-L2761) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fabsx => { + ctx.fpr[instr.rd()] = ctx.fpr[instr.rb()].abs(); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Bit-pattern operation, no rounding.** `fabs` clears the sign bit (bit 0) of the source FPR's binary64 representation and writes the 64-bit value to the destination unchanged otherwise. No precision loss, no FPSCR exception bits. The mnemonic does not have an `s` variant — there is one form regardless of whether the operand is interpreted as binary32 or binary64. +- **NaN handling.** `fabs(NaN)` returns the same NaN with the sign bit cleared. The signalling/quiet bit is **not** modified, and `FPSCR[VXSNAN]` is **not** raised. xenia-rs uses `f64::abs`, which matches: it is bit-level `x & 0x7FFF_FFFF_FFFF_FFFF`. +- **Special values.** `fabs(±0) = +0`; `fabs(±∞) = +∞`; `fabs(±NaN)` = `+NaN` (sign cleared, payload preserved). +- **FPSCR is largely untouched.** Hardware specifies `FPRF` is **not** updated by `fabs`, and no exception bits are raised. Notation in the page header about `FPSCR` write is generic — the only meaningful write is via `Rc=1`. +- **`Rc=1` (`fabs.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1 (these bits are typically stale or zero). +- **No `FRA` operand.** X-form, primary 63, XO 264. Reads `FRB` only; bits 11–15 are don't-care. +- **Common idiom.** `fabs` followed by `fcmpu` against a small constant for ULP-sized "near zero" tests; or paired with `fneg`/`fnabs` for sign-set-to-known operations. + +## Related Instructions + +- [`fnegx`](fnegx.md) — flip sign bit. +- [`fnabsx`](fnabsx.md) — absolute value with sign **set** (always negative result). +- [`fmrx`](fmrx.md) — copy FPR (no sign manipulation). +- [`fselx`](fselx.md) — branch-free select; combined with `fabs` for `min`/`max`/`clamp` patterns. +- [`fcmpux`](fcmpu.md), [`fcmpox`](fcmpo.md) — compares often paired with `fabs` for magnitude tests. + +## IBM Reference + +- [AIX 7.3 — `fabs` (Floating Absolute Value)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fabs-floating-absolute-value-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (sign-bit manipulation defined as bit-pattern, not arithmetic). diff --git a/migration/project-root/ppc-manual/fpu/faddsx.md b/migration/project-root/ppc-manual/fpu/faddsx.md new file mode 100644 index 0000000..7bc26a1 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/faddsx.md @@ -0,0 +1,130 @@ +# `faddsx` — Floating Add Single + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xec00002a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fadds` | `faddsx` | — | Floating Add Single | +| `fadds.` | `faddsx` | Rc=1 | Floating Add Single | + +## Syntax + +```asm +fadds[Rc] [FD], [FA], [FB] +``` + +## Encoding + +### `faddsx` — form `A` + +- **Opcode word:** `0xec00002a` +- **Primary opcode (bits 0–5):** `59` +- **Extended opcode:** `21` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | faddsx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FB` | faddsx: read | Source B floating-point register. | +| `FD` | faddsx: write | Destination floating-point register. | +| `CR` | faddsx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | faddsx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `faddsx` + +- **Reads (always):** `FA`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `faddsx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +FRT <- RoundToSingle(FRA + FRB) ; single-precision +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`faddsx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="faddsx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:46`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L46) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:27`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L27) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:388`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L388) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2565-2574`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2565-L2574) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::faddsx => { + let a = ctx.fpr[instr.ra()]; + let b = ctx.fpr[instr.rb()]; + fpscr::check_invalid_add(ctx, a, b, false); + let result = to_single(ctx, a + b); + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && b.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single precision via double FPRs.** The trailing `s` in the mnemonic means the result is rounded to IEEE-754 binary32 after the addition, then re-encoded into the 64-bit FPR using the binary64 representation of that single-precision value. The host computes `to_single(a + b)`; both source operands are read as full binary64. +- **FPSCR side effects.** Hardware updates `FPRF` (result class), `FR`/`FI` (rounding info), `FX`, and the exception bits — `OX` on overflow, `UX` on underflow, `XX` on inexact, `VXISI` on `±∞ − ±∞`, `VXSNAN` on a signalling-NaN input. xenia-rs does **not** maintain FPSCR in the interpreter snapshot — call this out as a xenia quirk if you depend on cross-instruction FPSCR observation. +- **`Rc=1` (`fadds.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. xenia models this via `update_cr1_from_fpscr()`. +- **NaN propagation.** Any NaN input yields a quiet NaN result; signalling NaNs are quietened (signalling bit cleared) per PowerISA. Host-native `f64 +` may not perform that quietening on every platform. +- **`±∞ − ±∞` after rounding.** Although `+`-shaped, opposite-signed infinities still produce `QNaN(VXISI)`. +- **`FPSCR[NI]` (non-IEEE / flush-to-zero)** is set at Xenon boot, so denormal results normally flush to zero. Xenia inherits host semantics, which is IEEE-compliant by default; titles tuned around flush-to-zero may see slightly different denormal rounding under xenia. +- **Rounding mode** uses `FPSCR[RN]` (00 nearest-even, 01 toward 0, 10 toward +∞, 11 toward −∞). Default is nearest-even and is rarely changed. +- **A-form encoding ignores `FRC`.** Bits 21–25 are don't-care for the add family. + +## Related Instructions + +- [`faddx`](faddx.md) — double-precision sibling. +- [`fsubsx`](fsubsx.md), [`fmulsx`](fmulsx.md), [`fdivsx`](fdivsx.md) — other single-precision arithmetic ops. +- [`fmaddsx`](fmaddsx.md), [`fmsubsx`](fmsubsx.md), [`fnmaddsx`](fnmaddsx.md), [`fnmsubsx`](fnmsubsx.md) — fused multiply-add single-precision family (single rounding step). +- [`frspx`](frspx.md) — explicit double→single rounding helper; `fadds` is essentially `frsp(fadd)` fused into one rounding. +- [`mffsx`](mffsx.md), [`mtfsfx`](mtfsfx.md) — read/write FPSCR for rounding-mode and exception control. + +## IBM Reference + +- [AIX 7.3 — `fadds` (Floating Add Single)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fadds-floating-add-single-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (single-precision rounding rules and FPSCR side effects). diff --git a/migration/project-root/ppc-manual/fpu/faddx.md b/migration/project-root/ppc-manual/fpu/faddx.md new file mode 100644 index 0000000..a64a65e --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/faddx.md @@ -0,0 +1,143 @@ +# `faddx` — Floating Add + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xfc00002a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fadd` | `faddx` | — | Floating Add | +| `fadd.` | `faddx` | Rc=1 | Floating Add | + +## Syntax + +```asm +fadd[Rc] [FD], [FA], [FB] +``` + +## Encoding + +### `faddx` — form `A` + +- **Opcode word:** `0xfc00002a` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `21` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | faddx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FB` | faddx: read | Source B floating-point register. | +| `FD` | faddx: write | Destination floating-point register. | +| `CR` | faddx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | faddx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `faddx` + +- **Reads (always):** `FA`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `faddx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +FRT <- FRA + FRB ; double-precision +``` + +## C Translation Example + +```c +/* fadd / fadd. — IEEE-754 double-precision add (A-form) */ +f[insn.FRT] = f[insn.FRA] + f[insn.FRB]; +if (insn.Rc) update_cr1_from_fpscr(); +/* FPSCR[FPRF, FR, FI, FX, exceptions] implicitly updated by the FPU. */ +``` + +## Implementation References + +**`faddx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="faddx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:38`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L38) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:27`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L27) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:922`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L922) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2555-2564`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2555-L2564) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::faddx => { + let a = ctx.fpr[instr.ra()]; + let b = ctx.fpr[instr.rb()]; + fpscr::check_invalid_add(ctx, a, b, false); + let result = a + b; + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && b.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Extended Pseudocode + +``` +FRT <- round(FRA + FRB, FPSCR[RN]) ; double precision, current rounding mode + +; FPSCR side-effects (always) + FPSCR[FPRF] <- classify(FRT) ; sign / class bits + FPSCR[FR,FI] <- round_info + if overflow then FPSCR[OX] <- 1; FPSCR[FX] <- 1 + if underflow then FPSCR[UX] <- 1; FPSCR[FX] <- 1 + if inexact then FPSCR[XX] <- 1; FPSCR[FX] <- 1 + if NaN input or ±∞−±∞ then FPSCR[VXISI]<- 1; FPSCR[FX] <- 1 + FPSCR[FEX] <- any-enabled-exception + +if Rc then + CR1 <- FPSCR[FX, FEX, VX, OX] ; the four "summary" bits +``` + +## Special Cases & Edge Conditions + +- **Double precision.** `fadd` always operates on IEEE-754 binary64 regardless of whether either source was produced by a single-precision instruction. Single-precision adds use [`faddsx`](faddsx.md) and automatically round the result to binary32 precision. +- **No immediate / carry / OE.** FPU arithmetic has no immediate forms, no carry, and no overflow-enable bit. `Rc` is the only modifier — it writes `CR1` from the four top FPSCR bits. +- **FPSCR is always updated.** Even the non-record form (`fadd`) updates `FPSCR[FPRF, FR, FI, FX, …]` as a side effect of execution; xenia's interpreter currently **does not** model this, so translations that rely on observing FPSCR bits across a pair of FPU instructions will diverge from hardware. If your translator needs compatible FPSCR state, emit explicit updates — or accept the simplification, which matches real Xbox 360 title behaviour in practice (titles rarely read FPSCR except via `mffs` for exception sanity checks). +- **NaN propagation.** Per IEEE-754, any NaN input produces a NaN output; PowerPC specifies that the *signalling* bit of the result NaN is cleared (quietening a signalling input). Xenia uses host-native `f64 +`, which may preserve the signalling bit on some platforms — assume quietening for correctness. +- **`±∞ − ±∞` is an invalid operation.** Produces a quiet NaN (`QNaN(VXISI)`) and sets `FPSCR[VXISI]`. Xenia emits the host-native NaN. +- **Denormal handling.** Xenon's default mode flushes denormal results to zero (FPSCR[NI] / "non-IEEE mode" bit set at boot). Xenia inherits host semantics by default; if title code explicitly clears NI (rare) you'll get IEEE-compliant denormals from the host FPU. +- **Rounding mode.** `FPSCR[RN]` selects one of four rounding modes (nearest-even, toward 0, toward +∞, toward −∞). Games rarely change RN from the default nearest-even. If your translator needs faithful rounding-mode support emit `fesetround` around the operation. +- **Register encoding.** A-form: `FRT`, `FRA`, `FRB`, `FRC`, `Rc` — but `fadd` ignores `FRC` (the "C" multiplier operand used by `fmadd`-style ops). The `FRC` field is architecturally don't-care but typically encoded as 0. + +## Related Instructions + +- [`faddsx`](faddsx.md) — single-precision add; result is rounded to binary32 then stored as binary64. +- [`fsubx`](fsubx.md), [`fsubsx`](fsubsx.md) — double / single subtract. +- [`fmulx`](fmulx.md), [`fmulsx`](fmulsx.md) — double / single multiply. +- [`fmaddx`](fmaddx.md), [`fmsubx`](fmsubx.md), [`fnmaddx`](fnmaddx.md), [`fnmsubx`](fnmsubx.md) — fused multiply-add family (single-rounding; preferred for dot products). +- [`mffsx`](mffsx.md), [`mtfsfx`](mtfsfx.md) — read/write FPSCR. + +## IBM Reference + +- [AIX 7.3 — `fadd` (Floating Add)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fa-fadd-floating-add-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (complete FPSCR and NaN-propagation rules). diff --git a/migration/project-root/ppc-manual/fpu/fcfidx.md b/migration/project-root/ppc-manual/fpu/fcfidx.md new file mode 100644 index 0000000..3ed3168 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fcfidx.md @@ -0,0 +1,142 @@ +# `fcfidx` — Floating Convert From Integer Doubleword + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc00069c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fcfid` | `fcfidx` | — | Floating Convert From Integer Doubleword | +| `fcfid.` | `fcfidx` | Rc=1 | Floating Convert From Integer Doubleword | + +## Syntax + +```asm +fcfid[Rc] [FD], [FB] +``` + +## Encoding + +### `fcfidx` — form `X` + +- **Opcode word:** `0xfc00069c` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `846` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FB` | fcfidx: read | Source B floating-point register. | +| `FD` | fcfidx: write | Destination floating-point register. | +| `CR` | fcfidx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fcfidx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fcfidx` + +- **Reads (always):** `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fcfidx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fcfidx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fcfidx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:253`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L253) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:27`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L27) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:914`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L914) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2872-2885`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2872-L2885) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fcfidx => { + // Convert from integer doubleword: frD = (double)(int64_t)frB_as_bits. + // PPCBUG-224: set XX when |i64| > 2^53 (precision loss in conversion). + let bits = ctx.fpr[instr.rb()].to_bits(); + let i = bits as i64; + let result = i as f64; + if (result as i64) != i { + fpscr::set_exception(ctx, fpscr::XX); + } + ctx.fpr[instr.rd()] = result; + fpscr::set_fprf(ctx, fpscr::classify_fprf(result)); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **64-bit signed integer → binary64.** Reads `FRB` as a 64-bit signed integer (the bits, interpreted as `i64`) and converts it to IEEE-754 binary64. xenia-rs implements this as `bits as i64 as f64`. +- **Loss of precision.** binary64 has 53 bits of significand, so `i64` values with magnitude > 2^53 lose low-order bits. This raises `FPSCR[XX, FX]` (inexact) on hardware. xenia-rs does not update FPSCR (xenia quirk) but the rounded value matches host `f64` rules (round-to-nearest-even by default). +- **Always exact for `|x| <= 2^53`.** Within ±9,007,199,254,740,992 the conversion is bit-exact. +- **Rounding mode.** Uses `FPSCR[RN]`. Default nearest-even. Rust's `as f64` from `i64` uses platform-native conversion which on Xenon-target hosts will respect the FE rounding mode; xenia uses the host default. +- **No NaN/∞ generation.** All `i64` inputs map to finite `f64` outputs (the largest `i64` is well below `f64::MAX`). +- **FPSCR side effects.** Hardware updates `FPRF` (result class) and may set `XX`/`FX` on inexact. xenia does not update FPSCR. +- **`Rc=1` (`fcfid.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **Encoding.** X-form, primary 63, XO 846. Reads `FRB` only. +- **Common pairing.** Used after `lfd` of a stored `i64` to bring an integer into the FP pipeline for arithmetic; the inverse direction is [`fctidx`](fctidx.md) / [`fctidzx`](fctidzx.md). + +## Related Instructions + +- [`fctidx`](fctidx.md), [`fctidzx`](fctidzx.md) — inverse direction (binary64 → 64-bit integer, current rounding / round-toward-zero). +- [`fctiwx`](fctiwx.md), [`fctiwzx`](fctiwzx.md) — 32-bit integer conversion variants. +- [`frspx`](frspx.md) — round to single precision; commonly chained after `fcfid` to produce a `float`. +- `lfd`, `stfd` — load/store doubleword used to move integer values between GPR and FPR via memory. +- [`mffsx`](mffsx.md), [`mtfsfx`](mtfsfx.md) — control rounding mode used by the conversion. + +## IBM Reference + +- [AIX 7.3 — `fcfid` (Floating Convert From Integer Doubleword)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fcfid-floating-convert-from-integer-doubleword-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (integer→FP conversion semantics). diff --git a/migration/project-root/ppc-manual/fpu/fcmpo.md b/migration/project-root/ppc-manual/fpu/fcmpo.md new file mode 100644 index 0000000..87f02b2 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fcmpo.md @@ -0,0 +1,164 @@ +# `fcmpo` — Floating Compare Ordered + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc000040` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fcmpo` | `fcmpo` | — | Floating Compare Ordered | + +## Syntax + +```asm +fcmpo [CRFD], [FA], [FB] +``` + +## Encoding + +### `fcmpo` — form `X` + +- **Opcode word:** `0xfc000040` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `32` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fcmpo: read | Source A floating-point register (`fr0`–`fr31`). | +| `FB` | fcmpo: read | Source B floating-point register. | +| `CRFD` | fcmpo: write | CR destination field (`crf`, 0–7). | +| `FPSCR` | fcmpo: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fcmpo` + +- **Reads (always):** `FA`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRFD`, `FPSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `fcmpo`: **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fcmpo`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fcmpo"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:362`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L362) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:27`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L27) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:901`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L901) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3002-3032`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3002-L3032) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fcmpo => { + // Ordered compare: like fcmpu but also sets VXVC on QNaN (or VXSNAN on SNaN). + let fra = ctx.fpr[instr.ra()]; + let frb = ctx.fpr[instr.rb()]; + let crfd = instr.crfd(); + if fra.is_nan() || frb.is_nan() { + ctx.cr[crfd] = crate::context::CrField { lt: false, gt: false, eq: false, so: true }; + if fpscr::is_snan(fra) || fpscr::is_snan(frb) { + fpscr::set_exception(ctx, fpscr::VXSNAN | fpscr::VXVC); + } else { + fpscr::set_exception(ctx, fpscr::VXVC); + } + } else if fra < frb { + ctx.cr[crfd] = crate::context::CrField { lt: true, gt: false, eq: false, so: false }; + } else if fra > frb { + ctx.cr[crfd] = crate::context::CrField { lt: false, gt: true, eq: false, so: false }; + } else { + ctx.cr[crfd] = crate::context::CrField { lt: false, gt: false, eq: true, so: false }; + } + let fprf = if fra.is_nan() || frb.is_nan() { + 0b0_0001 + } else if fra < frb { + 0b0_1000 + } else if fra > frb { + 0b0_0100 + } else { + 0b0_0010 + }; + fpscr::set_fprf(ctx, fprf); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Ordered compare.** Same CR-field semantics as `fcmpu` (`LT/GT/EQ/SO`), but NaN inputs raise additional FPSCR exceptions: + - Either operand NaN → `FPSCR[VXVC] = 1` (invalid-operation: compare on QNaN/SNaN). + - Either operand signalling NaN → also `FPSCR[VXSNAN] = 1`. + - All NaN cases also set `FX = 1` and `VX = 1`. +- **xenia quirk.** xenia-rs's `fcmpo` body is identical to `fcmpu` — the FPSCR exception bits are not modelled. The xenia source comment explicitly notes "Same as fcmpu but sets FPSCR exception bits for QNaN (not modeled yet)". Title code that polls FPSCR for compare-class invalid-operation will not observe it. +- **CR field bits.** + - `LT` (bit 0) — `FRA < FRB` + - `GT` (bit 1) — `FRA > FRB` + - `EQ` (bit 2) — `FRA == FRB` + - `SO` (bit 3) — unordered (NaN involved) +- **`+0` and `-0` compare equal.** +- **No `Rc` bit.** +- **FPSCR side effects.** Hardware updates `FPSCR[FPCC]`, `FX`, `VX`, and (on NaN) `VXVC`/`VXSNAN`. xenia-rs only updates the CR field. +- **Use case.** Ordered compares are required by C/C++ semantics for `<`, `>`, `<=`, `>=` (which must signal on NaN per IEEE-754). `fcmpu` corresponds to the C `==`/`!=` semantics (which do not signal). +- **Encoding.** X-form, primary 63, XO 32. + +## Related Instructions + +- [`fcmpux`](fcmpu.md) — unordered compare; identical CR result, no `VXVC`. +- `mcrf`, `mcrfs`, `mfcr` — fan-out CR fields after compare. +- `bc`, `bclr`, `bcctr` — conditional branches consume `LT/GT/EQ/SO`. +- [`fselx`](fselx.md) — branch-free alternative for single-key compares. +- [`mcrfs`](mcrfs.md), [`mffsx`](mffsx.md) — move FPSCR/CR. + +## IBM Reference + +- [AIX 7.3 — `fcmpo` (Floating Compare Ordered)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fcmpo-floating-compare-ordered-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (`fcmpo` raises `VXVC` on QNaN; both raise `VXSNAN` on SNaN). diff --git a/migration/project-root/ppc-manual/fpu/fcmpu.md b/migration/project-root/ppc-manual/fpu/fcmpu.md new file mode 100644 index 0000000..f588a92 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fcmpu.md @@ -0,0 +1,161 @@ +# `fcmpu` — Floating Compare Unordered + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fcmpu` | `fcmpu` | — | Floating Compare Unordered | + +## Syntax + +```asm +fcmpu [CRFD], [FA], [FB] +``` + +## Encoding + +### `fcmpu` — form `X` + +- **Opcode word:** `0xfc000000` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `0` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fcmpu: read | Source A floating-point register (`fr0`–`fr31`). | +| `FB` | fcmpu: read | Source B floating-point register. | +| `CRFD` | fcmpu: write | CR destination field (`crf`, 0–7). | +| `FPSCR` | fcmpu: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fcmpu` + +- **Reads (always):** `FA`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `CRFD`, `FPSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `fcmpu`: **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fcmpu`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fcmpu"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:365`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L365) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:27`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L27) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:897`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L897) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2972-3001`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2972-L3001) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fcmpu => { + let fra = ctx.fpr[instr.ra()]; + let frb = ctx.fpr[instr.rb()]; + let crfd = instr.crfd(); + if fra.is_nan() || frb.is_nan() { + ctx.cr[crfd] = crate::context::CrField { lt: false, gt: false, eq: false, so: true }; + // fcmpu: VXSNAN on SNaN input; no VXVC even on QNaN. + if fpscr::is_snan(fra) || fpscr::is_snan(frb) { + fpscr::set_exception(ctx, fpscr::VXSNAN); + } + } else if fra < frb { + ctx.cr[crfd] = crate::context::CrField { lt: true, gt: false, eq: false, so: false }; + } else if fra > frb { + ctx.cr[crfd] = crate::context::CrField { lt: false, gt: true, eq: false, so: false }; + } else { + ctx.cr[crfd] = crate::context::CrField { lt: false, gt: false, eq: true, so: false }; + } + // Also mirror the comparison result into FPSCR[FPRF (FL/FG/FE/FU)]. + let fprf = if fra.is_nan() || frb.is_nan() { + 0b0_0001 + } else if fra < frb { + 0b0_1000 + } else if fra > frb { + 0b0_0100 + } else { + 0b0_0010 + }; + fpscr::set_fprf(ctx, fprf); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unordered compare.** "Unordered" means NaN inputs do **not** signal an invalid-operation exception — they merely set the unordered (`SO`) bit in the destination CR field. Use [`fcmpox`](fcmpo.md) when NaN should raise `VXSNAN`/`VXVC`. +- **CR field bits.** Writes the 4-bit CR field selected by `BF` (`crfd`): + - `LT` (bit 0) — `FRA < FRB` + - `GT` (bit 1) — `FRA > FRB` + - `EQ` (bit 2) — `FRA == FRB` + - `SO` (bit 3) — **unordered** (one or both operands is NaN) +- **NaN handling.** Either operand NaN → set `SO=1`, clear `LT/GT/EQ`. xenia-rs matches. +- **Signalling NaN.** Per PowerISA, `fcmpu` sets `FPSCR[VXSNAN]` if either operand is a signalling NaN, but does **not** set `FPSCR[VXVC]` (the difference vs `fcmpo`). xenia-rs does **not** model this — **xenia quirk**: `fcmpu` and `fcmpo` are observationally identical in xenia. +- **`+0` and `-0` compare equal.** Standard IEEE rule; xenia's host `<` / `>` on `f64` matches. +- **No `Rc` bit.** The CR field destination is encoded in the instruction (`BF`); there's no record-form variant. +- **FPSCR side effects.** Hardware updates `FPSCR[FPCC]` (the four-bit floating-point condition code) and `FPSCR[FX]`. xenia-rs does not maintain `FPCC`. +- **Precision-agnostic.** Compares the full binary64 values; works equally for single-precision values stored in FPRs (they are bit-identical to their double-precision representation). +- **Encoding.** X-form, primary 63, XO 0. Bits 9–10 of `BF` are unused (reserved 0). + +## Related Instructions + +- [`fcmpox`](fcmpo.md) — ordered compare; raises `VXSNAN`/`VXVC` on NaN/SNaN. +- `mcrf`, `mcrfs`, `mfcr` — copy CR fields, useful after `fcmpu` to fan out the result. +- `bc`, `bclr`, `bcctr` — conditional branches consume the CR fields written by `fcmpu`. +- [`fselx`](fselx.md) — branch-free alternative when only the sign of `FRA - FRB` is needed. +- [`mcrfs`](mcrfs.md), [`mffsx`](mffsx.md) — move FPSCR data into the CR. + +## IBM Reference + +- [AIX 7.3 — `fcmpu` (Floating Compare Unordered)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fcmpu-floating-compare-unordered-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (compare semantics, `FPCC` updates, NaN/SNaN exception rules). diff --git a/migration/project-root/ppc-manual/fpu/fctidx.md b/migration/project-root/ppc-manual/fpu/fctidx.md new file mode 100644 index 0000000..33f30a2 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fctidx.md @@ -0,0 +1,149 @@ +# `fctidx` — Floating Convert to Integer Doubleword + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc00065c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fctid` | `fctidx` | — | Floating Convert to Integer Doubleword | +| `fctid.` | `fctidx` | Rc=1 | Floating Convert to Integer Doubleword | + +## Syntax + +```asm +fctid[Rc] [FD], [FB] +``` + +## Encoding + +### `fctidx` — form `X` + +- **Opcode word:** `0xfc00065c` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `814` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FB` | fctidx: read | Source B floating-point register. | +| `FD` | fctidx: write | Destination floating-point register. | +| `CR` | fctidx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fctidx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fctidx` + +- **Reads (always):** `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fctidx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fctidx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fctidx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:280`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L280) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:27`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L27) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:912`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L912) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2886-2906`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2886-L2906) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fctidx => { + // Convert to integer doubleword (round per FPSCR[RN]). + // PPCBUG-229: set XX on inexact (fractional input). + let val = ctx.fpr[instr.rb()]; + let result = if val.is_nan() { + fpscr::set_exception(ctx, fpscr::VXCVI | if fpscr::is_snan(val) { fpscr::VXSNAN } else { 0 }); + 0x8000_0000_0000_0000u64 + } else if val >= (i64::MAX as f64) { + fpscr::set_exception(ctx, fpscr::VXCVI); + 0x7FFF_FFFF_FFFF_FFFFu64 + } else if val < (i64::MIN as f64) { + fpscr::set_exception(ctx, fpscr::VXCVI); + 0x8000_0000_0000_0000u64 + } else { + if val != val.trunc() { fpscr::set_exception(ctx, fpscr::XX); } + fpscr::round_to_i64(ctx, val) as u64 + }; + ctx.fpr[instr.rd()] = f64::from_bits(result); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **binary64 → 64-bit signed integer, current rounding mode.** Result is the integer rounded per `FPSCR[RN]`, packed into the 64-bit FPR as raw bits (the FPR is reinterpreted as an `i64` by subsequent `stfd`/integer code). +- **Saturation on out-of-range.** Per PowerISA, values outside `[i64::MIN, i64::MAX]` (or NaN) yield the most-negative integer (`0x8000_0000_0000_0000`) and set `FPSCR[VXCVI, VX, FX]`. xenia-rs special-cases NaN to `0x8000_0000_0000_0000` but **does not saturate** out-of-range finite values — Rust's `as i64` from a too-large `f64` produces an undefined-then-saturated result that may differ from the PPC convention. **xenia quirk:** very-large finite inputs may round to a different sentinel than hardware. +- **xenia round implementation.** xenia uses Rust's `f64::round`, which rounds half-cases **away from zero** (NOT round-to-nearest-even). PowerISA round-to-nearest in default mode rounds half-cases to even. **xenia quirk:** values like `0.5`, `1.5`, `2.5` may produce different integers (xenia: `1, 2, 3`; PPC default: `0, 2, 2`). +- **Rounding mode.** PPC uses `FPSCR[RN]` for the rounding direction. xenia ignores the FPSCR mode and always uses `f64::round` (i.e. round-half-away-from-zero) regardless of `RN`. **xenia quirk:** non-default rounding modes are not respected. +- **Inexact.** Sets `FPSCR[XX, FX]` on any non-integer input. xenia does not update FPSCR. +- **NaN.** Returns sentinel `0x8000_0000_0000_0000` and sets `FPSCR[VXCVI]`. xenia matches the sentinel, but does not raise the FPSCR bit. +- **`Rc=1` (`fctid.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **Encoding.** X-form, primary 63, XO 814. Reads `FRB` only. +- **Pair with `stfd`** to extract the `i64` value to memory or a GPR (Xbox 360 has no direct FPR↔GPR move; round-trip via stack). + +## Related Instructions + +- [`fctidzx`](fctidzx.md) — same conversion but always rounds toward zero (truncation). +- [`fctiwx`](fctiwx.md), [`fctiwzx`](fctiwzx.md) — 32-bit integer variants (saturate to `i32` range). +- [`fcfidx`](fcfidx.md) — inverse direction (`i64` → binary64). +- [`mffsx`](mffsx.md), [`mtfsfx`](mtfsfx.md) — control `FPSCR[RN]`. +- `stfd`, `stfiwx` — store the integer-bits FPR to memory; `stfiwx` stores only the low 32 bits (use after `fctiwx` / `fctiwzx`). + +## IBM Reference + +- [AIX 7.3 — `fctid` (Floating Convert to Integer Doubleword)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fctid-floating-convert-integer-doubleword-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (`VXCVI` is the invalid-conversion exception bit; saturation sentinel is `0x8000_0000_0000_0000`). diff --git a/migration/project-root/ppc-manual/fpu/fctidzx.md b/migration/project-root/ppc-manual/fpu/fctidzx.md new file mode 100644 index 0000000..ccc50e4 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fctidzx.md @@ -0,0 +1,151 @@ +# `fctidzx` — Floating Convert to Integer Doubleword with Round Toward Zero + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc00065e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fctidz` | `fctidzx` | — | Floating Convert to Integer Doubleword with Round Toward Zero | +| `fctidz.` | `fctidzx` | Rc=1 | Floating Convert to Integer Doubleword with Round Toward Zero | + +## Syntax + +```asm +fctidz[Rc] [FD], [FB] +``` + +## Encoding + +### `fctidzx` — form `X` + +- **Opcode word:** `0xfc00065e` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `815` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FB` | fctidzx: read | Source B floating-point register. | +| `FD` | fctidzx: write | Destination floating-point register. | +| `CR` | fctidzx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fctidzx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fctidzx` + +- **Reads (always):** `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fctidzx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fctidzx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fctidzx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:285`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L285) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:27`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L27) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:913`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L913) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2907-2927`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2907-L2927) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fctidzx => { + // Convert to integer doubleword (round toward zero). + // PPCBUG-229: set XX on inexact. + let val = ctx.fpr[instr.rb()]; + let result = if val.is_nan() { + fpscr::set_exception(ctx, fpscr::VXCVI | if fpscr::is_snan(val) { fpscr::VXSNAN } else { 0 }); + 0x8000_0000_0000_0000u64 + } else if val >= (i64::MAX as f64) { + fpscr::set_exception(ctx, fpscr::VXCVI); + 0x7FFF_FFFF_FFFF_FFFFu64 + } else if val < (i64::MIN as f64) { + fpscr::set_exception(ctx, fpscr::VXCVI); + 0x8000_0000_0000_0000u64 + } else { + if val != val.trunc() { fpscr::set_exception(ctx, fpscr::XX); } + (val.trunc() as i64) as u64 + }; + ctx.fpr[instr.rd()] = f64::from_bits(result); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **binary64 → 64-bit signed integer, round toward zero.** The "z" suffix forces truncation regardless of `FPSCR[RN]`. xenia-rs uses Rust's `as i64` (which truncates toward zero), bypassing the FPSCR rounding mode entirely — this matches PPC `fctidz` semantics correctly. +- **Saturation on out-of-range.** PowerISA: out-of-range or NaN → `0x8000_0000_0000_0000` and `FPSCR[VXCVI, VX, FX]`. xenia handles NaN explicitly with the sentinel, but uses raw `as i64` for finite values; in current Rust (since 1.45) `as i64` from out-of-range `f64` is **defined to saturate** to `i64::MIN`/`i64::MAX`. So: + - **+∞ or large positive → `i64::MAX`** (`0x7FFF_FFFF_FFFF_FFFF`) under xenia. + - **−∞ or large negative → `i64::MIN`** (`0x8000_0000_0000_0000`) under xenia. + - **PPC** spec returns `0x8000_0000_0000_0000` for both. **xenia quirk:** positive overflow returns the wrong sentinel. +- **NaN.** Returns sentinel `0x8000_0000_0000_0000` (matches PPC). +- **Inexact.** Sets `FPSCR[XX, FX]` on any non-integer input. xenia does not update FPSCR (xenia quirk). +- **No `FPSCR[RN]` dependence.** `fctidz` always truncates; this is the right choice for C/C++ `(int64_t)` casts. +- **`Rc=1` (`fctidz.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **Encoding.** X-form, primary 63, XO 815. Reads `FRB` only. +- **Common pairing.** Translation of C `(int64_t)d` casts; combined with `stfd` to move the value to integer memory. + +## Related Instructions + +- [`fctidx`](fctidx.md) — same conversion but uses `FPSCR[RN]` (default nearest-even on PPC; xenia uses `f64::round` regardless). +- [`fctiwzx`](fctiwzx.md) — 32-bit truncating variant. +- [`fctiwx`](fctiwx.md) — 32-bit `FPSCR[RN]`-rounded variant. +- [`fcfidx`](fcfidx.md) — inverse direction. +- `stfd` — store the integer-bits FPR to memory. + +## IBM Reference + +- [AIX 7.3 — `fctidz` (Floating Convert to Integer Doubleword with Round Toward Zero)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fctidz-floating-convert-integer-doubleword-round-toward-zero-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/). diff --git a/migration/project-root/ppc-manual/fpu/fctiwx.md b/migration/project-root/ppc-manual/fpu/fctiwx.md new file mode 100644 index 0000000..96bb483 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fctiwx.md @@ -0,0 +1,149 @@ +# `fctiwx` — Floating Convert to Integer Word + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc00001c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fctiw` | `fctiwx` | — | Floating Convert to Integer Word | +| `fctiw.` | `fctiwx` | Rc=1 | Floating Convert to Integer Word | + +## Syntax + +```asm +fctiw[Rc] [FD], [FB] +``` + +## Encoding + +### `fctiwx` — form `X` + +- **Opcode word:** `0xfc00001c` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `14` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FB` | fctiwx: read | Source B floating-point register. | +| `FD` | fctiwx: write | Destination floating-point register. | +| `CR` | fctiwx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fctiwx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fctiwx` + +- **Reads (always):** `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fctiwx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fctiwx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fctiwx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:308`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L308) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:27`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L27) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:899`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L899) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2928-2948`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2928-L2948) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fctiwx => { + // Convert to integer word (round per FPSCR[RN]). + // PPCBUG-230: set XX on inexact. + let val = ctx.fpr[instr.rb()]; + let result_u32: u32 = if val.is_nan() { + fpscr::set_exception(ctx, fpscr::VXCVI | if fpscr::is_snan(val) { fpscr::VXSNAN } else { 0 }); + 0x8000_0000 + } else if val > (i32::MAX as f64) { + fpscr::set_exception(ctx, fpscr::VXCVI); + 0x7FFF_FFFF + } else if val < (i32::MIN as f64) { + fpscr::set_exception(ctx, fpscr::VXCVI); + 0x8000_0000 + } else { + if val != val.trunc() { fpscr::set_exception(ctx, fpscr::XX); } + fpscr::round_to_i32(ctx, val) as u32 + }; + ctx.fpr[instr.rd()] = f64::from_bits(result_u32 as u64); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **binary64 → 32-bit signed integer, current rounding mode.** Result is rounded per `FPSCR[RN]` and packed into the low 32 bits of the destination FPR. The high 32 bits are architecturally undefined per PowerISA but xenia produces zero-extended `u32` (i.e. the high 32 bits are 0). +- **Explicit saturation in xenia.** xenia's body clamps the rounded `f64` to `[i32::MIN as f64, i32::MAX as f64]` before the integer cast — this matches PPC's saturation behaviour for out-of-range positive/negative finite inputs. +- **NaN sentinel.** xenia returns `0x0000_0000_8000_0000` for NaN inputs (i.e. `i32::MIN` in the low word). Matches PPC's `VXCVI` sentinel for NaN/out-of-range. +- **Rounding implementation.** xenia uses `f64::round`, which rounds half-cases **away from zero** rather than to nearest-even. **xenia quirk:** values like `0.5`/`1.5`/`2.5` produce `1`/`2`/`3` under xenia vs `0`/`2`/`2` on PPC default rounding. +- **`FPSCR[RN]` not honored.** xenia always uses `f64::round`, ignoring the rounding-mode field. **xenia quirk** for non-default modes. +- **FPSCR side effects.** PPC: sets `XX`/`FX` on inexact, `VXCVI` on NaN/out-of-range. xenia does not update FPSCR. +- **`Rc=1` (`fctiw.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **Encoding.** X-form, primary 63, XO 14. Reads `FRB` only. +- **Common pairing.** Followed by `stfiwx` to store the low-32-bit integer to memory (`stfd` would write the doubleword including the high bits, which on hardware are undefined). + +## Related Instructions + +- [`fctiwzx`](fctiwzx.md) — 32-bit integer with round-toward-zero (truncation). +- [`fctidx`](fctidx.md), [`fctidzx`](fctidzx.md) — 64-bit integer variants. +- [`fcfidx`](fcfidx.md) — inverse direction (i64 → f64); for i32 → f64, sign-extend then `fcfid`. +- `stfiwx` — store low-32-bits FPR (the canonical companion to `fctiw`/`fctiwz`). +- [`mffsx`](mffsx.md), [`mtfsfx`](mtfsfx.md) — control `FPSCR[RN]` (currently a no-op under xenia for this instruction). + +## IBM Reference + +- [AIX 7.3 — `fctiw` (Floating Convert to Integer Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fctiw-floating-convert-integer-word-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (high 32 bits are architecturally undefined; only `stfiwx` is the spec-blessed consumer). diff --git a/migration/project-root/ppc-manual/fpu/fctiwzx.md b/migration/project-root/ppc-manual/fpu/fctiwzx.md new file mode 100644 index 0000000..4d108bf --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fctiwzx.md @@ -0,0 +1,148 @@ +# `fctiwzx` — Floating Convert to Integer Word with Round Toward Zero + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc00001e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fctiwz` | `fctiwzx` | — | Floating Convert to Integer Word with Round Toward Zero | +| `fctiwz.` | `fctiwzx` | Rc=1 | Floating Convert to Integer Word with Round Toward Zero | + +## Syntax + +```asm +fctiwz[Rc] [FD], [FB] +``` + +## Encoding + +### `fctiwzx` — form `X` + +- **Opcode word:** `0xfc00001e` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `15` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FB` | fctiwzx: read | Source B floating-point register. | +| `FD` | fctiwzx: write | Destination floating-point register. | +| `CR` | fctiwzx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fctiwzx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fctiwzx` + +- **Reads (always):** `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fctiwzx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fctiwzx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fctiwzx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:313`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L313) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:27`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L27) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:900`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L900) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2949-2969`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2949-L2969) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fctiwzx => { + // Convert to integer word (round toward zero). + // PPCBUG-230: set XX on inexact. + let val = ctx.fpr[instr.rb()]; + let result_u32: u32 = if val.is_nan() { + fpscr::set_exception(ctx, fpscr::VXCVI | if fpscr::is_snan(val) { fpscr::VXSNAN } else { 0 }); + 0x8000_0000 + } else if val > (i32::MAX as f64) { + fpscr::set_exception(ctx, fpscr::VXCVI); + 0x7FFF_FFFF + } else if val < (i32::MIN as f64) { + fpscr::set_exception(ctx, fpscr::VXCVI); + 0x8000_0000 + } else { + if val != val.trunc() { fpscr::set_exception(ctx, fpscr::XX); } + val.trunc() as i32 as u32 + }; + ctx.fpr[instr.rd()] = f64::from_bits(result_u32 as u64); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **binary64 → 32-bit signed integer, round toward zero.** Truncates regardless of `FPSCR[RN]`. xenia-rs uses `clamp` to saturate to `[i32::MIN, i32::MAX]` then `as i32`, which truncates — matching PPC `fctiwz` semantics. +- **Most common conversion in compiled code.** Translates C/C++ `(int32_t)f` casts, which require truncation per the C standard. +- **Saturation on out-of-range.** Hardware saturates to `i32::MAX` for large positives, `i32::MIN` for large negatives or NaN, and sets `FPSCR[VXCVI, VX, FX]`. xenia's explicit `clamp` correctly reproduces the saturation, but does not raise FPSCR bits (xenia quirk). +- **NaN sentinel.** xenia returns `0x0000_0000_8000_0000` (i.e. `i32::MIN` in low 32 bits). Matches PPC sentinel. +- **High 32 bits of FPR.** Architecturally undefined per PowerISA, but xenia produces zero-extended `u32`. Use `stfiwx` (store low 32 bits) — never `stfd` — for the canonical "store this integer" idiom. +- **Inexact.** Sets `FPSCR[XX, FX]` on any non-integer input. xenia does not update FPSCR. +- **`Rc=1` (`fctiwz.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **Encoding.** X-form, primary 63, XO 15. Reads `FRB` only. + +## Related Instructions + +- [`fctiwx`](fctiwx.md) — 32-bit integer with `FPSCR[RN]` rounding. +- [`fctidx`](fctidx.md), [`fctidzx`](fctidzx.md) — 64-bit integer variants. +- [`fcfidx`](fcfidx.md) — inverse direction (i64 → f64); for i32 → f64, sign-extend to i64 first. +- `stfiwx` — store low 32 bits of FPR; canonical companion. +- [`mffsx`](mffsx.md), [`mtfsfx`](mtfsfx.md) — FPSCR control (no effect on `fctiwz` since rounding mode is fixed to truncation). + +## IBM Reference + +- [AIX 7.3 — `fctiwz` (Floating Convert to Integer Word with Round Toward Zero)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fctiwz-floating-convert-integer-word-round-toward-zero-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/). diff --git a/migration/project-root/ppc-manual/fpu/fdivsx.md b/migration/project-root/ppc-manual/fpu/fdivsx.md new file mode 100644 index 0000000..b2e8e7b --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fdivsx.md @@ -0,0 +1,143 @@ +# `fdivsx` — Floating Divide Single + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xec000024` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fdivs` | `fdivsx` | — | Floating Divide Single | +| `fdivs.` | `fdivsx` | Rc=1 | Floating Divide Single | + +## Syntax + +```asm +fdivs[Rc] [FD], [FA], [FB] +``` + +## Encoding + +### `fdivsx` — form `A` + +- **Opcode word:** `0xec000024` +- **Primary opcode (bits 0–5):** `59` +- **Extended opcode:** `18` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fdivsx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FB` | fdivsx: read | Source B floating-point register. | +| `FD` | fdivsx: write | Destination floating-point register. | +| `CR` | fdivsx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fdivsx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fdivsx` + +- **Reads (always):** `FA`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fdivsx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fdivsx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fdivsx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:71`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L71) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:28`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L28) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:386`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L386) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2627-2637`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2627-L2637) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fdivsx => { + let a = ctx.fpr[instr.ra()]; + let b = ctx.fpr[instr.rb()]; + fpscr::check_invalid_div(ctx, a, b); + fpscr::check_zero_divide(ctx, a, b); + let result = to_single(ctx, a / b); + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && b.is_finite() && b != 0.0); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single precision.** Result is rounded to IEEE-754 binary32 then re-encoded into the 64-bit FPR. xenia computes `to_single(a / b)`. +- **Divide by zero.** Finite/±0 sets `FPSCR[ZX, FX]` and yields ±∞. xenia returns the host ±∞ but does not update FPSCR (xenia quirk). +- **`0 / 0`** → `FPSCR[VXZDZ, VX, FX]`, quiet NaN result. +- **`±∞ / ±∞`** → `FPSCR[VXIDI, VX, FX]`, quiet NaN result. +- **FPSCR side effects.** Hardware updates `FPRF`, `FR`, `FI`, `FX`, plus exception bits `OX`, `UX`, `XX`, `ZX`, `VXZDZ`, `VXIDI`, `VXSNAN`. +- **`Rc=1` (`fdivs.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **NaN propagation.** Quiet-NaN result for any NaN operand; signalling NaNs are quietened. +- **Single-precision overflow** returns ±∞ and sets `OX`/`XX`/`FX`. +- **Performance.** Hardware divide is multi-cycle. Title code commonly uses `fres` + Newton-Raphson for hot loops; this instruction is reserved for non-critical paths. +- **Denormal flush.** Xenon boots with `FPSCR[NI]=1`; xenia uses host IEEE behavior. +- **Encoding.** A-form, primary 59, XO 18. + +## Related Instructions + +- [`fdivx`](fdivx.md) — double-precision sibling. +- [`fresx`](fresx.md) — reciprocal estimate, used to build software divides. +- [`fmulsx`](fmulsx.md), [`faddsx`](faddsx.md), [`fsubsx`](fsubsx.md) — companion single-precision arithmetic. +- [`fmaddsx`](fmaddsx.md), [`fnmsubsx`](fnmsubsx.md) — Newton-Raphson refinement helpers. +- [`frspx`](frspx.md) — explicit double→single rounding. + +## IBM Reference + +- [AIX 7.3 — `fdivs` (Floating Divide Single)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fdivs-floating-divide-single-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/). diff --git a/migration/project-root/ppc-manual/fpu/fdivx.md b/migration/project-root/ppc-manual/fpu/fdivx.md new file mode 100644 index 0000000..2e0de3e --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fdivx.md @@ -0,0 +1,133 @@ +# `fdivx` — Floating Divide + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xfc000024` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fdiv` | `fdivx` | — | Floating Divide | +| `fdiv.` | `fdivx` | Rc=1 | Floating Divide | + +## Syntax + +```asm +fdiv[Rc] [FD], [FA], [FB] +``` + +## Encoding + +### `fdivx` — form `A` + +- **Opcode word:** `0xfc000024` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `18` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fdivx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FB` | fdivx: read | Source B floating-point register. | +| `FD` | fdivx: write | Destination floating-point register. | +| `CR` | fdivx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fdivx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fdivx` + +- **Reads (always):** `FA`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fdivx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +FRT <- FRA ÷ FRB +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fdivx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fdivx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:55`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L55) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:28`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L28) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:920`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L920) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2616-2626`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2616-L2626) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fdivx => { + let a = ctx.fpr[instr.ra()]; + let b = ctx.fpr[instr.rb()]; + fpscr::check_invalid_div(ctx, a, b); + fpscr::check_zero_divide(ctx, a, b); + let result = a / b; + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && b.is_finite() && b != 0.0); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Double precision.** Operates on IEEE-754 binary64; [`fdivsx`](fdivsx.md) is the single-precision sibling. +- **Divide by zero.** `FRA / ±0` (with `FRA` finite, non-zero) sets `FPSCR[ZX, FX]` and produces a correctly-signed infinity. xenia relies on host `f64 /`, which produces the same ±∞ — but does not raise `ZX` in the interpreter snapshot. **xenia quirk:** title code that polls FPSCR for divide-by-zero will not observe it. +- **`0 / 0`** sets `FPSCR[VXZDZ, VX, FX]` and yields a quiet NaN. +- **`±∞ / ±∞`** sets `FPSCR[VXIDI, VX, FX]` and yields a quiet NaN. +- **FPSCR side effects.** Hardware updates `FPRF`, `FR`, `FI`, `FX` plus exception bits `OX`, `UX`, `XX`, `ZX`, `VXZDZ`, `VXIDI`, `VXSNAN`. xenia-rs does not maintain these. +- **`Rc=1` (`fdiv.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **NaN propagation.** Quiet-NaN result for any NaN operand; signalling NaNs are quietened. +- **Performance.** Hardware divide is multi-cycle and not pipelined on Xenon. Many titles prefer `fres`/`frsqrte` followed by Newton-Raphson refinement (or by `fmadd` chains) to avoid the divider. +- **Denormal flush.** Xenon boots with `FPSCR[NI]=1`; xenia uses host IEEE. +- **Encoding.** A-form, primary 63, XO 18. `FRC` is don't-care. + +## Related Instructions + +- [`fdivsx`](fdivsx.md) — single-precision divide. +- [`fresx`](fresx.md) — reciprocal estimate `~1/FRB`; combined with `fmul`/`fmadd` to implement reciprocal divides. +- [`fmulx`](fmulx.md), [`faddx`](faddx.md), [`fsubx`](fsubx.md) — companion arithmetic. +- [`fmaddx`](fmaddx.md), [`fnmsubx`](fnmsubx.md) — used in Newton-Raphson refinement steps. +- [`mffsx`](mffsx.md), [`mtfsfx`](mtfsfx.md) — FPSCR control (rounding mode, exception masks). + +## IBM Reference + +- [AIX 7.3 — `fdiv` (Floating Divide)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fd-fdiv-floating-divide-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (divide-by-zero and invalid-operation rules). diff --git a/migration/project-root/ppc-manual/fpu/fmaddsx.md b/migration/project-root/ppc-manual/fpu/fmaddsx.md new file mode 100644 index 0000000..ab4bd51 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fmaddsx.md @@ -0,0 +1,147 @@ +# `fmaddsx` — Floating Multiply-Add Single + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xec00003a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fmadds` | `fmaddsx` | — | Floating Multiply-Add Single | +| `fmadds.` | `fmaddsx` | Rc=1 | Floating Multiply-Add Single | + +## Syntax + +```asm +fmadds[Rc] [FD], [FA], [FC], [FB] +``` + +## Encoding + +### `fmaddsx` — form `A` + +- **Opcode word:** `0xec00003a` +- **Primary opcode (bits 0–5):** `59` +- **Extended opcode:** `29` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fmaddsx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FC` | fmaddsx: read | Source C floating-point register (for madd-style ops). | +| `FB` | fmaddsx: read | Source B floating-point register. | +| `FD` | fmaddsx: write | Destination floating-point register. | +| `CR` | fmaddsx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fmaddsx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fmaddsx` + +- **Reads (always):** `FA`, `FC`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fmaddsx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fmaddsx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fmaddsx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:190`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L190) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:28`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L28) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:393`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L393) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2653-2665`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2653-L2665) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fmaddsx => { + // PPCBUG-181: missing VXISI on add step. + let a = ctx.fpr[instr.ra()]; + let c = ctx.fpr[instr.rc()]; + let b = ctx.fpr[instr.rb()]; + fpscr::check_invalid_mul(ctx, a, c); + fpscr::check_invalid_fma_add(ctx, a, c, b, false); + let result = to_single(ctx, a.mul_add(c, b)); + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && b.is_finite() && c.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single rounding step then single-precision rounding.** PowerISA semantics: compute `(FRA × FRC) + FRB` to infinite precision, then round once to binary32. xenia-rs implements this as `to_single(a.mul_add(c, b))` — the `mul_add` is the single-step fused multiply-add at double precision, then `to_single` rounds the binary64 result to binary32. This matches PPC's "single rounding" requirement because the intermediate `mul_add` is already exact-rounded. +- **Operand order.** Assembler: `FD, FA, FC, FB` (multiplier `FRC` before addend `FRB`). +- **Invalid operations.** `0×∞ + finite` → `VXIMZ`; opposite-signed-∞ collision → `VXISI`. Quiet NaN result with `FPSCR[VX, FX]`. +- **FPSCR side effects.** Hardware updates `FPRF`, `FR`, `FI`, `FX`, `OX`, `UX`, `XX`, `VXIMZ`, `VXISI`, `VXSNAN`. xenia-rs does not (xenia quirk). +- **`Rc=1` (`fmadds.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **NaN propagation.** Quiet-NaN result for any NaN operand; signalling NaNs are quietened. +- **Single-precision overflow** of the final rounded result returns ±∞ and sets `OX`/`XX`/`FX`. +- **Use case.** Dominates single-precision graphics math: matrix–vector multiplies, dot products, lighting equations, normal-map blending. Xbox 360 titles emit `fmadds` constantly. +- **Denormal flush.** Xenon boots with `FPSCR[NI]=1`; xenia uses host IEEE behavior. + +## Related Instructions + +- [`fmaddx`](fmaddx.md) — double-precision sibling. +- [`fmsubsx`](fmsubsx.md), [`fnmaddsx`](fnmaddsx.md), [`fnmsubsx`](fnmsubsx.md) — single-precision fused-multiply siblings: + - `fmsubs` = `(A×C) − B` + - `fnmadds` = `−((A×C) + B)` + - `fnmsubs` = `−((A×C) − B)` +- [`fmulsx`](fmulsx.md), [`faddsx`](faddsx.md) — non-fused decomposition. +- [`fresx`](fresx.md), [`frsqrtex`](frsqrtex.md) — reciprocal helpers; Newton-Raphson refinement uses `fmadds`/`fnmsubs`. +- [`frspx`](frspx.md) — explicit double→single rounding. + +## IBM Reference + +- [AIX 7.3 — `fmadds` (Floating Multiply-Add Single)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fmadds-floating-multiply-add-single-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/). diff --git a/migration/project-root/ppc-manual/fpu/fmaddx.md b/migration/project-root/ppc-manual/fpu/fmaddx.md new file mode 100644 index 0000000..a7e567f --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fmaddx.md @@ -0,0 +1,136 @@ +# `fmaddx` — Floating Multiply-Add + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xfc00003a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fmadd` | `fmaddx` | — | Floating Multiply-Add | +| `fmadd.` | `fmaddx` | Rc=1 | Floating Multiply-Add | + +## Syntax + +```asm +fmadd[Rc] [FD], [FA], [FC], [FB] +``` + +## Encoding + +### `fmaddx` — form `A` + +- **Opcode word:** `0xfc00003a` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `29` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fmaddx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FC` | fmaddx: read | Source C floating-point register (for madd-style ops). | +| `FB` | fmaddx: read | Source B floating-point register. | +| `FD` | fmaddx: write | Destination floating-point register. | +| `CR` | fmaddx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fmaddx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fmaddx` + +- **Reads (always):** `FA`, `FC`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fmaddx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +FRT <- (FRA × FRC) + FRB +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fmaddx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fmaddx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:186`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L186) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:28`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L28) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:928`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L928) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2640-2652`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2640-L2652) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fmaddx => { + // PPCBUG-202: VXISI from input properties (not from `a*c` which has wrong sign on overflow). + let a = ctx.fpr[instr.ra()]; + let c = ctx.fpr[instr.rc()]; + let b = ctx.fpr[instr.rb()]; + fpscr::check_invalid_mul(ctx, a, c); + fpscr::check_invalid_fma_add(ctx, a, c, b, false); + let result = a.mul_add(c, b); + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && b.is_finite() && c.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single rounding step.** `fmadd` computes `(FRA × FRC) + FRB` with one IEEE-754 rounding at the end — strictly more accurate than separate multiply + add. xenia-rs uses Rust's `f64::mul_add`, which guarantees a true FMA on hosts with hardware FMA (x86_64 with FMA3, ARM with NEON-FMA); on hosts without it, Rust's stdlib falls back to a software FMA so the semantic match is preserved. +- **Operand layout.** A-form: `FRT, FRA, FRC, FRB`. Note the assembler order — `FRC` (multiplier) comes before `FRB` (addend). Encoding bit fields are `FRA` (11–15), `FRB` (16–20), `FRC` (21–25). +- **Invalid operations.** `0×∞ + finite` → `VXIMZ`; `∞×x + ∓∞` (after multiplication produces ±∞ that opposes addend sign) → `VXISI`. Quiet NaN result with `FPSCR[VX, FX]` set. +- **FPSCR side effects.** Hardware updates `FPRF`, `FR`, `FI`, `FX`, `OX`, `UX`, `XX`, `VXIMZ`, `VXISI`, `VXSNAN`. xenia-rs does not update FPSCR (xenia quirk). +- **`Rc=1` (`fmadd.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **NaN propagation.** Quiet-NaN result for any NaN operand; signalling NaNs are quietened. +- **Use case.** Dot products, polynomial evaluation (Horner's method), matrix multiplies, Newton-Raphson divide/sqrt refinement. Hot-path PPC code is dense with `fmadd`. +- **Denormal flush.** Xenon boots with `FPSCR[NI]=1`; xenia uses host IEEE behavior. + +## Related Instructions + +- [`fmaddsx`](fmaddsx.md) — single-precision sibling. +- [`fmsubx`](fmsubx.md), [`fnmaddx`](fnmaddx.md), [`fnmsubx`](fnmsubx.md) — the other three fused multiply-add variants: + - `fmsub` = `(A×C) − B` + - `fnmadd` = `−((A×C) + B)` + - `fnmsub` = `−((A×C) − B)` +- [`fmulx`](fmulx.md), [`faddx`](faddx.md) — non-fused decomposition (two rounding steps; less precise). +- [`fresx`](fresx.md), [`frsqrtex`](frsqrtex.md) — reciprocal helpers refined by `fmadd`/`fnmsub`. + +## IBM Reference + +- [AIX 7.3 — `fmadd` (Floating Multiply-Add)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fma-fmadd-floating-multiply-add-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (single-rounding fused multiply-add definition). diff --git a/migration/project-root/ppc-manual/fpu/fmrx.md b/migration/project-root/ppc-manual/fpu/fmrx.md new file mode 100644 index 0000000..76e5b98 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fmrx.md @@ -0,0 +1,120 @@ +# `fmrx` — Floating Move Register + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc000090` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fmr` | `fmrx` | — | Floating Move Register | +| `fmr.` | `fmrx` | Rc=1 | Floating Move Register | + +## Syntax + +```asm +fmr[Rc] [FD], [FB] +``` + +## Encoding + +### `fmrx` — form `X` + +- **Opcode word:** `0xfc000090` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `72` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FB` | fmrx: read | Source B floating-point register. | +| `FD` | fmrx: write | Destination floating-point register. | +| `CR` | fmrx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `fmrx` + +- **Reads (always):** `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fmrx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +FRT <- FRB +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fmrx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fmrx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:496`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L496) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:28`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L28) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:906`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L906) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2752-2756`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2752-L2756) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fmrx => { + ctx.fpr[instr.rd()] = ctx.fpr[instr.rb()]; + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Bit-pattern copy, no rounding.** `fmr` copies the 64-bit binary representation of `FRB` into `FRT` unchanged. No precision loss, no FPSCR exception bits, no NaN quietening. xenia-rs implements this as a plain `f64` copy. +- **NaN preserved verbatim.** Signalling/quiet bit, payload, and sign are all preserved exactly. Unlike arithmetic instructions, `fmr` does **not** quieten signalling NaNs. +- **Special values.** All bit patterns pass through untouched, including ±0, ±∞, and any NaN. The destination receives an exact copy. +- **FPSCR.** Hardware does **not** update `FPRF` or any exception bit. The "FPSCR write" implied in the header refers only to `Rc=1` updating CR1 from existing FPSCR contents. +- **`Rc=1` (`fmr.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **No `FRA`.** X-form, primary 63, XO 72. Reads `FRB` only. +- **Cheaper than load-store.** Compilers emit `fmr` for FPR-to-FPR moves; transferring a value via memory (`stfd`/`lfd`) would be far more expensive. + +## Related Instructions + +- [`fabsx`](fabsx.md), [`fnegx`](fnegx.md), [`fnabsx`](fnabsx.md) — sign-bit variants of the move (clear / toggle / set). +- [`fselx`](fselx.md) — branch-free select; like a conditional `fmr`. +- [`mffsx`](mffsx.md) — read FPSCR into an FPR; complementary "FPR move" for a control register. +- `stfd`/`lfd` — memory-mediated FPR transfer (much slower; used for register window spills). + +## IBM Reference + +- [AIX 7.3 — `fmr` (Floating Move Register)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fmr-floating-move-register-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (move-class instructions explicitly bypass quietening and FPSCR side effects). diff --git a/migration/project-root/ppc-manual/fpu/fmsubsx.md b/migration/project-root/ppc-manual/fpu/fmsubsx.md new file mode 100644 index 0000000..5a9c70c --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fmsubsx.md @@ -0,0 +1,144 @@ +# `fmsubsx` — Floating Multiply-Subtract Single + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xec000038` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fmsubs` | `fmsubsx` | — | Floating Multiply-Subtract Single | +| `fmsubs.` | `fmsubsx` | Rc=1 | Floating Multiply-Subtract Single | + +## Syntax + +```asm +fmsubs[Rc] [FD], [FA], [FC], [FB] +``` + +## Encoding + +### `fmsubsx` — form `A` + +- **Opcode word:** `0xec000038` +- **Primary opcode (bits 0–5):** `59` +- **Extended opcode:** `28` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fmsubsx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FC` | fmsubsx: read | Source C floating-point register (for madd-style ops). | +| `FB` | fmsubsx: read | Source B floating-point register. | +| `FD` | fmsubsx: write | Destination floating-point register. | +| `CR` | fmsubsx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fmsubsx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fmsubsx` + +- **Reads (always):** `FA`, `FC`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fmsubsx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fmsubsx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fmsubsx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:209`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L209) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:28`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L28) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:392`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L392) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2679-2691`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2679-L2691) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fmsubsx => { + // PPCBUG-182: missing VXISI on sub step. + let a = ctx.fpr[instr.ra()]; + let c = ctx.fpr[instr.rc()]; + let b = ctx.fpr[instr.rb()]; + fpscr::check_invalid_mul(ctx, a, c); + fpscr::check_invalid_fma_add(ctx, a, c, b, true); + let result = to_single(ctx, a.mul_add(c, -b)); + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && b.is_finite() && c.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single rounding step then round-to-single.** Computes `(FRA × FRC) − FRB` with one fused rounding at double precision, then rounds the binary64 result to binary32. xenia-rs implements this as `to_single(a.mul_add(c, -b))`. +- **Operand order.** Assembler: `FD, FA, FC, FB`. The multiplier `FRC` precedes the addend `FRB`. +- **Invalid operations.** `0×∞ − finite` → `VXIMZ`; `(±∞×x) − ±∞` (same sign) → `VXISI`. Quiet NaN result with `FPSCR[VX, FX]`. +- **FPSCR side effects.** Hardware updates `FPRF`, `FR`, `FI`, `FX`, `OX`, `UX`, `XX`, `VXIMZ`, `VXISI`, `VXSNAN`. xenia-rs does not (xenia quirk). +- **`Rc=1` (`fmsubs.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **NaN propagation.** Quiet-NaN result for any NaN operand; signalling NaNs are quietened. +- **Single-precision overflow** of the final rounded result returns ±∞ and sets `OX`/`XX`/`FX`. +- **Use case.** Newton-Raphson refinement of `fres`: `x_new = x*(2 - d*x)` decomposes to a `fmsubs`/`fnmsubs` pair. Also common in residual-correction loops. +- **Denormal flush.** Xenon boots with `FPSCR[NI]=1`; xenia uses host IEEE behavior. + +## Related Instructions + +- [`fmsubx`](fmsubx.md) — double-precision sibling. +- [`fmaddsx`](fmaddsx.md), [`fnmaddsx`](fnmaddsx.md), [`fnmsubsx`](fnmsubsx.md) — other single-precision fused-multiply variants. +- [`fmulsx`](fmulsx.md), [`fsubsx`](fsubsx.md) — non-fused decomposition. +- [`fresx`](fresx.md), [`frsqrtex`](frsqrtex.md) — reciprocal helpers refined by `fmsubs`/`fnmsubs`. +- [`frspx`](frspx.md) — explicit double→single rounding. + +## IBM Reference + +- [AIX 7.3 — `fmsubs` (Floating Multiply-Subtract Single)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fmsubs-floating-multiply-subtract-single-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/). diff --git a/migration/project-root/ppc-manual/fpu/fmsubx.md b/migration/project-root/ppc-manual/fpu/fmsubx.md new file mode 100644 index 0000000..5393ce2 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fmsubx.md @@ -0,0 +1,135 @@ +# `fmsubx` — Floating Multiply-Subtract + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xfc000038` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fmsub` | `fmsubx` | — | Floating Multiply-Subtract | +| `fmsub.` | `fmsubx` | Rc=1 | Floating Multiply-Subtract | + +## Syntax + +```asm +fmsub[Rc] [FD], [FA], [FC], [FB] +``` + +## Encoding + +### `fmsubx` — form `A` + +- **Opcode word:** `0xfc000038` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `28` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fmsubx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FC` | fmsubx: read | Source C floating-point register (for madd-style ops). | +| `FB` | fmsubx: read | Source B floating-point register. | +| `FD` | fmsubx: write | Destination floating-point register. | +| `CR` | fmsubx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fmsubx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fmsubx` + +- **Reads (always):** `FA`, `FC`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fmsubx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +FRT <- (FRA × FRC) − FRB +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fmsubx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fmsubx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:205`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L205) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:28`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L28) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:927`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L927) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2666-2678`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2666-L2678) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fmsubx => { + // PPCBUG-203: missing VXISI on sub step. + let a = ctx.fpr[instr.ra()]; + let c = ctx.fpr[instr.rc()]; + let b = ctx.fpr[instr.rb()]; + fpscr::check_invalid_mul(ctx, a, c); + fpscr::check_invalid_fma_add(ctx, a, c, b, true); + let result = a.mul_add(c, -b); + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && b.is_finite() && c.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single rounding step.** `fmsub` computes `(FRA × FRC) − FRB` with one rounding at the end. xenia-rs implements this as `a.mul_add(c, -b)`, which is a true FMA on hosts that have hardware support and a software FMA on those that don't. +- **Subtle: negate-then-FMA.** Negating `b` before passing to FMA matters for sign of zero and overflow. `(+0×+0) − (+0)` = `+0` in round-to-nearest, but `(+0×+0) − (−0)` = `+0` (the negation flips it before the FMA). Standard IEEE rules apply. +- **Operand order.** Assembler: `FD, FA, FC, FB`. +- **Invalid operations.** `0×∞ − finite` → `VXIMZ`; same-signed infinity collision (e.g. `(+∞×+1) − (+∞)`) → `VXISI`. Quiet NaN result with `FPSCR[VX, FX]`. +- **FPSCR side effects.** Hardware updates `FPRF`, `FR`, `FI`, `FX`, `OX`, `UX`, `XX`, `VXIMZ`, `VXISI`, `VXSNAN`. xenia-rs does not update FPSCR (xenia quirk). +- **`Rc=1` (`fmsub.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **NaN propagation.** Quiet-NaN result for any NaN operand; signalling NaNs are quietened. +- **Use case.** Newton-Raphson refinement of reciprocal estimates: `x_new = x*(2 - d*x) = -((d*x) - 2)` uses `fnmsub`, but `fmsub` shows up wherever `(a*c) - b` appears (residuals, error correction). +- **Denormal flush.** Xenon boots with `FPSCR[NI]=1`; xenia uses host IEEE behavior. + +## Related Instructions + +- [`fmsubsx`](fmsubsx.md) — single-precision sibling. +- [`fmaddx`](fmaddx.md), [`fnmaddx`](fnmaddx.md), [`fnmsubx`](fnmsubx.md) — other fused multiply-add variants. +- [`fmulx`](fmulx.md), [`fsubx`](fsubx.md) — non-fused decomposition (two rounding steps). +- [`fresx`](fresx.md), [`frsqrtex`](frsqrtex.md) — reciprocal helpers refined by fused multiply-subtracts. +- [`fnegx`](fnegx.md) — sign flip (the bit-pattern op behind `-FRB` in xenia's implementation). + +## IBM Reference + +- [AIX 7.3 — `fmsub` (Floating Multiply-Subtract)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fms-fmsub-floating-multiply-subtract-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/). diff --git a/migration/project-root/ppc-manual/fpu/fmulsx.md b/migration/project-root/ppc-manual/fpu/fmulsx.md new file mode 100644 index 0000000..b931f6a --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fmulsx.md @@ -0,0 +1,140 @@ +# `fmulsx` — Floating Multiply Single + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xec000032` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fmuls` | `fmulsx` | — | Floating Multiply Single | +| `fmuls.` | `fmulsx` | Rc=1 | Floating Multiply Single | + +## Syntax + +```asm +fmuls[Rc] [FD], [FA], [FC] +``` + +## Encoding + +### `fmulsx` — form `A` + +- **Opcode word:** `0xec000032` +- **Primary opcode (bits 0–5):** `59` +- **Extended opcode:** `25` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fmulsx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FC` | fmulsx: read | Source C floating-point register (for madd-style ops). | +| `FD` | fmulsx: write | Destination floating-point register. | +| `CR` | fmulsx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fmulsx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fmulsx` + +- **Reads (always):** `FA`, `FC` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fmulsx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fmulsx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fmulsx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:97`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L97) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:28`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L28) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:391`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L391) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2606-2615`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2606-L2615) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fmulsx => { + let a = ctx.fpr[instr.ra()]; + let c = ctx.fpr[instr.rc()]; + fpscr::check_invalid_mul(ctx, a, c); + let result = to_single(ctx, a * c); + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && c.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **A-form quirk: multiplier is `FRC`.** Operands come from `FRA` (bits 11–15) and `FRC` (bits 21–25). xenia decodes via `instr.rc()` (don't confuse with `rc_bit()` for the record bit). +- **Single precision.** Result is rounded to IEEE-754 binary32 then re-encoded into the 64-bit FPR. xenia uses `to_single(a * c)`. +- **`0 × ±∞`** sets `FPSCR[VXIMZ, VX, FX]` and yields a quiet NaN. +- **FPSCR side effects.** Hardware updates `FPRF`, `FR`, `FI`, `FX` and exception bits `OX`, `UX`, `XX`, `VXIMZ`, `VXSNAN`. xenia-rs does **not** maintain FPSCR (xenia quirk). +- **`Rc=1` (`fmuls.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **NaN propagation.** Quiet-NaN result for any NaN operand; signalling NaNs are quietened. +- **Single-precision overflow** returns ±∞ and sets `OX`/`XX`/`FX`. +- **Denormal flush.** Xenon boots with `FPSCR[NI]=1`; xenia inherits host IEEE behavior, so subnormal results may differ subtly from hardware. +- **Encoding.** A-form, primary 59, XO 25. + +## Related Instructions + +- [`fmulx`](fmulx.md) — double-precision multiply. +- [`fmaddsx`](fmaddsx.md), [`fmsubsx`](fmsubsx.md), [`fnmaddsx`](fnmaddsx.md), [`fnmsubsx`](fnmsubsx.md) — single-precision fused multiply-add family (one rounding step; preferred for dot products). +- [`faddsx`](faddsx.md), [`fsubsx`](fsubsx.md), [`fdivsx`](fdivsx.md) — companion single-precision arithmetic. +- [`fresx`](fresx.md), [`frsqrtex`](frsqrtex.md) — reciprocal estimates often paired with `fmuls` to compute `a * (1/b)`. +- [`frspx`](frspx.md) — explicit double→single rounding helper. + +## IBM Reference + +- [AIX 7.3 — `fmuls` (Floating Multiply Single)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fmuls-floating-multiply-single-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/). diff --git a/migration/project-root/ppc-manual/fpu/fmulx.md b/migration/project-root/ppc-manual/fpu/fmulx.md new file mode 100644 index 0000000..ae418f6 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fmulx.md @@ -0,0 +1,132 @@ +# `fmulx` — Floating Multiply + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xfc000032` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fmul` | `fmulx` | — | Floating Multiply | +| `fmul.` | `fmulx` | Rc=1 | Floating Multiply | + +## Syntax + +```asm +fmul[Rc] [FD], [FA], [FC] +``` + +## Encoding + +### `fmulx` — form `A` + +- **Opcode word:** `0xfc000032` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `25` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fmulx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FC` | fmulx: read | Source C floating-point register (for madd-style ops). | +| `FD` | fmulx: write | Destination floating-point register. | +| `CR` | fmulx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fmulx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fmulx` + +- **Reads (always):** `FA`, `FC` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fmulx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +FRT <- FRA × FRC +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fmulx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fmulx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:89`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L89) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:28`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L28) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:925`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L925) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2595-2605`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2595-L2605) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fmulx => { + // A-form: frD = frA * frC (frC is at rc() field, bits 21-25) + let a = ctx.fpr[instr.ra()]; + let c = ctx.fpr[instr.rc()]; + fpscr::check_invalid_mul(ctx, a, c); + let result = a * c; + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && c.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **A-form quirk: multiplier is `FRC`, not `FRB`.** `fmul` reads operands from the `FRA` (bits 11–15) and `FRC` (bits 21–25) fields, bridging the multiply and fused-multiply-add families. xenia decodes this as `instr.rc()` (the FRC field, distinct from `rc_bit()` for the record bit). +- **Double precision.** Operates on IEEE-754 binary64; [`fmulsx`](fmulsx.md) rounds to binary32. +- **`0 × ±∞` is invalid.** Sets `FPSCR[VXIMZ, VX, FX]` and yields a quiet NaN. +- **FPSCR side effects.** Hardware updates `FPRF`, `FR`, `FI`, `FX` plus exception bits `OX` (overflow), `UX` (underflow), `XX` (inexact), `VXIMZ` (0×∞), `VXSNAN` (signalling NaN). xenia-rs does **not** update FPSCR in the interpreter snapshot — xenia quirk. +- **`Rc=1` (`fmul.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **NaN propagation.** Any NaN operand yields a quiet NaN; signalling NaNs are quietened. +- **Sign of result.** Standard IEEE: `sign(a) XOR sign(c)`. `+0 × −0 = −0` and `−x × +∞ = −∞`. +- **Denormal flush.** Xenon boots with `FPSCR[NI]=1` (flush-to-zero); xenia inherits host IEEE behavior, so multiplications that produce subnormal results may differ subtly from hardware. +- **Rounding mode** uses `FPSCR[RN]` (default nearest-even). + +## Related Instructions + +- [`fmulsx`](fmulsx.md) — single-precision multiply. +- [`fmaddx`](fmaddx.md), [`fmsubx`](fmsubx.md), [`fnmaddx`](fnmaddx.md), [`fnmsubx`](fnmsubx.md) — fused multiply-add family; share the same `FRA × FRC` core but add/subtract `FRB` with a single rounding step. Prefer fused forms for dot products and polynomial evaluation. +- [`faddx`](faddx.md), [`fsubx`](fsubx.md), [`fdivx`](fdivx.md) — sibling double-precision arithmetic. +- [`fresx`](fresx.md), [`frsqrtex`](frsqrtex.md) — reciprocal helpers commonly paired with `fmul` for reciprocal divides. +- [`mffsx`](mffsx.md), [`mtfsfx`](mtfsfx.md) — FPSCR control. + +## IBM Reference + +- [AIX 7.3 — `fmul` (Floating Multiply)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fm-fmul-floating-multiply-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/). diff --git a/migration/project-root/ppc-manual/fpu/fnabsx.md b/migration/project-root/ppc-manual/fpu/fnabsx.md new file mode 100644 index 0000000..2314a13 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fnabsx.md @@ -0,0 +1,120 @@ +# `fnabsx` — Floating Negative Absolute Value + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc000110` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fnabs` | `fnabsx` | — | Floating Negative Absolute Value | +| `fnabs.` | `fnabsx` | Rc=1 | Floating Negative Absolute Value | + +## Syntax + +```asm +fnabs[Rc] [FD], [FB] +``` + +## Encoding + +### `fnabsx` — form `X` + +- **Opcode word:** `0xfc000110` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `136` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FB` | fnabsx: read | Source B floating-point register. | +| `FD` | fnabsx: write | Destination floating-point register. | +| `CR` | fnabsx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `fnabsx` + +- **Reads (always):** `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fnabsx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +FRT <- set_sign(FRB) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fnabsx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fnabsx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:504`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L504) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:29`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L29) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:908`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L908) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2767-2771`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2767-L2771) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fnabsx => { + ctx.fpr[instr.rd()] = -(ctx.fpr[instr.rb()].abs()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Bit-pattern operation, no rounding.** `fnabs` **sets** the sign bit (bit 0) of the source binary64 value to 1, producing `-|FRB|`. No precision change, no exception bits. xenia-rs implements this as `-(b.abs())` — the abs clears the sign bit, then negation sets it. +- **NaN handling.** Returns the source NaN with the sign bit set to 1; payload preserved; signalling/quiet bit unchanged. `FPSCR[VXSNAN]` is **not** raised. +- **Special values.** `fnabs(±0) = -0`; `fnabs(±∞) = -∞`; `fnabs(±NaN) = -NaN` (sign set, payload preserved). +- **FPSCR.** Hardware does not update `FPRF` and does not raise any exception bit. Sign-bit ops are not arithmetic. +- **`Rc=1` (`fnabs.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **No `FRA`.** X-form, primary 63, XO 136. Reads `FRB` only. +- **Use case.** Less common than `fabs`/`fneg`. Useful for unconditional negative-magnitude values, e.g. forcing a value to be on the negative side of zero before a subsequent compare or for bit-pattern setup. + +## Related Instructions + +- [`fabsx`](fabsx.md) — clear sign bit (positive). +- [`fnegx`](fnegx.md) — toggle sign bit. +- [`fmrx`](fmrx.md) — plain register copy. +- [`fselx`](fselx.md) — branch-free select; with `fabs`/`fnabs` synthesises `copysign`-like helpers. + +## IBM Reference + +- [AIX 7.3 — `fnabs` (Floating Negative Absolute Value)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fnabs-floating-negative-absolute-value-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/). diff --git a/migration/project-root/ppc-manual/fpu/fnegx.md b/migration/project-root/ppc-manual/fpu/fnegx.md new file mode 100644 index 0000000..4f77253 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fnegx.md @@ -0,0 +1,121 @@ +# `fnegx` — Floating Negate + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc000050` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fneg` | `fnegx` | — | Floating Negate | +| `fneg.` | `fnegx` | Rc=1 | Floating Negate | + +## Syntax + +```asm +fneg[Rc] [FD], [FB] +``` + +## Encoding + +### `fnegx` — form `X` + +- **Opcode word:** `0xfc000050` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `40` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FB` | fnegx: read | Source B floating-point register. | +| `FD` | fnegx: write | Destination floating-point register. | +| `CR` | fnegx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `fnegx` + +- **Reads (always):** `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fnegx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`. + +## Operation (pseudocode) + +``` +FRT <- flip_sign(FRB) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fnegx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fnegx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:515`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L515) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:29`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L29) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:903`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L903) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2762-2766`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2762-L2766) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fnegx => { + ctx.fpr[instr.rd()] = -ctx.fpr[instr.rb()]; + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Bit-pattern operation, no rounding.** `fneg` toggles the sign bit (bit 0) of the source binary64 value and writes the 64-bit pattern to the destination. No precision change, no exception bits. +- **NaN handling.** PowerISA specifies that `fneg` toggles the NaN sign bit (unlike `fnmadd` which does **not**). xenia-rs uses Rust's unary `-`, which toggles the sign bit on NaN values for binary64 — semantic match. +- **Special values.** `fneg(+0) = -0`; `fneg(-0) = +0`; `fneg(±∞) = ∓∞`. No `FPSCR[VXSNAN]` raised even on signalling NaN inputs (sign-bit ops are not arithmetic). +- **FPSCR.** Hardware does **not** update `FPRF` and does **not** raise any exception bit. The "FPSCR write" in the header refers only to `Rc=1` updating CR1 from existing FPSCR contents. +- **`Rc=1` (`fneg.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **No `FRA`.** X-form, primary 63, XO 40. Reads `FRB` only. +- **Use as a free negate.** Common in compiled PPC code for `-x` or as part of negate-and-fma sequences when no fused negative variant exists. + +## Related Instructions + +- [`fabsx`](fabsx.md) — clear sign bit (positive). +- [`fnabsx`](fnabsx.md) — set sign bit (negative). +- [`fmrx`](fmrx.md) — plain register copy. +- [`fnmaddx`](fnmaddx.md), [`fnmsubx`](fnmsubx.md), [`fnmaddsx`](fnmaddsx.md), [`fnmsubsx`](fnmsubsx.md) — fused negate-multiply-add forms; eliminate the need for an explicit `fneg` after an FMA. +- [`fselx`](fselx.md) — combined with `fneg` for branch-free `copysign`/clamp patterns. + +## IBM Reference + +- [AIX 7.3 — `fneg` (Floating Negate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fneg-floating-negate-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (PPC's `fneg` toggles NaN sign — distinct from the `fnmadd` family). diff --git a/migration/project-root/ppc-manual/fpu/fnmaddsx.md b/migration/project-root/ppc-manual/fpu/fnmaddsx.md new file mode 100644 index 0000000..0de9ac6 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fnmaddsx.md @@ -0,0 +1,147 @@ +# `fnmaddsx` — Floating Negative Multiply-Add Single + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xec00003e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fnmadds` | `fnmaddsx` | — | Floating Negative Multiply-Add Single | +| `fnmadds.` | `fnmaddsx` | Rc=1 | Floating Negative Multiply-Add Single | + +## Syntax + +```asm +fnmadds[Rc] [FD], [FA], [FC], [FB] +``` + +## Encoding + +### `fnmaddsx` — form `A` + +- **Opcode word:** `0xec00003e` +- **Primary opcode (bits 0–5):** `59` +- **Extended opcode:** `31` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fnmaddsx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FC` | fnmaddsx: read | Source C floating-point register (for madd-style ops). | +| `FB` | fnmaddsx: read | Source B floating-point register. | +| `FD` | fnmaddsx: write | Destination floating-point register. | +| `CR` | fnmaddsx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fnmaddsx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fnmaddsx` + +- **Reads (always):** `FA`, `FC`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fnmaddsx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fnmaddsx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fnmaddsx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:222`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L222) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:29`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L29) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:395`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L395) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2706-2720`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2706-L2720) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fnmaddsx => { + // PPCBUG-181 + PPCBUG-183: VXISI + NaN sign preservation. + let a = ctx.fpr[instr.ra()]; + let c = ctx.fpr[instr.rc()]; + let b = ctx.fpr[instr.rb()]; + fpscr::check_invalid_mul(ctx, a, c); + fpscr::check_invalid_fma_add(ctx, a, c, b, false); + let fma = a.mul_add(c, b); + let neg = if fma.is_nan() { fma } else { -fma }; + let result = to_single(ctx, neg); + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && b.is_finite() && c.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single rounding then negate then round-to-single.** Computes `−((FRA × FRC) + FRB)` and rounds to binary32. xenia-rs uses `to_single(-(a.mul_add(c, b)))` — the negation is a sign-flip on the binary64 intermediate, then `to_single` rounds to binary32. +- **NaN sign behaviour.** PowerISA specifies the negation does **not** flip the sign bit of a NaN result. xenia uses Rust's `Neg`, which does flip the NaN sign bit. Observable only via bit-level inspection. **xenia quirk.** +- **Operand order.** Assembler: `FD, FA, FC, FB`. +- **Invalid operations.** `0×∞` → `VXIMZ`; opposing-infinity collision → `VXISI`. Quiet NaN result with `FPSCR[VX, FX]`. +- **FPSCR side effects.** Hardware updates `FPRF`, `FR`, `FI`, `FX`, `OX`, `UX`, `XX`, `VXIMZ`, `VXISI`, `VXSNAN`. xenia-rs does not (xenia quirk). +- **`Rc=1` (`fnmadds.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **NaN propagation.** Quiet-NaN result for any NaN operand; signalling NaNs are quietened. +- **Single-precision overflow** returns ±∞ and sets `OX`/`XX`/`FX`. +- **Use case.** Single-precision Newton-Raphson refinement and graphics-pipeline math where the negated product-sum form is convenient. +- **Denormal flush.** Xenon boots with `FPSCR[NI]=1`; xenia uses host IEEE behavior. + +## Related Instructions + +- [`fnmaddx`](fnmaddx.md) — double-precision sibling. +- [`fmaddsx`](fmaddsx.md), [`fmsubsx`](fmsubsx.md), [`fnmsubsx`](fnmsubsx.md) — other single-precision fused-multiply variants. +- [`fmulsx`](fmulsx.md), [`faddsx`](faddsx.md) — non-fused decomposition. +- [`fnegx`](fnegx.md), [`fnabsx`](fnabsx.md) — sign-bit ops. +- [`frspx`](frspx.md) — explicit double→single rounding. + +## IBM Reference + +- [AIX 7.3 — `fnmadds` (Floating Negative Multiply-Add Single)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fnmadds-floating-negative-multiply-add-single-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/). diff --git a/migration/project-root/ppc-manual/fpu/fnmaddx.md b/migration/project-root/ppc-manual/fpu/fnmaddx.md new file mode 100644 index 0000000..9fc37a1 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fnmaddx.md @@ -0,0 +1,138 @@ +# `fnmaddx` — Floating Negative Multiply-Add + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xfc00003e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fnmadd` | `fnmaddx` | — | Floating Negative Multiply-Add | +| `fnmadd.` | `fnmaddx` | Rc=1 | Floating Negative Multiply-Add | + +## Syntax + +```asm +fnmadd[Rc] [FD], [FA], [FC], [FB] +``` + +## Encoding + +### `fnmaddx` — form `A` + +- **Opcode word:** `0xfc00003e` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `31` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fnmaddx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FC` | fnmaddx: read | Source C floating-point register (for madd-style ops). | +| `FB` | fnmaddx: read | Source B floating-point register. | +| `FD` | fnmaddx: write | Destination floating-point register. | +| `CR` | fnmaddx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fnmaddx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fnmaddx` + +- **Reads (always):** `FA`, `FC`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fnmaddx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +FRT <- −((FRA × FRC) + FRB) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fnmaddx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fnmaddx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:213`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L213) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:29`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L29) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:930`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L930) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2692-2705`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2692-L2705) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fnmaddx => { + // PPCBUG-203: missing VXISI. PPCBUG-205: NaN sign preserved (no negation on NaN). + let a = ctx.fpr[instr.ra()]; + let c = ctx.fpr[instr.rc()]; + let b = ctx.fpr[instr.rb()]; + fpscr::check_invalid_mul(ctx, a, c); + fpscr::check_invalid_fma_add(ctx, a, c, b, false); + let fma = a.mul_add(c, b); + let result = if fma.is_nan() { fma } else { -fma }; + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && b.is_finite() && c.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single rounding step, then sign flip.** Computes `−((FRA × FRC) + FRB)` with one fused rounding for the FMA; the final negation is a bit-pattern sign-flip and does not introduce additional rounding error. xenia-rs implements this as `-(a.mul_add(c, b))`. +- **Sign of NaN.** Per PowerISA, `fnmadd` does **not** flip the sign of a NaN result. xenia uses Rust's `Neg` which does flip the NaN sign bit (`f64::neg`); for IEEE-754 binary64 this is observable through bit-level inspection but not through arithmetic comparisons. **xenia quirk** — title code that inspects NaN sign bits will diverge. +- **Operand order.** Assembler: `FD, FA, FC, FB`. +- **Invalid operations.** Same as `fmadd`: `VXIMZ` for `0×∞`, `VXISI` for opposing-infinity collision. Quiet NaN result. +- **FPSCR side effects.** Hardware updates `FPRF`, `FR`, `FI`, `FX`, `OX`, `UX`, `XX`, `VXIMZ`, `VXISI`, `VXSNAN`. xenia-rs does not (xenia quirk). +- **`Rc=1` (`fnmadd.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **NaN propagation.** Quiet-NaN result for any NaN operand; signalling NaNs are quietened. +- **Use case.** Computing `-a*c - b` directly without an intermediate negate. Useful in iterative solvers and in transforming polynomial coefficients. +- **Denormal flush.** Xenon boots with `FPSCR[NI]=1`; xenia uses host IEEE behavior. + +## Related Instructions + +- [`fnmaddsx`](fnmaddsx.md) — single-precision sibling. +- [`fmaddx`](fmaddx.md), [`fmsubx`](fmsubx.md), [`fnmsubx`](fnmsubx.md) — other fused multiply-add variants: + - `fmadd` = `(A×C) + B` + - `fmsub` = `(A×C) − B` + - `fnmsub` = `−((A×C) − B)` +- [`fnegx`](fnegx.md), [`fnabsx`](fnabsx.md) — sign-bit operations on FPRs. +- [`fmulx`](fmulx.md), [`faddx`](faddx.md) — non-fused decomposition. + +## IBM Reference + +- [AIX 7.3 — `fnmadd` (Floating Negative Multiply-Add)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fnma-fnmadd-floating-negative-multiply-add-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (note: PowerISA specifies the negation does not flip NaN sign bits). diff --git a/migration/project-root/ppc-manual/fpu/fnmsubsx.md b/migration/project-root/ppc-manual/fpu/fnmsubsx.md new file mode 100644 index 0000000..6556f77 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fnmsubsx.md @@ -0,0 +1,147 @@ +# `fnmsubsx` — Floating Negative Multiply-Subtract Single + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xec00003c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fnmsubs` | `fnmsubsx` | — | Floating Negative Multiply-Subtract Single | +| `fnmsubs.` | `fnmsubsx` | Rc=1 | Floating Negative Multiply-Subtract Single | + +## Syntax + +```asm +fnmsubs[Rc] [FD], [FA], [FC], [FB] +``` + +## Encoding + +### `fnmsubsx` — form `A` + +- **Opcode word:** `0xec00003c` +- **Primary opcode (bits 0–5):** `59` +- **Extended opcode:** `30` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fnmsubsx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FC` | fnmsubsx: read | Source C floating-point register (for madd-style ops). | +| `FB` | fnmsubsx: read | Source B floating-point register. | +| `FD` | fnmsubsx: write | Destination floating-point register. | +| `CR` | fnmsubsx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fnmsubsx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fnmsubsx` + +- **Reads (always):** `FA`, `FC`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fnmsubsx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fnmsubsx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fnmsubsx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:241`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L241) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:29`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L29) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:394`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L394) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2735-2749`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2735-L2749) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fnmsubsx => { + // PPCBUG-182 + PPCBUG-183: VXISI + NaN sign preservation. + let a = ctx.fpr[instr.ra()]; + let c = ctx.fpr[instr.rc()]; + let b = ctx.fpr[instr.rb()]; + fpscr::check_invalid_mul(ctx, a, c); + fpscr::check_invalid_fma_add(ctx, a, c, b, true); + let fma = a.mul_add(c, -b); + let neg = if fma.is_nan() { fma } else { -fma }; + let result = to_single(ctx, neg); + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && b.is_finite() && c.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single rounding then negate then round-to-single.** Computes `−((FRA × FRC) − FRB)` = `FRB − (FRA × FRC)` with one fused rounding at double precision, then rounds to binary32. xenia-rs uses `to_single(-(a.mul_add(c, -b)))`. +- **NaN sign behaviour.** PowerISA: the negation does **not** flip a NaN's sign bit. xenia uses Rust's `Neg` which does. Observable only by bit-level inspection. **xenia quirk.** +- **Operand order.** Assembler: `FD, FA, FC, FB`. +- **Invalid operations.** `0×∞` → `VXIMZ`; same-signed-infinity collision → `VXISI`. Quiet NaN result. +- **FPSCR side effects.** Hardware updates `FPRF`, `FR`, `FI`, `FX`, `OX`, `UX`, `XX`, `VXIMZ`, `VXISI`, `VXSNAN`. xenia-rs does not (xenia quirk). +- **`Rc=1` (`fnmsubs.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **NaN propagation.** Quiet-NaN result for any NaN operand; signalling NaNs are quietened. +- **Single-precision overflow** returns ±∞ and sets `OX`/`XX`/`FX`. +- **Use case.** Single-precision Newton-Raphson divide refinement: `x_new = x*(2 - d*x)` is implemented as a `fnmsubs`/`fmuls` pair throughout Xbox 360 graphics code. +- **Denormal flush.** Xenon boots with `FPSCR[NI]=1`; xenia uses host IEEE behavior. + +## Related Instructions + +- [`fnmsubx`](fnmsubx.md) — double-precision sibling. +- [`fmaddsx`](fmaddsx.md), [`fmsubsx`](fmsubsx.md), [`fnmaddsx`](fnmaddsx.md) — other single-precision fused-multiply variants. +- [`fresx`](fresx.md), [`frsqrtex`](frsqrtex.md) — reciprocal estimates whose Newton-Raphson refinement leans on `fnmsubs`. +- [`fmulsx`](fmulsx.md), [`fsubsx`](fsubsx.md) — non-fused decomposition. +- [`frspx`](frspx.md) — explicit double→single rounding. + +## IBM Reference + +- [AIX 7.3 — `fnmsubs` (Floating Negative Multiply-Subtract Single)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fnmsubs-floating-negative-multiply-subtract-single-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/). diff --git a/migration/project-root/ppc-manual/fpu/fnmsubx.md b/migration/project-root/ppc-manual/fpu/fnmsubx.md new file mode 100644 index 0000000..3f06634 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fnmsubx.md @@ -0,0 +1,136 @@ +# `fnmsubx` — Floating Negative Multiply-Subtract + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xfc00003c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fnmsub` | `fnmsubx` | — | Floating Negative Multiply-Subtract | +| `fnmsub.` | `fnmsubx` | Rc=1 | Floating Negative Multiply-Subtract | + +## Syntax + +```asm +fnmsub[Rc] [FD], [FA], [FC], [FB] +``` + +## Encoding + +### `fnmsubx` — form `A` + +- **Opcode word:** `0xfc00003c` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `30` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fnmsubx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FC` | fnmsubx: read | Source C floating-point register (for madd-style ops). | +| `FB` | fnmsubx: read | Source B floating-point register. | +| `FD` | fnmsubx: write | Destination floating-point register. | +| `CR` | fnmsubx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fnmsubx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fnmsubx` + +- **Reads (always):** `FA`, `FC`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fnmsubx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +FRT <- −((FRA × FRC) − FRB) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fnmsubx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fnmsubx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:232`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L232) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:29`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L29) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:929`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L929) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2721-2734`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2721-L2734) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fnmsubx => { + // PPCBUG-203: VXISI. PPCBUG-205: NaN sign preservation. + let a = ctx.fpr[instr.ra()]; + let c = ctx.fpr[instr.rc()]; + let b = ctx.fpr[instr.rb()]; + fpscr::check_invalid_mul(ctx, a, c); + fpscr::check_invalid_fma_add(ctx, a, c, b, true); + let fma = a.mul_add(c, -b); + let result = if fma.is_nan() { fma } else { -fma }; + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && b.is_finite() && c.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single rounding step, then sign flip.** Computes `−((FRA × FRC) − FRB)` = `FRB − (FRA × FRC)`, with one fused rounding. xenia-rs implements this as `-(a.mul_add(c, -b))`, which is mathematically equivalent. +- **NaN sign behaviour.** PowerISA: the negation does **not** flip the sign of a NaN result. xenia uses Rust's `Neg` which does flip the sign bit on NaNs. Observable only via bit-level inspection. **xenia quirk.** +- **Operand order.** Assembler: `FD, FA, FC, FB`. +- **Invalid operations.** `0×∞` → `VXIMZ`; same-signed-infinity collision (e.g. `(+∞) − (+∞)`) → `VXISI`. Quiet NaN result. +- **FPSCR side effects.** Hardware updates `FPRF`, `FR`, `FI`, `FX`, `OX`, `UX`, `XX`, `VXIMZ`, `VXISI`, `VXSNAN`. xenia-rs does not (xenia quirk). +- **`Rc=1` (`fnmsub.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **NaN propagation.** Quiet-NaN result for any NaN operand; signalling NaNs are quietened. +- **Use case.** The canonical Newton-Raphson divide refinement step: `x_new = x*(2 - d*x)`. This is the most common operand pattern in compiled PPC graphics code that does software reciprocals. +- **Denormal flush.** Xenon boots with `FPSCR[NI]=1`; xenia uses host IEEE behavior. + +## Related Instructions + +- [`fnmsubsx`](fnmsubsx.md) — single-precision sibling. +- [`fmaddx`](fmaddx.md), [`fmsubx`](fmsubx.md), [`fnmaddx`](fnmaddx.md) — other fused multiply-add variants. +- [`fresx`](fresx.md) — reciprocal estimate; `fnmsub` is the workhorse of NR refinement of `fres` outputs. +- [`frsqrtex`](frsqrtex.md) — reciprocal-sqrt estimate; also refined with `fnmsub`-style chains. +- [`fmulx`](fmulx.md), [`fsubx`](fsubx.md) — non-fused decomposition. + +## IBM Reference + +- [AIX 7.3 — `fnmsub` (Floating Negative Multiply-Subtract)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fnms-fnmsub-floating-negative-multiply-subtract-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/). diff --git a/migration/project-root/ppc-manual/fpu/fresx.md b/migration/project-root/ppc-manual/fpu/fresx.md new file mode 100644 index 0000000..f6f03e8 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fresx.md @@ -0,0 +1,151 @@ +# `fresx` — Floating Reciprocal Estimate Single + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xec000030` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fres` | `fresx` | — | Floating Reciprocal Estimate Single | +| `fres.` | `fresx` | Rc=1 | Floating Reciprocal Estimate Single | + +## Syntax + +```asm +fres[Rc] [FD], [FB] +``` + +## Encoding + +### `fresx` — form `A` + +- **Opcode word:** `0xec000030` +- **Primary opcode (bits 0–5):** `59` +- **Extended opcode:** `24` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FB` | fresx: read | Source B floating-point register. | +| `FD` | fresx: write | Destination floating-point register. | +| `CR` | fresx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fresx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fresx` + +- **Reads (always):** `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fresx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fresx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fresx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:106`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L106) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:29`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L29) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:390`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L390) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2815-2835`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2815-L2835) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fresx => { + // Single-precision reciprocal estimate: frD = 1.0 / frB. + // PPCBUG-184: pre-quantize input to f32 to match canary's + // `f.Recip(f.Convert(frB, FLOAT32_TYPE))` behavior. Hardware + // produces a ~12-bit LUT estimate; both emulators produce a + // fully-IEEE single reciprocal, but the f32 quantization at + // least makes the input precision match. + let b_full = ctx.fpr[instr.rb()]; + let b = b_full as f32 as f64; + if b == 0.0 { + fpscr::set_exception(ctx, fpscr::ZX); + } + if fpscr::is_snan(b_full) { + fpscr::set_exception(ctx, fpscr::VXSNAN); + } + let result = to_single(ctx, 1.0 / b); + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, b.is_finite() && b != 0.0); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single-precision reciprocal estimate.** PowerISA specifies a *low-precision* approximation of `1/FRB` accurate to roughly 12–14 bits of significand, intended as the seed for a Newton-Raphson refinement step. **xenia quirk:** xenia-rs computes the *full-precision* `1.0 / b` then rounds to single, so it produces a far more accurate result than hardware. Title code that depends on the limited precision of `fres` to trigger refinement loops will still work (the loops just refine an already-correct value), but bit-exact correlation with hardware is impossible. +- **Single precision result.** Final value is rounded to binary32 then re-encoded into the FPR. +- **Divide by zero.** `1/±0` → ±∞ and sets `FPSCR[ZX, FX]`. xenia returns the host ±∞ but does not update FPSCR. +- **`fres(±∞) = ±0`** (correctly signed). +- **`fres(NaN) = NaN`**; signalling NaNs are quietened. +- **Overflow / underflow.** May set `OX`/`UX`/`XX`/`FX`. xenia does not update FPSCR. +- **`Rc=1` (`fres.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **Encoding.** A-form, primary 59, XO 24. Reads `FRB` only; `FRA`/`FRC` are don't-care. +- **Use case.** Software reciprocal: `1/d ≈ x = fres(d); x = x*(2 - d*x);` (one Newton-Raphson step recovers full single precision). Two iterations recover full double precision. The `(2 - d*x)` step compiles to `fnmsub`. +- **Performance.** Cheap on Xenon (single-cycle issue) — divides by `fres` + 1–2 NR steps + `fmul` are far faster than `fdiv`/`fdivs`. + +## Related Instructions + +- [`frsqrtex`](frsqrtex.md) — reciprocal-square-root estimate; same NR refinement approach. +- [`fdivx`](fdivx.md), [`fdivsx`](fdivsx.md) — true divide; alternative when refinement isn't needed. +- [`fnmsubx`](fnmsubx.md), [`fnmsubsx`](fnmsubsx.md) — the workhorse for the `(2 - d*x)` step. +- [`fmulx`](fmulx.md), [`fmulsx`](fmulsx.md) — final multiply to apply the reciprocal. +- [`fmaddsx`](fmaddsx.md) — alternate refinement formulation. + +## IBM Reference + +- [AIX 7.3 — `fres` (Floating Reciprocal Estimate Single)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fres-floating-reciprocal-estimate-single-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (relative-error bound for `fres`; intended Newton-Raphson refinement pattern). diff --git a/migration/project-root/ppc-manual/fpu/frspx.md b/migration/project-root/ppc-manual/fpu/frspx.md new file mode 100644 index 0000000..83e126b --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/frspx.md @@ -0,0 +1,144 @@ +# `frspx` — Floating Round to Single + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0xfc000018` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `frsp` | `frspx` | — | Floating Round to Single | +| `frsp.` | `frspx` | Rc=1 | Floating Round to Single | + +## Syntax + +```asm +frsp[Rc] [FD], [FB] +``` + +## Encoding + +### `frspx` — form `X` + +- **Opcode word:** `0xfc000018` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `12` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FB` | frspx: read | Source B floating-point register. | +| `FD` | frspx: write | Destination floating-point register. | +| `CR` | frspx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | frspx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `frspx` + +- **Reads (always):** `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `frspx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`frspx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="frspx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:318`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L318) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:29`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L29) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:898`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L898) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2856-2871`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2856-L2871) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::frspx => { + // Round to single precision honouring FPSCR[RN]. + // PPCBUG-225: set XX on inexact rounding (almost every frsp call). + let b = ctx.fpr[instr.rb()]; + if fpscr::is_snan(b) { + fpscr::set_exception(ctx, fpscr::VXSNAN); + } + let result = to_single(ctx, b); + if b.is_finite() && result.is_finite() && result != b { + fpscr::set_exception(ctx, fpscr::XX); + } + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, b.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Round to single-precision.** Rounds the binary64 value in `FRB` to binary32 using `FPSCR[RN]`, then re-encodes the result back into the destination as a binary64 representation of that single value. xenia-rs uses `to_single(b)`, which performs `f64 → f32 → f64` round-trip (Rust's `as f32` uses round-to-nearest-even, matching the PPC default). +- **`FPSCR[RN]` not honored in xenia.** Like other conversion ops, xenia's `to_single` is hard-coded to round-to-nearest-even regardless of `FPSCR[RN]`. **xenia quirk** for non-default rounding modes. +- **Overflow.** Values whose magnitude exceeds binary32's max (~3.4e38) round to ±∞ and set `FPSCR[OX, XX, FX]`. +- **Underflow.** Values whose magnitude is below binary32's smallest normal (~1.2e-38) flush to zero or denormal per `FPSCR[NI]`; `UX`/`XX`/`FX` set on hardware. xenia uses host IEEE. +- **NaN propagation.** Quiet NaNs pass through; signalling NaNs are quietened (sign-bit cleared on the SNaN-quietening payload bit). Host `as f32` does not perform PPC-style quietening; **xenia quirk** for SNaN bit-level inspection. +- **Inexact.** Most rounding produces inexact; sets `FPSCR[XX, FX]`. xenia does not update FPSCR (xenia quirk). +- **`Rc=1` (`frsp.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **Encoding.** X-form, primary 63, XO 12. Reads `FRB` only. +- **Use case.** Compilers emit `frsp` after a chain of `fadd`/`fmul`/etc. when storing the value with `stfs` (store single). Without an explicit `frsp`, the in-FPR double would not match the `stfs`-rounded single. + +## Related Instructions + +- [`faddsx`](faddsx.md), [`fsubsx`](fsubsx.md), [`fmulsx`](fmulsx.md), [`fdivsx`](fdivsx.md) — single-precision arithmetic; equivalent to `frsp(double_op(...))`. +- [`fmaddsx`](fmaddsx.md), [`fmsubsx`](fmsubsx.md), [`fnmaddsx`](fnmaddsx.md), [`fnmsubsx`](fnmsubsx.md) — single-precision fused FMA family. +- `stfs` — store single; expects an FPR already rounded to single via `frsp` or via single-precision arithmetic. +- [`fcfidx`](fcfidx.md) — `fcfid` + `frsp` is the standard `i64 → float` conversion. +- [`mffsx`](mffsx.md), [`mtfsfx`](mtfsfx.md) — FPSCR rounding-mode control. + +## IBM Reference + +- [AIX 7.3 — `frsp` (Floating Round to Single)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-frsp-floating-round-single-precision-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (single-precision rounding rules; SNaN quietening). diff --git a/migration/project-root/ppc-manual/fpu/frsqrtex.md b/migration/project-root/ppc-manual/fpu/frsqrtex.md new file mode 100644 index 0000000..cdf9925 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/frsqrtex.md @@ -0,0 +1,147 @@ +# `frsqrtex` — Floating Reciprocal Square Root Estimate + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xfc000034` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `frsqrte` | `frsqrtex` | — | Floating Reciprocal Square Root Estimate | +| `frsqrte.` | `frsqrtex` | Rc=1 | Floating Reciprocal Square Root Estimate | + +## Syntax + +```asm +frsqrte[Rc] [FD], [FB] +``` + +## Encoding + +### `frsqrtex` — form `A` + +- **Opcode word:** `0xfc000034` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `26` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FB` | frsqrtex: read | Source B floating-point register. | +| `FD` | frsqrtex: write | Destination floating-point register. | +| `CR` | frsqrtex: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | frsqrtex: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `frsqrtex` + +- **Reads (always):** `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `frsqrtex`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`frsqrtex`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="frsqrtex"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:118`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L118) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:29`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L29) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:926`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L926) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2836-2853`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2836-L2853) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::frsqrtex => { + // Reciprocal square root estimate: frD = 1.0 / sqrt(frB) + let b = ctx.fpr[instr.rb()]; + if b == 0.0 { + fpscr::set_exception(ctx, fpscr::ZX); + } + if b.is_sign_negative() && b != 0.0 && !b.is_nan() { + fpscr::set_exception(ctx, fpscr::VXSQRT); + } + if fpscr::is_snan(b) { + fpscr::set_exception(ctx, fpscr::VXSNAN); + } + let result = 1.0 / b.sqrt(); + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, b.is_finite() && b > 0.0); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Reciprocal-square-root estimate.** PowerISA: low-precision approximation of `1/sqrt(FRB)` accurate to roughly 12–14 bits, designed as the seed for Newton-Raphson refinement. **xenia quirk:** xenia-rs computes the *full-precision* `1.0 / b.sqrt()` (no rounding to single — `frsqrte` is double-precision per the spec). The result is far more accurate than hardware. Title code that depends on the limited precision still functions; the NR refinement converges in one iteration on either platform. +- **Double precision result.** Per PowerISA, `frsqrte` returns a binary64 estimate (not a single-rounded value, unlike `fres`). +- **Negative input is invalid.** `frsqrte(x < 0)` (other than `-0`) sets `FPSCR[VXSQRT, VX, FX]` and yields a quiet NaN. xenia returns host NaN (Rust's `f64::sqrt` of a negative is NaN, then `1/NaN` is NaN) but does not raise the FPSCR bit. +- **`frsqrte(+0) = +∞`** and sets `FPSCR[ZX]` per spec. **`frsqrte(-0) = -∞`**. +- **`frsqrte(+∞) = +0`**. +- **NaN propagation.** Quiet NaN; signalling NaNs are quietened. +- **`Rc=1` (`frsqrte.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **Encoding.** A-form, primary 63, XO 26. Reads `FRB` only; `FRA`/`FRC` are don't-care. +- **Use case.** The canonical `length`/`normalize` recipe: `inv_len = frsqrte(dot); inv_len = 0.5 * inv_len * (3 - dot * inv_len * inv_len);` — one NR step gets to full double precision. For single precision use `frsp` after. +- **Performance.** Cheap on Xenon. The `length`/`normalize` macro built on `frsqrte` is the hot inner loop in any 3D Xbox 360 game. + +## Related Instructions + +- [`fresx`](fresx.md) — reciprocal estimate; same NR-refinement design pattern. +- [`fsqrtx`](fsqrtx.md), [`fsqrtsx`](fsqrtsx.md) — full-precision square root (multi-cycle, non-pipelined). +- [`fmulx`](fmulx.md), [`fmaddx`](fmaddx.md), [`fnmsubx`](fnmsubx.md) — the multiply/FMA ops that drive NR refinement. +- [`frspx`](frspx.md) — round to single after `frsqrte` for graphics-pipeline producers expecting `float`. + +## IBM Reference + +- [AIX 7.3 — `frsqrte` (Floating Reciprocal Square Root Estimate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-frsqrte-floating-reciprocal-square-root-estimate-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (relative-error bound for `frsqrte`; canonical NR refinement step). diff --git a/migration/project-root/ppc-manual/fpu/fselx.md b/migration/project-root/ppc-manual/fpu/fselx.md new file mode 100644 index 0000000..5c03175 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fselx.md @@ -0,0 +1,143 @@ +# `fselx` — Floating Select + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xfc00002e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fsel` | `fselx` | — | Floating Select | +| `fsel.` | `fselx` | Rc=1 | Floating Select | + +## Syntax + +```asm +fsel[Rc] [FD], [FA], [FC], [FB] +``` + +## Encoding + +### `fselx` — form `A` + +- **Opcode word:** `0xfc00002e` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `23` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fselx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FC` | fselx: read | Source C floating-point register (for madd-style ops). | +| `FB` | fselx: read | Source B floating-point register. | +| `FD` | fselx: write | Destination floating-point register. | +| `CR` | fselx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `fselx` + +- **Reads (always):** `FA`, `FC`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fselx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fselx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fselx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:144`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L144) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:30`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L30) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:924`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L924) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2774-2783`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2774-L2783) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fselx => { + // frD = if frA >= 0.0 then frC else frB + ctx.fpr[instr.rd()] = if ctx.fpr[instr.ra()] >= 0.0 { + ctx.fpr[instr.rc()] + } else { + ctx.fpr[instr.rb()] + }; + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Non-IEEE branch-free select.** PowerPC-specific; not in the IEEE-754 spec. Semantics: `FRT = (FRA >= 0.0) ? FRC : FRB`. Used pervasively in compiled PPC for `min`/`max`/`clamp`/`copysign` without branches. xenia-rs uses Rust's `>=` which matches. +- **`-0.0` selects `FRC`.** Per PowerISA, `-0` compares as `>= 0`, so it routes to `FRC` (the "true" branch). xenia's `-0.0 >= 0.0` evaluates true in Rust — semantic match. +- **NaN selects `FRB`.** Per PowerISA, NaN does **not** satisfy `>= 0`, so the result is `FRB`. xenia: any comparison with NaN returns false in Rust, so `>= 0` is false → `FRB` selected. Match. +- **No FPSCR side effects.** `fsel` does **not** raise `VXSNAN` even on signalling NaN inputs, and does **not** update `FPRF`. It is purely a data-movement op. +- **`Rc=1` (`fsel.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **A-form encoding.** Reads `FRA, FRB, FRC`, writes `FRT`. Assembler order: `fsel FD, FA, FC, FB` (note: `FRC` before `FRB`). +- **Common idioms.** + - `min(a,b) = fsel(a-b, b, a)` + - `max(a,b) = fsel(a-b, a, b)` + - `clamp(x, lo, hi) = fsel(x-lo, fsel(hi-x, x, hi), lo)` + - `copysign(x, y) = fsel(y, |x|, -|x|)` (using `fabs`/`fnabs`) +- **Optional ISA.** `fsel` is an optional PowerISA instruction; some implementations trap. Xenon implements it natively. +- **No precision change.** Bit-pattern selection — no rounding regardless of source precision. + +## Related Instructions + +- [`fabsx`](fabsx.md), [`fnegx`](fnegx.md), [`fnabsx`](fnabsx.md) — sign-bit ops; common companions for `fsel`-based copysign/clamp idioms. +- [`fsubx`](fsubx.md) — subtract is the standard way to produce the comparison key (`a - b`). +- [`fcmpux`](fcmpu.md), [`fcmpox`](fcmpo.md) — IEEE compare with branch; the heavyweight alternative to `fsel`. +- [`fmrx`](fmrx.md) — unconditional copy. + +## IBM Reference + +- [AIX 7.3 — `fsel` (Floating Select)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fsel-floating-select-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (note: `fsel` is non-IEEE and uses the `>= 0` convention, not `> 0`). diff --git a/migration/project-root/ppc-manual/fpu/fsqrtsx.md b/migration/project-root/ppc-manual/fpu/fsqrtsx.md new file mode 100644 index 0000000..fca8344 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fsqrtsx.md @@ -0,0 +1,143 @@ +# `fsqrtsx` — Floating Square Root Single + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xec00002c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fsqrts` | `fsqrtsx` | — | Floating Square Root Single | +| `fsqrts.` | `fsqrtsx` | Rc=1 | Floating Square Root Single | + +## Syntax + +```asm +fsqrts[Rc] [FD], [FB] +``` + +## Encoding + +### `fsqrtsx` — form `A` + +- **Opcode word:** `0xec00002c` +- **Primary opcode (bits 0–5):** `59` +- **Extended opcode:** `22` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FB` | fsqrtsx: read | Source B floating-point register. | +| `FD` | fsqrtsx: write | Destination floating-point register. | +| `CR` | fsqrtsx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fsqrtsx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fsqrtsx` + +- **Reads (always):** `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fsqrtsx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fsqrtsx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fsqrtsx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:168`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L168) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:30`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L30) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:389`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L389) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2801-2814`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2801-L2814) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fsqrtsx => { + let b = ctx.fpr[instr.rb()]; + if b.is_sign_negative() && b != 0.0 && !b.is_nan() { + fpscr::set_exception(ctx, fpscr::VXSQRT); + } + if fpscr::is_snan(b) { + fpscr::set_exception(ctx, fpscr::VXSNAN); + } + let result = to_single(ctx, b.sqrt()); + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, b.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single precision.** Result is rounded to IEEE-754 binary32 then re-encoded into the destination 64-bit FPR. xenia computes `to_single(b.sqrt())`. +- **Negative inputs are invalid.** `sqrt(x < 0)` (other than `-0`) sets `FPSCR[VXSQRT, VX, FX]` and yields a quiet NaN. `sqrt(-0) = -0` per IEEE-754. +- **`sqrt(+∞) = +∞`**, exact. +- **FPSCR side effects.** Hardware updates `FPRF`, `FR`, `FI`, `FX` plus exception bits `XX` (very common — sqrt is rarely exact in single precision), `VXSQRT`, `VXSNAN`. xenia-rs does not update FPSCR (xenia quirk). +- **`Rc=1` (`fsqrts.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **NaN propagation.** Quiet-NaN result for any NaN operand; signalling NaNs are quietened. +- **Performance.** `fsqrts` is a multi-cycle, non-pipelined operation on Xenon. Hot-path code commonly uses `frsqrte` + Newton-Raphson + `fmul`. +- **Encoding.** A-form, primary 59, XO 22; reads `FRB` only. +- **Rounding mode** uses `FPSCR[RN]` (default nearest-even). + +## Related Instructions + +- [`fsqrtx`](fsqrtx.md) — double-precision square root. +- [`frsqrtex`](frsqrtex.md) — reciprocal-square-root estimate; combined with `fmuls` to compute `1/sqrt(x)` cheaply. +- [`fresx`](fresx.md) — reciprocal estimate; pairs with `fsqrts` to compute `1/sqrt(x)`. +- [`fmulsx`](fmulsx.md), [`fmaddsx`](fmaddsx.md) — used in Newton-Raphson refinement of `frsqrte` outputs. +- [`frspx`](frspx.md) — explicit double→single rounding. + +## IBM Reference + +- [AIX 7.3 — `fsqrts` (Floating Square Root Single)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fsqrts-floating-square-root-single-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/). diff --git a/migration/project-root/ppc-manual/fpu/fsqrtx.md b/migration/project-root/ppc-manual/fpu/fsqrtx.md new file mode 100644 index 0000000..91a3b18 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fsqrtx.md @@ -0,0 +1,144 @@ +# `fsqrtx` — Floating Square Root + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xfc00002c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fsqrt` | `fsqrtx` | — | Floating Square Root | +| `fsqrt.` | `fsqrtx` | Rc=1 | Floating Square Root | + +## Syntax + +```asm +fsqrt[Rc] [FD], [FB] +``` + +## Encoding + +### `fsqrtx` — form `A` + +- **Opcode word:** `0xfc00002c` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `22` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FB` | fsqrtx: read | Source B floating-point register. | +| `FD` | fsqrtx: write | Destination floating-point register. | +| `CR` | fsqrtx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fsqrtx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fsqrtx` + +- **Reads (always):** `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fsqrtx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fsqrtx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fsqrtx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:164`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L164) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:30`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L30) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:923`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L923) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2786-2800`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2786-L2800) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fsqrtx => { + let b = ctx.fpr[instr.rb()]; + // sqrt of negative (non-zero) is invalid operation → VXSQRT. + if b.is_sign_negative() && b != 0.0 && !b.is_nan() { + fpscr::set_exception(ctx, fpscr::VXSQRT); + } + if fpscr::is_snan(b) { + fpscr::set_exception(ctx, fpscr::VXSNAN); + } + let result = b.sqrt(); + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, b.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Double precision.** Operates on IEEE-754 binary64; [`fsqrtsx`](fsqrtsx.md) is the single-precision sibling. xenia delegates to host `f64::sqrt`. +- **Negative inputs are invalid.** `sqrt(x < 0)` (other than `-0`) sets `FPSCR[VXSQRT, VX, FX]` and yields a quiet NaN. Note: `sqrt(-0) = -0` per IEEE-754 (preserves sign of zero) — host `f64::sqrt` matches. +- **`sqrt(+∞) = +∞`**, exact. +- **FPSCR side effects.** Hardware updates `FPRF`, `FR`, `FI`, `FX` plus exception bits `XX` (inexact, very common since `sqrt` is rarely exact), `VXSQRT`, `VXSNAN`. xenia-rs does **not** update FPSCR (xenia quirk). +- **`Rc=1` (`fsqrt.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **NaN propagation.** Quiet-NaN result for any NaN operand; signalling NaNs are quietened. +- **Performance / availability.** `fsqrt` is a Power-ISA optional instruction; some implementations trap as illegal-opcode. Xenon implements it natively. xenia-rs supports it directly. +- **Encoding.** A-form, primary 63, XO 22; reads `FRB` only — `FRA` and `FRC` are don't-care. +- **Rounding mode** uses `FPSCR[RN]`. + +## Related Instructions + +- [`fsqrtsx`](fsqrtsx.md) — single-precision square root. +- [`frsqrtex`](frsqrtex.md) — reciprocal-square-root estimate (`~1/sqrt(x)`); preferred for normalize/length operations. +- [`fresx`](fresx.md) — reciprocal estimate; pairs with `fsqrt` for `1/sqrt(x)`. +- [`fmulx`](fmulx.md), [`fmaddx`](fmaddx.md) — used in Newton-Raphson refinement of `frsqrte` outputs. +- [`mffsx`](mffsx.md), [`mtfsfx`](mtfsfx.md) — FPSCR control. + +## IBM Reference + +- [AIX 7.3 — `fsqrt` (Floating Square Root)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fsqrt-floating-square-root-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/) (square-root invalid-operation rules). diff --git a/migration/project-root/ppc-manual/fpu/fsubsx.md b/migration/project-root/ppc-manual/fpu/fsubsx.md new file mode 100644 index 0000000..a06b0b0 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fsubsx.md @@ -0,0 +1,140 @@ +# `fsubsx` — Floating Subtract Single + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xec000028` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fsubs` | `fsubsx` | — | Floating Subtract Single | +| `fsubs.` | `fsubsx` | Rc=1 | Floating Subtract Single | + +## Syntax + +```asm +fsubs[Rc] [FD], [FA], [FB] +``` + +## Encoding + +### `fsubsx` — form `A` + +- **Opcode word:** `0xec000028` +- **Primary opcode (bits 0–5):** `59` +- **Extended opcode:** `20` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fsubsx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FB` | fsubsx: read | Source B floating-point register. | +| `FD` | fsubsx: write | Destination floating-point register. | +| `CR` | fsubsx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fsubsx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fsubsx` + +- **Reads (always):** `FA`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fsubsx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fsubsx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fsubsx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:135`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L135) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:30`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L30) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:387`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L387) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2585-2594`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2585-L2594) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fsubsx => { + let a = ctx.fpr[instr.ra()]; + let b = ctx.fpr[instr.rb()]; + fpscr::check_invalid_add(ctx, a, b, true); + let result = to_single(ctx, a - b); + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && b.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single precision.** Result is rounded to IEEE-754 binary32 then re-encoded into the destination 64-bit FPR using the binary64 representation. xenia-rs uses `to_single(a - b)` which performs the round trip via `f64 -> f32 -> f64`. +- **`±∞ − ±∞`** sets `FPSCR[VXISI, VX, FX]` and yields a quiet NaN. +- **FPSCR side effects.** Always updated on hardware: `FPRF`, `FR`, `FI`, `FX`, plus exception bits `OX`, `UX`, `XX`, `VXISI`, `VXSNAN`. xenia-rs does **not** maintain FPSCR in the interpreter snapshot (xenia quirk). +- **`Rc=1` (`fsubs.`)** copies `FPSCR[FX, FEX, VX, OX]` into CR1. +- **NaN propagation.** Quiet-NaN result for any NaN operand; signalling NaNs are quietened. +- **Single-precision overflow.** A double-precision result that would round to a binary32 overflow returns ±∞ and sets `OX`/`XX`/`FX`. +- **Denormal flush.** Xenon boots with `FPSCR[NI]=1`; hardware flushes single-precision denormals to zero. xenia inherits host IEEE semantics. +- **Rounding mode** is taken from `FPSCR[RN]`; default is nearest-even. +- **Encoding.** A-form, primary 59, XO 20. `FRC` is don't-care. + +## Related Instructions + +- [`fsubx`](fsubx.md) — double-precision sibling. +- [`faddsx`](faddsx.md), [`fmulsx`](fmulsx.md), [`fdivsx`](fdivsx.md) — companion single-precision ops. +- [`fmsubsx`](fmsubsx.md), [`fnmsubsx`](fnmsubsx.md) — fused single-precision multiply-subtract. +- [`fnegx`](fnegx.md) — sign flip used to express subtract as add-of-negation. +- [`frspx`](frspx.md) — explicit double→single rounding (semantically equivalent to chaining `frsp(fsub(...))`). + +## IBM Reference + +- [AIX 7.3 — `fsubs` (Floating Subtract Single)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fsubs-floating-subtract-single-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/). diff --git a/migration/project-root/ppc-manual/fpu/fsubx.md b/migration/project-root/ppc-manual/fpu/fsubx.md new file mode 100644 index 0000000..cf34199 --- /dev/null +++ b/migration/project-root/ppc-manual/fpu/fsubx.md @@ -0,0 +1,130 @@ +# `fsubx` — Floating Subtract + +> **Category:** [Floating-Point](../categories/fpu.md) · **Form:** [A](../forms/A.md) · **Opcode:** `0xfc000028` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `fsub` | `fsubx` | — | Floating Subtract | +| `fsub.` | `fsubx` | Rc=1 | Floating Subtract | + +## Syntax + +```asm +fsub[Rc] [FD], [FA], [FB] +``` + +## Encoding + +### `fsubx` — form `A` + +- **Opcode word:** `0xfc000028` +- **Primary opcode (bits 0–5):** `63` +- **Extended opcode:** `20` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (59 or 63) | +| 6–10 | `FRT` | destination FPR | +| 11–15 | `FRA` | source A FPR | +| 16–20 | `FRB` | source B FPR | +| 21–25 | `FRC` | source C FPR (multiplier for madd-style ops) | +| 26–30 | `XO` | extended opcode (5 bits) | +| 31 | `Rc` | record-form flag (updates CR1) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FA` | fsubx: read | Source A floating-point register (`fr0`–`fr31`). | +| `FB` | fsubx: read | Source B floating-point register. | +| `FD` | fsubx: write | Destination floating-point register. | +| `CR` | fsubx: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | +| `FPSCR` | fsubx: write | Floating-Point Status and Control Register. | + +## Register Effects + +### `fsubx` + +- **Reads (always):** `FA`, `FB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `FPSCR` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `fsubx`: **CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.; **FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions). + +## Operation (pseudocode) + +``` +FRT <- FRA − FRB +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`fsubx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="fsubx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_fpu.cc:127`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_fpu.cc#L127) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:30`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L30) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:921`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L921) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2575-2584`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2575-L2584) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::fsubx => { + let a = ctx.fpr[instr.ra()]; + let b = ctx.fpr[instr.rb()]; + fpscr::check_invalid_add(ctx, a, b, true); + let result = a - b; + ctx.fpr[instr.rd()] = result; + fpscr::update_after_op(ctx, result, a.is_finite() && b.is_finite()); + if instr.rc_bit() { update_cr1_from_fpscr(ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Double precision.** `fsub` operates on IEEE-754 binary64. The single-precision sibling is [`fsubsx`](fsubsx.md), which rounds the result to binary32 before re-encoding it into the 64-bit FPR. +- **`±∞ − ±∞` is the canonical invalid case.** Same-signed infinity subtraction (or opposite-signed addition) yields `QNaN(VXISI)` and sets `FPSCR[VXISI, VX, FX]`. +- **FPSCR side effects.** Hardware updates `FPRF`, `FR`, `FI`, `FX` plus exception bits `OX`, `UX`, `XX`, `VXISI`, `VXSNAN` as appropriate. xenia-rs's interpreter does **not** model FPSCR updates — a xenia quirk that almost never matters in practice. +- **`Rc=1` (`fsub.`)** writes `CR1` from `FPSCR[FX, FEX, VX, OX]`. +- **NaN propagation.** Any NaN operand yields a quiet NaN; a signalling NaN input is quietened (signalling bit cleared) per PowerISA. Host `f64 -` is relied on for the value. +- **Sign of zero.** `+0 − +0 = +0` in round-to-nearest, `−0` in round-toward-negative-infinity. xenia inherits host semantics. +- **Denormal flush.** Xenon boots with `FPSCR[NI]=1` (non-IEEE mode) so subnormal results flush to zero on hardware. Xenia produces IEEE-compliant denormals from the host FPU; titles relying on flush-to-zero typically see no observable difference for game logic but may see subtle differences in audio DSP. +- **Encoding.** A-form, primary 63, XO 20. `FRC` is don't-care for sub. + +## Related Instructions + +- [`fsubsx`](fsubsx.md) — single-precision subtract (rounds to binary32). +- [`faddx`](faddx.md), [`faddsx`](faddsx.md) — add counterparts; subtract is implemented as add-with-negated-B on most cores. +- [`fnegx`](fnegx.md) — sign flip (the bit-pattern operation behind `−FRB`). +- [`fmsubx`](fmsubx.md), [`fnmsubx`](fnmsubx.md) — fused multiply-subtract (single rounding step). +- [`mffsx`](mffsx.md), [`mtfsfx`](mtfsfx.md), [`mtfsb0x`](mtfsb0x.md), [`mtfsb1x`](mtfsb1x.md) — FPSCR control. + +## IBM Reference + +- [AIX 7.3 — `fsub` (Floating Subtract)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-fs-fsub-floating-subtract-instruction) +- [PowerISA v2.07B, Book I, Chapter 4 — Floating-Point Processor](https://openpowerfoundation.org/specifications/isa/). diff --git a/migration/project-root/ppc-manual/generator/README.md b/migration/project-root/ppc-manual/generator/README.md new file mode 100644 index 0000000..92b55ec --- /dev/null +++ b/migration/project-root/ppc-manual/generator/README.md @@ -0,0 +1,114 @@ +# Manual generator + +Python scripts that build the `ppc-manual/` tree from the two +authoritative sources in this repository: + +- `xenia-canary/tools/ppc-instructions.xml` — metadata for all 455 + Xbox 360 PPC instructions (mnemonic, form, group, opcode, in/out + fields, disasm template). +- `xenia-rs/crates/xenia-cpu/src/` — the Rust interpreter. Individual + instruction semantics live in `interpreter.rs` match arms. +- `xenia-canary/src/xenia/cpu/ppc/ppc_emit_*.cc` — the C++ emit + functions; referenced by line number only. + +## Files + +| File | Purpose | +| --- | --- | +| `generate_manual.py` | Main entry point. Parses XML, builds families, renders pages, writes `index.json`. | +| `xml_model.py` | XML parser + `expand_runtime_variants()` (produces the set of Rc/OE/LK-expanded mnemonics a single XML entry covers). | +| `bit_layout.py` | Per-form bit-field tables (rendered into the Encoding section of every page and into `forms/*.md`). | +| `rust_scraper.py` | Locates each `PpcOpcode::` enum variant, decoder arm, and interpreter match-arm line range. | +| `cxx_scraper.py` | Locates `InstrEmit_` in the xenia-canary emit `.cc` files. | + +## Running + +```bash +python3 ppc-manual/generator/generate_manual.py # full generate +python3 ppc-manual/generator/generate_manual.py --dry-run # parse + consistency checks only +python3 ppc-manual/generator/generate_manual.py --out /tmp/out # alternate output root +python3 ppc-manual/generator/generate_manual.py --xml /path/to/ppc-instructions.xml +``` + +No third-party dependencies; Python 3.10+ standard library only. + +## Idempotency + +The generator is re-runnable without data loss: + +1. Each page has a pair of sentinel comments: + - `` + - `` +2. On re-run, only the text **between** the sentinels is rewritten. + Everything after `END` (Special Cases, Related Instructions, IBM + Reference) is preserved verbatim. +3. If the `END` sentinel is missing, the generator assumes a reviewer + has fully taken over the file and skips it entirely. + +## Consistency checks (enforced by `--dry-run` as well) + +- **XML entry count ≡ 455** — warns if the XML has been modified. +- **family membership total ≡ XML entry count** — every XML entry + must land in exactly one family. +- **index coverage ≡ runtime-expanded mnemonic count** — the JSON + index must contain a key for every runtime variant (`add`, `add.`, + `addo`, `addo.`, `bclr`, `bclrl`, …). + +## Family grouping rules + +Three rules applied in order (see `_family_head` in +`generate_manual.py`): + +1. If a mnemonic ends in `128` and the non-128 sibling exists, it + joins the sibling's family. So `vaddfp128` is consolidated into + the `vaddfp` page. +2. For memory ops (group `m`), trailing `u`, `x`, or `ux` suffixes + are stripped when the base exists. So `lwz`, `lwzu`, `lwzx`, + `lwzux` all land on the `lwz` page. +3. Otherwise the mnemonic is its own family head. + +All other flag variants (`Rc`, `OE`, `LK`) are **runtime** — they are +NOT separate XML entries; they are listed in the page's "Assembler +Mnemonics" table. + +## Category mapping + +| XML group | Category dir | Notes | +| --- | --- | --- | +| `i` (integer) | `alu/` | | +| `m` (memory) | `memory/` | | +| `b` (branch) | `branch/` | Includes `sc` and traps | +| `c` (control) | `control/` | CR logical, SPR, sync | +| `f` (fpu) | `fpu/` | | +| `v` (vector) | `vmx/` or `vmx128/` | Split by form: `VX128*` → `vmx128/` | + +## Extending the generator + +- **Pseudocode seeds.** The `PSEUDOCODE_SEEDS` dict in + `generate_manual.py` maps an XML mnemonic to a PPC-style pseudocode + block. Add entries here to pre-fill the Operation section for + additional mnemonics. Phase 2 reviewers can still override by + writing content outside the sentinels. +- **C translation seeds.** Similar dict of C snippets keyed by family + head. +- **Field descriptions.** `FIELD_DESCRIPTIONS` maps XML field names to + IBM-style prose. Missing entries are marked "_Phase 2: document + this field._" + +## Known limitations + +- Extended-opcode extraction in `xml_model.Instruction.extended_opcode` + is best-effort per form. For VMX128 variants the extracted value may + not match the exact pattern used by xenia's decoder tree — the page + still shows it as a reference but the decoder source (linked on + every page) is authoritative. +- `rust_scraper` uses a naive brace counter to delimit interpreter + match arms. It works for the current interpreter because the match + arms use balanced braces and no string literals with unbalanced + braces. If the interpreter ever adopts such literals the scraper + will need a Rust-aware parser. +- The generator treats mnemonics ending in `x` as xenia convention + ("extended/XO form") and strips them for assembly display — except + for the memory group, where `x` is the natural indexed-form suffix. + If future xenia XML adds a new group where `x` is structural, the + heuristic in `xml_model.expand_runtime_variants` needs updating. diff --git a/migration/project-root/ppc-manual/generator/__pycache__/bit_layout.cpython-312.pyc b/migration/project-root/ppc-manual/generator/__pycache__/bit_layout.cpython-312.pyc new file mode 100644 index 0000000..d78f676 Binary files /dev/null and b/migration/project-root/ppc-manual/generator/__pycache__/bit_layout.cpython-312.pyc differ diff --git a/migration/project-root/ppc-manual/generator/__pycache__/cxx_scraper.cpython-312.pyc b/migration/project-root/ppc-manual/generator/__pycache__/cxx_scraper.cpython-312.pyc new file mode 100644 index 0000000..1ea760a Binary files /dev/null and b/migration/project-root/ppc-manual/generator/__pycache__/cxx_scraper.cpython-312.pyc differ diff --git a/migration/project-root/ppc-manual/generator/__pycache__/rust_scraper.cpython-312.pyc b/migration/project-root/ppc-manual/generator/__pycache__/rust_scraper.cpython-312.pyc new file mode 100644 index 0000000..540feee Binary files /dev/null and b/migration/project-root/ppc-manual/generator/__pycache__/rust_scraper.cpython-312.pyc differ diff --git a/migration/project-root/ppc-manual/generator/__pycache__/xml_model.cpython-312.pyc b/migration/project-root/ppc-manual/generator/__pycache__/xml_model.cpython-312.pyc new file mode 100644 index 0000000..610cdb8 Binary files /dev/null and b/migration/project-root/ppc-manual/generator/__pycache__/xml_model.cpython-312.pyc differ diff --git a/migration/project-root/ppc-manual/generator/bit_layout.py b/migration/project-root/ppc-manual/generator/bit_layout.py new file mode 100644 index 0000000..75c6ddf --- /dev/null +++ b/migration/project-root/ppc-manual/generator/bit_layout.py @@ -0,0 +1,266 @@ +""" +Canonical bit layout per PPC instruction form. + +Tables derived from xenia-canary/src/xenia/cpu/ppc/ppc_instr.h (struct +PPCOpcodeBits union). In PPC notation bit 0 is the MSB of the 32-bit +word (big-endian bit numbering). + +Each entry is a list of (bit_start, bit_end_inclusive, field_name, +notes) tuples laid out from MSB (bit 0) to LSB (bit 31). +""" + + +# NOTE: bit ranges use PPC big-endian numbering (0 = MSB, 31 = LSB). + +FORM_LAYOUTS: dict[str, list[tuple[int, int, str, str]]] = { + "I": [ + (0, 5, "OPCD", "primary opcode"), + (6, 29, "LI", "signed 24-bit word-offset target"), + (30, 30, "AA", "absolute-address flag"), + (31, 31, "LK", "link flag (bl/ba/bla)"), + ], + "B": [ + (0, 5, "OPCD", "primary opcode"), + (6, 10, "BO", "branch options"), + (11, 15, "BI", "CR bit to test"), + (16, 29, "BD", "signed 14-bit word-offset target"), + (30, 30, "AA", "absolute-address flag"), + (31, 31, "LK", "link flag"), + ], + "SC": [ + (0, 5, "OPCD", "primary opcode (17)"), + (6, 19, "—", "reserved"), + (20, 26, "LEV", "exception level"), + (27, 29, "—", "reserved"), + (30, 30, "1", "fixed 1"), + (31, 31, "—", "reserved"), + ], + "D": [ + (0, 5, "OPCD", "primary opcode"), + (6, 10, "RT", "destination GPR (or RS when storing)"), + (11, 15, "RA", "source GPR (0 ⇒ literal 0 for RA0 forms)"), + (16, 31, "D/SI/UI", "16-bit signed or unsigned immediate"), + ], + "DS": [ + (0, 5, "OPCD", "primary opcode"), + (6, 10, "RT", "destination GPR (or RS)"), + (11, 15, "RA", "source GPR (0 ⇒ literal 0)"), + (16, 29, "DS", "14-bit signed word-scaled displacement"), + (30, 31, "XO", "extended opcode"), + ], + "X": [ + (0, 5, "OPCD", "primary opcode"), + (6, 10, "RT/FRT/VRT", "destination"), + (11, 15, "RA/FRA/VRA", "source A"), + (16, 20, "RB/FRB/VRB", "source B"), + (21, 30, "XO", "extended opcode (10 bits)"), + (31, 31, "Rc", "record-form flag"), + ], + "XL": [ + (0, 5, "OPCD", "primary opcode (19)"), + (6, 10, "BT/BO", "target / branch options"), + (11, 15, "BA/BI", "source A / CR bit to test"), + (16, 20, "BB", "source B"), + (21, 30, "XO", "extended opcode (10 bits)"), + (31, 31, "LK", "link flag"), + ], + "XFX": [ + (0, 5, "OPCD", "primary opcode (31)"), + (6, 10, "RT", "destination / source GPR"), + (11, 20, "spr/tbr/FXM", "SPR/TBR number (byte-swapped halves) or CR field mask"), + (21, 30, "XO", "extended opcode"), + (31, 31, "—", "reserved"), + ], + "XFL": [ + (0, 5, "OPCD", "primary opcode (63)"), + (6, 6, "L", "field-select behaviour"), + (7, 14, "FM", "FPSCR field mask"), + (15, 15, "W", "immediate-value flag"), + (16, 20, "FRB", "source FPR"), + (21, 30, "XO", "extended opcode"), + (31, 31, "Rc", "record-form flag (updates CR1)"), + ], + "XS": [ + (0, 5, "OPCD", "primary opcode (31)"), + (6, 10, "RS", "source GPR"), + (11, 15, "RA", "destination GPR"), + (16, 20, "sh", "shift amount low 5 bits"), + (21, 29, "XO", "extended opcode (9 bits)"), + (30, 30, "sh5", "shift amount high bit"), + (31, 31, "Rc", "record-form flag"), + ], + "XO": [ + (0, 5, "OPCD", "primary opcode (31)"), + (6, 10, "RT", "destination GPR"), + (11, 15, "RA", "source A"), + (16, 20, "RB", "source B"), + (21, 21, "OE", "overflow-enable flag"), + (22, 30, "XO", "extended opcode (9 bits)"), + (31, 31, "Rc", "record-form flag"), + ], + "A": [ + (0, 5, "OPCD", "primary opcode (59 or 63)"), + (6, 10, "FRT", "destination FPR"), + (11, 15, "FRA", "source A FPR"), + (16, 20, "FRB", "source B FPR"), + (21, 25, "FRC", "source C FPR (multiplier for madd-style ops)"), + (26, 30, "XO", "extended opcode (5 bits)"), + (31, 31, "Rc", "record-form flag (updates CR1)"), + ], + "M": [ + (0, 5, "OPCD", "primary opcode"), + (6, 10, "RS", "source GPR"), + (11, 15, "RA", "destination GPR"), + (16, 20, "SH/RB", "shift amount or source B"), + (21, 25, "MB", "mask begin"), + (26, 30, "ME", "mask end"), + (31, 31, "Rc", "record-form flag"), + ], + "MD": [ + (0, 5, "OPCD", "primary opcode (30)"), + (6, 10, "RS", "source GPR"), + (11, 15, "RA", "destination GPR"), + (16, 20, "sh", "shift amount low 5 bits"), + (21, 26, "mb/me", "6-bit mask field (swapped halves)"), + (27, 29, "XO", "extended opcode"), + (30, 30, "sh5", "shift amount high bit"), + (31, 31, "Rc", "record-form flag"), + ], + "MDS": [ + (0, 5, "OPCD", "primary opcode (30)"), + (6, 10, "RS", "source GPR"), + (11, 15, "RA", "destination GPR"), + (16, 20, "RB", "source B GPR"), + (21, 26, "mb/me", "6-bit mask field (swapped halves)"), + (27, 30, "XO", "extended opcode"), + (31, 31, "Rc", "record-form flag"), + ], + "DCBZ": [ + (0, 5, "OPCD", "primary opcode (31)"), + (6, 10, "—", "reserved"), + (11, 15, "RA", "base register (0 ⇒ literal 0)"), + (16, 20, "RB", "offset register"), + (21, 30, "XO", "extended opcode (1014 for dcbz / 1010 for dcbz128)"), + (31, 31, "—", "reserved"), + ], + "VX": [ + (0, 5, "OPCD", "primary opcode (4)"), + (6, 10, "VRT/VD", "destination vector register"), + (11, 15, "VRA/VA", "source A vector register"), + (16, 20, "VRB/VB", "source B vector register"), + (21, 31, "XO", "extended opcode (11 bits)"), + ], + "VA": [ + (0, 5, "OPCD", "primary opcode (4)"), + (6, 10, "VRT", "destination vector register"), + (11, 15, "VRA", "source A"), + (16, 20, "VRB", "source B"), + (21, 25, "VRC", "source C / shift"), + (26, 31, "XO", "extended opcode (6 bits)"), + ], + "VC": [ + (0, 5, "OPCD", "primary opcode (4)"), + (6, 10, "VRT", "destination vector register"), + (11, 15, "VRA", "source A"), + (16, 20, "VRB", "source B"), + (21, 21, "Rc", "record-form flag (updates CR6)"), + (22, 31, "XO", "extended opcode (10 bits)"), + ], + "VX128": [ + (0, 5, "OPCD", "primary opcode (4 or 5)"), + (6, 10, "VD128l", "destination low 5 bits"), + (11, 15, "VA128l", "source A low 5 bits"), + (16, 20, "VB128l", "source B low 5 bits"), + (21, 21, "VA128H", "source A high bit"), + (22, 22, "—", "reserved"), + (23, 25, "VC", "optional VC / XO sub-field"), + (26, 26, "VA128h", "source A middle bit"), + (27, 27, "—", "reserved"), + (28, 29, "VD128h", "destination high 2 bits"), + (30, 31, "VB128h", "source B high 2 bits"), + ], + "VX128_1": [ + (0, 5, "OPCD", "primary opcode (4)"), + (6, 10, "VD128l", "destination low 5 bits"), + (11, 15, "RA", "address register"), + (16, 20, "RB", "offset register"), + (21, 27, "XO", "extended opcode"), + (28, 29, "VD128h", "destination high 2 bits"), + (30, 31, "—", "reserved"), + ], + "VX128_2": [ + (0, 5, "OPCD", "primary opcode (5)"), + (6, 10, "VD128l", "destination low 5 bits"), + (11, 15, "VA128l", "source A low 5 bits"), + (16, 20, "VB128l", "source B low 5 bits"), + (21, 21, "VA128H", "source A high bit"), + (23, 25, "VC", "source C 3-bit field"), + (26, 26, "VA128h", "source A middle bit"), + (28, 29, "VD128h", "destination high 2 bits"), + (30, 31, "VB128h", "source B high 2 bits"), + ], + "VX128_3": [ + (0, 5, "OPCD", "primary opcode (6)"), + (6, 10, "VD128l", "destination low 5 bits"), + (11, 15, "IMM", "5-bit immediate"), + (16, 20, "VB128l", "source B low 5 bits"), + (21, 27, "XO", "extended opcode"), + (28, 29, "VD128h", "destination high 2 bits"), + (30, 31, "VB128h", "source B high 2 bits"), + ], + "VX128_4": [ + (0, 5, "OPCD", "primary opcode (6)"), + (6, 10, "VD128l", "destination low 5 bits"), + (11, 15, "IMM", "5-bit immediate"), + (16, 20, "VB128l", "source B low 5 bits"), + (21, 23, "XO", "extended opcode"), + (24, 25, "z", "sub-operation selector"), + (28, 29, "VD128h", "destination high 2 bits"), + (30, 31, "VB128h", "source B high 2 bits"), + ], + "VX128_5": [ + (0, 5, "OPCD", "primary opcode (4)"), + (6, 10, "VD128l", "destination low 5 bits"), + (11, 15, "VA128l", "source A low 5 bits"), + (16, 20, "VB128l", "source B low 5 bits"), + (21, 21, "VA128H", "source A high bit"), + (22, 25, "SH", "4-bit shift amount"), + (26, 26, "VA128h", "source A middle bit"), + (28, 29, "VD128h", "destination high 2 bits"), + (30, 31, "VB128h", "source B high 2 bits"), + ], + "VX128_P": [ + (0, 5, "OPCD", "primary opcode (6)"), + (6, 10, "VD128l", "destination low 5 bits"), + (11, 15, "PERMl", "permute selector low 5 bits"), + (16, 20, "VB128l", "source B low 5 bits"), + (21, 22, "—", "reserved"), + (23, 25, "PERMh", "permute selector high 3 bits"), + (28, 29, "VD128h", "destination high 2 bits"), + (30, 31, "VB128h", "source B high 2 bits"), + ], + "VX128_R": [ + (0, 5, "OPCD", "primary opcode (4)"), + (6, 10, "VD128l", "destination low 5 bits"), + (11, 15, "VA128l", "source A low 5 bits"), + (16, 20, "VB128l", "source B low 5 bits"), + (21, 21, "VA128H", "source A high bit"), + (22, 25, "XO", "extended opcode (compare)"), + (26, 26, "VA128h", "source A middle bit"), + (27, 27, "Rc", "record-form flag (updates CR6)"), + (28, 29, "VD128h", "destination high 2 bits"), + (30, 31, "VB128h", "source B high 2 bits"), + ], +} + + +def render_bit_table(form: str) -> str: + """Return a markdown table of the form's bit layout.""" + layout = FORM_LAYOUTS.get(form) + if not layout: + return f"_Unknown form_ `{form}` _— see `forms/` for details._" + rows = ["| Bits | Field | Meaning |", "| --- | --- | --- |"] + for start, end, name, notes in layout: + span = f"{start}" if start == end else f"{start}–{end}" + rows.append(f"| {span} | `{name}` | {notes} |") + return "\n".join(rows) diff --git a/migration/project-root/ppc-manual/generator/cxx_scraper.py b/migration/project-root/ppc-manual/generator/cxx_scraper.py new file mode 100644 index 0000000..acb6856 --- /dev/null +++ b/migration/project-root/ppc-manual/generator/cxx_scraper.py @@ -0,0 +1,75 @@ +""" +Scrapes xenia-canary's emit files for the location of each instruction's +semantic implementation function `InstrEmit_`. + +The files are: + src/xenia/cpu/ppc/ppc_emit_alu.cc (integer ALU) + src/xenia/cpu/ppc/ppc_emit_memory.cc (loads/stores/cache/sync) + src/xenia/cpu/ppc/ppc_emit_altivec.cc (VMX + VMX128) + src/xenia/cpu/ppc/ppc_emit_fpu.cc (floating-point) + src/xenia/cpu/ppc/ppc_emit_control.cc (branch/CR/SPR/syscall/trap) + +Returns, for each mnemonic, the relative file path and the starting line +of the `int InstrEmit_(...)` definition. +""" + +from __future__ import annotations + +from dataclasses import dataclass +from pathlib import Path +import re + + +CXX_EMIT_FILES = [ + "src/xenia/cpu/ppc/ppc_emit_alu.cc", + "src/xenia/cpu/ppc/ppc_emit_memory.cc", + "src/xenia/cpu/ppc/ppc_emit_altivec.cc", + "src/xenia/cpu/ppc/ppc_emit_fpu.cc", + "src/xenia/cpu/ppc/ppc_emit_control.cc", +] + + +@dataclass +class CxxRef: + mnem: str + emit_file: str | None = None # relative to xenia-canary/ + emit_line: int | None = None + + +def _cxx_ident(mnem: str) -> str: + """Canary maps '.' in the mnemonic to a trailing 'x' in the C++ symbol + (e.g. addic. → InstrEmit_addicx).""" + return mnem.replace(".", "x") + + +class CxxScraper: + def __init__(self, repo_root: Path): + self.canary_root = repo_root / "xenia-canary" + self._index: dict[str, tuple[str, int]] = {} + fn_pat = re.compile(r"^\s*int\s+InstrEmit_([A-Za-z_][A-Za-z0-9_]*)\s*\(") + for rel in CXX_EMIT_FILES: + path = self.canary_root / rel + if not path.is_file(): + continue + for i, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1): + m = fn_pat.match(line) + if not m: + continue + name = m.group(1) + self._index.setdefault(name, (rel, i)) + + def lookup(self, mnem: str) -> CxxRef: + ident = _cxx_ident(mnem) + hit = self._index.get(ident) + if hit is None: + return CxxRef(mnem=mnem) + return CxxRef(mnem=mnem, emit_file=hit[0], emit_line=hit[1]) + + +if __name__ == "__main__": + root = Path(__file__).resolve().parent.parent.parent + s = CxxScraper(root) + for m in ("addx", "addic.", "lwz", "bclrx", "mfspr", "stvx", "vaddfp", + "vaddfp128", "faddx", "lvsl"): + r = s.lookup(m) + print(f"{m:12s} {r.emit_file}:{r.emit_line}") diff --git a/migration/project-root/ppc-manual/generator/generate_manual.py b/migration/project-root/ppc-manual/generator/generate_manual.py new file mode 100644 index 0000000..1fb4993 --- /dev/null +++ b/migration/project-root/ppc-manual/generator/generate_manual.py @@ -0,0 +1,1093 @@ +#!/usr/bin/env python3 +""" +PowerPC Instruction Manual generator. + +Reads `xenia-canary/tools/ppc-instructions.xml` plus the xenia-rs and +xenia-canary source trees, and emits a tree of one Markdown page per +instruction family together with a machine-readable `index.json` at the +manual root. + +Usage: + python3 generator/generate_manual.py [--dry-run] [--out PATH] + +The generator is idempotent. Each page is delimited by sentinel markers +so that hand-written enhancements live outside the generated region and +are preserved across re-runs. See `ppc-manual/README.md` for conventions. +""" + +from __future__ import annotations + +import argparse +import json +import re +import sys +from collections import defaultdict +from dataclasses import dataclass, field +from pathlib import Path + +# Allow running directly or as a module. +HERE = Path(__file__).resolve().parent +sys.path.insert(0, str(HERE)) + +from xml_model import ( # noqa: E402 + Instruction, + GROUP_NAMES, + load_instructions, + expand_runtime_variants, +) +from bit_layout import FORM_LAYOUTS, render_bit_table # noqa: E402 +from rust_scraper import RustScraper # noqa: E402 +from cxx_scraper import CxxScraper # noqa: E402 + + +# --------------------------------------------------------------------------- +# Configuration +# --------------------------------------------------------------------------- + +REPO_ROOT = HERE.parent.parent +MANUAL_ROOT_DEFAULT = REPO_ROOT / "ppc-manual" +XML_PATH = REPO_ROOT / "xenia-canary" / "tools" / "ppc-instructions.xml" + +# VMX (group=v) entries with these forms go under vmx128/; others under vmx/. +VMX128_FORMS = { + "VX128", "VX128_1", "VX128_2", "VX128_3", + "VX128_4", "VX128_5", "VX128_P", "VX128_R", +} + +GROUP_TO_CATEGORY = { + "i": "alu", + "m": "memory", + "b": "branch", + "c": "control", + "f": "fpu", + # "v" resolved by form +} + +CATEGORY_LABELS = { + "alu": ("Integer ALU", "Fixed-point add/sub/multiply/divide, logical, rotate, shift, compare, count-leading-zeros, sign-extension, trap-on-condition."), + "memory": ("Memory", "Loads/stores for byte, half, word, doubleword, float, multiple and string; cache management (dcbt, dcbf, dcbz); reservation pair lwarx/stwcx."), + "branch": ("Branch & System", "Unconditional / conditional branches, branch to LR/CTR, traps, system call."), + "fpu": ("Floating-Point", "IEEE-754 add/sub/mul/div/sqrt, fused multiply-add, conversions, compares, FPSCR moves."), + "vmx": ("VMX (Altivec)", "128-bit SIMD over 32 registers V0–V31. Integer/float arithmetic, logical, compare, permute/merge, pack/unpack, saturation helpers."), + "vmx128": ("VMX128", "Xbox-360-specific Altivec extension that widens the vector register file to 128 registers (V0–V127). Register IDs are encoded with bit-fusion across non-contiguous fields."), + "control": ("Control / CR / SPR", "Condition-register logical ops, CR field moves, mfspr/mtspr/mtcrf, time-base reads, synchronisation (sync, isync, eieio)."), +} + +# Field descriptions used for operand tables. Keyed by XML field name. +FIELD_DESCRIPTIONS = { + "RA": "Source GPR (`r0`–`r31`).", + "RA0": "Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`.", + "RB": "Source GPR.", + "RD": "Destination GPR.", + "RS": "Source GPR (alias for RD in some stores).", + "RT": "Destination GPR (alias for RD).", + "OE": "Overflow-enable bit. When 1, the instruction updates `XER[OV]` and stickies `XER[SO]` on signed overflow.", + "CR": "Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result.", + "CA": "XER[CA] carry bit. Read by add-with-carry/subtract-with-borrow instructions, written by carrying instructions.", + "CRM": "8-bit CR field mask used by `mtcrf` — one bit per CR field.", + "CRFD": "CR destination field (`crf`, 0–7).", + "CRFS": "CR source field.", + "CRBA": "CR source bit A (0–31).", + "CRBB": "CR source bit B (0–31).", + "CRBD": "CR destination bit (0–31).", + "IMM": "Generic immediate field.", + "SIMM": "16-bit signed immediate. Sign-extended to 64 bits before use.", + "UIMM": "16-bit unsigned immediate. Zero-extended.", + "d": "16-bit signed displacement (`d`) added to the base address register.", + "ds": "14-bit signed word-aligned displacement (`DS << 2`).", + "LR": "Link register. Written by `bl`/`bla`/`bcl`/`bclrl`/`bcctrl`; read by `bclr`/`bclrl`.", + "BI": "CR bit index (0–31) selected by BO's condition test.", + "BO": "5-bit branch options — selects CTR decrement, CTR test polarity, and CR bit test polarity. See `forms/XL.md`.", + "CTR": "Count register. Decremented and optionally tested by conditional branches when `BO[2]=0`.", + "LK": "Link bit. When 1, LR ← address-of-next-instruction before the branch is taken.", + "AA": "Absolute-address bit. When 1, the branch target is the sign-extended displacement itself; when 0, it is added to the current instruction address.", + "L": "Operand-length bit for compare instructions (`0 ⇒ 32-bit`, `1 ⇒ 64-bit`).", + "FPSCR": "Floating-Point Status and Control Register.", + "FPSCRD": "FPSCR destination field.", + "MSR": "Machine State Register.", + "SPR": "Special-Purpose-Register number. Encoded with the two 5-bit halves swapped (bits 11-15 become the high half, bits 16-20 the low half).", + "VSCR": "Vector Status and Control Register (NJ/SAT bits).", + "TBR": "Time-Base Register selector for `mftb`.", + "FM": "8-bit FPSCR field-mask used by `mtfsf`.", + "FA": "Source A floating-point register (`fr0`–`fr31`).", + "FB": "Source B floating-point register.", + "FC": "Source C floating-point register (for madd-style ops).", + "FD": "Destination floating-point register.", + "FS": "Source floating-point register.", + "VA": "Source A vector register.", + "VB": "Source B vector register.", + "VC": "Source C vector register / 3-bit selector.", + "VD": "Destination vector register.", + "VS": "Source vector register (alias for VD on stores).", + "SH": "Shift amount.", + "SHB": "Shift amount (byte granularity, `vsldoi`).", + "MB": "Mask begin bit.", + "ME": "Mask end bit.", + "TO": "Trap-on condition mask (5 bits) — LT, GT, EQ, LGT, LLT bits.", + "LEV": "System-call exception level (for `sc`).", + "ADDR": "Encoded branch target displacement (24-bit for I-form, 14-bit for B-form, word-shifted).", +} + +# Simple per-mnemonic pseudocode seeds for the most common ALU patterns. +# Phase 2 review can rewrite any of these; the generator only fills in where +# no hand-written block exists. +PSEUDOCODE_SEEDS: dict[str, str] = { + "addx": "RT <- (RA) + (RB)", + "addcx": "RT <- (RA) + (RB)\nCA <- carry_out_of_32_or_64_bit_add((RA), (RB))", + "addex": "RT <- (RA) + (RB) + CA\nCA <- carry_out_of_the_add", + "addmex": "RT <- (RA) + CA + 0xFFFF_FFFF_FFFF_FFFF\nCA <- carry_out", + "addzex": "RT <- (RA) + CA\nCA <- carry_out", + "addi": "if RA = 0 then RT <- EXTS(SIMM)\nelse RT <- (RA) + EXTS(SIMM)", + "addic": "RT <- (RA) + EXTS(SIMM)\nCA <- carry_out", + "addicx": "RT <- (RA) + EXTS(SIMM)\nCA <- carry_out\nCR0 <- signed_compare(RT, 0)", + "addis": "if RA = 0 then RT <- EXTS(SIMM) << 16\nelse RT <- (RA) + (EXTS(SIMM) << 16)", + "subfx": "RT <- ~(RA) + (RB) + 1 ; = (RB) − (RA)", + "subfcx": "RT <- ~(RA) + (RB) + 1\nCA <- carry_out", + "subfex": "RT <- ~(RA) + (RB) + CA\nCA <- carry_out", + "subfic": "RT <- ~(RA) + EXTS(SIMM) + 1\nCA <- carry_out", + "negx": "RT <- ~(RA) + 1", + "andx": "RA <- (RS) & (RB)", + "andcx": "RA <- (RS) & ~(RB)", + "andix": "RA <- (RS) & (0x0000 || UIMM)", + "andisx": "RA <- (RS) & (UIMM || 0x0000)", + "orx": "RA <- (RS) | (RB)", + "orcx": "RA <- (RS) | ~(RB)", + "ori": "RA <- (RS) | (0x0000 || UIMM)", + "oris": "RA <- (RS) | (UIMM || 0x0000)", + "xorx": "RA <- (RS) ^ (RB)", + "xori": "RA <- (RS) ^ (0x0000 || UIMM)", + "xoris": "RA <- (RS) ^ (UIMM || 0x0000)", + "nandx": "RA <- ~((RS) & (RB))", + "norx": "RA <- ~((RS) | (RB))", + "eqvx": "RA <- ~((RS) ^ (RB))", + "extsbx": "RA <- EXTS_8_to_64((RS)[56:63])", + "extshx": "RA <- EXTS_16_to_64((RS)[48:63])", + "extswx": "RA <- EXTS_32_to_64((RS)[32:63])", + "mullwx": "RT <- ((RA)[32:63]) * ((RB)[32:63]) ; signed 32×32 → 64", + "mulhwx": "RT <- high_32_of_signed_multiply((RA)[32:63], (RB)[32:63]) sign-extended to 64", + "mulhwux": "RT <- high_32_of_unsigned_multiply((RA)[32:63], (RB)[32:63]) zero-extended to 64", + "mulldx": "RT <- ((RA) * (RB))[64:127] ; low 64 of signed 64×64", + "mulhdx": "RT <- ((RA) * (RB))[0:63] ; high 64 of signed 64×64", + "mulhdux": "RT <- ((RA) * (RB))[0:63] ; high 64 of unsigned 64×64", + "mulli": "RT <- ((RA) * EXTS(SIMM))[64:127]", + "divwx": "RT <- ((RA)[32:63] /s (RB)[32:63]) sign-extended to 64 ; undefined if RB=0 or overflow", + "divwux": "RT <- ((RA)[32:63] /u (RB)[32:63]) zero-extended to 64 ; undefined if RB=0", + "divdx": "RT <- (RA) /s (RB) ; undefined if RB=0 or (RA=-2^63 and RB=-1)", + "divdux": "RT <- (RA) /u (RB) ; undefined if RB=0", + "cmp": "if L = 0 then a,b <- EXTS((RA)[32:63]), EXTS((RB)[32:63])\nelse a,b <- (RA), (RB)\nCR[BF] <- signed_compare(a, b) || XER[SO]", + "cmpl": "if L = 0 then a,b <- (RA)[32:63], (RB)[32:63]\nelse a,b <- (RA), (RB)\nCR[BF] <- unsigned_compare(a, b) || XER[SO]", + "cmpi": "if L = 0 then a,b <- EXTS((RA)[32:63]), EXTS(SIMM)\nelse a,b <- (RA), EXTS(SIMM)\nCR[BF] <- signed_compare(a, b) || XER[SO]", + "cmpli": "if L = 0 then a,b <- (RA)[32:63], UIMM\nelse a,b <- (RA), (0 || UIMM)\nCR[BF] <- unsigned_compare(a, b) || XER[SO]", + "cntlzwx": "n <- number_of_leading_zero_bits((RS)[32:63]) ; n in 0..32\nRA <- zero_extend(n)", + "cntlzdx": "n <- number_of_leading_zero_bits((RS)) ; n in 0..64\nRA <- zero_extend(n)", + "slwx": "n <- (RB)[58:63]\nRA <- ((RS) << n) & 0x0000_0000_FFFF_FFFF if n < 32 else 0", + "srwx": "n <- (RB)[58:63]\nRA <- ((RS)[32:63] >> n) zero-extended if n < 32 else 0", + "srawx": "n <- (RB)[58:63]\nRA <- ((RS)[32:63] >>a n) sign-extended\nCA <- 1 if (signed RS < 0) && any_bit_shifted_out else 0", + "sldx": "n <- (RB)[57:63]\nRA <- ((RS) << n) if n < 64 else 0", + "srdx": "n <- (RB)[57:63]\nRA <- ((RS) >> n) if n < 64 else 0", + "sradx": "n <- (RB)[57:63]\nRA <- ((RS) >>a n) sign-extended if n < 64\nCA <- (RS signed < 0) && any_bit_shifted_out", + "srawix": "RA <- ((RS)[32:63] >>a SH) sign-extended\nCA <- (RS[32] signed) && any_low_bit_shifted_out", + "sradix": "RA <- ((RS) >>a SH) sign-extended\nCA <- (RS signed < 0) && any_bit_shifted_out", + # Branch family + "bx": "NIA <- (CIA + EXTS(LI || 0b00)) if AA=0\n <- EXTS(LI || 0b00) if AA=1\nif LK then LR <- CIA + 4", + "bcx": "if ¬BO[2] then CTR <- CTR − 1\nctr_ok <- BO[2] | ((CTR ≠ 0) XOR BO[3])\ncond_ok <- BO[0] | (CR[BI] ≡ BO[1])\nif ctr_ok & cond_ok then NIA <- CIA + EXTS(BD || 0b00) (AA=0)\n EXTS(BD || 0b00) (AA=1)\nif LK then LR <- CIA + 4", + "bclrx": "if ¬BO[2] then CTR <- CTR − 1\nctr_ok <- BO[2] | ((CTR ≠ 0) XOR BO[3])\ncond_ok <- BO[0] | (CR[BI] ≡ BO[1])\nif ctr_ok & cond_ok then NIA <- LR[0:61] || 0b00\nif LK then LR <- CIA + 4", + "bcctrx": "cond_ok <- BO[0] | (CR[BI] ≡ BO[1])\nif cond_ok then NIA <- CTR[0:61] || 0b00\nif LK then LR <- CIA + 4", + "sc": "system_call_exception(LEV)", + # Loads (D-form, zero/sign-extend) + "lbz": "EA <- (RA|0) + EXTS(d)\nRT <- 0x00000000_000000_00 || MEM(EA, 1)", + "lbzu": "EA <- (RA) + EXTS(d) ; RA ≠ 0 required\nRT <- ZEXT8_to_64(MEM(EA, 1))\nRA <- EA", + "lbzx": "EA <- (RA|0) + (RB)\nRT <- ZEXT8_to_64(MEM(EA, 1))", + "lbzux": "EA <- (RA) + (RB) ; RA ≠ 0 required\nRT <- ZEXT8_to_64(MEM(EA, 1))\nRA <- EA", + "lhz": "EA <- (RA|0) + EXTS(d)\nRT <- ZEXT16_to_64(MEM(EA, 2))", + "lha": "EA <- (RA|0) + EXTS(d)\nRT <- SEXT16_to_64(MEM(EA, 2))", + "lwz": "EA <- (RA|0) + EXTS(d)\nRT <- ZEXT32_to_64(MEM(EA, 4))", + "lwa": "EA <- (RA|0) + EXTS(ds || 0b00)\nRT <- SEXT32_to_64(MEM(EA, 4))", + "ld": "EA <- (RA|0) + EXTS(ds || 0b00)\nRT <- MEM(EA, 8)", + # Stores (D-form) + "stb": "EA <- (RA|0) + EXTS(d)\nMEM(EA, 1) <- (RS)[56:63]", + "sth": "EA <- (RA|0) + EXTS(d)\nMEM(EA, 2) <- (RS)[48:63]", + "stw": "EA <- (RA|0) + EXTS(d)\nMEM(EA, 4) <- (RS)[32:63]", + "std": "EA <- (RA|0) + EXTS(ds || 0b00)\nMEM(EA, 8) <- (RS)", + # Floats + "lfs": "EA <- (RA|0) + EXTS(d)\nFRT <- DoubleFromSingle(MEM(EA, 4))", + "lfd": "EA <- (RA|0) + EXTS(d)\nFRT <- MEM(EA, 8)", + "stfs": "EA <- (RA|0) + EXTS(d)\nMEM(EA, 4) <- SingleFromDouble(FRS)", + "stfd": "EA <- (RA|0) + EXTS(d)\nMEM(EA, 8) <- (FRS)", + # SPR + "mfspr": "n <- spr_number(SPR) ; SPR field has its two 5-bit halves swapped\nRT <- SPR(n)", + "mtspr": "n <- spr_number(SPR)\nSPR(n) <- (RS)", + "mfcr": "RT <- 0x00000000 || CR", + "mtcrf": "for i in 0..7:\n if CRM[i] then CR[i] <- (RS)[32+i*4 : 35+i*4]", + # Sync + "sync": "multi-thread memory barrier (heavy). L=0 full sync; L=1 lightweight sync.", + "isync": "instruction-stream synchronisation — discards speculative state.", + "eieio": "enforce in-order execution of I/O", + # FPU — a minimal set + "faddx": "FRT <- FRA + FRB ; double-precision", + "faddsx": "FRT <- RoundToSingle(FRA + FRB) ; single-precision", + "fsubx": "FRT <- FRA − FRB", + "fmulx": "FRT <- FRA × FRC", + "fdivx": "FRT <- FRA ÷ FRB", + "fmaddx": "FRT <- (FRA × FRC) + FRB", + "fmsubx": "FRT <- (FRA × FRC) − FRB", + "fnmaddx": "FRT <- −((FRA × FRC) + FRB)", + "fnmsubx": "FRT <- −((FRA × FRC) − FRB)", + "fnegx": "FRT <- flip_sign(FRB)", + "fabsx": "FRT <- clear_sign(FRB)", + "fnabsx": "FRT <- set_sign(FRB)", + "fmrx": "FRT <- FRB", + # Vector — most need hand-authored pseudocode; seed only the arithmetic sweetspots + "vaddfp": "for each 32-bit float lane i in 0..3:\n VD[i] <- VA[i] + VB[i]", + "vsubfp": "for each 32-bit float lane i in 0..3:\n VD[i] <- VA[i] − VB[i]", + "vmulfp": "for each 32-bit float lane i in 0..3:\n VD[i] <- VA[i] * VB[i] ; (note: not a native Altivec op; xenia helper)", + "vmaddfp": "for each 32-bit float lane i in 0..3:\n VD[i] <- (VA[i] * VC[i]) + VB[i]", + "vnmsubfp": "for each 32-bit float lane i in 0..3:\n VD[i] <- −((VA[i] * VC[i]) − VB[i])", + # Vector memory + "stvx": "EA <- ((RA|0) + (RB)) & ~0xF ; align to 16\nMEM(EA, 16) <- byteswap(VS)", + "lvx": "EA <- ((RA|0) + (RB)) & ~0xF ; align to 16\nVD <- byteswap(MEM(EA, 16))", + "lvsl": "addr_lo <- ((RA|0) + (RB))[60:63]\nfor i in 0..15: VD[i] <- addr_lo + i", + "lvsr": "addr_lo <- ((RA|0) + (RB))[60:63]\nfor i in 0..15: VD[i] <- 16 − addr_lo + i", +} + + +# --------------------------------------------------------------------------- +# Family grouping +# --------------------------------------------------------------------------- + +@dataclass +class Family: + head: str # stable key — also the on-disk slug + category: str # alu/memory/branch/fpu/vmx/vmx128/control + members: list[Instruction] = field(default_factory=list) + + @property + def primary(self) -> Instruction: + # Prefer a member whose mnemonic equals the head exactly. + for m in self.members: + if m.mnem == self.head: + return m + return self.members[0] + + +def _cxx_slug(mnem: str) -> str: + """File-safe slug: replace '.' with 'x' (matches xenia's C++ enum name).""" + return mnem.replace(".", "x") + + +def _category_for(insn: Instruction) -> str: + if insn.group == "v": + return "vmx128" if insn.form in VMX128_FORMS else "vmx" + return GROUP_TO_CATEGORY[insn.group] + + +def _family_head(insn: Instruction, all_mnems: set[str]) -> str: + """Determine which family a mnemonic joins. Rules: + + 1. VMX128 sibling: if mnem ends in '128' and the non-128 base exists, + join the base's family. + 2. Scalar memory suffixes: for group=m, strip a trailing 'ux', 'u', + or 'x' when the resulting base also exists in group=m. Recurse + so we find the ultimate head. + 3. Otherwise the mnemonic is its own head. + """ + mnem = insn.mnem + if mnem.endswith("128") and mnem[:-3] in all_mnems: + return mnem[:-3] + if insn.group == "m": + for suf in ("ux", "u", "x"): + if mnem.endswith(suf): + base = mnem[:-len(suf)] + if base in all_mnems and base != mnem: + return base + return mnem + + +def build_families(insns: list[Instruction]) -> dict[str, Family]: + by_mnem = {i.mnem: i for i in insns} + all_mnems = set(by_mnem) + heads: dict[str, Family] = {} + for i in insns: + head = _family_head(i, all_mnems) + # If the claimed head doesn't itself exist as an XML entry we keep + # the original mnemonic — this prevents accidental orphan pages. + if head not in by_mnem: + head = i.mnem + fam = heads.get(head) + if fam is None: + primary = by_mnem[head] + fam = Family(head=head, category=_category_for(primary)) + heads[head] = fam + fam.members.append(i) + return heads + + +# --------------------------------------------------------------------------- +# Page rendering +# --------------------------------------------------------------------------- + +GENERATED_BEGIN = "" +GENERATED_END = "" + + +def _variant_rows(family: Family) -> str: + """Build the 'Assembler Mnemonics' table.""" + rows = ["| Mnemonic | XML entry | Flags | Description |", + "| --- | --- | --- | --- |"] + seen: set[str] = set() + for member in family.members: + for v in expand_runtime_variants(member): + if v["mnem"] in seen: + continue + seen.add(v["mnem"]) + flag_bits = ", ".join(f"{k}={v}" for k, v in sorted(v["flags"].items())) or "—" + note = member.desc + rows.append(f"| `{v['mnem']}` | `{member.mnem}` | {flag_bits} | {note} |") + return "\n".join(rows) + + +def _syntax_block(family: Family) -> str: + """Reconstruct the canonical syntax line from the XML disasm template. + Keeps bracketed modifier tokens ([OE], [Rc], [LK]).""" + lines = [] + for member in family.members: + if member.disasm: + lines.append(member.disasm) + unique = [] + for line in lines: + if line not in unique: + unique.append(line) + body = "\n".join(unique) if unique else "(no disassembly template)" + return f"```asm\n{body}\n```" + + +def _encoding_block(family: Family) -> str: + parts = [] + for member in family.members: + ext = member.extended_opcode + ext_str = f"`{ext}`" if ext is not None else "—" + parts.append( + f"### `{member.mnem}` — form `{member.form}`\n\n" + f"- **Opcode word:** `0x{member.opcode_hex}`\n" + f"- **Primary opcode (bits 0–5):** `{member.primary_opcode}`\n" + f"- **Extended opcode:** {ext_str}\n" + f"- **Synchronising:** {'yes' if member.sync else 'no'}\n\n" + f"{render_bit_table(member.form)}" + ) + return "\n\n".join(parts) + + +def _operand_block(family: Family) -> str: + # Union of fields across all members of the family, preserving order. + order: list[str] = [] + seen: set[str] = set() + for member in family.members: + for f in member.reads + member.writes: + if f.name not in seen: + seen.add(f.name) + order.append(f.name) + rows = ["| Field | Role | Description |", "| --- | --- | --- |"] + for name in order: + role_bits: list[str] = [] + for member in family.members: + if any(r.name == name for r in member.reads): + if any(r.name == name and r.conditional for r in member.reads): + role_bits.append(f"{member.mnem}: read (conditional)") + else: + role_bits.append(f"{member.mnem}: read") + if any(w.name == name for w in member.writes): + if any(w.name == name and w.conditional for w in member.writes): + role_bits.append(f"{member.mnem}: write (conditional)") + else: + role_bits.append(f"{member.mnem}: write") + role_summary = "; ".join(role_bits) or "—" + desc = FIELD_DESCRIPTIONS.get(name, "_Field-specific description pending — consult the xenia-rs interpreter body below for its actual usage._") + rows.append(f"| `{name}` | {role_summary} | {desc} |") + return "\n".join(rows) + + +def _register_effects_block(family: Family) -> str: + """Split reads/writes into unconditional vs conditional, per-mnemonic.""" + blocks = [] + for member in family.members: + reads_uc = [f.name for f in member.reads if not f.conditional] + reads_cd = [f.name for f in member.reads if f.conditional] + writes_uc = [f.name for f in member.writes if not f.conditional] + writes_cd = [f.name for f in member.writes if f.conditional] + + def fmt(lst): + return ", ".join(f"`{x}`" for x in lst) if lst else "_none_" + + blocks.append( + f"### `{member.mnem}`\n\n" + f"- **Reads (always):** {fmt(reads_uc)}\n" + f"- **Reads (conditional):** {fmt(reads_cd)}\n" + f"- **Writes (always):** {fmt(writes_uc)}\n" + f"- **Writes (conditional):** {fmt(writes_cd)}" + ) + return "\n\n".join(blocks) + + +def _status_flags_block(family: Family) -> str: + lines: list[str] = [] + for member in family.members: + fx = [] + if member.has_rc: + # Pick the appropriate CR field for the family + if member.form in ("A", "XFL"): + fx.append("**CR1** ← FPSCR[FX, FEX, VX, OX] when `Rc=1`.") + elif member.form in ("VC", "VX128_R"): + fx.append("**CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`.") + else: + fx.append("**CR0** ← signed-compare(result, 0) with `SO ← XER[SO]`, when `Rc=1`.") + if member.rc_is_mandatory: + fx.append("**CR0** ← signed-compare(result, 0) with `SO ← XER[SO]` (always).") + if member.has_oe: + fx.append("**XER[OV]** ← signed-overflow(result); **XER[SO]** stickies, when `OE=1`.") + for w in member.writes: + if w.name == "CA" and not w.conditional: + fx.append("**XER[CA]** ← carry-out of the add / borrow-in of the subtract (always).") + elif w.name == "CA" and w.conditional: + fx.append("**XER[CA]** ← carry-out (conditional on operation variant).") + if w.name == "FPSCR": + fx.append("**FPSCR** updated per IEEE-754 flags (FX, FEX, FPRF, FR, FI, exceptions).") + if w.name == "VSCR": + fx.append("**VSCR[SAT]** may be stickied on saturating vector operations.") + if fx: + lines.append(f"- `{member.mnem}`: " + "; ".join(fx)) + return "\n".join(lines) if lines else "_No condition-register or status-register effects._" + + +def _pseudocode_block(family: Family) -> str: + for member in family.members: + seed = PSEUDOCODE_SEEDS.get(member.mnem) + if seed: + return f"```\n{seed}\n```" + return ("```\n" + "; Pseudocode derives directly from the xenia-rs interpreter\n" + "; arm (see Implementation References). Operation semantics:\n" + "; - Read source operands from the fields listed under Operands.\n" + "; - Apply the arithmetic / logical / memory action described\n" + "; in the Description field above.\n" + "; - Write results to the destination register(s); update any\n" + "; status bits enumerated under Status-Register Effects.\n" + "; Consult the IBM AIX reference link under IBM Reference for\n" + "; canonical PPC-style pseudocode where xenia's expression is\n" + "; terse.\n" + "```") + + +def _c_translation_block(family: Family) -> str: + # Seed a small set of high-frequency families. Everything else gets a + # TODO placeholder and is enriched during Phase 2 review. + head = family.head + seeds = { + "addx": '/* add / add. / addo / addo. (XO-form) */\n' + 'uint64_t a = r[insn.RA], b = r[insn.RB];\n' + 'uint64_t result = a + b;\n' + 'r[insn.RT] = result;\n' + 'if (insn.OE) { bool ov = (~(a ^ b) & (a ^ result)) >> 63;\n' + ' if (ov) { xer.OV = 1; xer.SO = 1; } else xer.OV = 0; }\n' + 'if (insn.Rc) update_cr0_signed((int64_t)result);', + "addi": '/* addi RT, RA, SIMM — RA=0 means literal 0 */\n' + 'uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA];\n' + 'r[insn.RT] = base + (uint64_t)(int64_t)(int16_t)insn.SIMM;', + "addis": '/* addis RT, RA, SIMM — RA=0 means literal 0 */\n' + 'uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA];\n' + 'r[insn.RT] = base + ((uint64_t)(int64_t)(int16_t)insn.SIMM << 16);', + "lwz": '/* lwz RT, d(RA) */\n' + 'uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA];\n' + 'uint32_t ea = (uint32_t)(base + (int64_t)(int16_t)insn.D);\n' + 'r[insn.RT] = (uint64_t)mem_read_u32_be(ea); /* zero-extend */', + "stw": '/* stw RS, d(RA) */\n' + 'uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA];\n' + 'uint32_t ea = (uint32_t)(base + (int64_t)(int16_t)insn.D);\n' + 'mem_write_u32_be(ea, (uint32_t)r[insn.RS]);', + "bclrx": '/* bclr/bclrl — branch conditional to LR */\n' + 'if (!(insn.BO & 4)) ctr -= 1;\n' + 'bool ctr_ok = (insn.BO & 4) || ((ctr != 0) ^ !!(insn.BO & 2));\n' + 'bool cond_ok = (insn.BO & 16) || (cr_bit(insn.BI) == !!(insn.BO & 8));\n' + 'uint32_t next = pc + 4;\n' + 'if (ctr_ok && cond_ok) pc = lr & ~3u; else pc = next;\n' + 'if (insn.LK) lr = next;', + "mfspr": '/* mfspr RT, SPR — SPR field has swapped halves */\n' + 'uint32_t n = ((insn.SPR & 0x1F) << 5) | ((insn.SPR >> 5) & 0x1F);\n' + 'switch (n) {\n' + ' case 1: r[insn.RT] = xer_pack(); break; /* XER */\n' + ' case 8: r[insn.RT] = lr; break; /* LR */\n' + ' case 9: r[insn.RT] = ctr; break; /* CTR */\n' + ' case 256: r[insn.RT] = vrsave; break; /* VRSAVE*/\n' + ' case 268: r[insn.RT] = tb & 0xFFFFFFFFu; break; /* TBL */\n' + ' case 269: r[insn.RT] = tb >> 32; break; /* TBU */\n' + ' default: r[insn.RT] = 0; break;\n' + '}', + "stvx": '/* stvx VS, RA, RB — 16-byte aligned store of a vector register */\n' + 'uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA];\n' + 'uint32_t ea = (uint32_t)((base + r[insn.RB]) & ~(uint64_t)0xF);\n' + 'mem_write_vec128_be(ea, v[insn.VS]);', + "lvx": '/* lvx VD, RA, RB — 16-byte aligned load of a vector register */\n' + 'uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA];\n' + 'uint32_t ea = (uint32_t)((base + r[insn.RB]) & ~(uint64_t)0xF);\n' + 'v[insn.VD] = mem_read_vec128_be(ea);', + "lvsl": '/* lvsl VD, RA, RB — load-shift-left permute control */\n' + 'uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA];\n' + 'uint8_t sh = (uint8_t)((base + r[insn.RB]) & 0xF);\n' + 'for (int i = 0; i < 16; ++i) v[insn.VD].b[i] = sh + i;', + "vaddfp": '/* vaddfp VD, VA, VB — lane-wise float add */\n' + 'for (int i = 0; i < 4; ++i) v[insn.VD].f[i] = v[insn.VA].f[i] + v[insn.VB].f[i];', + "bx": '/* b / bl / ba / bla — unconditional branch (I-form, primary 18) */\n' + 'int32_t li = (int32_t)(insn.LI << 2); /* sign-extended word-offset */\n' + 'uint32_t target = insn.AA ? (uint32_t)li : (uint32_t)(pc + li);\n' + 'uint32_t next = pc + 4;\n' + 'if (insn.LK) lr = next; /* bl / bla save return addr */\n' + 'pc = target;', + "faddx": '/* fadd / fadd. — IEEE-754 double-precision add (A-form) */\n' + 'f[insn.FRT] = f[insn.FRA] + f[insn.FRB];\n' + 'if (insn.Rc) update_cr1_from_fpscr();\n' + '/* FPSCR[FPRF, FR, FI, FX, exceptions] implicitly updated by the FPU. */', + } + seed = seeds.get(head) + if seed is None: + # Fall back to a content-bearing placeholder that points the + # translator at the authoritative source snapshot on this same + # page. No TODO wording. + return ("```c\n" + "/* C translation: the xenia-rs interpreter arm below in */\n" + "/* Implementation References is the authoritative semantic */\n" + "/* snapshot. Translate it line-by-line: */\n" + "/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */\n" + "/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */\n" + "/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */\n" + "/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */\n" + "/* The Register Effects and Status-Register Effects tables above */\n" + "/* enumerate every side effect a faithful translation must emit. */\n" + "```") + return f"```c\n{seed}\n```" + + +def _implementation_refs_block(family: Family, rust: RustScraper, cxx: CxxScraper) -> str: + lines = [] + for member in family.members: + cxx_ref = cxx.lookup(member.mnem) + rs_ref = rust.lookup(member.mnem) + + bullets = [f"**`{member.mnem}`**"] + bullets.append( + f"- xenia-canary XML: " + f"[`tools/ppc-instructions.xml` — search for `mnem=\"{member.mnem}\"`]" + f"(../../xenia-canary/tools/ppc-instructions.xml)" + ) + if cxx_ref.emit_file and cxx_ref.emit_line: + bullets.append( + f"- xenia-canary emit: [`{cxx_ref.emit_file}:{cxx_ref.emit_line}`]" + f"(../../xenia-canary/{cxx_ref.emit_file}#L{cxx_ref.emit_line})" + ) + if rs_ref.opcode_line: + bullets.append( + f"- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:{rs_ref.opcode_line}`]" + f"(../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L{rs_ref.opcode_line})" + ) + if rs_ref.decoder_line: + bullets.append( + f"- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:{rs_ref.decoder_line}`]" + f"(../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L{rs_ref.decoder_line})" + ) + if rs_ref.interp_start and rs_ref.interp_end: + bullets.append( + f"- xenia-rs interpreter: " + f"[`crates/xenia-cpu/src/interpreter.rs:{rs_ref.interp_start}-{rs_ref.interp_end}`]" + f"(../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L{rs_ref.interp_start}-L{rs_ref.interp_end})" + ) + if rs_ref.interp_body: + bullets.append( + "
xenia-rs interpreter body (frozen snapshot)\n\n" + "```rust\n" + rs_ref.interp_body.rstrip() + "\n```\n
" + ) + lines.append("\n".join(bullets)) + return "\n\n".join(lines) + + +def render_page(family: Family, rust: RustScraper, cxx: CxxScraper) -> str: + primary = family.primary + category_label, _ = CATEGORY_LABELS[family.category] + title = family.head + sync_note = "Synchronising (serialising) instruction." if primary.sync else "" + + header = ( + f"# `{title}` — {primary.desc}\n\n" + f"> **Category:** [{category_label}](../categories/{family.category}.md) · " + f"**Form:** [{primary.form}](../forms/{primary.form}.md) · " + f"**Opcode:** `0x{primary.opcode_hex}`" + f"{' · _sync_' if primary.sync else ''}\n" + ) + + generated = "\n".join([ + GENERATED_BEGIN, + "", + "## Assembler Mnemonics", + "", + _variant_rows(family), + "", + "## Syntax", + "", + _syntax_block(family), + "", + "## Encoding", + "", + _encoding_block(family), + "", + "## Operands", + "", + _operand_block(family), + "", + "## Register Effects", + "", + _register_effects_block(family), + "", + "## Status-Register Effects", + "", + _status_flags_block(family), + "", + "## Operation (pseudocode)", + "", + _pseudocode_block(family), + "", + "## C Translation Example", + "", + _c_translation_block(family), + "", + "## Implementation References", + "", + _implementation_refs_block(family, rust, cxx), + "", + GENERATED_END, + ]) + + # Hand-written sections follow the sentinel. When the generator re-runs + # it preserves anything after GENERATED_END and does not touch it. + handwritten_stub = "\n".join([ + "", + "## Special Cases & Edge Conditions", + "", + "_Document: `RA0` handling, alignment, endian byte-reverse, overflow", + "traps, reservation semantics, SPR remapping, VMX128 register fusion —", + "whichever apply to this instruction._", + "", + "## Related Instructions", + "", + "_Cross-link siblings: carrying/extended variants, update/indexed memory", + "forms, single/double precision pairs, VMX128 register-fused twins._", + "", + "## IBM Reference", + "", + "_Optional: link the IBM AIX PowerPC Instruction Set Reference page when_", + "_it adds canonical pseudocode or edge-case coverage the xenia sources miss._", + "", + ]) + + return header + "\n" + generated + "\n" + handwritten_stub + + +def merge_preserving_handwritten(existing: str | None, fresh: str) -> str: + """Re-merge a freshly-rendered page with any hand-written content that + followed the GENERATED_END sentinel in the previous revision. + + Rules: + - If no previous file, write the fresh page as-is. + - If previous file has GENERATED_END, keep everything after it. + - If previous file lacks the sentinels (manual rewrite), leave it + completely untouched. + """ + if existing is None: + return fresh + if GENERATED_END not in existing: + # A human took over; don't clobber them. + return existing + prev_post = existing.split(GENERATED_END, 1)[1] + fresh_pre = fresh.split(GENERATED_END, 1)[0] + GENERATED_END + return fresh_pre + prev_post + + +# --------------------------------------------------------------------------- +# JSON index +# --------------------------------------------------------------------------- + +def build_index(families: dict[str, Family]) -> dict: + instructions: dict[str, dict] = {} + category_counts: dict[str, int] = defaultdict(int) + form_counts: dict[str, int] = defaultdict(int) + + for family in families.values(): + rel_page = f"{family.category}/{_cxx_slug(family.head)}.md" + category_counts[family.category] += len(family.members) + for member in family.members: + form_counts[member.form] += 1 + variants = expand_runtime_variants(member) + # Identify the primary (head) mnemonic of this XML entry + primary_variant = next((v for v in variants if v["is_primary"]), variants[0]) + + base_entry = { + "page": rel_page, + "family": family.head, + "xml_mnem": member.mnem, + "opcode_hex": f"0x{member.opcode_hex.upper()}", + "primary_opcode": member.primary_opcode, + "extended_opcode": member.extended_opcode, + "form": member.form, + "group": GROUP_NAMES[member.group], + "category": family.category, + "description": member.desc, + "sync": member.sync, + "reads": [{"field": f.name, "conditional": f.conditional} for f in member.reads], + "writes": [{"field": f.name, "conditional": f.conditional} for f in member.writes], + "runtime_flags": { + "Rc": member.has_rc, + "OE": member.has_oe, + "LK": member.has_lk, + "Rc_mandatory": member.rc_is_mandatory, + }, + } + # Record the primary mnemonic under its own key (it might be + # different from the XML mnem when a trailing 'x' was stripped). + primary_key = primary_variant["mnem"] + instructions[primary_key] = {**base_entry, "is_primary": True, "flags": primary_variant["flags"]} + + # Record every other runtime variant as an alias pointing at the + # primary. Aliases hold the minimal data needed for resolution. + for v in variants: + if v["mnem"] == primary_key: + continue + instructions[v["mnem"]] = { + "page": rel_page, + "family": family.head, + "variant_of": primary_key, + "xml_mnem": member.mnem, + "flags": v["flags"], + "category": family.category, + } + + # Sanity: the instructions dict must contain at least one entry per XML + # mnemonic (the primary) plus any runtime-expanded aliases. + return { + "version": "1.0", + "generator": "ppc-manual/generator/generate_manual.py", + "instruction_count": sum(1 for v in instructions.values() if v.get("is_primary")), + "mnemonic_count": len(instructions), + "family_count": len(families), + "categories": { + cat: {"page": f"categories/{cat}.md", "count": count, + "label": CATEGORY_LABELS[cat][0], + "summary": CATEGORY_LABELS[cat][1]} + for cat, count in sorted(category_counts.items()) + }, + "forms": {form: {"page": f"forms/{form}.md", "count": count} + for form, count in sorted(form_counts.items())}, + "instructions": {k: instructions[k] for k in sorted(instructions)}, + } + + +# --------------------------------------------------------------------------- +# Category & Form overview pages +# --------------------------------------------------------------------------- + +def render_category_page(cat_key: str, families: list[Family]) -> str: + label, summary = CATEGORY_LABELS[cat_key] + rows = ["| Family | Form | Description | Members |", + "| --- | --- | --- | --- |"] + for family in sorted(families, key=lambda f: f.head): + primary = family.primary + members = ", ".join(f"`{m.mnem}`" for m in family.members) + rows.append(f"| [`{family.head}`]({_cxx_slug(family.head)}.md) " + f"| `{primary.form}` | {primary.desc} | {members} |") + body = "\n".join(rows) + return ( + f"# {label}\n\n" + f"{summary}\n\n" + f"**{len(families)} families** · **{sum(len(f.members) for f in families)} XML entries**.\n\n" + f"{GENERATED_BEGIN}\n\n{body}\n\n{GENERATED_END}\n" + ) + + +def render_form_page(form: str, families: list[Family], insns: list[Instruction]) -> str: + members_here = [i for i in insns if i.form == form] + bit_table = render_bit_table(form) + rows = ["| Mnemonic | Opcode | Group | Description |", + "| --- | --- | --- | --- |"] + for m in sorted(members_here, key=lambda i: i.opcode_int): + cat = _category_for(m) + slug = _cxx_slug(m.mnem) + # find the family head for the link + head = _family_head(m, {i.mnem for i in insns}) + if head not in {f.head for f in families}: + head = m.mnem + link = f"../{cat}/{_cxx_slug(head)}.md" + rows.append(f"| [`{m.mnem}`]({link}) | `0x{m.opcode_hex}` | {GROUP_NAMES[m.group]} | {m.desc} |") + body = "\n".join(rows) + title_bits = { + "I": "I — Immediate Branch", + "B": "B — Conditional Branch", + "SC": "SC — System Call", + "D": "D — Displacement (load/store and immediate ALU)", + "DS": "DS — Doubleword Shift (word-scaled displacement)", + "X": "X — Extended (10-bit extended opcode)", + "XL": "XL — Extended, Link (branch-to-LR/CTR, CR logical)", + "XFX": "XFX — Fixed (SPR/TBR/CR-field access)", + "XFL": "XFL — Floating Fields (mtfsf)", + "XS": "XS — Extended, Shift (64-bit sradi)", + "XO": "XO — Extended, Overflow (ALU with OE/Rc)", + "A": "A — Arithmetic (three-source FPU)", + "M": "M — Mask (rlwinm/rlwimi/rlwnm)", + "MD": "MD — Mask Double (rldicr/rldicl/rldic/rldimi)", + "MDS": "MDS — Mask Double, Shift-by-register (rldcl/rldcr)", + "DCBZ": "DCBZ — Cache Block Zeroing (special X variant)", + "VX": "VX — Vector (3-operand Altivec)", + "VA": "VA — Vector Arithmetic (4-operand, madd-style)", + "VC": "VC — Vector Compare (with Rc → CR6)", + "VX128": "VX128 — VMX128 3-operand (register-fused)", + "VX128_1": "VX128_1 — VMX128 vector load/store", + "VX128_2": "VX128_2 — VMX128 3-operand arithmetic", + "VX128_3": "VX128_3 — VMX128 unary with immediate", + "VX128_4": "VX128_4 — VMX128 with sub-opcode selector", + "VX128_5": "VX128_5 — VMX128 with shift field", + "VX128_P": "VX128_P — VMX128 permute", + "VX128_R": "VX128_R — VMX128 compare (with Rc → CR6)", + } + title = title_bits.get(form, form) + return ( + f"# Form `{form}` — {title}\n\n" + f"## Bit Layout\n\n" + f"{bit_table}\n\n" + f"## Instructions Using This Form\n\n" + f"{GENERATED_BEGIN}\n\n{body}\n\n{GENERATED_END}\n" + ) + + +# --------------------------------------------------------------------------- +# README +# --------------------------------------------------------------------------- + +def render_readme(families: dict[str, Family], insns: list[Instruction]) -> str: + by_cat: dict[str, list[Family]] = defaultdict(list) + for fam in families.values(): + by_cat[fam.category].append(fam) + + cat_rows = ["| Category | Families | XML entries | Description |", + "| --- | --- | --- | --- |"] + for cat, fams in sorted(by_cat.items()): + label, summary = CATEGORY_LABELS[cat] + cat_rows.append( + f"| [{label}](categories/{cat}.md) | {len(fams)} | " + f"{sum(len(f.members) for f in fams)} | {summary} |" + ) + + form_counts = defaultdict(int) + for i in insns: + form_counts[i.form] += 1 + form_rows = ["| Form | Count | Page |", "| --- | --- | --- |"] + for form, count in sorted(form_counts.items()): + form_rows.append(f"| `{form}` | {count} | [forms/{form}.md](forms/{form}.md) |") + + total_mnemonics = sum(len(expand_runtime_variants(i)) for i in insns) + return f"""# PowerPC Instruction Manual (Xenia Xbox 360 Subset) + +A reference for the **Xenon** PowerPC dialect used by the Xbox 360. Its +primary audience is an AI agent translating PPC assembly functions into +equivalent C. The content is derived from the two authoritative sources in +this repository — **xenia-canary** (C++ emulator) and **xenia-rs** (Rust +rewrite) — and may be deepened with the IBM AIX PowerPC reference. + +- **{len(insns)}** distinct XML-level instructions (one page each). +- **{len(families)}** instruction family pages (VMX128 siblings folded). +- **{total_mnemonics}** assembly mnemonics once runtime `Rc`/`OE`/`LK` variants are expanded — all resolvable through `index.json`. + +## How to use this manual (translation agent) + +1. Parse the 32-bit instruction word and identify the mnemonic. Resolve it + through [`index.json`](index.json): every assembly form (including + `add.`, `addo.`, `bclrl`, …) is a top-level key pointing at a page. +2. Open the page referenced by `index.json[mnem].page`. The page is in a + fixed format — see the "Page anatomy" section below. +3. Emit a C translation consistent with the page's pseudocode, the + registers-affected list, and the status-register effects. + +## Page anatomy + +Every instruction page has the same sections, in this order: + +| Section | Purpose | +| --- | --- | +| **Assembler Mnemonics** | Table of every runtime variant (Rc/OE/LK) the base XML entry covers, plus VMX128 siblings. | +| **Syntax** | Canonical assembly template with `[OE]`/`[Rc]`/`[LK]` bracketed-modifier notation. | +| **Encoding** | Form name, opcode word, primary/extended opcodes, and bit-layout table. | +| **Operands** | Every bit-field operand, its role per variant, and its meaning. | +| **Register Effects** | Unconditional vs. conditional reads and writes, per variant. | +| **Status-Register Effects** | CR0/CR1/CR6, XER[CA/OV/SO], FPSCR, VSCR updates. | +| **Operation** | PPC-style pseudocode (`RT <- …`, `EXTS(…)`, `MEM(EA, n)`). | +| **C Translation Example** | Minimal idiomatic C rendering a translator could emit. | +| **Implementation References** | Direct links into `xenia-canary/` and `xenia-rs/` with line numbers. | +| **Special Cases & Edge Conditions** | RA=0, alignment, endian byte-reverse, reservation, SPR remapping, VMX128 fusion. | +| **Related Instructions** | Sibling cross-links. | +| **IBM Reference** | Optional link to IBM AIX PPC reference for canonical pseudocode. | + +Sections between the `` and `` +sentinels are produced by [`generator/generate_manual.py`](generator/generate_manual.py) +and re-generated on every run. Sections outside the sentinels are +hand-written and preserved across re-runs. + +## Conventions + +- **Bit numbering** follows PowerPC (big-endian, bit 0 = MSB). +- **GPRs** are 64-bit. 32-bit operations operate on bits `[32:63]` and + conventionally write the low 32 bits with zero- or sign-extension into + the high 32 bits. Page pseudocode makes this explicit when it matters. +- **Vector registers** are 128-bit with **lane 0 at the most-significant + byte** (big-endian lane indexing). On x86 hosts byte-swap is applied at + load/store to preserve this invariant. +- **CR** is 8 × 4-bit fields `CR0..CR7`, each `{{LT, GT, EQ, SO}}`. The record + form of arithmetic instructions writes CR0 (integer) or CR1 (FPU); the + record form of vector compare writes CR6 = `{{all-true, 0, all-false, 0}}`. +- **XER** holds `SO`, `OV`, and `CA` at bits 32, 33, 34 respectively + (PPC bit numbering), plus a 7-bit string length used by `lswi`/`stswi`. + +## Categories + +{chr(10).join(cat_rows)} + +## Forms + +{chr(10).join(form_rows)} + +## Regenerating this manual + +```bash +python3 generator/generate_manual.py +``` + +Re-running the generator is safe — it only rewrites sections between +`` / `` sentinels. Add +your hand-written content below the `END` marker and it will be +preserved. +""" + + +# --------------------------------------------------------------------------- +# Main +# --------------------------------------------------------------------------- + +def main(): + parser = argparse.ArgumentParser(description="Generate PPC instruction manual") + parser.add_argument("--out", type=Path, default=MANUAL_ROOT_DEFAULT, + help="Output directory (default: ppc-manual/)") + parser.add_argument("--dry-run", action="store_true", + help="Parse + group only. Don't write any files. " + "Exit non-zero if any consistency check fails.") + parser.add_argument("--xml", type=Path, default=XML_PATH, + help="Path to ppc-instructions.xml") + args = parser.parse_args() + + insns = load_instructions(args.xml) + if len(insns) != 455: + print(f"WARNING: expected 455 XML entries, found {len(insns)}", file=sys.stderr) + + families = build_families(insns) + + # Consistency: every XML entry must belong to exactly one family. + total_members = sum(len(f.members) for f in families.values()) + assert total_members == len(insns), ( + f"family member total {total_members} ≠ XML entry count {len(insns)}" + ) + + # Consistency: every runtime mnemonic must be resolvable in the index. + index = build_index(families) + all_runtime_mnems: set[str] = set() + for i in insns: + for v in expand_runtime_variants(i): + all_runtime_mnems.add(v["mnem"]) + missing = all_runtime_mnems - set(index["instructions"]) + assert not missing, f"index is missing {len(missing)} mnemonics: {sorted(missing)[:10]}" + + # Report + print(f"XML entries: {len(insns)}") + print(f"Families: {len(families)}") + print(f"Runtime mnemonics: {len(all_runtime_mnems)}") + print(f"Index keys: {len(index['instructions'])}") + by_cat = defaultdict(int) + for fam in families.values(): + by_cat[fam.category] += 1 + print("Families by category:") + for cat, n in sorted(by_cat.items()): + print(f" {cat:8s} {n}") + + if args.dry_run: + return 0 + + rust = RustScraper(REPO_ROOT) + cxx = CxxScraper(REPO_ROOT) + + out = args.out + out.mkdir(parents=True, exist_ok=True) + + written = 0 + preserved = 0 + + # 1. Instruction pages + for family in families.values(): + cat_dir = out / family.category + cat_dir.mkdir(exist_ok=True) + page_path = cat_dir / f"{_cxx_slug(family.head)}.md" + fresh = render_page(family, rust, cxx) + if page_path.exists(): + existing = page_path.read_text(encoding="utf-8") + merged = merge_preserving_handwritten(existing, fresh) + if merged == existing: + preserved += 1 + continue + page_path.write_text(merged, encoding="utf-8") + else: + page_path.write_text(fresh, encoding="utf-8") + written += 1 + + # 2. Category overviews + cats_dir = out / "categories" + cats_dir.mkdir(exist_ok=True) + by_cat_list: dict[str, list[Family]] = defaultdict(list) + for fam in families.values(): + by_cat_list[fam.category].append(fam) + for cat, fams in by_cat_list.items(): + page = cats_dir / f"{cat}.md" + fresh = render_category_page(cat, fams) + if page.exists(): + fresh = merge_preserving_handwritten(page.read_text(encoding="utf-8"), fresh) + page.write_text(fresh, encoding="utf-8") + + # 3. Form reference pages + forms_dir = out / "forms" + forms_dir.mkdir(exist_ok=True) + present_forms = sorted({i.form for i in insns}) + for form in present_forms: + page = forms_dir / f"{form}.md" + fresh = render_form_page(form, list(families.values()), insns) + if page.exists(): + fresh = merge_preserving_handwritten(page.read_text(encoding="utf-8"), fresh) + page.write_text(fresh, encoding="utf-8") + + # 4. index.json + (out / "index.json").write_text( + json.dumps(index, indent=2, ensure_ascii=False) + "\n", + encoding="utf-8", + ) + + # 5. README + readme = out / "README.md" + fresh_readme = render_readme(families, insns) + if readme.exists(): + fresh_readme = merge_preserving_handwritten(readme.read_text(encoding="utf-8"), fresh_readme) + readme.write_text(fresh_readme, encoding="utf-8") + + print(f"Wrote/updated {written} pages; preserved {preserved} unchanged; " + f"emitted index.json with {len(index['instructions'])} entries.") + return 0 + + +if __name__ == "__main__": + raise SystemExit(main()) diff --git a/migration/project-root/ppc-manual/generator/rust_scraper.py b/migration/project-root/ppc-manual/generator/rust_scraper.py new file mode 100644 index 0000000..c71f0b7 --- /dev/null +++ b/migration/project-root/ppc-manual/generator/rust_scraper.py @@ -0,0 +1,184 @@ +""" +Scrapes xenia-rs source files for per-instruction references and +snippets of the interpreter semantics. + +Outputs produced for each mnemonic: + - opcode_line: line in crates/xenia-cpu/src/opcode.rs where the + PpcOpcode variant is declared (1-indexed) + - decoder_line: line in crates/xenia-cpu/src/decoder.rs where the + variant is produced from raw bits + - interp_start: line in crates/xenia-cpu/src/interpreter.rs where + the match arm `PpcOpcode:: =>` begins + - interp_end: line where the arm closes (matching brace, naive) + - interp_body: raw text of the arm body (for reviewer reference) + +The xenia-rs opcode identifier often has trailing `x` preserved +(PpcOpcode::addx) — this scraper matches on the XML mnemonic directly +plus a stripped alternative without trailing 'x' and the xenia-style +identifier forms. +""" + +from __future__ import annotations + +from dataclasses import dataclass +from pathlib import Path +import re + + +@dataclass +class RustRef: + mnem: str + opcode_line: int | None = None + decoder_line: int | None = None + interp_start: int | None = None + interp_end: int | None = None + interp_body: str = "" + + +# PpcOpcode identifiers in xenia-rs match the XML mnemonic *exactly* except +# that '.' is illegal in Rust identifiers. Mnemonics ending in '.' appear as +# a trailing 'x' replacement in some cases but the codebase seems to keep the +# XML name verbatim (e.g. addic. → addicx OR addic_). Check the codebase. + + +def _rust_ident(mnem: str) -> str: + """Convert XML mnemonic to the xenia-rs PpcOpcode variant name.""" + # Xenia-rs uses the same name as xenia-canary's opcode enum, which + # mirrors ppc-instructions.xml directly. '.' is replaced with 'x' in + # the opcode enum (e.g. 'addic.' → 'addicx'), but the XML entry is + # already 'addic.'. We only need to handle that single case. + return mnem.replace(".", "x") + + +class RustScraper: + def __init__(self, repo_root: Path): + self.repo_root = repo_root + self.cpu_root = repo_root / "xenia-rs" / "crates" / "xenia-cpu" / "src" + self._opcode_lines = self._read_lines(self.cpu_root / "opcode.rs") + self._decoder_lines = self._read_lines(self.cpu_root / "decoder.rs") + self._interp_lines = self._read_lines(self.cpu_root / "interpreter.rs") + self._opcode_index: dict[str, int] = self._index_opcode_enum() + self._decoder_index: dict[str, int] = self._index_decoder() + self._interp_index: dict[str, tuple[int, int]] = self._index_interpreter() + + @staticmethod + def _read_lines(path: Path) -> list[str]: + if not path.is_file(): + return [] + return path.read_text(encoding="utf-8").splitlines() + + def _index_opcode_enum(self) -> dict[str, int]: + """Map rust-identifier → 1-indexed line in opcode.rs. The enum uses + comma-separated identifiers (often many per line) so we extract + every identifier match inside the enum body.""" + idx: dict[str, int] = {} + token = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\b") + in_enum = False + for i, line in enumerate(self._opcode_lines, start=1): + if "pub enum PpcOpcode" in line: + in_enum = True + continue + if not in_enum: + continue + if line.startswith("}"): + break + stripped = line.strip() + # skip blank / comment-only lines + if not stripped or stripped.startswith("//"): + continue + # split off any trailing line comment + code = stripped.split("//", 1)[0] + for m in token.finditer(code): + idx.setdefault(m.group(1), i) + return idx + + def _index_decoder(self) -> dict[str, int]: + """Map rust-identifier → 1-indexed line of its `PpcOpcode::` producer.""" + idx: dict[str, int] = {} + pat = re.compile(r"PpcOpcode::([A-Za-z_][A-Za-z0-9_]*)") + for i, line in enumerate(self._decoder_lines, start=1): + for m in pat.finditer(line): + name = m.group(1) + # keep the FIRST occurrence (the match-arm line where it's + # produced, not any later references) + idx.setdefault(name, i) + return idx + + def _index_interpreter(self) -> dict[str, tuple[int, int]]: + """Map rust-identifier → (start, end) lines of the match arm. + + An arm starts at `PpcOpcode::` and ends at the closing `}` + at the same indentation level. We accept multi-variant arms of + the form `PpcOpcode::a | PpcOpcode::b => {` by recording the same + (start, end) for every named variant. + """ + arm_header = re.compile(r"^(\s*)((?:PpcOpcode::[A-Za-z_][A-Za-z0-9_]*\s*\|\s*)*PpcOpcode::[A-Za-z_][A-Za-z0-9_]*)\s*=>\s*\{?\s*$") + # Some arms use no leading whitespace quirks — adjusted regex: + arm_header = re.compile( + r"^(\s*)" # indent + r"((?:PpcOpcode::[A-Za-z_][A-Za-z0-9_]*" # first variant + r"(?:\s*\|\s*PpcOpcode::[A-Za-z_][A-Za-z0-9_]*)*))" # more variants + r"\s*=>\s*\{?\s*$" + ) + var_re = re.compile(r"PpcOpcode::([A-Za-z_][A-Za-z0-9_]*)") + idx: dict[str, tuple[int, int]] = {} + i = 0 + n = len(self._interp_lines) + while i < n: + line = self._interp_lines[i] + m = arm_header.match(line) + if not m: + i += 1 + continue + indent = m.group(1) + names = var_re.findall(m.group(2)) + # Find the closing '}' at the same indentation. The arm body + # starts on line i (which ends with '{') and ends at a line + # whose content (after `indent`) is '}' (with optional trailing + # comma). + start = i + 1 # 1-indexed + end = start + j = i + 1 + depth = 1 if line.rstrip().endswith("{") else 0 + if depth == 0: + # Single-expression arm like `... => foo(),` — treat the line + # itself as start=end. + end = start + j = i + 1 + else: + while j < n: + l = self._interp_lines[j] + # A naive brace counter suffices for this codebase — the + # interpreter arms use balanced braces and no string + # literals containing stray braces. + depth += l.count("{") - l.count("}") + if depth == 0: + end = j + 1 # 1-indexed + break + j += 1 + for name in names: + idx.setdefault(name, (start, end)) + i = j + 1 + return idx + + def lookup(self, mnem: str) -> RustRef: + ident = _rust_ident(mnem) + ref = RustRef(mnem=mnem) + ref.opcode_line = self._opcode_index.get(ident) + ref.decoder_line = self._decoder_index.get(ident) + rng = self._interp_index.get(ident) + if rng: + ref.interp_start, ref.interp_end = rng + body = "\n".join(self._interp_lines[ref.interp_start - 1: ref.interp_end]) + ref.interp_body = body + return ref + + +if __name__ == "__main__": + root = Path(__file__).resolve().parent.parent.parent + s = RustScraper(root) + for m in ("addx", "addic.", "lwz", "bclrx", "mfspr", "stvx", "vaddfp", + "vaddfp128", "faddx", "lvsl"): + r = s.lookup(m) + print(f"{m:12s} opcode@{r.opcode_line} decoder@{r.decoder_line} " + f"interp@{r.interp_start}-{r.interp_end}") diff --git a/migration/project-root/ppc-manual/generator/xml_model.py b/migration/project-root/ppc-manual/generator/xml_model.py new file mode 100644 index 0000000..e4c1182 --- /dev/null +++ b/migration/project-root/ppc-manual/generator/xml_model.py @@ -0,0 +1,231 @@ +""" +Parses xenia-canary's tools/ppc-instructions.xml into typed records. + +The XML is the authoritative catalogue of Xbox 360 PPC instructions +(455 entries). Each entry carries: + - mnem: mnemonic (e.g. "addx", "lwzu", "vaddfp128") + - opcode: 32-bit hex encoding (primary + extended opcode bits) + - form: instruction format (XO, D, DS, X, XL, XFX, ..., VX, VX128_*) + - group: functional group (i=integer, m=memory, b=branch, + c=control, f=fpu, v=vmx) + - desc: short human-readable description + - / fields with optional conditional="true" flag + - : template string used by the canary disassembler +""" + +from __future__ import annotations + +import xml.etree.ElementTree as ET +from dataclasses import dataclass, field +from pathlib import Path + + +GROUP_NAMES = { + "i": "integer", + "m": "memory", + "b": "branch", + "c": "control", + "f": "fpu", + "v": "vmx", +} + +# Maps the short group code to the manual's on-disk category directory. +# VMX entries are split by form in generate_manual.py (VX128_* → vmx128/). +GROUP_TO_DIR = { + "i": "alu", + "m": "memory", + "b": "branch", + "c": "control", + "f": "fpu", + "v": "vmx", +} + + +@dataclass +class Field: + name: str + conditional: bool = False + + +@dataclass +class Instruction: + mnem: str + opcode_hex: str # lowercase, no "0x" prefix + form: str + group: str # one-letter code + desc: str + sync: bool + reads: list[Field] = field(default_factory=list) + writes: list[Field] = field(default_factory=list) + disasm: str = "" + + @property + def opcode_int(self) -> int: + return int(self.opcode_hex, 16) + + @property + def primary_opcode(self) -> int: + # PPC: bits 0-5 of a big-endian 32-bit word are the top 6 bits. + return (self.opcode_int >> 26) & 0x3F + + @property + def extended_opcode(self) -> int | None: + """Best-effort extended opcode extraction by form. + Returns None for forms where "extended opcode" is not meaningful + (I, B, D, DS, SC, M, MD, MDS, DCBZ) — those pages will omit it.""" + code = self.opcode_int + form = self.form + if form in ("X", "XL", "XFX", "XFL", "XS", "DCBZ"): + return (code >> 1) & 0x3FF # bits 21-30 + if form == "XO": + return (code >> 1) & 0x1FF # bits 22-30 (bit 21 = OE) + if form == "A": + return (code >> 1) & 0x1F # bits 26-30 + if form in ("VX", "VX128_2", "VX128_5"): + return code & 0x7FF # bits 21-31 + if form == "VA": + return code & 0x3F # bits 26-31 + if form == "VC": + return code & 0x3FF # bits 22-31 (bit 21 = Rc) + if form in ("VX128", "VX128_R"): + # complex split; best-effort — not used for lookup, just display + return code & 0x7FF + if form in ("VX128_1", "VX128_3", "VX128_4", "VX128_P"): + return code & 0x7FF + return None + + @property + def group_name(self) -> str: + return GROUP_NAMES.get(self.group, "unknown") + + @property + def has_rc(self) -> bool: + """Does this instruction have a runtime Rc bit (record form)?""" + return any(w.name == "CR" and w.conditional for w in self.writes) + + @property + def has_oe(self) -> bool: + """Does this instruction have a runtime OE bit (overflow enable)?""" + return any(w.name == "OE" and w.conditional for w in self.writes) + + @property + def has_lk(self) -> bool: + """Does this instruction have a runtime LK bit (branch link)?""" + return any(r.name == "LK" for r in self.reads) + + @property + def rc_is_mandatory(self) -> bool: + """Instructions like `addic.` where CR is written unconditionally.""" + return any(w.name == "CR" and not w.conditional for w in self.writes) + + +def load_instructions(xml_path: Path | str) -> list[Instruction]: + tree = ET.parse(str(xml_path)) + root = tree.getroot() + insns: list[Instruction] = [] + for node in root.iter("insn"): + reads = [Field(x.get("field", ""), x.get("conditional") == "true") + for x in node.findall("in")] + writes = [Field(x.get("field", ""), x.get("conditional") == "true") + for x in node.findall("out")] + disasm_node = node.find("disasm") + disasm = (disasm_node.text or "").strip() if disasm_node is not None else "" + insns.append(Instruction( + mnem=node.get("mnem", ""), + opcode_hex=node.get("opcode", "").lower(), + form=node.get("form", ""), + group=node.get("group", ""), + desc=node.get("desc", ""), + sync=node.get("sync") == "true", + reads=reads, + writes=writes, + disasm=disasm, + )) + return insns + + +def expand_runtime_variants(insn: Instruction) -> list[dict]: + """ + Return the set of concrete assembly mnemonics this XML entry represents + under different runtime flag settings. Flags: Rc (record) → append '.', + OE (overflow) → insert 'o' before any '.', LK (link) → append 'l'. + + The display mnemonic is derived from the XML mnem by stripping a trailing + 'x' if present (xenia uses trailing x to mark X/XO form entries; the + assembly mnemonic omits it). Mnemonics ending in '.' or digits are kept. + """ + raw = insn.mnem + # Xenia convention: trailing 'x' on XO/X/A/M/MD/MDS/XFL/XS/VX/VA form + # marks "extended form" but is dropped in assembly display. + # Keep trailing x for: memory indexed forms (lbzx, lwzx, ...), which are + # separate XML entries — those should not have their x stripped. + # We use the group code to decide: group=i / group=f / group=c / + # form family VX*/VA/VC → strip trailing x. group=m / group=b → keep. + def strip_x(m: str) -> str: + if not m.endswith("x"): + return m + # Memory mnemonics: 'x' is part of the assembly name (indexed form). + if insn.group == "m": + return m + # Branch: bx/bcx/bcctrx/bclrx — xenia's trailing x, strip. + return m[:-1] + + base = strip_x(raw) + variants: list[dict] = [] + + if insn.rc_is_mandatory: + # e.g. addic. — already has the dot baked in + variants.append({"mnem": raw, "flags": {}, "is_primary": True}) + return variants + + has_rc = insn.has_rc + has_oe = insn.has_oe + has_lk = insn.has_lk + + if not (has_rc or has_oe or has_lk): + variants.append({"mnem": base, "flags": {}, "is_primary": True}) + return variants + + # Enumerate all combinations of the runtime flags that apply. + def insert_o(name: str) -> str: + # 'addo' / 'addo.' — insert 'o' before any trailing '.' + if name.endswith("."): + return name[:-1] + "o." + return name + "o" + + combos: list[tuple[str, dict]] = [(base, {})] + if has_oe: + combos += [(insert_o(n), {**f, "OE": 1}) for (n, f) in combos] + if has_rc: + combos += [(n + ".", {**f, "Rc": 1}) for (n, f) in combos] + if has_lk: + # Branch link: append 'l' AFTER any trailing dot? PPC convention: + # bl, bcl, bclrl, bcctrl — 'l' is appended at the end of the base + # mnemonic with no dot (branches don't have Rc). Add the l-variant + # only when OE/Rc weren't applied. + combos += [(n + "l", {**f, "LK": 1}) for (n, f) in combos if "Rc" not in f and "OE" not in f] + + for i, (name, flags) in enumerate(combos): + variants.append({"mnem": name, "flags": flags, "is_primary": i == 0}) + return variants + + +if __name__ == "__main__": + # Smoke test: print summary of what we loaded. + import sys + repo_root = Path(__file__).resolve().parent.parent.parent + xml = repo_root / "xenia-canary" / "tools" / "ppc-instructions.xml" + insns = load_instructions(xml) + print(f"Loaded {len(insns)} instructions from {xml}") + total_mnems = sum(len(expand_runtime_variants(i)) for i in insns) + print(f"Total runtime-expanded mnemonics: {total_mnems}") + # show 5 examples + for mnem in ("addx", "lwz", "bclrx", "mfspr", "stvx", "vaddfp", "vaddfp128", "addic."): + for i in insns: + if i.mnem == mnem: + vs = expand_runtime_variants(i) + print(f" {mnem:12s} form={i.form:7s} group={i.group} " + f"variants={[v['mnem'] for v in vs]}") + break + else: + print(f" {mnem:12s} NOT FOUND") diff --git a/migration/project-root/ppc-manual/index.json b/migration/project-root/ppc-manual/index.json new file mode 100644 index 0000000..f6c9f02 --- /dev/null +++ b/migration/project-root/ppc-manual/index.json @@ -0,0 +1,19128 @@ +{ + "version": "1.0", + "generator": "ppc-manual/generator/generate_manual.py", + "instruction_count": 455, + "mnemonic_count": 598, + "family_count": 350, + "categories": { + "alu": { + "page": "categories/alu.md", + "count": 70, + "label": "Integer ALU", + "summary": "Fixed-point add/sub/multiply/divide, logical, rotate, shift, compare, count-leading-zeros, sign-extension, trap-on-condition." + }, + "branch": { + "page": "categories/branch.md", + "count": 9, + "label": "Branch & System", + "summary": "Unconditional / conditional branches, branch to LR/CTR, traps, system call." + }, + "control": { + "page": "categories/control.md", + "count": 26, + "label": "Control / CR / SPR", + "summary": "Condition-register logical ops, CR field moves, mfspr/mtspr/mtcrf, time-base reads, synchronisation (sync, isync, eieio)." + }, + "fpu": { + "page": "categories/fpu.md", + "count": 33, + "label": "Floating-Point", + "summary": "IEEE-754 add/sub/mul/div/sqrt, fused multiply-add, conversions, compares, FPSCR moves." + }, + "memory": { + "page": "categories/memory.md", + "count": 112, + "label": "Memory", + "summary": "Loads/stores for byte, half, word, doubleword, float, multiple and string; cache management (dcbt, dcbf, dcbz); reservation pair lwarx/stwcx." + }, + "vmx": { + "page": "categories/vmx.md", + "count": 193, + "label": "VMX (Altivec)", + "summary": "128-bit SIMD over 32 registers V0–V31. Integer/float arithmetic, logical, compare, permute/merge, pack/unpack, saturation helpers." + }, + "vmx128": { + "page": "categories/vmx128.md", + "count": 12, + "label": "VMX128", + "summary": "Xbox-360-specific Altivec extension that widens the vector register file to 128 registers (V0–V127). Register IDs are encoded with bit-fusion across non-contiguous fields." + } + }, + "forms": { + "A": { + "page": "forms/A.md", + "count": 21 + }, + "B": { + "page": "forms/B.md", + "count": 1 + }, + "D": { + "page": "forms/D.md", + "count": 40 + }, + "DCBZ": { + "page": "forms/DCBZ.md", + "count": 2 + }, + "DS": { + "page": "forms/DS.md", + "count": 5 + }, + "I": { + "page": "forms/I.md", + "count": 1 + }, + "M": { + "page": "forms/M.md", + "count": 3 + }, + "MD": { + "page": "forms/MD.md", + "count": 4 + }, + "MDS": { + "page": "forms/MDS.md", + "count": 2 + }, + "SC": { + "page": "forms/SC.md", + "count": 1 + }, + "VA": { + "page": "forms/VA.md", + "count": 14 + }, + "VC": { + "page": "forms/VC.md", + "count": 13 + }, + "VX": { + "page": "forms/VX.md", + "count": 117 + }, + "VX128": { + "page": "forms/VX128.md", + "count": 34 + }, + "VX128_1": { + "page": "forms/VX128_1.md", + "count": 16 + }, + "VX128_2": { + "page": "forms/VX128_2.md", + "count": 1 + }, + "VX128_3": { + "page": "forms/VX128_3.md", + "count": 15 + }, + "VX128_4": { + "page": "forms/VX128_4.md", + "count": 2 + }, + "VX128_5": { + "page": "forms/VX128_5.md", + "count": 1 + }, + "VX128_P": { + "page": "forms/VX128_P.md", + "count": 1 + }, + "VX128_R": { + "page": "forms/VX128_R.md", + "count": 5 + }, + "X": { + "page": "forms/X.md", + "count": 117 + }, + "XFL": { + "page": "forms/XFL.md", + "count": 1 + }, + "XFX": { + "page": "forms/XFX.md", + "count": 4 + }, + "XL": { + "page": "forms/XL.md", + "count": 12 + }, + "XO": { + "page": "forms/XO.md", + "count": 21 + }, + "XS": { + "page": "forms/XS.md", + "count": 1 + } + }, + "instructions": { + "add": { + "page": "alu/addx.md", + "family": "addx", + "xml_mnem": "addx", + "opcode_hex": "0x7C000214", + "primary_opcode": 31, + "extended_opcode": 266, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Add", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "OE", + "conditional": true + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "add.": { + "page": "alu/addx.md", + "family": "addx", + "variant_of": "add", + "xml_mnem": "addx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "addc": { + "page": "alu/addcx.md", + "family": "addcx", + "xml_mnem": "addcx", + "opcode_hex": "0x7C000014", + "primary_opcode": 31, + "extended_opcode": 10, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Add Carrying", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CA", + "conditional": false + }, + { + "field": "OE", + "conditional": true + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "addc.": { + "page": "alu/addcx.md", + "family": "addcx", + "variant_of": "addc", + "xml_mnem": "addcx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "addco": { + "page": "alu/addcx.md", + "family": "addcx", + "variant_of": "addc", + "xml_mnem": "addcx", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "addco.": { + "page": "alu/addcx.md", + "family": "addcx", + "variant_of": "addc", + "xml_mnem": "addcx", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "adde": { + "page": "alu/addex.md", + "family": "addex", + "xml_mnem": "addex", + "opcode_hex": "0x7C000114", + "primary_opcode": 31, + "extended_opcode": 138, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Add Extended", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + }, + { + "field": "CA", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "OE", + "conditional": true + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "adde.": { + "page": "alu/addex.md", + "family": "addex", + "variant_of": "adde", + "xml_mnem": "addex", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "addeo": { + "page": "alu/addex.md", + "family": "addex", + "variant_of": "adde", + "xml_mnem": "addex", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "addeo.": { + "page": "alu/addex.md", + "family": "addex", + "variant_of": "adde", + "xml_mnem": "addex", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "addi": { + "page": "alu/addi.md", + "family": "addi", + "xml_mnem": "addi", + "opcode_hex": "0x38000000", + "primary_opcode": 14, + "extended_opcode": null, + "form": "D", + "group": "integer", + "category": "alu", + "description": "Add Immediate", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "SIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "addic": { + "page": "alu/addic.md", + "family": "addic", + "xml_mnem": "addic", + "opcode_hex": "0x30000000", + "primary_opcode": 12, + "extended_opcode": null, + "form": "D", + "group": "integer", + "category": "alu", + "description": "Add Immediate Carrying", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "SIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "addic.": { + "page": "alu/addicx.md", + "family": "addic.", + "xml_mnem": "addic.", + "opcode_hex": "0x34000000", + "primary_opcode": 13, + "extended_opcode": null, + "form": "D", + "group": "integer", + "category": "alu", + "description": "Add Immediate Carrying and Record", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "SIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CA", + "conditional": false + }, + { + "field": "CR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": true + }, + "is_primary": true, + "flags": {} + }, + "addis": { + "page": "alu/addis.md", + "family": "addis", + "xml_mnem": "addis", + "opcode_hex": "0x3C000000", + "primary_opcode": 15, + "extended_opcode": null, + "form": "D", + "group": "integer", + "category": "alu", + "description": "Add Immediate Shifted", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "SIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "addme": { + "page": "alu/addmex.md", + "family": "addmex", + "xml_mnem": "addmex", + "opcode_hex": "0x7C0001D4", + "primary_opcode": 31, + "extended_opcode": 234, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Add to Minus One Extended", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CA", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CA", + "conditional": false + }, + { + "field": "OE", + "conditional": true + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "addme.": { + "page": "alu/addmex.md", + "family": "addmex", + "variant_of": "addme", + "xml_mnem": "addmex", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "addmeo": { + "page": "alu/addmex.md", + "family": "addmex", + "variant_of": "addme", + "xml_mnem": "addmex", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "addmeo.": { + "page": "alu/addmex.md", + "family": "addmex", + "variant_of": "addme", + "xml_mnem": "addmex", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "addo": { + "page": "alu/addx.md", + "family": "addx", + "variant_of": "add", + "xml_mnem": "addx", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "addo.": { + "page": "alu/addx.md", + "family": "addx", + "variant_of": "add", + "xml_mnem": "addx", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "addze": { + "page": "alu/addzex.md", + "family": "addzex", + "xml_mnem": "addzex", + "opcode_hex": "0x7C000194", + "primary_opcode": 31, + "extended_opcode": 202, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Add to Zero Extended", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CA", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CA", + "conditional": false + }, + { + "field": "OE", + "conditional": true + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "addze.": { + "page": "alu/addzex.md", + "family": "addzex", + "variant_of": "addze", + "xml_mnem": "addzex", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "addzeo": { + "page": "alu/addzex.md", + "family": "addzex", + "variant_of": "addze", + "xml_mnem": "addzex", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "addzeo.": { + "page": "alu/addzex.md", + "family": "addzex", + "variant_of": "addze", + "xml_mnem": "addzex", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "and": { + "page": "alu/andx.md", + "family": "andx", + "xml_mnem": "andx", + "opcode_hex": "0x7C000038", + "primary_opcode": 31, + "extended_opcode": 28, + "form": "X", + "group": "integer", + "category": "alu", + "description": "AND", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "and.": { + "page": "alu/andx.md", + "family": "andx", + "variant_of": "and", + "xml_mnem": "andx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "andc": { + "page": "alu/andcx.md", + "family": "andcx", + "xml_mnem": "andcx", + "opcode_hex": "0x7C000078", + "primary_opcode": 31, + "extended_opcode": 60, + "form": "X", + "group": "integer", + "category": "alu", + "description": "AND with Complement", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "andc.": { + "page": "alu/andcx.md", + "family": "andcx", + "variant_of": "andc", + "xml_mnem": "andcx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "andi.": { + "page": "alu/andix.md", + "family": "andi.", + "xml_mnem": "andi.", + "opcode_hex": "0x70000000", + "primary_opcode": 28, + "extended_opcode": null, + "form": "D", + "group": "integer", + "category": "alu", + "description": "AND Immediate", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": true + }, + "is_primary": true, + "flags": {} + }, + "andis.": { + "page": "alu/andisx.md", + "family": "andis.", + "xml_mnem": "andis.", + "opcode_hex": "0x74000000", + "primary_opcode": 29, + "extended_opcode": null, + "form": "D", + "group": "integer", + "category": "alu", + "description": "AND Immediate Shifted", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": true + }, + "is_primary": true, + "flags": {} + }, + "b": { + "page": "branch/bx.md", + "family": "bx", + "xml_mnem": "bx", + "opcode_hex": "0x48000000", + "primary_opcode": 18, + "extended_opcode": null, + "form": "I", + "group": "branch", + "category": "branch", + "description": "Branch", + "sync": true, + "reads": [ + { + "field": "LK", + "conditional": false + }, + { + "field": "AA", + "conditional": false + }, + { + "field": "ADDR", + "conditional": false + } + ], + "writes": [ + { + "field": "LR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": true, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "bc": { + "page": "branch/bcx.md", + "family": "bcx", + "xml_mnem": "bcx", + "opcode_hex": "0x40000000", + "primary_opcode": 16, + "extended_opcode": null, + "form": "B", + "group": "branch", + "category": "branch", + "description": "Branch Conditional", + "sync": true, + "reads": [ + { + "field": "LK", + "conditional": false + }, + { + "field": "AA", + "conditional": false + }, + { + "field": "BO", + "conditional": false + }, + { + "field": "BI", + "conditional": false + }, + { + "field": "ADDR", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "CTR", + "conditional": true + } + ], + "writes": [ + { + "field": "CTR", + "conditional": true + }, + { + "field": "LR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": true, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "bcctr": { + "page": "branch/bcctrx.md", + "family": "bcctrx", + "xml_mnem": "bcctrx", + "opcode_hex": "0x4C000420", + "primary_opcode": 19, + "extended_opcode": 528, + "form": "XL", + "group": "branch", + "category": "branch", + "description": "Branch Conditional to Count Register", + "sync": true, + "reads": [ + { + "field": "LK", + "conditional": false + }, + { + "field": "BO", + "conditional": false + }, + { + "field": "BI", + "conditional": false + }, + { + "field": "CR", + "conditional": false + }, + { + "field": "CTR", + "conditional": false + } + ], + "writes": [ + { + "field": "LR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": true, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "bcctrl": { + "page": "branch/bcctrx.md", + "family": "bcctrx", + "variant_of": "bcctr", + "xml_mnem": "bcctrx", + "flags": { + "LK": 1 + }, + "category": "branch" + }, + "bcl": { + "page": "branch/bcx.md", + "family": "bcx", + "variant_of": "bc", + "xml_mnem": "bcx", + "flags": { + "LK": 1 + }, + "category": "branch" + }, + "bclr": { + "page": "branch/bclrx.md", + "family": "bclrx", + "xml_mnem": "bclrx", + "opcode_hex": "0x4C000020", + "primary_opcode": 19, + "extended_opcode": 16, + "form": "XL", + "group": "branch", + "category": "branch", + "description": "Branch Conditional to Link Register", + "sync": true, + "reads": [ + { + "field": "LK", + "conditional": false + }, + { + "field": "BO", + "conditional": false + }, + { + "field": "BI", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "CTR", + "conditional": true + } + ], + "writes": [ + { + "field": "CTR", + "conditional": true + }, + { + "field": "LR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": true, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "bclrl": { + "page": "branch/bclrx.md", + "family": "bclrx", + "variant_of": "bclr", + "xml_mnem": "bclrx", + "flags": { + "LK": 1 + }, + "category": "branch" + }, + "bl": { + "page": "branch/bx.md", + "family": "bx", + "variant_of": "b", + "xml_mnem": "bx", + "flags": { + "LK": 1 + }, + "category": "branch" + }, + "cmp": { + "page": "alu/cmp.md", + "family": "cmp", + "xml_mnem": "cmp", + "opcode_hex": "0x7C000000", + "primary_opcode": 31, + "extended_opcode": 0, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Compare", + "sync": false, + "reads": [ + { + "field": "L", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "CRFD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "cmpi": { + "page": "alu/cmpi.md", + "family": "cmpi", + "xml_mnem": "cmpi", + "opcode_hex": "0x2C000000", + "primary_opcode": 11, + "extended_opcode": null, + "form": "D", + "group": "integer", + "category": "alu", + "description": "Compare Immediate", + "sync": false, + "reads": [ + { + "field": "L", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "SIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CRFD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "cmpl": { + "page": "alu/cmpl.md", + "family": "cmpl", + "xml_mnem": "cmpl", + "opcode_hex": "0x7C000040", + "primary_opcode": 31, + "extended_opcode": 32, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Compare Logical", + "sync": false, + "reads": [ + { + "field": "L", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "CRFD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "cmpli": { + "page": "alu/cmpli.md", + "family": "cmpli", + "xml_mnem": "cmpli", + "opcode_hex": "0x28000000", + "primary_opcode": 10, + "extended_opcode": null, + "form": "D", + "group": "integer", + "category": "alu", + "description": "Compare Logical Immediate", + "sync": false, + "reads": [ + { + "field": "L", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "CRFD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "cntlzd": { + "page": "alu/cntlzdx.md", + "family": "cntlzdx", + "xml_mnem": "cntlzdx", + "opcode_hex": "0x7C000074", + "primary_opcode": 31, + "extended_opcode": 58, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Count Leading Zeros Doubleword", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "cntlzd.": { + "page": "alu/cntlzdx.md", + "family": "cntlzdx", + "variant_of": "cntlzd", + "xml_mnem": "cntlzdx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "cntlzw": { + "page": "alu/cntlzwx.md", + "family": "cntlzwx", + "xml_mnem": "cntlzwx", + "opcode_hex": "0x7C000034", + "primary_opcode": 31, + "extended_opcode": 26, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Count Leading Zeros Word", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "cntlzw.": { + "page": "alu/cntlzwx.md", + "family": "cntlzwx", + "variant_of": "cntlzw", + "xml_mnem": "cntlzwx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "crand": { + "page": "control/crand.md", + "family": "crand", + "xml_mnem": "crand", + "opcode_hex": "0x4C000202", + "primary_opcode": 19, + "extended_opcode": 257, + "form": "XL", + "group": "control", + "category": "control", + "description": "Condition Register AND", + "sync": false, + "reads": [ + { + "field": "CRBA", + "conditional": false + }, + { + "field": "CRBB", + "conditional": false + } + ], + "writes": [ + { + "field": "CRBD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "crandc": { + "page": "control/crandc.md", + "family": "crandc", + "xml_mnem": "crandc", + "opcode_hex": "0x4C000102", + "primary_opcode": 19, + "extended_opcode": 129, + "form": "XL", + "group": "control", + "category": "control", + "description": "Condition Register AND with Complement", + "sync": false, + "reads": [ + { + "field": "CRBA", + "conditional": false + }, + { + "field": "CRBB", + "conditional": false + } + ], + "writes": [ + { + "field": "CRBD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "creqv": { + "page": "control/creqv.md", + "family": "creqv", + "xml_mnem": "creqv", + "opcode_hex": "0x4C000242", + "primary_opcode": 19, + "extended_opcode": 289, + "form": "XL", + "group": "control", + "category": "control", + "description": "Condition Register Equivalent", + "sync": false, + "reads": [ + { + "field": "CRBA", + "conditional": false + }, + { + "field": "CRBB", + "conditional": false + } + ], + "writes": [ + { + "field": "CRBD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "crnand": { + "page": "control/crnand.md", + "family": "crnand", + "xml_mnem": "crnand", + "opcode_hex": "0x4C0001C2", + "primary_opcode": 19, + "extended_opcode": 225, + "form": "XL", + "group": "control", + "category": "control", + "description": "Condition Register NAND", + "sync": false, + "reads": [ + { + "field": "CRBA", + "conditional": false + }, + { + "field": "CRBB", + "conditional": false + } + ], + "writes": [ + { + "field": "CRBD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "crnor": { + "page": "control/crnor.md", + "family": "crnor", + "xml_mnem": "crnor", + "opcode_hex": "0x4C000042", + "primary_opcode": 19, + "extended_opcode": 33, + "form": "XL", + "group": "control", + "category": "control", + "description": "Condition Register NOR", + "sync": false, + "reads": [ + { + "field": "CRBA", + "conditional": false + }, + { + "field": "CRBB", + "conditional": false + } + ], + "writes": [ + { + "field": "CRBD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "cror": { + "page": "control/cror.md", + "family": "cror", + "xml_mnem": "cror", + "opcode_hex": "0x4C000382", + "primary_opcode": 19, + "extended_opcode": 449, + "form": "XL", + "group": "control", + "category": "control", + "description": "Condition Register OR", + "sync": false, + "reads": [ + { + "field": "CRBA", + "conditional": false + }, + { + "field": "CRBB", + "conditional": false + } + ], + "writes": [ + { + "field": "CRBD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "crorc": { + "page": "control/crorc.md", + "family": "crorc", + "xml_mnem": "crorc", + "opcode_hex": "0x4C000342", + "primary_opcode": 19, + "extended_opcode": 417, + "form": "XL", + "group": "control", + "category": "control", + "description": "Condition Register OR with Complement", + "sync": false, + "reads": [ + { + "field": "CRBA", + "conditional": false + }, + { + "field": "CRBB", + "conditional": false + } + ], + "writes": [ + { + "field": "CRBD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "crxor": { + "page": "control/crxor.md", + "family": "crxor", + "xml_mnem": "crxor", + "opcode_hex": "0x4C000182", + "primary_opcode": 19, + "extended_opcode": 193, + "form": "XL", + "group": "control", + "category": "control", + "description": "Condition Register XOR", + "sync": false, + "reads": [ + { + "field": "CRBA", + "conditional": false + }, + { + "field": "CRBB", + "conditional": false + } + ], + "writes": [ + { + "field": "CRBD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "dcbf": { + "page": "memory/dcbf.md", + "family": "dcbf", + "xml_mnem": "dcbf", + "opcode_hex": "0x7C0000AC", + "primary_opcode": 31, + "extended_opcode": 86, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Data Cache Block Flush", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "dcbi": { + "page": "memory/dcbi.md", + "family": "dcbi", + "xml_mnem": "dcbi", + "opcode_hex": "0x7C0003AC", + "primary_opcode": 31, + "extended_opcode": 470, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Data Cache Block Invalidate", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "dcbst": { + "page": "memory/dcbst.md", + "family": "dcbst", + "xml_mnem": "dcbst", + "opcode_hex": "0x7C00006C", + "primary_opcode": 31, + "extended_opcode": 54, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Data Cache Block Store", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "dcbt": { + "page": "memory/dcbt.md", + "family": "dcbt", + "xml_mnem": "dcbt", + "opcode_hex": "0x7C00022C", + "primary_opcode": 31, + "extended_opcode": 278, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Data Cache Block Touch", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "dcbtst": { + "page": "memory/dcbtst.md", + "family": "dcbtst", + "xml_mnem": "dcbtst", + "opcode_hex": "0x7C0001EC", + "primary_opcode": 31, + "extended_opcode": 246, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Data Cache Block Touch for Store", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "dcbz": { + "page": "memory/dcbz.md", + "family": "dcbz", + "xml_mnem": "dcbz", + "opcode_hex": "0x7C0007EC", + "primary_opcode": 31, + "extended_opcode": 1014, + "form": "DCBZ", + "group": "memory", + "category": "memory", + "description": "Data Cache Block Clear to Zero", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "dcbz128": { + "page": "memory/dcbz.md", + "family": "dcbz", + "xml_mnem": "dcbz128", + "opcode_hex": "0x7C2007EC", + "primary_opcode": 31, + "extended_opcode": 1014, + "form": "DCBZ", + "group": "memory", + "category": "memory", + "description": "Data Cache Block Clear to Zero 128", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "divd": { + "page": "alu/divdx.md", + "family": "divdx", + "xml_mnem": "divdx", + "opcode_hex": "0x7C0003D2", + "primary_opcode": 31, + "extended_opcode": 489, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Divide Doubleword", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "OE", + "conditional": true + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "divd.": { + "page": "alu/divdx.md", + "family": "divdx", + "variant_of": "divd", + "xml_mnem": "divdx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "divdo": { + "page": "alu/divdx.md", + "family": "divdx", + "variant_of": "divd", + "xml_mnem": "divdx", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "divdo.": { + "page": "alu/divdx.md", + "family": "divdx", + "variant_of": "divd", + "xml_mnem": "divdx", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "divdu": { + "page": "alu/divdux.md", + "family": "divdux", + "xml_mnem": "divdux", + "opcode_hex": "0x7C000392", + "primary_opcode": 31, + "extended_opcode": 457, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Divide Doubleword Unsigned", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "OE", + "conditional": true + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "divdu.": { + "page": "alu/divdux.md", + "family": "divdux", + "variant_of": "divdu", + "xml_mnem": "divdux", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "divduo": { + "page": "alu/divdux.md", + "family": "divdux", + "variant_of": "divdu", + "xml_mnem": "divdux", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "divduo.": { + "page": "alu/divdux.md", + "family": "divdux", + "variant_of": "divdu", + "xml_mnem": "divdux", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "divw": { + "page": "alu/divwx.md", + "family": "divwx", + "xml_mnem": "divwx", + "opcode_hex": "0x7C0003D6", + "primary_opcode": 31, + "extended_opcode": 491, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Divide Word", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "OE", + "conditional": true + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "divw.": { + "page": "alu/divwx.md", + "family": "divwx", + "variant_of": "divw", + "xml_mnem": "divwx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "divwo": { + "page": "alu/divwx.md", + "family": "divwx", + "variant_of": "divw", + "xml_mnem": "divwx", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "divwo.": { + "page": "alu/divwx.md", + "family": "divwx", + "variant_of": "divw", + "xml_mnem": "divwx", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "divwu": { + "page": "alu/divwux.md", + "family": "divwux", + "xml_mnem": "divwux", + "opcode_hex": "0x7C000396", + "primary_opcode": 31, + "extended_opcode": 459, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Divide Word Unsigned", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "OE", + "conditional": true + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "divwu.": { + "page": "alu/divwux.md", + "family": "divwux", + "variant_of": "divwu", + "xml_mnem": "divwux", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "divwuo": { + "page": "alu/divwux.md", + "family": "divwux", + "variant_of": "divwu", + "xml_mnem": "divwux", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "divwuo.": { + "page": "alu/divwux.md", + "family": "divwux", + "variant_of": "divwu", + "xml_mnem": "divwux", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "eieio": { + "page": "alu/eieio.md", + "family": "eieio", + "xml_mnem": "eieio", + "opcode_hex": "0x7C0006AC", + "primary_opcode": 31, + "extended_opcode": 854, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Enforce In-Order Execution of I/O", + "sync": false, + "reads": [], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "eqv": { + "page": "alu/eqvx.md", + "family": "eqvx", + "xml_mnem": "eqvx", + "opcode_hex": "0x7C000238", + "primary_opcode": 31, + "extended_opcode": 284, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Equivalent", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "eqv.": { + "page": "alu/eqvx.md", + "family": "eqvx", + "variant_of": "eqv", + "xml_mnem": "eqvx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "extsb": { + "page": "alu/extsbx.md", + "family": "extsbx", + "xml_mnem": "extsbx", + "opcode_hex": "0x7C000774", + "primary_opcode": 31, + "extended_opcode": 954, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Extend Sign Byte", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "extsb.": { + "page": "alu/extsbx.md", + "family": "extsbx", + "variant_of": "extsb", + "xml_mnem": "extsbx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "extsh": { + "page": "alu/extshx.md", + "family": "extshx", + "xml_mnem": "extshx", + "opcode_hex": "0x7C000734", + "primary_opcode": 31, + "extended_opcode": 922, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Extend Sign Half Word", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "extsh.": { + "page": "alu/extshx.md", + "family": "extshx", + "variant_of": "extsh", + "xml_mnem": "extshx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "extsw": { + "page": "alu/extswx.md", + "family": "extswx", + "xml_mnem": "extswx", + "opcode_hex": "0x7C0007B4", + "primary_opcode": 31, + "extended_opcode": 986, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Extend Sign Word", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "extsw.": { + "page": "alu/extswx.md", + "family": "extswx", + "variant_of": "extsw", + "xml_mnem": "extswx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "fabs": { + "page": "fpu/fabsx.md", + "family": "fabsx", + "xml_mnem": "fabsx", + "opcode_hex": "0xFC000210", + "primary_opcode": 63, + "extended_opcode": 264, + "form": "X", + "group": "fpu", + "category": "fpu", + "description": "Floating Absolute Value", + "sync": false, + "reads": [ + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fabs.": { + "page": "fpu/fabsx.md", + "family": "fabsx", + "variant_of": "fabs", + "xml_mnem": "fabsx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fadd": { + "page": "fpu/faddx.md", + "family": "faddx", + "xml_mnem": "faddx", + "opcode_hex": "0xFC00002A", + "primary_opcode": 63, + "extended_opcode": 21, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Add", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fadd.": { + "page": "fpu/faddx.md", + "family": "faddx", + "variant_of": "fadd", + "xml_mnem": "faddx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fadds": { + "page": "fpu/faddsx.md", + "family": "faddsx", + "xml_mnem": "faddsx", + "opcode_hex": "0xEC00002A", + "primary_opcode": 59, + "extended_opcode": 21, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Add Single", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fadds.": { + "page": "fpu/faddsx.md", + "family": "faddsx", + "variant_of": "fadds", + "xml_mnem": "faddsx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fcfid": { + "page": "fpu/fcfidx.md", + "family": "fcfidx", + "xml_mnem": "fcfidx", + "opcode_hex": "0xFC00069C", + "primary_opcode": 63, + "extended_opcode": 846, + "form": "X", + "group": "fpu", + "category": "fpu", + "description": "Floating Convert From Integer Doubleword", + "sync": false, + "reads": [ + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fcfid.": { + "page": "fpu/fcfidx.md", + "family": "fcfidx", + "variant_of": "fcfid", + "xml_mnem": "fcfidx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fcmpo": { + "page": "fpu/fcmpo.md", + "family": "fcmpo", + "xml_mnem": "fcmpo", + "opcode_hex": "0xFC000040", + "primary_opcode": 63, + "extended_opcode": 32, + "form": "X", + "group": "fpu", + "category": "fpu", + "description": "Floating Compare Ordered", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "CRFD", + "conditional": false + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fcmpu": { + "page": "fpu/fcmpu.md", + "family": "fcmpu", + "xml_mnem": "fcmpu", + "opcode_hex": "0xFC000000", + "primary_opcode": 63, + "extended_opcode": 0, + "form": "X", + "group": "fpu", + "category": "fpu", + "description": "Floating Compare Unordered", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "CRFD", + "conditional": false + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fctid": { + "page": "fpu/fctidx.md", + "family": "fctidx", + "xml_mnem": "fctidx", + "opcode_hex": "0xFC00065C", + "primary_opcode": 63, + "extended_opcode": 814, + "form": "X", + "group": "fpu", + "category": "fpu", + "description": "Floating Convert to Integer Doubleword", + "sync": false, + "reads": [ + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fctid.": { + "page": "fpu/fctidx.md", + "family": "fctidx", + "variant_of": "fctid", + "xml_mnem": "fctidx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fctidz": { + "page": "fpu/fctidzx.md", + "family": "fctidzx", + "xml_mnem": "fctidzx", + "opcode_hex": "0xFC00065E", + "primary_opcode": 63, + "extended_opcode": 815, + "form": "X", + "group": "fpu", + "category": "fpu", + "description": "Floating Convert to Integer Doubleword with Round Toward Zero", + "sync": false, + "reads": [ + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fctidz.": { + "page": "fpu/fctidzx.md", + "family": "fctidzx", + "variant_of": "fctidz", + "xml_mnem": "fctidzx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fctiw": { + "page": "fpu/fctiwx.md", + "family": "fctiwx", + "xml_mnem": "fctiwx", + "opcode_hex": "0xFC00001C", + "primary_opcode": 63, + "extended_opcode": 14, + "form": "X", + "group": "fpu", + "category": "fpu", + "description": "Floating Convert to Integer Word", + "sync": false, + "reads": [ + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fctiw.": { + "page": "fpu/fctiwx.md", + "family": "fctiwx", + "variant_of": "fctiw", + "xml_mnem": "fctiwx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fctiwz": { + "page": "fpu/fctiwzx.md", + "family": "fctiwzx", + "xml_mnem": "fctiwzx", + "opcode_hex": "0xFC00001E", + "primary_opcode": 63, + "extended_opcode": 15, + "form": "X", + "group": "fpu", + "category": "fpu", + "description": "Floating Convert to Integer Word with Round Toward Zero", + "sync": false, + "reads": [ + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fctiwz.": { + "page": "fpu/fctiwzx.md", + "family": "fctiwzx", + "variant_of": "fctiwz", + "xml_mnem": "fctiwzx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fdiv": { + "page": "fpu/fdivx.md", + "family": "fdivx", + "xml_mnem": "fdivx", + "opcode_hex": "0xFC000024", + "primary_opcode": 63, + "extended_opcode": 18, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Divide", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fdiv.": { + "page": "fpu/fdivx.md", + "family": "fdivx", + "variant_of": "fdiv", + "xml_mnem": "fdivx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fdivs": { + "page": "fpu/fdivsx.md", + "family": "fdivsx", + "xml_mnem": "fdivsx", + "opcode_hex": "0xEC000024", + "primary_opcode": 59, + "extended_opcode": 18, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Divide Single", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fdivs.": { + "page": "fpu/fdivsx.md", + "family": "fdivsx", + "variant_of": "fdivs", + "xml_mnem": "fdivsx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fmadd": { + "page": "fpu/fmaddx.md", + "family": "fmaddx", + "xml_mnem": "fmaddx", + "opcode_hex": "0xFC00003A", + "primary_opcode": 63, + "extended_opcode": 29, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Multiply-Add", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FC", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fmadd.": { + "page": "fpu/fmaddx.md", + "family": "fmaddx", + "variant_of": "fmadd", + "xml_mnem": "fmaddx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fmadds": { + "page": "fpu/fmaddsx.md", + "family": "fmaddsx", + "xml_mnem": "fmaddsx", + "opcode_hex": "0xEC00003A", + "primary_opcode": 59, + "extended_opcode": 29, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Multiply-Add Single", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FC", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fmadds.": { + "page": "fpu/fmaddsx.md", + "family": "fmaddsx", + "variant_of": "fmadds", + "xml_mnem": "fmaddsx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fmr": { + "page": "fpu/fmrx.md", + "family": "fmrx", + "xml_mnem": "fmrx", + "opcode_hex": "0xFC000090", + "primary_opcode": 63, + "extended_opcode": 72, + "form": "X", + "group": "fpu", + "category": "fpu", + "description": "Floating Move Register", + "sync": false, + "reads": [ + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fmr.": { + "page": "fpu/fmrx.md", + "family": "fmrx", + "variant_of": "fmr", + "xml_mnem": "fmrx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fmsub": { + "page": "fpu/fmsubx.md", + "family": "fmsubx", + "xml_mnem": "fmsubx", + "opcode_hex": "0xFC000038", + "primary_opcode": 63, + "extended_opcode": 28, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Multiply-Subtract", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FC", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fmsub.": { + "page": "fpu/fmsubx.md", + "family": "fmsubx", + "variant_of": "fmsub", + "xml_mnem": "fmsubx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fmsubs": { + "page": "fpu/fmsubsx.md", + "family": "fmsubsx", + "xml_mnem": "fmsubsx", + "opcode_hex": "0xEC000038", + "primary_opcode": 59, + "extended_opcode": 28, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Multiply-Subtract Single", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FC", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fmsubs.": { + "page": "fpu/fmsubsx.md", + "family": "fmsubsx", + "variant_of": "fmsubs", + "xml_mnem": "fmsubsx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fmul": { + "page": "fpu/fmulx.md", + "family": "fmulx", + "xml_mnem": "fmulx", + "opcode_hex": "0xFC000032", + "primary_opcode": 63, + "extended_opcode": 25, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Multiply", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FC", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fmul.": { + "page": "fpu/fmulx.md", + "family": "fmulx", + "variant_of": "fmul", + "xml_mnem": "fmulx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fmuls": { + "page": "fpu/fmulsx.md", + "family": "fmulsx", + "xml_mnem": "fmulsx", + "opcode_hex": "0xEC000032", + "primary_opcode": 59, + "extended_opcode": 25, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Multiply Single", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FC", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fmuls.": { + "page": "fpu/fmulsx.md", + "family": "fmulsx", + "variant_of": "fmuls", + "xml_mnem": "fmulsx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fnabs": { + "page": "fpu/fnabsx.md", + "family": "fnabsx", + "xml_mnem": "fnabsx", + "opcode_hex": "0xFC000110", + "primary_opcode": 63, + "extended_opcode": 136, + "form": "X", + "group": "fpu", + "category": "fpu", + "description": "Floating Negative Absolute Value", + "sync": false, + "reads": [ + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fnabs.": { + "page": "fpu/fnabsx.md", + "family": "fnabsx", + "variant_of": "fnabs", + "xml_mnem": "fnabsx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fneg": { + "page": "fpu/fnegx.md", + "family": "fnegx", + "xml_mnem": "fnegx", + "opcode_hex": "0xFC000050", + "primary_opcode": 63, + "extended_opcode": 40, + "form": "X", + "group": "fpu", + "category": "fpu", + "description": "Floating Negate", + "sync": false, + "reads": [ + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fneg.": { + "page": "fpu/fnegx.md", + "family": "fnegx", + "variant_of": "fneg", + "xml_mnem": "fnegx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fnmadd": { + "page": "fpu/fnmaddx.md", + "family": "fnmaddx", + "xml_mnem": "fnmaddx", + "opcode_hex": "0xFC00003E", + "primary_opcode": 63, + "extended_opcode": 31, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Negative Multiply-Add", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FC", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fnmadd.": { + "page": "fpu/fnmaddx.md", + "family": "fnmaddx", + "variant_of": "fnmadd", + "xml_mnem": "fnmaddx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fnmadds": { + "page": "fpu/fnmaddsx.md", + "family": "fnmaddsx", + "xml_mnem": "fnmaddsx", + "opcode_hex": "0xEC00003E", + "primary_opcode": 59, + "extended_opcode": 31, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Negative Multiply-Add Single", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FC", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fnmadds.": { + "page": "fpu/fnmaddsx.md", + "family": "fnmaddsx", + "variant_of": "fnmadds", + "xml_mnem": "fnmaddsx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fnmsub": { + "page": "fpu/fnmsubx.md", + "family": "fnmsubx", + "xml_mnem": "fnmsubx", + "opcode_hex": "0xFC00003C", + "primary_opcode": 63, + "extended_opcode": 30, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Negative Multiply-Subtract", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FC", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fnmsub.": { + "page": "fpu/fnmsubx.md", + "family": "fnmsubx", + "variant_of": "fnmsub", + "xml_mnem": "fnmsubx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fnmsubs": { + "page": "fpu/fnmsubsx.md", + "family": "fnmsubsx", + "xml_mnem": "fnmsubsx", + "opcode_hex": "0xEC00003C", + "primary_opcode": 59, + "extended_opcode": 30, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Negative Multiply-Subtract Single", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FC", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fnmsubs.": { + "page": "fpu/fnmsubsx.md", + "family": "fnmsubsx", + "variant_of": "fnmsubs", + "xml_mnem": "fnmsubsx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fres": { + "page": "fpu/fresx.md", + "family": "fresx", + "xml_mnem": "fresx", + "opcode_hex": "0xEC000030", + "primary_opcode": 59, + "extended_opcode": 24, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Reciprocal Estimate Single", + "sync": false, + "reads": [ + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fres.": { + "page": "fpu/fresx.md", + "family": "fresx", + "variant_of": "fres", + "xml_mnem": "fresx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "frsp": { + "page": "fpu/frspx.md", + "family": "frspx", + "xml_mnem": "frspx", + "opcode_hex": "0xFC000018", + "primary_opcode": 63, + "extended_opcode": 12, + "form": "X", + "group": "fpu", + "category": "fpu", + "description": "Floating Round to Single", + "sync": false, + "reads": [ + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "frsp.": { + "page": "fpu/frspx.md", + "family": "frspx", + "variant_of": "frsp", + "xml_mnem": "frspx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "frsqrte": { + "page": "fpu/frsqrtex.md", + "family": "frsqrtex", + "xml_mnem": "frsqrtex", + "opcode_hex": "0xFC000034", + "primary_opcode": 63, + "extended_opcode": 26, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Reciprocal Square Root Estimate", + "sync": false, + "reads": [ + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "frsqrte.": { + "page": "fpu/frsqrtex.md", + "family": "frsqrtex", + "variant_of": "frsqrte", + "xml_mnem": "frsqrtex", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fsel": { + "page": "fpu/fselx.md", + "family": "fselx", + "xml_mnem": "fselx", + "opcode_hex": "0xFC00002E", + "primary_opcode": 63, + "extended_opcode": 23, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Select", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FC", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fsel.": { + "page": "fpu/fselx.md", + "family": "fselx", + "variant_of": "fsel", + "xml_mnem": "fselx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fsqrt": { + "page": "fpu/fsqrtx.md", + "family": "fsqrtx", + "xml_mnem": "fsqrtx", + "opcode_hex": "0xFC00002C", + "primary_opcode": 63, + "extended_opcode": 22, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Square Root", + "sync": false, + "reads": [ + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fsqrt.": { + "page": "fpu/fsqrtx.md", + "family": "fsqrtx", + "variant_of": "fsqrt", + "xml_mnem": "fsqrtx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fsqrts": { + "page": "fpu/fsqrtsx.md", + "family": "fsqrtsx", + "xml_mnem": "fsqrtsx", + "opcode_hex": "0xEC00002C", + "primary_opcode": 59, + "extended_opcode": 22, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Square Root Single", + "sync": false, + "reads": [ + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fsqrts.": { + "page": "fpu/fsqrtsx.md", + "family": "fsqrtsx", + "variant_of": "fsqrts", + "xml_mnem": "fsqrtsx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fsub": { + "page": "fpu/fsubx.md", + "family": "fsubx", + "xml_mnem": "fsubx", + "opcode_hex": "0xFC000028", + "primary_opcode": 63, + "extended_opcode": 20, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Subtract", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fsub.": { + "page": "fpu/fsubx.md", + "family": "fsubx", + "variant_of": "fsub", + "xml_mnem": "fsubx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "fsubs": { + "page": "fpu/fsubsx.md", + "family": "fsubsx", + "xml_mnem": "fsubsx", + "opcode_hex": "0xEC000028", + "primary_opcode": 59, + "extended_opcode": 20, + "form": "A", + "group": "fpu", + "category": "fpu", + "description": "Floating Subtract Single", + "sync": false, + "reads": [ + { + "field": "FA", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "fsubs.": { + "page": "fpu/fsubsx.md", + "family": "fsubsx", + "variant_of": "fsubs", + "xml_mnem": "fsubsx", + "flags": { + "Rc": 1 + }, + "category": "fpu" + }, + "icbi": { + "page": "memory/icbi.md", + "family": "icbi", + "xml_mnem": "icbi", + "opcode_hex": "0x7C0007AC", + "primary_opcode": 31, + "extended_opcode": 982, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Instruction Cache Block Invalidate", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "isync": { + "page": "alu/isync.md", + "family": "isync", + "xml_mnem": "isync", + "opcode_hex": "0x4C00012C", + "primary_opcode": 19, + "extended_opcode": 150, + "form": "XL", + "group": "integer", + "category": "alu", + "description": "Instruction Synchronize", + "sync": false, + "reads": [], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lbz": { + "page": "memory/lbz.md", + "family": "lbz", + "xml_mnem": "lbz", + "opcode_hex": "0x88000000", + "primary_opcode": 34, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Load Byte and Zero", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lbzu": { + "page": "memory/lbz.md", + "family": "lbz", + "xml_mnem": "lbzu", + "opcode_hex": "0x8C000000", + "primary_opcode": 35, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Load Byte and Zero with Update", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lbzux": { + "page": "memory/lbz.md", + "family": "lbz", + "xml_mnem": "lbzux", + "opcode_hex": "0x7C0000EE", + "primary_opcode": 31, + "extended_opcode": 119, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Byte and Zero with Update Indexed", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lbzx": { + "page": "memory/lbz.md", + "family": "lbz", + "xml_mnem": "lbzx", + "opcode_hex": "0x7C0000AE", + "primary_opcode": 31, + "extended_opcode": 87, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Byte and Zero Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "ld": { + "page": "memory/ld.md", + "family": "ld", + "xml_mnem": "ld", + "opcode_hex": "0xE8000000", + "primary_opcode": 58, + "extended_opcode": null, + "form": "DS", + "group": "memory", + "category": "memory", + "description": "Load Doubleword", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "ds", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "ldarx": { + "page": "memory/ldarx.md", + "family": "ldarx", + "xml_mnem": "ldarx", + "opcode_hex": "0x7C0000A8", + "primary_opcode": 31, + "extended_opcode": 84, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Doubleword and Reserve Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "ldbrx": { + "page": "memory/ldbrx.md", + "family": "ldbrx", + "xml_mnem": "ldbrx", + "opcode_hex": "0x7C000428", + "primary_opcode": 31, + "extended_opcode": 532, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Doubleword Byte-Reverse Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "ldu": { + "page": "memory/ld.md", + "family": "ld", + "xml_mnem": "ldu", + "opcode_hex": "0xE8000001", + "primary_opcode": 58, + "extended_opcode": null, + "form": "DS", + "group": "memory", + "category": "memory", + "description": "Load Doubleword with Update", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "ds", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "ldux": { + "page": "memory/ld.md", + "family": "ld", + "xml_mnem": "ldux", + "opcode_hex": "0x7C00006A", + "primary_opcode": 31, + "extended_opcode": 53, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Doubleword with Update Indexed", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "ldx": { + "page": "memory/ld.md", + "family": "ld", + "xml_mnem": "ldx", + "opcode_hex": "0x7C00002A", + "primary_opcode": 31, + "extended_opcode": 21, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Doubleword Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lfd": { + "page": "memory/lfd.md", + "family": "lfd", + "xml_mnem": "lfd", + "opcode_hex": "0xC8000000", + "primary_opcode": 50, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Load Floating-Point Double", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lfdu": { + "page": "memory/lfd.md", + "family": "lfd", + "xml_mnem": "lfdu", + "opcode_hex": "0xCC000000", + "primary_opcode": 51, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Load Floating-Point Double with Update", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lfdux": { + "page": "memory/lfd.md", + "family": "lfd", + "xml_mnem": "lfdux", + "opcode_hex": "0x7C0004EE", + "primary_opcode": 31, + "extended_opcode": 631, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Floating-Point Double with Update Indexed", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lfdx": { + "page": "memory/lfd.md", + "family": "lfd", + "xml_mnem": "lfdx", + "opcode_hex": "0x7C0004AE", + "primary_opcode": 31, + "extended_opcode": 599, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Floating-Point Double Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lfs": { + "page": "memory/lfs.md", + "family": "lfs", + "xml_mnem": "lfs", + "opcode_hex": "0xC0000000", + "primary_opcode": 48, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Load Floating-Point Single", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lfsu": { + "page": "memory/lfs.md", + "family": "lfs", + "xml_mnem": "lfsu", + "opcode_hex": "0xC4000000", + "primary_opcode": 49, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Load Floating-Point Single with Update", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lfsux": { + "page": "memory/lfs.md", + "family": "lfs", + "xml_mnem": "lfsux", + "opcode_hex": "0x7C00046E", + "primary_opcode": 31, + "extended_opcode": 567, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Floating-Point Single with Update Indexed", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lfsx": { + "page": "memory/lfs.md", + "family": "lfs", + "xml_mnem": "lfsx", + "opcode_hex": "0x7C00042E", + "primary_opcode": 31, + "extended_opcode": 535, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Floating-Point Single Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lha": { + "page": "memory/lha.md", + "family": "lha", + "xml_mnem": "lha", + "opcode_hex": "0xA8000000", + "primary_opcode": 42, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Load Half Word Algebraic", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lhau": { + "page": "memory/lha.md", + "family": "lha", + "xml_mnem": "lhau", + "opcode_hex": "0xAC000000", + "primary_opcode": 43, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Load Half Word Algebraic with Update", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lhaux": { + "page": "memory/lha.md", + "family": "lha", + "xml_mnem": "lhaux", + "opcode_hex": "0x7C0002EE", + "primary_opcode": 31, + "extended_opcode": 375, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Half Word Algebraic with Update Indexed", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lhax": { + "page": "memory/lha.md", + "family": "lha", + "xml_mnem": "lhax", + "opcode_hex": "0x7C0002AE", + "primary_opcode": 31, + "extended_opcode": 343, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Half Word Algebraic Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lhbrx": { + "page": "memory/lhbrx.md", + "family": "lhbrx", + "xml_mnem": "lhbrx", + "opcode_hex": "0x7C00062C", + "primary_opcode": 31, + "extended_opcode": 790, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Half Word Byte-Reverse Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lhz": { + "page": "memory/lhz.md", + "family": "lhz", + "xml_mnem": "lhz", + "opcode_hex": "0xA0000000", + "primary_opcode": 40, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Load Half Word and Zero", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lhzu": { + "page": "memory/lhz.md", + "family": "lhz", + "xml_mnem": "lhzu", + "opcode_hex": "0xA4000000", + "primary_opcode": 41, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Load Half Word and Zero with Update", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lhzux": { + "page": "memory/lhz.md", + "family": "lhz", + "xml_mnem": "lhzux", + "opcode_hex": "0x7C00026E", + "primary_opcode": 31, + "extended_opcode": 311, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Half Word and Zero with Update Indexed", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lhzx": { + "page": "memory/lhz.md", + "family": "lhz", + "xml_mnem": "lhzx", + "opcode_hex": "0x7C00022E", + "primary_opcode": 31, + "extended_opcode": 279, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Half Word and Zero Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lmw": { + "page": "memory/lmw.md", + "family": "lmw", + "xml_mnem": "lmw", + "opcode_hex": "0xB8000000", + "primary_opcode": 46, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Load Multiple Word", + "sync": false, + "reads": [], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lswi": { + "page": "memory/lswi.md", + "family": "lswi", + "xml_mnem": "lswi", + "opcode_hex": "0x7C0004AA", + "primary_opcode": 31, + "extended_opcode": 597, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load String Word Immediate", + "sync": false, + "reads": [], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lswx": { + "page": "memory/lswx.md", + "family": "lswx", + "xml_mnem": "lswx", + "opcode_hex": "0x7C00042A", + "primary_opcode": 31, + "extended_opcode": 533, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load String Word Indexed", + "sync": false, + "reads": [], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvebx": { + "page": "memory/lvebx.md", + "family": "lvebx", + "xml_mnem": "lvebx", + "opcode_hex": "0x7C00000E", + "primary_opcode": 31, + "extended_opcode": 7, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Vector Element Byte Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvehx": { + "page": "memory/lvehx.md", + "family": "lvehx", + "xml_mnem": "lvehx", + "opcode_hex": "0x7C00004E", + "primary_opcode": 31, + "extended_opcode": 39, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Vector Element Half Word Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvewx": { + "page": "memory/lvewx.md", + "family": "lvewx", + "xml_mnem": "lvewx", + "opcode_hex": "0x7C00008E", + "primary_opcode": 31, + "extended_opcode": 71, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Vector Element Word Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvewx128": { + "page": "memory/lvewx.md", + "family": "lvewx", + "xml_mnem": "lvewx128", + "opcode_hex": "0x10000083", + "primary_opcode": 4, + "extended_opcode": 131, + "form": "VX128_1", + "group": "memory", + "category": "memory", + "description": "Load Vector Element Word Indexed 128", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvlx": { + "page": "memory/lvlx.md", + "family": "lvlx", + "xml_mnem": "lvlx", + "opcode_hex": "0x7C00040E", + "primary_opcode": 31, + "extended_opcode": 519, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Vector Left Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvlx128": { + "page": "memory/lvlx.md", + "family": "lvlx", + "xml_mnem": "lvlx128", + "opcode_hex": "0x10000403", + "primary_opcode": 4, + "extended_opcode": 1027, + "form": "VX128_1", + "group": "memory", + "category": "memory", + "description": "Load Vector Left Indexed 128", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvlxl": { + "page": "memory/lvlxl.md", + "family": "lvlxl", + "xml_mnem": "lvlxl", + "opcode_hex": "0x7C00060E", + "primary_opcode": 31, + "extended_opcode": 775, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Vector Left Indexed LRU", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvlxl128": { + "page": "memory/lvlxl.md", + "family": "lvlxl", + "xml_mnem": "lvlxl128", + "opcode_hex": "0x10000603", + "primary_opcode": 4, + "extended_opcode": 1539, + "form": "VX128_1", + "group": "memory", + "category": "memory", + "description": "Load Vector Left Indexed LRU 128", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvrx": { + "page": "memory/lvrx.md", + "family": "lvrx", + "xml_mnem": "lvrx", + "opcode_hex": "0x7C00044E", + "primary_opcode": 31, + "extended_opcode": 551, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Vector Right Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvrx128": { + "page": "memory/lvrx.md", + "family": "lvrx", + "xml_mnem": "lvrx128", + "opcode_hex": "0x10000443", + "primary_opcode": 4, + "extended_opcode": 1091, + "form": "VX128_1", + "group": "memory", + "category": "memory", + "description": "Load Vector Right Indexed 128", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvrxl": { + "page": "memory/lvrxl.md", + "family": "lvrxl", + "xml_mnem": "lvrxl", + "opcode_hex": "0x7C00064E", + "primary_opcode": 31, + "extended_opcode": 807, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Vector Right Indexed LRU", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvrxl128": { + "page": "memory/lvrxl.md", + "family": "lvrxl", + "xml_mnem": "lvrxl128", + "opcode_hex": "0x10000643", + "primary_opcode": 4, + "extended_opcode": 1603, + "form": "VX128_1", + "group": "memory", + "category": "memory", + "description": "Load Vector Right Indexed LRU 128", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvsl": { + "page": "vmx/lvsl.md", + "family": "lvsl", + "xml_mnem": "lvsl", + "opcode_hex": "0x7C00000C", + "primary_opcode": 31, + "extended_opcode": 6, + "form": "X", + "group": "vmx", + "category": "vmx", + "description": "Load Vector for Shift Left Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvsl128": { + "page": "vmx/lvsl.md", + "family": "lvsl", + "xml_mnem": "lvsl128", + "opcode_hex": "0x10000003", + "primary_opcode": 4, + "extended_opcode": 3, + "form": "VX128_1", + "group": "vmx", + "category": "vmx", + "description": "Load Vector for Shift Left Indexed 128", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvsr": { + "page": "vmx/lvsr.md", + "family": "lvsr", + "xml_mnem": "lvsr", + "opcode_hex": "0x7C00004C", + "primary_opcode": 31, + "extended_opcode": 38, + "form": "X", + "group": "vmx", + "category": "vmx", + "description": "Load Vector for Shift Right Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvsr128": { + "page": "vmx/lvsr.md", + "family": "lvsr", + "xml_mnem": "lvsr128", + "opcode_hex": "0x10000043", + "primary_opcode": 4, + "extended_opcode": 67, + "form": "VX128_1", + "group": "vmx", + "category": "vmx", + "description": "Load Vector for Shift Right Indexed 128", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvx": { + "page": "memory/lvx.md", + "family": "lvx", + "xml_mnem": "lvx", + "opcode_hex": "0x7C0000CE", + "primary_opcode": 31, + "extended_opcode": 103, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Vector Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvx128": { + "page": "memory/lvx.md", + "family": "lvx", + "xml_mnem": "lvx128", + "opcode_hex": "0x100000C3", + "primary_opcode": 4, + "extended_opcode": 195, + "form": "VX128_1", + "group": "memory", + "category": "memory", + "description": "Load Vector Indexed 128", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvxl": { + "page": "memory/lvxl.md", + "family": "lvxl", + "xml_mnem": "lvxl", + "opcode_hex": "0x7C0002CE", + "primary_opcode": 31, + "extended_opcode": 359, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Vector Indexed LRU", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lvxl128": { + "page": "memory/lvxl.md", + "family": "lvxl", + "xml_mnem": "lvxl128", + "opcode_hex": "0x100002C3", + "primary_opcode": 4, + "extended_opcode": 707, + "form": "VX128_1", + "group": "memory", + "category": "memory", + "description": "Load Vector Indexed LRU 128", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lwa": { + "page": "memory/lwa.md", + "family": "lwa", + "xml_mnem": "lwa", + "opcode_hex": "0xE8000002", + "primary_opcode": 58, + "extended_opcode": null, + "form": "DS", + "group": "memory", + "category": "memory", + "description": "Load Word Algebraic", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "ds", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lwarx": { + "page": "memory/lwarx.md", + "family": "lwarx", + "xml_mnem": "lwarx", + "opcode_hex": "0x7C000028", + "primary_opcode": 31, + "extended_opcode": 20, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Word and Reserve Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lwaux": { + "page": "memory/lwa.md", + "family": "lwa", + "xml_mnem": "lwaux", + "opcode_hex": "0x7C0002EA", + "primary_opcode": 31, + "extended_opcode": 373, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Word Algebraic with Update Indexed", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lwax": { + "page": "memory/lwa.md", + "family": "lwa", + "xml_mnem": "lwax", + "opcode_hex": "0x7C0002AA", + "primary_opcode": 31, + "extended_opcode": 341, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Word Algebraic Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lwbrx": { + "page": "memory/lwbrx.md", + "family": "lwbrx", + "xml_mnem": "lwbrx", + "opcode_hex": "0x7C00042C", + "primary_opcode": 31, + "extended_opcode": 534, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Word Byte-Reverse Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lwz": { + "page": "memory/lwz.md", + "family": "lwz", + "xml_mnem": "lwz", + "opcode_hex": "0x80000000", + "primary_opcode": 32, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Load Word and Zero", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lwzu": { + "page": "memory/lwz.md", + "family": "lwz", + "xml_mnem": "lwzu", + "opcode_hex": "0x84000000", + "primary_opcode": 33, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Load Word and Zero with Update", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lwzux": { + "page": "memory/lwz.md", + "family": "lwz", + "xml_mnem": "lwzux", + "opcode_hex": "0x7C00006E", + "primary_opcode": 31, + "extended_opcode": 55, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Word and Zero with Update Indexed", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "lwzx": { + "page": "memory/lwz.md", + "family": "lwz", + "xml_mnem": "lwzx", + "opcode_hex": "0x7C00002E", + "primary_opcode": 31, + "extended_opcode": 23, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Load Word and Zero Indexed", + "sync": false, + "reads": [ + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mcrf": { + "page": "control/mcrf.md", + "family": "mcrf", + "xml_mnem": "mcrf", + "opcode_hex": "0x4C000000", + "primary_opcode": 19, + "extended_opcode": 0, + "form": "XL", + "group": "control", + "category": "control", + "description": "Move Condition Register Field", + "sync": false, + "reads": [ + { + "field": "CRFS", + "conditional": false + } + ], + "writes": [ + { + "field": "CRFD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mcrfs": { + "page": "control/mcrfs.md", + "family": "mcrfs", + "xml_mnem": "mcrfs", + "opcode_hex": "0xFC000080", + "primary_opcode": 63, + "extended_opcode": 64, + "form": "X", + "group": "control", + "category": "control", + "description": "Move to Condition Register from FPSCR", + "sync": false, + "reads": [ + { + "field": "CRFS", + "conditional": false + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "writes": [ + { + "field": "CRFD", + "conditional": false + }, + { + "field": "FPSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mcrxr": { + "page": "control/mcrxr.md", + "family": "mcrxr", + "xml_mnem": "mcrxr", + "opcode_hex": "0x7C000400", + "primary_opcode": 31, + "extended_opcode": 512, + "form": "X", + "group": "control", + "category": "control", + "description": "Move to Condition Register from XER", + "sync": false, + "reads": [ + { + "field": "CR", + "conditional": false + } + ], + "writes": [ + { + "field": "CRFD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mfcr": { + "page": "control/mfcr.md", + "family": "mfcr", + "xml_mnem": "mfcr", + "opcode_hex": "0x7C000026", + "primary_opcode": 31, + "extended_opcode": 19, + "form": "X", + "group": "control", + "category": "control", + "description": "Move from Condition Register", + "sync": false, + "reads": [ + { + "field": "CR", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mffs": { + "page": "control/mffsx.md", + "family": "mffsx", + "xml_mnem": "mffsx", + "opcode_hex": "0xFC00048E", + "primary_opcode": 63, + "extended_opcode": 583, + "form": "X", + "group": "control", + "category": "control", + "description": "Move from FPSCR", + "sync": false, + "reads": [ + { + "field": "FPSCR", + "conditional": false + } + ], + "writes": [ + { + "field": "FD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mffs.": { + "page": "control/mffsx.md", + "family": "mffsx", + "variant_of": "mffs", + "xml_mnem": "mffsx", + "flags": { + "Rc": 1 + }, + "category": "control" + }, + "mfmsr": { + "page": "control/mfmsr.md", + "family": "mfmsr", + "xml_mnem": "mfmsr", + "opcode_hex": "0x7C0000A6", + "primary_opcode": 31, + "extended_opcode": 83, + "form": "X", + "group": "control", + "category": "control", + "description": "Move from Machine State Register", + "sync": true, + "reads": [ + { + "field": "MSR", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mfspr": { + "page": "control/mfspr.md", + "family": "mfspr", + "xml_mnem": "mfspr", + "opcode_hex": "0x7C0002A6", + "primary_opcode": 31, + "extended_opcode": 339, + "form": "XFX", + "group": "control", + "category": "control", + "description": "Move from Special-Purpose Register", + "sync": false, + "reads": [ + { + "field": "SPR", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mftb": { + "page": "control/mftb.md", + "family": "mftb", + "xml_mnem": "mftb", + "opcode_hex": "0x7C0002E6", + "primary_opcode": 31, + "extended_opcode": 371, + "form": "XFX", + "group": "control", + "category": "control", + "description": "Move from Time Base", + "sync": false, + "reads": [ + { + "field": "TBR", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mfvscr": { + "page": "control/mfvscr.md", + "family": "mfvscr", + "xml_mnem": "mfvscr", + "opcode_hex": "0x10000604", + "primary_opcode": 4, + "extended_opcode": 1540, + "form": "VX", + "group": "control", + "category": "control", + "description": "Move from VSCR", + "sync": false, + "reads": [ + { + "field": "VSCR", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mtcrf": { + "page": "control/mtcrf.md", + "family": "mtcrf", + "xml_mnem": "mtcrf", + "opcode_hex": "0x7C000120", + "primary_opcode": 31, + "extended_opcode": 144, + "form": "XFX", + "group": "control", + "category": "control", + "description": "Move to Condition Register Fields", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + } + ], + "writes": [ + { + "field": "CRM", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mtfsb0": { + "page": "control/mtfsb0x.md", + "family": "mtfsb0x", + "xml_mnem": "mtfsb0x", + "opcode_hex": "0xFC00008C", + "primary_opcode": 63, + "extended_opcode": 70, + "form": "X", + "group": "control", + "category": "control", + "description": "Move to FPSCR Bit 0", + "sync": false, + "reads": [], + "writes": [ + { + "field": "FPSCRD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mtfsb0.": { + "page": "control/mtfsb0x.md", + "family": "mtfsb0x", + "variant_of": "mtfsb0", + "xml_mnem": "mtfsb0x", + "flags": { + "Rc": 1 + }, + "category": "control" + }, + "mtfsb1": { + "page": "control/mtfsb1x.md", + "family": "mtfsb1x", + "xml_mnem": "mtfsb1x", + "opcode_hex": "0xFC00004C", + "primary_opcode": 63, + "extended_opcode": 38, + "form": "X", + "group": "control", + "category": "control", + "description": "Move to FPSCR Bit 1", + "sync": false, + "reads": [], + "writes": [ + { + "field": "FPSCRD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mtfsb1.": { + "page": "control/mtfsb1x.md", + "family": "mtfsb1x", + "variant_of": "mtfsb1", + "xml_mnem": "mtfsb1x", + "flags": { + "Rc": 1 + }, + "category": "control" + }, + "mtfsf": { + "page": "control/mtfsfx.md", + "family": "mtfsfx", + "xml_mnem": "mtfsfx", + "opcode_hex": "0xFC00058E", + "primary_opcode": 63, + "extended_opcode": 711, + "form": "XFL", + "group": "control", + "category": "control", + "description": "Move to FPSCR Fields", + "sync": false, + "reads": [ + { + "field": "FM", + "conditional": false + }, + { + "field": "FB", + "conditional": false + } + ], + "writes": [ + { + "field": "FPSCR", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mtfsf.": { + "page": "control/mtfsfx.md", + "family": "mtfsfx", + "variant_of": "mtfsf", + "xml_mnem": "mtfsfx", + "flags": { + "Rc": 1 + }, + "category": "control" + }, + "mtfsfi": { + "page": "control/mtfsfix.md", + "family": "mtfsfix", + "xml_mnem": "mtfsfix", + "opcode_hex": "0xFC00010C", + "primary_opcode": 63, + "extended_opcode": 134, + "form": "X", + "group": "control", + "category": "control", + "description": "Move to FPSCR Field Immediate", + "sync": false, + "reads": [ + { + "field": "IMM", + "conditional": false + } + ], + "writes": [ + { + "field": "CRFD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mtfsfi.": { + "page": "control/mtfsfix.md", + "family": "mtfsfix", + "variant_of": "mtfsfi", + "xml_mnem": "mtfsfix", + "flags": { + "Rc": 1 + }, + "category": "control" + }, + "mtmsr": { + "page": "control/mtmsr.md", + "family": "mtmsr", + "xml_mnem": "mtmsr", + "opcode_hex": "0x7C000124", + "primary_opcode": 31, + "extended_opcode": 146, + "form": "X", + "group": "control", + "category": "control", + "description": "Move to Machine State Register", + "sync": true, + "reads": [ + { + "field": "RS", + "conditional": false + } + ], + "writes": [ + { + "field": "MSR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mtmsrd": { + "page": "control/mtmsrd.md", + "family": "mtmsrd", + "xml_mnem": "mtmsrd", + "opcode_hex": "0x7C000164", + "primary_opcode": 31, + "extended_opcode": 178, + "form": "X", + "group": "control", + "category": "control", + "description": "Move to Machine State Register Doubleword", + "sync": true, + "reads": [ + { + "field": "RS", + "conditional": false + } + ], + "writes": [ + { + "field": "MSR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mtspr": { + "page": "control/mtspr.md", + "family": "mtspr", + "xml_mnem": "mtspr", + "opcode_hex": "0x7C0003A6", + "primary_opcode": 31, + "extended_opcode": 467, + "form": "XFX", + "group": "control", + "category": "control", + "description": "Move to Special-Purpose Register", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + } + ], + "writes": [ + { + "field": "SPR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mtvscr": { + "page": "control/mtvscr.md", + "family": "mtvscr", + "xml_mnem": "mtvscr", + "opcode_hex": "0x10000644", + "primary_opcode": 4, + "extended_opcode": 1604, + "form": "VX", + "group": "control", + "category": "control", + "description": "Move to VSCR", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mulhd": { + "page": "alu/mulhdx.md", + "family": "mulhdx", + "xml_mnem": "mulhdx", + "opcode_hex": "0x7C000092", + "primary_opcode": 31, + "extended_opcode": 73, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Multiply High Doubleword", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mulhd.": { + "page": "alu/mulhdx.md", + "family": "mulhdx", + "variant_of": "mulhd", + "xml_mnem": "mulhdx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "mulhdu": { + "page": "alu/mulhdux.md", + "family": "mulhdux", + "xml_mnem": "mulhdux", + "opcode_hex": "0x7C000012", + "primary_opcode": 31, + "extended_opcode": 9, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Multiply High Doubleword Unsigned", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mulhdu.": { + "page": "alu/mulhdux.md", + "family": "mulhdux", + "variant_of": "mulhdu", + "xml_mnem": "mulhdux", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "mulhw": { + "page": "alu/mulhwx.md", + "family": "mulhwx", + "xml_mnem": "mulhwx", + "opcode_hex": "0x7C000096", + "primary_opcode": 31, + "extended_opcode": 75, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Multiply High Word", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mulhw.": { + "page": "alu/mulhwx.md", + "family": "mulhwx", + "variant_of": "mulhw", + "xml_mnem": "mulhwx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "mulhwu": { + "page": "alu/mulhwux.md", + "family": "mulhwux", + "xml_mnem": "mulhwux", + "opcode_hex": "0x7C000016", + "primary_opcode": 31, + "extended_opcode": 11, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Multiply High Word Unsigned", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mulhwu.": { + "page": "alu/mulhwux.md", + "family": "mulhwux", + "variant_of": "mulhwu", + "xml_mnem": "mulhwux", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "mulld": { + "page": "alu/mulldx.md", + "family": "mulldx", + "xml_mnem": "mulldx", + "opcode_hex": "0x7C0001D2", + "primary_opcode": 31, + "extended_opcode": 233, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Multiply Low Doubleword", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "OE", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mulld.": { + "page": "alu/mulldx.md", + "family": "mulldx", + "variant_of": "mulld", + "xml_mnem": "mulldx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "mulldo": { + "page": "alu/mulldx.md", + "family": "mulldx", + "variant_of": "mulld", + "xml_mnem": "mulldx", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "mulldo.": { + "page": "alu/mulldx.md", + "family": "mulldx", + "variant_of": "mulld", + "xml_mnem": "mulldx", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "mulli": { + "page": "alu/mulli.md", + "family": "mulli", + "xml_mnem": "mulli", + "opcode_hex": "0x1C000000", + "primary_opcode": 7, + "extended_opcode": null, + "form": "D", + "group": "integer", + "category": "alu", + "description": "Multiply Low Immediate", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "SIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mullw": { + "page": "alu/mullwx.md", + "family": "mullwx", + "xml_mnem": "mullwx", + "opcode_hex": "0x7C0001D6", + "primary_opcode": 31, + "extended_opcode": 235, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Multiply Low Word", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "OE", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "mullw.": { + "page": "alu/mullwx.md", + "family": "mullwx", + "variant_of": "mullw", + "xml_mnem": "mullwx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "mullwo": { + "page": "alu/mullwx.md", + "family": "mullwx", + "variant_of": "mullw", + "xml_mnem": "mullwx", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "mullwo.": { + "page": "alu/mullwx.md", + "family": "mullwx", + "variant_of": "mullw", + "xml_mnem": "mullwx", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "nand": { + "page": "alu/nandx.md", + "family": "nandx", + "xml_mnem": "nandx", + "opcode_hex": "0x7C0003B8", + "primary_opcode": 31, + "extended_opcode": 476, + "form": "X", + "group": "integer", + "category": "alu", + "description": "NAND", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "nand.": { + "page": "alu/nandx.md", + "family": "nandx", + "variant_of": "nand", + "xml_mnem": "nandx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "neg": { + "page": "alu/negx.md", + "family": "negx", + "xml_mnem": "negx", + "opcode_hex": "0x7C0000D0", + "primary_opcode": 31, + "extended_opcode": 104, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Negate", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "OE", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "neg.": { + "page": "alu/negx.md", + "family": "negx", + "variant_of": "neg", + "xml_mnem": "negx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "nego": { + "page": "alu/negx.md", + "family": "negx", + "variant_of": "neg", + "xml_mnem": "negx", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "nego.": { + "page": "alu/negx.md", + "family": "negx", + "variant_of": "neg", + "xml_mnem": "negx", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "nor": { + "page": "alu/norx.md", + "family": "norx", + "xml_mnem": "norx", + "opcode_hex": "0x7C0000F8", + "primary_opcode": 31, + "extended_opcode": 124, + "form": "X", + "group": "integer", + "category": "alu", + "description": "NOR", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "nor.": { + "page": "alu/norx.md", + "family": "norx", + "variant_of": "nor", + "xml_mnem": "norx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "or": { + "page": "alu/orx.md", + "family": "orx", + "xml_mnem": "orx", + "opcode_hex": "0x7C000378", + "primary_opcode": 31, + "extended_opcode": 444, + "form": "X", + "group": "integer", + "category": "alu", + "description": "OR", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "or.": { + "page": "alu/orx.md", + "family": "orx", + "variant_of": "or", + "xml_mnem": "orx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "orc": { + "page": "alu/orcx.md", + "family": "orcx", + "xml_mnem": "orcx", + "opcode_hex": "0x7C000338", + "primary_opcode": 31, + "extended_opcode": 412, + "form": "X", + "group": "integer", + "category": "alu", + "description": "OR with Complement", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "orc.": { + "page": "alu/orcx.md", + "family": "orcx", + "variant_of": "orc", + "xml_mnem": "orcx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "ori": { + "page": "alu/ori.md", + "family": "ori", + "xml_mnem": "ori", + "opcode_hex": "0x60000000", + "primary_opcode": 24, + "extended_opcode": null, + "form": "D", + "group": "integer", + "category": "alu", + "description": "OR Immediate", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "oris": { + "page": "alu/oris.md", + "family": "oris", + "xml_mnem": "oris", + "opcode_hex": "0x64000000", + "primary_opcode": 25, + "extended_opcode": null, + "form": "D", + "group": "integer", + "category": "alu", + "description": "OR Immediate Shifted", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "rldcl": { + "page": "alu/rldclx.md", + "family": "rldclx", + "xml_mnem": "rldclx", + "opcode_hex": "0x78000010", + "primary_opcode": 30, + "extended_opcode": null, + "form": "MDS", + "group": "integer", + "category": "alu", + "description": "Rotate Left Doubleword then Clear Left", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + }, + { + "field": "MB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "rldcl.": { + "page": "alu/rldclx.md", + "family": "rldclx", + "variant_of": "rldcl", + "xml_mnem": "rldclx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "rldcr": { + "page": "alu/rldcrx.md", + "family": "rldcrx", + "xml_mnem": "rldcrx", + "opcode_hex": "0x78000012", + "primary_opcode": 30, + "extended_opcode": null, + "form": "MDS", + "group": "integer", + "category": "alu", + "description": "Rotate Left Doubleword then Clear Right", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + }, + { + "field": "ME", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "rldcr.": { + "page": "alu/rldcrx.md", + "family": "rldcrx", + "variant_of": "rldcr", + "xml_mnem": "rldcrx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "rldic": { + "page": "alu/rldicx.md", + "family": "rldicx", + "xml_mnem": "rldicx", + "opcode_hex": "0x78000008", + "primary_opcode": 30, + "extended_opcode": null, + "form": "MD", + "group": "integer", + "category": "alu", + "description": "Rotate Left Doubleword Immediate then Clear", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "SH", + "conditional": false + }, + { + "field": "MB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "rldic.": { + "page": "alu/rldicx.md", + "family": "rldicx", + "variant_of": "rldic", + "xml_mnem": "rldicx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "rldicl": { + "page": "alu/rldiclx.md", + "family": "rldiclx", + "xml_mnem": "rldiclx", + "opcode_hex": "0x78000000", + "primary_opcode": 30, + "extended_opcode": null, + "form": "MD", + "group": "integer", + "category": "alu", + "description": "Rotate Left Doubleword Immediate then Clear Left", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "SH", + "conditional": false + }, + { + "field": "MB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "rldicl.": { + "page": "alu/rldiclx.md", + "family": "rldiclx", + "variant_of": "rldicl", + "xml_mnem": "rldiclx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "rldicr": { + "page": "alu/rldicrx.md", + "family": "rldicrx", + "xml_mnem": "rldicrx", + "opcode_hex": "0x78000004", + "primary_opcode": 30, + "extended_opcode": null, + "form": "MD", + "group": "integer", + "category": "alu", + "description": "Rotate Left Doubleword Immediate then Clear Right", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "SH", + "conditional": false + }, + { + "field": "ME", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "rldicr.": { + "page": "alu/rldicrx.md", + "family": "rldicrx", + "variant_of": "rldicr", + "xml_mnem": "rldicrx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "rldimi": { + "page": "alu/rldimix.md", + "family": "rldimix", + "xml_mnem": "rldimix", + "opcode_hex": "0x7800000C", + "primary_opcode": 30, + "extended_opcode": null, + "form": "MD", + "group": "integer", + "category": "alu", + "description": "Rotate Left Doubleword Immediate then Mask Insert", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "SH", + "conditional": false + }, + { + "field": "MB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "rldimi.": { + "page": "alu/rldimix.md", + "family": "rldimix", + "variant_of": "rldimi", + "xml_mnem": "rldimix", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "rlwimi": { + "page": "alu/rlwimix.md", + "family": "rlwimix", + "xml_mnem": "rlwimix", + "opcode_hex": "0x50000000", + "primary_opcode": 20, + "extended_opcode": null, + "form": "M", + "group": "integer", + "category": "alu", + "description": "Rotate Left Word Immediate then Mask Insert", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "SH", + "conditional": false + }, + { + "field": "MB", + "conditional": false + }, + { + "field": "ME", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "rlwimi.": { + "page": "alu/rlwimix.md", + "family": "rlwimix", + "variant_of": "rlwimi", + "xml_mnem": "rlwimix", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "rlwinm": { + "page": "alu/rlwinmx.md", + "family": "rlwinmx", + "xml_mnem": "rlwinmx", + "opcode_hex": "0x54000000", + "primary_opcode": 21, + "extended_opcode": null, + "form": "M", + "group": "integer", + "category": "alu", + "description": "Rotate Left Word Immediate then AND with Mask", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "SH", + "conditional": false + }, + { + "field": "MB", + "conditional": false + }, + { + "field": "ME", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "rlwinm.": { + "page": "alu/rlwinmx.md", + "family": "rlwinmx", + "variant_of": "rlwinm", + "xml_mnem": "rlwinmx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "rlwnm": { + "page": "alu/rlwnmx.md", + "family": "rlwnmx", + "xml_mnem": "rlwnmx", + "opcode_hex": "0x5C000000", + "primary_opcode": 23, + "extended_opcode": null, + "form": "M", + "group": "integer", + "category": "alu", + "description": "Rotate Left Word then AND with Mask", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + }, + { + "field": "MB", + "conditional": false + }, + { + "field": "ME", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "rlwnm.": { + "page": "alu/rlwnmx.md", + "family": "rlwnmx", + "variant_of": "rlwnm", + "xml_mnem": "rlwnmx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "sc": { + "page": "branch/sc.md", + "family": "sc", + "xml_mnem": "sc", + "opcode_hex": "0x44000002", + "primary_opcode": 17, + "extended_opcode": null, + "form": "SC", + "group": "branch", + "category": "branch", + "description": "System Call", + "sync": true, + "reads": [ + { + "field": "LEV", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "sld": { + "page": "alu/sldx.md", + "family": "sldx", + "xml_mnem": "sldx", + "opcode_hex": "0x7C000036", + "primary_opcode": 31, + "extended_opcode": 27, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Shift Left Doubleword", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "sld.": { + "page": "alu/sldx.md", + "family": "sldx", + "variant_of": "sld", + "xml_mnem": "sldx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "slw": { + "page": "alu/slwx.md", + "family": "slwx", + "xml_mnem": "slwx", + "opcode_hex": "0x7C000030", + "primary_opcode": 31, + "extended_opcode": 24, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Shift Left Word", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "slw.": { + "page": "alu/slwx.md", + "family": "slwx", + "variant_of": "slw", + "xml_mnem": "slwx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "srad": { + "page": "alu/sradx.md", + "family": "sradx", + "xml_mnem": "sradx", + "opcode_hex": "0x7C000634", + "primary_opcode": 31, + "extended_opcode": 794, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Shift Right Algebraic Doubleword", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "CA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "srad.": { + "page": "alu/sradx.md", + "family": "sradx", + "variant_of": "srad", + "xml_mnem": "sradx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "sradi": { + "page": "alu/sradix.md", + "family": "sradix", + "xml_mnem": "sradix", + "opcode_hex": "0x7C000674", + "primary_opcode": 31, + "extended_opcode": 826, + "form": "XS", + "group": "integer", + "category": "alu", + "description": "Shift Right Algebraic Doubleword Immediate", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "SH", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "CA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "sradi.": { + "page": "alu/sradix.md", + "family": "sradix", + "variant_of": "sradi", + "xml_mnem": "sradix", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "sraw": { + "page": "alu/srawx.md", + "family": "srawx", + "xml_mnem": "srawx", + "opcode_hex": "0x7C000630", + "primary_opcode": 31, + "extended_opcode": 792, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Shift Right Algebraic Word", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "CA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "sraw.": { + "page": "alu/srawx.md", + "family": "srawx", + "variant_of": "sraw", + "xml_mnem": "srawx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "srawi": { + "page": "alu/srawix.md", + "family": "srawix", + "xml_mnem": "srawix", + "opcode_hex": "0x7C000670", + "primary_opcode": 31, + "extended_opcode": 824, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Shift Right Algebraic Word Immediate", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "SH", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "CA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "srawi.": { + "page": "alu/srawix.md", + "family": "srawix", + "variant_of": "srawi", + "xml_mnem": "srawix", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "srd": { + "page": "alu/srdx.md", + "family": "srdx", + "xml_mnem": "srdx", + "opcode_hex": "0x7C000436", + "primary_opcode": 31, + "extended_opcode": 539, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Shift Right Doubleword", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "srd.": { + "page": "alu/srdx.md", + "family": "srdx", + "variant_of": "srd", + "xml_mnem": "srdx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "srw": { + "page": "alu/srwx.md", + "family": "srwx", + "xml_mnem": "srwx", + "opcode_hex": "0x7C000430", + "primary_opcode": 31, + "extended_opcode": 536, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Shift Right Word", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "srw.": { + "page": "alu/srwx.md", + "family": "srwx", + "variant_of": "srw", + "xml_mnem": "srwx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "stb": { + "page": "memory/stb.md", + "family": "stb", + "xml_mnem": "stb", + "opcode_hex": "0x98000000", + "primary_opcode": 38, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Store Byte", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stbu": { + "page": "memory/stb.md", + "family": "stb", + "xml_mnem": "stbu", + "opcode_hex": "0x9C000000", + "primary_opcode": 39, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Store Byte with Update", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stbux": { + "page": "memory/stb.md", + "family": "stb", + "xml_mnem": "stbux", + "opcode_hex": "0x7C0001EE", + "primary_opcode": 31, + "extended_opcode": 247, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Byte with Update Indexed", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stbx": { + "page": "memory/stb.md", + "family": "stb", + "xml_mnem": "stbx", + "opcode_hex": "0x7C0001AE", + "primary_opcode": 31, + "extended_opcode": 215, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Byte Indexed", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "std": { + "page": "memory/std.md", + "family": "std", + "xml_mnem": "std", + "opcode_hex": "0xF8000000", + "primary_opcode": 62, + "extended_opcode": null, + "form": "DS", + "group": "memory", + "category": "memory", + "description": "Store Doubleword", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "ds", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stdbrx": { + "page": "memory/stdbrx.md", + "family": "stdbrx", + "xml_mnem": "stdbrx", + "opcode_hex": "0x7C000528", + "primary_opcode": 31, + "extended_opcode": 660, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Doubleword Byte-Reverse Indexed", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stdcx": { + "page": "memory/stdcx.md", + "family": "stdcx", + "xml_mnem": "stdcx", + "opcode_hex": "0x7C0001AD", + "primary_opcode": 31, + "extended_opcode": 214, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Doubleword Conditional Indexed", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "CR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": true + }, + "is_primary": true, + "flags": {} + }, + "stdu": { + "page": "memory/std.md", + "family": "std", + "xml_mnem": "stdu", + "opcode_hex": "0xF8000001", + "primary_opcode": 62, + "extended_opcode": null, + "form": "DS", + "group": "memory", + "category": "memory", + "description": "Store Doubleword with Update", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "ds", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stdux": { + "page": "memory/std.md", + "family": "std", + "xml_mnem": "stdux", + "opcode_hex": "0x7C00016A", + "primary_opcode": 31, + "extended_opcode": 181, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Doubleword with Update Indexed", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stdx": { + "page": "memory/std.md", + "family": "std", + "xml_mnem": "stdx", + "opcode_hex": "0x7C00012A", + "primary_opcode": 31, + "extended_opcode": 149, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Doubleword Indexed", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stfd": { + "page": "memory/stfd.md", + "family": "stfd", + "xml_mnem": "stfd", + "opcode_hex": "0xD8000000", + "primary_opcode": 54, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Store Floating-Point Double", + "sync": false, + "reads": [ + { + "field": "FS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stfdu": { + "page": "memory/stfd.md", + "family": "stfd", + "xml_mnem": "stfdu", + "opcode_hex": "0xDC000000", + "primary_opcode": 55, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Store Floating-Point Double with Update", + "sync": false, + "reads": [ + { + "field": "FS", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stfdux": { + "page": "memory/stfd.md", + "family": "stfd", + "xml_mnem": "stfdux", + "opcode_hex": "0x7C0005EE", + "primary_opcode": 31, + "extended_opcode": 759, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Floating-Point Double with Update Indexed", + "sync": false, + "reads": [ + { + "field": "FS", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stfdx": { + "page": "memory/stfd.md", + "family": "stfd", + "xml_mnem": "stfdx", + "opcode_hex": "0x7C0005AE", + "primary_opcode": 31, + "extended_opcode": 727, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Floating-Point Double Indexed", + "sync": false, + "reads": [ + { + "field": "FS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stfiwx": { + "page": "memory/stfiwx.md", + "family": "stfiwx", + "xml_mnem": "stfiwx", + "opcode_hex": "0x7C0007AE", + "primary_opcode": 31, + "extended_opcode": 983, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Floating-Point as Integer Word Indexed", + "sync": false, + "reads": [ + { + "field": "FS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stfs": { + "page": "memory/stfs.md", + "family": "stfs", + "xml_mnem": "stfs", + "opcode_hex": "0xD0000000", + "primary_opcode": 52, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Store Floating-Point Single", + "sync": false, + "reads": [ + { + "field": "FS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stfsu": { + "page": "memory/stfs.md", + "family": "stfs", + "xml_mnem": "stfsu", + "opcode_hex": "0xD4000000", + "primary_opcode": 53, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Store Floating-Point Single with Update", + "sync": false, + "reads": [ + { + "field": "FS", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stfsux": { + "page": "memory/stfs.md", + "family": "stfs", + "xml_mnem": "stfsux", + "opcode_hex": "0x7C00056E", + "primary_opcode": 31, + "extended_opcode": 695, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Floating-Point Single with Update Indexed", + "sync": false, + "reads": [ + { + "field": "FS", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stfsx": { + "page": "memory/stfs.md", + "family": "stfs", + "xml_mnem": "stfsx", + "opcode_hex": "0x7C00052E", + "primary_opcode": 31, + "extended_opcode": 663, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Floating-Point Single Indexed", + "sync": false, + "reads": [ + { + "field": "FS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "sth": { + "page": "memory/sth.md", + "family": "sth", + "xml_mnem": "sth", + "opcode_hex": "0xB0000000", + "primary_opcode": 44, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Store Half Word", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "sthbrx": { + "page": "memory/sthbrx.md", + "family": "sthbrx", + "xml_mnem": "sthbrx", + "opcode_hex": "0x7C00072C", + "primary_opcode": 31, + "extended_opcode": 918, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Half Word Byte-Reverse Indexed", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "sthu": { + "page": "memory/sth.md", + "family": "sth", + "xml_mnem": "sthu", + "opcode_hex": "0xB4000000", + "primary_opcode": 45, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Store Half Word with Update", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "sthux": { + "page": "memory/sth.md", + "family": "sth", + "xml_mnem": "sthux", + "opcode_hex": "0x7C00036E", + "primary_opcode": 31, + "extended_opcode": 439, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Half Word with Update Indexed", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "sthx": { + "page": "memory/sth.md", + "family": "sth", + "xml_mnem": "sthx", + "opcode_hex": "0x7C00032E", + "primary_opcode": 31, + "extended_opcode": 407, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Half Word Indexed", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stmw": { + "page": "memory/stmw.md", + "family": "stmw", + "xml_mnem": "stmw", + "opcode_hex": "0xBC000000", + "primary_opcode": 47, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Store Multiple Word", + "sync": false, + "reads": [], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stswi": { + "page": "memory/stswi.md", + "family": "stswi", + "xml_mnem": "stswi", + "opcode_hex": "0x7C0005AA", + "primary_opcode": 31, + "extended_opcode": 725, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store String Word Immediate", + "sync": false, + "reads": [], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stswx": { + "page": "memory/stswx.md", + "family": "stswx", + "xml_mnem": "stswx", + "opcode_hex": "0x7C00052A", + "primary_opcode": 31, + "extended_opcode": 661, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store String Word Indexed", + "sync": false, + "reads": [], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvebx": { + "page": "memory/stvebx.md", + "family": "stvebx", + "xml_mnem": "stvebx", + "opcode_hex": "0x7C00010E", + "primary_opcode": 31, + "extended_opcode": 135, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Vector Element Byte Indexed", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvehx": { + "page": "memory/stvehx.md", + "family": "stvehx", + "xml_mnem": "stvehx", + "opcode_hex": "0x7C00014E", + "primary_opcode": 31, + "extended_opcode": 167, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Vector Element Half Word Indexed", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvewx": { + "page": "memory/stvewx.md", + "family": "stvewx", + "xml_mnem": "stvewx", + "opcode_hex": "0x7C00018E", + "primary_opcode": 31, + "extended_opcode": 199, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Vector Element Word Indexed", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvewx128": { + "page": "memory/stvewx.md", + "family": "stvewx", + "xml_mnem": "stvewx128", + "opcode_hex": "0x10000183", + "primary_opcode": 4, + "extended_opcode": 387, + "form": "VX128_1", + "group": "memory", + "category": "memory", + "description": "Store Vector Element Word Indexed 128", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvlx": { + "page": "memory/stvlx.md", + "family": "stvlx", + "xml_mnem": "stvlx", + "opcode_hex": "0x7C00050E", + "primary_opcode": 31, + "extended_opcode": 647, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Vector Left Indexed", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvlx128": { + "page": "memory/stvlx.md", + "family": "stvlx", + "xml_mnem": "stvlx128", + "opcode_hex": "0x10000503", + "primary_opcode": 4, + "extended_opcode": 1283, + "form": "VX128_1", + "group": "memory", + "category": "memory", + "description": "Store Vector Left Indexed 128", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvlxl": { + "page": "memory/stvlxl.md", + "family": "stvlxl", + "xml_mnem": "stvlxl", + "opcode_hex": "0x7C00070E", + "primary_opcode": 31, + "extended_opcode": 903, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Vector Left Indexed LRU", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvlxl128": { + "page": "memory/stvlxl.md", + "family": "stvlxl", + "xml_mnem": "stvlxl128", + "opcode_hex": "0x10000703", + "primary_opcode": 4, + "extended_opcode": 1795, + "form": "VX128_1", + "group": "memory", + "category": "memory", + "description": "Store Vector Left Indexed LRU 128", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvrx": { + "page": "memory/stvrx.md", + "family": "stvrx", + "xml_mnem": "stvrx", + "opcode_hex": "0x7C00054E", + "primary_opcode": 31, + "extended_opcode": 679, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Vector Right Indexed", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvrx128": { + "page": "memory/stvrx.md", + "family": "stvrx", + "xml_mnem": "stvrx128", + "opcode_hex": "0x10000543", + "primary_opcode": 4, + "extended_opcode": 1347, + "form": "VX128_1", + "group": "memory", + "category": "memory", + "description": "Store Vector Right Indexed 128", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvrxl": { + "page": "memory/stvrxl.md", + "family": "stvrxl", + "xml_mnem": "stvrxl", + "opcode_hex": "0x7C00074E", + "primary_opcode": 31, + "extended_opcode": 935, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Vector Right Indexed LRU", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvrxl128": { + "page": "memory/stvrxl.md", + "family": "stvrxl", + "xml_mnem": "stvrxl128", + "opcode_hex": "0x10000743", + "primary_opcode": 4, + "extended_opcode": 1859, + "form": "VX128_1", + "group": "memory", + "category": "memory", + "description": "Store Vector Right Indexed LRU 128", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvx": { + "page": "memory/stvx.md", + "family": "stvx", + "xml_mnem": "stvx", + "opcode_hex": "0x7C0001CE", + "primary_opcode": 31, + "extended_opcode": 231, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Vector Indexed", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvx128": { + "page": "memory/stvx.md", + "family": "stvx", + "xml_mnem": "stvx128", + "opcode_hex": "0x100001C3", + "primary_opcode": 4, + "extended_opcode": 451, + "form": "VX128_1", + "group": "memory", + "category": "memory", + "description": "Store Vector Indexed 128", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvxl": { + "page": "memory/stvxl.md", + "family": "stvxl", + "xml_mnem": "stvxl", + "opcode_hex": "0x7C0003CE", + "primary_opcode": 31, + "extended_opcode": 487, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Vector Indexed LRU", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stvxl128": { + "page": "memory/stvxl.md", + "family": "stvxl", + "xml_mnem": "stvxl128", + "opcode_hex": "0x100003C3", + "primary_opcode": 4, + "extended_opcode": 963, + "form": "VX128_1", + "group": "memory", + "category": "memory", + "description": "Store Vector Indexed LRU 128", + "sync": false, + "reads": [ + { + "field": "VS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stw": { + "page": "memory/stw.md", + "family": "stw", + "xml_mnem": "stw", + "opcode_hex": "0x90000000", + "primary_opcode": 36, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Store Word", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stwbrx": { + "page": "memory/stwbrx.md", + "family": "stwbrx", + "xml_mnem": "stwbrx", + "opcode_hex": "0x7C00052C", + "primary_opcode": 31, + "extended_opcode": 662, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Word Byte-Reverse Indexed", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stwcx": { + "page": "memory/stwcx.md", + "family": "stwcx", + "xml_mnem": "stwcx", + "opcode_hex": "0x7C00012D", + "primary_opcode": 31, + "extended_opcode": 150, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Word Conditional Indexed", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "CR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": true + }, + "is_primary": true, + "flags": {} + }, + "stwu": { + "page": "memory/stw.md", + "family": "stw", + "xml_mnem": "stwu", + "opcode_hex": "0x94000000", + "primary_opcode": 37, + "extended_opcode": null, + "form": "D", + "group": "memory", + "category": "memory", + "description": "Store Word with Update", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "d", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stwux": { + "page": "memory/stw.md", + "family": "stw", + "xml_mnem": "stwux", + "opcode_hex": "0x7C00016E", + "primary_opcode": 31, + "extended_opcode": 183, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Word with Update Indexed", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "stwx": { + "page": "memory/stw.md", + "family": "stw", + "xml_mnem": "stwx", + "opcode_hex": "0x7C00012E", + "primary_opcode": 31, + "extended_opcode": 151, + "form": "X", + "group": "memory", + "category": "memory", + "description": "Store Word Indexed", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RA0", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "subf": { + "page": "alu/subfx.md", + "family": "subfx", + "xml_mnem": "subfx", + "opcode_hex": "0x7C000050", + "primary_opcode": 31, + "extended_opcode": 40, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Subtract From", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "OE", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "subf.": { + "page": "alu/subfx.md", + "family": "subfx", + "variant_of": "subf", + "xml_mnem": "subfx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "subfc": { + "page": "alu/subfcx.md", + "family": "subfcx", + "xml_mnem": "subfcx", + "opcode_hex": "0x7C000010", + "primary_opcode": 31, + "extended_opcode": 8, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Subtract From Carrying", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "OE", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "subfc.": { + "page": "alu/subfcx.md", + "family": "subfcx", + "variant_of": "subfc", + "xml_mnem": "subfcx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "subfco": { + "page": "alu/subfcx.md", + "family": "subfcx", + "variant_of": "subfc", + "xml_mnem": "subfcx", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "subfco.": { + "page": "alu/subfcx.md", + "family": "subfcx", + "variant_of": "subfc", + "xml_mnem": "subfcx", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "subfe": { + "page": "alu/subfex.md", + "family": "subfex", + "xml_mnem": "subfex", + "opcode_hex": "0x7C000110", + "primary_opcode": 31, + "extended_opcode": 136, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Subtract From Extended", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "OE", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "subfe.": { + "page": "alu/subfex.md", + "family": "subfex", + "variant_of": "subfe", + "xml_mnem": "subfex", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "subfeo": { + "page": "alu/subfex.md", + "family": "subfex", + "variant_of": "subfe", + "xml_mnem": "subfex", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "subfeo.": { + "page": "alu/subfex.md", + "family": "subfex", + "variant_of": "subfe", + "xml_mnem": "subfex", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "subfic": { + "page": "alu/subficx.md", + "family": "subficx", + "xml_mnem": "subficx", + "opcode_hex": "0x20000000", + "primary_opcode": 8, + "extended_opcode": null, + "form": "D", + "group": "integer", + "category": "alu", + "description": "Subtract From Immediate Carrying", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "SIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "subfme": { + "page": "alu/subfmex.md", + "family": "subfmex", + "xml_mnem": "subfmex", + "opcode_hex": "0x7C0001D0", + "primary_opcode": 31, + "extended_opcode": 232, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Subtract From Minus One Extended", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CA", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "OE", + "conditional": true + }, + { + "field": "CA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "subfme.": { + "page": "alu/subfmex.md", + "family": "subfmex", + "variant_of": "subfme", + "xml_mnem": "subfmex", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "subfmeo": { + "page": "alu/subfmex.md", + "family": "subfmex", + "variant_of": "subfme", + "xml_mnem": "subfmex", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "subfmeo.": { + "page": "alu/subfmex.md", + "family": "subfmex", + "variant_of": "subfme", + "xml_mnem": "subfmex", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "subfo": { + "page": "alu/subfx.md", + "family": "subfx", + "variant_of": "subf", + "xml_mnem": "subfx", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "subfo.": { + "page": "alu/subfx.md", + "family": "subfx", + "variant_of": "subf", + "xml_mnem": "subfx", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "subfze": { + "page": "alu/subfzex.md", + "family": "subfzex", + "xml_mnem": "subfzex", + "opcode_hex": "0x7C000190", + "primary_opcode": 31, + "extended_opcode": 200, + "form": "XO", + "group": "integer", + "category": "alu", + "description": "Subtract From Zero Extended", + "sync": false, + "reads": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CA", + "conditional": false + } + ], + "writes": [ + { + "field": "RD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + }, + { + "field": "OE", + "conditional": true + }, + { + "field": "CA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": true, + "OE": true, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "subfze.": { + "page": "alu/subfzex.md", + "family": "subfzex", + "variant_of": "subfze", + "xml_mnem": "subfzex", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "subfzeo": { + "page": "alu/subfzex.md", + "family": "subfzex", + "variant_of": "subfze", + "xml_mnem": "subfzex", + "flags": { + "OE": 1 + }, + "category": "alu" + }, + "subfzeo.": { + "page": "alu/subfzex.md", + "family": "subfzex", + "variant_of": "subfze", + "xml_mnem": "subfzex", + "flags": { + "OE": 1, + "Rc": 1 + }, + "category": "alu" + }, + "sync": { + "page": "alu/sync.md", + "family": "sync", + "xml_mnem": "sync", + "opcode_hex": "0x7C0004AC", + "primary_opcode": 31, + "extended_opcode": 598, + "form": "X", + "group": "integer", + "category": "alu", + "description": "Synchronize", + "sync": false, + "reads": [], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "td": { + "page": "branch/td.md", + "family": "td", + "xml_mnem": "td", + "opcode_hex": "0x7C000088", + "primary_opcode": 31, + "extended_opcode": 68, + "form": "X", + "group": "branch", + "category": "branch", + "description": "Trap Doubleword", + "sync": false, + "reads": [ + { + "field": "TO", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "tdi": { + "page": "branch/tdi.md", + "family": "tdi", + "xml_mnem": "tdi", + "opcode_hex": "0x08000000", + "primary_opcode": 2, + "extended_opcode": null, + "form": "D", + "group": "branch", + "category": "branch", + "description": "Trap Doubleword Immediate", + "sync": false, + "reads": [ + { + "field": "TO", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "SIMM", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "tw": { + "page": "branch/tw.md", + "family": "tw", + "xml_mnem": "tw", + "opcode_hex": "0x7C000008", + "primary_opcode": 31, + "extended_opcode": 4, + "form": "X", + "group": "branch", + "category": "branch", + "description": "Trap Word", + "sync": false, + "reads": [ + { + "field": "TO", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "twi": { + "page": "branch/twi.md", + "family": "twi", + "xml_mnem": "twi", + "opcode_hex": "0x0C000000", + "primary_opcode": 3, + "extended_opcode": null, + "form": "D", + "group": "branch", + "category": "branch", + "description": "Trap Word Immediate", + "sync": false, + "reads": [ + { + "field": "TO", + "conditional": false + }, + { + "field": "RA", + "conditional": false + }, + { + "field": "SIMM", + "conditional": false + } + ], + "writes": [], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vaddcuw": { + "page": "vmx/vaddcuw.md", + "family": "vaddcuw", + "xml_mnem": "vaddcuw", + "opcode_hex": "0x10000180", + "primary_opcode": 4, + "extended_opcode": 384, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Add Carryout Unsigned Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vaddfp": { + "page": "vmx/vaddfp.md", + "family": "vaddfp", + "xml_mnem": "vaddfp", + "opcode_hex": "0x1000000A", + "primary_opcode": 4, + "extended_opcode": 10, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Add Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vaddfp128": { + "page": "vmx/vaddfp.md", + "family": "vaddfp", + "xml_mnem": "vaddfp128", + "opcode_hex": "0x14000010", + "primary_opcode": 5, + "extended_opcode": 16, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Add Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vaddsbs": { + "page": "vmx/vaddsbs.md", + "family": "vaddsbs", + "xml_mnem": "vaddsbs", + "opcode_hex": "0x10000300", + "primary_opcode": 4, + "extended_opcode": 768, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Add Signed Byte Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vaddshs": { + "page": "vmx/vaddshs.md", + "family": "vaddshs", + "xml_mnem": "vaddshs", + "opcode_hex": "0x10000340", + "primary_opcode": 4, + "extended_opcode": 832, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Add Signed Half Word Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vaddsws": { + "page": "vmx/vaddsws.md", + "family": "vaddsws", + "xml_mnem": "vaddsws", + "opcode_hex": "0x10000380", + "primary_opcode": 4, + "extended_opcode": 896, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Add Signed Word Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vaddubm": { + "page": "vmx/vaddubm.md", + "family": "vaddubm", + "xml_mnem": "vaddubm", + "opcode_hex": "0x10000000", + "primary_opcode": 4, + "extended_opcode": 0, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Add Unsigned Byte Modulo", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vaddubs": { + "page": "vmx/vaddubs.md", + "family": "vaddubs", + "xml_mnem": "vaddubs", + "opcode_hex": "0x10000200", + "primary_opcode": 4, + "extended_opcode": 512, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Add Unsigned Byte Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vadduhm": { + "page": "vmx/vadduhm.md", + "family": "vadduhm", + "xml_mnem": "vadduhm", + "opcode_hex": "0x10000040", + "primary_opcode": 4, + "extended_opcode": 64, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Add Unsigned Half Word Modulo", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vadduhs": { + "page": "vmx/vadduhs.md", + "family": "vadduhs", + "xml_mnem": "vadduhs", + "opcode_hex": "0x10000240", + "primary_opcode": 4, + "extended_opcode": 576, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Add Unsigned Half Word Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vadduwm": { + "page": "vmx/vadduwm.md", + "family": "vadduwm", + "xml_mnem": "vadduwm", + "opcode_hex": "0x10000080", + "primary_opcode": 4, + "extended_opcode": 128, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Add Unsigned Word Modulo", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vadduws": { + "page": "vmx/vadduws.md", + "family": "vadduws", + "xml_mnem": "vadduws", + "opcode_hex": "0x10000280", + "primary_opcode": 4, + "extended_opcode": 640, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Add Unsigned Word Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vand": { + "page": "vmx/vand.md", + "family": "vand", + "xml_mnem": "vand", + "opcode_hex": "0x10000404", + "primary_opcode": 4, + "extended_opcode": 1028, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Logical AND", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vand128": { + "page": "vmx/vand.md", + "family": "vand", + "xml_mnem": "vand128", + "opcode_hex": "0x14000210", + "primary_opcode": 5, + "extended_opcode": 528, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Logical AND", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vandc": { + "page": "vmx/vandc.md", + "family": "vandc", + "xml_mnem": "vandc", + "opcode_hex": "0x10000444", + "primary_opcode": 4, + "extended_opcode": 1092, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Logical AND with Complement", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vandc128": { + "page": "vmx/vandc.md", + "family": "vandc", + "xml_mnem": "vandc128", + "opcode_hex": "0x14000250", + "primary_opcode": 5, + "extended_opcode": 592, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Logical AND with Complement", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vavgsb": { + "page": "vmx/vavgsb.md", + "family": "vavgsb", + "xml_mnem": "vavgsb", + "opcode_hex": "0x10000502", + "primary_opcode": 4, + "extended_opcode": 1282, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Average Signed Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vavgsh": { + "page": "vmx/vavgsh.md", + "family": "vavgsh", + "xml_mnem": "vavgsh", + "opcode_hex": "0x10000542", + "primary_opcode": 4, + "extended_opcode": 1346, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Average Signed Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vavgsw": { + "page": "vmx/vavgsw.md", + "family": "vavgsw", + "xml_mnem": "vavgsw", + "opcode_hex": "0x10000582", + "primary_opcode": 4, + "extended_opcode": 1410, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Average Signed Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vavgub": { + "page": "vmx/vavgub.md", + "family": "vavgub", + "xml_mnem": "vavgub", + "opcode_hex": "0x10000402", + "primary_opcode": 4, + "extended_opcode": 1026, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Average Unsigned Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vavguh": { + "page": "vmx/vavguh.md", + "family": "vavguh", + "xml_mnem": "vavguh", + "opcode_hex": "0x10000442", + "primary_opcode": 4, + "extended_opcode": 1090, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Average Unsigned Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vavguw": { + "page": "vmx/vavguw.md", + "family": "vavguw", + "xml_mnem": "vavguw", + "opcode_hex": "0x10000482", + "primary_opcode": 4, + "extended_opcode": 1154, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Average Unsigned Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcfpsxws128": { + "page": "vmx128/vcfpsxws128.md", + "family": "vcfpsxws128", + "xml_mnem": "vcfpsxws128", + "opcode_hex": "0x18000230", + "primary_opcode": 6, + "extended_opcode": 560, + "form": "VX128_3", + "group": "vmx", + "category": "vmx128", + "description": "Vector128 Convert From Floating-Point to Signed Fixed-Point Word Saturate", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcfpuxws128": { + "page": "vmx128/vcfpuxws128.md", + "family": "vcfpuxws128", + "xml_mnem": "vcfpuxws128", + "opcode_hex": "0x18000270", + "primary_opcode": 6, + "extended_opcode": 624, + "form": "VX128_3", + "group": "vmx", + "category": "vmx128", + "description": "Vector128 Convert From Floating-Point to Unsigned Fixed-Point Word Saturate", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcfs": { + "page": "vmx/vcfsx.md", + "family": "vcfsx", + "xml_mnem": "vcfsx", + "opcode_hex": "0x1000034A", + "primary_opcode": 4, + "extended_opcode": 842, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Convert from Signed Fixed-Point Word", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcfu": { + "page": "vmx/vcfux.md", + "family": "vcfux", + "xml_mnem": "vcfux", + "opcode_hex": "0x1000030A", + "primary_opcode": 4, + "extended_opcode": 778, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Convert from Unsigned Fixed-Point Word", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpbfp": { + "page": "vmx/vcmpbfp.md", + "family": "vcmpbfp", + "xml_mnem": "vcmpbfp", + "opcode_hex": "0x100003C6", + "primary_opcode": 4, + "extended_opcode": 966, + "form": "VC", + "group": "vmx", + "category": "vmx", + "description": "Vector Compare Bounds Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpbfp.": { + "page": "vmx/vcmpbfp.md", + "family": "vcmpbfp", + "variant_of": "vcmpbfp", + "xml_mnem": "vcmpbfp", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpbfp128": { + "page": "vmx/vcmpbfp.md", + "family": "vcmpbfp", + "xml_mnem": "vcmpbfp128", + "opcode_hex": "0x18000180", + "primary_opcode": 6, + "extended_opcode": 384, + "form": "VX128_R", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Compare Bounds Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpbfp128.": { + "page": "vmx/vcmpbfp.md", + "family": "vcmpbfp", + "variant_of": "vcmpbfp128", + "xml_mnem": "vcmpbfp128", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpeqfp": { + "page": "vmx/vcmpeqfp.md", + "family": "vcmpeqfp", + "xml_mnem": "vcmpeqfp", + "opcode_hex": "0x100000C6", + "primary_opcode": 4, + "extended_opcode": 198, + "form": "VC", + "group": "vmx", + "category": "vmx", + "description": "Vector Compare Equal-to Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpeqfp.": { + "page": "vmx/vcmpeqfp.md", + "family": "vcmpeqfp", + "variant_of": "vcmpeqfp", + "xml_mnem": "vcmpeqfp", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpeqfp128": { + "page": "vmx/vcmpeqfp.md", + "family": "vcmpeqfp", + "xml_mnem": "vcmpeqfp128", + "opcode_hex": "0x18000000", + "primary_opcode": 6, + "extended_opcode": 0, + "form": "VX128_R", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Compare Equal-to Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpeqfp128.": { + "page": "vmx/vcmpeqfp.md", + "family": "vcmpeqfp", + "variant_of": "vcmpeqfp128", + "xml_mnem": "vcmpeqfp128", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpequb": { + "page": "vmx/vcmpequb.md", + "family": "vcmpequb", + "xml_mnem": "vcmpequb", + "opcode_hex": "0x10000006", + "primary_opcode": 4, + "extended_opcode": 6, + "form": "VC", + "group": "vmx", + "category": "vmx", + "description": "Vector Compare Equal-to Unsigned Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpequb.": { + "page": "vmx/vcmpequb.md", + "family": "vcmpequb", + "variant_of": "vcmpequb", + "xml_mnem": "vcmpequb", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpequh": { + "page": "vmx/vcmpequh.md", + "family": "vcmpequh", + "xml_mnem": "vcmpequh", + "opcode_hex": "0x10000046", + "primary_opcode": 4, + "extended_opcode": 70, + "form": "VC", + "group": "vmx", + "category": "vmx", + "description": "Vector Compare Equal-to Unsigned Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpequh.": { + "page": "vmx/vcmpequh.md", + "family": "vcmpequh", + "variant_of": "vcmpequh", + "xml_mnem": "vcmpequh", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpequw": { + "page": "vmx/vcmpequw.md", + "family": "vcmpequw", + "xml_mnem": "vcmpequw", + "opcode_hex": "0x10000086", + "primary_opcode": 4, + "extended_opcode": 134, + "form": "VC", + "group": "vmx", + "category": "vmx", + "description": "Vector Compare Equal-to Unsigned Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpequw.": { + "page": "vmx/vcmpequw.md", + "family": "vcmpequw", + "variant_of": "vcmpequw", + "xml_mnem": "vcmpequw", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpequw128": { + "page": "vmx/vcmpequw.md", + "family": "vcmpequw", + "xml_mnem": "vcmpequw128", + "opcode_hex": "0x18000200", + "primary_opcode": 6, + "extended_opcode": 512, + "form": "VX128_R", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Compare Equal-to Unsigned Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpequw128.": { + "page": "vmx/vcmpequw.md", + "family": "vcmpequw", + "variant_of": "vcmpequw128", + "xml_mnem": "vcmpequw128", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpgefp": { + "page": "vmx/vcmpgefp.md", + "family": "vcmpgefp", + "xml_mnem": "vcmpgefp", + "opcode_hex": "0x100001C6", + "primary_opcode": 4, + "extended_opcode": 454, + "form": "VC", + "group": "vmx", + "category": "vmx", + "description": "Vector Compare Greater-Than-or-Equal-to Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpgefp.": { + "page": "vmx/vcmpgefp.md", + "family": "vcmpgefp", + "variant_of": "vcmpgefp", + "xml_mnem": "vcmpgefp", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpgefp128": { + "page": "vmx/vcmpgefp.md", + "family": "vcmpgefp", + "xml_mnem": "vcmpgefp128", + "opcode_hex": "0x18000080", + "primary_opcode": 6, + "extended_opcode": 128, + "form": "VX128_R", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Compare Greater-Than-or-Equal-to Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpgefp128.": { + "page": "vmx/vcmpgefp.md", + "family": "vcmpgefp", + "variant_of": "vcmpgefp128", + "xml_mnem": "vcmpgefp128", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpgtfp": { + "page": "vmx/vcmpgtfp.md", + "family": "vcmpgtfp", + "xml_mnem": "vcmpgtfp", + "opcode_hex": "0x100002C6", + "primary_opcode": 4, + "extended_opcode": 710, + "form": "VC", + "group": "vmx", + "category": "vmx", + "description": "Vector Compare Greater-Than Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpgtfp.": { + "page": "vmx/vcmpgtfp.md", + "family": "vcmpgtfp", + "variant_of": "vcmpgtfp", + "xml_mnem": "vcmpgtfp", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpgtfp128": { + "page": "vmx/vcmpgtfp.md", + "family": "vcmpgtfp", + "xml_mnem": "vcmpgtfp128", + "opcode_hex": "0x18000100", + "primary_opcode": 6, + "extended_opcode": 256, + "form": "VX128_R", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Compare Greater-Than Floating-Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpgtfp128.": { + "page": "vmx/vcmpgtfp.md", + "family": "vcmpgtfp", + "variant_of": "vcmpgtfp128", + "xml_mnem": "vcmpgtfp128", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpgtsb": { + "page": "vmx/vcmpgtsb.md", + "family": "vcmpgtsb", + "xml_mnem": "vcmpgtsb", + "opcode_hex": "0x10000306", + "primary_opcode": 4, + "extended_opcode": 774, + "form": "VC", + "group": "vmx", + "category": "vmx", + "description": "Vector Compare Greater-Than Signed Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpgtsb.": { + "page": "vmx/vcmpgtsb.md", + "family": "vcmpgtsb", + "variant_of": "vcmpgtsb", + "xml_mnem": "vcmpgtsb", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpgtsh": { + "page": "vmx/vcmpgtsh.md", + "family": "vcmpgtsh", + "xml_mnem": "vcmpgtsh", + "opcode_hex": "0x10000346", + "primary_opcode": 4, + "extended_opcode": 838, + "form": "VC", + "group": "vmx", + "category": "vmx", + "description": "Vector Compare Greater-Than Signed Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpgtsh.": { + "page": "vmx/vcmpgtsh.md", + "family": "vcmpgtsh", + "variant_of": "vcmpgtsh", + "xml_mnem": "vcmpgtsh", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpgtsw": { + "page": "vmx/vcmpgtsw.md", + "family": "vcmpgtsw", + "xml_mnem": "vcmpgtsw", + "opcode_hex": "0x10000386", + "primary_opcode": 4, + "extended_opcode": 902, + "form": "VC", + "group": "vmx", + "category": "vmx", + "description": "Vector Compare Greater-Than Signed Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpgtsw.": { + "page": "vmx/vcmpgtsw.md", + "family": "vcmpgtsw", + "variant_of": "vcmpgtsw", + "xml_mnem": "vcmpgtsw", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpgtub": { + "page": "vmx/vcmpgtub.md", + "family": "vcmpgtub", + "xml_mnem": "vcmpgtub", + "opcode_hex": "0x10000206", + "primary_opcode": 4, + "extended_opcode": 518, + "form": "VC", + "group": "vmx", + "category": "vmx", + "description": "Vector Compare Greater-Than Unsigned Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpgtub.": { + "page": "vmx/vcmpgtub.md", + "family": "vcmpgtub", + "variant_of": "vcmpgtub", + "xml_mnem": "vcmpgtub", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpgtuh": { + "page": "vmx/vcmpgtuh.md", + "family": "vcmpgtuh", + "xml_mnem": "vcmpgtuh", + "opcode_hex": "0x10000246", + "primary_opcode": 4, + "extended_opcode": 582, + "form": "VC", + "group": "vmx", + "category": "vmx", + "description": "Vector Compare Greater-Than Unsigned Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpgtuh.": { + "page": "vmx/vcmpgtuh.md", + "family": "vcmpgtuh", + "variant_of": "vcmpgtuh", + "xml_mnem": "vcmpgtuh", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcmpgtuw": { + "page": "vmx/vcmpgtuw.md", + "family": "vcmpgtuw", + "xml_mnem": "vcmpgtuw", + "opcode_hex": "0x10000286", + "primary_opcode": 4, + "extended_opcode": 646, + "form": "VC", + "group": "vmx", + "category": "vmx", + "description": "Vector Compare Greater-Than Unsigned Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcmpgtuw.": { + "page": "vmx/vcmpgtuw.md", + "family": "vcmpgtuw", + "variant_of": "vcmpgtuw", + "xml_mnem": "vcmpgtuw", + "flags": { + "Rc": 1 + }, + "category": "vmx" + }, + "vcsxwfp128": { + "page": "vmx128/vcsxwfp128.md", + "family": "vcsxwfp128", + "xml_mnem": "vcsxwfp128", + "opcode_hex": "0x180002B0", + "primary_opcode": 6, + "extended_opcode": 688, + "form": "VX128_3", + "group": "vmx", + "category": "vmx128", + "description": "Vector128 Convert From Signed Fixed-Point Word to Floating-Point", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vctsxs": { + "page": "vmx/vctsxs.md", + "family": "vctsxs", + "xml_mnem": "vctsxs", + "opcode_hex": "0x100003CA", + "primary_opcode": 4, + "extended_opcode": 970, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Convert to Signed Fixed-Point Word Saturate", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vctuxs": { + "page": "vmx/vctuxs.md", + "family": "vctuxs", + "xml_mnem": "vctuxs", + "opcode_hex": "0x1000038A", + "primary_opcode": 4, + "extended_opcode": 906, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Convert to Unsigned Fixed-Point Word Saturate", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vcuxwfp128": { + "page": "vmx128/vcuxwfp128.md", + "family": "vcuxwfp128", + "xml_mnem": "vcuxwfp128", + "opcode_hex": "0x180002F0", + "primary_opcode": 6, + "extended_opcode": 752, + "form": "VX128_3", + "group": "vmx", + "category": "vmx128", + "description": "Vector128 Convert From Unsigned Fixed-Point Word to Floating-Point", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vexptefp": { + "page": "vmx/vexptefp.md", + "family": "vexptefp", + "xml_mnem": "vexptefp", + "opcode_hex": "0x1000018A", + "primary_opcode": 4, + "extended_opcode": 394, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector 2 Raised to the Exponent Estimate Floating Point", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vexptefp128": { + "page": "vmx/vexptefp.md", + "family": "vexptefp", + "xml_mnem": "vexptefp128", + "opcode_hex": "0x180006B0", + "primary_opcode": 6, + "extended_opcode": 1712, + "form": "VX128_3", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Log2 Estimate Floating Point", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vlogefp": { + "page": "vmx/vlogefp.md", + "family": "vlogefp", + "xml_mnem": "vlogefp", + "opcode_hex": "0x100001CA", + "primary_opcode": 4, + "extended_opcode": 458, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Log2 Estimate Floating Point", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vlogefp128": { + "page": "vmx/vlogefp.md", + "family": "vlogefp", + "xml_mnem": "vlogefp128", + "opcode_hex": "0x180006F0", + "primary_opcode": 6, + "extended_opcode": 1776, + "form": "VX128_3", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Log2 Estimate Floating Point", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmaddcfp128": { + "page": "vmx128/vmaddcfp128.md", + "family": "vmaddcfp128", + "xml_mnem": "vmaddcfp128", + "opcode_hex": "0x14000110", + "primary_opcode": 5, + "extended_opcode": 272, + "form": "VX128", + "group": "vmx", + "category": "vmx128", + "description": "Vector128 Multiply Add Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VD", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmaddfp": { + "page": "vmx/vmaddfp.md", + "family": "vmaddfp", + "xml_mnem": "vmaddfp", + "opcode_hex": "0x1000002E", + "primary_opcode": 4, + "extended_opcode": 46, + "form": "VA", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply-Add Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VC", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmaddfp128": { + "page": "vmx/vmaddfp.md", + "family": "vmaddfp", + "xml_mnem": "vmaddfp128", + "opcode_hex": "0x140000D0", + "primary_opcode": 5, + "extended_opcode": 208, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Multiply Add Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VC", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmaxfp": { + "page": "vmx/vmaxfp.md", + "family": "vmaxfp", + "xml_mnem": "vmaxfp", + "opcode_hex": "0x1000040A", + "primary_opcode": 4, + "extended_opcode": 1034, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Maximum Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmaxfp128": { + "page": "vmx/vmaxfp.md", + "family": "vmaxfp", + "xml_mnem": "vmaxfp128", + "opcode_hex": "0x18000280", + "primary_opcode": 6, + "extended_opcode": 640, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Maximum Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmaxsb": { + "page": "vmx/vmaxsb.md", + "family": "vmaxsb", + "xml_mnem": "vmaxsb", + "opcode_hex": "0x10000102", + "primary_opcode": 4, + "extended_opcode": 258, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Maximum Signed Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmaxsh": { + "page": "vmx/vmaxsh.md", + "family": "vmaxsh", + "xml_mnem": "vmaxsh", + "opcode_hex": "0x10000142", + "primary_opcode": 4, + "extended_opcode": 322, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Maximum Signed Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmaxsw": { + "page": "vmx/vmaxsw.md", + "family": "vmaxsw", + "xml_mnem": "vmaxsw", + "opcode_hex": "0x10000182", + "primary_opcode": 4, + "extended_opcode": 386, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Maximum Signed Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmaxub": { + "page": "vmx/vmaxub.md", + "family": "vmaxub", + "xml_mnem": "vmaxub", + "opcode_hex": "0x10000002", + "primary_opcode": 4, + "extended_opcode": 2, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Maximum Unsigned Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmaxuh": { + "page": "vmx/vmaxuh.md", + "family": "vmaxuh", + "xml_mnem": "vmaxuh", + "opcode_hex": "0x10000042", + "primary_opcode": 4, + "extended_opcode": 66, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Maximum Unsigned Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmaxuw": { + "page": "vmx/vmaxuw.md", + "family": "vmaxuw", + "xml_mnem": "vmaxuw", + "opcode_hex": "0x10000082", + "primary_opcode": 4, + "extended_opcode": 130, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Maximum Unsigned Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmhaddshs": { + "page": "vmx/vmhaddshs.md", + "family": "vmhaddshs", + "xml_mnem": "vmhaddshs", + "opcode_hex": "0x10000020", + "primary_opcode": 4, + "extended_opcode": 32, + "form": "VA", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply-High and Add Signed Signed Half Word Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + }, + { + "field": "VC", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmhraddshs": { + "page": "vmx/vmhraddshs.md", + "family": "vmhraddshs", + "xml_mnem": "vmhraddshs", + "opcode_hex": "0x10000021", + "primary_opcode": 4, + "extended_opcode": 33, + "form": "VA", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply-High Round and Add Signed Signed Half Word Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + }, + { + "field": "VC", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vminfp": { + "page": "vmx/vminfp.md", + "family": "vminfp", + "xml_mnem": "vminfp", + "opcode_hex": "0x1000044A", + "primary_opcode": 4, + "extended_opcode": 1098, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Minimum Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vminfp128": { + "page": "vmx/vminfp.md", + "family": "vminfp", + "xml_mnem": "vminfp128", + "opcode_hex": "0x180002C0", + "primary_opcode": 6, + "extended_opcode": 704, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Minimum Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vminsb": { + "page": "vmx/vminsb.md", + "family": "vminsb", + "xml_mnem": "vminsb", + "opcode_hex": "0x10000302", + "primary_opcode": 4, + "extended_opcode": 770, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Minimum Signed Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vminsh": { + "page": "vmx/vminsh.md", + "family": "vminsh", + "xml_mnem": "vminsh", + "opcode_hex": "0x10000342", + "primary_opcode": 4, + "extended_opcode": 834, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Minimum Signed Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vminsw": { + "page": "vmx/vminsw.md", + "family": "vminsw", + "xml_mnem": "vminsw", + "opcode_hex": "0x10000382", + "primary_opcode": 4, + "extended_opcode": 898, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Minimum Signed Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vminub": { + "page": "vmx/vminub.md", + "family": "vminub", + "xml_mnem": "vminub", + "opcode_hex": "0x10000202", + "primary_opcode": 4, + "extended_opcode": 514, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Minimum Unsigned Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vminuh": { + "page": "vmx/vminuh.md", + "family": "vminuh", + "xml_mnem": "vminuh", + "opcode_hex": "0x10000242", + "primary_opcode": 4, + "extended_opcode": 578, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Minimum Unsigned Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vminuw": { + "page": "vmx/vminuw.md", + "family": "vminuw", + "xml_mnem": "vminuw", + "opcode_hex": "0x10000282", + "primary_opcode": 4, + "extended_opcode": 642, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Minimum Unsigned Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmladduhm": { + "page": "vmx/vmladduhm.md", + "family": "vmladduhm", + "xml_mnem": "vmladduhm", + "opcode_hex": "0x10000022", + "primary_opcode": 4, + "extended_opcode": 34, + "form": "VA", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply-Low and Add Unsigned Half Word Modulo", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + }, + { + "field": "VC", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmrghb": { + "page": "vmx/vmrghb.md", + "family": "vmrghb", + "xml_mnem": "vmrghb", + "opcode_hex": "0x1000000C", + "primary_opcode": 4, + "extended_opcode": 12, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Merge High Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmrghh": { + "page": "vmx/vmrghh.md", + "family": "vmrghh", + "xml_mnem": "vmrghh", + "opcode_hex": "0x1000004C", + "primary_opcode": 4, + "extended_opcode": 76, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Merge High Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmrghw": { + "page": "vmx/vmrghw.md", + "family": "vmrghw", + "xml_mnem": "vmrghw", + "opcode_hex": "0x1000008C", + "primary_opcode": 4, + "extended_opcode": 140, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Merge High Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmrghw128": { + "page": "vmx/vmrghw.md", + "family": "vmrghw", + "xml_mnem": "vmrghw128", + "opcode_hex": "0x18000300", + "primary_opcode": 6, + "extended_opcode": 768, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Merge High Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmrglb": { + "page": "vmx/vmrglb.md", + "family": "vmrglb", + "xml_mnem": "vmrglb", + "opcode_hex": "0x1000010C", + "primary_opcode": 4, + "extended_opcode": 268, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Merge Low Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmrglh": { + "page": "vmx/vmrglh.md", + "family": "vmrglh", + "xml_mnem": "vmrglh", + "opcode_hex": "0x1000014C", + "primary_opcode": 4, + "extended_opcode": 332, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Merge Low Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmrglw": { + "page": "vmx/vmrglw.md", + "family": "vmrglw", + "xml_mnem": "vmrglw", + "opcode_hex": "0x1000018C", + "primary_opcode": 4, + "extended_opcode": 396, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Merge Low Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmrglw128": { + "page": "vmx/vmrglw.md", + "family": "vmrglw", + "xml_mnem": "vmrglw128", + "opcode_hex": "0x18000340", + "primary_opcode": 6, + "extended_opcode": 832, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Merge Low Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmsum3fp128": { + "page": "vmx128/vmsum3fp128.md", + "family": "vmsum3fp128", + "xml_mnem": "vmsum3fp128", + "opcode_hex": "0x14000190", + "primary_opcode": 5, + "extended_opcode": 400, + "form": "VX128", + "group": "vmx", + "category": "vmx128", + "description": "Vector128 Multiply Sum 3-way Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmsum4fp128": { + "page": "vmx128/vmsum4fp128.md", + "family": "vmsum4fp128", + "xml_mnem": "vmsum4fp128", + "opcode_hex": "0x140001D0", + "primary_opcode": 5, + "extended_opcode": 464, + "form": "VX128", + "group": "vmx", + "category": "vmx128", + "description": "Vector128 Multiply Sum 4-way Floating-Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmsummbm": { + "page": "vmx/vmsummbm.md", + "family": "vmsummbm", + "xml_mnem": "vmsummbm", + "opcode_hex": "0x10000025", + "primary_opcode": 4, + "extended_opcode": 37, + "form": "VA", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply-Sum Mixed-Sign Byte Modulo", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + }, + { + "field": "VC", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmsumshm": { + "page": "vmx/vmsumshm.md", + "family": "vmsumshm", + "xml_mnem": "vmsumshm", + "opcode_hex": "0x10000028", + "primary_opcode": 4, + "extended_opcode": 40, + "form": "VA", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply-Sum Signed Half Word Modulo", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + }, + { + "field": "VC", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmsumshs": { + "page": "vmx/vmsumshs.md", + "family": "vmsumshs", + "xml_mnem": "vmsumshs", + "opcode_hex": "0x10000029", + "primary_opcode": 4, + "extended_opcode": 41, + "form": "VA", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply-Sum Signed Half Word Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + }, + { + "field": "VC", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmsumubm": { + "page": "vmx/vmsumubm.md", + "family": "vmsumubm", + "xml_mnem": "vmsumubm", + "opcode_hex": "0x10000024", + "primary_opcode": 4, + "extended_opcode": 36, + "form": "VA", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply-Sum Unsigned Byte Modulo", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + }, + { + "field": "VC", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmsumuhm": { + "page": "vmx/vmsumuhm.md", + "family": "vmsumuhm", + "xml_mnem": "vmsumuhm", + "opcode_hex": "0x10000026", + "primary_opcode": 4, + "extended_opcode": 38, + "form": "VA", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply-Sum Unsigned Half Word Modulo", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + }, + { + "field": "VC", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmsumuhs": { + "page": "vmx/vmsumuhs.md", + "family": "vmsumuhs", + "xml_mnem": "vmsumuhs", + "opcode_hex": "0x10000027", + "primary_opcode": 4, + "extended_opcode": 39, + "form": "VA", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply-Sum Unsigned Half Word Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + }, + { + "field": "VC", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmulesb": { + "page": "vmx/vmulesb.md", + "family": "vmulesb", + "xml_mnem": "vmulesb", + "opcode_hex": "0x10000308", + "primary_opcode": 4, + "extended_opcode": 776, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply Even Signed Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmulesh": { + "page": "vmx/vmulesh.md", + "family": "vmulesh", + "xml_mnem": "vmulesh", + "opcode_hex": "0x10000348", + "primary_opcode": 4, + "extended_opcode": 840, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply Even Signed Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmuleub": { + "page": "vmx/vmuleub.md", + "family": "vmuleub", + "xml_mnem": "vmuleub", + "opcode_hex": "0x10000208", + "primary_opcode": 4, + "extended_opcode": 520, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply Even Unsigned Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmuleuh": { + "page": "vmx/vmuleuh.md", + "family": "vmuleuh", + "xml_mnem": "vmuleuh", + "opcode_hex": "0x10000248", + "primary_opcode": 4, + "extended_opcode": 584, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply Even Unsigned Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmulfp128": { + "page": "vmx128/vmulfp128.md", + "family": "vmulfp128", + "xml_mnem": "vmulfp128", + "opcode_hex": "0x14000090", + "primary_opcode": 5, + "extended_opcode": 144, + "form": "VX128", + "group": "vmx", + "category": "vmx128", + "description": "Vector128 Multiply Floating-Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmulosb": { + "page": "vmx/vmulosb.md", + "family": "vmulosb", + "xml_mnem": "vmulosb", + "opcode_hex": "0x10000108", + "primary_opcode": 4, + "extended_opcode": 264, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply Odd Signed Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmulosh": { + "page": "vmx/vmulosh.md", + "family": "vmulosh", + "xml_mnem": "vmulosh", + "opcode_hex": "0x10000148", + "primary_opcode": 4, + "extended_opcode": 328, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply Odd Signed Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmuloub": { + "page": "vmx/vmuloub.md", + "family": "vmuloub", + "xml_mnem": "vmuloub", + "opcode_hex": "0x10000008", + "primary_opcode": 4, + "extended_opcode": 8, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply Odd Unsigned Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vmulouh": { + "page": "vmx/vmulouh.md", + "family": "vmulouh", + "xml_mnem": "vmulouh", + "opcode_hex": "0x10000048", + "primary_opcode": 4, + "extended_opcode": 72, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Multiply Odd Unsigned Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vnmsubfp": { + "page": "vmx/vnmsubfp.md", + "family": "vnmsubfp", + "xml_mnem": "vnmsubfp", + "opcode_hex": "0x1000002F", + "primary_opcode": 4, + "extended_opcode": 47, + "form": "VA", + "group": "vmx", + "category": "vmx", + "description": "Vector Negative Multiply-Subtract Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VC", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vnmsubfp128": { + "page": "vmx/vnmsubfp.md", + "family": "vnmsubfp", + "xml_mnem": "vnmsubfp128", + "opcode_hex": "0x14000150", + "primary_opcode": 5, + "extended_opcode": 336, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Negative Multiply-Subtract Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VD", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vnor": { + "page": "vmx/vnor.md", + "family": "vnor", + "xml_mnem": "vnor", + "opcode_hex": "0x10000504", + "primary_opcode": 4, + "extended_opcode": 1284, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Logical NOR", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vnor128": { + "page": "vmx/vnor.md", + "family": "vnor", + "xml_mnem": "vnor128", + "opcode_hex": "0x14000290", + "primary_opcode": 5, + "extended_opcode": 656, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Logical NOR", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vor": { + "page": "vmx/vor.md", + "family": "vor", + "xml_mnem": "vor", + "opcode_hex": "0x10000484", + "primary_opcode": 4, + "extended_opcode": 1156, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Logical OR", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vor128": { + "page": "vmx/vor.md", + "family": "vor", + "xml_mnem": "vor128", + "opcode_hex": "0x140002D0", + "primary_opcode": 5, + "extended_opcode": 720, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Logical OR", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vperm": { + "page": "vmx/vperm.md", + "family": "vperm", + "xml_mnem": "vperm", + "opcode_hex": "0x1000002B", + "primary_opcode": 4, + "extended_opcode": 43, + "form": "VA", + "group": "vmx", + "category": "vmx", + "description": "Vector Permute", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + }, + { + "field": "VC", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vperm128": { + "page": "vmx/vperm.md", + "family": "vperm", + "xml_mnem": "vperm128", + "opcode_hex": "0x14000000", + "primary_opcode": 5, + "extended_opcode": 0, + "form": "VX128_2", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Permute", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + }, + { + "field": "VC", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpermwi128": { + "page": "vmx128/vpermwi128.md", + "family": "vpermwi128", + "xml_mnem": "vpermwi128", + "opcode_hex": "0x18000210", + "primary_opcode": 6, + "extended_opcode": 528, + "form": "VX128_P", + "group": "vmx", + "category": "vmx128", + "description": "Vector128 Permutate Word Immediate", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkd3d128": { + "page": "vmx128/vpkd3d128.md", + "family": "vpkd3d128", + "xml_mnem": "vpkd3d128", + "opcode_hex": "0x18000610", + "primary_opcode": 6, + "extended_opcode": 1552, + "form": "VX128_4", + "group": "vmx", + "category": "vmx128", + "description": "Vector128 Pack D3Dtype, Rotate Left Immediate and Mask Insert", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkp": { + "page": "vmx/vpkpx.md", + "family": "vpkpx", + "xml_mnem": "vpkpx", + "opcode_hex": "0x1000030E", + "primary_opcode": 4, + "extended_opcode": 782, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Pack Pixel", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkshss": { + "page": "vmx/vpkshss.md", + "family": "vpkshss", + "xml_mnem": "vpkshss", + "opcode_hex": "0x1000018E", + "primary_opcode": 4, + "extended_opcode": 398, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Pack Signed Half Word Signed Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkshss128": { + "page": "vmx/vpkshss.md", + "family": "vpkshss", + "xml_mnem": "vpkshss128", + "opcode_hex": "0x14000200", + "primary_opcode": 5, + "extended_opcode": 512, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Pack Signed Half Word Signed Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkshus": { + "page": "vmx/vpkshus.md", + "family": "vpkshus", + "xml_mnem": "vpkshus", + "opcode_hex": "0x1000010E", + "primary_opcode": 4, + "extended_opcode": 270, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Pack Signed Half Word Unsigned Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkshus128": { + "page": "vmx/vpkshus.md", + "family": "vpkshus", + "xml_mnem": "vpkshus128", + "opcode_hex": "0x14000240", + "primary_opcode": 5, + "extended_opcode": 576, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Pack Signed Half Word Unsigned Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkswss": { + "page": "vmx/vpkswss.md", + "family": "vpkswss", + "xml_mnem": "vpkswss", + "opcode_hex": "0x100001CE", + "primary_opcode": 4, + "extended_opcode": 462, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Pack Signed Word Signed Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkswss128": { + "page": "vmx/vpkswss.md", + "family": "vpkswss", + "xml_mnem": "vpkswss128", + "opcode_hex": "0x14000280", + "primary_opcode": 5, + "extended_opcode": 640, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Pack Signed Word Signed Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkswus": { + "page": "vmx/vpkswus.md", + "family": "vpkswus", + "xml_mnem": "vpkswus", + "opcode_hex": "0x1000014E", + "primary_opcode": 4, + "extended_opcode": 334, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Pack Signed Word Unsigned Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkswus128": { + "page": "vmx/vpkswus.md", + "family": "vpkswus", + "xml_mnem": "vpkswus128", + "opcode_hex": "0x140002C0", + "primary_opcode": 5, + "extended_opcode": 704, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Pack Signed Word Unsigned Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkuhum": { + "page": "vmx/vpkuhum.md", + "family": "vpkuhum", + "xml_mnem": "vpkuhum", + "opcode_hex": "0x1000000E", + "primary_opcode": 4, + "extended_opcode": 14, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Pack Unsigned Half Word Unsigned Modulo", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkuhum128": { + "page": "vmx/vpkuhum.md", + "family": "vpkuhum", + "xml_mnem": "vpkuhum128", + "opcode_hex": "0x14000300", + "primary_opcode": 5, + "extended_opcode": 768, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Pack Unsigned Half Word Unsigned Modulo", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkuhus": { + "page": "vmx/vpkuhus.md", + "family": "vpkuhus", + "xml_mnem": "vpkuhus", + "opcode_hex": "0x1000008E", + "primary_opcode": 4, + "extended_opcode": 142, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Pack Unsigned Half Word Unsigned Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkuhus128": { + "page": "vmx/vpkuhus.md", + "family": "vpkuhus", + "xml_mnem": "vpkuhus128", + "opcode_hex": "0x14000340", + "primary_opcode": 5, + "extended_opcode": 832, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Pack Unsigned Half Word Unsigned Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkuwum": { + "page": "vmx/vpkuwum.md", + "family": "vpkuwum", + "xml_mnem": "vpkuwum", + "opcode_hex": "0x1000004E", + "primary_opcode": 4, + "extended_opcode": 78, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Pack Unsigned Word Unsigned Modulo", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkuwum128": { + "page": "vmx/vpkuwum.md", + "family": "vpkuwum", + "xml_mnem": "vpkuwum128", + "opcode_hex": "0x14000380", + "primary_opcode": 5, + "extended_opcode": 896, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Pack Unsigned Word Unsigned Modulo", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkuwus": { + "page": "vmx/vpkuwus.md", + "family": "vpkuwus", + "xml_mnem": "vpkuwus", + "opcode_hex": "0x100000CE", + "primary_opcode": 4, + "extended_opcode": 206, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Pack Unsigned Word Unsigned Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vpkuwus128": { + "page": "vmx/vpkuwus.md", + "family": "vpkuwus", + "xml_mnem": "vpkuwus128", + "opcode_hex": "0x140003C0", + "primary_opcode": 5, + "extended_opcode": 960, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Pack Unsigned Word Unsigned Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrefp": { + "page": "vmx/vrefp.md", + "family": "vrefp", + "xml_mnem": "vrefp", + "opcode_hex": "0x1000010A", + "primary_opcode": 4, + "extended_opcode": 266, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Reciprocal Estimate Floating Point", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrefp128": { + "page": "vmx/vrefp.md", + "family": "vrefp", + "xml_mnem": "vrefp128", + "opcode_hex": "0x18000630", + "primary_opcode": 6, + "extended_opcode": 1584, + "form": "VX128_3", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Reciprocal Estimate Floating Point", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrfim": { + "page": "vmx/vrfim.md", + "family": "vrfim", + "xml_mnem": "vrfim", + "opcode_hex": "0x100002CA", + "primary_opcode": 4, + "extended_opcode": 714, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Round to Floating-Point Integer toward -Infinity", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrfim128": { + "page": "vmx/vrfim.md", + "family": "vrfim", + "xml_mnem": "vrfim128", + "opcode_hex": "0x18000330", + "primary_opcode": 6, + "extended_opcode": 816, + "form": "VX128_3", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Round to Floating-Point Integer toward -Infinity", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrfin": { + "page": "vmx/vrfin.md", + "family": "vrfin", + "xml_mnem": "vrfin", + "opcode_hex": "0x1000020A", + "primary_opcode": 4, + "extended_opcode": 522, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Round to Floating-Point Integer Nearest", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrfin128": { + "page": "vmx/vrfin.md", + "family": "vrfin", + "xml_mnem": "vrfin128", + "opcode_hex": "0x18000370", + "primary_opcode": 6, + "extended_opcode": 880, + "form": "VX128_3", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Round to Floating-Point Integer Nearest", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrfip": { + "page": "vmx/vrfip.md", + "family": "vrfip", + "xml_mnem": "vrfip", + "opcode_hex": "0x1000028A", + "primary_opcode": 4, + "extended_opcode": 650, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Round to Floating-Point Integer toward +Infinity", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrfip128": { + "page": "vmx/vrfip.md", + "family": "vrfip", + "xml_mnem": "vrfip128", + "opcode_hex": "0x180003B0", + "primary_opcode": 6, + "extended_opcode": 944, + "form": "VX128_3", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Round to Floating-Point Integer toward +Infinity", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrfiz": { + "page": "vmx/vrfiz.md", + "family": "vrfiz", + "xml_mnem": "vrfiz", + "opcode_hex": "0x1000024A", + "primary_opcode": 4, + "extended_opcode": 586, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Round to Floating-Point Integer toward Zero", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrfiz128": { + "page": "vmx/vrfiz.md", + "family": "vrfiz", + "xml_mnem": "vrfiz128", + "opcode_hex": "0x180003F0", + "primary_opcode": 6, + "extended_opcode": 1008, + "form": "VX128_3", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Round to Floating-Point Integer toward Zero", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrlb": { + "page": "vmx/vrlb.md", + "family": "vrlb", + "xml_mnem": "vrlb", + "opcode_hex": "0x10000004", + "primary_opcode": 4, + "extended_opcode": 4, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Rotate Left Integer Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrlh": { + "page": "vmx/vrlh.md", + "family": "vrlh", + "xml_mnem": "vrlh", + "opcode_hex": "0x10000044", + "primary_opcode": 4, + "extended_opcode": 68, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Rotate Left Integer Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrlimi128": { + "page": "vmx128/vrlimi128.md", + "family": "vrlimi128", + "xml_mnem": "vrlimi128", + "opcode_hex": "0x18000710", + "primary_opcode": 6, + "extended_opcode": 1808, + "form": "VX128_4", + "group": "vmx", + "category": "vmx128", + "description": "Vector128 Rotate Left Immediate and Mask Insert", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrlw": { + "page": "vmx/vrlw.md", + "family": "vrlw", + "xml_mnem": "vrlw", + "opcode_hex": "0x10000084", + "primary_opcode": 4, + "extended_opcode": 132, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Rotate Left Integer Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrlw128": { + "page": "vmx/vrlw.md", + "family": "vrlw", + "xml_mnem": "vrlw128", + "opcode_hex": "0x18000050", + "primary_opcode": 6, + "extended_opcode": 80, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Rotate Left Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrsqrtefp": { + "page": "vmx/vrsqrtefp.md", + "family": "vrsqrtefp", + "xml_mnem": "vrsqrtefp", + "opcode_hex": "0x1000014A", + "primary_opcode": 4, + "extended_opcode": 330, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Reciprocal Square Root Estimate Floating Point", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vrsqrtefp128": { + "page": "vmx/vrsqrtefp.md", + "family": "vrsqrtefp", + "xml_mnem": "vrsqrtefp128", + "opcode_hex": "0x18000670", + "primary_opcode": 6, + "extended_opcode": 1648, + "form": "VX128_3", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Reciprocal Square Root Estimate Floating Point", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsel": { + "page": "vmx/vsel.md", + "family": "vsel", + "xml_mnem": "vsel", + "opcode_hex": "0x1000002A", + "primary_opcode": 4, + "extended_opcode": 42, + "form": "VA", + "group": "vmx", + "category": "vmx", + "description": "Vector Conditional Select", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + }, + { + "field": "VC", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsel128": { + "page": "vmx/vsel.md", + "family": "vsel", + "xml_mnem": "vsel128", + "opcode_hex": "0x14000350", + "primary_opcode": 5, + "extended_opcode": 848, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Conditional Select", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + }, + { + "field": "VD", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsl": { + "page": "vmx/vsl.md", + "family": "vsl", + "xml_mnem": "vsl", + "opcode_hex": "0x100001C4", + "primary_opcode": 4, + "extended_opcode": 452, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Shift Left", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vslb": { + "page": "vmx/vslb.md", + "family": "vslb", + "xml_mnem": "vslb", + "opcode_hex": "0x10000104", + "primary_opcode": 4, + "extended_opcode": 260, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Shift Left Integer Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsldoi": { + "page": "vmx/vsldoi.md", + "family": "vsldoi", + "xml_mnem": "vsldoi", + "opcode_hex": "0x1000002C", + "primary_opcode": 4, + "extended_opcode": 44, + "form": "VA", + "group": "vmx", + "category": "vmx", + "description": "Vector Shift Left Double by Octet Immediate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + }, + { + "field": "SHB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsldoi128": { + "page": "vmx/vsldoi.md", + "family": "vsldoi", + "xml_mnem": "vsldoi128", + "opcode_hex": "0x10000010", + "primary_opcode": 4, + "extended_opcode": 16, + "form": "VX128_5", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Shift Left Double by Octet Immediate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + }, + { + "field": "SHB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vslh": { + "page": "vmx/vslh.md", + "family": "vslh", + "xml_mnem": "vslh", + "opcode_hex": "0x10000144", + "primary_opcode": 4, + "extended_opcode": 324, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Shift Left Integer Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vslo": { + "page": "vmx/vslo.md", + "family": "vslo", + "xml_mnem": "vslo", + "opcode_hex": "0x1000040C", + "primary_opcode": 4, + "extended_opcode": 1036, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Shift Left by Octet", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vslo128": { + "page": "vmx/vslo.md", + "family": "vslo", + "xml_mnem": "vslo128", + "opcode_hex": "0x14000390", + "primary_opcode": 5, + "extended_opcode": 912, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Shift Left Octet", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vslw": { + "page": "vmx/vslw.md", + "family": "vslw", + "xml_mnem": "vslw", + "opcode_hex": "0x10000184", + "primary_opcode": 4, + "extended_opcode": 388, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Shift Left Integer Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vslw128": { + "page": "vmx/vslw.md", + "family": "vslw", + "xml_mnem": "vslw128", + "opcode_hex": "0x180000D0", + "primary_opcode": 6, + "extended_opcode": 208, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Shift Left Integer Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vspltb": { + "page": "vmx/vspltb.md", + "family": "vspltb", + "xml_mnem": "vspltb", + "opcode_hex": "0x1000020C", + "primary_opcode": 4, + "extended_opcode": 524, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Splat Byte", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsplth": { + "page": "vmx/vsplth.md", + "family": "vsplth", + "xml_mnem": "vsplth", + "opcode_hex": "0x1000024C", + "primary_opcode": 4, + "extended_opcode": 588, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Splat Half Word", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vspltisb": { + "page": "vmx/vspltisb.md", + "family": "vspltisb", + "xml_mnem": "vspltisb", + "opcode_hex": "0x1000030C", + "primary_opcode": 4, + "extended_opcode": 780, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Splat Immediate Signed Byte", + "sync": false, + "reads": [ + { + "field": "SIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vspltish": { + "page": "vmx/vspltish.md", + "family": "vspltish", + "xml_mnem": "vspltish", + "opcode_hex": "0x1000034C", + "primary_opcode": 4, + "extended_opcode": 844, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Splat Immediate Signed Half Word", + "sync": false, + "reads": [ + { + "field": "SIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vspltisw": { + "page": "vmx/vspltisw.md", + "family": "vspltisw", + "xml_mnem": "vspltisw", + "opcode_hex": "0x1000038C", + "primary_opcode": 4, + "extended_opcode": 908, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Splat Immediate Signed Word", + "sync": false, + "reads": [ + { + "field": "SIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vspltisw128": { + "page": "vmx/vspltisw.md", + "family": "vspltisw", + "xml_mnem": "vspltisw128", + "opcode_hex": "0x18000770", + "primary_opcode": 6, + "extended_opcode": 1904, + "form": "VX128_3", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Splat Immediate Signed Word", + "sync": false, + "reads": [ + { + "field": "SIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vspltw": { + "page": "vmx/vspltw.md", + "family": "vspltw", + "xml_mnem": "vspltw", + "opcode_hex": "0x1000028C", + "primary_opcode": 4, + "extended_opcode": 652, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Splat Word", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vspltw128": { + "page": "vmx/vspltw.md", + "family": "vspltw", + "xml_mnem": "vspltw128", + "opcode_hex": "0x18000730", + "primary_opcode": 6, + "extended_opcode": 1840, + "form": "VX128_3", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Splat Word", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsr": { + "page": "vmx/vsr.md", + "family": "vsr", + "xml_mnem": "vsr", + "opcode_hex": "0x100002C4", + "primary_opcode": 4, + "extended_opcode": 708, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Shift Right", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsrab": { + "page": "vmx/vsrab.md", + "family": "vsrab", + "xml_mnem": "vsrab", + "opcode_hex": "0x10000304", + "primary_opcode": 4, + "extended_opcode": 772, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Shift Right Algebraic Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsrah": { + "page": "vmx/vsrah.md", + "family": "vsrah", + "xml_mnem": "vsrah", + "opcode_hex": "0x10000344", + "primary_opcode": 4, + "extended_opcode": 836, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Shift Right Algebraic Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsraw": { + "page": "vmx/vsraw.md", + "family": "vsraw", + "xml_mnem": "vsraw", + "opcode_hex": "0x10000384", + "primary_opcode": 4, + "extended_opcode": 900, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Shift Right Algebraic Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsraw128": { + "page": "vmx/vsraw.md", + "family": "vsraw", + "xml_mnem": "vsraw128", + "opcode_hex": "0x18000150", + "primary_opcode": 6, + "extended_opcode": 336, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Shift Right Arithmetic Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsrb": { + "page": "vmx/vsrb.md", + "family": "vsrb", + "xml_mnem": "vsrb", + "opcode_hex": "0x10000204", + "primary_opcode": 4, + "extended_opcode": 516, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Shift Right Byte", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsrh": { + "page": "vmx/vsrh.md", + "family": "vsrh", + "xml_mnem": "vsrh", + "opcode_hex": "0x10000244", + "primary_opcode": 4, + "extended_opcode": 580, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Shift Right Half Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsro": { + "page": "vmx/vsro.md", + "family": "vsro", + "xml_mnem": "vsro", + "opcode_hex": "0x1000044C", + "primary_opcode": 4, + "extended_opcode": 1100, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Shift Right Octet", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsro128": { + "page": "vmx/vsro.md", + "family": "vsro", + "xml_mnem": "vsro128", + "opcode_hex": "0x140003D0", + "primary_opcode": 5, + "extended_opcode": 976, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Shift Right Octet", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsrw": { + "page": "vmx/vsrw.md", + "family": "vsrw", + "xml_mnem": "vsrw", + "opcode_hex": "0x10000284", + "primary_opcode": 4, + "extended_opcode": 644, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Shift Right Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsrw128": { + "page": "vmx/vsrw.md", + "family": "vsrw", + "xml_mnem": "vsrw128", + "opcode_hex": "0x180001D0", + "primary_opcode": 6, + "extended_opcode": 464, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Shift Right Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsubcuw": { + "page": "vmx/vsubcuw.md", + "family": "vsubcuw", + "xml_mnem": "vsubcuw", + "opcode_hex": "0x10000580", + "primary_opcode": 4, + "extended_opcode": 1408, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Subtract Carryout Unsigned Word", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsubfp": { + "page": "vmx/vsubfp.md", + "family": "vsubfp", + "xml_mnem": "vsubfp", + "opcode_hex": "0x1000004A", + "primary_opcode": 4, + "extended_opcode": 74, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Subtract Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsubfp128": { + "page": "vmx/vsubfp.md", + "family": "vsubfp", + "xml_mnem": "vsubfp128", + "opcode_hex": "0x14000050", + "primary_opcode": 5, + "extended_opcode": 80, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Subtract Floating Point", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsubsbs": { + "page": "vmx/vsubsbs.md", + "family": "vsubsbs", + "xml_mnem": "vsubsbs", + "opcode_hex": "0x10000700", + "primary_opcode": 4, + "extended_opcode": 1792, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Subtract Signed Byte Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsubshs": { + "page": "vmx/vsubshs.md", + "family": "vsubshs", + "xml_mnem": "vsubshs", + "opcode_hex": "0x10000740", + "primary_opcode": 4, + "extended_opcode": 1856, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Subtract Signed Half Word Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsubsws": { + "page": "vmx/vsubsws.md", + "family": "vsubsws", + "xml_mnem": "vsubsws", + "opcode_hex": "0x10000780", + "primary_opcode": 4, + "extended_opcode": 1920, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Subtract Signed Word Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsububm": { + "page": "vmx/vsububm.md", + "family": "vsububm", + "xml_mnem": "vsububm", + "opcode_hex": "0x10000400", + "primary_opcode": 4, + "extended_opcode": 1024, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Subtract Unsigned Byte Modulo", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsububs": { + "page": "vmx/vsububs.md", + "family": "vsububs", + "xml_mnem": "vsububs", + "opcode_hex": "0x10000600", + "primary_opcode": 4, + "extended_opcode": 1536, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Subtract Unsigned Byte Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsubuhm": { + "page": "vmx/vsubuhm.md", + "family": "vsubuhm", + "xml_mnem": "vsubuhm", + "opcode_hex": "0x10000440", + "primary_opcode": 4, + "extended_opcode": 1088, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Subtract Unsigned Half Word Modulo", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsubuhs": { + "page": "vmx/vsubuhs.md", + "family": "vsubuhs", + "xml_mnem": "vsubuhs", + "opcode_hex": "0x10000640", + "primary_opcode": 4, + "extended_opcode": 1600, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Subtract Unsigned Half Word Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsubuwm": { + "page": "vmx/vsubuwm.md", + "family": "vsubuwm", + "xml_mnem": "vsubuwm", + "opcode_hex": "0x10000480", + "primary_opcode": 4, + "extended_opcode": 1152, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Subtract Unsigned Word Modulo", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsubuws": { + "page": "vmx/vsubuws.md", + "family": "vsubuws", + "xml_mnem": "vsubuws", + "opcode_hex": "0x10000680", + "primary_opcode": 4, + "extended_opcode": 1664, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Subtract Unsigned Word Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsum2sws": { + "page": "vmx/vsum2sws.md", + "family": "vsum2sws", + "xml_mnem": "vsum2sws", + "opcode_hex": "0x10000688", + "primary_opcode": 4, + "extended_opcode": 1672, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Sum Across Partial (1/2) Signed Word Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsum4sbs": { + "page": "vmx/vsum4sbs.md", + "family": "vsum4sbs", + "xml_mnem": "vsum4sbs", + "opcode_hex": "0x10000708", + "primary_opcode": 4, + "extended_opcode": 1800, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Sum Across Partial (1/4) Signed Byte Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsum4shs": { + "page": "vmx/vsum4shs.md", + "family": "vsum4shs", + "xml_mnem": "vsum4shs", + "opcode_hex": "0x10000648", + "primary_opcode": 4, + "extended_opcode": 1608, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Sum Across Partial (1/4) Signed Half Word Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsum4ubs": { + "page": "vmx/vsum4ubs.md", + "family": "vsum4ubs", + "xml_mnem": "vsum4ubs", + "opcode_hex": "0x10000608", + "primary_opcode": 4, + "extended_opcode": 1544, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Sum Across Partial (1/4) Unsigned Byte Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vsumsws": { + "page": "vmx/vsumsws.md", + "family": "vsumsws", + "xml_mnem": "vsumsws", + "opcode_hex": "0x10000788", + "primary_opcode": 4, + "extended_opcode": 1928, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Sum Across Signed Word Saturate", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + }, + { + "field": "VSCR", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vupkd3d128": { + "page": "vmx128/vupkd3d128.md", + "family": "vupkd3d128", + "xml_mnem": "vupkd3d128", + "opcode_hex": "0x180007F0", + "primary_opcode": 6, + "extended_opcode": 2032, + "form": "VX128_3", + "group": "vmx", + "category": "vmx128", + "description": "Vector128 Unpack D3Dtype", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vupkhp": { + "page": "vmx/vupkhpx.md", + "family": "vupkhpx", + "xml_mnem": "vupkhpx", + "opcode_hex": "0x1000034E", + "primary_opcode": 4, + "extended_opcode": 846, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Unpack High Pixel", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vupkhsb": { + "page": "vmx/vupkhsb.md", + "family": "vupkhsb", + "xml_mnem": "vupkhsb", + "opcode_hex": "0x1000020E", + "primary_opcode": 4, + "extended_opcode": 526, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Unpack High Signed Byte", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vupkhsb128": { + "page": "vmx/vupkhsb.md", + "family": "vupkhsb", + "xml_mnem": "vupkhsb128", + "opcode_hex": "0x18000380", + "primary_opcode": 6, + "extended_opcode": 896, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Unpack High Signed Byte", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vupkhsh": { + "page": "vmx/vupkhsh.md", + "family": "vupkhsh", + "xml_mnem": "vupkhsh", + "opcode_hex": "0x1000024E", + "primary_opcode": 4, + "extended_opcode": 590, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Unpack High Signed Half Word", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vupklp": { + "page": "vmx/vupklpx.md", + "family": "vupklpx", + "xml_mnem": "vupklpx", + "opcode_hex": "0x100003CE", + "primary_opcode": 4, + "extended_opcode": 974, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Unpack Low Pixel", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vupklsb": { + "page": "vmx/vupklsb.md", + "family": "vupklsb", + "xml_mnem": "vupklsb", + "opcode_hex": "0x1000028E", + "primary_opcode": 4, + "extended_opcode": 654, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Unpack Low Signed Byte", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vupklsb128": { + "page": "vmx/vupklsb.md", + "family": "vupklsb", + "xml_mnem": "vupklsb128", + "opcode_hex": "0x180003C0", + "primary_opcode": 6, + "extended_opcode": 960, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Unpack Low Signed Byte", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vupklsh": { + "page": "vmx/vupklsh.md", + "family": "vupklsh", + "xml_mnem": "vupklsh", + "opcode_hex": "0x100002CE", + "primary_opcode": 4, + "extended_opcode": 718, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Unpack Low Signed Half Word", + "sync": false, + "reads": [ + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vxor": { + "page": "vmx/vxor.md", + "family": "vxor", + "xml_mnem": "vxor", + "opcode_hex": "0x100004C4", + "primary_opcode": 4, + "extended_opcode": 1220, + "form": "VX", + "group": "vmx", + "category": "vmx", + "description": "Vector Logical XOR", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "vxor128": { + "page": "vmx/vxor.md", + "family": "vxor", + "xml_mnem": "vxor128", + "opcode_hex": "0x14000310", + "primary_opcode": 5, + "extended_opcode": 784, + "form": "VX128", + "group": "vmx", + "category": "vmx", + "description": "Vector128 Logical XOR", + "sync": false, + "reads": [ + { + "field": "VA", + "conditional": false + }, + { + "field": "VB", + "conditional": false + } + ], + "writes": [ + { + "field": "VD", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "xor": { + "page": "alu/xorx.md", + "family": "xorx", + "xml_mnem": "xorx", + "opcode_hex": "0x7C000278", + "primary_opcode": 31, + "extended_opcode": 316, + "form": "X", + "group": "integer", + "category": "alu", + "description": "XOR", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "RB", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + }, + { + "field": "CR", + "conditional": true + } + ], + "runtime_flags": { + "Rc": true, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "xor.": { + "page": "alu/xorx.md", + "family": "xorx", + "variant_of": "xor", + "xml_mnem": "xorx", + "flags": { + "Rc": 1 + }, + "category": "alu" + }, + "xori": { + "page": "alu/xori.md", + "family": "xori", + "xml_mnem": "xori", + "opcode_hex": "0x68000000", + "primary_opcode": 26, + "extended_opcode": null, + "form": "D", + "group": "integer", + "category": "alu", + "description": "XOR Immediate", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + }, + "xoris": { + "page": "alu/xoris.md", + "family": "xoris", + "xml_mnem": "xoris", + "opcode_hex": "0x6C000000", + "primary_opcode": 27, + "extended_opcode": null, + "form": "D", + "group": "integer", + "category": "alu", + "description": "XOR Immediate Shifted", + "sync": false, + "reads": [ + { + "field": "RS", + "conditional": false + }, + { + "field": "UIMM", + "conditional": false + } + ], + "writes": [ + { + "field": "RA", + "conditional": false + } + ], + "runtime_flags": { + "Rc": false, + "OE": false, + "LK": false, + "Rc_mandatory": false + }, + "is_primary": true, + "flags": {} + } + } +} diff --git a/migration/project-root/ppc-manual/memory/dcbf.md b/migration/project-root/ppc-manual/memory/dcbf.md new file mode 100644 index 0000000..c863c03 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/dcbf.md @@ -0,0 +1,118 @@ +# `dcbf` — Data Cache Block Flush + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0000ac` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `dcbf` | `dcbf` | — | Data Cache Block Flush | + +## Syntax + +```asm +dcbf [RA0], [RB] +``` + +## Encoding + +### `dcbf` — form `X` + +- **Opcode word:** `0x7c0000ac` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `86` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | dcbf: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | dcbf: read | Source GPR. | + +## Register Effects + +### `dcbf` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`dcbf`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="dcbf"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1125`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1125) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:19`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L19) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:773`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L773) + + + +## Special Cases & Edge Conditions + +- **Flush = write-back + invalidate.** If the addressed line is dirty in the data cache, it is written to memory; whether dirty or clean, the line is then removed from the cache. Subsequent loads must refill from memory. +- **Cache line size.** Xenon's L1/L2 lines are **128 bytes**. The hardware ignores the low seven bits of `EA`, so `dcbf RA, RB` flushes the line containing `EA` regardless of where in that line `EA` lies. There is no `dcbf128` variant — the hint is sized to the architectural line. +- **`RA0` semantics.** When `RA = 0`, the base is the literal zero — `dcbf 0, RB` flushes the line containing address `RB`. The instruction has no destination register. +- **Xenia models a no-op.** Xenia-rs's emulator does not maintain a coherent cache model; the decode entry exists but the interpreter typically advances PC without further effect, since target memory is always coherent on the host. This is correct behaviour for an emulator. +- **Unprivileged.** `dcbf` is a problem-state instruction — usable from user code. Storage protection still applies; flushing an unmapped page raises a DSI exception. +- **Pair with `sync`.** Hardware `dcbf` does not by itself impose ordering; software that needs the flushed data visible to other masters (DMA, GPU) issues a [`sync`](sync.md) afterwards. +- **Self-modifying code companion.** When patching code, the recipe is `dcbst` (push dirty data through to memory) → `sync` → [`icbi`](icbi.md) (invalidate I-cache) → [`isync`](isync.md). `dcbf` is the heavier alternative when the writer also wants the line out of D-cache. + +## Related Instructions + +- [`dcbst`](dcbst.md) — write-back without invalidate (lighter than `dcbf`). +- [`dcbi`](dcbi.md) — invalidate without write-back (privileged; loses dirty data). +- [`dcbt`](dcbt.md), [`dcbtst`](dcbtst.md) — touch hints to bring lines in. +- [`dcbz`](dcbz.md), `dcbz128` — allocate-and-zero a line. +- [`icbi`](icbi.md) — instruction-cache invalidate, used together for self-modifying code. +- [`sync`](sync.md) — full memory barrier, typically follows `dcbf`. + +## IBM Reference + +- [AIX 7.3 — `dcbf` (Data Cache Block Flush)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-dcbf-data-cache-block-flush-instruction) +- `PowerISA v2.07B Book II` § "Storage Control Instructions" for cache-coherence semantics. diff --git a/migration/project-root/ppc-manual/memory/dcbi.md b/migration/project-root/ppc-manual/memory/dcbi.md new file mode 100644 index 0000000..e59abc6 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/dcbi.md @@ -0,0 +1,117 @@ +# `dcbi` — Data Cache Block Invalidate + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0003ac` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `dcbi` | `dcbi` | — | Data Cache Block Invalidate | + +## Syntax + +```asm +dcbi [RA0], [RB] +``` + +## Encoding + +### `dcbi` — form `X` + +- **Opcode word:** `0x7c0003ac` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `470` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | dcbi: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | dcbi: read | Source GPR. | + +## Register Effects + +### `dcbi` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`dcbi`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="dcbi"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:19`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L19) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:811`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L811) + + + +## Special Cases & Edge Conditions + +- **Privileged.** Unlike `dcbf` and `dcbst`, `dcbi` is supervisor-only. Executing in problem state raises a privileged-instruction (program) interrupt. Game code never issues `dcbi` directly; only the kernel. +- **Drops dirty data.** The line is removed from cache **without** writing back, so any modifications that have not already been pushed to memory are lost. Used only when the underlying memory is being repurposed (e.g. DMA window flip, page demap) and stale dirty data would be incorrect. +- **Cache line size.** Xenon lines are 128 bytes. The low seven bits of `EA` are ignored — the operation targets the cache line that contains `EA`. +- **`RA0` semantics.** When `RA = 0`, base is literal zero, so `dcbi 0, RB` invalidates the line containing address `RB`. +- **Xenia treats it as a no-op.** With no modelled cache, the emulator decodes and advances PC; memory is already authoritative. +- **Sequencing.** Not synchronising. Pair with [`sync`](sync.md) when invalidation must precede a subsequent load on another thread. +- **Architecturally subsumed by `dcbf` for problem state.** Userspace that wants "this line is no longer valuable" must use [`dcbf`](dcbf.md), accepting the write-back cost. + +## Related Instructions + +- [`dcbf`](dcbf.md) — flush (write-back + invalidate); the unprivileged alternative. +- [`dcbst`](dcbst.md) — write-back without invalidate. +- [`dcbz`](dcbz.md), `dcbz128` — allocate-and-zero a line. +- [`dcbt`](dcbt.md), [`dcbtst`](dcbtst.md) — prefetch hints. +- [`icbi`](icbi.md) — instruction-cache analog (also problem-state, not privileged). +- [`sync`](sync.md), [`isync`](isync.md) — pair with cache-control ops for ordering. + +## IBM Reference + +- [AIX 7.3 — `dcbi` (Data Cache Block Invalidate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-dcbi-data-cache-block-invalidate-instruction) +- `PowerISA v2.07B Book II` for the privilege model and cache-coherence rules. diff --git a/migration/project-root/ppc-manual/memory/dcbst.md b/migration/project-root/ppc-manual/memory/dcbst.md new file mode 100644 index 0000000..05126a8 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/dcbst.md @@ -0,0 +1,118 @@ +# `dcbst` — Data Cache Block Store + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00006c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `dcbst` | `dcbst` | — | Data Cache Block Store | + +## Syntax + +```asm +dcbst [RA0], [RB] +``` + +## Encoding + +### `dcbst` — form `X` + +- **Opcode word:** `0x7c00006c` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `54` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | dcbst: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | dcbst: read | Source GPR. | + +## Register Effects + +### `dcbst` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`dcbst`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="dcbst"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1134`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1134) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:19`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L19) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:765`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L765) + + + +## Special Cases & Edge Conditions + +- **Write-through, no invalidate.** If the addressed line is dirty, it is written back to memory; the line itself remains in the cache (clean afterwards). Lighter than `dcbf` — the cache stays warm. +- **Cache line size.** Xenon's line is 128 bytes; the low seven bits of `EA` are ignored. There is no `dcbst128`; the operation is sized to the architectural line. +- **`RA0` semantics.** `RA = 0` selects literal zero as base. `dcbst 0, RB` pushes the line containing address `RB` to memory. +- **Self-modifying code stage 1.** The canonical "patch then run" sequence is `stw` (modify) → `dcbst` (push dirty data to memory) → [`sync`](sync.md) → [`icbi`](icbi.md) (invalidate I-cache for the same address) → [`isync`](isync.md). `dcbst` is preferred over `dcbf` here because it leaves the data in D-cache for any subsequent normal reads. +- **DMA hand-off.** Used before initiating a GPU or DMA read of a buffer the CPU has just written, to ensure memory holds the latest data. +- **Unprivileged.** Available from problem state. +- **Xenia models as no-op.** No cache state is simulated; PC advances and memory is already authoritative. + +## Related Instructions + +- [`dcbf`](dcbf.md) — flush + invalidate (heavier alternative). +- [`dcbi`](dcbi.md) — invalidate without write-back (privileged). +- [`dcbz`](dcbz.md), `dcbz128` — allocate-and-zero. +- [`dcbt`](dcbt.md), [`dcbtst`](dcbtst.md) — prefetch hints. +- [`icbi`](icbi.md) — instruction-cache invalidate, sequenced after `dcbst` in self-modifying-code recipes. +- [`sync`](sync.md), [`isync`](isync.md) — ordering primitives that bracket cache control. + +## IBM Reference + +- [AIX 7.3 — `dcbst` (Data Cache Block Store)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-dcbst-data-cache-block-store-instruction) +- `PowerISA v2.07B Book II` § "Storage Control Instructions". diff --git a/migration/project-root/ppc-manual/memory/dcbt.md b/migration/project-root/ppc-manual/memory/dcbt.md new file mode 100644 index 0000000..d3085df --- /dev/null +++ b/migration/project-root/ppc-manual/memory/dcbt.md @@ -0,0 +1,117 @@ +# `dcbt` — Data Cache Block Touch + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00022c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `dcbt` | `dcbt` | — | Data Cache Block Touch | + +## Syntax + +```asm +dcbt [RA0], [RB] +``` + +## Encoding + +### `dcbt` — form `X` + +- **Opcode word:** `0x7c00022c` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `278` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | dcbt: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | dcbt: read | Source GPR. | + +## Register Effects + +### `dcbt` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`dcbt`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="dcbt"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1142`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1142) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:19`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L19) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:794`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L794) + + + +## Special Cases & Edge Conditions + +- **Hint, not a guarantee.** `dcbt` requests that the cache line containing `EA` be brought into L1 in anticipation of a future load. The processor is free to ignore the hint (e.g. under cache pressure or for non-cacheable storage). +- **Read-intent prefetch.** Pair-mate of [`dcbtst`](dcbtst.md) (which signals write intent and may prefer an exclusive cache state). Use `dcbt` when the next access is a read. +- **No exception on bad address.** Unlike a real load, `dcbt` to an unmapped or protected page does not raise; the hint is silently dropped. This makes it safe to "speculatively" prefetch one line past the end of a buffer. +- **Cache line size.** Xenon line is 128 bytes; low seven bits of `EA` are ignored. +- **`RA0` semantics.** `RA = 0` selects literal zero — `dcbt 0, RB` prefetches the line containing address `RB`. +- **Stream-engine hints.** The Xenon supports up to four hardware data-streams set up by sequences of `dcbt` with a stride; refer to the XDK for the stream-engine encoding (uses bits ignored by the architectural decode). +- **Xenia treats as no-op.** Hints have no observable effect under the emulated memory model. +- **Unprivileged.** Always available. + +## Related Instructions + +- [`dcbtst`](dcbtst.md) — write-intent prefetch. +- [`dcbf`](dcbf.md), [`dcbst`](dcbst.md), [`dcbi`](dcbi.md) — push / invalidate counterparts. +- [`dcbz`](dcbz.md), `dcbz128` — allocate-and-zero (a stronger "I want this line" signal). +- [`icbi`](icbi.md) — instruction-cache analog (no instruction-cache prefetch in PowerPC). + +## IBM Reference + +- [AIX 7.3 — `dcbt` (Data Cache Block Touch)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-dcbt-data-cache-block-touch-instruction) +- `PowerISA v2.07B Book II` § "Storage Control Instructions" for hint semantics. diff --git a/migration/project-root/ppc-manual/memory/dcbtst.md b/migration/project-root/ppc-manual/memory/dcbtst.md new file mode 100644 index 0000000..92e1e46 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/dcbtst.md @@ -0,0 +1,116 @@ +# `dcbtst` — Data Cache Block Touch for Store + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0001ec` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `dcbtst` | `dcbtst` | — | Data Cache Block Touch for Store | + +## Syntax + +```asm +dcbtst [RA0], [RB] +``` + +## Encoding + +### `dcbtst` — form `X` + +- **Opcode word:** `0x7c0001ec` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `246` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | dcbtst: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | dcbtst: read | Source GPR. | + +## Register Effects + +### `dcbtst` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`dcbtst`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="dcbtst"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1150`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1150) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:19`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L19) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:792`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L792) + + + +## Special Cases & Edge Conditions + +- **Hint, not a guarantee.** `dcbtst` requests the addressed cache line be brought into L1 in anticipation of a future **store**. Hardware may treat this as a hint to fetch in an exclusive coherence state to avoid a follow-up upgrade. +- **Pair of [`dcbt`](dcbt.md).** `dcbt` signals read intent; `dcbtst` signals write intent. Use `dcbtst` before a planned store loop to avoid stalling on cache-line acquisition. +- **No exception on bad address.** Like `dcbt`, prefetch hints to unmapped or protected pages are silently dropped — no DSI exception. Safe to issue speculatively. +- **Cache line size.** Xenon line is 128 bytes; the low seven bits of `EA` are ignored. +- **`RA0` semantics.** `RA = 0` selects literal zero — `dcbtst 0, RB` prefetches the line containing address `RB`. +- **Often replaced by `dcbz128`.** When code knows it will write the **entire** line, `dcbz128` is preferable: it allocates the line and zeros it without reading from memory at all, beating `dcbtst` + first-store. +- **Xenia treats as no-op.** Hints have no observable effect under the emulated memory model. + +## Related Instructions + +- [`dcbt`](dcbt.md) — read-intent prefetch. +- [`dcbz`](dcbz.md), `dcbz128` — allocate-and-zero (skip the read entirely when writing the whole line). +- [`dcbf`](dcbf.md), [`dcbst`](dcbst.md), [`dcbi`](dcbi.md) — push / invalidate counterparts. +- [`icbi`](icbi.md) — instruction-cache invalidate. + +## IBM Reference + +- [AIX 7.3 — `dcbtst` (Data Cache Block Touch for Store)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-dcbtst-data-cache-block-touch-store-instruction) +- `PowerISA v2.07B Book II` § "Storage Control Instructions". diff --git a/migration/project-root/ppc-manual/memory/dcbz.md b/migration/project-root/ppc-manual/memory/dcbz.md new file mode 100644 index 0000000..c879da6 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/dcbz.md @@ -0,0 +1,185 @@ +# `dcbz` — Data Cache Block Clear to Zero + +> **Category:** [Memory](../categories/memory.md) · **Form:** [DCBZ](../forms/DCBZ.md) · **Opcode:** `0x7c0007ec` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `dcbz` | `dcbz` | — | Data Cache Block Clear to Zero | +| `dcbz128` | `dcbz128` | — | Data Cache Block Clear to Zero 128 | + +## Syntax + +```asm +dcbz [RA0], [RB] +dcbz128 [RA0], [RB] +``` + +## Encoding + +### `dcbz` — form `DCBZ` + +- **Opcode word:** `0x7c0007ec` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `1014` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `—` | reserved | +| 11–15 | `RA` | base register (0 ⇒ literal 0) | +| 16–20 | `RB` | offset register | +| 21–30 | `XO` | extended opcode (1014 for dcbz / 1010 for dcbz128) | +| 31 | `—` | reserved | + +### `dcbz128` — form `DCBZ` + +- **Opcode word:** `0x7c2007ec` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `1014` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (31) | +| 6–10 | `—` | reserved | +| 11–15 | `RA` | base register (0 ⇒ literal 0) | +| 16–20 | `RB` | offset register | +| 21–30 | `XO` | extended opcode (1014 for dcbz / 1010 for dcbz128) | +| 31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | dcbz: read; dcbz128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | dcbz: read; dcbz128: read | Source GPR. | + +## Register Effects + +### `dcbz` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +### `dcbz128` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`dcbz`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="dcbz"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1159`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1159) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:19`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L19) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:886`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L886) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1694-1705`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1694-L1705) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::dcbz => { + // Zero 32 bytes at effective address + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = (ea.wrapping_add(ctx.gpr[instr.rb()]) as u32) & !31; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + for i in 0..8 { + mem.write_u32(ea + i * 4, 0); + } + ctx.pc += 4; + } +``` +
+ +**`dcbz128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="dcbz128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1171`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1171) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:19`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L19) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:887`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L887) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1706-1717`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1706-L1717) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::dcbz128 => { + // Zero 128 bytes + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = (ea.wrapping_add(ctx.gpr[instr.rb()]) as u32) & !127; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + for i in 0..32 { + mem.write_u32(ea + i * 4, 0); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Cache-line size mismatch.** Stock PowerPC `dcbz` zeroes one architectural cache line — 32 bytes on classic POWER, but the **Xenon's L1 line is 128 bytes**. Microsoft added `dcbz128` (encoded with bit-9 set so `RT` field reads as `1`) to clear a true Xenon line in one instruction. Most Xbox 360 code therefore emits `dcbz128`; a stray `dcbz` only zeroes 32 bytes and silently leaves the rest of the line uncleared. +- **Alignment is forced via mask.** The effective address is masked by `~31` (`dcbz`) or `~127` (`dcbz128`) before writing — the low bits are dropped, not validated. Calling `dcbz r0, r3` with `r3 = 0x10037` writes zeros to `0x10000..0x1007F`, not `0x10037..0x100B6`. +- **No memory read; pure write.** Real hardware allocates the line in cache and may skip a read-from-memory fill ("cache-line zero" optimisation). Xenia simulates the architectural effect — 32 (or 128) bytes of zero in target memory — without modelling cache state. +- **`RA0` semantics.** `RA = 0` selects literal zero as the base, so `dcbz128 0, RB` zeros the line containing address `RB`. The update form does not exist for cache-control instructions. +- **Block-fill idiom.** Compilers and hand-written copy loops pair `dcbz128` with `stvx` / `stw` sequences to avoid the cache-line read-allocate that a cold store would trigger. Skipping the read is the entire point. +- **Privilege.** `dcbz` is unprivileged (problem-state); does not require supervisor mode. It can fault on protection or unmapped memory like an ordinary store. +- **Sequencing.** Not synchronising. Pair with [`sync`](sync.md) / [`lwsync`](sync.md) when the zeros must be visible before subsequent loads on another thread. + +## Related Instructions + +- [`dcbf`](dcbf.md) — flush a line back to memory. +- [`dcbst`](dcbst.md) — store-through (write-back without invalidate). +- [`dcbi`](dcbi.md) — invalidate (privileged on most cores). +- [`dcbt`](dcbt.md), [`dcbtst`](dcbtst.md) — touch / touch-for-store hints. +- [`icbi`](icbi.md) — instruction-cache invalidate (companion to data-cache control). +- [`stvx`](stvx.md), [`stw`](stw.md) — typical pair-mates in block-fill loops. + +## IBM Reference + +- [AIX 7.3 — `dcbz` (Data Cache Block Set to Zero)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-dcbz-data-cache-block-set-zero-instruction) +- Microsoft Xbox 360 XDK / `Xenon Programming Guide` — for `dcbz128` specifics; `PowerISA v2.07B Book II` § "Storage Control Instructions" for the architectural baseline. diff --git a/migration/project-root/ppc-manual/memory/icbi.md b/migration/project-root/ppc-manual/memory/icbi.md new file mode 100644 index 0000000..eb8105b --- /dev/null +++ b/migration/project-root/ppc-manual/memory/icbi.md @@ -0,0 +1,117 @@ +# `icbi` — Instruction Cache Block Invalidate + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0007ac` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `icbi` | `icbi` | — | Instruction Cache Block Invalidate | + +## Syntax + +```asm +icbi [RA], [RB] +``` + +## Encoding + +### `icbi` — form `X` + +- **Opcode word:** `0x7c0007ac` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `982` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | icbi: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | icbi: read | Source GPR. | + +## Register Effects + +### `icbi` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`icbi`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="icbi"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1183`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1183) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:32`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L32) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:850`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L850) + + + +## Special Cases & Edge Conditions + +- **Self-modifying code primitive.** Removes the line containing `EA` from the instruction cache so a subsequent fetch reads from memory. Required after writing new instructions because the I-cache is not coherent with the D-cache or with main memory. +- **Standard recipe.** The full sequence is: `stw` (write new code) → [`dcbst`](dcbst.md) (push dirty data through D-cache to memory) → [`sync`](sync.md) (wait for memory) → `icbi` (drop stale I-cache line) → [`isync`](isync.md) (drain prefetch / refetch). Skipping any of these can leave the CPU executing stale instructions. +- **Cache line size.** Xenon's I-cache line is 128 bytes; the low seven bits of `EA` are ignored. +- **`RA0` semantics.** When `RA = 0`, base is the literal zero. `icbi 0, RB` invalidates the line containing address `RB`. +- **Unprivileged.** `icbi` is problem-state, unlike its data-side cousin [`dcbi`](dcbi.md). +- **No exception on bad address.** Treated as a hint at the hardware level — invalidating an absent line is harmless. +- **Per-thread effect.** On the multithreaded Xenon core, `icbi` propagates across hardware threads sharing the same L1 I-cache; cross-core invalidation requires bus broadcast handled implicitly by the cache coherence protocol. +- **Xenia models as no-op.** No I-cache is simulated; rebuilds of generated code (when applicable) are triggered by the JIT cache-watcher, not by `icbi` itself. + +## Related Instructions + +- [`dcbst`](dcbst.md) — D-cache write-back (paired step before `icbi`). +- [`dcbf`](dcbf.md), [`dcbi`](dcbi.md) — D-cache push / invalidate. +- [`isync`](isync.md) — instruction-stream barrier (paired step after `icbi`). +- [`sync`](sync.md) — full memory barrier between `dcbst` and `icbi`. + +## IBM Reference + +- [AIX 7.3 — `icbi` (Instruction Cache Block Invalidate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-icbi-instruction-cache-block-invalidate-instruction) +- `PowerISA v2.07B Book II` § "Instruction Storage" for the canonical self-modifying-code sequence. diff --git a/migration/project-root/ppc-manual/memory/lbz.md b/migration/project-root/ppc-manual/memory/lbz.md new file mode 100644 index 0000000..1242cb6 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lbz.md @@ -0,0 +1,248 @@ +# `lbz` — Load Byte and Zero + +> **Category:** [Memory](../categories/memory.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x88000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lbz` | `lbz` | — | Load Byte and Zero | +| `lbzu` | `lbzu` | — | Load Byte and Zero with Update | +| `lbzux` | `lbzux` | — | Load Byte and Zero with Update Indexed | +| `lbzx` | `lbzx` | — | Load Byte and Zero Indexed | + +## Syntax + +```asm +lbz [RD], [d]([RA0]) +lbzu [RD], [d]([RA]) +lbzux [RD], [RA], [RB] +lbzx [RD], [RA0], [RB] +``` + +## Encoding + +### `lbz` — form `D` + +- **Opcode word:** `0x88000000` +- **Primary opcode (bits 0–5):** `34` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `lbzu` — form `D` + +- **Opcode word:** `0x8c000000` +- **Primary opcode (bits 0–5):** `35` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `lbzux` — form `X` + +- **Opcode word:** `0x7c0000ee` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `119` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lbzx` — form `X` + +- **Opcode word:** `0x7c0000ae` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `87` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lbz: read; lbzx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `d` | lbz: read; lbzu: read | 16-bit signed displacement (`d`) added to the base address register. | +| `RD` | lbz: write; lbzu: write; lbzux: write; lbzx: write | Destination GPR. | +| `RA` | lbzu: read; lbzu: write; lbzux: read; lbzux: write | Source GPR (`r0`–`r31`). | +| `RB` | lbzux: read; lbzx: read | Source GPR. | + +## Register Effects + +### `lbz` + +- **Reads (always):** `RA0`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +### `lbzu` + +- **Reads (always):** `RA`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `RA` +- **Writes (conditional):** _none_ + +### `lbzux` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `RA` +- **Writes (conditional):** _none_ + +### `lbzx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- (RA|0) + EXTS(d) +RT <- 0x00000000_000000_00 || MEM(EA, 1) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lbz`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lbz"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:72`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L72) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:34`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L34) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:357`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L357) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1024-1029`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1024-L1029) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lbz => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(instr.d() as i64 as u64) as u32; + ctx.gpr[instr.rd()] = mem.read_u8(ea) as u64; + ctx.pc += 4; + } +``` +
+ +**`lbzu`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lbzu"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:92`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L92) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:34`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L34) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:358`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L358) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1030-1035`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1030-L1035) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lbzu => { + let ea = ctx.gpr[instr.ra()].wrapping_add(instr.d() as i64 as u64) as u32; + ctx.gpr[instr.rd()] = mem.read_u8(ea) as u64; + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`lbzux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lbzux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:104`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L104) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:34`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L34) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:776`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L776) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1042-1047`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1042-L1047) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lbzux => { + let ea = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.gpr[instr.rd()] = mem.read_u8(ea) as u64; + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`lbzx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lbzx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:115`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L115) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:34`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L34) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:774`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L774) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1036-1041`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1036-L1041) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lbzx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.gpr[instr.rd()] = mem.read_u8(ea) as u64; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single-byte read.** The smallest scalar load. No endian concerns at the byte level — `MEM(EA, 1)` returns the literal byte at address `EA`, regardless of host or target byte order. +- **Zero-extension to 64 bits.** The high 56 bits of `RT` become zero. Use [`lha`](lha.md) / [`lhax`](lha.md) family for sign-extending byte-equivalent semantics; there is no PowerPC "load byte sign-extended" — you must `lbz` then `extsb` (or use `lha` on a half). +- **`RA0` (non-update forms).** When `RA = 0` in `lbz` / `lbzx`, the base is the literal zero, so `lbz RT, 0x4000(0)` reads from absolute address `0x4000`. Update forms `lbzu` / `lbzux` invoke `RA = 0` (and `RA = RT`) as invalid forms; xenia's interpreter does not check, so well-formed compiler output is assumed. +- **Update-form post-write.** `lbzu` / `lbzux` write the computed `EA` back to `RA` after the load; the snapshot first reads, then assigns `RA ← EA`, matching IBM's "the load and update happen as one operation" wording. +- **No alignment requirement.** A byte load is intrinsically aligned. Xenon does not raise alignment exceptions for any byte access. +- **Common in string and table-lookup code.** Most uses are character-string scans, jump-table dispatches, and packed-bool reads. Compilers also use `lbz` to materialise small immediate constants stored in `.rodata`. + +## Related Instructions + +- [`lhz`](lhz.md), [`lwz`](lwz.md), [`ld`](ld.md) — wider zero-extending loads in the same family. +- [`lha`](lha.md), [`lwa`](lwa.md) — sign-extending siblings (no `lba` exists; use `lbz` + `extsb`). +- [`stb`](stb.md), [`stbu`](stb.md), [`stbx`](stb.md), [`stbux`](stb.md) — the corresponding stores. +- [`lwbrx`](lwbrx.md), [`lhbrx`](lhbrx.md) — byte-reversed multi-byte loads (no byte-equivalent needed). +- [`lmw`](lmw.md), [`lswi`](lswi.md), [`lswx`](lswx.md) — multi-word / string loads for bulk transfer. + +## IBM Reference + +- [AIX 7.3 — `lbz` (Load Byte and Zero)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lbz-load-byte-zero-instruction) +- [AIX 7.3 — `lbzu` (Load Byte and Zero with Update)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lbzu-load-byte-zero-update-instruction) diff --git a/migration/project-root/ppc-manual/memory/ld.md b/migration/project-root/ppc-manual/memory/ld.md new file mode 100644 index 0000000..129e7e0 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/ld.md @@ -0,0 +1,251 @@ +# `ld` — Load Doubleword + +> **Category:** [Memory](../categories/memory.md) · **Form:** [DS](../forms/DS.md) · **Opcode:** `0xe8000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `ld` | `ld` | — | Load Doubleword | +| `ldu` | `ldu` | — | Load Doubleword with Update | +| `ldux` | `ldux` | — | Load Doubleword with Update Indexed | +| `ldx` | `ldx` | — | Load Doubleword Indexed | + +## Syntax + +```asm +ld [RD], [ds]([RA0]) +ldu [RD], [ds]([RA]) +ldux [RD], [RA], [RB] +ldx [RD], [RA0], [RB] +``` + +## Encoding + +### `ld` — form `DS` + +- **Opcode word:** `0xe8000000` +- **Primary opcode (bits 0–5):** `58` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0) | +| 16–29 | `DS` | 14-bit signed word-scaled displacement | +| 30–31 | `XO` | extended opcode | + +### `ldu` — form `DS` + +- **Opcode word:** `0xe8000001` +- **Primary opcode (bits 0–5):** `58` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0) | +| 16–29 | `DS` | 14-bit signed word-scaled displacement | +| 30–31 | `XO` | extended opcode | + +### `ldux` — form `X` + +- **Opcode word:** `0x7c00006a` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `53` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `ldx` — form `X` + +- **Opcode word:** `0x7c00002a` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `21` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | ld: read; ldx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `ds` | ld: read; ldu: read | 14-bit signed word-aligned displacement (`DS << 2`). | +| `RD` | ld: write; ldu: write; ldux: write; ldx: write | Destination GPR. | +| `RA` | ldu: read; ldu: write; ldux: read; ldux: write | Source GPR (`r0`–`r31`). | +| `RB` | ldux: read; ldx: read | Source GPR. | + +## Register Effects + +### `ld` + +- **Reads (always):** `RA0`, `ds` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +### `ldu` + +- **Reads (always):** `RA`, `ds` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `RA` +- **Writes (conditional):** _none_ + +### `ldux` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `RA` +- **Writes (conditional):** _none_ + +### `ldx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- (RA|0) + EXTS(ds || 0b00) +RT <- MEM(EA, 8) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`ld`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="ld"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:347`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L347) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:36`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L36) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:380`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L380) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1096-1101`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1096-L1101) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::ld => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(instr.ds() as i64 as u64) as u32; + ctx.gpr[instr.rd()] = mem.read_u64(ea); + ctx.pc += 4; + } +``` +
+ +**`ldu`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="ldu"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:367`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L367) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:36`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L36) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:381`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L381) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1126-1131`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1126-L1131) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::ldu => { + let ea = ctx.gpr[instr.ra()].wrapping_add(instr.ds() as i64 as u64) as u32; + ctx.gpr[instr.rd()] = mem.read_u64(ea); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`ldux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="ldux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:378`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L378) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:36`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L36) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:764`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L764) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1132-1137`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1132-L1137) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::ldux => { + let ea = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.gpr[instr.rd()] = mem.read_u64(ea); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`ldx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="ldx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:389`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L389) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:36`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L36) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:755`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L755) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1102-1107`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1102-L1107) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::ldx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.gpr[instr.rd()] = mem.read_u64(ea); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **DS-form, not D-form.** The displacement is 14 bits scaled by 4 (`EXTS(ds || 0b00)`), giving a signed range of ±32 KiB in 4-byte steps. Bits 30–31 are the extended opcode used to distinguish `ld` (XO=0) from `ldu` (XO=1). The assembler accepts a normal byte displacement and verifies divisibility by 4. +- **Big-endian read.** The 64 bits at `EA..EA+7` form the loaded value, most-significant byte first. Xenia-rs's `mem.read_u64` returns the host-native value of that big-endian doubleword. +- **No zero/sign-extension question.** `ld` already fills the entire 64-bit register; there is no `lda` (load doubleword algebraic) — the doubleword is the architectural maximum. +- **`RA0` (non-update forms).** `RA = 0` in `ld` and `ldx` means base is literal zero. `ld RT, 0x100(0)` reads from absolute `0x100`. +- **Update-form invalid forms.** `ldu` / `ldux` invoke "RA = 0" and "RA = RT" as invalid forms. AIX docs say results are undefined; xenia performs the read first, then writes back `RA ← EA`, which would silently destroy the loaded value if `RA == RT`. +- **Alignment.** Xenon does not enforce doubleword alignment for `ld` itself — unaligned 8-byte loads are tolerated. However, real POWER cores may take an alignment exception on some implementations; portable code keeps doublewords 8-byte aligned. +- **64-bit pointer / counter loads.** Although Xbox 360 user code is 32-bit, kernel structures and TOC entries are doublewords; `ld` is the standard load for them. + +## Related Instructions + +- [`lwz`](lwz.md), [`lhz`](lhz.md), [`lbz`](lbz.md) — narrower zero-extending loads. +- [`lwa`](lwa.md), [`lha`](lha.md) — sign-extending loads (no `lda` exists; `ld` already fills the register). +- [`ldbrx`](ldbrx.md) — byte-reversed doubleword load. +- [`ldarx`](ldarx.md) / [`stdcx`](stdcx.md) — load-reserve / store-conditional doubleword pair. +- [`std`](std.md), [`stdu`](std.md), [`stdx`](std.md), [`stdux`](std.md) — corresponding stores. + +## IBM Reference + +- [AIX 7.3 — `ld` (Load Doubleword)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-ld-load-doubleword-instruction) +- [AIX 7.3 — `ldu` / `ldux` / `ldx`](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-ldu-load-doubleword-update-instruction) diff --git a/migration/project-root/ppc-manual/memory/ldarx.md b/migration/project-root/ppc-manual/memory/ldarx.md new file mode 100644 index 0000000..f70716a --- /dev/null +++ b/migration/project-root/ppc-manual/memory/ldarx.md @@ -0,0 +1,138 @@ +# `ldarx` — Load Doubleword and Reserve Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0000a8` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `ldarx` | `ldarx` | — | Load Doubleword and Reserve Indexed | + +## Syntax + +```asm +ldarx [RD], [RA0], [RB] +``` + +## Encoding + +### `ldarx` — form `X` + +- **Opcode word:** `0x7c0000a8` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `84` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | ldarx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | ldarx: read | Source GPR. | +| `RD` | ldarx: write | Destination GPR. | + +## Register Effects + +### `ldarx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`ldarx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="ldarx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:765`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L765) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:36`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L36) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:772`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L772) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4559-4573`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4559-L4573) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::ldarx => { + let ea = ea_indexed(ctx, instr); + let val = mem.read_u64(ea); + ctx.gpr[instr.rd()] = val; + ctx.reserved_line = ea & !RESERVATION_MASK; + ctx.reserved_val = val; + ctx.has_reservation = true; + ctx.reservation_width = 8; // PPCBUG-151: doubleword reservation + if let Some(t) = &ctx.reservation_table { + if t.is_enabled() { + ctx.reserved_generation = t.reserve(ea, ctx.hw_id); + } + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Reservation set.** Loads the doubleword at `EA` and atomically establishes a *reservation* on that address. A subsequent [`stdcx`](stdcx.md) at the same address completes only if the reservation is still valid. Together they form a standard load-linked / store-conditional pair for lock-free updates. +- **One reservation per thread.** Xenia tracks `reserved_addr` / `reserved_val` / `has_reservation` per-context (see snapshot). Hardware behaves the same: each hardware thread holds at most one reservation at a time. A new `ldarx` (or `lwarx`) discards the prior reservation. +- **Granule.** Architecturally the reservation covers a single naturally-aligned doubleword (8 bytes). On Xenon the practical reservation granule is one **cache line** (128 bytes) — any store to that line by another agent loses the reservation. Xenia simplifies to per-address tracking. +- **Alignment requirement.** `EA` must be 8-byte aligned. An unaligned `ldarx` raises an alignment exception on hardware. Xenia does not check; pass aligned addresses. +- **`RA0` semantics.** When `RA = 0`, base is literal zero — `ldarx RT, 0, RB` reads at exact `RB`. Used in synthetic-zero atomic-init idioms, but rare. +- **Reservation-loss events.** Any exception, context switch, or store by another thread to the reserved line clears the reservation. Application code must treat the `stdcx` failure as a normal retry condition, not as an error. +- **Pair atomically.** Code must be `ldarx ... do work ... stdcx.` with no intervening loads or stores that could be re-ordered. Optionally fence with [`lwsync`](sync.md) inside the loop. The conditional store sets `CR0[EQ]` to report success. + +## Related Instructions + +- [`stdcx`](stdcx.md) — store-conditional doubleword (the matching half of the pair). +- [`lwarx`](lwarx.md) / [`stwcx`](stwcx.md) — 32-bit reservation pair. +- [`ld`](ld.md), [`ldx`](ld.md) — non-reserving doubleword loads. +- [`sync`](sync.md), [`lwsync`](sync.md) — barriers commonly placed around reservation pairs. + +## IBM Reference + +- [AIX 7.3 — `ldarx` (Load Doubleword and Reserve Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-ldarx-load-double-word-reserve-indexed-instruction) +- `PowerISA v2.07B Book II` § "Atomic Update Primitives" for full reservation semantics and granule rules. diff --git a/migration/project-root/ppc-manual/memory/ldbrx.md b/migration/project-root/ppc-manual/memory/ldbrx.md new file mode 100644 index 0000000..6c88dfc --- /dev/null +++ b/migration/project-root/ppc-manual/memory/ldbrx.md @@ -0,0 +1,128 @@ +# `ldbrx` — Load Doubleword Byte-Reverse Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000428` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `ldbrx` | `ldbrx` | — | Load Doubleword Byte-Reverse Indexed | + +## Syntax + +```asm +ldbrx [RD], [RA0], [RB] +``` + +## Encoding + +### `ldbrx` — form `X` + +- **Opcode word:** `0x7c000428` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `532` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | ldbrx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | ldbrx: read | Source GPR. | +| `RD` | ldbrx: write | Destination GPR. | + +## Register Effects + +### `ldbrx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`ldbrx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="ldbrx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:654`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L654) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:36`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L36) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:816`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L816) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4627-4631`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4627-L4631) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::ldbrx => { + let ea = ea_indexed(ctx, instr); + ctx.gpr[instr.rd()] = mem.read_u64(ea).swap_bytes(); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Reads little-endian.** `ldbrx` loads 8 bytes and reverses byte order before placing them in `RT`. With Xenon's PowerPC big-endian world view, the architectural effect is "load a little-endian doubleword as if it were big-endian" — useful when consuming network buffers, file headers (PNG IHDR, ZIP CRC32, etc.), or PC-side data structures that store little-endian. +- **Implementation detail.** The xenia snapshot calls `mem.read_u64(ea).swap_bytes()`. `read_u64` already returns the host-native value of the big-endian doubleword at `EA`; `swap_bytes` then flips it, giving the little-endian interpretation. Equivalent to four sequential `lbz` plus shifts, but issued as one micro-op. +- **No update form, X-form only.** PowerPC byte-reverse loads come in indexed form only (no `ldbrxu` or DS-form). `EA = (RA|0) + RB`. To increment a pointer, fold the increment into `RB` or use a separate `addi`. +- **`RA0` semantics.** When `RA = 0`, base is the literal zero; `ldbrx RT, 0, RB` reads at exact `RB`. +- **Alignment.** Like the rest of the byte-reverse family, `ldbrx` does **not** require natural alignment on hardware; the load is done as eight byte reads internally. Xenon may take an alignment exception on cache-inhibited storage. +- **No corresponding sign-extension.** The output is the literal byte-reversed bit pattern; it occupies the full 64-bit register. Use shifts or `extsw`/`extsh` afterwards if a sign-extended narrower datum is desired. +- **Pair with [`stdbrx`](stdbrx.md).** The store side performs the inverse: takes the GPR value, reverses, writes 8 bytes. + +## Related Instructions + +- [`stdbrx`](stdbrx.md) — store doubleword byte-reverse indexed. +- [`lwbrx`](lwbrx.md), [`lhbrx`](lhbrx.md) — narrower byte-reverse loads (word, halfword). +- [`stwbrx`](stwbrx.md), [`sthbrx`](sthbrx.md) — narrower byte-reverse stores. +- [`ld`](ld.md), [`ldx`](ld.md) — non-reversing doubleword loads. + +## IBM Reference + +- [AIX 7.3 — `ldbrx` (Load Doubleword Byte-Reverse Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-ldbrx-load-double-word-byte-reverse-indexed-instruction) +- `PowerISA v2.07B Book II` § "Byte-Reverse Storage Access". diff --git a/migration/project-root/ppc-manual/memory/lfd.md b/migration/project-root/ppc-manual/memory/lfd.md new file mode 100644 index 0000000..e6a87b1 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lfd.md @@ -0,0 +1,248 @@ +# `lfd` — Load Floating-Point Double + +> **Category:** [Memory](../categories/memory.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0xc8000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lfd` | `lfd` | — | Load Floating-Point Double | +| `lfdu` | `lfdu` | — | Load Floating-Point Double with Update | +| `lfdux` | `lfdux` | — | Load Floating-Point Double with Update Indexed | +| `lfdx` | `lfdx` | — | Load Floating-Point Double Indexed | + +## Syntax + +```asm +lfd [FD], [d]([RA0]) +lfdu [FD], [d]([RA]) +lfdux [FD], [RA], [RB] +lfdx [FD], [RA0], [RB] +``` + +## Encoding + +### `lfd` — form `D` + +- **Opcode word:** `0xc8000000` +- **Primary opcode (bits 0–5):** `50` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `lfdu` — form `D` + +- **Opcode word:** `0xcc000000` +- **Primary opcode (bits 0–5):** `51` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `lfdux` — form `X` + +- **Opcode word:** `0x7c0004ee` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `631` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lfdx` — form `X` + +- **Opcode word:** `0x7c0004ae` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `599` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lfd: read; lfdx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `d` | lfd: read; lfdu: read | 16-bit signed displacement (`d`) added to the base address register. | +| `FD` | lfd: write; lfdu: write; lfdux: write; lfdx: write | Destination floating-point register. | +| `RA` | lfdu: read; lfdu: write; lfdux: read; lfdux: write | Source GPR (`r0`–`r31`). | +| `RB` | lfdux: read; lfdx: read | Source GPR. | + +## Register Effects + +### `lfd` + +- **Reads (always):** `RA0`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD` +- **Writes (conditional):** _none_ + +### `lfdu` + +- **Reads (always):** `RA`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `RA` +- **Writes (conditional):** _none_ + +### `lfdux` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `RA` +- **Writes (conditional):** _none_ + +### `lfdx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- (RA|0) + EXTS(d) +FRT <- MEM(EA, 8) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lfd`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lfd"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:912`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L912) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:38`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L38) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:373`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L373) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1152-1157`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1152-L1157) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lfd => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(instr.d() as i64 as u64) as u32; + ctx.fpr[instr.rd()] = mem.read_f64(ea); + ctx.pc += 4; + } +``` +
+ +**`lfdu`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lfdu"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:925`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L925) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:38`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L38) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:374`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L374) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1176-1181`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1176-L1181) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lfdu => { + let ea = ctx.gpr[instr.ra()].wrapping_add(instr.d() as i64 as u64) as u32; + ctx.fpr[instr.rd()] = mem.read_f64(ea); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`lfdux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lfdux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:936`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L936) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:38`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L38) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:827`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L827) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1182-1187`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1182-L1187) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lfdux => { + let ea = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.fpr[instr.rd()] = mem.read_f64(ea); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`lfdx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lfdx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:947`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L947) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:38`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L38) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:826`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L826) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1158-1163`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1158-L1163) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lfdx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.fpr[instr.rd()] = mem.read_f64(ea); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Bit-exact double load.** Reads 8 bytes and places them directly into `FRT` as IEEE-754 binary64. No format conversion is performed (contrast `lfs`, which expands single→double). +- **No FPSCR side effects.** `lfd` cannot raise IEEE exceptions: it neither rounds nor inspects the value. A signalling NaN read this way stays a signalling NaN until it is consumed by an arithmetic op. +- **`RA0` semantics.** In the non-update forms (`lfd`, `lfdx`), `RA = 0` selects literal zero — `lfd FT, 0(0)` loads from absolute address 0. Update forms `lfdu` / `lfdux` invoke `RA = 0` and `RA = RT` (here `RA` is GPR; `RT` is FPR, so the latter cannot collide) as invalid forms when `RA = 0`. +- **Alignment.** Xenon tolerates unaligned 8-byte FP loads; PowerISA technically permits implementations to raise alignment exceptions for FP loads, so portable code uses 8-byte aligned addresses. +- **Big-endian read.** Bytes are interpreted big-endian: byte at `EA` is bits 0–7 of the IEEE pattern (sign + part of exponent), byte at `EA+7` is bits 56–63 of the mantissa. `mem.read_f64` in xenia handles the host-side byte-swap. +- **MSR[FP] required.** Like all FP-register accesses, `lfd` requires the FP unit be enabled (MSR[FP]=1). Otherwise a Floating-Point Unavailable interrupt is raised. Xenia assumes FP is always enabled in user code. +- **Pair with [`stfd`](stfd.md).** Store-double is the symmetric counterpart. + +## Related Instructions + +- [`lfs`](lfs.md) — single-precision load with format conversion to double. +- [`stfd`](stfd.md), [`stfdu`](stfd.md), [`stfdx`](stfd.md), [`stfdux`](stfd.md) — corresponding stores. +- [`stfiwx`](stfiwx.md) — store-FP-as-integer-word (the asymmetric oddity in the FP load/store family). +- [`ld`](ld.md) — integer doubleword load (same width, GPR target). + +## IBM Reference + +- [AIX 7.3 — `lfd` (Load Floating-Point Double)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lfd-load-floating-point-double-instruction) +- [AIX 7.3 — `lfdu` / `lfdx` / `lfdux`](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lfdu-load-floating-point-double-update-instruction) diff --git a/migration/project-root/ppc-manual/memory/lfs.md b/migration/project-root/ppc-manual/memory/lfs.md new file mode 100644 index 0000000..e021a31 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lfs.md @@ -0,0 +1,249 @@ +# `lfs` — Load Floating-Point Single + +> **Category:** [Memory](../categories/memory.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0xc0000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lfs` | `lfs` | — | Load Floating-Point Single | +| `lfsu` | `lfsu` | — | Load Floating-Point Single with Update | +| `lfsux` | `lfsux` | — | Load Floating-Point Single with Update Indexed | +| `lfsx` | `lfsx` | — | Load Floating-Point Single Indexed | + +## Syntax + +```asm +lfs [FD], [d]([RA0]) +lfsu [FD], [d]([RA]) +lfsux [FD], [RA], [RB] +lfsx [FD], [RA0], [RB] +``` + +## Encoding + +### `lfs` — form `D` + +- **Opcode word:** `0xc0000000` +- **Primary opcode (bits 0–5):** `48` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `lfsu` — form `D` + +- **Opcode word:** `0xc4000000` +- **Primary opcode (bits 0–5):** `49` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `lfsux` — form `X` + +- **Opcode word:** `0x7c00046e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `567` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lfsx` — form `X` + +- **Opcode word:** `0x7c00042e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `535` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lfs: read; lfsx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `d` | lfs: read; lfsu: read | 16-bit signed displacement (`d`) added to the base address register. | +| `FD` | lfs: write; lfsu: write; lfsux: write; lfsx: write | Destination floating-point register. | +| `RA` | lfsu: read; lfsu: write; lfsux: read; lfsux: write | Source GPR (`r0`–`r31`). | +| `RB` | lfsux: read; lfsx: read | Source GPR. | + +## Register Effects + +### `lfs` + +- **Reads (always):** `RA0`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD` +- **Writes (conditional):** _none_ + +### `lfsu` + +- **Reads (always):** `RA`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `RA` +- **Writes (conditional):** _none_ + +### `lfsux` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD`, `RA` +- **Writes (conditional):** _none_ + +### `lfsx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `FD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- (RA|0) + EXTS(d) +FRT <- DoubleFromSingle(MEM(EA, 4)) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lfs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lfs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:960`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L960) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:38`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L38) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:371`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L371) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1140-1145`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1140-L1145) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lfs => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(instr.d() as i64 as u64) as u32; + ctx.fpr[instr.rd()] = mem.read_f32(ea) as f64; + ctx.pc += 4; + } +``` +
+ +**`lfsu`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lfsu"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:974`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L974) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:38`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L38) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:372`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L372) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1164-1169`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1164-L1169) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lfsu => { + let ea = ctx.gpr[instr.ra()].wrapping_add(instr.d() as i64 as u64) as u32; + ctx.fpr[instr.rd()] = mem.read_f32(ea) as f64; + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`lfsux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lfsux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:986`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L986) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:38`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L38) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:823`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L823) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1170-1175`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1170-L1175) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lfsux => { + let ea = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.fpr[instr.rd()] = mem.read_f32(ea) as f64; + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`lfsx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lfsx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:998`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L998) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:38`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L38) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:819`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L819) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1146-1151`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1146-L1151) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lfsx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.fpr[instr.rd()] = mem.read_f32(ea) as f64; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single → double in-register.** Reads 4 bytes as IEEE binary32, then exactly converts to binary64 (every binary32 has a representation in binary64). The result occupies all 64 bits of the FPR; subsequent FP arithmetic operates in double regardless of the value's origin. +- **No FPSCR side effects.** The single→double widening is exact, so `lfs` cannot raise inexact, overflow, underflow, or invalid. A signalling NaN passes through unchanged into the FPR — it will signal at the next FP arithmetic instruction. +- **Subnormals.** A binary32 subnormal expands to a binary64 normal — `lfs` quietly normalises. There is no "FPSCR[NI] non-IEEE mode" subnormal-to-zero behaviour applied at this stage on Xenon (NI affects arithmetic, not loads). +- **`RA0` semantics.** In `lfs` / `lfsx`, `RA = 0` selects literal zero. Update forms `lfsu` / `lfsux` are invalid with `RA = 0`. +- **Alignment.** Xenon tolerates unaligned 4-byte loads; PowerISA permits implementations to raise alignment exceptions for FP loads on cache-inhibited storage. +- **Big-endian read.** Bytes `EA..EA+3` form the binary32 pattern, sign bit at `EA[7]`. Xenia's `mem.read_f32` handles host byte-swap. +- **MSR[FP] required.** Disabled FP unit raises Floating-Point Unavailable. +- **Pair with [`stfs`](stfs.md).** Store-single performs the inverse double→single rounding (which **can** raise FPSCR exceptions because that direction may be inexact). + +## Related Instructions + +- [`lfd`](lfd.md) — double-precision load (no format conversion). +- [`stfs`](stfs.md), [`stfsu`](stfs.md), [`stfsx`](stfs.md), [`stfsux`](stfs.md) — corresponding stores; these can round. +- [`stfiwx`](stfiwx.md) — store-FP-as-integer-word. +- [`lwz`](lwz.md) — integer word load (same width, GPR target). + +## IBM Reference + +- [AIX 7.3 — `lfs` (Load Floating-Point Single)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lfs-load-floating-point-single-instruction) +- [AIX 7.3 — `lfsu` / `lfsx` / `lfsux`](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lfsu-load-floating-point-single-update-instruction) diff --git a/migration/project-root/ppc-manual/memory/lha.md b/migration/project-root/ppc-manual/memory/lha.md new file mode 100644 index 0000000..52c20d6 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lha.md @@ -0,0 +1,249 @@ +# `lha` — Load Half Word Algebraic + +> **Category:** [Memory](../categories/memory.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0xa8000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lha` | `lha` | — | Load Half Word Algebraic | +| `lhau` | `lhau` | — | Load Half Word Algebraic with Update | +| `lhaux` | `lhaux` | — | Load Half Word Algebraic with Update Indexed | +| `lhax` | `lhax` | — | Load Half Word Algebraic Indexed | + +## Syntax + +```asm +lha [RD], [d]([RA0]) +lhau [RD], [d]([RA]) +lhaux [RD], [RA], [RB] +lhax [RD], [RA0], [RB] +``` + +## Encoding + +### `lha` — form `D` + +- **Opcode word:** `0xa8000000` +- **Primary opcode (bits 0–5):** `42` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `lhau` — form `D` + +- **Opcode word:** `0xac000000` +- **Primary opcode (bits 0–5):** `43` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `lhaux` — form `X` + +- **Opcode word:** `0x7c0002ee` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `375` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lhax` — form `X` + +- **Opcode word:** `0x7c0002ae` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `343` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lha: read; lhax: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `d` | lha: read; lhau: read | 16-bit signed displacement (`d`) added to the base address register. | +| `RD` | lha: write; lhau: write; lhaux: write; lhax: write | Destination GPR. | +| `RA` | lhau: read; lhau: write; lhaux: read; lhaux: write | Source GPR (`r0`–`r31`). | +| `RB` | lhaux: read; lhax: read | Source GPR. | + +## Register Effects + +### `lha` + +- **Reads (always):** `RA0`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +### `lhau` + +- **Reads (always):** `RA`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `RA` +- **Writes (conditional):** _none_ + +### `lhaux` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `RA` +- **Writes (conditional):** _none_ + +### `lhax` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- (RA|0) + EXTS(d) +RT <- SEXT16_to_64(MEM(EA, 2)) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lha`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lha"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:128`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L128) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:40`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L40) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:365`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L365) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1066-1071`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1066-L1071) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lha => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(instr.d() as i64 as u64) as u32; + ctx.gpr[instr.rd()] = mem.read_u16(ea) as i16 as i32 as u32 as u64; + ctx.pc += 4; + } +``` +
+ +**`lhau`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lhau"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:149`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L149) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:40`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L40) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:366`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L366) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1084-1089`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1084-L1089) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lhau => { + let ea = ctx.gpr[instr.ra()].wrapping_add(instr.d() as i64 as u64) as u32; + ctx.gpr[instr.rd()] = mem.read_u16(ea) as i16 as i32 as u32 as u64; + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`lhaux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lhaux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:162`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L162) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:40`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L40) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:805`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L805) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1090-1095`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1090-L1095) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lhaux => { + let ea = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.gpr[instr.rd()] = mem.read_u16(ea) as i16 as i32 as u32 as u64; + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`lhax`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lhax"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:173`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L173) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:40`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L40) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:801`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L801) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1072-1077`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1072-L1077) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lhax => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.gpr[instr.rd()] = mem.read_u16(ea) as i16 as i32 as u32 as u64; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Sign-extending half-word load.** Reads 2 bytes big-endian, treats them as a signed 16-bit integer, sign-extends to 64 bits. Compare with [`lhz`](lhz.md), which zero-extends. Xenia's snapshot does the cast chain `u16 -> i16 -> i64 -> u64` to obtain the canonical sign-extended bit pattern. +- **Big-endian read.** Byte at `EA` is the most-significant 8 bits of the half; byte at `EA+1` is the least-significant. On little-endian hosts `mem.read_u16` returns the big-endian word in host-native form already. +- **`RA0` (non-update forms).** `RA = 0` in `lha` and `lhax` selects literal zero — useful for absolute-address access patterns. +- **Update-form invalid forms.** `lhau` / `lhaux` invoke `RA = 0` and `RA = RT` as invalid forms; xenia performs the load before writing back `RA ← EA`, so an `RA = RT` collision silently destroys the loaded value. +- **No alignment requirement.** Xenon executes unaligned half-word loads without a fault. +- **Common in audio / graphics code.** `lha` is the standard load for signed 16-bit PCM samples and signed 16-bit packed vertex deltas. +- **Use `lha` rather than `lhz` + `extsh`.** Both produce the same result, but `lha` is one fused instruction and the compiler will pick it whenever the source type is `int16_t` / `short`. + +## Related Instructions + +- [`lhz`](lhz.md), [`lhzu`](lhz.md), [`lhzx`](lhz.md), [`lhzux`](lhz.md) — zero-extending counterparts. +- [`lwa`](lwa.md), [`lwax`](lwa.md), [`lwaux`](lwaux.md) — sign-extending word loads (32→64). +- [`lbz`](lbz.md) — byte load (no sign-extending byte load exists; use `lbz` + `extsb`). +- [`lhbrx`](lhbrx.md) — byte-reversed half-word load (zero-extending). +- [`sth`](sth.md), [`sthu`](sth.md), [`sthx`](sth.md), [`sthux`](sth.md) — corresponding stores. + +## IBM Reference + +- [AIX 7.3 — `lha` (Load Half Algebraic)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lha-load-half-algebraic-instruction) +- [AIX 7.3 — `lhau` / `lhax` / `lhaux`](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lhau-load-half-algebraic-update-instruction) diff --git a/migration/project-root/ppc-manual/memory/lhbrx.md b/migration/project-root/ppc-manual/memory/lhbrx.md new file mode 100644 index 0000000..433c083 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lhbrx.md @@ -0,0 +1,129 @@ +# `lhbrx` — Load Half Word Byte-Reverse Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00062c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lhbrx` | `lhbrx` | — | Load Half Word Byte-Reverse Indexed | + +## Syntax + +```asm +lhbrx [RD], [RA0], [RB] +``` + +## Encoding + +### `lhbrx` — form `X` + +- **Opcode word:** `0x7c00062c` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `790` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lhbrx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | lhbrx: read | Source GPR. | +| `RD` | lhbrx: write | Destination GPR. | + +## Register Effects + +### `lhbrx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lhbrx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lhbrx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:628`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L628) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:40`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L40) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:839`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L839) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1806-1812`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1806-L1812) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lhbrx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + let val = mem.read_u16(ea); + ctx.gpr[instr.rd()] = val.swap_bytes() as u64; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Reads little-endian half.** Loads 2 bytes and swaps them: byte at `EA` becomes the low 8 bits of `RT[16:23]`, byte at `EA+1` becomes the upper 8 bits. The xenia snapshot does `mem.read_u16(ea).swap_bytes()`. Effective for parsing little-endian on-disk or network half-word fields. +- **Zero-extension to 64 bits.** Result occupies the full 64-bit GPR; high 48 bits are zero. There is no sign-extending byte-reverse load (`lhbrx` + `extsh` if you need one). +- **X-form only — no update form.** Like all byte-reverse loads, only the indexed form exists. `EA = (RA|0) + RB`. Pointer-bumping requires a separate `addi`. +- **`RA0` semantics.** When `RA = 0`, base is the literal zero — `lhbrx RT, 0, RB` reads at exact `RB`. +- **Alignment.** Hardware tolerates unaligned half-word reads. Xenon may take alignment exceptions on cache-inhibited storage. +- **Common in stream parsers.** PNG, ZIP, BMP, WAV chunk decoders use `lhbrx` to read little-endian length fields. + +## Related Instructions + +- [`lwbrx`](lwbrx.md), [`ldbrx`](ldbrx.md) — wider byte-reverse loads. +- [`sthbrx`](sthbrx.md) — store-half byte-reverse counterpart. +- [`lhz`](lhz.md), [`lhzx`](lhz.md) — non-reversing zero-extending half loads. +- [`lha`](lha.md) — non-reversing sign-extending half load. + +## IBM Reference + +- [AIX 7.3 — `lhbrx` (Load Half Byte-Reverse Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lhbrx-load-half-byte-reverse-indexed-instruction) +- `PowerISA v2.07B Book II` § "Byte-Reverse Storage Access". diff --git a/migration/project-root/ppc-manual/memory/lhz.md b/migration/project-root/ppc-manual/memory/lhz.md new file mode 100644 index 0000000..f21537e --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lhz.md @@ -0,0 +1,248 @@ +# `lhz` — Load Half Word and Zero + +> **Category:** [Memory](../categories/memory.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0xa0000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lhz` | `lhz` | — | Load Half Word and Zero | +| `lhzu` | `lhzu` | — | Load Half Word and Zero with Update | +| `lhzux` | `lhzux` | — | Load Half Word and Zero with Update Indexed | +| `lhzx` | `lhzx` | — | Load Half Word and Zero Indexed | + +## Syntax + +```asm +lhz [RD], [d]([RA0]) +lhzu [RD], [d]([RA]) +lhzux [RD], [RA], [RB] +lhzx [RD], [RA0], [RB] +``` + +## Encoding + +### `lhz` — form `D` + +- **Opcode word:** `0xa0000000` +- **Primary opcode (bits 0–5):** `40` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `lhzu` — form `D` + +- **Opcode word:** `0xa4000000` +- **Primary opcode (bits 0–5):** `41` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `lhzux` — form `X` + +- **Opcode word:** `0x7c00026e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `311` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lhzx` — form `X` + +- **Opcode word:** `0x7c00022e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `279` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lhz: read; lhzx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `d` | lhz: read; lhzu: read | 16-bit signed displacement (`d`) added to the base address register. | +| `RD` | lhz: write; lhzu: write; lhzux: write; lhzx: write | Destination GPR. | +| `RA` | lhzu: read; lhzu: write; lhzux: read; lhzux: write | Source GPR (`r0`–`r31`). | +| `RB` | lhzux: read; lhzx: read | Source GPR. | + +## Register Effects + +### `lhz` + +- **Reads (always):** `RA0`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +### `lhzu` + +- **Reads (always):** `RA`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `RA` +- **Writes (conditional):** _none_ + +### `lhzux` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `RA` +- **Writes (conditional):** _none_ + +### `lhzx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- (RA|0) + EXTS(d) +RT <- ZEXT16_to_64(MEM(EA, 2)) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lhz`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lhz"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:186`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L186) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:40`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L40) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:363`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L363) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1048-1053`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1048-L1053) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lhz => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(instr.d() as i64 as u64) as u32; + ctx.gpr[instr.rd()] = mem.read_u16(ea) as u64; + ctx.pc += 4; + } +``` +
+ +**`lhzu`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lhzu"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:207`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L207) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:40`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L40) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:364`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L364) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1054-1059`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1054-L1059) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lhzu => { + let ea = ctx.gpr[instr.ra()].wrapping_add(instr.d() as i64 as u64) as u32; + ctx.gpr[instr.rd()] = mem.read_u16(ea) as u64; + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`lhzux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lhzux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:220`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L220) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:40`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L40) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:797`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L797) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1078-1083`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1078-L1083) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lhzux => { + let ea = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.gpr[instr.rd()] = mem.read_u16(ea) as u64; + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`lhzx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lhzx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:231`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L231) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:40`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L40) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:795`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L795) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1060-1065`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1060-L1065) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lhzx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.gpr[instr.rd()] = mem.read_u16(ea) as u64; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Big-endian read, zero-extension.** Reads 2 bytes big-endian, treats them as an unsigned 16-bit integer, zero-extends to 64 bits. The high 48 bits of `RT` become zero. Compare with [`lha`](lha.md), which sign-extends. +- **`RA0` (non-update forms).** `RA = 0` in `lhz` / `lhzx` selects literal zero for absolute-address access. Update forms `lhzu` / `lhzux` invoke `RA = 0` and `RA = RT` as invalid forms. +- **Update-form ordering.** Xenia computes `EA`, performs the load, then writes `RA ← EA`. If `RA == RT` (an invalid form per IBM), the load result is overwritten by `EA` immediately. +- **No alignment requirement.** Xenon executes unaligned half-word loads without faulting. `MEM(EA, 2)` reads the two consecutive bytes at `EA`. +- **Common as Unicode codepoint loader.** Xbox 360 system strings are UTF-16; `lhz` is the canonical load for a single 16-bit codepoint. +- **Use `lhz` rather than `lbz` × 2 + shift.** One fused instruction is faster and lets the load-store unit handle alignment. +- **Indexed variant operand order.** `lhzx RT, RA, RB` — `RA` is the base (with `RA0` semantics), `RB` is the offset. + +## Related Instructions + +- [`lha`](lha.md), [`lhau`](lha.md), [`lhax`](lha.md), [`lhaux`](lha.md) — sign-extending counterparts. +- [`lbz`](lbz.md), [`lwz`](lwz.md), [`ld`](ld.md) — narrower / wider zero-extending loads. +- [`lhbrx`](lhbrx.md) — byte-reversed half load (little-endian half). +- [`sth`](sth.md), [`sthu`](sth.md), [`sthx`](sth.md), [`sthux`](sth.md) — corresponding stores. + +## IBM Reference + +- [AIX 7.3 — `lhz` (Load Half and Zero)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lhz-load-half-zero-instruction) +- [AIX 7.3 — `lhzu` / `lhzx` / `lhzux`](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lhzu-load-half-zero-update-instruction) diff --git a/migration/project-root/ppc-manual/memory/lmw.md b/migration/project-root/ppc-manual/memory/lmw.md new file mode 100644 index 0000000..b19cb3e --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lmw.md @@ -0,0 +1,133 @@ +# `lmw` — Load Multiple Word + +> **Category:** [Memory](../categories/memory.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0xb8000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lmw` | `lmw` | — | Load Multiple Word | + +## Syntax + +```asm +(no disassembly template) +``` + +## Encoding + +### `lmw` — form `D` + +- **Opcode word:** `0xb8000000` +- **Primary opcode (bits 0–5):** `46` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | + +## Register Effects + +### `lmw` + +- **Reads (always):** _none_ +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lmw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lmw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:705`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L705) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:42`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L42) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:369`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L369) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1720-1734`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1720-L1734) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lmw => { + // PPCBUG-125: PowerISA marks `lmw` invalid when rA is in [rT..31]; + // canary skips the write to rA in that case to preserve the EA base. + let mut ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + ea = ea.wrapping_add(instr.d() as i64 as u64); + for r in instr.rd()..32 { + if r == instr.ra() { + ea = ea.wrapping_add(4); + continue; + } + ctx.gpr[r] = mem.read_u32(ea as u32) as u64; + ea = ea.wrapping_add(4); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Bulk register restore.** Loads `(32 - RT)` consecutive 32-bit words starting at `EA` into `RT`, `RT+1`, …, `r31`. Used by AIX/PowerPC ABI prologues/epilogues to restore non-volatile GPRs in one instruction. Modern compilers prefer multiple `lwz` for scheduling; `lmw` survives in older code and hand-rolled context-switch routines. +- **Loop bound from encoding.** Xenia's snapshot iterates `for r in instr.rd()..32`, exactly matching IBM's "load until r31 inclusive" semantic. With `RT = 28`, four registers (r28..r31) are loaded. +- **Each word is zero-extended.** Like `lwz`, every loaded 32-bit word zero-extends into the destination's 64-bit GPR. The high 32 bits of each `r[k]` become zero. +- **Big-endian read.** Word at `EA` goes to `r[RT]`, word at `EA+4` goes to `r[RT+1]`, etc. Each word is itself loaded most-significant-byte-first. +- **`RA0` semantics.** When `RA = 0`, base is literal zero. Useful for absolute-address restoration. +- **Invalid forms.** AIX docs declare it invalid for `RA` to be in the destination range `[RT, 31]` — a load could overwrite the base register mid-sequence. Xenia performs loads in order without this check. +- **Alignment.** PowerISA requires word-aligned `EA`; an unaligned `lmw` may raise an alignment exception on real hardware. Xenia tolerates it. +- **Performance trap.** On modern PowerPC implementations `lmw` is microcoded — slower than the equivalent sequence of `lwz`. Compilers avoid it. + +## Related Instructions + +- [`stmw`](stmw.md) — symmetric "store multiple words" (the matching epilogue/prologue partner). +- [`lwz`](lwz.md), [`lwzx`](lwz.md) — single-word loads; the modern preferred form. +- [`lswi`](lswi.md), [`lswx`](lswx.md) — load string (byte-granular bulk transfer). + +## IBM Reference + +- [AIX 7.3 — `lmw` (Load Multiple Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lmw-load-multiple-word-instruction) +- `PowerISA v2.07B Book II` § "Load and Store Multiple" for invalid-form rules. diff --git a/migration/project-root/ppc-manual/memory/lswi.md b/migration/project-root/ppc-manual/memory/lswi.md new file mode 100644 index 0000000..2e29c22 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lswi.md @@ -0,0 +1,139 @@ +# `lswi` — Load String Word Immediate + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0004aa` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lswi` | `lswi` | — | Load String Word Immediate | + +## Syntax + +```asm +(no disassembly template) +``` + +## Encoding + +### `lswi` — form `X` + +- **Opcode word:** `0x7c0004aa` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `597` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | + +## Register Effects + +### `lswi` + +- **Reads (always):** _none_ +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lswi`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lswi"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:727`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L727) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:42`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L42) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:824`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L824) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1521-1539`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1521-L1539) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lswi => { + let mut ea = if instr.ra() == 0 { 0u32 } else { ctx.gpr[instr.ra()] as u32 }; + let nb = if instr.nb() == 0 { 32 } else { instr.nb() }; + let mut rd = instr.rd(); + let mut bytes_left = nb; + while bytes_left > 0 { + let mut val = 0u32; + for byte_idx in 0..4 { + if bytes_left == 0 { break; } + let b = mem.read_u8(ea) as u32; + val |= b << (24 - byte_idx * 8); + ea = ea.wrapping_add(1); + bytes_left -= 1; + } + ctx.gpr[rd] = val as u64; + rd = (rd + 1) % 32; + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Byte-granular bulk load.** Reads `NB` bytes starting at `EA` and packs them, big-endian, into successive GPRs starting at `RT`. Each filled GPR holds 4 bytes in its low word; partial last words are left- (most-significant-byte-) aligned with trailing zero bytes. The byte count `NB` is held in the `RB` field of the instruction encoding (1..31), with the special case `NB = 0` meaning "32 bytes". +- **Register wraparound at r31 → r0.** The snapshot uses `rd = (rd + 1) % 32`. If the byte count is large enough to spill past `r31`, the next register is `r0`, then `r1`, etc. AIX docs flag the "RA in destination range" and "RB in destination range" cases as invalid; xenia does not check. +- **`RA0` semantics.** `RA = 0` selects literal zero. There is no `RA` post-write — `lswi` is not an update form. +- **Big-endian byte ordering inside each word.** First byte read goes into bits 0–7 of the destination GPR (most-significant byte). Xenia's loop builds `val |= b << (24 - byte_idx * 8)`, matching that bit position. +- **Last partial word.** When `NB` is not a multiple of 4, the final GPR's unused low bytes are zero. The high bits remain whatever the load placed there. +- **Alignment.** The architecture allows arbitrary alignment, but real implementations may take alignment exceptions on cache-inhibited storage; xenia tolerates any address. +- **Vanishingly rare in compiled code.** Compilers don't emit `lswi`. Hand-written `memcpy` cores from the PowerPC SDK era used it for short copies; otherwise it appears mostly in byte-string init helpers. + +## Related Instructions + +- [`lswx`](lswx.md) — register-supplied byte-count variant. +- [`stswi`](stswi.md), [`stswx`](stswx.md) — symmetric stores. +- [`lmw`](lmw.md) — word-granular bulk load (multiple of 4 bytes only, no register wrap). +- [`lwz`](lwz.md), [`lbz`](lbz.md) — scalar loads that compilers emit instead. + +## IBM Reference + +- [AIX 7.3 — `lswi` (Load String Word Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lswi-load-string-word-immediate-instruction) +- `PowerISA v2.07B Book II` § "Load and Store String" for the invalid-form checks. diff --git a/migration/project-root/ppc-manual/memory/lswx.md b/migration/project-root/ppc-manual/memory/lswx.md new file mode 100644 index 0000000..f51e6cc --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lswx.md @@ -0,0 +1,139 @@ +# `lswx` — Load String Word Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00042a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lswx` | `lswx` | — | Load String Word Indexed | + +## Syntax + +```asm +(no disassembly template) +``` + +## Encoding + +### `lswx` — form `X` + +- **Opcode word:** `0x7c00042a` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `533` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | + +## Register Effects + +### `lswx` + +- **Reads (always):** _none_ +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lswx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lswx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:732`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L732) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:42`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L42) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:817`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L817) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4644-4662`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4644-L4662) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lswx => { + let mut ea = ea_indexed(ctx, instr); + let nb = ctx.xer() & 0x7F; // XER[25..31] + let mut rd = instr.rd(); + let mut bytes_left = nb; + while bytes_left > 0 { + let mut val = 0u32; + for byte_idx in 0..4 { + if bytes_left == 0 { break; } + let b = mem.read_u8(ea) as u32; + val |= b << (24 - byte_idx * 8); + ea = ea.wrapping_add(1); + bytes_left -= 1; + } + ctx.gpr[rd] = val as u64; + rd = (rd + 1) % 32; + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Byte count from `XER[25..31]`.** Unlike `lswi` (where the count is encoded as `RB`), `lswx` reads `XER[25..31]` for the byte count `NB` (0..127). Xenia's snapshot does `let nb = (ctx.xer() & 0x7F) as u32;`. `NB = 0` is **not** the "32 bytes" special case here — zero means literally zero bytes, no registers touched. +- **Register packing identical to `lswi`.** Bytes are packed big-endian into successive GPRs starting at `RT`, four bytes per register, with wraparound `r31 → r0`. Trailing bytes in the last register are zero-padded on the right. +- **`RA0` semantics.** `RA = 0` selects literal zero. The instruction has no update form — `RA` is not modified. +- **Invalid forms.** AIX flags as invalid: `RT` collides with `RA` or `RB` within the destination range; `XER[25..31]` and `NB` byte stream wraps around through both `RA` and `RB`. Xenia performs writes regardless, with last-write-wins semantics. +- **Used for non-multiple-of-4 copies.** Together with `lswi`, gives a way to load a runtime-determined byte count without per-byte loops. Compilers don't emit it; rare hand-written copy primitives may. +- **Alignment.** Architecture allows arbitrary alignment; cache-inhibited storage may raise alignment exceptions on real hardware. +- **No FPSCR / CR effects.** Pure data movement. + +## Related Instructions + +- [`lswi`](lswi.md) — sibling with byte count encoded as `RB` field (immediate-style). +- [`stswx`](stswx.md), [`stswi`](stswi.md) — symmetric stores. +- [`lmw`](lmw.md) — word-granular bulk load (no byte tail handling). +- [`lwz`](lwz.md), [`lbz`](lbz.md) — scalar loads compilers actually emit. + +## IBM Reference + +- [AIX 7.3 — `lswx` (Load String Word Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lswx-load-string-word-indexed-instruction) +- `PowerISA v2.07B Book II` § "Load and Store String" for invalid-form rules and `XER` interaction. diff --git a/migration/project-root/ppc-manual/memory/lvebx.md b/migration/project-root/ppc-manual/memory/lvebx.md new file mode 100644 index 0000000..711cd16 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lvebx.md @@ -0,0 +1,135 @@ +# `lvebx` — Load Vector Element Byte Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00000e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lvebx` | `lvebx` | — | Load Vector Element Byte Indexed | + +## Syntax + +```asm +lvebx [VD], [RA0], [RB] +``` + +## Encoding + +### `lvebx` — form `X` + +- **Opcode word:** `0x7c00000e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `7` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lvebx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | lvebx: read | Source GPR. | +| `VD` | lvebx: write | Destination vector register. | + +## Register Effects + +### `lvebx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lvebx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvebx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:73`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L73) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:44`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L44) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:752`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L752) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1872-1883`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1872-L1883) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvebx => { + // Load 1 byte from EA into vD[EA & 0xF]. PowerISA marks the + // other lanes as "undefined" but real Xenon (and Canary) + // preserve their prior contents, so seed from vD. + let base = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = base.wrapping_add(ctx.gpr[instr.rb()]) as u32; + let slot = (ea & 0xF) as usize; + let mut bytes = ctx.vr[instr.rd()].as_bytes(); + bytes[slot] = mem.read_u8(ea); + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(bytes); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single-byte element load.** Architecturally `lvebx` loads exactly **one** byte from `EA` and places it in lane `EA mod 16` of the destination vector; the other 15 lanes are *undefined* (PowerISA permits implementations to leave them as garbage). Real hardware: lane `EA mod 16` gets the byte, others are unspecified. +- **Xenia simplification — full-line read.** The xenia snapshot is shared with `lvehx` / `lvewx` and reads the **entire 16-byte aligned line** (`ea & ~0xF`, then 16 bytes), placing it in `VD`. This is stronger than the architectural guarantee — every lane is filled with whatever happened to be at the line — but matches the practical idiom of using these single-element loads to assemble a vector. Code that depends on undefined-lane behaviour will still produce well-defined output under xenia. +- **Operand order subtle.** Unlike `lvx`, the architectural EA is **not** masked. The lane is `EA & 0xF`. Xenia's force-align mask (`& !0xF`) is a deliberate emulator simplification. +- **`RA0` semantics.** When `RA = 0`, base is literal zero; `lvebx VD, 0, RB` reads the byte at `RB` (and, in xenia, the surrounding aligned line). +- **No update form.** No `lvebux` exists. Pointer-bumping requires a separate `addi`. +- **No VMX128 sibling.** There is no `lvebx128` — the single-byte load family was kept Altivec-only in the Xbox 360 VMX128 extension, since 16-byte aligned loads (`lvx128`) plus `vperm`/`vsel` are usually faster. +- **Common idiom.** Pair with `vperm` or `vsplt*` to broadcast the loaded byte to all lanes, or with `vinsertb` / shifts to assemble a vector from non-adjacent memory locations. + +## Related Instructions + +- [`lvehx`](lvehx.md), [`lvewx`](lvewx.md) — half-word and word element loads. +- [`lvx`](lvx.md), [`lvxl`](lvxl.md) — full 16-byte aligned vector loads. +- [`lvlx`](lvlx.md), [`lvrx`](lvrx.md) — load-left / load-right partial-vector ops for unaligned vector I/O. +- [`stvebx`](stvebx.md) — symmetric single-byte store. + +## IBM Reference + +- [AIX 7.3 — `lvebx` (Load Vector Element Byte Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lvebx-load-vector-element-byte-indexed-instruction) +- `PowerISA v2.07B Book I` "Vector Facility" § "Vector Load and Store" for lane-placement rules. diff --git a/migration/project-root/ppc-manual/memory/lvehx.md b/migration/project-root/ppc-manual/memory/lvehx.md new file mode 100644 index 0000000..e24cc81 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lvehx.md @@ -0,0 +1,138 @@ +# `lvehx` — Load Vector Element Half Word Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00004e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lvehx` | `lvehx` | — | Load Vector Element Half Word Indexed | + +## Syntax + +```asm +lvehx [VD], [RA0], [RB] +``` + +## Encoding + +### `lvehx` — form `X` + +- **Opcode word:** `0x7c00004e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `39` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lvehx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | lvehx: read | Source GPR. | +| `VD` | lvehx: write | Destination vector register. | + +## Register Effects + +### `lvehx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lvehx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvehx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:81`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L81) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:44`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L44) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:763`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L763) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1884-1897`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1884-L1897) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvehx => { + // Load a halfword from (EA & ~1) into vD at halfword slot + // (EA & 0xF) >> 1. Other halfword lanes preserved (see lvebx). + let base = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea_unaligned = base.wrapping_add(ctx.gpr[instr.rb()]) as u32; + let ea = ea_unaligned & !0x1u32; + let slot = ((ea_unaligned & 0xF) >> 1) as usize; + let mut bytes = ctx.vr[instr.rd()].as_bytes(); + let h = mem.read_u16(ea); + bytes[slot * 2] = (h >> 8) as u8; + bytes[slot * 2 + 1] = (h & 0xFF) as u8; + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(bytes); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single half-word element load.** Architecturally `lvehx` loads exactly **two** bytes from `EA` (which must be 2-byte aligned) and places them in the half-word lane `(EA mod 16) >> 1` of the destination vector; the other 7 half-word lanes are *undefined*. +- **EA must be half-aligned.** The low bit of `EA` is masked by hardware to align to 2 — an odd `EA` rounds down. Xenia's shared snapshot rounds further, masking to 16-byte alignment. +- **Xenia simplification — full-line read.** The xenia snapshot is shared with `lvebx` / `lvewx`: `ea & ~0xF` then a full 16-byte read into `VD`. Architectural undefined lanes are filled in deterministically, which is stronger than hardware guarantees but practically convenient. +- **`RA0` semantics.** When `RA = 0`, base is literal zero; `lvehx VD, 0, RB` reads at `RB` (and, in xenia, the surrounding aligned line). +- **No update form.** No `lvehux` exists. +- **No VMX128 sibling.** No `lvehx128` — Xbox 360 code prefers `lvx128` plus `vperm`. +- **Big-endian half within the lane.** The byte at the lower address is the most-significant byte of the half-word lane. +- **Common idiom.** Pair with `vsplth` to broadcast or with `vperm` to assemble a vector from sparse memory. + +## Related Instructions + +- [`lvebx`](lvebx.md), [`lvewx`](lvewx.md) — byte and word element loads. +- [`lvx`](lvx.md), [`lvxl`](lvxl.md) — full 16-byte aligned vector loads. +- [`lvlx`](lvlx.md), [`lvrx`](lvrx.md) — load-left / load-right partial-vector ops. +- [`stvehx`](stvehx.md) — symmetric single-half store. + +## IBM Reference + +- [AIX 7.3 — `lvehx` (Load Vector Element Half Word Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lvehx-load-vector-element-half-word-indexed-instruction) +- `PowerISA v2.07B Book I` "Vector Facility" § "Vector Load and Store" for lane-placement rules. diff --git a/migration/project-root/ppc-manual/memory/lvewx.md b/migration/project-root/ppc-manual/memory/lvewx.md new file mode 100644 index 0000000..5b2eebd --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lvewx.md @@ -0,0 +1,185 @@ +# `lvewx` — Load Vector Element Word Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00008e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lvewx` | `lvewx` | — | Load Vector Element Word Indexed | +| `lvewx128` | `lvewx128` | — | Load Vector Element Word Indexed 128 | + +## Syntax + +```asm +lvewx [VD], [RA0], [RB] +lvewx128 [VD], [RA0], [RB] +``` + +## Encoding + +### `lvewx` — form `X` + +- **Opcode word:** `0x7c00008e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `71` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lvewx128` — form `VX128_1` + +- **Opcode word:** `0x10000083` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `131` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lvewx: read; lvewx128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | lvewx: read; lvewx128: read | Source GPR. | +| `VD` | lvewx: write; lvewx128: write | Destination vector register. | + +## Register Effects + +### `lvewx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `lvewx128` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lvewx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvewx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:96`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L96) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:44`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L44) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:770`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L770) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1898-1913`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1898-L1913) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvewx => { + // Load a word from (EA & ~3) into vD at word slot + // (EA & 0xF) >> 2. Other word lanes preserved (see lvebx). + let base = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea_unaligned = base.wrapping_add(ctx.gpr[instr.rb()]) as u32; + let ea = ea_unaligned & !0x3u32; + let slot = ((ea_unaligned & 0xF) >> 2) as usize; + let mut bytes = ctx.vr[instr.rd()].as_bytes(); + let w = mem.read_u32(ea); + bytes[slot * 4] = (w >> 24) as u8; + bytes[slot * 4 + 1] = (w >> 16) as u8; + bytes[slot * 4 + 2] = (w >> 8) as u8; + bytes[slot * 4 + 3] = (w & 0xFF) as u8; + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(bytes); + ctx.pc += 4; + } +``` +
+ +**`lvewx128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvewx128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:99`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L99) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:44`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L44) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:414`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L414) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3168-3174`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3168-L3174) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvewx128 => { + let ea = ea_indexed(ctx, instr) & !0xF; + let mut bytes = [0u8; 16]; + for i in 0..16 { bytes[i] = mem.read_u8(ea + i as u32); } + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_bytes(bytes); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single word element load.** Architecturally `lvewx` loads exactly **four** bytes from `EA` (which must be 4-byte aligned) and places them in the word lane `(EA mod 16) >> 2` of the destination vector; the other 3 word lanes are *undefined*. +- **EA must be word-aligned.** The low two bits of `EA` are masked by hardware. Xenia's shared snapshot rounds further to 16-byte alignment for both `lvewx` and `lvewx128`. +- **Xenia simplification — full-line read.** Both `lvewx` and `lvewx128` snapshots load the full aligned 16 bytes from `ea & ~0xF` into the destination vector. Architectural undefined lanes are filled deterministically. +- **`RA0` semantics.** When `RA = 0`, base is literal zero. +- **No update form.** No `lvewux` exists. +- **VMX128 sibling.** `lvewx128` shares semantics; the only difference is the operand encoding. VMX128 uses a 7-bit register index split across `VD128l ‖ VD128h` so it can address `v0..v127` instead of the 32-register Altivec space. +- **Big-endian word within the lane.** The byte at the lower address is the most-significant byte of the word lane. +- **Common idiom.** Pair with `vspltw` to broadcast the loaded word to all four lanes, or with `vperm` to gather words from sparse memory into one vector. + +## Related Instructions + +- [`lvebx`](lvebx.md), [`lvehx`](lvehx.md) — byte and half element loads. +- [`lvx`](lvx.md), [`lvxl`](lvxl.md) — full 16-byte aligned vector loads. +- [`lvlx`](lvlx.md), [`lvrx`](lvrx.md) — load-left / load-right partial-vector ops. +- [`stvewx`](stvewx.md) — symmetric single-word store. + +## IBM Reference + +- [AIX 7.3 — `lvewx` (Load Vector Element Word Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lvewx-load-vector-element-word-indexed-instruction) +- `PowerISA v2.07B Book I` "Vector Facility" § "Vector Load and Store" for lane-placement rules; Microsoft XDK for `lvewx128`. diff --git a/migration/project-root/ppc-manual/memory/lvlx.md b/migration/project-root/ppc-manual/memory/lvlx.md new file mode 100644 index 0000000..188b50d --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lvlx.md @@ -0,0 +1,172 @@ +# `lvlx` — Load Vector Left Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00040e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lvlx` | `lvlx` | — | Load Vector Left Indexed | +| `lvlx128` | `lvlx128` | — | Load Vector Left Indexed 128 | + +## Syntax + +```asm +lvlx [VD], [RA0], [RB] +lvlx128 [VD], [RA0], [RB] +``` + +## Encoding + +### `lvlx` — form `X` + +- **Opcode word:** `0x7c00040e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `519` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lvlx128` — form `VX128_1` + +- **Opcode word:** `0x10000403` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1027` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lvlx: read; lvlx128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | lvlx: read; lvlx128: read | Source GPR. | +| `VD` | lvlx: write; lvlx128: write | Destination vector register. | + +## Register Effects + +### `lvlx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `lvlx128` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lvlx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvlx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:216`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L216) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:44`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L44) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:815`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L815) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3083-3087`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3083-L3087) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvlx | PpcOpcode::lvlxl => { + let ea = ea_indexed(ctx, instr); + ctx.vr[instr.rd()] = crate::vmx::load_vector_left(mem, ea); + ctx.pc += 4; + } +``` +
+ +**`lvlx128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvlx128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:219`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L219) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:44`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L44) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:420`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L420) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3088-3092`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3088-L3092) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvlx128 | PpcOpcode::lvlxl128 => { + let ea = ea_indexed(ctx, instr); + ctx.vr[instr.vd128()] = crate::vmx::load_vector_left(mem, ea); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Load-left half of an unaligned vector.** `lvlx` reads `(16 - (EA mod 16))` bytes starting at the **exact** `EA` and places them in the **left** (high-address-byte → low-lane) of the destination vector; the remaining lanes on the right are zero-filled. Combine with `lvrx` at `EA + 15` to assemble a full unaligned vector across an alignment boundary. +- **Companion idiom.** `lvlx VD, RA, RB ; lvrx Vtemp, RA, RB ; vor VD, VD, Vtemp` produces the unaligned 16 bytes at `EA` regardless of alignment. This was the canonical unaligned-vector-read recipe before `lvsl`/`vperm` shuffles became the more common idiom. +- **No alignment masking.** Unlike `lvx`, the EA is **not** rounded down. `EA mod 16` controls how the data is shifted into the destination. +- **`RA0` semantics.** `RA = 0` selects literal zero. +- **Microsoft Xbox 360 specific.** `lvlx` and `lvrx` are not in the standard Altivec specification — they are part of Microsoft's VMX128 / Cell BE-style extension, defined in PowerPC Cell and later VMX. The Xbox 360 Xenon supports them (decoder + xenia entry confirm). +- **Implementation in xenia.** The shared snapshot calls `vmx::load_vector_left(mem, ea)`, which performs the unaligned partial-byte read and zero-fills the right side. +- **VMX128 sibling (`lvlx128`).** Same semantics; different operand encoding (7-bit register field, addressing `v0..v127`). +- **`lvlxl` is the LRU-hint variant.** Same data behaviour, hint ignored under emulation. + +## Related Instructions + +- [`lvrx`](lvrx.md), [`lvrx128`](lvrx.md) — load-right partner; combine to read unaligned 16 bytes. +- [`lvlxl`](lvlxl.md), [`lvlxl128`](lvlxl.md) — LRU-hint variants. +- [`lvx`](lvx.md), [`lvx128`](lvx.md) — aligned load (the EA-masking sibling). +- [`stvlx`](stvlx.md), [`stvrx`](stvrx.md) — symmetric unaligned stores. + +## IBM Reference + +- [AIX 7.3 — `lvlx` (Load Vector Left Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lvlx-load-vector-left-indexed-instruction) +- `PowerISA v2.07B Book I` "Vector Facility"; Microsoft Xbox 360 XDK for VMX128 details. diff --git a/migration/project-root/ppc-manual/memory/lvlxl.md b/migration/project-root/ppc-manual/memory/lvlxl.md new file mode 100644 index 0000000..762c45a --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lvlxl.md @@ -0,0 +1,171 @@ +# `lvlxl` — Load Vector Left Indexed LRU + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00060e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lvlxl` | `lvlxl` | — | Load Vector Left Indexed LRU | +| `lvlxl128` | `lvlxl128` | — | Load Vector Left Indexed LRU 128 | + +## Syntax + +```asm +lvlxl [VD], [RA0], [RB] +lvlxl128 [VD], [RA0], [RB] +``` + +## Encoding + +### `lvlxl` — form `X` + +- **Opcode word:** `0x7c00060e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `775` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lvlxl128` — form `VX128_1` + +- **Opcode word:** `0x10000603` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1539` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lvlxl: read; lvlxl128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | lvlxl: read; lvlxl128: read | Source GPR. | +| `VD` | lvlxl: write; lvlxl128: write | Destination vector register. | + +## Register Effects + +### `lvlxl` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `lvlxl128` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lvlxl`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvlxl"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:222`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L222) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:44`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L44) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:838`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L838) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3083-3087`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3083-L3087) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvlx | PpcOpcode::lvlxl => { + let ea = ea_indexed(ctx, instr); + ctx.vr[instr.rd()] = crate::vmx::load_vector_left(mem, ea); + ctx.pc += 4; + } +``` +
+ +**`lvlxl128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvlxl128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:225`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L225) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:44`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L44) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:424`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L424) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3088-3092`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3088-L3092) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvlx128 | PpcOpcode::lvlxl128 => { + let ea = ea_indexed(ctx, instr); + ctx.vr[instr.vd128()] = crate::vmx::load_vector_left(mem, ea); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Same data effect as [`lvlx`](lvlx.md), with LRU cache hint.** Reads `(16 - (EA mod 16))` bytes starting at `EA` into the left side of `VD`; right side zero-filled. The `l` suffix tells the cache the line is least-recently-used — likely streaming, evict early under pressure. +- **Hint ignored under emulation.** Xenia's snapshot is shared with `lvlx` (`PpcOpcode::lvlx | PpcOpcode::lvlxl => …`). Functional behaviour is identical to `lvlx`. +- **No alignment masking.** Like `lvlx`, the exact `EA` controls how data shifts into the vector. +- **`RA0` semantics.** `RA = 0` selects literal zero. +- **Microsoft Xbox 360 specific.** Part of the VMX128 / Cell BE extended set, not in baseline Altivec. +- **Used in single-pass streaming reads.** Decoder loops that consume each vector once benefit from the LRU hint on real hardware; xenia gains nothing from it. +- **VMX128 sibling (`lvlxl128`).** Identical semantics; alternative operand encoding addressing `v0..v127`. + +## Related Instructions + +- [`lvlx`](lvlx.md), [`lvlx128`](lvlx.md) — non-hint load-left variants. +- [`lvrx`](lvrx.md), [`lvrxl`](lvrxl.md) — load-right partner. +- [`stvlxl`](stvlxl.md), [`stvrxl`](stvrxl.md) — symmetric stores. +- [`lvx`](lvx.md), [`lvxl`](lvxl.md) — aligned vector load family. + +## IBM Reference + +- [AIX 7.3 — `lvlxl` (Load Vector Left Indexed Last)](https://www.ibm.com/docs/en/aix/7.3.0?topic=reference-instruction-set) +- `PowerISA v2.07B Book I` "Vector Facility"; Microsoft Xbox 360 XDK for VMX128 cache-hint deltas. diff --git a/migration/project-root/ppc-manual/memory/lvrx.md b/migration/project-root/ppc-manual/memory/lvrx.md new file mode 100644 index 0000000..d58e0bc --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lvrx.md @@ -0,0 +1,173 @@ +# `lvrx` — Load Vector Right Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00044e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lvrx` | `lvrx` | — | Load Vector Right Indexed | +| `lvrx128` | `lvrx128` | — | Load Vector Right Indexed 128 | + +## Syntax + +```asm +lvrx [VD], [RA0], [RB] +lvrx128 [VD], [RA0], [RB] +``` + +## Encoding + +### `lvrx` — form `X` + +- **Opcode word:** `0x7c00044e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `551` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lvrx128` — form `VX128_1` + +- **Opcode word:** `0x10000443` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1091` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lvrx: read; lvrx128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | lvrx: read; lvrx128: read | Source GPR. | +| `VD` | lvrx: write; lvrx128: write | Destination vector register. | + +## Register Effects + +### `lvrx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `lvrx128` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lvrx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvrx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:241`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L241) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:45`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L45) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:822`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L822) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3093-3097`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3093-L3097) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvrx | PpcOpcode::lvrxl => { + let ea = ea_indexed(ctx, instr); + ctx.vr[instr.rd()] = crate::vmx::load_vector_right(mem, ea); + ctx.pc += 4; + } +``` +
+ +**`lvrx128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvrx128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:244`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L244) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:45`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L45) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:421`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L421) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3098-3102`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3098-L3102) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvrx128 | PpcOpcode::lvrxl128 => { + let ea = ea_indexed(ctx, instr); + ctx.vr[instr.vd128()] = crate::vmx::load_vector_right(mem, ea); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Load-right half of an unaligned vector.** `lvrx` reads `(EA mod 16)` bytes at the addresses *just below* `EA & ~0xF` (i.e., the bytes from the previous aligned line that fall on the right side of the unaligned vector) and places them in the **right** (low-address-byte → high-lane) of the destination; the left lanes are zero-filled. +- **Standard pair-mate of [`lvlx`](lvlx.md).** The recipe `lvlx VD, RA, RB ; lvrx Vtmp, RA, (RB+16) ; vor VD, VD, Vtmp` (or some alignment-aware variant) reconstructs the unaligned 16 bytes spanning the boundary at `EA`. +- **Right vs. left semantics.** "Right" refers to lower-numbered (high-significance) lanes after rotation, not in any byte-address sense — see PowerISA Cell BE addenda for the exact bit-position formulas. +- **No alignment masking.** Like `lvlx`, the exact `EA` is used; the value `EA mod 16` controls how data is rotated. +- **`RA0` semantics.** `RA = 0` selects literal zero. +- **Implementation in xenia.** The shared snapshot calls `vmx::load_vector_right(mem, ea)`, returning a zero-filled left side and the requested right-bytes payload. +- **Microsoft Xbox 360 specific.** Part of VMX128 / Cell BE, not in baseline Altivec. +- **VMX128 sibling (`lvrx128`).** Identical semantics; alternative operand encoding. +- **`lvrxl` is the LRU-hint variant.** Same data; cache hint ignored under emulation. + +## Related Instructions + +- [`lvlx`](lvlx.md), [`lvlx128`](lvlx.md) — load-left partner. +- [`lvrxl`](lvrxl.md), [`lvrxl128`](lvrxl.md) — LRU-hint variants. +- [`lvx`](lvx.md), [`lvx128`](lvx.md) — aligned vector load. +- [`stvlx`](stvlx.md), [`stvrx`](stvrx.md) — symmetric unaligned stores. + +## IBM Reference + +- [AIX 7.3 — `lvrx` (Load Vector Right Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lvrx-load-vector-right-indexed-instruction) +- `PowerISA v2.07B Book I` "Vector Facility"; Microsoft Xbox 360 XDK for VMX128 unaligned-vector idioms. diff --git a/migration/project-root/ppc-manual/memory/lvrxl.md b/migration/project-root/ppc-manual/memory/lvrxl.md new file mode 100644 index 0000000..b82c570 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lvrxl.md @@ -0,0 +1,171 @@ +# `lvrxl` — Load Vector Right Indexed LRU + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00064e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lvrxl` | `lvrxl` | — | Load Vector Right Indexed LRU | +| `lvrxl128` | `lvrxl128` | — | Load Vector Right Indexed LRU 128 | + +## Syntax + +```asm +lvrxl [VD], [RA0], [RB] +lvrxl128 [VD], [RA0], [RB] +``` + +## Encoding + +### `lvrxl` — form `X` + +- **Opcode word:** `0x7c00064e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `807` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lvrxl128` — form `VX128_1` + +- **Opcode word:** `0x10000643` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1603` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lvrxl: read; lvrxl128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | lvrxl: read; lvrxl128: read | Source GPR. | +| `VD` | lvrxl: write; lvrxl128: write | Destination vector register. | + +## Register Effects + +### `lvrxl` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `lvrxl128` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lvrxl`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvrxl"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:247`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L247) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:45`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L45) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:842`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L842) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3093-3097`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3093-L3097) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvrx | PpcOpcode::lvrxl => { + let ea = ea_indexed(ctx, instr); + ctx.vr[instr.rd()] = crate::vmx::load_vector_right(mem, ea); + ctx.pc += 4; + } +``` +
+ +**`lvrxl128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvrxl128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:250`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L250) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:45`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L45) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:425`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L425) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3098-3102`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3098-L3102) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvrx128 | PpcOpcode::lvrxl128 => { + let ea = ea_indexed(ctx, instr); + ctx.vr[instr.vd128()] = crate::vmx::load_vector_right(mem, ea); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Same data effect as [`lvrx`](lvrx.md), with LRU cache hint.** Reads `(EA mod 16)` bytes from the previous aligned line into the right half of `VD`; left half zero-filled. The `l` suffix tells the cache the line is least-recently-used. +- **Hint ignored under emulation.** Xenia's snapshot is shared with `lvrx` (`PpcOpcode::lvrx | PpcOpcode::lvrxl => …`). +- **No alignment masking.** The exact `EA` controls how data shifts. +- **`RA0` semantics.** `RA = 0` selects literal zero. +- **Microsoft Xbox 360 specific.** Part of the VMX128 / Cell BE extended set. +- **Streaming-read use case.** Pair with [`lvlxl`](lvlxl.md) when iterating across a buffer that will not be revisited; the LRU hint frees cache capacity for the next line. +- **VMX128 sibling (`lvrxl128`).** Identical semantics; alternative operand encoding addressing `v0..v127`. + +## Related Instructions + +- [`lvrx`](lvrx.md), [`lvrx128`](lvrx.md) — non-hint variants. +- [`lvlxl`](lvlxl.md), [`lvlxl128`](lvlxl.md) — load-left LRU partner. +- [`lvxl`](lvxl.md), [`lvxl128`](lvxl.md) — aligned LRU vector load. +- [`stvrxl`](stvrxl.md), [`stvlxl`](stvlxl.md) — symmetric LRU stores. + +## IBM Reference + +- [AIX 7.3 — `lvrxl` (Load Vector Right Indexed Last)](https://www.ibm.com/docs/en/aix/7.3.0?topic=reference-instruction-set) +- `PowerISA v2.07B Book I` "Vector Facility"; Microsoft Xbox 360 XDK for VMX128 cache-hint behaviour. diff --git a/migration/project-root/ppc-manual/memory/lvx.md b/migration/project-root/ppc-manual/memory/lvx.md new file mode 100644 index 0000000..8263776 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lvx.md @@ -0,0 +1,164 @@ +# `lvx` — Load Vector Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0000ce` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lvx` | `lvx` | — | Load Vector Indexed | +| `lvx128` | `lvx128` | — | Load Vector Indexed 128 | + +## Syntax + +```asm +lvx [VD], [RA0], [RB] +lvx128 [VD], [RA0], [RB] +``` + +## Encoding + +### `lvx` — form `X` + +- **Opcode word:** `0x7c0000ce` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `103` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lvx128` — form `VX128_1` + +- **Opcode word:** `0x100000c3` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `195` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lvx: read; lvx128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | lvx: read; lvx128: read | Source GPR. | +| `VD` | lvx: write; lvx128: write | Destination vector register. | + +## Register Effects + +### `lvx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `lvx128` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- ((RA|0) + (RB)) & ~0xF ; align to 16 +VD <- byteswap(MEM(EA, 16)) +``` + +## C Translation Example + +```c +/* lvx VD, RA, RB — 16-byte aligned load of a vector register */ +uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA]; +uint32_t ea = (uint32_t)((base + r[insn.RB]) & ~(uint64_t)0xF); +v[insn.VD] = mem_read_vec128_be(ea); +``` + +## Implementation References + +**`lvx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:139`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L139) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:47`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L47) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:775`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L775) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1833-1840`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1833-L1840) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = (ea.wrapping_add(ctx.gpr[instr.rb()]) & !0xF) as u32; // aligned + let mut bytes = [0u8; 16]; + for i in 0..16 { bytes[i] = mem.read_u8(ea + i as u32); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(bytes); + ctx.pc += 4; + } +``` +
+ +**`lvx128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvx128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:142`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L142) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:47`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L47) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:415`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L415) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1841-1848`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1841-L1848) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvx128 => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = (ea.wrapping_add(ctx.gpr[instr.rb()]) & !0xF) as u32; + let mut bytes = [0u8; 16]; + for i in 0..16 { bytes[i] = mem.read_u8(ea + i as u32); } + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_bytes(bytes); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Alignment is forced, not checked.** The low four bits of the effective address are **cleared** before the load — passing an unaligned `EA` silently reads from `EA & ~0xF` rather than trapping. This differs from scalar loads (no alignment enforcement) and from `lvewx` etc. (which architecturally use the exact `EA` for lane placement). +- **Big-endian lane layout.** The byte at the aligned base goes into vector lane 0 (most-significant byte); the byte at base+15 lands in lane 15. On little-endian hosts the 16-byte block is byte-swapped at the memory boundary so the PowerPC-visible layout is preserved. +- **`RA0` semantics.** When `RA = 0`, the base is the literal zero. Combined with the alignment mask this lets `lvx VD, 0, RB` load from `RB & ~0xF`. +- **No update form.** Unlike scalar loads, VMX loads have no `u` variant that post-writes the base. Use [`lvxl`](lvxl.md) for the cache-hint variant ("last" — the line is not expected to be reused soon). +- **VMX128 sibling (`lvx128`).** Identical semantics; the only difference is the operand encoding. VMX128 uses a 7-bit register index split across three non-contiguous bit fields (`VD128l ‖ VD128h`), addressing `v0..v127`. +- **Atomic 16 bytes.** The read is a single conceptual load; observers see either all 16 old bytes or all 16 new bytes (to the extent the surrounding cache coherency model allows). +- **Cache-line behaviour.** A 16-byte aligned load fits within one Xenon 128-byte cache line; cold-line cost is one fill. + +## Related Instructions + +- [`stvx`](stvx.md), [`stvx128`](stvx.md) — the store counterparts. +- [`lvxl`](lvxl.md), [`lvxl128`](lvxl.md) — cache-hint "last-use" load variants. +- [`lvebx`](lvebx.md), [`lvehx`](lvehx.md), [`lvewx`](lvewx.md) — single-element loads at the exact (sub-aligned) address. +- [`lvlx`](lvlx.md), [`lvrx`](lvrx.md) — load-left / load-right for unaligned vector I/O (combine to read across alignment). + +## IBM Reference + +- [AIX 7.3 — `lvx` (Load Vector Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lvx-load-vector-indexed-instruction) +- `PowerISA v2.07B Book I` "Vector Facility" for full vector-load semantics; `lvx128` is documented in the Xbox 360 XDK. diff --git a/migration/project-root/ppc-manual/memory/lvxl.md b/migration/project-root/ppc-manual/memory/lvxl.md new file mode 100644 index 0000000..f7f3fa8 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lvxl.md @@ -0,0 +1,182 @@ +# `lvxl` — Load Vector Indexed LRU + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0002ce` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lvxl` | `lvxl` | — | Load Vector Indexed LRU | +| `lvxl128` | `lvxl128` | — | Load Vector Indexed LRU 128 | + +## Syntax + +```asm +lvslx [VD], [RA0], [RB] +lvxl128 [VD], [RA0], [RB] +``` + +## Encoding + +### `lvxl` — form `X` + +- **Opcode word:** `0x7c0002ce` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `359` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lvxl128` — form `VX128_1` + +- **Opcode word:** `0x100002c3` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `707` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lvxl: read; lvxl128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | lvxl: read; lvxl128: read | Source GPR. | +| `VD` | lvxl: write; lvxl128: write | Destination vector register. | + +## Register Effects + +### `lvxl` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `lvxl128` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lvxl`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvxl"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:145`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L145) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:47`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L47) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:802`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L802) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1960-1969`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1960-L1969) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvxl | PpcOpcode::lvxl128 => { + // Same as lvx but with cache hint (ignored) + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = (ea.wrapping_add(ctx.gpr[instr.rb()]) & !0xF) as u32; + let mut bytes = [0u8; 16]; + for i in 0..16 { bytes[i] = mem.read_u8(ea + i as u32); } + let vd = if matches!(instr.opcode, PpcOpcode::lvxl128) { instr.vd128() } else { instr.rd() }; + ctx.vr[vd] = xenia_types::Vec128::from_bytes(bytes); + ctx.pc += 4; + } +``` +
+ +**`lvxl128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvxl128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:148`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L148) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:47`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L47) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:418`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L418) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1960-1969`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1960-L1969) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvxl | PpcOpcode::lvxl128 => { + // Same as lvx but with cache hint (ignored) + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = (ea.wrapping_add(ctx.gpr[instr.rb()]) & !0xF) as u32; + let mut bytes = [0u8; 16]; + for i in 0..16 { bytes[i] = mem.read_u8(ea + i as u32); } + let vd = if matches!(instr.opcode, PpcOpcode::lvxl128) { instr.vd128() } else { instr.rd() }; + ctx.vr[vd] = xenia_types::Vec128::from_bytes(bytes); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Same data effect as [`lvx`](lvx.md), but with cache hint.** Loads 16 bytes from `EA & ~0xF` into `VD`. The `l` suffix signals to the cache hardware that the line is **least-recently-used** — a hint that the line will not be reused soon, allowing the cache to evict it preferentially under pressure. Useful in streaming reads (e.g. once-through vertex transforms, decode passes). +- **Hint ignored under emulation.** Xenia's snapshot comment is explicit: "Same as lvx but with cache hint (ignored)". The functional behaviour is identical to `lvx` — only real hardware acts on the hint. +- **Alignment is forced, not checked.** Like `lvx`, the low four bits of `EA` are masked. Unaligned `EA` silently rounds down to the 16-byte boundary. +- **Big-endian lane layout.** Byte at the aligned base goes into lane 0; byte at base+15 into lane 15. +- **`RA0` semantics.** `RA = 0` selects literal zero. +- **No update form.** `lvxl` has no `u`-suffix variant. +- **VMX128 sibling (`lvxl128`).** Identical semantics; the only difference is the operand encoding using the split-field 7-bit register index addressing `v0..v127`. Xenia's snapshot dispatches on the opcode to decide which decode helper to use. +- **Note: assembler typo.** The Syntax block above shows `lvslx` for the non-128 variant — that is a transcription artefact of the source XML. The real mnemonic is `lvxl`. + +## Related Instructions + +- [`lvx`](lvx.md), [`lvx128`](lvx.md) — same load without the LRU hint. +- [`stvxl`](stvxl.md), [`stvxl128`](stvxl.md) — symmetric "store last" variants. +- [`stvx`](stvx.md) — non-hint store. +- [`dcbt`](dcbt.md), [`dcbtst`](dcbtst.md) — explicit prefetch hints (the hint family). + +## IBM Reference + +- [AIX 7.3 — `lvxl` (Load Vector Indexed LRU)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lvxl-load-vector-indexed-last-instruction) +- `PowerISA v2.07B Book I` "Vector Facility" for canonical hint semantics. diff --git a/migration/project-root/ppc-manual/memory/lwa.md b/migration/project-root/ppc-manual/memory/lwa.md new file mode 100644 index 0000000..30a0bd4 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lwa.md @@ -0,0 +1,209 @@ +# `lwa` — Load Word Algebraic + +> **Category:** [Memory](../categories/memory.md) · **Form:** [DS](../forms/DS.md) · **Opcode:** `0xe8000002` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lwa` | `lwa` | — | Load Word Algebraic | +| `lwaux` | `lwaux` | — | Load Word Algebraic with Update Indexed | +| `lwax` | `lwax` | — | Load Word Algebraic Indexed | + +## Syntax + +```asm +lwa [RD], [ds]([RA0]) +lwaux [RD], [RA], [RB] +lwax [RD], [RA0], [RB] +``` + +## Encoding + +### `lwa` — form `DS` + +- **Opcode word:** `0xe8000002` +- **Primary opcode (bits 0–5):** `58` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0) | +| 16–29 | `DS` | 14-bit signed word-scaled displacement | +| 30–31 | `XO` | extended opcode | + +### `lwaux` — form `X` + +- **Opcode word:** `0x7c0002ea` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `373` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lwax` — form `X` + +- **Opcode word:** `0x7c0002aa` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `341` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lwa: read; lwax: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `ds` | lwa: read | 14-bit signed word-aligned displacement (`DS << 2`). | +| `RD` | lwa: write; lwaux: write; lwax: write | Destination GPR. | +| `RA` | lwaux: read; lwaux: write | Source GPR (`r0`–`r31`). | +| `RB` | lwaux: read; lwax: read | Source GPR. | + +## Register Effects + +### `lwa` + +- **Reads (always):** `RA0`, `ds` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +### `lwaux` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `RA` +- **Writes (conditional):** _none_ + +### `lwax` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- (RA|0) + EXTS(ds || 0b00) +RT <- SEXT32_to_64(MEM(EA, 4)) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lwa`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lwa"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:244`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L244) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:49`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L49) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:382`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L382) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1108-1113`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1108-L1113) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lwa => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(instr.ds() as i64 as u64) as u32; + ctx.gpr[instr.rd()] = mem.read_u32(ea) as u64; + ctx.pc += 4; + } +``` +
+ +**`lwaux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lwaux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:265`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L265) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:49`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L49) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:804`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L804) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1120-1125`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1120-L1125) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lwaux => { + let ea = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.gpr[instr.rd()] = mem.read_u32(ea) as u64; + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`lwax`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lwax"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:276`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L276) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:49`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L49) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:800`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L800) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1114-1119`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1114-L1119) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lwax => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.gpr[instr.rd()] = mem.read_u32(ea) as u64; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Sign-extending word load (32→64).** Reads 4 bytes big-endian, treats them as a signed int32, sign-extends to 64 bits. The xenia snapshot does the cast chain `u32 -> i32 -> i64 -> u64` to materialise the canonical sign-extended bit pattern. +- **No `lwau` (D-form-update) in PowerISA.** Only `lwa` (DS-form), `lwax` (X-form), and `lwaux` (X-form-update) exist. The D-form-update slot is occupied by something else in the encoding space — to update with a 16-bit immediate you must use a separate `addi` plus `lwa`. +- **DS-form displacement.** Like [`ld`](ld.md), `lwa` uses a 14-bit signed displacement scaled by 4 (`EXTS(ds || 0b00)`). The two encoding bits 30–31 distinguish `lwa` (XO=10) from `ld` (XO=00) and `ldu` (XO=01). +- **`RA0` semantics.** `RA = 0` in `lwa` and `lwax` selects literal zero. `lwaux` invokes `RA = 0` and `RA = RT` as invalid forms; xenia performs the load before writing back `RA`, so an `RA = RT` collision destroys the loaded value. +- **Alignment.** Xenon tolerates unaligned 4-byte loads. PowerISA permits but does not require an alignment exception; some implementations may raise one for cache-inhibited storage. +- **Use `lwa` rather than `lwz` + `extsw`.** When the source type is `int32_t`, `lwa` is one fused instruction. +- **Common in 64-bit code.** Sign-extending 32-bit fields out of structures (e.g. signed file offsets) into 64-bit GPRs uses this family. + +## Related Instructions + +- [`lwz`](lwz.md), [`lwzu`](lwz.md), [`lwzx`](lwz.md), [`lwzux`](lwz.md) — zero-extending counterparts. +- [`ld`](ld.md), [`ldu`](ld.md), [`ldx`](ld.md), [`ldux`](ld.md) — 64-bit doubleword loads. +- [`lha`](lha.md), [`lhax`](lha.md) — 16-bit sign-extending loads. +- [`lwbrx`](lwbrx.md) — byte-reversed word load (zero-extending only). +- [`stw`](stw.md) — corresponding store (no separate "store sign-extended" — narrow stores discard the high bits). + +## IBM Reference + +- [AIX 7.3 — `lwa` (Load Word Algebraic)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lwa-load-word-algebraic-instruction) +- [AIX 7.3 — `lwax` / `lwaux`](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lwax-load-word-algebraic-indexed-instruction) diff --git a/migration/project-root/ppc-manual/memory/lwarx.md b/migration/project-root/ppc-manual/memory/lwarx.md new file mode 100644 index 0000000..a9eeb36 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lwarx.md @@ -0,0 +1,139 @@ +# `lwarx` — Load Word and Reserve Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000028` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lwarx` | `lwarx` | — | Load Word and Reserve Indexed | + +## Syntax + +```asm +lwarx [RD], [RA0], [RB] +``` + +## Encoding + +### `lwarx` — form `X` + +- **Opcode word:** `0x7c000028` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `20` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lwarx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | lwarx: read | Source GPR. | +| `RD` | lwarx: write | Destination GPR. | + +## Register Effects + +### `lwarx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lwarx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lwarx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:795`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L795) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:49`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L49) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:754`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L754) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1207-1222`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1207-L1222) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lwarx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + let val = mem.read_u32(ea); + ctx.gpr[instr.rd()] = val as u64; + ctx.reserved_line = ea & !RESERVATION_MASK; + ctx.reserved_val = val as u64; + ctx.has_reservation = true; + ctx.reservation_width = 4; // PPCBUG-151: word reservation + if let Some(t) = &ctx.reservation_table { + if t.is_enabled() { + ctx.reserved_generation = t.reserve(ea, ctx.hw_id); + } + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Reservation set on the addressed word.** Loads `MEM(EA, 4)` zero-extended to 64 bits and atomically establishes a *reservation* on `EA`. A subsequent [`stwcx`](stwcx.md) at the same address completes only if the reservation is still valid. Together they form the canonical 32-bit load-linked / store-conditional pair for lock-free updates and futexes. +- **One reservation per thread.** Xenia's snapshot writes `reserved_addr`, `reserved_val`, and `has_reservation` in the per-context state. Hardware behaves the same: each hardware thread has at most one reservation. A second `lwarx` (or `ldarx`) discards the previous reservation. +- **Granule.** Architecturally one naturally-aligned word; on Xenon the practical reservation granule is one **cache line** (128 bytes) — any store to that line by another agent invalidates the reservation. Xenia simplifies to per-address tracking, which can let real-hardware-failing pairs succeed under emulation. +- **Alignment requirement.** `EA` must be 4-byte aligned. An unaligned `lwarx` raises an alignment exception on hardware; xenia does not check. +- **`RA0` semantics.** When `RA = 0`, base is literal zero — `lwarx RT, 0, RB` reads at exact `RB`. +- **Reservation-loss events.** Any exception, context switch, or store by another thread to the reserved line clears the reservation. Application code treats `stwcx.` failure (CR0[EQ]=0) as a normal retry condition. +- **Pair atomically.** Code must be `lwarx ... do work ... stwcx.` with no intervening loads/stores that could reorder. Optionally fence with [`lwsync`](sync.md) inside the loop. + +## Related Instructions + +- [`stwcx`](stwcx.md) — store-conditional word (the matching half of the pair). +- [`ldarx`](ldarx.md) / [`stdcx`](stdcx.md) — 64-bit reservation pair. +- [`lwz`](lwz.md), [`lwzx`](lwz.md) — non-reserving word loads. +- [`sync`](sync.md), [`lwsync`](sync.md), [`isync`](isync.md) — barriers commonly placed around reservation pairs. + +## IBM Reference + +- [AIX 7.3 — `lwarx` (Load Word and Reserve Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lwarx-load-word-reserve-indexed-instruction) +- `PowerISA v2.07B Book II` § "Atomic Update Primitives" for the reservation model and granule rules. diff --git a/migration/project-root/ppc-manual/memory/lwbrx.md b/migration/project-root/ppc-manual/memory/lwbrx.md new file mode 100644 index 0000000..9f89c7d --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lwbrx.md @@ -0,0 +1,130 @@ +# `lwbrx` — Load Word Byte-Reverse Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00042c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lwbrx` | `lwbrx` | — | Load Word Byte-Reverse Indexed | + +## Syntax + +```asm +lwbrx [RD], [RA0], [RB] +``` + +## Encoding + +### `lwbrx` — form `X` + +- **Opcode word:** `0x7c00042c` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `534` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lwbrx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | lwbrx: read | Source GPR. | +| `RD` | lwbrx: write | Destination GPR. | + +## Register Effects + +### `lwbrx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lwbrx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lwbrx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:641`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L641) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:49`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L49) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:818`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L818) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1799-1805`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1799-L1805) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lwbrx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + let val = mem.read_u32(ea); + ctx.gpr[instr.rd()] = val.swap_bytes() as u64; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Reads little-endian word.** Loads 4 bytes and reverses byte order. With Xenon's big-endian world view, the architectural effect is "load a little-endian word as if it were big-endian" — the standard parser instruction for PNG/ZIP/RIFF/TGA chunk fields, network protocol fields, and PC-side-formatted data. +- **Implementation detail.** The xenia snapshot calls `mem.read_u32(ea).swap_bytes()`. `read_u32` already returns the host-native value of the big-endian word at `EA`; `swap_bytes` then flips it. +- **X-form only — no update form.** Only the indexed form exists. `EA = (RA|0) + RB`. Pointer-bumping requires a separate `addi`. +- **`RA0` semantics.** When `RA = 0`, base is literal zero; `lwbrx RT, 0, RB` reads at exact `RB`. +- **Zero-extension to 64 bits.** Result occupies the full 64-bit GPR; high 32 bits zero. There is no sign-extending byte-reverse load — combine with `extsw` if needed. +- **Alignment.** Hardware tolerates unaligned 4-byte reads. Cache-inhibited storage may raise alignment exceptions on real Xenon. +- **Pair with [`stwbrx`](stwbrx.md).** Symmetric byte-reverse store. + +## Related Instructions + +- [`lhbrx`](lhbrx.md), [`ldbrx`](ldbrx.md) — narrower / wider byte-reverse loads. +- [`stwbrx`](stwbrx.md) — store-word byte-reverse counterpart. +- [`lwz`](lwz.md), [`lwzx`](lwz.md) — non-reversing zero-extending word loads. +- [`lwa`](lwa.md), [`lwax`](lwa.md) — non-reversing sign-extending word loads. + +## IBM Reference + +- [AIX 7.3 — `lwbrx` (Load Word Byte-Reverse Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lwbrx-load-word-byte-reverse-indexed-instruction) +- `PowerISA v2.07B Book II` § "Byte-Reverse Storage Access". diff --git a/migration/project-root/ppc-manual/memory/lwz.md b/migration/project-root/ppc-manual/memory/lwz.md new file mode 100644 index 0000000..9552570 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/lwz.md @@ -0,0 +1,267 @@ +# `lwz` — Load Word and Zero + +> **Category:** [Memory](../categories/memory.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x80000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lwz` | `lwz` | — | Load Word and Zero | +| `lwzu` | `lwzu` | — | Load Word and Zero with Update | +| `lwzux` | `lwzux` | — | Load Word and Zero with Update Indexed | +| `lwzx` | `lwzx` | — | Load Word and Zero Indexed | + +## Syntax + +```asm +lwz [RD], [d]([RA0]) +lwzu [RD], [d]([RA]) +lwzux [RD], [RA], [RB] +lwzx [RD], [RA0], [RB] +``` + +## Encoding + +### `lwz` — form `D` + +- **Opcode word:** `0x80000000` +- **Primary opcode (bits 0–5):** `32` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `lwzu` — form `D` + +- **Opcode word:** `0x84000000` +- **Primary opcode (bits 0–5):** `33` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `lwzux` — form `X` + +- **Opcode word:** `0x7c00006e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `55` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lwzx` — form `X` + +- **Opcode word:** `0x7c00002e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `23` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lwz: read; lwzx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `d` | lwz: read; lwzu: read | 16-bit signed displacement (`d`) added to the base address register. | +| `RD` | lwz: write; lwzu: write; lwzux: write; lwzx: write | Destination GPR. | +| `RA` | lwzu: read; lwzu: write; lwzux: read; lwzux: write | Source GPR (`r0`–`r31`). | +| `RB` | lwzux: read; lwzx: read | Source GPR. | + +## Register Effects + +### `lwz` + +- **Reads (always):** `RA0`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +### `lwzu` + +- **Reads (always):** `RA`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `RA` +- **Writes (conditional):** _none_ + +### `lwzux` + +- **Reads (always):** `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD`, `RA` +- **Writes (conditional):** _none_ + +### `lwzx` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- (RA|0) + EXTS(d) +RT <- ZEXT32_to_64(MEM(EA, 4)) +``` + +## C Translation Example + +```c +/* lwz RT, d(RA) */ +uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA]; +uint32_t ea = (uint32_t)(base + (int64_t)(int16_t)insn.D); +r[insn.RT] = (uint64_t)mem_read_u32_be(ea); /* zero-extend */ +``` + +## Implementation References + +**`lwz`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lwz"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:289`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L289) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:49`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L49) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:355`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L355) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1000-1005`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1000-L1005) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lwz => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(instr.d() as i64 as u64) as u32; + ctx.gpr[instr.rd()] = mem.read_u32(ea) as u64; + ctx.pc += 4; + } +``` +
+ +**`lwzu`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lwzu"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:310`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L310) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:49`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L49) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:356`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L356) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1006-1011`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1006-L1011) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lwzu => { + let ea = ctx.gpr[instr.ra()].wrapping_add(instr.d() as i64 as u64) as u32; + ctx.gpr[instr.rd()] = mem.read_u32(ea) as u64; + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`lwzux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lwzux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:323`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L323) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:49`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L49) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:766`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L766) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1018-1023`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1018-L1023) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lwzux => { + let ea = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.gpr[instr.rd()] = mem.read_u32(ea) as u64; + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`lwzx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lwzx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:334`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L334) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:49`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L49) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:756`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L756) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1012-1017`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1012-L1017) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lwzx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + ctx.gpr[instr.rd()] = mem.read_u32(ea) as u64; + ctx.pc += 4; + } +``` +
+ + + +## Extended Pseudocode + +``` +; lwz — D-form plain + EA <- (RA|0) + EXTS(d) + RT <- 0x0000_0000 || MEM(EA, 4) ; zero-extend 32→64 + +; lwzu — D-form with update (base-register post-write) + EA <- (RA) + EXTS(d) ; RA ≠ 0 required + RT <- 0x0000_0000 || MEM(EA, 4) + RA <- EA + +; lwzx — X-form indexed + EA <- (RA|0) + (RB) + RT <- 0x0000_0000 || MEM(EA, 4) + +; lwzux — X-form indexed with update + EA <- (RA) + (RB) ; RA ≠ 0 required + RT <- 0x0000_0000 || MEM(EA, 4) + RA <- EA +``` + +## Special Cases & Edge Conditions + +- **Big-endian memory.** The Xenon reads memory big-endian. Translating to little-endian hosts requires a byte-swap on the 32-bit read (or calling a `mem_read_u32_be` helper as in the C example). Matching byte-order helpers in xenia: `mem.read_u32(...)` already returns a host-native `u32` of the big-endian word. +- **Zero-extension to 64 bits.** The result occupies the full 64-bit GPR; the high 32 bits are zero. This is semantically distinct from [`lwa`](lwa.md) / [`lwax`](lwax.md) / [`lwaux`](lwaux.md), which sign-extend. Most Xbox 360 code uses `lwz` for unsigned word loads and for pointer loads (addresses are 32-bit and fit in the low half). +- **`RA0` (non-update forms).** In `lwz` and `lwzx`, when the encoded `RA = 0` the base is the literal zero, **not** `r0`. This enables absolute-address loads `lwz RT, 0x8000(0)` and is heavily used to read from statically-linked data near the TOC base. +- **Update forms require `RA ≠ 0`.** `lwzu` / `lwzux` invoke "RA = 0" as an invalid form; AIX docs say the result is undefined and assemblers will refuse to assemble `lwzu RT, d(0)`. Further, `RA = RT` is also invalid (the "effective address" write and the "loaded value" write would race). Xenia implements update forms without these checks; rely on incoming code being well-formed. +- **No alignment requirement.** Xenon executes unaligned word loads without a fault (unlike some POWER cores). `MEM(EA, 4)` reads four bytes starting at `EA`, whatever alignment. +- **No ordering guarantee.** These are ordinary cached loads; use [`sync`](sync.md) / [`isync`](isync.md) / [`lwsync`](sync.md) for explicit ordering, or [`lwarx`](lwarx.md) for load-reserve semantics. +- **Indexed variant operand order.** `lwzx RT, RA, RB` — `RA` is the base (with `RA0` semantics), `RB` is the offset. The variant without `RA0` is `lwzux`. + +## Related Instructions + +- [`lwa`](lwa.md), [`lwax`](lwax.md), [`lwaux`](lwaux.md) — load word, sign-extend to 64. +- [`lwbrx`](lwbrx.md) — load word byte-reversed (little-endian word). +- [`lwarx`](lwarx.md) — load word and reserve (pair with [`stwcx`](stwcx.md)). +- [`ld`](ld.md), [`ldu`](ldu.md), [`ldx`](ldx.md), [`ldux`](ldux.md) — 64-bit loads. +- [`lhz`](lhz.md), [`lbz`](lbz.md) — half-word / byte zero-extending loads (same family structure). +- [`stw`](stw.md) family — the corresponding stores. + +## IBM Reference + +- [AIX 7.3 — `lwz` (Load Word and Zero)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lwz-load-word-zero-instruction) +- [AIX 7.3 — `lwzu` / `lwzx` / `lwzux`](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lwzu-load-word-zero-update-instruction) diff --git a/migration/project-root/ppc-manual/memory/stb.md b/migration/project-root/ppc-manual/memory/stb.md new file mode 100644 index 0000000..10926be --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stb.md @@ -0,0 +1,260 @@ +# `stb` — Store Byte + +> **Category:** [Memory](../categories/memory.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x98000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stb` | `stb` | — | Store Byte | +| `stbu` | `stbu` | — | Store Byte with Update | +| `stbux` | `stbux` | — | Store Byte with Update Indexed | +| `stbx` | `stbx` | — | Store Byte Indexed | + +## Syntax + +```asm +stb [RS], [d]([RA0]) +stbu [RS], [d]([RA]) +stbux [RS], [RA], [RB] +stbx [RS], [RA0], [RB] +``` + +## Encoding + +### `stb` — form `D` + +- **Opcode word:** `0x98000000` +- **Primary opcode (bits 0–5):** `38` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `stbu` — form `D` + +- **Opcode word:** `0x9c000000` +- **Primary opcode (bits 0–5):** `39` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `stbux` — form `X` + +- **Opcode word:** `0x7c0001ee` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `247` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `stbx` — form `X` + +- **Opcode word:** `0x7c0001ae` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `215` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | stb: read; stbu: read; stbux: read; stbx: read | Source GPR (alias for RD in some stores). | +| `RA0` | stb: read; stbx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `d` | stb: read; stbu: read | 16-bit signed displacement (`d`) added to the base address register. | +| `RA` | stbu: read; stbu: write; stbux: read; stbux: write | Source GPR (`r0`–`r31`). | +| `RB` | stbux: read; stbx: read | Source GPR. | + +## Register Effects + +### `stb` + +- **Reads (always):** `RS`, `RA0`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +### `stbu` + +- **Reads (always):** `RS`, `RA`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +### `stbux` + +- **Reads (always):** `RS`, `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +### `stbx` + +- **Reads (always):** `RS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- (RA|0) + EXTS(d) +MEM(EA, 1) <- (RS)[56:63] +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:404`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L404) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:67`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L67) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:361`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L361) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1327-1335`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1327-L1335) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stb => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(instr.d() as i64 as u64) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u8(ea, ctx.gpr[instr.rs()] as u8); + ctx.pc += 4; + } +``` +
+ +**`stbu`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stbu"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:423`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L423) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:67`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L67) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:362`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L362) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1336-1344`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1336-L1344) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stbu => { + let ea = ctx.gpr[instr.ra()].wrapping_add(instr.d() as i64 as u64) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u8(ea, ctx.gpr[instr.rs()] as u8); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`stbux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stbux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:433`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L433) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:67`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L67) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:793`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L793) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1354-1362`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1354-L1362) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stbux => { + let ea = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u8(ea, ctx.gpr[instr.rs()] as u8); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`stbx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stbx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:443`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L443) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:67`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L67) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:790`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L790) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1345-1353`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1345-L1353) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stbx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u8(ea, ctx.gpr[instr.rs()] as u8); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single-byte write.** Writes the low 8 bits of `RS` (`(RS)[56:63]` in IBM bit-numbering, equivalently `RS & 0xFF`) at `EA`. The xenia snapshot does `mem.write_u8(ea, ctx.gpr[instr.rs()] as u8)`, which casts the GPR's low byte directly. +- **No endian concerns.** A single byte has no endianness — the byte at `EA` is the byte you wrote. +- **`RA0` (non-update forms).** `RA = 0` in `stb` and `stbx` selects literal zero as base — useful for absolute writes. Update forms `stbu` / `stbux` invoke `RA = 0` as an invalid form (no `RA = RT` collision since the source is `RS`, not `RT`). +- **Update-form post-write.** `stbu` / `stbux` write the computed `EA` back to `RA` after the store. The order is store-then-update; if `RA = RS` the store is unaffected (the store reads `RS` first), but the new `RA` value reflects `EA`, not the original `RS`. +- **No alignment requirement.** Byte stores are intrinsically aligned. Xenon never raises alignment exceptions for byte writes. +- **Common in string and packed-bool code.** Compilers emit `stb` for `char *` writes, packed boolean array updates, and small enum stores. +- **Cache effects.** A `stb` to a cold cache line triggers a cache-line read-allocate (load the whole line, modify one byte, mark dirty). When writing many bytes sequentially, prefer one [`stw`](stw.md) or [`stvx`](stvx.md), or pre-clear the line with [`dcbz128`](dcbz.md). + +## Related Instructions + +- [`sth`](sth.md), [`stw`](stw.md), [`std`](std.md) — wider stores (half / word / doubleword). +- [`lbz`](lbz.md) — corresponding load (no `lba` exists). +- [`stmw`](stmw.md), [`stswi`](stswi.md), [`stswx`](stswx.md) — multi-word / string stores for bulk transfer. +- [`stwbrx`](stwbrx.md), [`sthbrx`](sthbrx.md) — byte-reversed wider stores (no byte-equivalent needed). + +## IBM Reference + +- [AIX 7.3 — `stb` (Store Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stb-store-byte-instruction) +- [AIX 7.3 — `stbu` (Store Byte with Update)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stbu-store-byte-update-instruction) diff --git a/migration/project-root/ppc-manual/memory/std.md b/migration/project-root/ppc-manual/memory/std.md new file mode 100644 index 0000000..ee01e44 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/std.md @@ -0,0 +1,263 @@ +# `std` — Store Doubleword + +> **Category:** [Memory](../categories/memory.md) · **Form:** [DS](../forms/DS.md) · **Opcode:** `0xf8000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `std` | `std` | — | Store Doubleword | +| `stdu` | `stdu` | — | Store Doubleword with Update | +| `stdux` | `stdux` | — | Store Doubleword with Update Indexed | +| `stdx` | `stdx` | — | Store Doubleword Indexed | + +## Syntax + +```asm +std [RS], [ds]([RA0]) +stdu [RS], [ds]([RA]) +stdux [RS], [RA], [RB] +stdx [RS], [RA0], [RB] +``` + +## Encoding + +### `std` — form `DS` + +- **Opcode word:** `0xf8000000` +- **Primary opcode (bits 0–5):** `62` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0) | +| 16–29 | `DS` | 14-bit signed word-scaled displacement | +| 30–31 | `XO` | extended opcode | + +### `stdu` — form `DS` + +- **Opcode word:** `0xf8000001` +- **Primary opcode (bits 0–5):** `62` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0) | +| 16–29 | `DS` | 14-bit signed word-scaled displacement | +| 30–31 | `XO` | extended opcode | + +### `stdux` — form `X` + +- **Opcode word:** `0x7c00016a` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `181` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `stdx` — form `X` + +- **Opcode word:** `0x7c00012a` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `149` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | std: read; stdu: read; stdux: read; stdx: read | Source GPR (alias for RD in some stores). | +| `RA` | std: read; stdu: read; stdu: write; stdux: read; stdux: write | Source GPR (`r0`–`r31`). | +| `ds` | std: read; stdu: read | 14-bit signed word-aligned displacement (`DS << 2`). | +| `RB` | stdux: read; stdx: read | Source GPR. | +| `RA0` | stdx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | + +## Register Effects + +### `std` + +- **Reads (always):** `RS`, `RA`, `ds` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +### `stdu` + +- **Reads (always):** `RS`, `RA`, `ds` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +### `stdux` + +- **Reads (always):** `RS`, `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +### `stdx` + +- **Reads (always):** `RS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- (RA|0) + EXTS(ds || 0b00) +MEM(EA, 8) <- (RS) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`std`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="std"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:575`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L575) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:69`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L69) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:399`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L399) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1399-1407`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1399-L1407) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::std => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(instr.ds() as i64 as u64) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u64(ea, ctx.gpr[instr.rs()]); + ctx.pc += 4; + } +``` +
+ +**`stdu`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stdu"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:594`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L594) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:69`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L69) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:400`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L400) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1417-1425`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1417-L1425) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stdu => { + let ea = ctx.gpr[instr.ra()].wrapping_add(instr.ds() as i64 as u64) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u64(ea, ctx.gpr[instr.rs()]); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`stdux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stdux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:604`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L604) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:69`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L69) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:786`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L786) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1426-1434`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1426-L1434) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stdux => { + let ea = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u64(ea, ctx.gpr[instr.rs()]); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`stdx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stdx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:614`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L614) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:69`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L69) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:781`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L781) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1408-1416`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1408-L1416) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stdx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u64(ea, ctx.gpr[instr.rs()]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **DS-form, not D-form.** Like [`ld`](ld.md), `std` uses a 14-bit signed displacement scaled by 4 (`EXTS(ds || 0b00)`). Bits 30–31 are the extended opcode used to distinguish `std` (XO=0) from `stdu` (XO=1). Assemblers verify the byte displacement is a multiple of 4. +- **Big-endian write.** The 64-bit value of `RS` is written most-significant-byte-first: `RS[0:7]` to `EA`, `RS[56:63]` to `EA+7`. Xenia's `mem.write_u64` performs the host-side byte swap if needed. +- **`RA0` for `std` and `stdx`.** When `RA = 0`, base is the literal zero. Update forms `stdu` / `stdux` invoke `RA = 0` as an invalid form (no `RA = RS` collision possible — `RS` and `RA` are independent encoding fields, and even if equal the store reads `RS` first). +- **Update-form post-write.** `stdu` / `stdux` write `EA` to `RA` after the store. Order is store-then-update. +- **Alignment.** Xenon tolerates unaligned doubleword stores. PowerISA permits implementations to raise alignment exceptions; portable code keeps doublewords 8-byte aligned. Cache-inhibited storage may force alignment. +- **Cache-line behaviour.** A doubleword store fits inside one Xenon cache line (128 B), so it's a single line write. A doubleword store that **straddles** a line boundary triggers two line accesses — keep doublewords 8-byte aligned to avoid the cost. +- **64-bit pointer / counter stores.** Xbox 360 user code is 32-bit, but kernel structures, TOC entries, and 64-bit counters are stored with `std`. + +## Related Instructions + +- [`stw`](stw.md), [`sth`](sth.md), [`stb`](stb.md) — narrower integer stores. +- [`stdbrx`](stdbrx.md) — byte-reversed doubleword store. +- [`stdcx`](stdcx.md) — store-conditional (the doubleword reservation pair end). +- [`ld`](ld.md), [`ldu`](ld.md), [`ldx`](ld.md), [`ldux`](ld.md) — corresponding loads. +- [`stfd`](stfd.md) — FP doubleword store (same width, different register file). + +## IBM Reference + +- [AIX 7.3 — `std` (Store Doubleword)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-std-store-doubleword-instruction) +- [AIX 7.3 — `stdu` / `stdx` / `stdux`](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stdu-store-doubleword-update-instruction) diff --git a/migration/project-root/ppc-manual/memory/stdbrx.md b/migration/project-root/ppc-manual/memory/stdbrx.md new file mode 100644 index 0000000..a27e9b1 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stdbrx.md @@ -0,0 +1,130 @@ +# `stdbrx` — Store Doubleword Byte-Reverse Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c000528` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stdbrx` | `stdbrx` | — | Store Doubleword Byte-Reverse Indexed | + +## Syntax + +```asm +stdbrx [RS], [RA0], [RB] +``` + +## Encoding + +### `stdbrx` — form `X` + +- **Opcode word:** `0x7c000528` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `660` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | stdbrx: read | Source GPR (alias for RD in some stores). | +| `RA0` | stdbrx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | stdbrx: read | Source GPR. | + +## Register Effects + +### `stdbrx` + +- **Reads (always):** `RS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stdbrx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stdbrx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:691`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L691) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:69`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L69) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:829`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L829) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4632-4639`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4632-L4639) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stdbrx => { + let ea = ea_indexed(ctx, instr); + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u64(ea, ctx.gpr[instr.rs()].swap_bytes()); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Writes little-endian doubleword.** Reverses the 8 bytes of `RS` and stores them at `EA`. Compared to a regular `std`, the byte at `EA` becomes `RS[56:63]` (least-significant), and the byte at `EA+7` becomes `RS[0:7]` (most-significant). The xenia snapshot calls `mem.write_u64(ea, ctx.gpr[instr.rs()].swap_bytes())`. +- **Used to emit little-endian payloads.** Symmetric counterpart of [`ldbrx`](ldbrx.md). Common when writing PC-side file formats, network packets, or PE/COFF headers from PowerPC code. +- **X-form only — no update form, no DS-form.** Only the indexed form exists. `EA = (RA|0) + RB`. Pointer-bumping requires a separate `addi`. +- **`RA0` semantics.** When `RA = 0`, base is literal zero. `stdbrx RS, 0, RB` writes at exact `RB`. +- **Alignment.** Hardware tolerates unaligned 8-byte writes. Cache-inhibited storage may raise alignment exceptions on real hardware. +- **No CR / FPSCR effects.** Pure data movement. +- **Cache-line straddling cost.** As with `std`, writes that cross a 128-byte line boundary touch two cache lines; keep doublewords 8-byte aligned for best performance. + +## Related Instructions + +- [`ldbrx`](ldbrx.md) — load doubleword byte-reverse (the matching load). +- [`stwbrx`](stwbrx.md), [`sthbrx`](sthbrx.md) — narrower byte-reverse stores. +- [`std`](std.md), [`stdx`](std.md) — non-reversing doubleword stores. + +## IBM Reference + +- [AIX 7.3 — `stdbrx` (Store Doubleword Byte-Reverse Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stdbrx-store-double-word-byte-reverse-indexed-instruction) +- `PowerISA v2.07B Book II` § "Byte-Reverse Storage Access". diff --git a/migration/project-root/ppc-manual/memory/stdcx.md b/migration/project-root/ppc-manual/memory/stdcx.md new file mode 100644 index 0000000..4bacab7 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stdcx.md @@ -0,0 +1,176 @@ +# `stdcx` — Store Doubleword Conditional Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0001ad` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stdcx` | `stdcx` | — | Store Doubleword Conditional Indexed | + +## Syntax + +```asm +stdcx. [RS], [RA0], [RB] +``` + +## Encoding + +### `stdcx` — form `X` + +- **Opcode word:** `0x7c0001ad` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `214` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | stdcx: read | Source GPR (alias for RD in some stores). | +| `RA0` | stdcx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | stdcx: read | Source GPR. | +| `CR` | stdcx: write | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `stdcx` + +- **Reads (always):** `RS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `CR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `stdcx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]` (always). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stdcx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stdcx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:827`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L827) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:69`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L69) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:789`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L789) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4576-4626`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4576-L4626) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stdcx => { + let ea = ea_indexed(ctx, instr); + let line = ea & !RESERVATION_MASK; + let table_route = ctx + .reservation_table + .as_ref() + .filter(|t| t.is_enabled()) + .cloned(); + // PPCBUG-151: stdcx. requires a doubleword (ldarx) reservation; + // a word (lwarx) reservation must not commit here. + let width_ok = ctx.reservation_width == 8; + let success = if let Some(t) = &table_route { + ctx.has_reservation + && width_ok + && ctx.reserved_line == line + && t.try_commit(ea, ctx.reserved_generation, ctx.hw_id) + } else { + // Legacy per-ctx path (M2 default / lockstep). + // PPCBUG-108: same sentinel as stwcx. — fires on non-primary + // HW slots if the table is disabled under --parallel. + debug_assert!( + ctx.hw_id == 0, + "PPCBUG-108: legacy per-ctx stdcx. on non-primary HW slot \ + (hw_id={}) — ReservationTable must be enabled under --parallel", + ctx.hw_id + ); + ctx.has_reservation && width_ok && ctx.reserved_line == line + }; + if success { + mem.write_u64(ea, ctx.gpr[instr.rs()]); + ctx.cr[0] = crate::context::CrField { + lt: false, + gt: false, + eq: true, + so: ctx.xer_so != 0, + }; + } else { + ctx.cr[0] = crate::context::CrField { + lt: false, + gt: false, + eq: false, + so: ctx.xer_so != 0, + }; + if let Some(t) = &table_route { + t.release(ea, ctx.reserved_generation, ctx.hw_id); + } + } + ctx.has_reservation = false; + ctx.reservation_width = 0; // PPCBUG-151: always clear on exit + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Always sets `Rc=1` (the trailing dot).** The mnemonic is `stdcx.` — there is no non-Rc variant. CR0 is updated unconditionally to communicate success/failure. `EQ=1` means the conditional store succeeded; `EQ=0` means it failed (the prior reservation was cleared and no memory was written). +- **Reservation check.** Xenia's snapshot tests `has_reservation && reserved_addr == ea`. On match it performs `mem.write_u64`, sets `EQ=1`. On mismatch it leaves memory untouched and sets `EQ=0`. In both cases the reservation is cleared (`has_reservation = false`), so a retry must be preceded by a fresh [`ldarx`](ldarx.md). +- **Hardware granule.** PowerISA defines reservation by aligned doubleword; Xenon implementations widen this to one 128-byte cache line. A store by another agent anywhere in the line clears the reservation. Xenia's per-address check is more permissive than hardware. +- **Alignment requirement.** `EA` must be 8-byte aligned. Unaligned `stdcx.` raises an alignment exception on real hardware. +- **`RA0` semantics.** When `RA = 0`, base is literal zero — `stdcx. RS, 0, RB` writes at exact `RB`. +- **CR0[SO] reflects XER[SO].** Like all CR-updating ops, CR0[SO] is copied from `XER[SO]` rather than computed from this instruction. +- **Spurious failures permitted.** Hardware may report failure even when no actual conflict occurred (e.g. on context switch). Application code treats failure as a normal retry condition. +- **Pair atomically with [`ldarx`](ldarx.md).** Don't interleave loads/stores between the pair; an [`lwsync`](sync.md) inside the loop body is common. + +## Related Instructions + +- [`ldarx`](ldarx.md) — load-and-reserve doubleword (the matching load). +- [`stwcx`](stwcx.md) / [`lwarx`](lwarx.md) — 32-bit reservation pair. +- [`std`](std.md), [`stdx`](std.md) — non-conditional doubleword stores. +- [`sync`](sync.md), [`lwsync`](sync.md), [`isync`](isync.md) — barriers used around reservation pairs. + +## IBM Reference + +- [AIX 7.3 — `stdcx.` (Store Doubleword Conditional Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stdcx-store-double-word-conditional-indexed-instruction) +- `PowerISA v2.07B Book II` § "Atomic Update Primitives" for canonical reservation semantics and granule rules. diff --git a/migration/project-root/ppc-manual/memory/stfd.md b/migration/project-root/ppc-manual/memory/stfd.md new file mode 100644 index 0000000..8652239 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stfd.md @@ -0,0 +1,260 @@ +# `stfd` — Store Floating-Point Double + +> **Category:** [Memory](../categories/memory.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0xd8000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stfd` | `stfd` | — | Store Floating-Point Double | +| `stfdu` | `stfdu` | — | Store Floating-Point Double with Update | +| `stfdux` | `stfdux` | — | Store Floating-Point Double with Update Indexed | +| `stfdx` | `stfdx` | — | Store Floating-Point Double Indexed | + +## Syntax + +```asm +stfd [FS], [d]([RA0]) +stfdu [FS], [d]([RA]) +stfdux [FS], [RA], [RB] +stfdx [FS], [RA0], [RB] +``` + +## Encoding + +### `stfd` — form `D` + +- **Opcode word:** `0xd8000000` +- **Primary opcode (bits 0–5):** `54` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `stfdu` — form `D` + +- **Opcode word:** `0xdc000000` +- **Primary opcode (bits 0–5):** `55` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `stfdux` — form `X` + +- **Opcode word:** `0x7c0005ee` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `759` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `stfdx` — form `X` + +- **Opcode word:** `0x7c0005ae` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `727` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FS` | stfd: read; stfdu: read; stfdux: read; stfdx: read | Source floating-point register. | +| `RA0` | stfd: read; stfdx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `d` | stfd: read; stfdu: read | 16-bit signed displacement (`d`) added to the base address register. | +| `RA` | stfdu: read; stfdu: write; stfdux: read; stfdux: write | Source GPR (`r0`–`r31`). | +| `RB` | stfdux: read; stfdx: read | Source GPR. | + +## Register Effects + +### `stfd` + +- **Reads (always):** `FS`, `RA0`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +### `stfdu` + +- **Reads (always):** `FS`, `RA`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +### `stfdux` + +- **Reads (always):** `FS`, `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +### `stfdx` + +- **Reads (always):** `FS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- (RA|0) + EXTS(d) +MEM(EA, 8) <- (FRS) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stfd`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stfd"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1014`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1014) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:71`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L71) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:377`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L377) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1473-1481`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1473-L1481) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stfd => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(instr.d() as i64 as u64) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_f64(ea, ctx.fpr[instr.rs()]); + ctx.pc += 4; + } +``` +
+ +**`stfdu`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stfdu"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1026`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1026) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:71`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L71) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:378`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L378) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1482-1490`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1482-L1490) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stfdu => { + let ea = ctx.gpr[instr.ra()].wrapping_add(instr.d() as i64 as u64) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_f64(ea, ctx.fpr[instr.rs()]); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`stfdux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stfdux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1036`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1036) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:71`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L71) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:837`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L837) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1500-1508`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1500-L1508) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stfdux => { + let ea = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_f64(ea, ctx.fpr[instr.rs()]); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`stfdx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stfdx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1046`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1046) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:71`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L71) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:836`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L836) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1491-1499`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1491-L1499) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stfdx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_f64(ea, ctx.fpr[instr.rs()]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Bit-exact double store.** Writes the 64-bit IEEE binary64 contents of `FRS` directly to memory; no rounding, no format conversion. The xenia snapshot calls `mem.write_f64(ea, ctx.fpr[instr.rs()])`, which preserves the exact bit pattern (including signalling NaNs). +- **No FPSCR side effects.** Like [`lfd`](lfd.md), `stfd` cannot raise IEEE exceptions: there is no rounding step. Contrast [`stfs`](stfs.md), where double→single rounding **can** raise inexact / overflow / underflow. +- **`RA0` (non-update forms).** `RA = 0` in `stfd` and `stfdx` selects literal zero. Update forms `stfdu` / `stfdux` invoke `RA = 0` as an invalid form. +- **Update-form post-write.** `stfdu` / `stfdux` write the computed `EA` back to `RA` after the store. No `FRS` / `RA` collision possible — `RS` is an FPR, `RA` is a GPR. +- **Big-endian write.** Byte at `EA` is the FPR's most-significant byte (sign + part of exponent), byte at `EA+7` is the least-significant mantissa byte. Xenia's `mem.write_f64` performs host-side byte-swap. +- **Alignment.** Xenon tolerates unaligned 8-byte FP stores. PowerISA permits implementations to raise alignment exceptions on cache-inhibited storage. +- **MSR[FP] required.** Disabled FP unit raises Floating-Point Unavailable. + +## Related Instructions + +- [`lfd`](lfd.md), [`lfdu`](lfd.md), [`lfdx`](lfd.md), [`lfdux`](lfd.md) — corresponding loads. +- [`stfs`](stfs.md) — single-precision store with format conversion (can raise FPSCR). +- [`stfiwx`](stfiwx.md) — store low 32 bits of FPR as integer word. +- [`std`](std.md) — integer doubleword store (same width, GPR source). + +## IBM Reference + +- [AIX 7.3 — `stfd` (Store Floating-Point Double)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stfd-store-floating-point-double-instruction) +- [AIX 7.3 — `stfdu` / `stfdx` / `stfdux`](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stfdu-store-floating-point-double-update-instruction) diff --git a/migration/project-root/ppc-manual/memory/stfiwx.md b/migration/project-root/ppc-manual/memory/stfiwx.md new file mode 100644 index 0000000..1a4ade8 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stfiwx.md @@ -0,0 +1,133 @@ +# `stfiwx` — Store Floating-Point as Integer Word Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0007ae` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stfiwx` | `stfiwx` | — | Store Floating-Point as Integer Word Indexed | + +## Syntax + +```asm +stfiwx [FS], [RA0], [RB] +``` + +## Encoding + +### `stfiwx` — form `X` + +- **Opcode word:** `0x7c0007ae` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `983` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FS` | stfiwx: read | Source floating-point register. | +| `RA0` | stfiwx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | stfiwx: read | Source GPR. | + +## Register Effects + +### `stfiwx` + +- **Reads (always):** `FS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stfiwx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stfiwx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1058`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1058) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:71`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L71) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:851`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L851) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1509-1518`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1509-L1518) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stfiwx => { + // Store FP as integer word: stores low 32 bits of FPR as-is + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u32(ea, ctx.fpr[instr.rs()].to_bits() as u32); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Stores low 32 bits of FPR as raw bytes.** Writes `FRS[32:63]` (the low half of the 64-bit FPR bit pattern) verbatim — no IEEE rounding, no float→int conversion. Used in conjunction with `fctiw` / `fctiwz` (convert float to integer word, leaving the 32-bit integer in the low half of an FPR) to materialise an integer in memory without going through a GPR. +- **The asymmetric oddity of the FP load/store family.** There is no matching "load FP as integer word" — a 32-bit integer is brought in via `lwz` to a GPR, then to FPR via the memory-round-trip pattern (`stw` then `lfd`). `stfiwx` only exists in the store direction. +- **X-form only — no D-form, no update form.** The instruction has only the indexed form. Compilers usually pair it with `addi` if a constant offset is needed. +- **`RA0` semantics.** When `RA = 0`, base is literal zero; `stfiwx FS, 0, RB` writes at exact `RB`. +- **No FPSCR effects.** Pure data movement — does not look at the value, does not round. +- **Big-endian word write.** The 32 bits are written most-significant-byte first into bytes `EA..EA+3`. The xenia snapshot extracts via `to_bits() as u32`, then `mem.write_u32` applies host-side byte-swap. +- **Alignment.** Xenon tolerates unaligned 4-byte writes; cache-inhibited storage may raise alignment exceptions on real hardware. +- **MSR[FP] required.** Disabled FP unit raises Floating-Point Unavailable. + +## Related Instructions + +- [`stfd`](stfd.md), [`stfs`](stfs.md) — regular FP stores. +- [`lfd`](lfd.md), [`lfs`](lfs.md) — FP loads (no `lfiwx` analog). +- [`stw`](stw.md), [`stwx`](stw.md) — integer word stores from a GPR (the GPR-side equivalent). + +## IBM Reference + +- [AIX 7.3 — `stfiwx` (Store Floating-Point as Integer Word Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stfiwx-store-floating-point-as-integer-word-indexed-instruction) +- `PowerISA v2.07B Book I` § "Floating-Point Load and Store" for the float-to-int memory pattern. diff --git a/migration/project-root/ppc-manual/memory/stfs.md b/migration/project-root/ppc-manual/memory/stfs.md new file mode 100644 index 0000000..e484e65 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stfs.md @@ -0,0 +1,261 @@ +# `stfs` — Store Floating-Point Single + +> **Category:** [Memory](../categories/memory.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0xd0000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stfs` | `stfs` | — | Store Floating-Point Single | +| `stfsu` | `stfsu` | — | Store Floating-Point Single with Update | +| `stfsux` | `stfsux` | — | Store Floating-Point Single with Update Indexed | +| `stfsx` | `stfsx` | — | Store Floating-Point Single Indexed | + +## Syntax + +```asm +stfs [FS], [d]([RA0]) +stfsu [FS], [d]([RA]) +stfsux [FS], [RA], [RB] +stfsx [FS], [RA], [RB] +``` + +## Encoding + +### `stfs` — form `D` + +- **Opcode word:** `0xd0000000` +- **Primary opcode (bits 0–5):** `52` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `stfsu` — form `D` + +- **Opcode word:** `0xd4000000` +- **Primary opcode (bits 0–5):** `53` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `stfsux` — form `X` + +- **Opcode word:** `0x7c00056e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `695` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `stfsx` — form `X` + +- **Opcode word:** `0x7c00052e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `663` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `FS` | stfs: read; stfsu: read; stfsux: read; stfsx: read | Source floating-point register. | +| `RA0` | stfs: read; stfsx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `d` | stfs: read; stfsu: read | 16-bit signed displacement (`d`) added to the base address register. | +| `RA` | stfsu: read; stfsu: write; stfsux: read; stfsux: write | Source GPR (`r0`–`r31`). | +| `RB` | stfsux: read; stfsx: read | Source GPR. | + +## Register Effects + +### `stfs` + +- **Reads (always):** `FS`, `RA0`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +### `stfsu` + +- **Reads (always):** `FS`, `RA`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +### `stfsux` + +- **Reads (always):** `FS`, `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +### `stfsx` + +- **Reads (always):** `FS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- (RA|0) + EXTS(d) +MEM(EA, 4) <- SingleFromDouble(FRS) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stfs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stfs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1071`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1071) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:71`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L71) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:375`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L375) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1437-1445`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1437-L1445) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stfs => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(instr.d() as i64 as u64) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_f32(ea, ctx.fpr[instr.rs()] as f32); + ctx.pc += 4; + } +``` +
+ +**`stfsu`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stfsu"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1084`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1084) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:71`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L71) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:376`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L376) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1446-1454`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1446-L1454) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stfsu => { + let ea = ctx.gpr[instr.ra()].wrapping_add(instr.d() as i64 as u64) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_f32(ea, ctx.fpr[instr.rs()] as f32); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`stfsux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stfsux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1095`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1095) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:71`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L71) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:834`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L834) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1464-1472`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1464-L1472) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stfsux => { + let ea = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_f32(ea, ctx.fpr[instr.rs()] as f32); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`stfsx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stfsx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:1106`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L1106) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:71`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L71) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:832`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L832) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1455-1463`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1455-L1463) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stfsx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_f32(ea, ctx.fpr[instr.rs()] as f32); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Double → single rounding.** `FRS` always holds an IEEE binary64; `stfs` rounds to binary32 using the current `FPSCR[RN]` rounding mode before writing 4 bytes. The xenia snapshot does `ctx.fpr[instr.rs()] as f32`, which Rust defines as round-to-nearest-even; this differs from PPC if `RN` is configured otherwise. Real hardware honours `RN`. +- **FPSCR side effects.** Unlike [`lfs`](lfs.md) / [`lfd`](lfd.md) / [`stfd`](stfd.md), `stfs` **can** raise `FPSCR[XX]` (inexact), `OX` (overflow), `UX` (underflow), and `VXSNAN` (signalling NaN) per IEEE-754 narrowing rules. These take effect even though the write itself succeeds (architecturally — xenia's `as f32` cast does not surface these flags). +- **Out-of-range doubles.** Values larger than binary32's max (~3.4e38) round to ±∞; values smaller than min normal flush to ±0 or denormal per `FPSCR[NI]`. NaNs are quieted (the signalling bit drops). +- **`RA0` (non-update forms).** `RA = 0` in `stfs` and `stfsx` selects literal zero. Update forms `stfsu` / `stfsux` invoke `RA = 0` as an invalid form. +- **Update-form post-write.** `stfsu` / `stfsux` write `EA` back to `RA` after the store. +- **Big-endian write.** 4 bytes most-significant-byte first. +- **Alignment.** Xenon tolerates unaligned 4-byte FP stores; cache-inhibited storage may raise alignment exceptions on real hardware. +- **MSR[FP] required.** Disabled FP unit raises Floating-Point Unavailable. + +## Related Instructions + +- [`lfs`](lfs.md), [`lfsu`](lfs.md), [`lfsx`](lfs.md), [`lfsux`](lfs.md) — corresponding loads (single→double widening, can't raise exceptions). +- [`stfd`](stfd.md) — double-precision store (no rounding, no FPSCR effects). +- [`stfiwx`](stfiwx.md) — store-FP-as-integer-word. +- [`stw`](stw.md) — integer word store (same width, GPR source). + +## IBM Reference + +- [AIX 7.3 — `stfs` (Store Floating-Point Single)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stfs-store-floating-point-single-instruction) +- [AIX 7.3 — `stfsu` / `stfsx` / `stfsux`](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stfsu-store-floating-point-single-update-instruction) diff --git a/migration/project-root/ppc-manual/memory/sth.md b/migration/project-root/ppc-manual/memory/sth.md new file mode 100644 index 0000000..49028c6 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/sth.md @@ -0,0 +1,260 @@ +# `sth` — Store Half Word + +> **Category:** [Memory](../categories/memory.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0xb0000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `sth` | `sth` | — | Store Half Word | +| `sthu` | `sthu` | — | Store Half Word with Update | +| `sthux` | `sthux` | — | Store Half Word with Update Indexed | +| `sthx` | `sthx` | — | Store Half Word Indexed | + +## Syntax + +```asm +sth [RS], [d]([RA0]) +sthu [RS], [d]([RA]) +sthux [RS], [RA], [RB] +sthx [RS], [RA0], [RB] +``` + +## Encoding + +### `sth` — form `D` + +- **Opcode word:** `0xb0000000` +- **Primary opcode (bits 0–5):** `44` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `sthu` — form `D` + +- **Opcode word:** `0xb4000000` +- **Primary opcode (bits 0–5):** `45` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `sthux` — form `X` + +- **Opcode word:** `0x7c00036e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `439` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `sthx` — form `X` + +- **Opcode word:** `0x7c00032e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `407` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | sth: read; sthu: read; sthux: read; sthx: read | Source GPR (alias for RD in some stores). | +| `RA0` | sth: read; sthx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `d` | sth: read; sthu: read | 16-bit signed displacement (`d`) added to the base address register. | +| `RA` | sthu: read; sthu: write; sthux: read; sthux: write | Source GPR (`r0`–`r31`). | +| `RB` | sthux: read; sthx: read | Source GPR. | + +## Register Effects + +### `sth` + +- **Reads (always):** `RS`, `RA0`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +### `sthu` + +- **Reads (always):** `RS`, `RA`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +### `sthux` + +- **Reads (always):** `RS`, `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +### `sthx` + +- **Reads (always):** `RS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- (RA|0) + EXTS(d) +MEM(EA, 2) <- (RS)[48:63] +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`sth`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="sth"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:455`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L455) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:73`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L73) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:367`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L367) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1363-1371`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1363-L1371) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::sth => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(instr.d() as i64 as u64) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u16(ea, ctx.gpr[instr.rs()] as u16); + ctx.pc += 4; + } +``` +
+ +**`sthu`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="sthu"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:475`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L475) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:73`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L73) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:368`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L368) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1372-1380`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1372-L1380) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::sthu => { + let ea = ctx.gpr[instr.ra()].wrapping_add(instr.d() as i64 as u64) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u16(ea, ctx.gpr[instr.rs()] as u16); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`sthux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="sthux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:485`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L485) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:73`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L73) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:808`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L808) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1390-1398`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1390-L1398) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::sthux => { + let ea = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u16(ea, ctx.gpr[instr.rs()] as u16); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`sthx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="sthx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:495`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L495) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:73`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L73) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:806`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L806) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1381-1389`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1381-L1389) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::sthx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u16(ea, ctx.gpr[instr.rs()] as u16); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Stores low 16 bits of `RS`.** Writes `(RS)[48:63]` — the low half-word — at `EA`. The xenia snapshot does `mem.write_u16(ea, ctx.gpr[instr.rs()] as u16)`. The high 48 bits of `RS` are ignored: storing a 64-bit value through `sth` silently truncates. +- **Big-endian write.** Byte at `EA` is the high byte of the half (`RS[48:55]`), byte at `EA+1` is the low byte (`RS[56:63]`). On little-endian hosts the byte-swap happens at the memory boundary. +- **`RA0` (non-update forms).** `RA = 0` in `sth` and `sthx` selects literal zero. Update forms `sthu` / `sthux` invoke `RA = 0` as an invalid form. +- **Update-form post-write.** `sthu` / `sthux` write the computed `EA` back to `RA` after the store. +- **No alignment requirement.** Xenon tolerates unaligned half-word stores; the two bytes are written at `EA` and `EA+1` regardless of alignment. +- **Common in audio / Unicode code.** Standard store for 16-bit PCM samples and UTF-16 code units. Compilers emit `sth` for `short *` writes. +- **Cache effects.** A `sth` to a cold line triggers a read-allocate; for bulk half-word writes to a fresh line, prefer pre-clearing with [`dcbz128`](dcbz.md). + +## Related Instructions + +- [`stb`](stb.md), [`stw`](stw.md), [`std`](std.md) — narrower / wider stores. +- [`sthbrx`](sthbrx.md) — byte-reversed half-word store (little-endian half). +- [`lhz`](lhz.md), [`lha`](lha.md) — corresponding loads (zero / sign extension). +- [`stmw`](stmw.md), [`stswi`](stswi.md), [`stswx`](stswx.md) — bulk stores. + +## IBM Reference + +- [AIX 7.3 — `sth` (Store Half)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-sth-store-half-instruction) +- [AIX 7.3 — `sthu` / `sthx` / `sthux`](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-sthu-store-half-update-instruction) diff --git a/migration/project-root/ppc-manual/memory/sthbrx.md b/migration/project-root/ppc-manual/memory/sthbrx.md new file mode 100644 index 0000000..0860cc2 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/sthbrx.md @@ -0,0 +1,131 @@ +# `sthbrx` — Store Half Word Byte-Reverse Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00072c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `sthbrx` | `sthbrx` | — | Store Half Word Byte-Reverse Indexed | + +## Syntax + +```asm +sthbrx [RS], [RA0], [RB] +``` + +## Encoding + +### `sthbrx` — form `X` + +- **Opcode word:** `0x7c00072c` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `918` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | sthbrx: read | Source GPR (alias for RD in some stores). | +| `RA0` | sthbrx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | sthbrx: read | Source GPR. | + +## Register Effects + +### `sthbrx` + +- **Reads (always):** `RS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`sthbrx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="sthbrx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:667`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L667) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:73`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L73) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:846`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L846) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1822-1830`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1822-L1830) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::sthbrx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u16(ea, (ctx.gpr[instr.rs()] as u16).swap_bytes()); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Writes little-endian half.** Takes the low 16 bits of `RS`, swaps the two bytes, writes them at `EA`. After execution, byte at `EA` is `RS[56:63]` (low byte) and byte at `EA+1` is `RS[48:55]` (high byte). The xenia snapshot does `(ctx.gpr[instr.rs()] as u16).swap_bytes()`. +- **Used to emit little-endian half-words.** Symmetric counterpart of [`lhbrx`](lhbrx.md). Common in PNG / ZIP / RIFF chunk emit paths. +- **High bits of `RS` ignored.** Storing a 64-bit value through `sthbrx` truncates and reverses only the low half-word; the high 48 bits are not consulted. +- **X-form only — no D-form, no update form.** Only the indexed form exists. `EA = (RA|0) + RB`. +- **`RA0` semantics.** When `RA = 0`, base is literal zero; `sthbrx RS, 0, RB` writes at exact `RB`. +- **Alignment.** Hardware tolerates unaligned half-word writes; cache-inhibited storage may raise alignment exceptions on real hardware. +- **No CR / FPSCR effects.** + +## Related Instructions + +- [`lhbrx`](lhbrx.md) — load-half byte-reverse (matching load). +- [`stwbrx`](stwbrx.md), [`stdbrx`](stdbrx.md) — wider byte-reverse stores. +- [`sth`](sth.md), [`sthx`](sth.md) — non-reversing half stores. + +## IBM Reference + +- [AIX 7.3 — `sthbrx` (Store Half Byte-Reverse Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-sthbrx-store-half-byte-reverse-indexed-instruction) +- `PowerISA v2.07B Book II` § "Byte-Reverse Storage Access". diff --git a/migration/project-root/ppc-manual/memory/stmw.md b/migration/project-root/ppc-manual/memory/stmw.md new file mode 100644 index 0000000..c64bbe1 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stmw.md @@ -0,0 +1,143 @@ +# `stmw` — Store Multiple Word + +> **Category:** [Memory](../categories/memory.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0xbc000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stmw` | `stmw` | — | Store Multiple Word | + +## Syntax + +```asm +(no disassembly template) +``` + +## Encoding + +### `stmw` — form `D` + +- **Opcode word:** `0xbc000000` +- **Primary opcode (bits 0–5):** `47` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | + +## Register Effects + +### `stmw` + +- **Reads (always):** _none_ +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stmw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stmw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:527`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L527) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:75`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L75) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:370`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L370) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1735-1759`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1735-L1759) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stmw => { + let mut ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + ea = ea.wrapping_add(instr.d() as i64 as u64); + // PPCBUG-160: stmw can span two cache lines when (32-rs)*4 > one line. + // Iterate over every touched line so any reservation on a later line + // is also invalidated (same guarantee as single-word stores). + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { + let start_ea = ea as u32; + let last_ea = start_ea.wrapping_add((32 - instr.rs() as u32) * 4).wrapping_sub(1); + let line_size = RESERVATION_MASK + 1; + let mut line = start_ea & !RESERVATION_MASK; + loop { + t.invalidate_for_write(line); + if line >= (last_ea & !RESERVATION_MASK) { break; } + line = line.wrapping_add(line_size); + } + } + } + for r in instr.rs()..32 { + mem.write_u32(ea as u32, ctx.gpr[r] as u32); + ea = ea.wrapping_add(4); + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Bulk register save.** Stores `(32 - RS)` consecutive 32-bit words taken from `r[RS]`, `r[RS+1]`, …, `r31` to memory starting at `EA`. The symmetric counterpart of [`lmw`](lmw.md). Used by AIX/PowerPC ABI prologues to save non-volatile GPRs in one instruction. +- **Each store is the low 32 bits of the GPR.** Xenia's snapshot writes `ctx.gpr[r] as u32` — only the low half of the 64-bit GPR. The high 32 bits are discarded; `stmw` cannot save 64-bit values (use a sequence of [`std`](std.md) instead). +- **Big-endian write.** Word from `r[RS]` lands at `EA`, word from `r[RS+1]` at `EA+4`, etc. Each word is itself written most-significant-byte first. +- **`RA0` semantics.** When `RA = 0`, base is the literal zero. Useful for absolute-address restoration. +- **Alignment.** PowerISA requires word-aligned `EA`; an unaligned `stmw` may raise an alignment exception on hardware. Xenia tolerates it. +- **Performance trap.** Modern PowerPC implementations microcode `stmw` — typically slower than the same number of `stw` instructions. Compilers prefer the unrolled form. +- **Cache-line behaviour.** When the run of words crosses several 128-byte cache lines, each cold line triggers a read-allocate. Pre-clearing with [`dcbz128`](dcbz.md) helps for fresh frames. + +## Related Instructions + +- [`lmw`](lmw.md) — symmetric "load multiple words" (the matching epilogue partner). +- [`stw`](stw.md), [`stwx`](stw.md) — single-word stores; the modern preferred form. +- [`stswi`](stswi.md), [`stswx`](stswx.md) — store string (byte-granular bulk transfer). +- [`std`](std.md) — for 64-bit values (no "store multiple doubleword" exists). + +## IBM Reference + +- [AIX 7.3 — `stmw` (Store Multiple Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stmw-store-multiple-word-instruction) +- `PowerISA v2.07B Book II` § "Load and Store Multiple". diff --git a/migration/project-root/ppc-manual/memory/stswi.md b/migration/project-root/ppc-manual/memory/stswi.md new file mode 100644 index 0000000..7cc9772 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stswi.md @@ -0,0 +1,145 @@ +# `stswi` — Store String Word Immediate + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0005aa` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stswi` | `stswi` | — | Store String Word Immediate | + +## Syntax + +```asm +(no disassembly template) +``` + +## Encoding + +### `stswi` — form `X` + +- **Opcode word:** `0x7c0005aa` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `725` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | + +## Register Effects + +### `stswi` + +- **Reads (always):** _none_ +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stswi`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stswi"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:737`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L737) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:75`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L75) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:835`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L835) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1540-1564`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1540-L1564) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stswi => { + let mut ea = if instr.ra() == 0 { 0u32 } else { ctx.gpr[instr.ra()] as u32 }; + let nb = if instr.nb() == 0 { 32 } else { instr.nb() }; + let mut rs = instr.rs(); + let mut bytes_left = nb; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { + let first_line = ea & !RESERVATION_MASK; + let last_line = ea.wrapping_add(nb - 1) & !RESERVATION_MASK; + t.invalidate_for_write(first_line); + if last_line != first_line { t.invalidate_for_write(last_line); } + } + } + while bytes_left > 0 { + let val = ctx.gpr[rs] as u32; + for byte_idx in 0..4 { + if bytes_left == 0 { break; } + mem.write_u8(ea, (val >> (24 - byte_idx * 8)) as u8); + ea = ea.wrapping_add(1); + bytes_left -= 1; + } + rs = (rs + 1) % 32; + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Byte-granular bulk store.** Symmetric counterpart of [`lswi`](lswi.md). Reads the low 32 bits of `RS`, `RS+1`, …, takes the top byte of each (then the next, etc.) and writes successive bytes at `EA`. The byte count `NB` is in the `RB` field of the encoding (1..31), with `NB = 0` meaning "32 bytes". +- **Register wraparound at r31 → r0.** Xenia's snapshot increments `rs = (rs + 1) % 32`. After r31 the source becomes r0, then r1, etc. Rare in practice; AIX flags overlapping register / address ranges as invalid. +- **Big-endian byte ordering inside each register.** Writes the most-significant byte first: `mem.write_u8(ea, (val >> 24) as u8)`, then bits 16–23, etc. Matches the byte order produced by [`lswi`](lswi.md), so a `lswi`/`stswi` pair round-trips a buffer. +- **Last partial register.** When `NB` is not a multiple of 4, the final source register has its trailing low bytes ignored — only the leading bytes that fit in the byte budget are written. +- **`RA0` semantics.** `RA = 0` selects literal zero. `stswi` is not an update form; `RA` is not modified. +- **Alignment.** Architecture allows arbitrary alignment; cache-inhibited storage may raise alignment exceptions on hardware. +- **Vanishingly rare in compiled code.** Compilers don't emit `stswi`. Hand-written `memcpy` cores may. + +## Related Instructions + +- [`lswi`](lswi.md) — symmetric load. +- [`stswx`](stswx.md) — register-supplied byte-count variant. +- [`stmw`](stmw.md) — word-granular bulk store. +- [`stw`](stw.md), [`stb`](stb.md) — scalar stores compilers actually emit. + +## IBM Reference + +- [AIX 7.3 — `stswi` (Store String Word Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stswi-store-string-word-immediate-instruction) +- `PowerISA v2.07B Book II` § "Load and Store String". diff --git a/migration/project-root/ppc-manual/memory/stswx.md b/migration/project-root/ppc-manual/memory/stswx.md new file mode 100644 index 0000000..cad9b71 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stswx.md @@ -0,0 +1,148 @@ +# `stswx` — Store String Word Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00052a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stswx` | `stswx` | — | Store String Word Indexed | + +## Syntax + +```asm +(no disassembly template) +``` + +## Encoding + +### `stswx` — form `X` + +- **Opcode word:** `0x7c00052a` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `661` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | + +## Register Effects + +### `stswx` + +- **Reads (always):** _none_ +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stswx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stswx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:742`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L742) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:75`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L75) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:830`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L830) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4663-4689`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4663-L4689) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stswx => { + let mut ea = ea_indexed(ctx, instr); + let nb = ctx.xer() & 0x7F; + let mut rs = instr.rs(); + let mut bytes_left = nb; + if nb > 0 { + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { + let first_line = ea & !RESERVATION_MASK; + let last_line = ea.wrapping_add(nb - 1) & !RESERVATION_MASK; + t.invalidate_for_write(first_line); + if last_line != first_line { t.invalidate_for_write(last_line); } + } + } + } + while bytes_left > 0 { + let val = ctx.gpr[rs] as u32; + for byte_idx in 0..4 { + if bytes_left == 0 { break; } + mem.write_u8(ea, (val >> (24 - byte_idx * 8)) as u8); + ea = ea.wrapping_add(1); + bytes_left -= 1; + } + rs = (rs + 1) % 32; + } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Byte count from `XER[25..31]`.** Unlike `stswi`, the byte count `NB` (0..127) is read from `XER[25..31]`. The xenia snapshot does `let nb = (ctx.xer() & 0x7F) as u32;`. `NB = 0` means literally zero bytes — the instruction becomes a no-op. +- **Register packing identical to `stswi`.** Bytes are pulled from successive GPRs, four bytes per register, big-endian within each register, with wraparound `r31 → r0`. The final partial register's unused trailing bytes are not written. +- **`RA0` semantics.** `RA = 0` selects literal zero. The instruction has no update form — `RA` is not modified. +- **Invalid forms.** AIX flags as invalid the cases where the byte-stream wraps through `RA` or `RB` while reading the source registers; xenia performs writes regardless. +- **Big-endian byte ordering inside each register.** Writes most-significant byte of each source GPR's low word first. +- **Used for non-multiple-of-4 copies.** Together with `lswx`, gives a way to store a runtime-determined byte count without per-byte loops. Compilers don't emit it. +- **Alignment.** Architecture allows arbitrary alignment; cache-inhibited storage may raise alignment exceptions on hardware. +- **No CR / FPSCR effects.** + +## Related Instructions + +- [`lswx`](lswx.md) — symmetric load. +- [`stswi`](stswi.md) — sibling with byte count encoded as `RB` field (immediate-style). +- [`stmw`](stmw.md) — word-granular bulk store (no byte tail handling). +- [`stw`](stw.md), [`stb`](stb.md) — scalar stores. + +## IBM Reference + +- [AIX 7.3 — `stswx` (Store String Word Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stswx-store-string-word-indexed-instruction) +- `PowerISA v2.07B Book II` § "Load and Store String" for invalid-form rules and `XER` interaction. diff --git a/migration/project-root/ppc-manual/memory/stvebx.md b/migration/project-root/ppc-manual/memory/stvebx.md new file mode 100644 index 0000000..7b7f1f8 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stvebx.md @@ -0,0 +1,136 @@ +# `stvebx` — Store Vector Element Byte Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00010e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stvebx` | `stvebx` | — | Store Vector Element Byte Indexed | + +## Syntax + +```asm +stvebx [VS], [RA0], [RB] +``` + +## Encoding + +### `stvebx` — form `X` + +- **Opcode word:** `0x7c00010e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `135` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VS` | stvebx: read | Source vector register (alias for VD on stores). | +| `RA0` | stvebx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | stvebx: read | Source GPR. | + +## Register Effects + +### `stvebx` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stvebx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvebx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:152`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L152) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:77`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L77) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:778`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L778) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1914-1926`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1914-L1926) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvebx => { + // Store vS[EA & 0xF] (1 byte) to memory at EA. + let base = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = base.wrapping_add(ctx.gpr[instr.rb()]) as u32; + // PPCBUG-512: stvebx was missing invalidate_for_write. + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + let slot = (ea & 0xF) as usize; + let bytes = ctx.vr[instr.rs()].as_bytes(); + mem.write_u8(ea, bytes[slot]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single-byte element store.** Architecturally `stvebx` writes exactly **one** byte from lane `EA mod 16` of `VS` to address `EA`. Other lanes are unaffected, and other memory bytes are unaffected. +- **Xenia simplification — full 16-byte write.** The xenia snapshot is shared with `stvehx` / `stvewx` and writes the **entire 16-byte aligned line** (`ea & ~0xF`, then 16 bytes from the vector). This is stronger than the architectural single-byte store — it overwrites 15 adjacent bytes with whatever the source vector holds. Code that depends on architectural per-byte granularity (e.g. interleaved writes from multiple threads / DMA agents into the same line) may behave differently than on hardware. +- **`RA0` semantics.** `RA = 0` selects literal zero. +- **No update form, no VMX128 sibling.** No `stvebux`; no `stvebx128` — single-byte stores were kept Altivec-only in the Xbox 360 extension. +- **Big-endian within the line.** Lane 0 of `VS` corresponds to the byte at the aligned base address. +- **Common idiom.** Pair with `vsplt*` to broadcast a value, then `stvebx` to write one byte. Less efficient than `stb` from a GPR; rare in compiled code. +- **Hardware fault model.** A protected or unmapped page raises a DSI exception just as for any store. + +## Related Instructions + +- [`stvehx`](stvehx.md), [`stvewx`](stvewx.md) — single half / single word element stores. +- [`stvx`](stvx.md), [`stvxl`](stvxl.md) — full 16-byte aligned vector stores. +- [`stvlx`](stvlx.md), [`stvrx`](stvrx.md) — store-left / store-right unaligned vector ops. +- [`lvebx`](lvebx.md) — symmetric single-byte load. + +## IBM Reference + +- [AIX 7.3 — `stvebx` (Store Vector Element Byte Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stvebx-store-vector-element-byte-indexed-instruction) +- `PowerISA v2.07B Book I` "Vector Facility" § "Vector Load and Store" for canonical per-byte semantics. diff --git a/migration/project-root/ppc-manual/memory/stvehx.md b/migration/project-root/ppc-manual/memory/stvehx.md new file mode 100644 index 0000000..3b67f2a --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stvehx.md @@ -0,0 +1,138 @@ +# `stvehx` — Store Vector Element Half Word Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00014e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stvehx` | `stvehx` | — | Store Vector Element Half Word Indexed | + +## Syntax + +```asm +stvehx [VS], [RA0], [RB] +``` + +## Encoding + +### `stvehx` — form `X` + +- **Opcode word:** `0x7c00014e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `167` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VS` | stvehx: read | Source vector register (alias for VD on stores). | +| `RA0` | stvehx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | stvehx: read | Source GPR. | + +## Register Effects + +### `stvehx` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stvehx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvehx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:160`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L160) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:77`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L77) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:784`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L784) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1927-1941`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1927-L1941) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvehx => { + // Store vS[slot] (1 halfword) at EA & ~1. slot = (EA & 0xF) >> 1. + let base = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea_unaligned = base.wrapping_add(ctx.gpr[instr.rb()]) as u32; + let ea = ea_unaligned & !0x1u32; + // PPCBUG-512: stvehx was missing invalidate_for_write. + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + let slot = ((ea_unaligned & 0xF) >> 1) as usize; + let bytes = ctx.vr[instr.rs()].as_bytes(); + let h = ((bytes[slot * 2] as u16) << 8) | (bytes[slot * 2 + 1] as u16); + mem.write_u16(ea, h); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single half-word element store.** Architecturally `stvehx` writes exactly **two** bytes from half-word lane `(EA mod 16) >> 1` of `VS` to address `EA & ~1` (low bit forced to half-aligned). Other lanes are unaffected, and bytes outside the 2-byte window are unaffected. +- **Xenia simplification — full 16-byte write.** The xenia snapshot is shared with `stvebx` / `stvewx`: writes 16 bytes of the source vector at `ea & ~0xF`. This is stronger than the architectural 2-byte store — it overwrites 14 adjacent bytes that hardware would have left alone. +- **EA forced half-aligned.** Hardware drops the low bit; xenia's shared snapshot drops the low four bits. +- **`RA0` semantics.** `RA = 0` selects literal zero. +- **No update form, no VMX128 sibling.** No `stvehux`; no `stvehx128`. +- **Big-endian half within the lane.** The byte at the lower address is the most-significant byte of the half-word lane. +- **Common idiom.** Pair with `vsplth` to broadcast then store one half; rare in compiled code (compilers prefer `sth`). + +## Related Instructions + +- [`stvebx`](stvebx.md), [`stvewx`](stvewx.md) — single byte / word element stores. +- [`stvx`](stvx.md), [`stvxl`](stvxl.md) — full 16-byte aligned vector stores. +- [`stvlx`](stvlx.md), [`stvrx`](stvrx.md) — store-left / store-right unaligned ops. +- [`lvehx`](lvehx.md) — symmetric single-half load. + +## IBM Reference + +- [AIX 7.3 — `stvehx` (Store Vector Element Half Word Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stvehx-store-vector-element-half-word-indexed-instruction) +- `PowerISA v2.07B Book I` "Vector Facility" § "Vector Load and Store". diff --git a/migration/project-root/ppc-manual/memory/stvewx.md b/migration/project-root/ppc-manual/memory/stvewx.md new file mode 100644 index 0000000..c1355c6 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stvewx.md @@ -0,0 +1,198 @@ +# `stvewx` — Store Vector Element Word Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00018e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stvewx` | `stvewx` | — | Store Vector Element Word Indexed | +| `stvewx128` | `stvewx128` | — | Store Vector Element Word Indexed 128 | + +## Syntax + +```asm +stvewx [VS], [RA0], [RB] +stvewx128 [VS], [RA0], [RB] +``` + +## Encoding + +### `stvewx` — form `X` + +- **Opcode word:** `0x7c00018e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `199` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `stvewx128` — form `VX128_1` + +- **Opcode word:** `0x10000183` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `387` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VS` | stvewx: read; stvewx128: read | Source vector register (alias for VD on stores). | +| `RA0` | stvewx: read; stvewx128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | stvewx: read; stvewx128: read | Source GPR. | + +## Register Effects + +### `stvewx` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +### `stvewx128` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stvewx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvewx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:180`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L180) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:77`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L77) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:788`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L788) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1942-1959`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1942-L1959) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvewx => { + // Store vS[slot] (1 word) at EA & ~3. slot = (EA & 0xF) >> 2. + let base = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea_unaligned = base.wrapping_add(ctx.gpr[instr.rb()]) as u32; + let ea = ea_unaligned & !0x3u32; + // PPCBUG-512: stvewx was missing invalidate_for_write. + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + let slot = ((ea_unaligned & 0xF) >> 2) as usize; + let bytes = ctx.vr[instr.rs()].as_bytes(); + let w = ((bytes[slot * 4] as u32) << 24) + | ((bytes[slot * 4 + 1] as u32) << 16) + | ((bytes[slot * 4 + 2] as u32) << 8) + | (bytes[slot * 4 + 3] as u32); + mem.write_u32(ea, w); + ctx.pc += 4; + } +``` +
+ +**`stvewx128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvewx128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:183`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L183) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:77`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L77) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:416`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L416) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3175-3192`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3175-L3192) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvewx128 => { + // Mirror of stvewx: word-align EA, extract one 32-bit lane, write 4 bytes only. + // Previous code used & !0xF (16-byte) and wrote all 16 bytes, corrupting 12 + // adjacent bytes on every execution (PPCBUG-510). + let ea_unaligned = ea_indexed(ctx, instr); + let ea = ea_unaligned & !0x3u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + let slot = ((ea_unaligned & 0xF) >> 2) as usize; + let bytes = ctx.vr[instr.vs128()].as_bytes(); + let w = ((bytes[slot * 4] as u32) << 24) + | ((bytes[slot * 4 + 1] as u32) << 16) + | ((bytes[slot * 4 + 2] as u32) << 8) + | (bytes[slot * 4 + 3] as u32); + mem.write_u32(ea, w); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Single word element store.** Architecturally `stvewx` writes exactly **four** bytes from word lane `(EA mod 16) >> 2` of `VS` to address `EA & ~3` (low two bits forced to word-aligned). Other lanes are unaffected, and bytes outside the 4-byte window are unaffected. +- **Xenia simplification — full 16-byte write.** Both `stvewx` and `stvewx128` snapshots write the full 16 bytes of the source vector at `ea & ~0xF`. This overwrites 12 bytes that hardware would have left alone. +- **EA forced word-aligned.** Hardware drops the low two bits; xenia's snapshots drop the low four. +- **`RA0` semantics.** `RA = 0` selects literal zero. +- **No update form.** No `stvewux`. +- **VMX128 sibling (`stvewx128`).** Identical semantics; alternative operand encoding addressing `v0..v127` via the split-field 7-bit register index. +- **Big-endian word within the lane.** The byte at the lower address is the most-significant byte. +- **Common idiom.** Pair with `vspltw` to broadcast a 32-bit FP/integer value, then `stvewx` to commit one lane. Less common than `stw` from a GPR. + +## Related Instructions + +- [`stvebx`](stvebx.md), [`stvehx`](stvehx.md) — single byte / half element stores. +- [`stvx`](stvx.md), [`stvx128`](stvx.md), [`stvxl`](stvxl.md) — full 16-byte aligned vector stores. +- [`stvlx`](stvlx.md), [`stvrx`](stvrx.md) — store-left / store-right unaligned ops. +- [`lvewx`](lvewx.md), [`lvewx128`](lvewx.md) — symmetric single-word loads. + +## IBM Reference + +- [AIX 7.3 — `stvewx` (Store Vector Element Word Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stvewx-store-vector-element-word-indexed-instruction) +- `PowerISA v2.07B Book I` "Vector Facility"; Microsoft Xbox 360 XDK for `stvewx128`. diff --git a/migration/project-root/ppc-manual/memory/stvlx.md b/migration/project-root/ppc-manual/memory/stvlx.md new file mode 100644 index 0000000..f18c6af --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stvlx.md @@ -0,0 +1,193 @@ +# `stvlx` — Store Vector Left Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00050e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stvlx` | `stvlx` | — | Store Vector Left Indexed | +| `stvlx128` | `stvlx128` | — | Store Vector Left Indexed 128 | + +## Syntax + +```asm +stvlx [VS], [RA0], [RB] +stvlx128 [VS], [RA0], [RB] +``` + +## Encoding + +### `stvlx` — form `X` + +- **Opcode word:** `0x7c00050e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `647` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `stvlx128` — form `VX128_1` + +- **Opcode word:** `0x10000503` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1283` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VS` | stvlx: read; stvlx128: read | Source vector register (alias for VD on stores). | +| `RA0` | stvlx: read; stvlx128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | stvlx: read; stvlx128: read | Source GPR. | + +## Register Effects + +### `stvlx` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +### `stvlx128` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stvlx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvlx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:265`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L265) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:77`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L77) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:828`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L828) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3103-3119`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3103-L3119) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvlx | PpcOpcode::stvlxl => { + let ea = ea_indexed(ctx, instr); + // PPCBUG-513: stvlx/stvlxl were missing invalidate_for_write. + // store_vector_left writes [ea, (ea & !0xF)+15]; in the worst case (ea & 0xF == 0) + // that is exactly 16 bytes all within the same 16-byte block, so ea+15 lands in the + // same 128-byte cache line. Two-call form is kept for defensive correctness. + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { + let first_line = ea & !RESERVATION_MASK; + let last_line = ea.wrapping_add(15) & !RESERVATION_MASK; + t.invalidate_for_write(first_line); + if last_line != first_line { t.invalidate_for_write(last_line); } + } + } + crate::vmx::store_vector_left(mem, ea, ctx.vr[instr.rs()]); + ctx.pc += 4; + } +``` +
+ +**`stvlx128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvlx128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:268`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L268) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:77`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L77) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:422`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L422) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3120-3133`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3120-L3133) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvlx128 | PpcOpcode::stvlxl128 => { + let ea = ea_indexed(ctx, instr); + // PPCBUG-513: stvlx128/stvlxl128 were missing invalidate_for_write. + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { + let first_line = ea & !RESERVATION_MASK; + let last_line = ea.wrapping_add(15) & !RESERVATION_MASK; + t.invalidate_for_write(first_line); + if last_line != first_line { t.invalidate_for_write(last_line); } + } + } + crate::vmx::store_vector_left(mem, ea, ctx.vr[instr.vs128()]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Store-left half of an unaligned vector.** `stvlx` writes `(16 - (EA mod 16))` bytes from the **left** (low-lane) half of `VS` to addresses starting at the **exact** `EA`. The right half of `VS` is not stored. Combine with `stvrx` at `EA + 16` to commit a full unaligned vector across an alignment boundary. +- **Companion idiom.** `stvlx VS, RA, RB ; stvrx VS, RA, RB+16` writes the 16 bytes of `VS` to address `EA` regardless of alignment. The two halves are byte-disjoint, so the order between them doesn't affect correctness. +- **No alignment masking.** Unlike `stvx`, the `EA` is **not** rounded down. `EA mod 16` controls how the source vector splits. +- **`RA0` semantics.** `RA = 0` selects literal zero. +- **Microsoft Xbox 360 specific.** Part of the VMX128 / Cell BE extended set, not in baseline Altivec. +- **Implementation in xenia.** The shared snapshot calls `vmx::store_vector_left(mem, ea, vs)`, performing the unaligned partial-byte write. +- **VMX128 sibling (`stvlx128`).** Identical semantics; alternative operand encoding addressing `v0..v127`. +- **`stvlxl` is the LRU-hint variant.** Same data behaviour, hint ignored under emulation. + +## Related Instructions + +- [`stvrx`](stvrx.md), [`stvrx128`](stvrx.md) — store-right partner. +- [`stvlxl`](stvlxl.md), [`stvlxl128`](stvlxl.md) — LRU-hint variants. +- [`stvx`](stvx.md), [`stvx128`](stvx.md) — aligned store (the EA-masking sibling). +- [`lvlx`](lvlx.md), [`lvrx`](lvrx.md) — symmetric unaligned loads. + +## IBM Reference + +- [AIX 7.3 — `stvlx` (Store Vector Left Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stvlx-store-vector-left-indexed-instruction) +- `PowerISA v2.07B Book I` "Vector Facility"; Microsoft Xbox 360 XDK for VMX128 unaligned stores. diff --git a/migration/project-root/ppc-manual/memory/stvlxl.md b/migration/project-root/ppc-manual/memory/stvlxl.md new file mode 100644 index 0000000..e5aef13 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stvlxl.md @@ -0,0 +1,192 @@ +# `stvlxl` — Store Vector Left Indexed LRU + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00070e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stvlxl` | `stvlxl` | — | Store Vector Left Indexed LRU | +| `stvlxl128` | `stvlxl128` | — | Store Vector Left Indexed LRU 128 | + +## Syntax + +```asm +stvlxl [VS], [RA0], [RB] +stvlxl128 [VS], [RA0], [RB] +``` + +## Encoding + +### `stvlxl` — form `X` + +- **Opcode word:** `0x7c00070e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `903` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `stvlxl128` — form `VX128_1` + +- **Opcode word:** `0x10000703` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1795` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VS` | stvlxl: read; stvlxl128: read | Source vector register (alias for VD on stores). | +| `RA0` | stvlxl: read; stvlxl128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | stvlxl: read; stvlxl128: read | Source GPR. | + +## Register Effects + +### `stvlxl` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +### `stvlxl128` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stvlxl`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvlxl"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:271`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L271) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:77`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L77) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:845`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L845) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3103-3119`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3103-L3119) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvlx | PpcOpcode::stvlxl => { + let ea = ea_indexed(ctx, instr); + // PPCBUG-513: stvlx/stvlxl were missing invalidate_for_write. + // store_vector_left writes [ea, (ea & !0xF)+15]; in the worst case (ea & 0xF == 0) + // that is exactly 16 bytes all within the same 16-byte block, so ea+15 lands in the + // same 128-byte cache line. Two-call form is kept for defensive correctness. + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { + let first_line = ea & !RESERVATION_MASK; + let last_line = ea.wrapping_add(15) & !RESERVATION_MASK; + t.invalidate_for_write(first_line); + if last_line != first_line { t.invalidate_for_write(last_line); } + } + } + crate::vmx::store_vector_left(mem, ea, ctx.vr[instr.rs()]); + ctx.pc += 4; + } +``` +
+ +**`stvlxl128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvlxl128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:274`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L274) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:77`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L77) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:426`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L426) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3120-3133`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3120-L3133) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvlx128 | PpcOpcode::stvlxl128 => { + let ea = ea_indexed(ctx, instr); + // PPCBUG-513: stvlx128/stvlxl128 were missing invalidate_for_write. + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { + let first_line = ea & !RESERVATION_MASK; + let last_line = ea.wrapping_add(15) & !RESERVATION_MASK; + t.invalidate_for_write(first_line); + if last_line != first_line { t.invalidate_for_write(last_line); } + } + } + crate::vmx::store_vector_left(mem, ea, ctx.vr[instr.vs128()]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Same data effect as [`stvlx`](stvlx.md), with LRU cache hint.** Writes `(16 - (EA mod 16))` bytes from the left half of `VS` starting at `EA`; right half not stored. The `l` suffix marks the touched line as least-recently-used. +- **Hint ignored under emulation.** Xenia's snapshot is shared with `stvlx` (`PpcOpcode::stvlx | PpcOpcode::stvlxl => …`). +- **No alignment masking.** The exact `EA` controls how data is split. +- **`RA0` semantics.** `RA = 0` selects literal zero. +- **Microsoft Xbox 360 specific.** Part of VMX128 / Cell BE. +- **Streaming write use case.** Pair with [`stvrxl`](stvrxl.md) when the buffer is one-pass output that should not pollute the cache. +- **VMX128 sibling (`stvlxl128`).** Identical semantics; alternative operand encoding addressing `v0..v127`. + +## Related Instructions + +- [`stvlx`](stvlx.md), [`stvlx128`](stvlx.md) — non-hint variants. +- [`stvrxl`](stvrxl.md), [`stvrxl128`](stvrxl.md) — store-right LRU partner. +- [`stvxl`](stvxl.md), [`stvxl128`](stvxl.md) — aligned LRU vector store. +- [`lvlxl`](lvlxl.md), [`lvrxl`](lvrxl.md) — symmetric LRU loads. + +## IBM Reference + +- [AIX 7.3 — `stvlxl` (Store Vector Left Indexed Last)](https://www.ibm.com/docs/en/aix/7.3.0?topic=reference-instruction-set) +- `PowerISA v2.07B Book I` "Vector Facility"; Microsoft Xbox 360 XDK for cache-hint behaviour. diff --git a/migration/project-root/ppc-manual/memory/stvrx.md b/migration/project-root/ppc-manual/memory/stvrx.md new file mode 100644 index 0000000..f900103 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stvrx.md @@ -0,0 +1,193 @@ +# `stvrx` — Store Vector Right Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00054e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stvrx` | `stvrx` | — | Store Vector Right Indexed | +| `stvrx128` | `stvrx128` | — | Store Vector Right Indexed 128 | + +## Syntax + +```asm +stvrx [VS], [RA0], [RB] +stvrx128 [VS], [RA0], [RB] +``` + +## Encoding + +### `stvrx` — form `X` + +- **Opcode word:** `0x7c00054e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `679` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `stvrx128` — form `VX128_1` + +- **Opcode word:** `0x10000543` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1347` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VS` | stvrx: read; stvrx128: read | Source vector register (alias for VD on stores). | +| `RA0` | stvrx: read; stvrx128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | stvrx: read; stvrx128: read | Source GPR. | + +## Register Effects + +### `stvrx` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +### `stvrx128` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stvrx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvrx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:290`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L290) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:78`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L78) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:833`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L833) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3134-3150`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3134-L3150) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvrx | PpcOpcode::stvrxl => { + let ea = ea_indexed(ctx, instr); + // PPCBUG-514: stvrx/stvrxl were missing invalidate_for_write. + // store_vector_right writes [ea & !0xF, ea-1] (up to 15 bytes, all within a single + // 16-byte-aligned block). Two-call form is kept for defensive correctness. + // stvrx at shift==0 is a no-op; the guard fires unconditionally (cheap). + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { + let first_line = ea & !RESERVATION_MASK; + let last_line = ea.wrapping_add(15) & !RESERVATION_MASK; + t.invalidate_for_write(first_line); + if last_line != first_line { t.invalidate_for_write(last_line); } + } + } + crate::vmx::store_vector_right(mem, ea, ctx.vr[instr.rs()]); + ctx.pc += 4; + } +``` +
+ +**`stvrx128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvrx128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:293`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L293) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:78`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L78) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:423`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L423) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3151-3164`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3151-L3164) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvrx128 | PpcOpcode::stvrxl128 => { + let ea = ea_indexed(ctx, instr); + // PPCBUG-514: stvrx128/stvrxl128 were missing invalidate_for_write. + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { + let first_line = ea & !RESERVATION_MASK; + let last_line = ea.wrapping_add(15) & !RESERVATION_MASK; + t.invalidate_for_write(first_line); + if last_line != first_line { t.invalidate_for_write(last_line); } + } + } + crate::vmx::store_vector_right(mem, ea, ctx.vr[instr.vs128()]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Store-right half of an unaligned vector.** `stvrx` writes `(EA mod 16)` bytes from the **right** (high-lane) half of `VS` to the addresses *just below* `EA & ~0xF` (the bytes from the previous aligned line that fall on the right side of the unaligned vector). The left half of `VS` is not stored. +- **Standard pair-mate of [`stvlx`](stvlx.md).** `stvlx VS, RA, RB ; stvrx VS, RA, RB+16` (or analogous addressing) commits the 16 bytes of `VS` to address `EA` regardless of alignment. The two halves are byte-disjoint, so order is irrelevant for correctness. +- **No alignment masking.** Unlike `stvx`, the exact `EA` is used; `EA mod 16` controls how `VS` splits. +- **`RA0` semantics.** `RA = 0` selects literal zero. +- **Microsoft Xbox 360 specific.** Part of the VMX128 / Cell BE extended set. +- **Implementation in xenia.** The shared snapshot calls `vmx::store_vector_right(mem, ea, vs)`, performing the unaligned partial-byte write of the right side. +- **VMX128 sibling (`stvrx128`).** Identical semantics; alternative operand encoding addressing `v0..v127`. +- **`stvrxl` is the LRU-hint variant.** + +## Related Instructions + +- [`stvlx`](stvlx.md), [`stvlx128`](stvlx.md) — store-left partner. +- [`stvrxl`](stvrxl.md), [`stvrxl128`](stvrxl.md) — LRU-hint variants. +- [`stvx`](stvx.md), [`stvx128`](stvx.md) — aligned vector store. +- [`lvrx`](lvrx.md), [`lvlx`](lvlx.md) — symmetric unaligned loads. + +## IBM Reference + +- [AIX 7.3 — `stvrx` (Store Vector Right Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stvrx-store-vector-right-indexed-instruction) +- `PowerISA v2.07B Book I` "Vector Facility"; Microsoft Xbox 360 XDK for VMX128 unaligned stores. diff --git a/migration/project-root/ppc-manual/memory/stvrxl.md b/migration/project-root/ppc-manual/memory/stvrxl.md new file mode 100644 index 0000000..f65d901 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stvrxl.md @@ -0,0 +1,192 @@ +# `stvrxl` — Store Vector Right Indexed LRU + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00074e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stvrxl` | `stvrxl` | — | Store Vector Right Indexed LRU | +| `stvrxl128` | `stvrxl128` | — | Store Vector Right Indexed LRU 128 | + +## Syntax + +```asm +stvrxl [VS], [RA0], [RB] +stvrxl128 [VS], [RA0], [RB] +``` + +## Encoding + +### `stvrxl` — form `X` + +- **Opcode word:** `0x7c00074e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `935` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `stvrxl128` — form `VX128_1` + +- **Opcode word:** `0x10000743` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1859` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VS` | stvrxl: read; stvrxl128: read | Source vector register (alias for VD on stores). | +| `RA0` | stvrxl: read; stvrxl128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | stvrxl: read; stvrxl128: read | Source GPR. | + +## Register Effects + +### `stvrxl` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +### `stvrxl128` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stvrxl`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvrxl"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:296`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L296) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:78`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L78) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:848`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L848) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3134-3150`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3134-L3150) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvrx | PpcOpcode::stvrxl => { + let ea = ea_indexed(ctx, instr); + // PPCBUG-514: stvrx/stvrxl were missing invalidate_for_write. + // store_vector_right writes [ea & !0xF, ea-1] (up to 15 bytes, all within a single + // 16-byte-aligned block). Two-call form is kept for defensive correctness. + // stvrx at shift==0 is a no-op; the guard fires unconditionally (cheap). + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { + let first_line = ea & !RESERVATION_MASK; + let last_line = ea.wrapping_add(15) & !RESERVATION_MASK; + t.invalidate_for_write(first_line); + if last_line != first_line { t.invalidate_for_write(last_line); } + } + } + crate::vmx::store_vector_right(mem, ea, ctx.vr[instr.rs()]); + ctx.pc += 4; + } +``` +
+ +**`stvrxl128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvrxl128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:299`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L299) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:78`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L78) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:427`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L427) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3151-3164`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3151-L3164) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvrx128 | PpcOpcode::stvrxl128 => { + let ea = ea_indexed(ctx, instr); + // PPCBUG-514: stvrx128/stvrxl128 were missing invalidate_for_write. + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { + let first_line = ea & !RESERVATION_MASK; + let last_line = ea.wrapping_add(15) & !RESERVATION_MASK; + t.invalidate_for_write(first_line); + if last_line != first_line { t.invalidate_for_write(last_line); } + } + } + crate::vmx::store_vector_right(mem, ea, ctx.vr[instr.vs128()]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Same data effect as [`stvrx`](stvrx.md), with LRU cache hint.** Writes `(EA mod 16)` bytes from the right half of `VS` to the addresses just below `EA & ~0xF`. The `l` suffix marks the touched line as least-recently-used. +- **Hint ignored under emulation.** Xenia's snapshot is shared with `stvrx` (`PpcOpcode::stvrx | PpcOpcode::stvrxl => …`). +- **No alignment masking.** Exact `EA` used. +- **`RA0` semantics.** `RA = 0` selects literal zero. +- **Microsoft Xbox 360 specific.** Part of VMX128 / Cell BE. +- **Streaming write use case.** Pair with [`stvlxl`](stvlxl.md) for a one-pass unaligned vector store sequence that signals "do not retain" to the cache. +- **VMX128 sibling (`stvrxl128`).** Identical semantics; alternative operand encoding addressing `v0..v127`. + +## Related Instructions + +- [`stvrx`](stvrx.md), [`stvrx128`](stvrx.md) — non-hint variants. +- [`stvlxl`](stvlxl.md), [`stvlxl128`](stvlxl.md) — store-left LRU partner. +- [`stvxl`](stvxl.md), [`stvxl128`](stvxl.md) — aligned LRU vector store. +- [`lvrxl`](lvrxl.md), [`lvlxl`](lvlxl.md) — symmetric LRU loads. + +## IBM Reference + +- [AIX 7.3 — `stvrxl` (Store Vector Right Indexed Last)](https://www.ibm.com/docs/en/aix/7.3.0?topic=reference-instruction-set) +- `PowerISA v2.07B Book I` "Vector Facility"; Microsoft Xbox 360 XDK for cache-hint behaviour. diff --git a/migration/project-root/ppc-manual/memory/stvx.md b/migration/project-root/ppc-manual/memory/stvx.md new file mode 100644 index 0000000..d7fdca1 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stvx.md @@ -0,0 +1,177 @@ +# `stvx` — Store Vector Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0001ce` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stvx` | `stvx` | — | Store Vector Indexed | +| `stvx128` | `stvx128` | — | Store Vector Indexed 128 | + +## Syntax + +```asm +stvx [VS], [RA0], [RB] +stvx128 [VS], [RA0], [RB] +``` + +## Encoding + +### `stvx` — form `X` + +- **Opcode word:** `0x7c0001ce` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `231` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `stvx128` — form `VX128_1` + +- **Opcode word:** `0x100001c3` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `451` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VS` | stvx: read; stvx128: read | Source vector register (alias for VD on stores). | +| `RA0` | stvx: read; stvx128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | stvx: read; stvx128: read | Source GPR. | + +## Register Effects + +### `stvx` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +### `stvx128` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- ((RA|0) + (RB)) & ~0xF ; align to 16 +MEM(EA, 16) <- byteswap(VS) +``` + +## C Translation Example + +```c +/* stvx VS, RA, RB — 16-byte aligned store of a vector register */ +uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA]; +uint32_t ea = (uint32_t)((base + r[insn.RB]) & ~(uint64_t)0xF); +mem_write_vec128_be(ea, v[insn.VS]); +``` + +## Implementation References + +**`stvx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:193`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L193) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:79`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L79) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:791`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L791) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1849-1859`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1849-L1859) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = (ea.wrapping_add(ctx.gpr[instr.rb()]) & !0xF) as u32; + // PPCBUG-511: stvx was missing invalidate_for_write. + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + let bytes = ctx.vr[instr.rs()].as_bytes(); + for i in 0..16 { mem.write_u8(ea + i as u32, bytes[i]); } + ctx.pc += 4; + } +``` +
+ +**`stvx128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvx128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:196`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L196) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:79`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L79) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:417`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L417) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1860-1870`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1860-L1870) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvx128 => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = (ea.wrapping_add(ctx.gpr[instr.rb()]) & !0xF) as u32; + // PPCBUG-511: stvx128 was missing invalidate_for_write. + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + let bytes = ctx.vr[instr.vs128()].as_bytes(); + for i in 0..16 { mem.write_u8(ea + i as u32, bytes[i]); } + ctx.pc += 4; + } +``` +
+ + + +## Extended Pseudocode + +``` +EA <- ((RA|0) + (RB)) & ~0xF ; force 16-byte alignment +MEM(EA, 16) <- byte_order_adjusted(VS) ; lane 0 at EA, lane 15 at EA+15 +``` + +## Special Cases & Edge Conditions + +- **Alignment is forced, not checked.** The low four bits of the effective address are **cleared** before the store — alignment violations silently corrupt adjacent data rather than trap. This differs from scalar `stw` (no alignment enforcement) and from `stvewx` (which stores only one element and keeps the exact EA). +- **Big-endian lane layout.** Vector lane 0 (the most-significant bytes of the 128-bit register) lives at the lowest address; lane 15 at `EA + 15`. On little-endian hosts the whole 16-byte block is byte-swapped at the memory boundary so the PowerPC-visible layout is preserved. Xenia's helper `mem_write_vec128_be` handles this. +- **`RA0` semantics.** When `RA = 0` the base is the literal zero — just like scalar loads/stores. Combined with the alignment mask this lets `stvx VS, 0, RB` store to address `RB & ~0xF`. +- **No update form.** Unlike scalar stores, VMX stores have no `u` variant that post-writes the base. Use [`stvxl`](stvxl.md) for the cache-hint variant (suggests "last" — the line is not expected to be reused soon). +- **VMX128 sibling (`stvx128`).** Identical semantics; the only difference is the operand encoding. VMX128 uses a 7-bit register index split across three non-contiguous bit fields (`VS128l ‖ VS128h`) so it can address `v0..v127` instead of the 32-register Altivec space. All alignment, byte-order and `RA0` rules are the same. +- **Read-before-write.** The 16-byte write occurs as one conceptual store; subsequent loads from the same address observe the complete new value. There's no split-transaction window visible to software. + +## Related Instructions + +- [`lvx`](lvx.md), [`lvx128`](lvx.md) — the load counterparts. +- [`stvxl`](stvxl.md), [`stvxl128`](stvxl.md) — cache-hint "last-use" variants. +- [`stvebx`](stvebx.md) / [`stvehx`](stvehx.md) / [`stvewx`](stvewx.md) — store single element (byte / half / word) at the exact (unaligned) address. +- [`stvlx`](stvlx.md) / [`stvrx`](stvrx.md) — store-left / store-right for unaligned vector I/O. +- [`dcbz`](dcbz.md) — zero a cache line; often paired with `stvx` in block-fill idioms. + +## IBM Reference + +- [AIX 7.3 — `stvx` (Store Vector Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stvx-store-vector-indexed-instruction) +- PowerISA Book II (Altivec / VMX). Xbox 360 VMX128 is Microsoft-documented in the XDK; xenia's `ppc-instructions.xml` captures the deltas. diff --git a/migration/project-root/ppc-manual/memory/stvxl.md b/migration/project-root/ppc-manual/memory/stvxl.md new file mode 100644 index 0000000..b3affd0 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stvxl.md @@ -0,0 +1,186 @@ +# `stvxl` — Store Vector Indexed LRU + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c0003ce` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stvxl` | `stvxl` | — | Store Vector Indexed LRU | +| `stvxl128` | `stvxl128` | — | Store Vector Indexed LRU 128 | + +## Syntax + +```asm +stvxl [VS], [RA0], [RB] +stvxl128 [VS], [RA0], [RB] +``` + +## Encoding + +### `stvxl` — form `X` + +- **Opcode word:** `0x7c0003ce` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `487` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `stvxl128` — form `VX128_1` + +- **Opcode word:** `0x100003c3` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `963` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VS` | stvxl: read; stvxl128: read | Source vector register (alias for VD on stores). | +| `RA0` | stvxl: read; stvxl128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | stvxl: read; stvxl128: read | Source GPR. | + +## Register Effects + +### `stvxl` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +### `stvxl128` + +- **Reads (always):** `VS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stvxl`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvxl"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:199`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L199) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:79`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L79) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:813`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L813) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1970-1981`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1970-L1981) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvxl | PpcOpcode::stvxl128 => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = (ea.wrapping_add(ctx.gpr[instr.rb()]) & !0xF) as u32; + // PPCBUG-511: stvxl/stvxl128 were missing invalidate_for_write. + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + let vs = if matches!(instr.opcode, PpcOpcode::stvxl128) { instr.vs128() } else { instr.rs() }; + let bytes = ctx.vr[vs].as_bytes(); + for i in 0..16 { mem.write_u8(ea + i as u32, bytes[i]); } + ctx.pc += 4; + } +``` +
+ +**`stvxl128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stvxl128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:202`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L202) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:79`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L79) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:419`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L419) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1970-1981`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1970-L1981) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stvxl | PpcOpcode::stvxl128 => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = (ea.wrapping_add(ctx.gpr[instr.rb()]) & !0xF) as u32; + // PPCBUG-511: stvxl/stvxl128 were missing invalidate_for_write. + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + let vs = if matches!(instr.opcode, PpcOpcode::stvxl128) { instr.vs128() } else { instr.rs() }; + let bytes = ctx.vr[vs].as_bytes(); + for i in 0..16 { mem.write_u8(ea + i as u32, bytes[i]); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Same data effect as [`stvx`](stvx.md), with LRU cache hint.** Writes 16 bytes from `VS` at `EA & ~0xF`. The `l` suffix tells the cache the line is least-recently-used — useful for streaming output (e.g. one-pass writes to a render target the producer will not re-read). +- **Hint ignored under emulation.** Xenia's snapshot is shared with the VMX128 variant; it implements only the data side. Hardware uses the hint to choose write-allocate vs. write-streaming behaviour. +- **Alignment is forced, not checked.** Low four bits of `EA` are masked. +- **Big-endian lane layout.** Lane 0 of `VS` lands at the aligned base; lane 15 at base+15. +- **`RA0` semantics.** `RA = 0` selects literal zero. +- **No update form.** +- **VMX128 sibling (`stvxl128`).** Identical semantics; alternative operand encoding addressing `v0..v127` via the split-field 7-bit register index. +- **Common in render-target writes.** Pair with [`dcbz128`](dcbz.md) to allocate-and-zero, then `stvxl` to commit each line of a streaming output buffer; the LRU hint frees cache for the next line. + +## Related Instructions + +- [`stvx`](stvx.md), [`stvx128`](stvx.md) — non-hint variants. +- [`lvxl`](lvxl.md), [`lvxl128`](lvxl.md) — symmetric LRU loads. +- [`stvebx`](stvebx.md), [`stvehx`](stvehx.md), [`stvewx`](stvewx.md) — single-element stores. +- [`stvlx`](stvlx.md), [`stvrx`](stvrx.md) — store-left / store-right unaligned ops. + +## IBM Reference + +- [AIX 7.3 — `stvxl` (Store Vector Indexed Last)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stvxl-store-vector-indexed-last-instruction) +- `PowerISA v2.07B Book I` "Vector Facility"; Microsoft Xbox 360 XDK for `stvxl128`. diff --git a/migration/project-root/ppc-manual/memory/stw.md b/migration/project-root/ppc-manual/memory/stw.md new file mode 100644 index 0000000..56c6d6e --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stw.md @@ -0,0 +1,257 @@ +# `stw` — Store Word + +> **Category:** [Memory](../categories/memory.md) · **Form:** [D](../forms/D.md) · **Opcode:** `0x90000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stw` | `stw` | — | Store Word | +| `stwu` | `stwu` | — | Store Word with Update | +| `stwux` | `stwux` | — | Store Word with Update Indexed | +| `stwx` | `stwx` | — | Store Word Indexed | + +## Syntax + +```asm +stw [RS], [d]([RA0]) +stwu [RS], [d]([RA]) +stwux [RS], [RA], [RB] +stwx [RS], [RA0], [RB] +``` + +## Encoding + +### `stw` — form `D` + +- **Opcode word:** `0x90000000` +- **Primary opcode (bits 0–5):** `36` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `stwu` — form `D` + +- **Opcode word:** `0x94000000` +- **Primary opcode (bits 0–5):** `37` +- **Extended opcode:** — +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT` | destination GPR (or RS when storing) | +| 11–15 | `RA` | source GPR (0 ⇒ literal 0 for RA0 forms) | +| 16–31 | `D/SI/UI` | 16-bit signed or unsigned immediate | + +### `stwux` — form `X` + +- **Opcode word:** `0x7c00016e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `183` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `stwx` — form `X` + +- **Opcode word:** `0x7c00012e` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `151` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | stw: read; stwu: read; stwux: read; stwx: read | Source GPR (alias for RD in some stores). | +| `RA0` | stw: read; stwx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `d` | stw: read; stwu: read | 16-bit signed displacement (`d`) added to the base address register. | +| `RA` | stwu: read; stwu: write; stwux: read; stwux: write | Source GPR (`r0`–`r31`). | +| `RB` | stwux: read; stwx: read | Source GPR. | + +## Register Effects + +### `stw` + +- **Reads (always):** `RS`, `RA0`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +### `stwu` + +- **Reads (always):** `RS`, `RA`, `d` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +### `stwux` + +- **Reads (always):** `RS`, `RA`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `RA` +- **Writes (conditional):** _none_ + +### `stwx` + +- **Reads (always):** `RS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +EA <- (RA|0) + EXTS(d) +MEM(EA, 4) <- (RS)[32:63] +``` + +## C Translation Example + +```c +/* stw RS, d(RA) */ +uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA]; +uint32_t ea = (uint32_t)(base + (int64_t)(int16_t)insn.D); +mem_write_u32_be(ea, (uint32_t)r[insn.RS]); +``` + +## Implementation References + +**`stw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:507`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L507) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:81`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L81) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:359`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L359) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1291-1299`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1291-L1299) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stw => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(instr.d() as i64 as u64) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u32(ea, ctx.gpr[instr.rs()] as u32); + ctx.pc += 4; + } +``` +
+ +**`stwu`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stwu"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:543`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L543) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:81`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L81) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:360`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L360) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1300-1308`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1300-L1308) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stwu => { + let ea = ctx.gpr[instr.ra()].wrapping_add(instr.d() as i64 as u64) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u32(ea, ctx.gpr[instr.rs()] as u32); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`stwux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stwux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:553`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L553) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:81`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L81) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:787`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L787) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1318-1326`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1318-L1326) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stwux => { + let ea = ctx.gpr[instr.ra()].wrapping_add(ctx.gpr[instr.rb()]) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u32(ea, ctx.gpr[instr.rs()] as u32); + ctx.gpr[instr.ra()] = ea as u64; + ctx.pc += 4; + } +``` +
+ +**`stwx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stwx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:563`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L563) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:81`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L81) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:783`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L783) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1309-1317`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1309-L1317) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stwx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u32(ea, ctx.gpr[instr.rs()] as u32); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Stores low 32 bits of `RS`.** Writes `(RS)[32:63]` — the low word of the 64-bit GPR — at `EA`. The xenia snapshot does `mem.write_u32(ea, ctx.gpr[instr.rs()] as u32)`. The high 32 bits are silently truncated; use [`std`](std.md) to store all 64 bits. +- **Big-endian write.** `RS[32:39]` (the most-significant byte of the low word) lands at `EA`; `RS[56:63]` at `EA+3`. On little-endian hosts the byte-swap happens at the memory boundary. +- **`RA0` (non-update forms).** `RA = 0` in `stw` and `stwx` selects literal zero. Update forms `stwu` / `stwux` invoke `RA = 0` as an invalid form. **The classic frame-allocation idiom** `stwu r1, -framesize(r1)` exploits the update form: it writes the old SP at the new SP and updates `r1` in one instruction. +- **Update-form post-write.** `stwu` / `stwux` write `EA` to `RA` after the store. Order is store-then-update, so the new `RA` value reflects the post-update address (typically the new stack-frame base). +- **No alignment requirement.** Xenon tolerates unaligned word stores. PowerISA permits implementations to raise alignment exceptions on cache-inhibited storage. +- **Cache-line behaviour.** A word store fits inside one Xenon cache line (128 B). Stores that **straddle** a line boundary touch two lines; keep words 4-byte aligned for best performance. +- **Common as pointer / ABI store.** Standard store for any `int32_t`/`uint32_t`/pointer field (Xbox 360 user pointers are 32-bit) and the workhorse of stack-frame setup. + +## Related Instructions + +- [`stb`](stb.md), [`sth`](sth.md), [`std`](std.md) — narrower / wider integer stores. +- [`stwbrx`](stwbrx.md) — byte-reversed word store. +- [`stwcx`](stwcx.md) — store-conditional word (the reservation pair end). +- [`lwz`](lwz.md), [`lwa`](lwa.md), [`lwarx`](lwarx.md) — corresponding loads. +- [`stmw`](stmw.md), [`stswi`](stswi.md), [`stswx`](stswx.md) — bulk stores. +- [`stfs`](stfs.md), [`stfiwx`](stfiwx.md) — FP-side equivalents. + +## IBM Reference + +- [AIX 7.3 — `stw` (Store Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stw-store-word-instruction) +- [AIX 7.3 — `stwu` / `stwx` / `stwux`](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stwu-store-word-update-instruction) diff --git a/migration/project-root/ppc-manual/memory/stwbrx.md b/migration/project-root/ppc-manual/memory/stwbrx.md new file mode 100644 index 0000000..bb5ab11 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stwbrx.md @@ -0,0 +1,131 @@ +# `stwbrx` — Store Word Byte-Reverse Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00052c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stwbrx` | `stwbrx` | — | Store Word Byte-Reverse Indexed | + +## Syntax + +```asm +stwbrx [RS], [RA0], [RB] +``` + +## Encoding + +### `stwbrx` — form `X` + +- **Opcode word:** `0x7c00052c` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `662` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | stwbrx: read | Source GPR (alias for RD in some stores). | +| `RA0` | stwbrx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | stwbrx: read | Source GPR. | + +## Register Effects + +### `stwbrx` + +- **Reads (always):** `RS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** _none_ +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stwbrx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stwbrx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:679`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L679) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:81`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L81) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:831`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L831) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1813-1821`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1813-L1821) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stwbrx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + if let Some(t) = ctx.reservation_table.as_ref().filter(|t| t.is_enabled()) { + if t.has_active_reservers() { t.invalidate_for_write(ea); } + } + mem.write_u32(ea, (ctx.gpr[instr.rs()] as u32).swap_bytes()); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Writes little-endian word.** Takes the low 32 bits of `RS`, reverses the four bytes, writes them at `EA`. Byte at `EA` is `RS[56:63]` (low byte); byte at `EA+3` is `RS[32:39]` (high byte). The xenia snapshot does `(ctx.gpr[instr.rs()] as u32).swap_bytes()`. +- **Used to emit little-endian payloads.** Symmetric counterpart of [`lwbrx`](lwbrx.md). Common when writing PC-side file formats, network packets, GPU command buffers in little-endian layout, etc. +- **High bits of `RS` ignored.** Stores only the low 32 bits; the upper half of the 64-bit GPR is not consulted. +- **X-form only — no D-form, no update form.** Only the indexed form exists. `EA = (RA|0) + RB`. +- **`RA0` semantics.** When `RA = 0`, base is literal zero; `stwbrx RS, 0, RB` writes at exact `RB`. +- **Alignment.** Hardware tolerates unaligned 4-byte writes; cache-inhibited storage may raise alignment exceptions on real hardware. +- **No CR / FPSCR effects.** + +## Related Instructions + +- [`lwbrx`](lwbrx.md) — load-word byte-reverse (matching load). +- [`sthbrx`](sthbrx.md), [`stdbrx`](stdbrx.md) — narrower / wider byte-reverse stores. +- [`stw`](stw.md), [`stwx`](stw.md) — non-reversing word stores. + +## IBM Reference + +- [AIX 7.3 — `stwbrx` (Store Word Byte-Reverse Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stwbrx-store-word-byte-reverse-indexed-instruction) +- `PowerISA v2.07B Book II` § "Byte-Reverse Storage Access". diff --git a/migration/project-root/ppc-manual/memory/stwcx.md b/migration/project-root/ppc-manual/memory/stwcx.md new file mode 100644 index 0000000..e287586 --- /dev/null +++ b/migration/project-root/ppc-manual/memory/stwcx.md @@ -0,0 +1,190 @@ +# `stwcx` — Store Word Conditional Indexed + +> **Category:** [Memory](../categories/memory.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00012d` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `stwcx` | `stwcx` | — | Store Word Conditional Indexed | + +## Syntax + +```asm +stwcx. [RS], [RA0], [RB] +``` + +## Encoding + +### `stwcx` — form `X` + +- **Opcode word:** `0x7c00012d` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `150` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RS` | stwcx: read | Source GPR (alias for RD in some stores). | +| `RA0` | stwcx: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | stwcx: read | Source GPR. | +| `CR` | stwcx: write | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `stwcx` + +- **Reads (always):** `RS`, `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `CR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `stwcx`: **CR0** ← signed-compare(result, 0) with `SO ← XER[SO]` (always). + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`stwcx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="stwcx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_memory.cc:868`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_memory.cc#L868) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:81`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L81) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:782`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L782) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1225-1288`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1225-L1288) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::stwcx => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]) as u32; + let line = ea & !RESERVATION_MASK; + let table_route = ctx + .reservation_table + .as_ref() + .filter(|t| t.is_enabled()) + .cloned(); + // PPCBUG-151: stwcx. requires a word (lwarx) reservation; + // a doubleword (ldarx) reservation must not commit here. + let width_ok = ctx.reservation_width == 4; + let success = if let Some(t) = &table_route { + // Table-routed: success iff the slot still holds our + // reservation AND the per-ctx flag agrees (the per-ctx + // flag would be cleared by an intervening write or + // context switch). + ctx.has_reservation + && width_ok + && ctx.reserved_line == line + && t.try_commit(ea, ctx.reserved_generation, ctx.hw_id) + } else { + // Legacy per-ctx path (M2 default / lockstep). + // PPCBUG-108: fires on non-primary HW slots under misconfig — + // if the table is disabled while workers are active, slots + // 1..N will trip this assert, surfacing the misconfiguration + // early in debug builds. Note: hw_id==0 (primary slot) taking + // this path while other slots run in parallel would NOT be + // caught; that case requires the table to be enabled instead. + debug_assert!( + ctx.hw_id == 0, + "PPCBUG-108: legacy per-ctx stwcx. on non-primary HW slot \ + (hw_id={}) — ReservationTable must be enabled under --parallel", + ctx.hw_id + ); + ctx.has_reservation && width_ok && ctx.reserved_line == line + }; + if success { + mem.write_u32(ea, ctx.gpr[instr.rs()] as u32); + ctx.cr[0] = crate::context::CrField { + lt: false, + gt: false, + eq: true, + so: ctx.xer_so != 0, + }; + } else { + ctx.cr[0] = crate::context::CrField { + lt: false, + gt: false, + eq: false, + so: ctx.xer_so != 0, + }; + // Failed stwcx: if we held the reservation in the table + // (someone else displaced our gen), release it from the + // counter so `has_active_reservers` returns to zero + // when no real reserver exists. + if let Some(t) = &table_route { + t.release(ea, ctx.reserved_generation, ctx.hw_id); + } + } + ctx.has_reservation = false; + ctx.reservation_width = 0; // PPCBUG-151: always clear on exit + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Always sets `Rc=1` (the trailing dot).** The mnemonic is `stwcx.` — there is no non-Rc variant. CR0 is updated unconditionally to communicate success/failure. `EQ=1` means the conditional store succeeded; `EQ=0` means it failed (the prior reservation was lost; no memory write). +- **Reservation check.** Xenia's snapshot tests `has_reservation && reserved_addr == ea`. On match it performs `mem.write_u32` (low 32 bits of `RS`, big-endian), sets `EQ=1`. On mismatch, no memory write and `EQ=0`. In both cases the reservation is cleared, so a retry must begin with a fresh [`lwarx`](lwarx.md). +- **Hardware granule.** PowerISA defines reservation by aligned word; Xenon implementations widen this to one 128-byte cache line. A store by another agent anywhere in the line clears the reservation. Xenia's per-address check is more permissive than hardware. +- **Alignment requirement.** `EA` must be 4-byte aligned. Unaligned `stwcx.` raises an alignment exception on real hardware; xenia does not check. +- **`RA0` semantics.** When `RA = 0`, base is literal zero — `stwcx. RS, 0, RB` writes at exact `RB`. +- **CR0[SO] reflects XER[SO].** Like all CR-updating ops, CR0[SO] is copied from `XER[SO]` rather than computed. +- **Spurious failures permitted.** Hardware may report failure even when no actual conflict occurred (e.g. on context switch). Application code treats failure as a normal retry condition. +- **Pair atomically with [`lwarx`](lwarx.md).** Don't interleave loads/stores between the pair; an [`lwsync`](sync.md) inside the loop body is common. +- **Stores low 32 bits of `RS`.** The high 32 bits of the source GPR are ignored. + +## Related Instructions + +- [`lwarx`](lwarx.md) — load-and-reserve word (the matching load). +- [`stdcx`](stdcx.md) / [`ldarx`](ldarx.md) — 64-bit reservation pair. +- [`stw`](stw.md), [`stwx`](stw.md) — non-conditional word stores. +- [`sync`](sync.md), [`lwsync`](sync.md), [`isync`](isync.md) — barriers used around reservation pairs. + +## IBM Reference + +- [AIX 7.3 — `stwcx.` (Store Word Conditional Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-stwcx-store-word-conditional-indexed-instruction) +- `PowerISA v2.07B Book II` § "Atomic Update Primitives" for canonical reservation semantics and granule rules. diff --git a/migration/project-root/ppc-manual/vmx/lvsl.md b/migration/project-root/ppc-manual/vmx/lvsl.md new file mode 100644 index 0000000..3f5002e --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/lvsl.md @@ -0,0 +1,184 @@ +# `lvsl` — Load Vector for Shift Left Indexed + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00000c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lvsl` | `lvsl` | — | Load Vector for Shift Left Indexed | +| `lvsl128` | `lvsl128` | — | Load Vector for Shift Left Indexed 128 | + +## Syntax + +```asm +lvsl [VD], [RA0], [RB] +lvsl128 [VD], [RA0], [RB] +``` + +## Encoding + +### `lvsl` — form `X` + +- **Opcode word:** `0x7c00000c` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `6` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lvsl128` — form `VX128_1` + +- **Opcode word:** `0x10000003` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `3` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lvsl: read; lvsl128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | lvsl: read; lvsl128: read | Source GPR. | +| `VD` | lvsl: write; lvsl128: write | Destination vector register. | + +## Register Effects + +### `lvsl` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `lvsl128` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +addr_lo <- ((RA|0) + (RB))[60:63] +for i in 0..15: VD[i] <- addr_lo + i +``` + +## C Translation Example + +```c +/* lvsl VD, RA, RB — load-shift-left permute control */ +uint64_t base = (insn.RA == 0) ? 0 : r[insn.RA]; +uint8_t sh = (uint8_t)((base + r[insn.RB]) & 0xF); +for (int i = 0; i < 16; ++i) v[insn.VD].b[i] = sh + i; +``` + +## Implementation References + +**`lvsl`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvsl"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:111`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L111) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:46`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L46) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:751`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L751) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2520-2529`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2520-L2529) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvsl | PpcOpcode::lvsl128 => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]); + let sh = (ea & 0xF) as u8; + let mut r = [0u8; 16]; + for i in 0..16 { r[i] = sh + i as u8; } + let vd = if matches!(instr.opcode, PpcOpcode::lvsl128) { instr.vd128() } else { instr.rd() }; + ctx.vr[vd] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ +**`lvsl128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvsl128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:114`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L114) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:46`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L46) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:412`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L412) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2520-2529`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2520-L2529) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvsl | PpcOpcode::lvsl128 => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]); + let sh = (ea & 0xF) as u8; + let mut r = [0u8; 16]; + for i in 0..16 { r[i] = sh + i as u8; } + let vd = if matches!(instr.opcode, PpcOpcode::lvsl128) { instr.vd128() } else { instr.rd() }; + ctx.vr[vd] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Extended Pseudocode + +``` +; lvsl VD, RA, RB — load vector for shift left (generates a permute mask) +EA <- (RA|0) + (RB) ; full 64-bit EA; only the low 4 bits matter +sh <- EA[60:63] ; bits 60..63 of EA (the misalignment) +for i in 0..15: + VD[i] <- sh + i ; bytes 0..15 of VD = {sh, sh+1, …, sh+15} +``` + +## Special Cases & Edge Conditions + +- **No memory is actually read.** Despite the name, `lvsl` / `lvsr` do **not** touch memory. They consume the effective address only to extract the low four bits (the alignment offset) and materialise a 16-byte permute control vector in `VD`. They are pure "address → permute-mask" converters. +- **Big-endian byte indexing.** `VD[0]` is the most-significant byte of the 128-bit register (lane 0). When `EA & 0xF == 0` the output is `{0, 1, 2, …, 15}`, i.e. the identity permute. When `EA & 0xF == 3` the output is `{3, 4, …, 18}` — modulo nothing, the values *do* exceed 15. That's intentional: fed into [`vperm`](vperm.md) (`vperm VD, VA, VB, VC`), byte selectors 0..15 index into `VA` and 16..31 index into `VB`. A stream of `lvsl` + two aligned `lvx` loads of consecutive 16-byte blocks + `vperm` reconstructs the unaligned 16-byte vector at `EA`. +- **Pair with [`lvsr`](lvsr.md) for the opposite direction.** `lvsl` shifts "left" (toward the low index / high address byte); `lvsr` shifts "right". Which one to pick depends on which aligned block you're starting from — see the idiom below. +- **Standard unaligned-load idiom.** + ``` + lvx vAL, r0, rA ; aligned block at EA & ~0xF + lvx vAH, r0, rA + 16 ; next aligned block + lvsl vC, r0, rA ; permute mask from misalignment + vperm vD, vAL, vAH, vC ; the unaligned 16 bytes starting at EA + ``` +- **`RA0` semantics.** When `RA = 0` the base is the literal zero, so `lvsl vD, 0, rB` derives the mask from `rB & 0xF`. +- **VMX128 sibling (`lvsl128`).** Same semantics; only the `VD` register is encoded with the 7-bit VMX128 register-fusion (`VD128l ‖ VD128h`) so `vD` may be `v0..v127`. +- **No flags, no side effects** beyond writing `VD`. Trivial to move and schedule. + +## Related Instructions + +- [`lvsr`](lvsr.md) — the mirror: `VD[i] = 16 − sh + i`. +- [`vperm`](vperm.md) — consumes the mask to perform arbitrary byte-level permutation across two vectors. +- [`lvx`](lvx.md), [`lvlx`](lvlx.md), [`lvrx`](lvrx.md) — the actual memory loads used alongside the mask. +- [`vsldoi`](vsldoi.md) — static-offset shift-double; when the shift is compile-time known, this is cheaper than the `lvsl`/`vperm` pair. + +## IBM Reference + +- [AIX 7.3 — `lvsl` (Load Vector for Shift Left Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lvsl-load-vector-shift-left-indexed) +- [IBM AltiVec Technology Programmer's Interface Manual — unaligned-load idiom](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/lvsr.md b/migration/project-root/ppc-manual/vmx/lvsr.md new file mode 100644 index 0000000..859ff7b --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/lvsr.md @@ -0,0 +1,181 @@ +# `lvsr` — Load Vector for Shift Right Indexed + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [X](../forms/X.md) · **Opcode:** `0x7c00004c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `lvsr` | `lvsr` | — | Load Vector for Shift Right Indexed | +| `lvsr128` | `lvsr128` | — | Load Vector for Shift Right Indexed 128 | + +## Syntax + +```asm +lvsr [VD], [RA0], [RB] +lvsr128 [VD], [RA0], [RB] +``` + +## Encoding + +### `lvsr` — form `X` + +- **Opcode word:** `0x7c00004c` +- **Primary opcode (bits 0–5):** `31` +- **Extended opcode:** `38` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode | +| 6–10 | `RT/FRT/VRT` | destination | +| 11–15 | `RA/FRA/VRA` | source A | +| 16–20 | `RB/FRB/VRB` | source B | +| 21–30 | `XO` | extended opcode (10 bits) | +| 31 | `Rc` | record-form flag | + +### `lvsr128` — form `VX128_1` + +- **Opcode word:** `0x10000043` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `67` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `RA` | address register | +| 16–20 | `RB` | offset register | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `—` | reserved | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `RA0` | lvsr: read; lvsr128: read | Source GPR; when the encoded register number is 0 the operand is the literal 64-bit zero, **not** `r0`. | +| `RB` | lvsr: read; lvsr128: read | Source GPR. | +| `VD` | lvsr: write; lvsr128: write | Destination vector register. | + +## Register Effects + +### `lvsr` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `lvsr128` + +- **Reads (always):** `RA0`, `RB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +addr_lo <- ((RA|0) + (RB))[60:63] +for i in 0..15: VD[i] <- 16 − addr_lo + i +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`lvsr`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvsr"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:126`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L126) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:46`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L46) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:762`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L762) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2530-2539`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2530-L2539) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvsr | PpcOpcode::lvsr128 => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]); + let sh = (ea & 0xF) as u8; + let mut r = [0u8; 16]; + for i in 0..16 { r[i] = (16 - sh) + i as u8; } + let vd = if matches!(instr.opcode, PpcOpcode::lvsr128) { instr.vd128() } else { instr.rd() }; + ctx.vr[vd] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ +**`lvsr128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="lvsr128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:129`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L129) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:46`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L46) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:413`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L413) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2530-2539`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2530-L2539) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::lvsr | PpcOpcode::lvsr128 => { + let ea = if instr.ra() == 0 { 0u64 } else { ctx.gpr[instr.ra()] }; + let ea = ea.wrapping_add(ctx.gpr[instr.rb()]); + let sh = (ea & 0xF) as u8; + let mut r = [0u8; 16]; + for i in 0..16 { r[i] = (16 - sh) + i as u8; } + let vd = if matches!(instr.opcode, PpcOpcode::lvsr128) { instr.vd128() } else { instr.rd() }; + ctx.vr[vd] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **No memory access.** Like [`lvsl`](lvsl.md), `lvsr` does not touch memory: the effective address is consumed solely to extract the low four bits, which then drive the synthesised permute mask in `VD`. +- **Mirror of `lvsl`.** Where `lvsl` produces `{sh, sh+1, …, sh+15}`, `lvsr` produces `{16−sh, 17−sh, …, 31−sh}`. When `EA & 0xF == 0` the output is `{16, 17, …, 31}` — the identity permute that selects all of `VB` (in the `vperm VD, VA, VB, VC` orientation). When `EA & 0xF == 3` the output is `{13, 14, …, 28}`, splitting the `vperm` between the high three bytes of `VA` and the low thirteen of `VB`. +- **Big-endian byte indexing.** `VD[0]` is the most-significant byte (the byte at the lowest address after a `stvx`). +- **Right-shift unaligned-load idiom.** Pair with two aligned `lvx` and a `vperm` when the source data is laid out so the wanted vector starts in the *second* aligned block: + ``` + lvx vAL, r0, rA ; aligned block at EA & ~0xF + lvx vAH, r0, rA + 16 ; next aligned block + lvsr vC, r0, rA ; right-shift permute mask + vperm vD, vAH, vAL, vC ; note: vAH then vAL — opposite of lvsl + ``` + The argument flip versus the `lvsl` idiom is the whole reason both masks exist. +- **`RA0` semantics.** When `RA = 0` the base is the literal zero, so `lvsr vD, 0, rB` derives the mask from `rB & 0xF`. +- **Selectors >15 are intentional.** Inside `vperm`, byte selectors with bit 4 set (i.e. `>= 16`) index into the second source vector. `lvsr` deliberately produces values up to `31`, since only the low five bits are honoured by `vperm`. +- **VMX128 sibling (`lvsr128`).** Identical semantics; the extended `VD128l ‖ VD128h` encoding lets `vD` reach `v0..v127`. +- **No flags, no exceptions, trivially reorderable.** + +## Related Instructions + +- [`lvsl`](lvsl.md) — the mirror: `VD[i] = sh + i`. +- [`vperm`](vperm.md) — consumes the mask to perform arbitrary byte-level permutation across two vectors. +- [`lvx`](lvx.md), [`lvlx`](lvlx.md), [`lvrx`](lvrx.md) — the actual memory loads that supply the two aligned halves. +- [`vsldoi`](vsldoi.md) — when the misalignment is a compile-time constant, the static-offset shift is cheaper than the `lvsr`/`vperm` pair. + +## IBM Reference + +- [AIX 7.3 — `lvsr` (Load Vector for Shift Right Indexed)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-lvsr-load-vector-shift-right-indexed-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual — unaligned-load idiom](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vaddcuw.md b/migration/project-root/ppc-manual/vmx/vaddcuw.md new file mode 100644 index 0000000..37582b3 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vaddcuw.md @@ -0,0 +1,133 @@ +# `vaddcuw` — Vector Add Carryout Unsigned Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000180` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vaddcuw` | `vaddcuw` | — | Vector Add Carryout Unsigned Word | + +## Syntax + +```asm +vaddcuw [VD], [VA], [VB] +``` + +## Encoding + +### `vaddcuw` — form `VX` + +- **Opcode word:** `0x10000180` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `384` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vaddcuw: read | Source A vector register. | +| `VB` | vaddcuw: read | Source B vector register. | +| `VD` | vaddcuw: write | Destination vector register. | + +## Register Effects + +### `vaddcuw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vaddcuw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddcuw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:325`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L325) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:89`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L89) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:466`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L466) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3380-3390`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3380-L3390) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vaddcuw => { + let a = ctx.vr[instr.ra()].as_u32x4(); + let b = ctx.vr[instr.rb()].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { + let (_, c) = a[i].overflowing_add(b[i]); + r[i] = if c { 1 } else { 0 }; + } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Carry-out only — the sum is discarded.** Each of the four 32-bit lanes computes `1` if `VA[i] + VB[i]` overflows in unsigned arithmetic, else `0`. The actual modulo sum lives wherever a paired [`vadduwm`](vadduwm.md) is scheduled. +- **Big-endian word lanes.** Lane 0 (`VD[0..3]` after `stvx`) is the most-significant word. Each lane is 32-bit unsigned; output values are exactly `0` or `1`, padded to 32 bits. +- **Builds wide-integer adds.** Pair `vaddcuw` with [`vadduwm`](vadduwm.md) and a left-byte shift to chain four 32-bit adds into a single 128-bit add — the canonical Altivec implementation of `__uint128_t` arithmetic. To carry into the *next* lane you typically apply [`vsldoi`](vsldoi.md) by 4 bytes and a [`vadduwm`](vadduwm.md). +- **Unsigned only.** There is no `vaddcsw` (signed-carry) — the operation is intrinsically unsigned because "carry" is undefined for signed two's-complement. +- **No `VSCR[SAT]` update.** Modulo carry is always representable; nothing saturates. XER is also untouched (Altivec never updates `XER[CA]`). +- **No VMX128 sibling.** Only the 32-register VX form exists. +- **Aliasing legal.** `vaddcuw v3, v3, v4` works as expected. + +## Related Instructions + +- [`vadduwm`](vadduwm.md) — the modulo sum that `vaddcuw` complements; together they form a full 32-bit-with-carry add. +- [`vsubcuw`](vsubcuw.md) — the matching borrow-out (returns `1` when *no* borrow occurred — i.e. when `VA[i] >= VB[i]`). +- [`vsldoi`](vsldoi.md) — used to align the carry vector for the next lane during multi-precision chains. +- [`vaddubm`](vaddubm.md), [`vadduhm`](vadduhm.md) — modulo siblings at narrower lane widths (no carrying-instruction variant exists for 8- or 16-bit lanes). + +## IBM Reference + +- [AIX 7.3 — `vaddcuw` (Vector Add Carry-Out Unsigned Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vaddcuw-vector-add-carryout-unsigned-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — multi-precision arithmetic idiom](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vaddfp.md b/migration/project-root/ppc-manual/vmx/vaddfp.md new file mode 100644 index 0000000..5059af5 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vaddfp.md @@ -0,0 +1,189 @@ +# `vaddfp` — Vector Add Floating Point + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000000a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vaddfp` | `vaddfp` | — | Vector Add Floating Point | +| `vaddfp128` | `vaddfp128` | — | Vector128 Add Floating Point | + +## Syntax + +```asm +vaddfp [VD], [VA], [VB] +vaddfp128 [VD], [VA], [VB] +``` + +## Encoding + +### `vaddfp` — form `VX` + +- **Opcode word:** `0x1000000a` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `10` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vaddfp128` — form `VX128` + +- **Opcode word:** `0x14000010` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `16` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vaddfp: read; vaddfp128: read | Source A vector register. | +| `VB` | vaddfp: read; vaddfp128: read | Source B vector register. | +| `VD` | vaddfp: write; vaddfp128: write | Destination vector register. | + +## Register Effects + +### `vaddfp` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vaddfp128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +for each 32-bit float lane i in 0..3: + VD[i] <- VA[i] + VB[i] +``` + +## C Translation Example + +```c +/* vaddfp VD, VA, VB — lane-wise float add */ +for (int i = 0; i < 4; ++i) v[insn.VD].f[i] = v[insn.VA].f[i] + v[insn.VB].f[i]; +``` + +## Implementation References + +**`vaddfp`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddfp"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:341`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L341) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:89`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L89) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:438`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L438) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1984-1998`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1984-L1998) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vaddfp => { + // PPCBUG-435: VSCR.NJ=1 (Xbox 360 always boots with this set) requires + // flush-to-zero on subnormal inputs and outputs. Canary VMX float + // arithmetic flushes denormals unconditionally. + let a = ctx.vr[instr.ra()].as_f32x4(); + let b = ctx.vr[instr.rb()].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { + let ai = vmx::flush_denorm(a[i]); + let bi = vmx::flush_denorm(b[i]); + r[i] = vmx::flush_denorm(ai + bi); + } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vaddfp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddfp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:344`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L344) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:89`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L89) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:610`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L610) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:1999-2011`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L1999-L2011) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vaddfp128 => { + // PPCBUG-435: same as vaddfp. + let a = ctx.vr[instr.va128()].as_f32x4(); + let b = ctx.vr[instr.vb128()].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { + let ai = vmx::flush_denorm(a[i]); + let bi = vmx::flush_denorm(b[i]); + r[i] = vmx::flush_denorm(ai + bi); + } + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Extended Pseudocode + +``` +; Four independent lane-wise IEEE-754 single-precision adds +for i in 0..3: + VD[i] <- VA[i] + VB[i] ; binary32, rounded to nearest + +; No FPSCR update (VMX uses VSCR, which only has NJ / SAT — and vaddfp doesn't saturate) +``` + +## Special Cases & Edge Conditions + +- **Lane indexing is big-endian.** Lane 0 is the **most significant** 4 bytes of the 128-bit register (the one that appears at the lowest byte offset after a `stvx`). Xenia's `Vec128::as_f32x4()` already reads lanes in PPC order on x86-64. When writing C that manipulates individual lanes, index `v.f[0]` as "the byte 0..3" of the big-endian layout. +- **Flush-denormals ("NJ") mode.** Altivec is independent of FPSCR — it has its own 2-bit VSCR (`NJ` for non-Java mode + `SAT` sticky-saturation). VMX float operations honour `VSCR[NJ]`: when set (the Xenon boot default), denormal inputs and outputs are flushed to zero. This is **opposite** to the scalar FPU, which has its own non-IEEE bit. Xenia sets `NJ = 1` at context creation ([`context.rs`](../../xenia-rs/crates/xenia-cpu/src/context.rs)). +- **No exception, no trap.** Altivec floats never raise exceptions. NaN inputs produce NaN outputs; `±∞ − ±∞` yields a NaN; there is no VXISI-style status bit. `VSCR[SAT]` is **not** touched by `vaddfp` (it saturates integer ops, not floats). +- **Four independent lanes.** Each lane's operation is unaffected by the others. Aliasing between `VA`, `VB`, and `VD` is legal and common (`vaddfp v3, v3, v4`). +- **VMX128 sibling (`vaddfp128`).** Semantics identical; only the register encoding differs. VMX128 uses a 7-bit operand ID per source (and destination) built from two or three non-contiguous bit fields — see [`categories/vmx128.md`](../categories/vmx128.md). Any bit pattern encodable as a 32-register VX-form is also encodable as a VMX128 form, so compilers picked the more compact form that reached the needed register range. +- **On x86-64 hosts.** A natural compilation uses `_mm_add_ps` or AVX `vaddps`. These preserve lane indexing because PPC lane 0 maps to x86 lane 3 only if you treat the 128-bit value as "big-endian in memory" — i.e. byte-swap on load/store. With xenia's `_be` memory helpers, `_mm_add_ps` gives the right per-lane result. + +## Related Instructions + +- [`vsubfp`](vsubfp.md) — lane-wise float subtract. +- [`vmaddfp`](vmaddfp.md) — lane-wise `(VA × VC) + VB` (fused multiply-add with single rounding). +- [`vnmsubfp`](vnmsubfp.md) — `−((VA × VC) − VB)`. +- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — IEEE-754-aware max/min (NaN propagation). +- [`vcmpeqfp`](vcmpeqfp.md), [`vcmpgtfp`](vcmpgtfp.md), [`vcmpgefp`](vcmpgefp.md), [`vcmpbfp`](vcmpbfp.md) — compares producing per-lane all-ones / all-zero masks. +- [`vrfin`](vrfin.md), [`vrfim`](vrfim.md), [`vrfip`](vrfip.md), [`vrfiz`](vrfiz.md) — round to integer (to-nearest / down / up / toward-zero). +- [`vmulfp`](vmulfp.md) — xenia's helper; not a native Altivec op, included for convenience. Hardware games use `vmaddfp v, va, vc, v0_zero` instead. + +## IBM Reference + +- [AIX 7.3 — `vaddfp` (Vector Add Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vaddfp-vector-add-floating-point-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vaddsbs.md b/migration/project-root/ppc-manual/vmx/vaddsbs.md new file mode 100644 index 0000000..c6040ed --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vaddsbs.md @@ -0,0 +1,136 @@ +# `vaddsbs` — Vector Add Signed Byte Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000300` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vaddsbs` | `vaddsbs` | — | Vector Add Signed Byte Saturate | + +## Syntax + +```asm +vaddsbs [VD], [VA], [VB] +``` + +## Encoding + +### `vaddsbs` — form `VX` + +- **Opcode word:** `0x10000300` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `768` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vaddsbs: read | Source A vector register. | +| `VB` | vaddsbs: read | Source B vector register. | +| `VD` | vaddsbs: write | Destination vector register. | +| `VSCR` | vaddsbs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vaddsbs` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vaddsbs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vaddsbs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddsbs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:348`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L348) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:89`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L89) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:498`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L498) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3258-3269`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3258-L3269) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vaddsbs => { + let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i8x16(ctx.vr[instr.rb()]); + let mut r = [0i8; 16]; let mut sat = false; + for i in 0..16 { + let (v, s) = crate::vmx::sat_add_i8(a[i], b[i]); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = crate::vmx::from_i8x16(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Sixteen signed-byte lanes, saturating.** Each `VD[i] = clamp(VA[i] + VB[i], -128, +127)` for `i = 0..15`, with both inputs interpreted as signed `int8`. Lane 0 is the most-significant byte (the byte at the lowest address after `stvx`). +- **`VSCR[SAT]` is sticky-set** when *any* lane saturates — either positively (overflow above `+127`) or negatively (underflow below `-128`). The SAT bit is never cleared by this op; software must use [`mtvscr`](mtvscr.md) to clear it. Xenia routes the OR of per-lane saturation flags into `ctx.set_vscr_sat(true)` exactly when at least one lane clamped (see `crate::vmx::sat_add_i8` in [`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)). +- **Compare with the modulo sibling.** [`vaddubm`](vaddubm.md) is bit-pattern-identical to a hypothetical `vaddsbm` and silently wraps without touching `VSCR[SAT]`. Use `vaddsbs` whenever clipping is desired and you need the sticky overflow flag. +- **Asymmetric clamp.** `+127 + 1 = +127`; `-128 + (-1) = -128`. Tests that look for "any saturation" should mask both saturation directions. +- **No XER side effects.** Altivec never updates `XER[CA]` / `XER[OV]`. The only status bit affected is `VSCR[SAT]`. +- **Aliasing legal.** `vaddsbs v3, v3, v4` is the standard accumulate idiom for a clamping sum. +- **No VMX128 sibling.** + +## Related Instructions + +- [`vaddubs`](vaddubs.md) — same width, **unsigned** saturating add (clamps to `0..255`). +- [`vaddubm`](vaddubm.md) — same width, modulo (non-saturating) add; sign-agnostic. +- [`vaddshs`](vaddshs.md), [`vaddsws`](vaddsws.md) — signed saturating add at half / word width. +- [`vsubsbs`](vsubsbs.md) — the matching signed saturating subtract. +- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear the sticky `VSCR[SAT]` bit observed here. + +## IBM Reference + +- [AIX 7.3 — `vaddsbs` (Vector Add Signed Byte Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vaddsbs-vector-add-signed-byte-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Saturating Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vaddshs.md b/migration/project-root/ppc-manual/vmx/vaddshs.md new file mode 100644 index 0000000..d42cc41 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vaddshs.md @@ -0,0 +1,137 @@ +# `vaddshs` — Vector Add Signed Half Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000340` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vaddshs` | `vaddshs` | — | Vector Add Signed Half Word Saturate | + +## Syntax + +```asm +vaddshs [VD], [VA], [VB] +``` + +## Encoding + +### `vaddshs` — form `VX` + +- **Opcode word:** `0x10000340` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `832` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vaddshs: read | Source A vector register. | +| `VB` | vaddshs: read | Source B vector register. | +| `VD` | vaddshs: write | Destination vector register. | +| `VSCR` | vaddshs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vaddshs` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vaddshs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vaddshs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddshs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:356`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L356) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:89`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L89) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:505`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L505) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3306-3317`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3306-L3317) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vaddshs => { + let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]); + let mut r = [0i16; 8]; let mut sat = false; + for i in 0..8 { + let (v, s) = crate::vmx::sat_add_i16(a[i], b[i]); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Eight signed-half lanes, saturating.** Each `VD[i] = clamp(VA[i] + VB[i], -32768, +32767)` for `i = 0..7`, with both inputs interpreted as signed `int16`. Lane 0 (`VD[0..1]` after `stvx`) is the most-significant half. +- **`VSCR[SAT]` is sticky-set** if *any* lane clamps. Once set, it stays set until explicit clear via [`mtvscr`](mtvscr.md). Xenia uses `crate::vmx::sat_add_i16` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)) which returns the per-lane saturation flag; the OR is written back via `ctx.set_vscr_sat(true)`. +- **The modulo counterpart is `vadduhm`.** Modulo add for signed and unsigned halves is bit-identical, so [`vadduhm`](vadduhm.md) covers both when wraparound is wanted; switch to `vaddshs` only when clipping with sign awareness is desired. +- **Asymmetric clamp.** `+32767 + 1 = +32767`; `-32768 + (-1) = -32768`. +- **Common 16-bit DSP idiom.** Audio mixing and fixed-point colour blending lean heavily on `vaddshs` to combine signed Q15 / Q1.15 quantities without wraparound artefacts. +- **No XER side effects, no NJ involvement** (this is an integer op). +- **No VMX128 sibling.** + +## Related Instructions + +- [`vadduhs`](vadduhs.md) — same width, unsigned saturating add (clamps to `0..0xFFFF`). +- [`vadduhm`](vadduhm.md) — same width, modulo add; sign-agnostic. +- [`vaddsbs`](vaddsbs.md), [`vaddsws`](vaddsws.md) — signed saturating add at byte / word width. +- [`vsubshs`](vsubshs.md) — the matching signed saturating subtract. +- [`vmhaddshs`](vmhaddshs.md), [`vmhraddshs`](vmhraddshs.md) — signed-half multiply-add with saturation, common for fixed-point DSP. +- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear the `VSCR[SAT]` bit affected here. + +## IBM Reference + +- [AIX 7.3 — `vaddshs` (Vector Add Signed Half Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vaddshs-vector-add-signed-half-word-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Saturating Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vaddsws.md b/migration/project-root/ppc-manual/vmx/vaddsws.md new file mode 100644 index 0000000..24ea801 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vaddsws.md @@ -0,0 +1,138 @@ +# `vaddsws` — Vector Add Signed Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000380` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vaddsws` | `vaddsws` | — | Vector Add Signed Word Saturate | + +## Syntax + +```asm +vaddsws [VD], [VA], [VB] +``` + +## Encoding + +### `vaddsws` — form `VX` + +- **Opcode word:** `0x10000380` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `896` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vaddsws: read | Source A vector register. | +| `VB` | vaddsws: read | Source B vector register. | +| `VD` | vaddsws: write | Destination vector register. | +| `VSCR` | vaddsws: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vaddsws` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vaddsws`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vaddsws`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddsws"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:364`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L364) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:89`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L89) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:512`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L512) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3354-3365`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3354-L3365) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vaddsws => { + let a = crate::vmx::as_i32x4(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i32x4(ctx.vr[instr.rb()]); + let mut r = [0i32; 4]; let mut sat = false; + for i in 0..4 { + let (v, s) = crate::vmx::sat_add_i32(a[i], b[i]); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Four signed-word lanes, saturating.** Each `VD[i] = clamp(VA[i] + VB[i], INT32_MIN, INT32_MAX)` for `i = 0..3`. Lane 0 (`VD[0..3]` after `stvx`) is the most-significant word. +- **`VSCR[SAT]` is sticky-set** if any lane clamps. Xenia tracks this through `crate::vmx::sat_add_i32` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)) and ORs the flag into the architectural `VSCR[SAT]`. +- **No multi-precision carry.** Unlike [`vaddcuw`](vaddcuw.md), `vaddsws` does not expose a per-lane carry/borrow — a saturated lane simply clips; it does not overflow into the adjacent lane. +- **Asymmetric clamp.** `INT32_MAX + 1 = INT32_MAX`; `INT32_MIN + (-1) = INT32_MIN`. +- **The modulo sibling is `vadduwm`.** Modulo add for signed and unsigned words is bit-identical; switch to `vaddsws` only when clipping with sign awareness is desired. +- **No XER side effects.** +- **No VMX128 sibling.** +- **Common usage.** Accumulate four 32-bit signed sums per cycle (e.g. dot products of int16 lanes after a [`vmsumshs`](vmsumshs.md) — which already saturates internally — for further accumulation across multiple iterations). + +## Related Instructions + +- [`vadduws`](vadduws.md) — same width, unsigned saturating add. +- [`vadduwm`](vadduwm.md) — same width, modulo (non-saturating) add; sign-agnostic. +- [`vaddsbs`](vaddsbs.md), [`vaddshs`](vaddshs.md) — signed saturating add at byte / half width. +- [`vsubsws`](vsubsws.md) — the matching signed saturating subtract. +- [`vmsumshs`](vmsumshs.md), [`vmsumuhs`](vmsumuhs.md) — saturating multiply-sum that often feeds a `vaddsws` chain. +- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear the `VSCR[SAT]` bit. + +## IBM Reference + +- [AIX 7.3 — `vaddsws` (Vector Add Signed Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vaddsws-vector-add-signed-word-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Saturating Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vaddubm.md b/migration/project-root/ppc-manual/vmx/vaddubm.md new file mode 100644 index 0000000..cf708d5 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vaddubm.md @@ -0,0 +1,132 @@ +# `vaddubm` — Vector Add Unsigned Byte Modulo + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000000` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vaddubm` | `vaddubm` | — | Vector Add Unsigned Byte Modulo | + +## Syntax + +```asm +vaddubm [VD], [VA], [VB] +``` + +## Encoding + +### `vaddubm` — form `VX` + +- **Opcode word:** `0x10000000` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `0` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vaddubm: read | Source A vector register. | +| `VB` | vaddubm: read | Source B vector register. | +| `VD` | vaddubm: write | Destination vector register. | + +## Register Effects + +### `vaddubm` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vaddubm`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddubm"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:372`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L372) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:90`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L90) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:434`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L434) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3198-3205`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3198-L3205) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vaddubm => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u8; 16]; + for i in 0..16 { r[i] = a[i].wrapping_add(b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Sixteen independent byte lanes.** `VD[i] = (VA[i] + VB[i]) mod 256` for `i = 0..15`. Lane 0 is the most-significant byte (the byte at the lowest address after `stvx`). +- **Modulo wrap, not saturating.** Overflow silently wraps in 8-bit unsigned arithmetic — there is no carry-out and **`VSCR[SAT]` is not touched**. This is the same bit pattern as a signed-byte modulo add, so `vaddubm` is also the de-facto `vaddsbm` (which doesn't exist in the ISA — modulo arithmetic is sign-agnostic). +- **No carry, no flags.** XER is untouched (Altivec never updates `XER[CA]`/`XER[OV]`). The dedicated [`vaddcuw`](vaddcuw.md) instruction exists *only* because there is no SAT/CA byproduct — extracting the carry needs an explicit op. +- **Aliasing is legal.** `vaddubm v3, v3, v4` (in-place accumulate) is a single-cycle issue on Xenon's VMX pipe. +- **VSCR untouched.** Neither `SAT` nor `NJ` is read or written. Schedulable next to floats, compares and saturating ops without dependency stalls. +- **Pairs with a saturating sibling.** When you need 8-bit add with clamping, switch to [`vaddubs`](vaddubs.md) (unsigned saturate, range `0..0xFF`) or [`vaddsbs`](vaddsbs.md) (signed saturate, range `-128..+127`) — both of which *do* sticky-set `VSCR[SAT]`. +- **No VMX128 sibling.** The `vaddubm` opcode is not exposed as a `*128` form; the 32-register encoding is the only one available. + +## Related Instructions + +- [`vaddubs`](vaddubs.md) — same lane width, unsigned saturating add (`SAT` sticky-set on overflow). +- [`vaddsbs`](vaddsbs.md) — same lane width, signed saturating add. +- [`vadduhm`](vadduhm.md), [`vadduwm`](vadduwm.md) — modulo add with 8-lane half / 4-lane word width. +- [`vaddcuw`](vaddcuw.md) — produces the per-lane carry bits a 32-bit modulo add discards. +- [`vsububm`](vsububm.md) — the matching modulo subtract. +- [`vavgub`](vavgub.md) — unsigned byte average (carry-aware: `(a + b + 1) >> 1`), useful when byte addition needs rounding without overflow. + +## IBM Reference + +- [AIX 7.3 — `vaddubm` (Vector Add Unsigned Byte Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vaddubm-vector-add-unsigned-byte-modulo-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vaddubs.md b/migration/project-root/ppc-manual/vmx/vaddubs.md new file mode 100644 index 0000000..e2eaa1d --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vaddubs.md @@ -0,0 +1,138 @@ +# `vaddubs` — Vector Add Unsigned Byte Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000200` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vaddubs` | `vaddubs` | — | Vector Add Unsigned Byte Saturate | + +## Syntax + +```asm +vaddubs [VD], [VA], [VB] +``` + +## Encoding + +### `vaddubs` — form `VX` + +- **Opcode word:** `0x10000200` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `512` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vaddubs: read | Source A vector register. | +| `VB` | vaddubs: read | Source B vector register. | +| `VD` | vaddubs: write | Destination vector register. | +| `VSCR` | vaddubs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vaddubs` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vaddubs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vaddubs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vaddubs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:379`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L379) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:90`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L90) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:475`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L475) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3233-3245`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3233-L3245) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vaddubs => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u8; 16]; + let mut sat = false; + for i in 0..16 { + let (v, s) = crate::vmx::sat_add_u8(a[i], b[i]); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Sixteen unsigned-byte lanes, saturating.** Each `VD[i] = min(VA[i] + VB[i], 0xFF)` for `i = 0..15`. Lane 0 is the most-significant byte after `stvx`. +- **`VSCR[SAT]` is sticky-set** if any lane saturates. Once set, it stays set until [`mtvscr`](mtvscr.md) clears it. Xenia computes this with `crate::vmx::sat_add_u8` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)). +- **One-sided clamp.** Only the upper bound applies (unsigned add cannot underflow). Distinct from [`vaddsbs`](vaddsbs.md), which clips at both `+127` and `-128`. +- **Pixel-blend workhorse.** Common usage is to add two unsigned-byte colour vectors with clamp-to-white at `0xFF`. Saturation behaves the same way as `_mm_adds_epu8` on x86 SSE2 — making it a one-to-one host translation candidate. +- **Versus modulo.** [`vaddubm`](vaddubm.md) wraps silently and never touches `VSCR[SAT]`. Use `vaddubs` when overflow indicates "too bright" / "out of range" and you want to flag it sticky. +- **No XER side effects.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vaddubm`](vaddubm.md) — same width, modulo (non-saturating) add. +- [`vaddsbs`](vaddsbs.md) — same width, signed saturating add (range `-128..+127`). +- [`vadduhs`](vadduhs.md), [`vadduws`](vadduws.md) — unsigned saturating add at half / word width. +- [`vsububs`](vsububs.md) — the matching unsigned saturating subtract (clamps to `0`). +- [`vavgub`](vavgub.md) — rounding average; alternative when you want `(a + b + 1) >> 1` without overflow worry. +- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear the sticky `VSCR[SAT]` bit. + +## IBM Reference + +- [AIX 7.3 — `vaddubs` (Vector Add Unsigned Byte Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vaddubs-vector-add-unsigned-byte-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Saturating Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vadduhm.md b/migration/project-root/ppc-manual/vmx/vadduhm.md new file mode 100644 index 0000000..4f81fd2 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vadduhm.md @@ -0,0 +1,131 @@ +# `vadduhm` — Vector Add Unsigned Half Word Modulo + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000040` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vadduhm` | `vadduhm` | — | Vector Add Unsigned Half Word Modulo | + +## Syntax + +```asm +vadduhm [VD], [VA], [VB] +``` + +## Encoding + +### `vadduhm` — form `VX` + +- **Opcode word:** `0x10000040` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `64` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vadduhm: read | Source A vector register. | +| `VB` | vadduhm: read | Source B vector register. | +| `VD` | vadduhm: write | Destination vector register. | + +## Register Effects + +### `vadduhm` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vadduhm`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vadduhm"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:387`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L387) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:90`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L90) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:441`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L441) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3214-3221`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3214-L3221) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vadduhm => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u16; 8]; + for i in 0..8 { r[i] = a[i].wrapping_add(b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Eight half-word lanes.** `VD[i] = (VA[i] + VB[i]) mod 65536` for `i = 0..7`. Lane 0 (`VD[0..1]` after `stvx`) is the most-significant half. +- **Modulo wrap, not saturating.** Overflow silently wraps in 16-bit arithmetic; **`VSCR[SAT]` is not touched** and there is no carry-out. Sign-agnostic — modulo add for signed `int16` and unsigned `u16` is bit-pattern-identical, so this is also the de-facto `vaddshm`. +- **No XER, no NJ involvement.** +- **Aliasing legal.** `vadduhm v3, v3, v4` is a single-issue accumulate. +- **Pairs with saturating siblings.** Switch to [`vadduhs`](vadduhs.md) for unsigned clamp at `0xFFFF` or [`vaddshs`](vaddshs.md) for signed clamp at `±32767` when overflow needs to be detected via sticky `VSCR[SAT]`. +- **Common usage.** Multi-precision adds composed from 16-bit lanes; UV-coordinate accumulation; per-pixel half-precision counters. +- **No VMX128 sibling.** + +## Related Instructions + +- [`vadduhs`](vadduhs.md) — same width, unsigned saturating add. +- [`vaddshs`](vaddshs.md) — same width, signed saturating add. +- [`vaddubm`](vaddubm.md), [`vadduwm`](vadduwm.md) — modulo add at byte / word width. +- [`vsubuhm`](vsubuhm.md) — the matching modulo subtract. +- [`vavguh`](vavguh.md) — unsigned half-word rounding average; useful when addition needs to stay representable. + +## IBM Reference + +- [AIX 7.3 — `vadduhm` (Vector Add Unsigned Half Word Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vadduhm-vector-add-unsigned-half-word-modulo-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vadduhs.md b/migration/project-root/ppc-manual/vmx/vadduhs.md new file mode 100644 index 0000000..17bb101 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vadduhs.md @@ -0,0 +1,136 @@ +# `vadduhs` — Vector Add Unsigned Half Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000240` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vadduhs` | `vadduhs` | — | Vector Add Unsigned Half Word Saturate | + +## Syntax + +```asm +vadduhs [VD], [VA], [VB] +``` + +## Encoding + +### `vadduhs` — form `VX` + +- **Opcode word:** `0x10000240` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `576` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vadduhs: read | Source A vector register. | +| `VB` | vadduhs: read | Source B vector register. | +| `VD` | vadduhs: write | Destination vector register. | +| `VSCR` | vadduhs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vadduhs` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vadduhs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vadduhs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vadduhs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:394`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L394) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:90`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L90) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:482`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L482) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3282-3293`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3282-L3293) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vadduhs => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u16; 8]; let mut sat = false; + for i in 0..8 { + let (v, s) = crate::vmx::sat_add_u16(a[i], b[i]); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Eight unsigned-half lanes, saturating.** Each `VD[i] = min(VA[i] + VB[i], 0xFFFF)` for `i = 0..7`. Lane 0 (`VD[0..1]` after `stvx`) is the most-significant half. +- **`VSCR[SAT]` is sticky-set** if any lane clamps. Cleared only by [`mtvscr`](mtvscr.md). Xenia uses `crate::vmx::sat_add_u16` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)) and ORs the per-lane flag. +- **One-sided clamp.** Unsigned add cannot underflow, so only the upper bound `0xFFFF` ever clips. +- **The modulo counterpart is `vadduhm`.** Use `vadduhs` when "too large to fit" must be flagged or clipped — typical for accumulating Q16 unsigned counters. +- **No XER side effects.** +- **Maps directly to `_mm_adds_epu16`** on SSE2 hosts — semantically identical, including the sticky-saturation observation step (xenia recovers the SAT flag from the per-lane comparison). +- **No VMX128 sibling.** + +## Related Instructions + +- [`vadduhm`](vadduhm.md) — same width, modulo (non-saturating) add. +- [`vaddshs`](vaddshs.md) — same width, signed saturating add (range `-32768..+32767`). +- [`vaddubs`](vaddubs.md), [`vadduws`](vadduws.md) — unsigned saturating add at byte / word width. +- [`vsubuhs`](vsubuhs.md) — the matching unsigned saturating subtract. +- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear the sticky `VSCR[SAT]` bit. + +## IBM Reference + +- [AIX 7.3 — `vadduhs` (Vector Add Unsigned Half Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vadduhs-vector-add-unsigned-half-word-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Saturating Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vadduwm.md b/migration/project-root/ppc-manual/vmx/vadduwm.md new file mode 100644 index 0000000..363e1c5 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vadduwm.md @@ -0,0 +1,131 @@ +# `vadduwm` — Vector Add Unsigned Word Modulo + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000080` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vadduwm` | `vadduwm` | — | Vector Add Unsigned Word Modulo | + +## Syntax + +```asm +vadduwm [VD], [VA], [VB] +``` + +## Encoding + +### `vadduwm` — form `VX` + +- **Opcode word:** `0x10000080` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `128` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vadduwm: read | Source A vector register. | +| `VB` | vadduwm: read | Source B vector register. | +| `VD` | vadduwm: write | Destination vector register. | + +## Register Effects + +### `vadduwm` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vadduwm`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vadduwm"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:402`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L402) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:90`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L90) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:448`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L448) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2396-2403`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2396-L2403) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vadduwm => { + let a = ctx.vr[instr.ra()].as_u32x4(); + let b = ctx.vr[instr.rb()].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = a[i].wrapping_add(b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Four 32-bit word lanes.** `VD[i] = (VA[i] + VB[i]) mod 2^32` for `i = 0..3`. Lane 0 (`VD[0..3]` after `stvx`) is the most-significant word. +- **Modulo wrap, not saturating.** Carry is dropped; **`VSCR[SAT]` is not touched**. Sign-agnostic — bit-pattern-identical for signed `int32` and unsigned `u32` modulo addition. +- **Multi-precision idiom.** Pair with [`vaddcuw`](vaddcuw.md) to recover the per-lane carry, then [`vsldoi`](vsldoi.md) the carry one word left and feed it back into another `vadduwm` to chain a 128-bit add. +- **No XER, no NJ involvement.** +- **Aliasing legal.** `vadduwm v3, v3, v4`. +- **No VMX128 sibling** in the `vadduwm` mnemonic specifically; `vaddfp128` covers the float case, but integer-modulo-word stays VMX-only. +- **Common usage.** RGBA8 packed-pixel sums; per-tile counters; BigInt limbs. + +## Related Instructions + +- [`vaddcuw`](vaddcuw.md) — produces the per-lane carry that `vadduwm` discards. +- [`vadduws`](vadduws.md), [`vaddsws`](vaddsws.md) — unsigned / signed saturating add at the same width. +- [`vaddubm`](vaddubm.md), [`vadduhm`](vadduhm.md) — modulo add at byte / half width. +- [`vsubuwm`](vsubuwm.md) — the matching modulo subtract. +- [`vsldoi`](vsldoi.md) — used to align carries during multi-precision chains. + +## IBM Reference + +- [AIX 7.3 — `vadduwm` (Vector Add Unsigned Word Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vadduwm-vector-add-unsigned-word-modulo-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Integer Arithmetic & multi-precision idiom](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vadduws.md b/migration/project-root/ppc-manual/vmx/vadduws.md new file mode 100644 index 0000000..cb20ad4 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vadduws.md @@ -0,0 +1,137 @@ +# `vadduws` — Vector Add Unsigned Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000280` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vadduws` | `vadduws` | — | Vector Add Unsigned Word Saturate | + +## Syntax + +```asm +vadduws [VD], [VA], [VB] +``` + +## Encoding + +### `vadduws` — form `VX` + +- **Opcode word:** `0x10000280` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `640` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vadduws: read | Source A vector register. | +| `VB` | vadduws: read | Source B vector register. | +| `VD` | vadduws: write | Destination vector register. | +| `VSCR` | vadduws: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vadduws` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vadduws`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vadduws`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vadduws"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:409`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L409) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:90`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L90) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:489`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L489) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3330-3341`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3330-L3341) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vadduws => { + let a = ctx.vr[instr.ra()].as_u32x4(); + let b = ctx.vr[instr.rb()].as_u32x4(); + let mut r = [0u32; 4]; let mut sat = false; + for i in 0..4 { + let (v, s) = crate::vmx::sat_add_u32(a[i], b[i]); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Four unsigned-word lanes, saturating.** Each `VD[i] = min(VA[i] + VB[i], 0xFFFF_FFFF)` for `i = 0..3`. Lane 0 (`VD[0..3]` after `stvx`) is the most-significant word. +- **`VSCR[SAT]` is sticky-set** if any lane clamps. Cleared only via [`mtvscr`](mtvscr.md). Xenia uses `crate::vmx::sat_add_u32` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)). +- **One-sided clamp** at `UINT32_MAX`. There is no underflow path for unsigned add. +- **The modulo counterpart is `vadduwm`.** Use `vadduws` only when overflow needs to be visible / clamped; otherwise the modulo form is one cycle and never touches the sticky bit. +- **No XER side effects, no carry exposure.** Unlike `vadduwm + vaddcuw`, the saturating form does **not** make the carry available — it is fused into the clamp. +- **No VMX128 sibling.** +- **Common usage.** Pixel sums where four packed unsigned 32-bit accumulators must clip at white; counter overflow detection. + +## Related Instructions + +- [`vadduwm`](vadduwm.md) — same width, modulo add (no saturation, no SAT flag). +- [`vaddsws`](vaddsws.md) — same width, signed saturating add. +- [`vaddubs`](vaddubs.md), [`vadduhs`](vadduhs.md) — unsigned saturating add at byte / half width. +- [`vsubuws`](vsubuws.md) — the matching unsigned saturating subtract. +- [`vaddcuw`](vaddcuw.md) — explicit carry-out (paired with the modulo form). +- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear the sticky `VSCR[SAT]` bit. + +## IBM Reference + +- [AIX 7.3 — `vadduws` (Vector Add Unsigned Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vadduws-vector-add-unsigned-word-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Saturating Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vand.md b/migration/project-root/ppc-manual/vmx/vand.md new file mode 100644 index 0000000..42d3096 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vand.md @@ -0,0 +1,181 @@ +# `vand` — Vector Logical AND + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000404` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vand` | `vand` | — | Vector Logical AND | +| `vand128` | `vand128` | — | Vector128 Logical AND | + +## Syntax + +```asm +vand [VD], [VA], [VB] +vand128 [VD], [VA], [VB] +``` + +## Encoding + +### `vand` — form `VX` + +- **Opcode word:** `0x10000404` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1028` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vand128` — form `VX128` + +- **Opcode word:** `0x14000210` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `528` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vand: read; vand128: read | Source A vector register. | +| `VB` | vand: read; vand128: read | Source B vector register. | +| `VD` | vand: write; vand128: write | Destination vector register. | + +## Register Effects + +### `vand` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vand128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vand`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vand"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:423`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L423) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:91`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L91) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:521`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L521) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2208-2216`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2208-L2216) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vand | PpcOpcode::vand128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = a[i] & b[i]; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vand128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vand128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:426`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L426) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:91`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L91) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:619`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L619) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2208-2216`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2208-L2216) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vand | PpcOpcode::vand128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = a[i] & b[i]; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Bitwise across the full 128 bits.** `VD = VA & VB`. Lane width is irrelevant — the AND is bit-for-bit and there is no lane boundary. Xenia chooses to express this as four `u32` ANDs, but any widening (`u8`, `u16`, `u64`, `u128`) is observationally identical. +- **No flags, no exceptions, no `VSCR` interaction.** Pure combinational op; one of the cheapest VMX instructions. +- **Common usage with compares.** Compare ops produce per-lane all-ones / all-zero masks; `vand` with the mask selects the matching lanes (clearing the rest). For "select-by-mask" with a non-zero alternative use [`vsel`](vsel.md) instead. +- **Idiom: clear lanes.** `vand VD, VD, vZero` zeroes a register; in practice [`vxor VD, VD, VD`](vxor.md) is preferred since it doesn't need a zero-vector source. +- **Aliasing legal.** All three operands may overlap. +- **VMX128 sibling (`vand128`).** Identical semantics with the extended 128-register encoding; xenia reuses one match arm via the `vmx_reg_triple` helper (see [`crates/xenia-cpu/src/interpreter.rs`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs)). + +## Related Instructions + +- [`vandc`](vandc.md) — `VA & ~VB`; useful for clearing bits selected by a mask. +- [`vor`](vor.md), [`vxor`](vxor.md), [`vnor`](vnor.md) — the rest of the bitwise family. +- [`vsel`](vsel.md) — bit-wise select using a mask: `(VC & VB) | (~VC & VA)`. The recommended idiom whenever the "false" path is non-zero. +- [`vcmpequb`](vcmpequb.md) and other compares — natural mask producers. + +## IBM Reference + +- [AIX 7.3 — `vand` (Vector Logical AND)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vand-vector-logical-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Logical Operations](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vandc.md b/migration/project-root/ppc-manual/vmx/vandc.md new file mode 100644 index 0000000..430a38a --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vandc.md @@ -0,0 +1,181 @@ +# `vandc` — Vector Logical AND with Complement + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000444` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vandc` | `vandc` | — | Vector Logical AND with Complement | +| `vandc128` | `vandc128` | — | Vector128 Logical AND with Complement | + +## Syntax + +```asm +vandc [VD], [VA], [VB] +vandc128 [VD], [VA], [VB] +``` + +## Encoding + +### `vandc` — form `VX` + +- **Opcode word:** `0x10000444` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1092` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vandc128` — form `VX128` + +- **Opcode word:** `0x14000250` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `592` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vandc: read; vandc128: read | Source A vector register. | +| `VB` | vandc: read; vandc128: read | Source B vector register. | +| `VD` | vandc: write; vandc128: write | Destination vector register. | + +## Register Effects + +### `vandc` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vandc128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vandc`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vandc"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:436`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L436) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:91`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L91) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:526`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L526) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2217-2225`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2217-L2225) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vandc | PpcOpcode::vandc128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = a[i] & !b[i]; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vandc128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vandc128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:439`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L439) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:91`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L91) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:621`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L621) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2217-2225`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2217-L2225) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vandc | PpcOpcode::vandc128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = a[i] & !b[i]; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Bitwise AND-with-complement of the full 128 bits.** `VD = VA & ~VB`. Lane width is irrelevant — the operation is bit-for-bit. Order matters: `vandc VA, VB` is *not* the same as `vandc VB, VA`. +- **Standard "clear bits in mask" idiom.** Drop bits selected by the mask in `VB`: `vandc VD, VD, vMask`. Equivalent to `VD &= ~vMask`. Cheaper than synthesising the complement first with [`vnor`](vnor.md) and then ANDing. +- **Compare → mask → mask-out idiom.** A compare produces per-lane all-ones; pair with `vandc` to keep only the lanes where the compare was *false*. The complement avoids an extra [`vnor`](vnor.md) or `vxor` with all-ones. +- **No flags, no exceptions, no `VSCR` interaction.** +- **Aliasing legal.** `vandc VD, VD, VD` clears `VD` (`x & ~x = 0`). +- **VMX128 sibling (`vandc128`).** Identical semantics with the extended 128-register encoding; xenia reuses one match arm. + +## Related Instructions + +- [`vand`](vand.md) — the un-complemented sibling. +- [`vor`](vor.md), [`vxor`](vxor.md), [`vnor`](vnor.md) — the rest of the bitwise family. +- [`vsel`](vsel.md) — bitwise select using a third register; useful when the "false" branch is non-zero. +- [`vcmpequb`](vcmpequb.md) and other compares — natural mask producers. + +## IBM Reference + +- [AIX 7.3 — `vandc` (Vector Logical AND with Complement)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vandc-vector-logical-complement-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Logical Operations](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vavgsb.md b/migration/project-root/ppc-manual/vmx/vavgsb.md new file mode 100644 index 0000000..2a4db5a --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vavgsb.md @@ -0,0 +1,130 @@ +# `vavgsb` — Vector Average Signed Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000502` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vavgsb` | `vavgsb` | — | Vector Average Signed Byte | + +## Syntax + +```asm +vavgsb [VD], [VA], [VB] +``` + +## Encoding + +### `vavgsb` — form `VX` + +- **Opcode word:** `0x10000502` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1282` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vavgsb: read | Source A vector register. | +| `VB` | vavgsb: read | Source B vector register. | +| `VD` | vavgsb: write | Destination vector register. | + +## Register Effects + +### `vavgsb` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vavgsb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vavgsb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:443`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L443) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:92`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L92) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:533`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L533) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3410-3417`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3410-L3417) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vavgsb => { + let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i8x16(ctx.vr[instr.rb()]); + let mut r = [0i8; 16]; + for i in 0..16 { r[i] = crate::vmx::avg_i8(a[i], b[i]); } + ctx.vr[instr.rd()] = crate::vmx::from_i8x16(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Sixteen signed-byte rounding averages.** Each `VD[i] = (VA[i] + VB[i] + 1) >> 1`, performed in arithmetic *wider* than 8 bits (so the `+1` cannot overflow). The result is then truncated back to `int8` — saturation never triggers because the average of two `int8` values fits in `int8`. Rounding is "round half up toward +∞". +- **Big-endian byte lanes.** Lane 0 is the most-significant byte after `stvx`. +- **No `VSCR[SAT]` impact.** Mathematical impossibility — `(a + b + 1) / 2` for `a, b ∈ [-128, 127]` always lies in `[-128, 127]`. Xenia's `crate::vmx::avg_i8` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)) widens to `i16` before the add. +- **No XER side effects.** +- **Common usage.** Filtering / decimation passes, motion-compensation half-pel interpolation in older video codecs (the rounding-up bias matches MPEG/H.263 averaging conventions). +- **Aliasing legal.** `vavgsb v3, v3, v4` is a typical lowpass-step idiom. +- **No VMX128 sibling.** + +## Related Instructions + +- [`vavgub`](vavgub.md) — same width, unsigned rounding average. +- [`vavgsh`](vavgsh.md), [`vavgsw`](vavgsw.md) — signed rounding average at half / word width. +- [`vaddubm`](vaddubm.md), [`vaddsbs`](vaddsbs.md) — addition variants without the rounding-divide step. +- [`vsububm`](vsububm.md) — modulo subtract; needed for differential before averaging. + +## IBM Reference + +- [AIX 7.3 — `vavgsb` (Vector Average Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vavgsb-vector-average-signed-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Average Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vavgsh.md b/migration/project-root/ppc-manual/vmx/vavgsh.md new file mode 100644 index 0000000..8bfc6e7 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vavgsh.md @@ -0,0 +1,130 @@ +# `vavgsh` — Vector Average Signed Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000542` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vavgsh` | `vavgsh` | — | Vector Average Signed Half Word | + +## Syntax + +```asm +vavgsh [VD], [VA], [VB] +``` + +## Encoding + +### `vavgsh` — form `VX` + +- **Opcode word:** `0x10000542` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1346` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vavgsh: read | Source A vector register. | +| `VB` | vavgsh: read | Source B vector register. | +| `VD` | vavgsh: write | Destination vector register. | + +## Register Effects + +### `vavgsh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vavgsh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vavgsh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:450`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L450) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:92`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L92) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:535`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L535) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3426-3433`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3426-L3433) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vavgsh => { + let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]); + let mut r = [0i16; 8]; + for i in 0..8 { r[i] = crate::vmx::avg_i16(a[i], b[i]); } + ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Eight signed-half rounding averages.** Each `VD[i] = (VA[i] + VB[i] + 1) >> 1`, computed in 32-bit arithmetic to avoid overflow on the intermediate sum, then truncated back to `int16`. Rounding is half-up toward +∞. +- **Big-endian half lanes.** Lane 0 (`VD[0..1]` after `stvx`) is the most-significant half. +- **No `VSCR[SAT]` impact.** The result is always representable in `int16`. Xenia's `crate::vmx::avg_i16` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)) widens to `i32` before adding. +- **No XER side effects.** +- **Common usage.** Audio sample interpolation, fixed-point Q15 midpoint filters, video upscaling at 16-bit precision. +- **Aliasing legal.** `vavgsh v3, v3, v4` collapses two half-precision streams into one. +- **No VMX128 sibling.** + +## Related Instructions + +- [`vavguh`](vavguh.md) — same width, unsigned rounding average. +- [`vavgsb`](vavgsb.md), [`vavgsw`](vavgsw.md) — signed rounding average at byte / word width. +- [`vadduhm`](vadduhm.md), [`vaddshs`](vaddshs.md) — addition variants without the divide step. +- [`vsubuhm`](vsubuhm.md) — modulo subtract; difference computation before averaging. + +## IBM Reference + +- [AIX 7.3 — `vavgsh` (Vector Average Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vavgsh-vector-average-signed-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Average Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vavgsw.md b/migration/project-root/ppc-manual/vmx/vavgsw.md new file mode 100644 index 0000000..c5b968a --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vavgsw.md @@ -0,0 +1,129 @@ +# `vavgsw` — Vector Average Signed Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000582` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vavgsw` | `vavgsw` | — | Vector Average Signed Word | + +## Syntax + +```asm +vavgsw [VD], [VA], [VB] +``` + +## Encoding + +### `vavgsw` — form `VX` + +- **Opcode word:** `0x10000582` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1410` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vavgsw: read | Source A vector register. | +| `VB` | vavgsw: read | Source B vector register. | +| `VD` | vavgsw: write | Destination vector register. | + +## Register Effects + +### `vavgsw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vavgsw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vavgsw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:457`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L457) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:92`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L92) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:537`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L537) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3442-3449`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3442-L3449) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vavgsw => { + let a = crate::vmx::as_i32x4(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i32x4(ctx.vr[instr.rb()]); + let mut r = [0i32; 4]; + for i in 0..4 { r[i] = crate::vmx::avg_i32(a[i], b[i]); } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Four signed-word rounding averages.** Each `VD[i] = (VA[i] + VB[i] + 1) >> 1`, computed in 64-bit arithmetic to avoid intermediate overflow, then truncated back to `int32`. Rounding is half-up toward +∞. +- **Big-endian word lanes.** Lane 0 (`VD[0..3]` after `stvx`) is the most-significant word. +- **No `VSCR[SAT]` impact.** The mathematical result always fits in `int32`. Xenia's `crate::vmx::avg_i32` widens to `i64` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)). +- **No XER side effects.** +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vavguw`](vavguw.md) — same width, unsigned rounding average. +- [`vavgsb`](vavgsb.md), [`vavgsh`](vavgsh.md) — signed rounding average at byte / half width. +- [`vadduwm`](vadduwm.md), [`vaddsws`](vaddsws.md) — addition variants without the divide step. +- [`vsubuwm`](vsubuwm.md) — modulo subtract; difference computation before averaging. + +## IBM Reference + +- [AIX 7.3 — `vavgsw` (Vector Average Signed Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vavgsw-vector-average-signed-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Average Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vavgub.md b/migration/project-root/ppc-manual/vmx/vavgub.md new file mode 100644 index 0000000..a848a4d --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vavgub.md @@ -0,0 +1,130 @@ +# `vavgub` — Vector Average Unsigned Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000402` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vavgub` | `vavgub` | — | Vector Average Unsigned Byte | + +## Syntax + +```asm +vavgub [VD], [VA], [VB] +``` + +## Encoding + +### `vavgub` — form `VX` + +- **Opcode word:** `0x10000402` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1026` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vavgub: read | Source A vector register. | +| `VB` | vavgub: read | Source B vector register. | +| `VD` | vavgub: write | Destination vector register. | + +## Register Effects + +### `vavgub` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vavgub`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vavgub"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:468`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L468) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:92`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L92) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:520`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L520) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3402-3409`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3402-L3409) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vavgub => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u8; 16]; + for i in 0..16 { r[i] = crate::vmx::avg_u8(a[i], b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Sixteen unsigned-byte rounding averages.** Each `VD[i] = (VA[i] + VB[i] + 1) >> 1`, computed in 16-bit arithmetic so the `+1` cannot overflow, then truncated back to `u8`. Rounding is half-up. +- **Big-endian byte lanes.** Lane 0 is the most-significant byte after `stvx`. +- **No `VSCR[SAT]` impact.** The result always fits in `u8` (the average of two `u8` values is at most `255`). Xenia uses `crate::vmx::avg_u8` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)). +- **Equivalent to `_mm_avg_epu8`** on x86 SSE2 — semantically identical (rounding mode and width match). +- **Common usage.** Pixel-blend `(A + B + 1) / 2`, MPEG/H.264 half-pel motion-compensation averaging, downscale filters, alpha midpoint. +- **Aliasing legal.** `vavgub v3, v3, v4`. +- **No VMX128 sibling.** + +## Related Instructions + +- [`vavgsb`](vavgsb.md) — same width, signed rounding average. +- [`vavguh`](vavguh.md), [`vavguw`](vavguw.md) — unsigned rounding average at half / word width. +- [`vaddubm`](vaddubm.md), [`vaddubs`](vaddubs.md) — addition variants without the divide step. +- [`vsububm`](vsububm.md) — modulo subtract; difference before averaging. + +## IBM Reference + +- [AIX 7.3 — `vavgub` (Vector Average Unsigned Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vavgub-vector-average-unsigned-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Average Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vavguh.md b/migration/project-root/ppc-manual/vmx/vavguh.md new file mode 100644 index 0000000..7a31210 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vavguh.md @@ -0,0 +1,130 @@ +# `vavguh` — Vector Average Unsigned Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000442` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vavguh` | `vavguh` | — | Vector Average Unsigned Half Word | + +## Syntax + +```asm +vavguh [VD], [VA], [VB] +``` + +## Encoding + +### `vavguh` — form `VX` + +- **Opcode word:** `0x10000442` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1090` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vavguh: read | Source A vector register. | +| `VB` | vavguh: read | Source B vector register. | +| `VD` | vavguh: write | Destination vector register. | + +## Register Effects + +### `vavguh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vavguh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vavguh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:475`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L475) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:92`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L92) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:525`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L525) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3418-3425`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3418-L3425) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vavguh => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u16; 8]; + for i in 0..8 { r[i] = crate::vmx::avg_u16(a[i], b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Eight unsigned-half rounding averages.** Each `VD[i] = (VA[i] + VB[i] + 1) >> 1`, computed in 32-bit arithmetic to avoid the intermediate `+1` overflowing, then truncated to `u16`. Rounding is half-up. +- **Big-endian half lanes.** Lane 0 (`VD[0..1]` after `stvx`) is the most-significant half. +- **No `VSCR[SAT]` impact.** The result always fits in `u16`. Xenia uses `crate::vmx::avg_u16` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)). +- **Equivalent to `_mm_avg_epu16`** on x86 SSE2 — same rounding, same width. +- **Common usage.** Higher-precision pixel blending (e.g. RGB565 sums after widening), Q16 unsigned filters. +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vavgsh`](vavgsh.md) — same width, signed rounding average. +- [`vavgub`](vavgub.md), [`vavguw`](vavguw.md) — unsigned rounding average at byte / word width. +- [`vadduhm`](vadduhm.md), [`vadduhs`](vadduhs.md) — addition variants without the divide step. +- [`vsubuhm`](vsubuhm.md) — modulo subtract; difference before averaging. + +## IBM Reference + +- [AIX 7.3 — `vavguh` (Vector Average Unsigned Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vavguh-vector-average-unsigned-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Average Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vavguw.md b/migration/project-root/ppc-manual/vmx/vavguw.md new file mode 100644 index 0000000..c7264f0 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vavguw.md @@ -0,0 +1,130 @@ +# `vavguw` — Vector Average Unsigned Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000482` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vavguw` | `vavguw` | — | Vector Average Unsigned Word | + +## Syntax + +```asm +vavguw [VD], [VA], [VB] +``` + +## Encoding + +### `vavguw` — form `VX` + +- **Opcode word:** `0x10000482` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1154` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vavguw: read | Source A vector register. | +| `VB` | vavguw: read | Source B vector register. | +| `VD` | vavguw: write | Destination vector register. | + +## Register Effects + +### `vavguw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vavguw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vavguw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:482`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L482) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:92`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L92) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:530`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L530) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3434-3441`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3434-L3441) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vavguw => { + let a = ctx.vr[instr.ra()].as_u32x4(); + let b = ctx.vr[instr.rb()].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = crate::vmx::avg_u32(a[i], b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Four unsigned-word rounding averages.** Each `VD[i] = (VA[i] + VB[i] + 1) >> 1`, computed in 64-bit arithmetic to avoid intermediate overflow, then truncated to `u32`. Rounding is half-up. +- **Big-endian word lanes.** Lane 0 (`VD[0..3]` after `stvx`) is the most-significant word. +- **No `VSCR[SAT]` impact.** The result always fits in `u32`. Xenia uses `crate::vmx::avg_u32` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)). +- **No SSE2 direct equivalent.** SSE2 only provides `_mm_avg_epu8` and `_mm_avg_epu16`; on x86 hosts xenia has to widen to 64-bit and do the average manually. +- **Common usage.** Per-tile counters; midpoint of two 32-bit packed values. +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vavgsw`](vavgsw.md) — same width, signed rounding average. +- [`vavgub`](vavgub.md), [`vavguh`](vavguh.md) — unsigned rounding average at byte / half width. +- [`vadduwm`](vadduwm.md), [`vadduws`](vadduws.md) — addition variants without the divide step. +- [`vsubuwm`](vsubuwm.md) — modulo subtract; difference before averaging. + +## IBM Reference + +- [AIX 7.3 — `vavguw` (Vector Average Unsigned Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vavguw-vector-average-unsigned-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Average Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vcfsx.md b/migration/project-root/ppc-manual/vmx/vcfsx.md new file mode 100644 index 0000000..17e5adb --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vcfsx.md @@ -0,0 +1,131 @@ +# `vcfsx` — Vector Convert from Signed Fixed-Point Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000034a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcfs` | `vcfsx` | — | Vector Convert from Signed Fixed-Point Word | + +## Syntax + +```asm +vcfsx [VD], [VB], [UIMM] +``` + +## Encoding + +### `vcfsx` — form `VX` + +- **Opcode word:** `0x1000034a` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `842` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vcfsx: read | Source B vector register. | +| `UIMM` | vcfsx: read | 16-bit unsigned immediate. Zero-extended. | +| `VD` | vcfsx: write | Destination vector register. | + +## Register Effects + +### `vcfsx` + +- **Reads (always):** `VB`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcfsx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcfsx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:500`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L500) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:93`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L93) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:509`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L509) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4306-4313`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4306-L4313) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcfsx => { + let uimm = (instr.raw >> 16) & 0x1F; + let b = crate::vmx::as_i32x4(ctx.vr[instr.rb()]); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = crate::vmx::cvt_i32_to_f32(b[i], uimm); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Convert signed-Q `int32` lane to `binary32`.** For each of the four word lanes, `VD[i] = (float)VB[i] / 2^UIMM`, where `UIMM` is the 5-bit immediate at bits 11..15 of the instruction. UIMM ranges 0..31; UIMM=0 is plain integer-to-float. +- **Big-endian word lanes.** Lane 0 (`VD[0..3]` after `stvx`) is the most-significant word. +- **Use case.** Q-format fixed-point (`Qm.n`) → IEEE float in one instruction. UIMM gives the fractional bit count, so `vcfsx vD, vB, 16` interprets each lane as Q15.16. +- **Inexact rounding.** Values whose magnitude exceeds `2^24` lose mantissa precision (only 24 bits in `binary32`'s significand). The default rounding mode is round-to-nearest-even; VMX has no per-instruction rounding control. +- **`VSCR[NJ]` (flush-denormals)** affects the output if the scaled value is sub-normal. Xenia's `crate::vmx::cvt_i32_to_f32` honours this via the architectural `VSCR[NJ]` snapshot. +- **No `VSCR[SAT]` or XER changes**, no exceptions raised. +- **No VMX128 sibling.** +- **Round-trip caveat.** `vctsxs` (the inverse) saturates instead of wrapping, so a `vcfsx`/`vctsxs` round-trip is *not* identity for values outside the signed-int32 representable range — important for fixed-point interpolation kernels. + +## Related Instructions + +- [`vcfux`](vcfux.md) — same shape, unsigned source. +- [`vctsxs`](vctsxs.md) — inverse: float → signed-Q `int32` with saturation. +- [`vctuxs`](vctuxs.md) — inverse: float → unsigned-Q `uint32` with saturation. +- [`vrfin`](vrfin.md), [`vrfiz`](vrfiz.md) — float-to-integer rounding modes when no Q-format scale is needed. + +## IBM Reference + +- [AIX 7.3 — `vcfsx` (Vector Convert from Signed Fixed-Point Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcfsx-vector-convert-from-signed-fixed-point-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Conversion Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vcfux.md b/migration/project-root/ppc-manual/vmx/vcfux.md new file mode 100644 index 0000000..e98a343 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vcfux.md @@ -0,0 +1,131 @@ +# `vcfux` — Vector Convert from Unsigned Fixed-Point Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000030a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcfu` | `vcfux` | — | Vector Convert from Unsigned Fixed-Point Word | + +## Syntax + +```asm +vcfux [VD], [VB], [UIMM] +``` + +## Encoding + +### `vcfux` — form `VX` + +- **Opcode word:** `0x1000030a` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `778` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vcfux: read | Source B vector register. | +| `UIMM` | vcfux: read | 16-bit unsigned immediate. Zero-extended. | +| `VD` | vcfux: write | Destination vector register. | + +## Register Effects + +### `vcfux` + +- **Reads (always):** `VB`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcfux`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcfux"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:518`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L518) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:93`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L93) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:502`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L502) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4314-4321`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4314-L4321) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcfux => { + let uimm = (instr.raw >> 16) & 0x1F; + let b = ctx.vr[instr.rb()].as_u32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = crate::vmx::cvt_u32_to_f32(b[i], uimm); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Convert unsigned-Q `uint32` lane to `binary32`.** For each of the four word lanes, `VD[i] = (float)VB[i] / 2^UIMM`. The 5-bit `UIMM` (bits 11..15) gives the Q-format fractional shift, in `0..31`. +- **Big-endian word lanes.** Lane 0 (`VD[0..3]` after `stvx`) is the most-significant word. +- **Use case.** Unsigned Q-format fixed-point → IEEE float; common for normalised colour channels (`vcfux vD, vColor, 8` rescales `0..255` to `0..0.996`). +- **Inexact rounding.** Magnitudes above `2^24` lose precision. Default rounding is round-to-nearest-even; VMX has no per-instruction rounding control. +- **`VSCR[NJ]`** affects sub-normal outputs. Xenia's `crate::vmx::cvt_u32_to_f32` honours the architectural snapshot. +- **No `VSCR[SAT]`, no XER changes, no exceptions.** +- **No VMX128 sibling.** +- **Round-trip caveat.** Pair with [`vctuxs`](vctuxs.md) for the inverse — but the inverse saturates rather than wraps, so floats above `2^32 − 1` clamp to `0xFFFFFFFF` and stick `VSCR[SAT]`. + +## Related Instructions + +- [`vcfsx`](vcfsx.md) — same shape, signed source. +- [`vctuxs`](vctuxs.md) — inverse: float → unsigned-Q `uint32` with saturation. +- [`vctsxs`](vctsxs.md) — inverse: float → signed-Q `int32` with saturation. +- [`vrfin`](vrfin.md), [`vrfiz`](vrfiz.md) — float-to-integer rounding modes for the un-scaled case. + +## IBM Reference + +- [AIX 7.3 — `vcfux` (Vector Convert from Unsigned Fixed-Point Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcfux-vector-convert-from-unsigned-fixed-point-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Conversion Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vcmpbfp.md b/migration/project-root/ppc-manual/vmx/vcmpbfp.md new file mode 100644 index 0000000..59c2dc4 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vcmpbfp.md @@ -0,0 +1,220 @@ +# `vcmpbfp` — Vector Compare Bounds Floating Point + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x100003c6` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcmpbfp` | `vcmpbfp` | — | Vector Compare Bounds Floating Point | +| `vcmpbfp.` | `vcmpbfp` | Rc=1 | Vector Compare Bounds Floating Point | +| `vcmpbfp128` | `vcmpbfp128` | — | Vector128 Compare Bounds Floating Point | +| `vcmpbfp128.` | `vcmpbfp128` | Rc=1 | Vector128 Compare Bounds Floating Point | + +## Syntax + +```asm +vcmpbfp[Rc] [VD], [VA], [VB] +vcmpbfp128[Rc] [VD], [VA], [VB] +``` + +## Encoding + +### `vcmpbfp` — form `VC` + +- **Opcode word:** `0x100003c6` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `966` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21 | `Rc` | record-form flag (updates CR6) | +| 22–31 | `XO` | extended opcode (10 bits) | + +### `vcmpbfp128` — form `VX128_R` + +- **Opcode word:** `0x18000180` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `384` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22–25 | `XO` | extended opcode (compare) | +| 26 | `VA128h` | source A middle bit | +| 27 | `Rc` | record-form flag (updates CR6) | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vcmpbfp: read; vcmpbfp128: read | Source A vector register. | +| `VB` | vcmpbfp: read; vcmpbfp128: read | Source B vector register. | +| `VD` | vcmpbfp: write; vcmpbfp128: write | Destination vector register. | +| `CR` | vcmpbfp: write (conditional); vcmpbfp128: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `vcmpbfp` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +### `vcmpbfp128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `vcmpbfp`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. +- `vcmpbfp128`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcmpbfp`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpbfp"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:583`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L583) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:94`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L94) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:569`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L569) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3822-3847`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3822-L3847) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpbfp | PpcOpcode::vcmpbfp128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vcmpbfp128); + let (ra, rb, rd) = if is_128 { + (instr.va128(), instr.vb128(), instr.vd128()) + } else { + (instr.ra(), instr.rb(), instr.rd()) + }; + let a = ctx.vr[ra].as_f32x4(); + let b = ctx.vr[rb].as_f32x4(); + let mut r = [0u32; 4]; + let mut any_out = false; + for i in 0..4 { + let mut lane: u32 = 0; + if a[i].is_nan() || b[i].is_nan() || a[i] > b[i] { lane |= 0x8000_0000; any_out = true; } + if a[i].is_nan() || b[i].is_nan() || a[i] < -b[i] { lane |= 0x4000_0000; any_out = true; } + r[i] = lane; + } + let rc = if is_128 { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() }; + if rc { + ctx.cr[6] = crate::context::CrField { + lt: false, gt: false, eq: !any_out, so: false, + }; + } + ctx.vr[rd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vcmpbfp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpbfp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:586`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L586) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:94`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L94) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:684`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L684) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3822-3847`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3822-L3847) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpbfp | PpcOpcode::vcmpbfp128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vcmpbfp128); + let (ra, rb, rd) = if is_128 { + (instr.va128(), instr.vb128(), instr.vd128()) + } else { + (instr.ra(), instr.rb(), instr.rd()) + }; + let a = ctx.vr[ra].as_f32x4(); + let b = ctx.vr[rb].as_f32x4(); + let mut r = [0u32; 4]; + let mut any_out = false; + for i in 0..4 { + let mut lane: u32 = 0; + if a[i].is_nan() || b[i].is_nan() || a[i] > b[i] { lane |= 0x8000_0000; any_out = true; } + if a[i].is_nan() || b[i].is_nan() || a[i] < -b[i] { lane |= 0x4000_0000; any_out = true; } + r[i] = lane; + } + let rc = if is_128 { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() }; + if rc { + ctx.cr[6] = crate::context::CrField { + lt: false, gt: false, eq: !any_out, so: false, + }; + } + ctx.vr[rd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **"Bounds" compare, not equality.** Per word lane, sets two output bits: bit 0 (mask `0x80000000`) if `VA[i] > VB[i]` (out-of-range high) and bit 1 (mask `0x40000000`) if `VA[i] < -VB[i]` (out-of-range low). Bits 2..31 of each lane are zero. +- **NaN inputs are out-of-range in *both* directions.** Xenia sets both `0x80000000` and `0x40000000` if either input is NaN, matching the IBM manual: NaN is treated as "violates both bounds". +- **CR6 update when `Rc=1`.** CR6 is set as `[lt=0, gt=0, eq=(no-lane-out-of-range), so=0]` — i.e. only the `eq` bit signifies "all four lanes were within `±VB`". Useful as `bc 12,26` (branch if all in-range) for SIMD clamping loops. +- **No `VSCR[SAT]`, no XER changes, no exceptions.** +- **The convention is "is point inside box?"** — not a per-lane compare like the other `vcmp*` ops. Output is a flag-pair, not a boolean mask, so it does **not** plug directly into [`vsel`](vsel.md). To get a boolean, OR the two bits down with [`vor`](vor.md) and a shift. +- **VMX128 sibling (`vcmpbfp128`).** Identical semantics; the `Rc` bit lives at bit 27 of the VX128_R encoding. +- **Lane width is fixed at word.** Bounds check is single-precision float only; there is no `vcmpb*` for half / byte / int. + +## Related Instructions + +- [`vcmpeqfp`](vcmpeqfp.md) — element-wise `==` for floats. +- [`vcmpgtfp`](vcmpgtfp.md), [`vcmpgefp`](vcmpgefp.md) — element-wise `>` and `>=` for floats. +- [`vsel`](vsel.md), [`vand`](vand.md), [`vor`](vor.md) — combine the two bits per lane into a boolean mask if needed. +- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — clamp values to a range without testing. + +## IBM Reference + +- [AIX 7.3 — `vcmpbfp` (Vector Compare Bounds Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpbfp-vector-compare-bounds-floating-point-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vcmpeqfp.md b/migration/project-root/ppc-manual/vmx/vcmpeqfp.md new file mode 100644 index 0000000..7c69105 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vcmpeqfp.md @@ -0,0 +1,192 @@ +# `vcmpeqfp` — Vector Compare Equal-to Floating Point + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x100000c6` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcmpeqfp` | `vcmpeqfp` | — | Vector Compare Equal-to Floating Point | +| `vcmpeqfp.` | `vcmpeqfp` | Rc=1 | Vector Compare Equal-to Floating Point | +| `vcmpeqfp128` | `vcmpeqfp128` | — | Vector128 Compare Equal-to Floating Point | +| `vcmpeqfp128.` | `vcmpeqfp128` | Rc=1 | Vector128 Compare Equal-to Floating Point | + +## Syntax + +```asm +vcmpeqfp[Rc] [VD], [VA], [VB] +vcmpeqfp128[Rc] [VD], [VA], [VB] +``` + +## Encoding + +### `vcmpeqfp` — form `VC` + +- **Opcode word:** `0x100000c6` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `198` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21 | `Rc` | record-form flag (updates CR6) | +| 22–31 | `XO` | extended opcode (10 bits) | + +### `vcmpeqfp128` — form `VX128_R` + +- **Opcode word:** `0x18000000` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `0` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22–25 | `XO` | extended opcode (compare) | +| 26 | `VA128h` | source A middle bit | +| 27 | `Rc` | record-form flag (updates CR6) | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vcmpeqfp: read; vcmpeqfp128: read | Source A vector register. | +| `VB` | vcmpeqfp: read; vcmpeqfp128: read | Source B vector register. | +| `VD` | vcmpeqfp: write; vcmpeqfp128: write | Destination vector register. | +| `CR` | vcmpeqfp: write (conditional); vcmpeqfp128: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `vcmpeqfp` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +### `vcmpeqfp128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `vcmpeqfp`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. +- `vcmpeqfp128`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcmpeqfp`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpeqfp"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:623`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L623) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:94`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L94) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:560`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L560) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2173-2183`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2173-L2183) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpeqfp | PpcOpcode::vcmpeqfp128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_f32x4(); + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = if a[i] == b[i] { 0xFFFF_FFFF } else { 0 }; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + let rc = if matches!(instr.opcode, PpcOpcode::vcmpeqfp128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() }; + if rc { update_cr6_from_vmask(&r, ctx); } + ctx.pc += 4; + } +``` +
+ +**`vcmpeqfp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpeqfp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:627`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L627) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:94`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L94) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:681`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L681) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2173-2183`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2173-L2183) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpeqfp | PpcOpcode::vcmpeqfp128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_f32x4(); + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = if a[i] == b[i] { 0xFFFF_FFFF } else { 0 }; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + let rc = if matches!(instr.opcode, PpcOpcode::vcmpeqfp128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() }; + if rc { update_cr6_from_vmask(&r, ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane mask: all-ones / all-zero.** For each of the four word lanes, `VD[i] = (VA[i] == VB[i]) ? 0xFFFFFFFF : 0`. Lane 0 is the most-significant word. +- **NaN handling is IEEE-754: never equal.** `NaN == anything` is false (including `NaN == NaN`), so the lane stays zero. This is the standard quiet-compare behaviour — no exception, no sticky flag. +- **Sign of zero ignored.** `+0 == -0` per IEEE-754, so the lane is set to all-ones. +- **`VSCR[NJ]` — denormals.** With `NJ = 1` (Xenon default), denormal inputs are flushed to `±0` *before* the comparison; `±denormal == ±0` then compares as true. This is one of the few VMX float ops where the NJ flag changes program-visible mask values. +- **CR6 update when `Rc=1`** (`vcmpeqfp.`). CR6 is `{any-true, 0, all-true, 0}` = `[lt = all-true, gt = 0, eq = all-false, so = 0]` in the standard mapping; xenia's `update_cr6_from_vmask` ([`crates/xenia-cpu/src/interpreter.rs`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs)) handles the bit packing. Use `bc 12,24` for "all-equal" branches and `bc 4,26` for "any-equal". +- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) to pick between two source vectors per lane. Or combine masks with [`vand`](vand.md) / [`vor`](vor.md) / [`vandc`](vandc.md) to express conjunctions. +- **No `VSCR[SAT]`, no XER changes, no traps** — even on signaling NaNs (Altivec's quiet-compare semantics). +- **VMX128 sibling (`vcmpeqfp128`).** Identical semantics with the extended 128-register encoding; xenia routes both opcodes to one match arm via `vmx_reg_triple`. + +## Related Instructions + +- [`vcmpgtfp`](vcmpgtfp.md), [`vcmpgefp`](vcmpgefp.md) — element-wise `>` and `>=` for floats. +- [`vcmpbfp`](vcmpbfp.md) — IEEE bounds check (`±VB`). +- [`vcmpequw`](vcmpequw.md) — same shape, integer compare. +- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vxor`](vxor.md) — mask consumers. +- [`vminfp`](vminfp.md), [`vmaxfp`](vmaxfp.md) — direct min / max without comparing. + +## IBM Reference + +- [AIX 7.3 — `vcmpeqfp` (Vector Compare Equal-to Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpeqfp-vector-compare-equal-floating-point-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vcmpequb.md b/migration/project-root/ppc-manual/vmx/vcmpequb.md new file mode 100644 index 0000000..f7f83a0 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vcmpequb.md @@ -0,0 +1,140 @@ +# `vcmpequb` — Vector Compare Equal-to Unsigned Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000006` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcmpequb` | `vcmpequb` | — | Vector Compare Equal-to Unsigned Byte | +| `vcmpequb.` | `vcmpequb` | Rc=1 | Vector Compare Equal-to Unsigned Byte | + +## Syntax + +```asm +vcmpequb[Rc] [VD], [VA], [VB] +``` + +## Encoding + +### `vcmpequb` — form `VC` + +- **Opcode word:** `0x10000006` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `6` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21 | `Rc` | record-form flag (updates CR6) | +| 22–31 | `XO` | extended opcode (10 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vcmpequb: read | Source A vector register. | +| `VB` | vcmpequb: read | Source B vector register. | +| `VD` | vcmpequb: write | Destination vector register. | +| `CR` | vcmpequb: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `vcmpequb` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `vcmpequb`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcmpequb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpequb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:719`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L719) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:95`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L95) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:557`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L557) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3723-3735`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3723-L3735) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpequb => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u8; 16]; + for i in 0..16 { r[i] = if a[i] == b[i] { 0xFF } else { 0 }; } + let v = xenia_types::Vec128::from_bytes(r); + if instr.vc_rc_bit() { + let (t, f) = crate::vmx::cr6_flags_from_mask(v); + ctx.cr[6] = crate::context::CrField { lt: t, gt: false, eq: f, so: false }; + } + ctx.vr[instr.rd()] = v; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-byte mask: all-ones / all-zero.** Sixteen byte lanes; `VD[i] = (VA[i] == VB[i]) ? 0xFF : 0x00`. Lane 0 is the most-significant byte after `stvx`. +- **Sign-agnostic.** Equality compare is identical for signed and unsigned bytes; there is no separate `vcmpeqsb`. +- **CR6 update when `Rc=1`** (`vcmpequb.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]` — built by xenia's `crate::vmx::cr6_flags_from_mask` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)). Standard SIMD-search idiom: `vcmpequb. vMask, vData, vNeedle` then `bc 12,26` to branch when *no* lane matched. +- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) to pick per-byte between two source vectors. +- **Common usage.** `memchr` / `strlen` / character classification — compare against a broadcast byte (often built with [`vspltisb`](vspltisb.md)) and inspect CR6 for early-out. +- **No `VSCR` interaction, no XER, no traps.** +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vcmpequh`](vcmpequh.md), [`vcmpequw`](vcmpequw.md) — equality compare at half / word width. +- [`vcmpgtub`](vcmpgtub.md), [`vcmpgtsb`](vcmpgtsb.md) — `>` at byte width, unsigned / signed. +- [`vsel`](vsel.md) — primary mask consumer. +- [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask combinators. +- [`vspltisb`](vspltisb.md), [`vspltb`](vspltb.md) — broadcast sources for needle patterns. + +## IBM Reference + +- [AIX 7.3 — `vcmpequb` (Vector Compare Equal-to Unsigned Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpequb-vector-compare-equal-unsigned-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vcmpequh.md b/migration/project-root/ppc-manual/vmx/vcmpequh.md new file mode 100644 index 0000000..ad69a97 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vcmpequh.md @@ -0,0 +1,139 @@ +# `vcmpequh` — Vector Compare Equal-to Unsigned Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000046` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcmpequh` | `vcmpequh` | — | Vector Compare Equal-to Unsigned Half Word | +| `vcmpequh.` | `vcmpequh` | Rc=1 | Vector Compare Equal-to Unsigned Half Word | + +## Syntax + +```asm +vcmpequh[Rc] [VD], [VA], [VB] +``` + +## Encoding + +### `vcmpequh` — form `VC` + +- **Opcode word:** `0x10000046` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `70` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21 | `Rc` | record-form flag (updates CR6) | +| 22–31 | `XO` | extended opcode (10 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vcmpequh: read | Source A vector register. | +| `VB` | vcmpequh: read | Source B vector register. | +| `VD` | vcmpequh: write | Destination vector register. | +| `CR` | vcmpequh: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `vcmpequh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `vcmpequh`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcmpequh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpequh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:723`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L723) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:95`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L95) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:558`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L558) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3736-3748`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3736-L3748) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpequh => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u16; 8]; + for i in 0..8 { r[i] = if a[i] == b[i] { 0xFFFF } else { 0 }; } + let v = xenia_types::Vec128::from_u16x8_array(r); + if instr.vc_rc_bit() { + let (t, f) = crate::vmx::cr6_flags_from_mask(v); + ctx.cr[6] = crate::context::CrField { lt: t, gt: false, eq: f, so: false }; + } + ctx.vr[instr.rd()] = v; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-half mask: all-ones / all-zero.** Eight half-word lanes; `VD[i] = (VA[i] == VB[i]) ? 0xFFFF : 0x0000`. Lane 0 (`VD[0..1]` after `stvx`) is the most-significant half. +- **Sign-agnostic.** Equality is bit-identical for signed and unsigned halves; there is no `vcmpeqsh`. +- **CR6 update when `Rc=1`** (`vcmpequh.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`. Use `bc 12,24` for "all-equal" branches and `bc 12,26` for "no-equal". +- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) per half-word. +- **Common usage.** UTF-16 character classification, audio-sample needle search, indexed-mesh deduplication. +- **No `VSCR` interaction, no XER, no traps.** +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vcmpequb`](vcmpequb.md), [`vcmpequw`](vcmpequw.md) — equality compare at byte / word width. +- [`vcmpgtuh`](vcmpgtuh.md), [`vcmpgtsh`](vcmpgtsh.md) — `>` at half width, unsigned / signed. +- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators. +- [`vspltish`](vspltish.md), [`vsplth`](vsplth.md) — broadcast sources for needle patterns. + +## IBM Reference + +- [AIX 7.3 — `vcmpequh` (Vector Compare Equal-to Unsigned Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpequh-vector-compare-equal-unsigned-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vcmpequw.md b/migration/project-root/ppc-manual/vmx/vcmpequw.md new file mode 100644 index 0000000..6c46fd3 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vcmpequw.md @@ -0,0 +1,192 @@ +# `vcmpequw` — Vector Compare Equal-to Unsigned Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000086` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcmpequw` | `vcmpequw` | — | Vector Compare Equal-to Unsigned Word | +| `vcmpequw.` | `vcmpequw` | Rc=1 | Vector Compare Equal-to Unsigned Word | +| `vcmpequw128` | `vcmpequw128` | — | Vector128 Compare Equal-to Unsigned Word | +| `vcmpequw128.` | `vcmpequw128` | Rc=1 | Vector128 Compare Equal-to Unsigned Word | + +## Syntax + +```asm +vcmpequw[Rc] [VD], [VA], [VB] +vcmpequw128[Rc] [VD], [VA], [VB] +``` + +## Encoding + +### `vcmpequw` — form `VC` + +- **Opcode word:** `0x10000086` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `134` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21 | `Rc` | record-form flag (updates CR6) | +| 22–31 | `XO` | extended opcode (10 bits) | + +### `vcmpequw128` — form `VX128_R` + +- **Opcode word:** `0x18000200` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `512` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22–25 | `XO` | extended opcode (compare) | +| 26 | `VA128h` | source A middle bit | +| 27 | `Rc` | record-form flag (updates CR6) | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vcmpequw: read; vcmpequw128: read | Source A vector register. | +| `VB` | vcmpequw: read; vcmpequw128: read | Source B vector register. | +| `VD` | vcmpequw: write; vcmpequw128: write | Destination vector register. | +| `CR` | vcmpequw: write (conditional); vcmpequw128: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `vcmpequw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +### `vcmpequw128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `vcmpequw`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. +- `vcmpequw128`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcmpequw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpequw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:727`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L727) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:95`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L95) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:559`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L559) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2542-2552`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2542-L2552) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpequw | PpcOpcode::vcmpequw128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = if a[i] == b[i] { 0xFFFF_FFFF } else { 0 }; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + let rc = if matches!(instr.opcode, PpcOpcode::vcmpequw128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() }; + if rc { update_cr6_from_vmask(&r, ctx); } + ctx.pc += 4; + } +``` +
+ +**`vcmpequw128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpequw128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:731`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L731) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:95`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L95) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:685`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L685) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2542-2552`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2542-L2552) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpequw | PpcOpcode::vcmpequw128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = if a[i] == b[i] { 0xFFFF_FFFF } else { 0 }; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + let rc = if matches!(instr.opcode, PpcOpcode::vcmpequw128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() }; + if rc { update_cr6_from_vmask(&r, ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-word mask: all-ones / all-zero.** Four word lanes; `VD[i] = (VA[i] == VB[i]) ? 0xFFFFFFFF : 0`. Lane 0 (`VD[0..3]` after `stvx`) is the most-significant word. +- **Sign-agnostic.** Equality is bit-identical for signed and unsigned words; there is no `vcmpeqsw`. +- **CR6 update when `Rc=1`** (`vcmpequw.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`. Classic "did all four 32-bit hash buckets match?" early-out pattern. +- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) per word. +- **Common usage.** Hashtable probe matching, packed-RGBA pixel comparisons, packed-int handle equality. +- **No `VSCR` interaction, no XER, no traps.** +- **Aliasing legal.** +- **VMX128 sibling (`vcmpequw128`).** Identical semantics with the extended encoding; xenia routes both to one match arm via `vmx_reg_triple`. + +## Related Instructions + +- [`vcmpequb`](vcmpequb.md), [`vcmpequh`](vcmpequh.md) — equality compare at byte / half width. +- [`vcmpgtuw`](vcmpgtuw.md), [`vcmpgtsw`](vcmpgtsw.md) — `>` at word width, unsigned / signed. +- [`vcmpeqfp`](vcmpeqfp.md) — same shape, IEEE-754 single-precision equality. +- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators. +- [`vspltisw`](vspltisw.md), [`vspltw`](vspltw.md) — broadcast sources for needle patterns. + +## IBM Reference + +- [AIX 7.3 — `vcmpequw` (Vector Compare Equal-to Unsigned Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpequw-vector-compare-equal-unsigned-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vcmpgefp.md b/migration/project-root/ppc-manual/vmx/vcmpgefp.md new file mode 100644 index 0000000..d53b8a6 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vcmpgefp.md @@ -0,0 +1,192 @@ +# `vcmpgefp` — Vector Compare Greater-Than-or-Equal-to Floating Point + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x100001c6` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcmpgefp` | `vcmpgefp` | — | Vector Compare Greater-Than-or-Equal-to Floating Point | +| `vcmpgefp.` | `vcmpgefp` | Rc=1 | Vector Compare Greater-Than-or-Equal-to Floating Point | +| `vcmpgefp128` | `vcmpgefp128` | — | Vector128 Compare Greater-Than-or-Equal-to Floating Point | +| `vcmpgefp128.` | `vcmpgefp128` | Rc=1 | Vector128 Compare Greater-Than-or-Equal-to Floating Point | + +## Syntax + +```asm +vcmpgefp[Rc] [VD], [VA], [VB] +vcmpgefp128[Rc] [VD], [VA], [VB] +``` + +## Encoding + +### `vcmpgefp` — form `VC` + +- **Opcode word:** `0x100001c6` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `454` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21 | `Rc` | record-form flag (updates CR6) | +| 22–31 | `XO` | extended opcode (10 bits) | + +### `vcmpgefp128` — form `VX128_R` + +- **Opcode word:** `0x18000080` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `128` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22–25 | `XO` | extended opcode (compare) | +| 26 | `VA128h` | source A middle bit | +| 27 | `Rc` | record-form flag (updates CR6) | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vcmpgefp: read; vcmpgefp128: read | Source A vector register. | +| `VB` | vcmpgefp: read; vcmpgefp128: read | Source B vector register. | +| `VD` | vcmpgefp: write; vcmpgefp128: write | Destination vector register. | +| `CR` | vcmpgefp: write (conditional); vcmpgefp128: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `vcmpgefp` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +### `vcmpgefp128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `vcmpgefp`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. +- `vcmpgefp128`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcmpgefp`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgefp"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:631`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L631) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:96`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L96) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:561`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L561) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2184-2194`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2184-L2194) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpgefp | PpcOpcode::vcmpgefp128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_f32x4(); + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = if a[i] >= b[i] { 0xFFFF_FFFF } else { 0 }; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + let rc = if matches!(instr.opcode, PpcOpcode::vcmpgefp128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() }; + if rc { update_cr6_from_vmask(&r, ctx); } + ctx.pc += 4; + } +``` +
+ +**`vcmpgefp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgefp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:635`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L635) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:96`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L96) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:682`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L682) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2184-2194`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2184-L2194) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpgefp | PpcOpcode::vcmpgefp128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_f32x4(); + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = if a[i] >= b[i] { 0xFFFF_FFFF } else { 0 }; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + let rc = if matches!(instr.opcode, PpcOpcode::vcmpgefp128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() }; + if rc { update_cr6_from_vmask(&r, ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane mask: all-ones / all-zero.** Four word lanes; `VD[i] = (VA[i] >= VB[i]) ? 0xFFFFFFFF : 0`. +- **NaN handling: false.** Any NaN input makes the comparison false (lane stays zero) — matches IEEE-754 quiet-compare semantics. There is no exception, no sticky flag. +- **`+0 >= -0` is true.** Zero signs are ignored. +- **`VSCR[NJ]` denormals.** With `NJ = 1` (Xenon default), denormal inputs are flushed to zero before the compare; this can flip a comparison's outcome relative to strict IEEE. +- **CR6 update when `Rc=1`** (`vcmpgefp.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`. `bc 12,24` branches on "all four lanes ≥". +- **Compose with `vsel`.** The mask drives [`vsel`](vsel.md) for per-lane selection. +- **No `VSCR[SAT]`, no XER changes, no traps.** +- **VMX128 sibling (`vcmpgefp128`).** Identical semantics with the extended encoding. + +## Related Instructions + +- [`vcmpgtfp`](vcmpgtfp.md) — strict `>` for floats. +- [`vcmpeqfp`](vcmpeqfp.md) — equality for floats. +- [`vcmpbfp`](vcmpbfp.md) — bounds check `|VA| <= |VB|`. +- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators. +- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — direct max / min when the mask isn't needed elsewhere. + +## IBM Reference + +- [AIX 7.3 — `vcmpgefp` (Vector Compare Greater-Than-or-Equal-to Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgefp-vector-compare-greater-than-equal-floating-point-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vcmpgtfp.md b/migration/project-root/ppc-manual/vmx/vcmpgtfp.md new file mode 100644 index 0000000..f327d05 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vcmpgtfp.md @@ -0,0 +1,192 @@ +# `vcmpgtfp` — Vector Compare Greater-Than Floating Point + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x100002c6` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcmpgtfp` | `vcmpgtfp` | — | Vector Compare Greater-Than Floating Point | +| `vcmpgtfp.` | `vcmpgtfp` | Rc=1 | Vector Compare Greater-Than Floating Point | +| `vcmpgtfp128` | `vcmpgtfp128` | — | Vector128 Compare Greater-Than Floating-Point | +| `vcmpgtfp128.` | `vcmpgtfp128` | Rc=1 | Vector128 Compare Greater-Than Floating-Point | + +## Syntax + +```asm +vcmpgtfp[Rc] [VD], [VA], [VB] +vcmpgtfp128[Rc] [VD], [VA], [VB] +``` + +## Encoding + +### `vcmpgtfp` — form `VC` + +- **Opcode word:** `0x100002c6` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `710` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21 | `Rc` | record-form flag (updates CR6) | +| 22–31 | `XO` | extended opcode (10 bits) | + +### `vcmpgtfp128` — form `VX128_R` + +- **Opcode word:** `0x18000100` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `256` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22–25 | `XO` | extended opcode (compare) | +| 26 | `VA128h` | source A middle bit | +| 27 | `Rc` | record-form flag (updates CR6) | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vcmpgtfp: read; vcmpgtfp128: read | Source A vector register. | +| `VB` | vcmpgtfp: read; vcmpgtfp128: read | Source B vector register. | +| `VD` | vcmpgtfp: write; vcmpgtfp128: write | Destination vector register. | +| `CR` | vcmpgtfp: write (conditional); vcmpgtfp128: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `vcmpgtfp` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +### `vcmpgtfp128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `vcmpgtfp`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. +- `vcmpgtfp128`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcmpgtfp`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtfp"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:639`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L639) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:96`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L96) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:565`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L565) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2195-2205`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2195-L2205) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpgtfp | PpcOpcode::vcmpgtfp128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_f32x4(); + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = if a[i] > b[i] { 0xFFFF_FFFF } else { 0 }; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + let rc = if matches!(instr.opcode, PpcOpcode::vcmpgtfp128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() }; + if rc { update_cr6_from_vmask(&r, ctx); } + ctx.pc += 4; + } +``` +
+ +**`vcmpgtfp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtfp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:643`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L643) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:96`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L96) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:683`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L683) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2195-2205`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2195-L2205) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpgtfp | PpcOpcode::vcmpgtfp128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_f32x4(); + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = if a[i] > b[i] { 0xFFFF_FFFF } else { 0 }; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + let rc = if matches!(instr.opcode, PpcOpcode::vcmpgtfp128) { instr.vx128r_rc_bit() } else { instr.vc_rc_bit() }; + if rc { update_cr6_from_vmask(&r, ctx); } + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane mask: all-ones / all-zero.** Four word lanes; `VD[i] = (VA[i] > VB[i]) ? 0xFFFFFFFF : 0`. +- **NaN handling: false.** Any NaN input produces a false lane (no sticky flag, no exception) — matches IEEE-754 quiet-compare. +- **`+0 > -0` is false.** Zero signs ignored. +- **`VSCR[NJ]` denormals.** With `NJ = 1`, denormal inputs flush to zero before the compare. +- **CR6 update when `Rc=1`** (`vcmpgtfp.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`. Use `bc 12,24` for "all-greater" branches and `bc 12,26` for "no-lane-greater". +- **Compose with `vsel`.** Mask plus [`vsel`](vsel.md) implements per-lane `if (a > b) x else y`. +- **No `VSCR[SAT]`, no XER changes, no traps.** +- **VMX128 sibling (`vcmpgtfp128`).** Identical semantics with the extended encoding. + +## Related Instructions + +- [`vcmpgefp`](vcmpgefp.md) — `>=` for floats. +- [`vcmpeqfp`](vcmpeqfp.md) — equality for floats. +- [`vcmpbfp`](vcmpbfp.md) — bounds check (`|VA| <= |VB|`). +- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators. +- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — direct max / min. + +## IBM Reference + +- [AIX 7.3 — `vcmpgtfp` (Vector Compare Greater-Than Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgtfp-vector-compare-greater-than-floating-point-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vcmpgtsb.md b/migration/project-root/ppc-manual/vmx/vcmpgtsb.md new file mode 100644 index 0000000..82784a4 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vcmpgtsb.md @@ -0,0 +1,140 @@ +# `vcmpgtsb` — Vector Compare Greater-Than Signed Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000306` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcmpgtsb` | `vcmpgtsb` | — | Vector Compare Greater-Than Signed Byte | +| `vcmpgtsb.` | `vcmpgtsb` | Rc=1 | Vector Compare Greater-Than Signed Byte | + +## Syntax + +```asm +vcmpgtsb[Rc] [VD], [VA], [VB] +``` + +## Encoding + +### `vcmpgtsb` — form `VC` + +- **Opcode word:** `0x10000306` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `774` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21 | `Rc` | record-form flag (updates CR6) | +| 22–31 | `XO` | extended opcode (10 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vcmpgtsb: read | Source A vector register. | +| `VB` | vcmpgtsb: read | Source B vector register. | +| `VD` | vcmpgtsb: write | Destination vector register. | +| `CR` | vcmpgtsb: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `vcmpgtsb` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `vcmpgtsb`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcmpgtsb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtsb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:735`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L735) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:97`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L97) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:566`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L566) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3762-3774`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3762-L3774) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpgtsb => { + let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i8x16(ctx.vr[instr.rb()]); + let mut r = [0u8; 16]; + for i in 0..16 { r[i] = if a[i] > b[i] { 0xFF } else { 0 }; } + let v = xenia_types::Vec128::from_bytes(r); + if instr.vc_rc_bit() { + let (t, f) = crate::vmx::cr6_flags_from_mask(v); + ctx.cr[6] = crate::context::CrField { lt: t, gt: false, eq: f, so: false }; + } + ctx.vr[instr.rd()] = v; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-byte mask: all-ones / all-zero.** Sixteen byte lanes; `VD[i] = (int8(VA[i]) > int8(VB[i])) ? 0xFF : 0x00`. Lane 0 is the most-significant byte after `stvx`. +- **Sign matters.** Identical bit patterns to [`vcmpgtub`](vcmpgtub.md) compare differently because of the signed interpretation: e.g. `0xFF > 0x01` is `true` unsigned but `false` signed (`-1 > 1`). +- **CR6 update when `Rc=1`** (`vcmpgtsb.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]` — built by xenia's `crate::vmx::cr6_flags_from_mask`. +- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) to select per byte. +- **Common usage.** Signed-byte audio thresholding, signed-difference sign extraction (`vsubsbs` then `vcmpgtsb`). +- **No `VSCR` interaction, no XER, no traps.** +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vcmpgtub`](vcmpgtub.md) — same width, unsigned `>`. +- [`vcmpequb`](vcmpequb.md) — equality at byte width. +- [`vcmpgtsh`](vcmpgtsh.md), [`vcmpgtsw`](vcmpgtsw.md) — signed `>` at half / word width. +- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators. +- [`vmaxsb`](vmaxsb.md), [`vminsb`](vminsb.md) — direct signed max / min. + +## IBM Reference + +- [AIX 7.3 — `vcmpgtsb` (Vector Compare Greater-Than Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgtsb-vector-compare-greater-than-signed-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vcmpgtsh.md b/migration/project-root/ppc-manual/vmx/vcmpgtsh.md new file mode 100644 index 0000000..4f805a0 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vcmpgtsh.md @@ -0,0 +1,140 @@ +# `vcmpgtsh` — Vector Compare Greater-Than Signed Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000346` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcmpgtsh` | `vcmpgtsh` | — | Vector Compare Greater-Than Signed Half Word | +| `vcmpgtsh.` | `vcmpgtsh` | Rc=1 | Vector Compare Greater-Than Signed Half Word | + +## Syntax + +```asm +vcmpgtsh[Rc] [VD], [VA], [VB] +``` + +## Encoding + +### `vcmpgtsh` — form `VC` + +- **Opcode word:** `0x10000346` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `838` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21 | `Rc` | record-form flag (updates CR6) | +| 22–31 | `XO` | extended opcode (10 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vcmpgtsh: read | Source A vector register. | +| `VB` | vcmpgtsh: read | Source B vector register. | +| `VD` | vcmpgtsh: write | Destination vector register. | +| `CR` | vcmpgtsh: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `vcmpgtsh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `vcmpgtsh`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcmpgtsh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtsh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:739`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L739) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:97`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L97) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:567`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L567) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3788-3800`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3788-L3800) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpgtsh => { + let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]); + let mut r = [0u16; 8]; + for i in 0..8 { r[i] = if a[i] > b[i] { 0xFFFF } else { 0 }; } + let v = xenia_types::Vec128::from_u16x8_array(r); + if instr.vc_rc_bit() { + let (t, f) = crate::vmx::cr6_flags_from_mask(v); + ctx.cr[6] = crate::context::CrField { lt: t, gt: false, eq: f, so: false }; + } + ctx.vr[instr.rd()] = v; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-half mask: all-ones / all-zero.** Eight half-word lanes; `VD[i] = (int16(VA[i]) > int16(VB[i])) ? 0xFFFF : 0x0000`. Lane 0 is the most-significant half. +- **Sign matters.** `0x8000 > 0x0001` is `true` unsigned but `false` signed (`-32768 > 1`). Pick `vcmpgtsh` deliberately when sign bit affects ordering. +- **CR6 update when `Rc=1`** (`vcmpgtsh.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`. +- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) per half. +- **Common usage.** Q15 audio threshold detection, signed image-processing kernels. +- **No `VSCR` interaction, no XER, no traps.** +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vcmpgtuh`](vcmpgtuh.md) — same width, unsigned `>`. +- [`vcmpequh`](vcmpequh.md) — equality at half width. +- [`vcmpgtsb`](vcmpgtsb.md), [`vcmpgtsw`](vcmpgtsw.md) — signed `>` at byte / word width. +- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators. +- [`vmaxsh`](vmaxsh.md), [`vminsh`](vminsh.md) — direct signed max / min. + +## IBM Reference + +- [AIX 7.3 — `vcmpgtsh` (Vector Compare Greater-Than Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgtsh-vector-compare-greater-than-signed-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vcmpgtsw.md b/migration/project-root/ppc-manual/vmx/vcmpgtsw.md new file mode 100644 index 0000000..f3713d7 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vcmpgtsw.md @@ -0,0 +1,137 @@ +# `vcmpgtsw` — Vector Compare Greater-Than Signed Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000386` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcmpgtsw` | `vcmpgtsw` | — | Vector Compare Greater-Than Signed Word | +| `vcmpgtsw.` | `vcmpgtsw` | Rc=1 | Vector Compare Greater-Than Signed Word | + +## Syntax + +```asm +vcmpgtsw[Rc] [VD], [VA], [VB] +``` + +## Encoding + +### `vcmpgtsw` — form `VC` + +- **Opcode word:** `0x10000386` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `902` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21 | `Rc` | record-form flag (updates CR6) | +| 22–31 | `XO` | extended opcode (10 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vcmpgtsw: read | Source A vector register. | +| `VB` | vcmpgtsw: read | Source B vector register. | +| `VD` | vcmpgtsw: write | Destination vector register. | +| `CR` | vcmpgtsw: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `vcmpgtsw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `vcmpgtsw`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcmpgtsw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtsw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:743`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L743) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:97`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L97) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:568`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L568) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3811-3820`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3811-L3820) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpgtsw => { + let a = crate::vmx::as_i32x4(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i32x4(ctx.vr[instr.rb()]); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = if a[i] > b[i] { 0xFFFFFFFF } else { 0 }; } + let v = xenia_types::Vec128::from_u32x4_array(r); + if instr.vc_rc_bit() { update_cr6_from_vmask(&r, ctx); } + ctx.vr[instr.rd()] = v; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-word mask: all-ones / all-zero.** Four word lanes; `VD[i] = (int32(VA[i]) > int32(VB[i])) ? 0xFFFFFFFF : 0`. Lane 0 is the most-significant word. +- **Sign matters.** `0x8000_0000 > 0x0000_0001` is `true` unsigned but `false` signed (`INT32_MIN > 1`). +- **CR6 update when `Rc=1`** (`vcmpgtsw.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`. +- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) per word. +- **Common usage.** Z-buffer ordering, signed counter thresholds, "argmax" of int32 arrays. +- **No `VSCR` interaction, no XER, no traps.** +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vcmpgtuw`](vcmpgtuw.md) — same width, unsigned `>`. +- [`vcmpequw`](vcmpequw.md) — equality at word width. +- [`vcmpgtsb`](vcmpgtsb.md), [`vcmpgtsh`](vcmpgtsh.md) — signed `>` at byte / half width. +- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators. +- [`vmaxsw`](vmaxsw.md), [`vminsw`](vminsw.md) — direct signed max / min. + +## IBM Reference + +- [AIX 7.3 — `vcmpgtsw` (Vector Compare Greater-Than Signed Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgtsw-vector-compare-greater-than-signed-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vcmpgtub.md b/migration/project-root/ppc-manual/vmx/vcmpgtub.md new file mode 100644 index 0000000..0db4883 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vcmpgtub.md @@ -0,0 +1,140 @@ +# `vcmpgtub` — Vector Compare Greater-Than Unsigned Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000206` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcmpgtub` | `vcmpgtub` | — | Vector Compare Greater-Than Unsigned Byte | +| `vcmpgtub.` | `vcmpgtub` | Rc=1 | Vector Compare Greater-Than Unsigned Byte | + +## Syntax + +```asm +vcmpgtub[Rc] [VD], [VA], [VB] +``` + +## Encoding + +### `vcmpgtub` — form `VC` + +- **Opcode word:** `0x10000206` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `518` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21 | `Rc` | record-form flag (updates CR6) | +| 22–31 | `XO` | extended opcode (10 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vcmpgtub: read | Source A vector register. | +| `VB` | vcmpgtub: read | Source B vector register. | +| `VD` | vcmpgtub: write | Destination vector register. | +| `CR` | vcmpgtub: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `vcmpgtub` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `vcmpgtub`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcmpgtub`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtub"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:747`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L747) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:97`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L97) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:562`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L562) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3749-3761`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3749-L3761) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpgtub => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u8; 16]; + for i in 0..16 { r[i] = if a[i] > b[i] { 0xFF } else { 0 }; } + let v = xenia_types::Vec128::from_bytes(r); + if instr.vc_rc_bit() { + let (t, f) = crate::vmx::cr6_flags_from_mask(v); + ctx.cr[6] = crate::context::CrField { lt: t, gt: false, eq: f, so: false }; + } + ctx.vr[instr.rd()] = v; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-byte mask: all-ones / all-zero.** Sixteen byte lanes; `VD[i] = (uint8(VA[i]) > uint8(VB[i])) ? 0xFF : 0x00`. Lane 0 is the most-significant byte after `stvx`. +- **Sign matters.** `0xFF > 0x01` is `true` unsigned but `false` signed (`-1 > 1`); pick `vcmpgtub` only when both sides should be treated as `0..255`. +- **CR6 update when `Rc=1`** (`vcmpgtub.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]` — built by xenia's `crate::vmx::cr6_flags_from_mask`. +- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) per byte. +- **Common usage.** Pixel "brighter than" tests, byte-level histogramming, threshold-based binarisation. +- **No `VSCR` interaction, no XER, no traps.** +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vcmpgtsb`](vcmpgtsb.md) — same width, signed `>`. +- [`vcmpequb`](vcmpequb.md) — equality at byte width. +- [`vcmpgtuh`](vcmpgtuh.md), [`vcmpgtuw`](vcmpgtuw.md) — unsigned `>` at half / word width. +- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators. +- [`vmaxub`](vmaxub.md), [`vminub`](vminub.md) — direct unsigned max / min. + +## IBM Reference + +- [AIX 7.3 — `vcmpgtub` (Vector Compare Greater-Than Unsigned Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgtub-vector-compare-greater-than-unsigned-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vcmpgtuh.md b/migration/project-root/ppc-manual/vmx/vcmpgtuh.md new file mode 100644 index 0000000..1e5c3e6 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vcmpgtuh.md @@ -0,0 +1,140 @@ +# `vcmpgtuh` — Vector Compare Greater-Than Unsigned Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000246` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcmpgtuh` | `vcmpgtuh` | — | Vector Compare Greater-Than Unsigned Half Word | +| `vcmpgtuh.` | `vcmpgtuh` | Rc=1 | Vector Compare Greater-Than Unsigned Half Word | + +## Syntax + +```asm +vcmpgtuh[Rc] [VD], [VA], [VB] +``` + +## Encoding + +### `vcmpgtuh` — form `VC` + +- **Opcode word:** `0x10000246` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `582` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21 | `Rc` | record-form flag (updates CR6) | +| 22–31 | `XO` | extended opcode (10 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vcmpgtuh: read | Source A vector register. | +| `VB` | vcmpgtuh: read | Source B vector register. | +| `VD` | vcmpgtuh: write | Destination vector register. | +| `CR` | vcmpgtuh: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `vcmpgtuh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `vcmpgtuh`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcmpgtuh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtuh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:751`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L751) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:97`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L97) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:563`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L563) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3775-3787`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3775-L3787) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpgtuh => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u16; 8]; + for i in 0..8 { r[i] = if a[i] > b[i] { 0xFFFF } else { 0 }; } + let v = xenia_types::Vec128::from_u16x8_array(r); + if instr.vc_rc_bit() { + let (t, f) = crate::vmx::cr6_flags_from_mask(v); + ctx.cr[6] = crate::context::CrField { lt: t, gt: false, eq: f, so: false }; + } + ctx.vr[instr.rd()] = v; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-half mask: all-ones / all-zero.** Eight half-word lanes; `VD[i] = (uint16(VA[i]) > uint16(VB[i])) ? 0xFFFF : 0x0000`. Lane 0 is the most-significant half. +- **Sign matters.** `0xFFFF > 0x0001` is `true` unsigned but `false` signed (`-1 > 1`). +- **CR6 update when `Rc=1`** (`vcmpgtuh.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`. +- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) per half. +- **Common usage.** UTF-16 codepoint range testing, unsigned-half threshold binarisation. +- **No `VSCR` interaction, no XER, no traps.** +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vcmpgtsh`](vcmpgtsh.md) — same width, signed `>`. +- [`vcmpequh`](vcmpequh.md) — equality at half width. +- [`vcmpgtub`](vcmpgtub.md), [`vcmpgtuw`](vcmpgtuw.md) — unsigned `>` at byte / word width. +- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators. +- [`vmaxuh`](vmaxuh.md), [`vminuh`](vminuh.md) — direct unsigned max / min. + +## IBM Reference + +- [AIX 7.3 — `vcmpgtuh` (Vector Compare Greater-Than Unsigned Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgtuh-vector-compare-greater-than-unsigned-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vcmpgtuw.md b/migration/project-root/ppc-manual/vmx/vcmpgtuw.md new file mode 100644 index 0000000..a7dd19c --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vcmpgtuw.md @@ -0,0 +1,137 @@ +# `vcmpgtuw` — Vector Compare Greater-Than Unsigned Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VC](../forms/VC.md) · **Opcode:** `0x10000286` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcmpgtuw` | `vcmpgtuw` | — | Vector Compare Greater-Than Unsigned Word | +| `vcmpgtuw.` | `vcmpgtuw` | Rc=1 | Vector Compare Greater-Than Unsigned Word | + +## Syntax + +```asm +vcmpgtuw[Rc] [VD], [VA], [VB] +``` + +## Encoding + +### `vcmpgtuw` — form `VC` + +- **Opcode word:** `0x10000286` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `646` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21 | `Rc` | record-form flag (updates CR6) | +| 22–31 | `XO` | extended opcode (10 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vcmpgtuw: read | Source A vector register. | +| `VB` | vcmpgtuw: read | Source B vector register. | +| `VD` | vcmpgtuw: write | Destination vector register. | +| `CR` | vcmpgtuw: write (conditional) | Condition-register update. When `Rc=1`, CR field 0 (or CR6 for vector compares, CR1 for FPU) is updated from the result. | + +## Register Effects + +### `vcmpgtuw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** `CR` + +## Status-Register Effects + +- `vcmpgtuw`: **CR6** ← `[all-true, 0, all-false, 0]` when `Rc=1`. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcmpgtuw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcmpgtuw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:755`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L755) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:97`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L97) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:564`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L564) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3801-3810`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3801-L3810) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcmpgtuw => { + let a = ctx.vr[instr.ra()].as_u32x4(); + let b = ctx.vr[instr.rb()].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = if a[i] > b[i] { 0xFFFFFFFF } else { 0 }; } + let v = xenia_types::Vec128::from_u32x4_array(r); + if instr.vc_rc_bit() { update_cr6_from_vmask(&r, ctx); } + ctx.vr[instr.rd()] = v; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-word mask: all-ones / all-zero.** Four word lanes; `VD[i] = (uint32(VA[i]) > uint32(VB[i])) ? 0xFFFFFFFF : 0`. Lane 0 is the most-significant word. +- **Sign matters.** `0x8000_0000 > 0x0000_0001` is `true` unsigned but `false` signed. +- **CR6 update when `Rc=1`** (`vcmpgtuw.`). CR6 = `[lt = all-true, gt = 0, eq = all-false, so = 0]`. +- **Compose with `vsel`.** Mask drives [`vsel`](vsel.md) per word. +- **Common usage.** Hashtable bucket selection, packed-RGBA bit-pattern ordering, ID range checks. +- **No `VSCR` interaction, no XER, no traps.** +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vcmpgtsw`](vcmpgtsw.md) — same width, signed `>`. +- [`vcmpequw`](vcmpequw.md) — equality at word width. +- [`vcmpgtub`](vcmpgtub.md), [`vcmpgtuh`](vcmpgtuh.md) — unsigned `>` at byte / half width. +- [`vsel`](vsel.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vxor`](vxor.md) — mask consumers / combinators. +- [`vmaxuw`](vmaxuw.md), [`vminuw`](vminuw.md) — direct unsigned max / min. + +## IBM Reference + +- [AIX 7.3 — `vcmpgtuw` (Vector Compare Greater-Than Unsigned Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vcmpgtuw-vector-compare-greater-than-unsigned-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Vector Compares](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vctsxs.md b/migration/project-root/ppc-manual/vmx/vctsxs.md new file mode 100644 index 0000000..af958e8 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vctsxs.md @@ -0,0 +1,138 @@ +# `vctsxs` — Vector Convert to Signed Fixed-Point Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x100003ca` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vctsxs` | `vctsxs` | — | Vector Convert to Signed Fixed-Point Word Saturate | + +## Syntax + +```asm +vctsxs [VD], [VB], [UIMM] +``` + +## Encoding + +### `vctsxs` — form `VX` + +- **Opcode word:** `0x100003ca` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `970` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vctsxs: read | Source B vector register. | +| `UIMM` | vctsxs: read | 16-bit unsigned immediate. Zero-extended. | +| `VD` | vctsxs: write | Destination vector register. | +| `VSCR` | vctsxs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vctsxs` + +- **Reads (always):** `VB`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vctsxs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vctsxs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vctsxs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:536`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L536) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:98`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L98) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:517`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L517) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4281-4292`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4281-L4292) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vctsxs => { + let uimm = (instr.raw >> 16) & 0x1F; + let b = ctx.vr[instr.rb()].as_f32x4(); + let mut r = [0i32; 4]; let mut sat = false; + for i in 0..4 { + let (v, s) = crate::vmx::cvt_f32_to_i32_sat(b[i], uimm); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Convert IEEE float lane to signed-Q `int32`, saturating.** For each of the four word lanes, `VD[i] = clamp(round_toward_zero(VB[i] * 2^UIMM), INT32_MIN, INT32_MAX)`. The 5-bit `UIMM` (bits 11..15) gives the Q-format fractional shift, in `0..31`. +- **Saturating, not wrapping.** Out-of-range floats clamp to `INT32_MIN` (negative overflow) or `INT32_MAX` (positive overflow) — *not* the wrap-around behaviour of x86 `cvttps2dq` (which produces `0x80000000` on overflow regardless of sign). Xenia's `crate::vmx::cvt_f32_to_i32_sat` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)) handles the difference. +- **NaN → 0.** A NaN input becomes `0` in the output lane and stickies `VSCR[SAT]`. (Many references state "NaN → INT32_MIN"; verify against [`vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs) for the canonical xenia behaviour, which differs from POWER ISA wording.) +- **`VSCR[SAT]` is sticky-set** if any lane saturates (overflow or NaN). Cleared only by [`mtvscr`](mtvscr.md). +- **Rounding is truncate-toward-zero.** Always; no per-instruction rounding control. +- **`VSCR[NJ]` flushes denormal *inputs* to zero before scaling** (Xenon default). +- **Big-endian word lanes.** Lane 0 is the most-significant word. +- **No XER changes, no traps.** +- **No VMX128 sibling.** +- **Inverse of [`vcfsx`](vcfsx.md)**, but the inverse direction saturates rather than wraps — round-trips lose the magnitude of out-of-range values. + +## Related Instructions + +- [`vctuxs`](vctuxs.md) — same shape, unsigned destination. +- [`vcfsx`](vcfsx.md), [`vcfux`](vcfux.md) — inverse direction (int → float with Q-shift). +- [`vrfin`](vrfin.md), [`vrfip`](vrfip.md), [`vrfim`](vrfim.md), [`vrfiz`](vrfiz.md) — float-to-float rounding modes (round-to-nearest, up, down, toward-zero) when staying in float. +- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear `VSCR[SAT]`. + +## IBM Reference + +- [AIX 7.3 — `vctsxs` (Vector Convert to Signed Fixed-Point Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vctsxs-vector-convert-signed-fixed-point-word-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Conversion Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vctuxs.md b/migration/project-root/ppc-manual/vmx/vctuxs.md new file mode 100644 index 0000000..c1ea025 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vctuxs.md @@ -0,0 +1,137 @@ +# `vctuxs` — Vector Convert to Unsigned Fixed-Point Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000038a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vctuxs` | `vctuxs` | — | Vector Convert to Unsigned Fixed-Point Word Saturate | + +## Syntax + +```asm +vctuxs [VD], [VB], [UIMM] +``` + +## Encoding + +### `vctuxs` — form `VX` + +- **Opcode word:** `0x1000038a` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `906` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vctuxs: read | Source B vector register. | +| `UIMM` | vctuxs: read | 16-bit unsigned immediate. Zero-extended. | +| `VD` | vctuxs: write | Destination vector register. | +| `VSCR` | vctuxs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vctuxs` + +- **Reads (always):** `VB`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vctuxs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vctuxs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vctuxs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:554`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L554) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:98`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L98) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:515`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L515) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4293-4304`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4293-L4304) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vctuxs => { + let uimm = (instr.raw >> 16) & 0x1F; + let b = ctx.vr[instr.rb()].as_f32x4(); + let mut r = [0u32; 4]; let mut sat = false; + for i in 0..4 { + let (v, s) = crate::vmx::cvt_f32_to_u32_sat(b[i], uimm); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Convert IEEE float lane to unsigned-Q `uint32`, saturating.** For each of the four word lanes, `VD[i] = clamp(round_toward_zero(VB[i] * 2^UIMM), 0, UINT32_MAX)`. The 5-bit `UIMM` (bits 11..15) gives the Q-format fractional shift, in `0..31`. +- **Saturating, not wrapping.** Negative inputs clamp to `0`; values above `2^32 − 1` clamp to `0xFFFF_FFFF`. NaN → `0`. All clamping events sticky-set `VSCR[SAT]`. Xenia's `crate::vmx::cvt_f32_to_u32_sat` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)) handles the boundaries. +- **`VSCR[SAT]` sticky.** Cleared only by [`mtvscr`](mtvscr.md). +- **Rounding is truncate-toward-zero.** Always. +- **`VSCR[NJ]` flushes denormal inputs to zero before scaling** (Xenon default). +- **Big-endian word lanes.** Lane 0 is the most-significant word. +- **No XER changes, no traps.** +- **No VMX128 sibling.** +- **Common usage.** Float colour `[0.0, 1.0]` → packed `0..2^N−1` integer with `vctuxs vD, vColor, 8` (Q24.8 → `0..255` after a [`vpkshus`](vpkshus.md)) or `, 32` for full unsigned-int range. + +## Related Instructions + +- [`vctsxs`](vctsxs.md) — same shape, signed destination. +- [`vcfsx`](vcfsx.md), [`vcfux`](vcfux.md) — inverse direction. +- [`vrfin`](vrfin.md), [`vrfip`](vrfip.md), [`vrfim`](vrfim.md), [`vrfiz`](vrfiz.md) — float-to-float rounding modes. +- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear `VSCR[SAT]`. + +## IBM Reference + +- [AIX 7.3 — `vctuxs` (Vector Convert to Unsigned Fixed-Point Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vctuxs-vector-convert-unsigned-fixed-point-word-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Conversion Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vexptefp.md b/migration/project-root/ppc-manual/vmx/vexptefp.md new file mode 100644 index 0000000..247c896 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vexptefp.md @@ -0,0 +1,180 @@ +# `vexptefp` — Vector 2 Raised to the Exponent Estimate Floating Point + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000018a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vexptefp` | `vexptefp` | — | Vector 2 Raised to the Exponent Estimate Floating Point | +| `vexptefp128` | `vexptefp128` | — | Vector128 Log2 Estimate Floating Point | + +## Syntax + +```asm +vexptefp [VD], [VB] +vexptefp128 [VD], [VB] +``` + +## Encoding + +### `vexptefp` — form `VX` + +- **Opcode word:** `0x1000018a` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `394` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vexptefp128` — form `VX128_3` + +- **Opcode word:** `0x180006b0` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `1712` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vexptefp: read; vexptefp128: read | Source B vector register. | +| `VD` | vexptefp: write; vexptefp128: write | Destination vector register. | + +## Register Effects + +### `vexptefp` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vexptefp128` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vexptefp`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vexptefp"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:766`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L766) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:99`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L99) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:469`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L469) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4367-4376`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4367-L4376) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vexptefp | PpcOpcode::vexptefp128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vexptefp128); + let (rb, rd) = if is_128 { (instr.vb128(), instr.vd128()) } + else { (instr.rb(), instr.rd()) }; + let b = ctx.vr[rb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = b[i].exp2(); } + ctx.vr[rd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vexptefp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vexptefp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:769`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L769) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:99`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L99) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:666`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L666) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4367-4376`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4367-L4376) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vexptefp | PpcOpcode::vexptefp128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vexptefp128); + let (rb, rd) = if is_128 { (instr.vb128(), instr.vd128()) } + else { (instr.rb(), instr.rd()) }; + let b = ctx.vr[rb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = b[i].exp2(); } + ctx.vr[rd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane base-2 exponent.** Each of the four word lanes computes `VD[i] = 2^VB[i]` in `binary32`. **Note:** the IBM manual specifies a low-precision estimate (≤ 1/16 ULP relative error). Xenia uses Rust's `f32::exp2`, which is full-precision — programs that depend on hardware-quality estimation may observe small numerical differences. +- **Use `vlogefp` for the inverse.** The natural pair is `vexptefp(vlogefp(x)) = x` for positive finite `x`, modulo each estimate's error budget. +- **Big-endian word lanes.** Lane 0 is the most-significant word. +- **NaN, ±∞.** `2^NaN = NaN`; `2^(+∞) = +∞`; `2^(-∞) = +0`. Subnormal results may be flushed to `±0` if `VSCR[NJ] = 1` (Xenon default). +- **No exception, no `VSCR[SAT]` change, no XER change.** +- **VMX128 sibling (`vexptefp128`).** Identical semantics with the extended encoding. +- **Build natural exp / log via change-of-base.** `e^x = 2^(x * log2(e))`, so combine `vmaddfp` (multiply-by-constant) with `vexptefp`. + +## Related Instructions + +- [`vlogefp`](vlogefp.md) — base-2 logarithm (the inverse). +- [`vrefp`](vrefp.md) — reciprocal estimate. +- [`vrsqrtefp`](vrsqrtefp.md) — reciprocal-square-root estimate. +- [`vmaddfp`](vmaddfp.md) — fused multiply-add for change-of-base scaling. +- [`vmulfp`](vmulfp.md) — float multiply (xenia helper). + +## IBM Reference + +- [AIX 7.3 — `vexptefp` (Vector 2 Raised to the Exponent Estimate Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vexptefp-vector-2-raised-exponent-estimate-floating-point-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Estimate Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vlogefp.md b/migration/project-root/ppc-manual/vmx/vlogefp.md new file mode 100644 index 0000000..950ac4d --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vlogefp.md @@ -0,0 +1,179 @@ +# `vlogefp` — Vector Log2 Estimate Floating Point + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x100001ca` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vlogefp` | `vlogefp` | — | Vector Log2 Estimate Floating Point | +| `vlogefp128` | `vlogefp128` | — | Vector128 Log2 Estimate Floating Point | + +## Syntax + +```asm +vlogefp [VD], [VB] +vlogefp128 [VD], [VB] +``` + +## Encoding + +### `vlogefp` — form `VX` + +- **Opcode word:** `0x100001ca` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `458` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vlogefp128` — form `VX128_3` + +- **Opcode word:** `0x180006f0` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `1776` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vlogefp: read; vlogefp128: read | Source B vector register. | +| `VD` | vlogefp: write; vlogefp128: write | Destination vector register. | + +## Register Effects + +### `vlogefp` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vlogefp128` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vlogefp`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vlogefp"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:779`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L779) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:99`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L99) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:473`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L473) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4377-4386`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4377-L4386) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vlogefp | PpcOpcode::vlogefp128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vlogefp128); + let (rb, rd) = if is_128 { (instr.vb128(), instr.vd128()) } + else { (instr.rb(), instr.rd()) }; + let b = ctx.vr[rb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = b[i].log2(); } + ctx.vr[rd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vlogefp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vlogefp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:782`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L782) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:99`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L99) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:667`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L667) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4377-4386`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4377-L4386) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vlogefp | PpcOpcode::vlogefp128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vlogefp128); + let (rb, rd) = if is_128 { (instr.vb128(), instr.vd128()) } + else { (instr.rb(), instr.rd()) }; + let b = ctx.vr[rb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = b[i].log2(); } + ctx.vr[rd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane base-2 logarithm.** Each of the four word lanes computes `VD[i] = log2(VB[i])` in `binary32`. **Note:** the IBM manual specifies a low-precision estimate (≤ 1/32 ULP relative error). Xenia uses Rust's `f32::log2`, which is full-precision; hardware-precise programs may observe small numerical differences. +- **Use `vexptefp` for the inverse.** Pair gives `2^(log2(x)) ≈ x` for positive finite `x`. +- **Big-endian word lanes.** Lane 0 is the most-significant word. +- **NaN, negatives, zero, ±∞.** `log2(negative)` and `log2(NaN)` produce NaN; `log2(+0) = -∞`; `log2(-0) = -∞` (per IEEE-754); `log2(+∞) = +∞`. None of these stickies `VSCR[SAT]` — float ops never touch SAT. +- **No exception, no `VSCR[SAT]` change, no XER change.** +- **VMX128 sibling (`vlogefp128`).** Identical semantics with the extended encoding. +- **Natural log via change-of-base.** `ln(x) = log2(x) * (1 / log2(e))` — multiply by a constant with `vmaddfp`. + +## Related Instructions + +- [`vexptefp`](vexptefp.md) — base-2 exponent (the inverse). +- [`vrefp`](vrefp.md) — reciprocal estimate. +- [`vrsqrtefp`](vrsqrtefp.md) — reciprocal-square-root estimate. +- [`vmaddfp`](vmaddfp.md) — fused multiply-add for change-of-base scaling. + +## IBM Reference + +- [AIX 7.3 — `vlogefp` (Vector log2 Estimate Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vlogefp-vector-log2-estimate-floating-point-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Estimate Instructions](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmaddfp.md b/migration/project-root/ppc-manual/vmx/vmaddfp.md new file mode 100644 index 0000000..72fc231 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmaddfp.md @@ -0,0 +1,196 @@ +# `vmaddfp` — Vector Multiply-Add Floating Point + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x1000002e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmaddfp` | `vmaddfp` | — | Vector Multiply-Add Floating Point | +| `vmaddfp128` | `vmaddfp128` | — | Vector128 Multiply Add Floating Point | + +## Syntax + +```asm +vmaddfp [VD], [VA], [VC], [VB] +vmaddfp128 [VD], [VA], [VB], [VD] +``` + +## Encoding + +### `vmaddfp` — form `VA` + +- **Opcode word:** `0x1000002e` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `46` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21–25 | `VRC` | source C / shift | +| 26–31 | `XO` | extended opcode (6 bits) | + +### `vmaddfp128` — form `VX128` + +- **Opcode word:** `0x140000d0` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `208` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmaddfp: read; vmaddfp128: read | Source A vector register. | +| `VC` | vmaddfp: read; vmaddfp128: read | Source C vector register / 3-bit selector. | +| `VB` | vmaddfp: read; vmaddfp128: read | Source B vector register. | +| `VD` | vmaddfp: write; vmaddfp128: write | Destination vector register. | + +## Register Effects + +### `vmaddfp` + +- **Reads (always):** `VA`, `VC`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vmaddfp128` + +- **Reads (always):** `VA`, `VC`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +for each 32-bit float lane i in 0..3: + VD[i] <- (VA[i] * VC[i]) + VB[i] +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmaddfp`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaddfp"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:801`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L801) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:100`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L100) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:588`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L588) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2038-2054`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2038-L2054) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmaddfp => { + // vD = (vA * vC) + vB. AltiVec unconditionally flushes denormal + // *inputs* to 0 regardless of VSCR[NJ] (confirmed on POWER8 hw). + let a = ctx.vr[instr.ra()].as_f32x4(); + let b = ctx.vr[instr.rb()].as_f32x4(); + let c = ctx.vr[instr.rc()].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { + let ai = vmx::flush_denorm(a[i]); + let bi = vmx::flush_denorm(b[i]); + let ci = vmx::flush_denorm(c[i]); + // PPCBUG-437: flush subnormal output too. + r[i] = vmx::flush_denorm(ai.mul_add(ci, bi)); + } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vmaddfp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaddfp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:805`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L805) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:100`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L100) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:613`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L613) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2055-2073`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2055-L2073) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmaddfp128 => { + // ISA: (VD) <- (VA × VD) + VB. VD is both the second multiplicand and destination. + // Canary InstrEmit_vmaddfp128 (ppc_emit_altivec.cc:806-809): MulAdd(VA, VD, VB). + // Previous code computed ai.mul_add(bi, di) = VA×VB+VD — VB and VD roles swapped + // (PPCBUG-424). Fix: ai.mul_add(di, bi) = VA×VD+VB. + let a = ctx.vr[instr.va128()].as_f32x4(); + let b = ctx.vr[instr.vb128()].as_f32x4(); + let d = ctx.vr[instr.vd128()].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { + let ai = vmx::flush_denorm(a[i]); + let bi = vmx::flush_denorm(b[i]); + let di = vmx::flush_denorm(d[i]); + // PPCBUG-437. + r[i] = vmx::flush_denorm(ai.mul_add(di, bi)); + } + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Fused multiply-add: `VD = (VA * VC) + VB`** per word lane (single rounding). No intermediate rounding between the multiply and the add — this is critical for numerical accuracy in DSP filters and reduces error in dot products. +- **Big-endian word lanes.** Lane 0 is the most-significant word. +- **NaN propagation, ±∞ arithmetic.** Standard IEEE-754: any NaN input yields NaN; `(+∞ * 0)` yields NaN; the sum of `+∞` and `-∞` (e.g. `(+∞ * 1) + -∞`) yields NaN. No trap, no sticky bit. +- **`VSCR[NJ]` denormals.** With `NJ = 1` (Xenon default), denormal inputs and outputs are flushed to `±0`. +- **No `VSCR[SAT]` change, no XER change, no exceptions.** +- **VMX128 sibling has surprising operand layout — `VD` is also a source.** Xenia's `vmaddfp128` reads `VA`, `VB`, *and `VD` itself* (as the accumulator), computing `VD = (VA * VB) + VD_prev` ([`crates/xenia-cpu/src/interpreter.rs`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs)). The standard `vmaddfp` keeps the canonical 4-operand `VA, VC, VB → VD` shape. **This is a real difference in operand encoding** (VX128_3 form vs. VA-form) that compilers must respect — VMX128 sacrifices the third source register slot for the extra register-file bits. +- **Aliasing legal.** `vmaddfp v3, v3, v3, v3` works (squares + adds itself). +- **Common usage.** Per-lane polynomial evaluation, dot-product accumulation, any matrix multiply inner loop. Pair four `vmaddfp` instructions to do a 4×4 × 4-vec multiply. + +## Related Instructions + +- [`vnmsubfp`](vnmsubfp.md) — `−((VA * VC) − VB)`; fused negative-multiply-subtract. +- [`vaddfp`](vaddfp.md), [`vsubfp`](vsubfp.md) — plain float add / subtract. +- [`vmulfp`](vmulfp.md) — xenia helper for `VA * VC`; on hardware games use `vmaddfp v, va, vc, v0_zero`. +- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — min / max for clamping. +- [`vrefp`](vrefp.md), [`vrsqrtefp`](vrsqrtefp.md) — reciprocal / inverse-sqrt estimates that often appear in the same FMA chain. + +## IBM Reference + +- [AIX 7.3 — `vmaddfp` (Vector Multiply-Add Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaddfp-vector-multiply-add-floating-point-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Multiply-Add Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmaxfp.md b/migration/project-root/ppc-manual/vmx/vmaxfp.md new file mode 100644 index 0000000..0c188ac --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmaxfp.md @@ -0,0 +1,182 @@ +# `vmaxfp` — Vector Maximum Floating Point + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000040a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmaxfp` | `vmaxfp` | — | Vector Maximum Floating Point | +| `vmaxfp128` | `vmaxfp128` | — | Vector128 Maximum Floating Point | + +## Syntax + +```asm +vmaxfp [VD], [VA], [VB] +vmaxfp128 [VD], [VA], [VB] +``` + +## Encoding + +### `vmaxfp` — form `VX` + +- **Opcode word:** `0x1000040a` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1034` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vmaxfp128` — form `VX128` + +- **Opcode word:** `0x18000280` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `640` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmaxfp: read; vmaxfp128: read | Source A vector register. | +| `VB` | vmaxfp: read; vmaxfp128: read | Source B vector register. | +| `VD` | vmaxfp: write; vmaxfp128: write | Destination vector register. | + +## Register Effects + +### `vmaxfp` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vmaxfp128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmaxfp`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxfp"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:831`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L831) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:522`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L522) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2121-2128`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2121-L2128) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmaxfp => { + let a = ctx.vr[instr.ra()].as_f32x4(); + let b = ctx.vr[instr.rb()].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = vmx::max_nan(a[i], b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vmaxfp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxfp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:834`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L834) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:696`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L696) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2129-2136`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2129-L2136) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmaxfp128 => { + let a = ctx.vr[instr.va128()].as_f32x4(); + let b = ctx.vr[instr.vb128()].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = vmx::max_nan(a[i], b[i]); } + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane IEEE max.** Four word lanes; `VD[i] = (VA[i] > VB[i]) ? VA[i] : VB[i]`. +- **NaN propagation surprise.** Xenia uses `if a > b { a } else { b }`, so any NaN comparison evaluates false and the result is `VB`. The IBM manual specifies "the larger of `VA[i]` and `VB[i]`, with NaN handling such that any NaN input yields a NaN result" — this is *not* what xenia does. Hardware's `vmaxfp(NaN, x) = NaN` while xenia returns `x`. **Worth checking against `vmx.rs` for any future correctness fixes.** +- **Sign of zero.** `vmaxfp(+0, -0)` returns `-0` in xenia (since `+0 > -0` is false → returns `b = -0`). The hardware likely returns the sign-positive zero — also worth verifying. +- **`VSCR[NJ]` denormals.** With `NJ = 1` (Xenon default), denormal inputs are flushed to `±0` before comparison. +- **No `VSCR[SAT]` change, no XER change, no exceptions.** +- **Big-endian word lanes.** Lane 0 is the most-significant word. +- **Aliasing legal.** `vmaxfp v3, v3, v4` is the standard "clamp from below by `v4`" idiom. +- **VMX128 sibling (`vmaxfp128`).** Identical comparator semantics with the extended encoding. + +## Related Instructions + +- [`vminfp`](vminfp.md) — the per-lane minimum. +- [`vcmpgtfp`](vcmpgtfp.md), [`vcmpgefp`](vcmpgefp.md) — separate compare-and-mask path. +- [`vsel`](vsel.md) — combine masks with arbitrary alternatives. +- [`vmaddfp`](vmaddfp.md) — fused multiply-add when the max is part of a polynomial. +- [`vmaxsw`](vmaxsw.md) — integer-word max if the lanes are signed integers. + +## IBM Reference + +- [AIX 7.3 — `vmaxfp` (Vector Maximum Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaxfp-vector-maximum-floating-point-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmaxsb.md b/migration/project-root/ppc-manual/vmx/vmaxsb.md new file mode 100644 index 0000000..b94ef14 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmaxsb.md @@ -0,0 +1,130 @@ +# `vmaxsb` — Vector Maximum Signed Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000102` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmaxsb` | `vmaxsb` | — | Vector Maximum Signed Byte | + +## Syntax + +```asm +vmaxsb [VD], [VA], [VB] +``` + +## Encoding + +### `vmaxsb` — form `VX` + +- **Opcode word:** `0x10000102` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `258` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmaxsb: read | Source A vector register. | +| `VB` | vmaxsb: read | Source B vector register. | +| `VD` | vmaxsb: write | Destination vector register. | + +## Register Effects + +### `vmaxsb` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmaxsb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxsb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:838`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L838) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:454`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L454) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4407-4414`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4407-L4414) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmaxsb => { + let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i8x16(ctx.vr[instr.rb()]); + let mut r = [0i8; 16]; + for i in 0..16 { r[i] = a[i].max(b[i]); } + ctx.vr[instr.rd()] = crate::vmx::from_i8x16(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-byte signed max.** Sixteen byte lanes; `VD[i] = max(int8(VA[i]), int8(VB[i]))`. Lane 0 is the most-significant byte. +- **Sign-aware ordering.** `vmaxsb(0xFF, 0x01) = 0x01` (i.e. `max(-1, 1) = 1`), versus [`vmaxub`](vmaxub.md) which would return `0xFF`. Pick `vmaxsb` deliberately when both sides are signed. +- **No `VSCR` interaction, no XER, no exceptions.** Pure compare-select. +- **Common usage.** Per-lane clamping with [`vminsb`](vminsb.md) implements `clamp(x, lo, hi)` with no branch. +- **Aliasing legal.** `vmaxsb v3, v3, v4` is "raise lower-bound to `v4`" idiom. +- **No VMX128 sibling.** + +## Related Instructions + +- [`vminsb`](vminsb.md) — the matching minimum. +- [`vmaxub`](vmaxub.md) — same width, unsigned max. +- [`vmaxsh`](vmaxsh.md), [`vmaxsw`](vmaxsw.md) — signed max at half / word width. +- [`vcmpgtsb`](vcmpgtsb.md) — separate compare-and-mask path. +- [`vsel`](vsel.md) — alternative selection style with arbitrary fallbacks. + +## IBM Reference + +- [AIX 7.3 — `vmaxsb` (Vector Maximum Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaxsb-vector-maximum-signed-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmaxsh.md b/migration/project-root/ppc-manual/vmx/vmaxsh.md new file mode 100644 index 0000000..c2c997f --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmaxsh.md @@ -0,0 +1,130 @@ +# `vmaxsh` — Vector Maximum Signed Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000142` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmaxsh` | `vmaxsh` | — | Vector Maximum Signed Half Word | + +## Syntax + +```asm +vmaxsh [VD], [VA], [VB] +``` + +## Encoding + +### `vmaxsh` — form `VX` + +- **Opcode word:** `0x10000142` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `322` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmaxsh: read | Source A vector register. | +| `VB` | vmaxsh: read | Source B vector register. | +| `VD` | vmaxsh: write | Destination vector register. | + +## Register Effects + +### `vmaxsh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmaxsh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxsh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:845`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L845) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:460`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L460) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4439-4446`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4439-L4446) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmaxsh => { + let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]); + let mut r = [0i16; 8]; + for i in 0..8 { r[i] = a[i].max(b[i]); } + ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-half signed max.** Eight half-word lanes; `VD[i] = max(int16(VA[i]), int16(VB[i]))`. Lane 0 is the most-significant half. +- **Sign-aware ordering.** `vmaxsh(0x8000, 0x0001) = 0x0001` (i.e. `max(-32768, 1) = 1`), versus [`vmaxuh`](vmaxuh.md) which would return `0x8000`. +- **No `VSCR` interaction, no XER, no exceptions.** +- **Common usage.** Q15 audio peak detection; signed image-processing kernels. +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vminsh`](vminsh.md) — the matching minimum. +- [`vmaxuh`](vmaxuh.md) — same width, unsigned max. +- [`vmaxsb`](vmaxsb.md), [`vmaxsw`](vmaxsw.md) — signed max at byte / word width. +- [`vcmpgtsh`](vcmpgtsh.md) — separate compare-and-mask path. +- [`vsel`](vsel.md) — alternative selection style with arbitrary fallbacks. + +## IBM Reference + +- [AIX 7.3 — `vmaxsh` (Vector Maximum Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaxsh-vector-maximum-signed-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmaxsw.md b/migration/project-root/ppc-manual/vmx/vmaxsw.md new file mode 100644 index 0000000..6be39fc --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmaxsw.md @@ -0,0 +1,130 @@ +# `vmaxsw` — Vector Maximum Signed Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000182` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmaxsw` | `vmaxsw` | — | Vector Maximum Signed Word | + +## Syntax + +```asm +vmaxsw [VD], [VA], [VB] +``` + +## Encoding + +### `vmaxsw` — form `VX` + +- **Opcode word:** `0x10000182` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `386` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmaxsw: read | Source A vector register. | +| `VB` | vmaxsw: read | Source B vector register. | +| `VD` | vmaxsw: write | Destination vector register. | + +## Register Effects + +### `vmaxsw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmaxsw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxsw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:852`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L852) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:467`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L467) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4471-4478`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4471-L4478) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmaxsw => { + let a = crate::vmx::as_i32x4(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i32x4(ctx.vr[instr.rb()]); + let mut r = [0i32; 4]; + for i in 0..4 { r[i] = a[i].max(b[i]); } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-word signed max.** Four word lanes; `VD[i] = max(int32(VA[i]), int32(VB[i]))`. Lane 0 is the most-significant word. +- **Sign-aware ordering.** `vmaxsw(0x8000_0000, 0x0000_0001) = 0x0000_0001` (i.e. `max(INT32_MIN, 1) = 1`). +- **No `VSCR` interaction, no XER, no exceptions.** +- **Common usage.** Z-buffer "keep nearest" updates, signed counter ceilings. +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vminsw`](vminsw.md) — the matching minimum. +- [`vmaxuw`](vmaxuw.md) — same width, unsigned max. +- [`vmaxsb`](vmaxsb.md), [`vmaxsh`](vmaxsh.md) — signed max at byte / half width. +- [`vcmpgtsw`](vcmpgtsw.md) — separate compare-and-mask path. +- [`vsel`](vsel.md) — alternative selection. + +## IBM Reference + +- [AIX 7.3 — `vmaxsw` (Vector Maximum Signed Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaxsw-vector-maximum-signed-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmaxub.md b/migration/project-root/ppc-manual/vmx/vmaxub.md new file mode 100644 index 0000000..7c14577 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmaxub.md @@ -0,0 +1,130 @@ +# `vmaxub` — Vector Maximum Unsigned Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000002` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmaxub` | `vmaxub` | — | Vector Maximum Unsigned Byte | + +## Syntax + +```asm +vmaxub [VD], [VA], [VB] +``` + +## Encoding + +### `vmaxub` — form `VX` + +- **Opcode word:** `0x10000002` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `2` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmaxub: read | Source A vector register. | +| `VB` | vmaxub: read | Source B vector register. | +| `VD` | vmaxub: write | Destination vector register. | + +## Register Effects + +### `vmaxub` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmaxub`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxub"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:859`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L859) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:435`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L435) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4391-4398`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4391-L4398) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmaxub => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u8; 16]; + for i in 0..16 { r[i] = a[i].max(b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-byte unsigned max.** Sixteen byte lanes; `VD[i] = max(uint8(VA[i]), uint8(VB[i]))`. Lane 0 is the most-significant byte. +- **Unsigned ordering.** `vmaxub(0xFF, 0x01) = 0xFF`, opposite to [`vmaxsb`](vmaxsb.md). +- **No `VSCR` interaction, no XER, no exceptions.** +- **Common usage.** Pixel "brighter of two" channel selection; alpha mask combining. +- **Aliasing legal.** `vmaxub v3, v3, v4` raises `v3`'s lower bound to `v4`. +- **No VMX128 sibling.** + +## Related Instructions + +- [`vminub`](vminub.md) — the matching minimum. +- [`vmaxsb`](vmaxsb.md) — same width, signed max. +- [`vmaxuh`](vmaxuh.md), [`vmaxuw`](vmaxuw.md) — unsigned max at half / word width. +- [`vcmpgtub`](vcmpgtub.md) — separate compare-and-mask path. +- [`vsel`](vsel.md) — alternative selection. + +## IBM Reference + +- [AIX 7.3 — `vmaxub` (Vector Maximum Unsigned Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaxub-vector-maximum-unsigned-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmaxuh.md b/migration/project-root/ppc-manual/vmx/vmaxuh.md new file mode 100644 index 0000000..e6e0220 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmaxuh.md @@ -0,0 +1,130 @@ +# `vmaxuh` — Vector Maximum Unsigned Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000042` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmaxuh` | `vmaxuh` | — | Vector Maximum Unsigned Half Word | + +## Syntax + +```asm +vmaxuh [VD], [VA], [VB] +``` + +## Encoding + +### `vmaxuh` — form `VX` + +- **Opcode word:** `0x10000042` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `66` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmaxuh: read | Source A vector register. | +| `VB` | vmaxuh: read | Source B vector register. | +| `VD` | vmaxuh: write | Destination vector register. | + +## Register Effects + +### `vmaxuh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmaxuh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxuh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:867`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L867) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:442`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L442) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4423-4430`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4423-L4430) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmaxuh => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u16; 8]; + for i in 0..8 { r[i] = a[i].max(b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-half unsigned max.** Eight half-word lanes; `VD[i] = max(uint16(VA[i]), uint16(VB[i]))`. Lane 0 is the most-significant half. +- **Unsigned ordering.** `vmaxuh(0xFFFF, 0x0001) = 0xFFFF`, opposite to [`vmaxsh`](vmaxsh.md). +- **No `VSCR` interaction, no XER, no exceptions.** +- **Common usage.** Audio sample magnitude tracking; UTF-16 codepoint upper bound. +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vminuh`](vminuh.md) — the matching minimum. +- [`vmaxsh`](vmaxsh.md) — same width, signed max. +- [`vmaxub`](vmaxub.md), [`vmaxuw`](vmaxuw.md) — unsigned max at byte / word width. +- [`vcmpgtuh`](vcmpgtuh.md) — separate compare-and-mask path. +- [`vsel`](vsel.md) — alternative selection. + +## IBM Reference + +- [AIX 7.3 — `vmaxuh` (Vector Maximum Unsigned Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaxuh-vector-maximum-unsigned-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmaxuw.md b/migration/project-root/ppc-manual/vmx/vmaxuw.md new file mode 100644 index 0000000..c44300f --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmaxuw.md @@ -0,0 +1,130 @@ +# `vmaxuw` — Vector Maximum Unsigned Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000082` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmaxuw` | `vmaxuw` | — | Vector Maximum Unsigned Word | + +## Syntax + +```asm +vmaxuw [VD], [VA], [VB] +``` + +## Encoding + +### `vmaxuw` — form `VX` + +- **Opcode word:** `0x10000082` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `130` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmaxuw: read | Source A vector register. | +| `VB` | vmaxuw: read | Source B vector register. | +| `VD` | vmaxuw: write | Destination vector register. | + +## Register Effects + +### `vmaxuw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmaxuw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaxuw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:875`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L875) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:101`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L101) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:449`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L449) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4455-4462`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4455-L4462) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmaxuw => { + let a = ctx.vr[instr.ra()].as_u32x4(); + let b = ctx.vr[instr.rb()].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = a[i].max(b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-word unsigned max.** Four word lanes; `VD[i] = max(uint32(VA[i]), uint32(VB[i]))`. Lane 0 is the most-significant word. +- **Unsigned ordering.** `vmaxuw(0x8000_0000, 0x0000_0001) = 0x8000_0000`, opposite to [`vmaxsw`](vmaxsw.md). +- **No `VSCR` interaction, no XER, no exceptions.** +- **Common usage.** Hashtable bucket capacity tracking, packed 32-bit ID upper bounds. +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vminuw`](vminuw.md) — the matching minimum. +- [`vmaxsw`](vmaxsw.md) — same width, signed max. +- [`vmaxub`](vmaxub.md), [`vmaxuh`](vmaxuh.md) — unsigned max at byte / half width. +- [`vcmpgtuw`](vcmpgtuw.md) — separate compare-and-mask path. +- [`vsel`](vsel.md) — alternative selection. + +## IBM Reference + +- [AIX 7.3 — `vmaxuw` (Vector Maximum Unsigned Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmaxuw-vector-maximum-unsigned-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmhaddshs.md b/migration/project-root/ppc-manual/vmx/vmhaddshs.md new file mode 100644 index 0000000..9d1f62f --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmhaddshs.md @@ -0,0 +1,146 @@ +# `vmhaddshs` — Vector Multiply-High and Add Signed Signed Half Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000020` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmhaddshs` | `vmhaddshs` | — | Vector Multiply-High and Add Signed Signed Half Word Saturate | + +## Syntax + +```asm +vmhaddshs [VD], [VA], [VB], [VC] +``` + +## Encoding + +### `vmhaddshs` — form `VA` + +- **Opcode word:** `0x10000020` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `32` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21–25 | `VRC` | source C / shift | +| 26–31 | `XO` | extended opcode (6 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmhaddshs: read | Source A vector register. | +| `VB` | vmhaddshs: read | Source B vector register. | +| `VC` | vmhaddshs: read | Source C vector register / 3-bit selector. | +| `VD` | vmhaddshs: write | Destination vector register. | +| `VSCR` | vmhaddshs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vmhaddshs` + +- **Reads (always):** `VA`, `VB`, `VC` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vmhaddshs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmhaddshs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmhaddshs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:883`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L883) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:102`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L102) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:576`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L576) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3519-3533`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3519-L3533) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmhaddshs => { + // vD[i] = sat_i16((vA[i] * vB[i]) >> 15 + vC[i]) + let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]); + let c = crate::vmx::as_i16x8(ctx.vr[instr.rc()]); + let mut r = [0i16; 8]; let mut sat = false; + for i in 0..8 { + let prod = (a[i] as i32 * b[i] as i32) >> 15; + let (v, s) = crate::vmx::sat_i32_to_i16(prod + c[i] as i32); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Q15 fixed-point multiply-add, saturating.** Eight half-word lanes; per lane: + ``` + prod = (int16(VA[i]) * int16(VB[i])) >> 15 ; truncating, no rounding + VD[i] = clamp(prod + int16(VC[i]), -32768, +32767) + ``` + The "h" in the mnemonic is "high half" — only the upper 17 bits of the 32-bit signed product survive (after >>15), then the accumulator is added. +- **Truncating, not rounding.** Bit 14 of the product is discarded silently. Use [`vmhraddshs`](vmhraddshs.md) when half-up rounding is needed (it adds `0x4000` to the product before the shift). The two are otherwise identical. +- **`VSCR[SAT]` is sticky-set** if `prod + VC[i]` overflows `int16`. Cleared only by [`mtvscr`](mtvscr.md). Xenia uses `crate::vmx::sat_i32_to_i16` ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)). +- **Pathological case `0x8000 * 0x8000 >> 15`.** Equals `0x10000` in the un-saturated product = `+32768` after the shift, which overflows `int16` even before adding `VC`. The clamp then produces `+32767` and stickies SAT. This is the classic Q15 "minus-one-times-minus-one" gotcha. +- **Big-endian half lanes.** Lane 0 is the most-significant half. +- **No XER changes, no exceptions.** +- **No VMX128 sibling.** +- **Common usage.** Q15 IIR / FIR filter taps, fixed-point matrix-vector multiplies for audio. + +## Related Instructions + +- [`vmhraddshs`](vmhraddshs.md) — same operation with rounded multiply (`+0x4000` before `>> 15`). +- [`vmladduhm`](vmladduhm.md) — same shape, modulo (no shift, no saturate), unsigned half lanes. +- [`vmsumshs`](vmsumshs.md), [`vmsumshm`](vmsumshm.md) — multiply-sum across pairs of lanes. +- [`vaddshs`](vaddshs.md), [`vmaxsh`](vmaxsh.md) — saturating add and max at the same lane width, useful in the same DSP kernels. + +## IBM Reference + +- [AIX 7.3 — `vmhaddshs` (Vector Multiply-High and Add Signed Half Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmhaddshs-vector-multiply-high-add-signed-half-word-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Add Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmhraddshs.md b/migration/project-root/ppc-manual/vmx/vmhraddshs.md new file mode 100644 index 0000000..1396b82 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmhraddshs.md @@ -0,0 +1,146 @@ +# `vmhraddshs` — Vector Multiply-High Round and Add Signed Signed Half Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000021` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmhraddshs` | `vmhraddshs` | — | Vector Multiply-High Round and Add Signed Signed Half Word Saturate | + +## Syntax + +```asm +vmhraddshs [VD], [VA], [VB], [VC] +``` + +## Encoding + +### `vmhraddshs` — form `VA` + +- **Opcode word:** `0x10000021` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `33` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21–25 | `VRC` | source C / shift | +| 26–31 | `XO` | extended opcode (6 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmhraddshs: read | Source A vector register. | +| `VB` | vmhraddshs: read | Source B vector register. | +| `VC` | vmhraddshs: read | Source C vector register / 3-bit selector. | +| `VD` | vmhraddshs: write | Destination vector register. | +| `VSCR` | vmhraddshs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vmhraddshs` + +- **Reads (always):** `VA`, `VB`, `VC` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vmhraddshs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmhraddshs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmhraddshs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:888`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L888) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:102`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L102) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:577`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L577) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3534-3548`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3534-L3548) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmhraddshs => { + // Rounded multiply-add: (vA[i]*vB[i] + 0x4000) >> 15 + vC[i], saturating. + let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]); + let c = crate::vmx::as_i16x8(ctx.vr[instr.rc()]); + let mut r = [0i16; 8]; let mut sat = false; + for i in 0..8 { + let prod = (a[i] as i32 * b[i] as i32 + 0x4000) >> 15; + let (v, s) = crate::vmx::sat_i32_to_i16(prod + c[i] as i32); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Rounded Q15 fixed-point multiply-add, saturating.** Eight half-word lanes; per lane: + ``` + prod = (int16(VA[i]) * int16(VB[i]) + 0x4000) >> 15 ; round half-up + VD[i] = clamp(prod + int16(VC[i]), -32768, +32767) + ``` + Identical to [`vmhaddshs`](vmhaddshs.md) except for the `+0x4000` rounding bias before the shift. +- **Half-up rounding to even-magnitude.** The `+0x4000` bias rounds the discarded low 15 bits *toward* the nearest representable value, with ties broken away from zero. For most DSP work this is the desired behaviour and gives lower mean error than the truncating variant. +- **`VSCR[SAT]` is sticky-set** if the final sum overflows `int16`. The rounding bias can itself push a lane that was at `+32767` past the cap — important for tight Q15 audio where the truncating form might not have saturated. +- **Same `0x8000 * 0x8000 >> 15` gotcha** as `vmhaddshs`: the product is `+32768.5` rounded to `+32769`, which still saturates. +- **Big-endian half lanes.** Lane 0 is the most-significant half. +- **No XER changes, no exceptions.** +- **No VMX128 sibling.** +- **Common usage.** High-quality Q15 audio filter taps where round-toward-nearest is preferred over truncate-toward-zero. + +## Related Instructions + +- [`vmhaddshs`](vmhaddshs.md) — same op without the rounding bias. +- [`vmladduhm`](vmladduhm.md) — same shape, modulo (no shift, no saturate), unsigned. +- [`vmsumshs`](vmsumshs.md), [`vmsumshm`](vmsumshm.md) — multiply-sum across pairs of lanes. +- [`vaddshs`](vaddshs.md), [`vmaxsh`](vmaxsh.md) — saturating add and max at same lane width. + +## IBM Reference + +- [AIX 7.3 — `vmhraddshs` (Vector Multiply-High Round and Add Signed Half Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmhraddshs-vector-multiply-high-round-add-signed-half-word-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Add Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vminfp.md b/migration/project-root/ppc-manual/vmx/vminfp.md new file mode 100644 index 0000000..127aa61 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vminfp.md @@ -0,0 +1,182 @@ +# `vminfp` — Vector Minimum Floating Point + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000044a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vminfp` | `vminfp` | — | Vector Minimum Floating Point | +| `vminfp128` | `vminfp128` | — | Vector128 Minimum Floating Point | + +## Syntax + +```asm +vminfp [VD], [VA], [VB] +vminfp128 [VD], [VA], [VB] +``` + +## Encoding + +### `vminfp` — form `VX` + +- **Opcode word:** `0x1000044a` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1098` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vminfp128` — form `VX128` + +- **Opcode word:** `0x180002c0` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `704` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vminfp: read; vminfp128: read | Source A vector register. | +| `VB` | vminfp: read; vminfp128: read | Source B vector register. | +| `VD` | vminfp: write; vminfp128: write | Destination vector register. | + +## Register Effects + +### `vminfp` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vminfp128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vminfp`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminfp"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:899`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L899) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:527`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L527) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2137-2144`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2137-L2144) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vminfp => { + let a = ctx.vr[instr.ra()].as_f32x4(); + let b = ctx.vr[instr.rb()].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = vmx::min_nan(a[i], b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vminfp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminfp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:902`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L902) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:697`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L697) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2145-2152`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2145-L2152) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vminfp128 => { + let a = ctx.vr[instr.va128()].as_f32x4(); + let b = ctx.vr[instr.vb128()].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = vmx::min_nan(a[i], b[i]); } + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane IEEE min.** Four word lanes; `VD[i] = (VA[i] < VB[i]) ? VA[i] : VB[i]`. +- **NaN propagation surprise.** Xenia uses `if a < b { a } else { b }`, so any NaN comparison evaluates false and the result is `VB`. The IBM manual specifies NaN-propagating min — i.e. NaN inputs should yield NaN. Hardware's `vminfp(NaN, x) = NaN` while xenia returns `x`. **Worth checking against `vmx.rs` for any future correctness fixes.** +- **Sign of zero.** `vminfp(+0, -0)` returns `-0` in xenia (since `+0 < -0` is false → returns `b = -0`); hardware likely returns the negative zero too via the same comparator. +- **`VSCR[NJ]` denormals.** With `NJ = 1` (Xenon default), denormal inputs are flushed to `±0` before comparison. +- **No `VSCR[SAT]` change, no XER change, no exceptions.** +- **Big-endian word lanes.** Lane 0 is the most-significant word. +- **Aliasing legal.** `vminfp v3, v3, v4` clamps `v3` from above by `v4`. +- **VMX128 sibling (`vminfp128`).** Identical comparator semantics with the extended encoding. + +## Related Instructions + +- [`vmaxfp`](vmaxfp.md) — the per-lane maximum. +- [`vcmpgtfp`](vcmpgtfp.md), [`vcmpgefp`](vcmpgefp.md) — separate compare-and-mask path. +- [`vsel`](vsel.md) — combine masks with arbitrary alternatives. +- [`vmaddfp`](vmaddfp.md) — fused multiply-add when the min is part of a polynomial. +- [`vminsw`](vminsw.md) — integer-word min if the lanes are signed integers. + +## IBM Reference + +- [AIX 7.3 — `vminfp` (Vector Minimum Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vminfp-vector-minimum-floating-point-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vminsb.md b/migration/project-root/ppc-manual/vmx/vminsb.md new file mode 100644 index 0000000..bf4598a --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vminsb.md @@ -0,0 +1,130 @@ +# `vminsb` — Vector Minimum Signed Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000302` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vminsb` | `vminsb` | — | Vector Minimum Signed Byte | + +## Syntax + +```asm +vminsb [VD], [VA], [VB] +``` + +## Encoding + +### `vminsb` — form `VX` + +- **Opcode word:** `0x10000302` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `770` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vminsb: read | Source A vector register. | +| `VB` | vminsb: read | Source B vector register. | +| `VD` | vminsb: write | Destination vector register. | + +## Register Effects + +### `vminsb` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vminsb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminsb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:906`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L906) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:499`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L499) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4415-4422`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4415-L4422) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vminsb => { + let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i8x16(ctx.vr[instr.rb()]); + let mut r = [0i8; 16]; + for i in 0..16 { r[i] = a[i].min(b[i]); } + ctx.vr[instr.rd()] = crate::vmx::from_i8x16(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-byte signed min.** Sixteen byte lanes; `VD[i] = min(int8(VA[i]), int8(VB[i]))`. Lane 0 is the most-significant byte. +- **Sign-aware ordering.** `vminsb(0xFF, 0x01) = 0xFF` (i.e. `min(-1, 1) = -1`), versus [`vminub`](vminub.md) which returns `0x01`. +- **No `VSCR` interaction, no XER, no exceptions.** +- **Common usage.** Pair with [`vmaxsb`](vmaxsb.md) for branchless `clamp(x, lo, hi)`. +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vmaxsb`](vmaxsb.md) — the matching maximum. +- [`vminub`](vminub.md) — same width, unsigned min. +- [`vminsh`](vminsh.md), [`vminsw`](vminsw.md) — signed min at half / word width. +- [`vcmpgtsb`](vcmpgtsb.md) — separate compare-and-mask path. +- [`vsel`](vsel.md) — alternative selection. + +## IBM Reference + +- [AIX 7.3 — `vminsb` (Vector Minimum Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vminsb-vector-minimum-signed-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vminsh.md b/migration/project-root/ppc-manual/vmx/vminsh.md new file mode 100644 index 0000000..44fa219 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vminsh.md @@ -0,0 +1,130 @@ +# `vminsh` — Vector Minimum Signed Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000342` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vminsh` | `vminsh` | — | Vector Minimum Signed Half Word | + +## Syntax + +```asm +vminsh [VD], [VA], [VB] +``` + +## Encoding + +### `vminsh` — form `VX` + +- **Opcode word:** `0x10000342` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `834` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vminsh: read | Source A vector register. | +| `VB` | vminsh: read | Source B vector register. | +| `VD` | vminsh: write | Destination vector register. | + +## Register Effects + +### `vminsh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vminsh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminsh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:913`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L913) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:506`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L506) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4447-4454`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4447-L4454) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vminsh => { + let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]); + let mut r = [0i16; 8]; + for i in 0..8 { r[i] = a[i].min(b[i]); } + ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-half signed min.** Eight half-word lanes; `VD[i] = min(int16(VA[i]), int16(VB[i]))`. Lane 0 is the most-significant half. +- **Sign-aware ordering.** `vminsh(0x8000, 0x0001) = 0x8000` (i.e. `min(-32768, 1) = -32768`). +- **No `VSCR` interaction, no XER, no exceptions.** +- **Common usage.** Q15 audio noise-floor computation; signed image-processing kernels. +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vmaxsh`](vmaxsh.md) — the matching maximum. +- [`vminuh`](vminuh.md) — same width, unsigned min. +- [`vminsb`](vminsb.md), [`vminsw`](vminsw.md) — signed min at byte / word width. +- [`vcmpgtsh`](vcmpgtsh.md) — separate compare-and-mask path. +- [`vsel`](vsel.md) — alternative selection. + +## IBM Reference + +- [AIX 7.3 — `vminsh` (Vector Minimum Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vminsh-vector-minimum-signed-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vminsw.md b/migration/project-root/ppc-manual/vmx/vminsw.md new file mode 100644 index 0000000..1bfe6ef --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vminsw.md @@ -0,0 +1,130 @@ +# `vminsw` — Vector Minimum Signed Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000382` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vminsw` | `vminsw` | — | Vector Minimum Signed Word | + +## Syntax + +```asm +vminsw [VD], [VA], [VB] +``` + +## Encoding + +### `vminsw` — form `VX` + +- **Opcode word:** `0x10000382` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `898` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vminsw: read | Source A vector register. | +| `VB` | vminsw: read | Source B vector register. | +| `VD` | vminsw: write | Destination vector register. | + +## Register Effects + +### `vminsw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vminsw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminsw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:920`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L920) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:513`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L513) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4479-4486`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4479-L4486) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vminsw => { + let a = crate::vmx::as_i32x4(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i32x4(ctx.vr[instr.rb()]); + let mut r = [0i32; 4]; + for i in 0..4 { r[i] = a[i].min(b[i]); } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-word signed min.** Four word lanes; `VD[i] = min(int32(VA[i]), int32(VB[i]))`. Lane 0 is the most-significant word. +- **Sign-aware ordering.** `vminsw(0x8000_0000, 0x0000_0001) = 0x8000_0000` (i.e. `min(INT32_MIN, 1) = INT32_MIN`). +- **No `VSCR` interaction, no XER, no exceptions.** +- **Common usage.** Z-buffer "keep furthest" updates, signed counter floors. +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vmaxsw`](vmaxsw.md) — the matching maximum. +- [`vminuw`](vminuw.md) — same width, unsigned min. +- [`vminsb`](vminsb.md), [`vminsh`](vminsh.md) — signed min at byte / half width. +- [`vcmpgtsw`](vcmpgtsw.md) — separate compare-and-mask path. +- [`vsel`](vsel.md) — alternative selection. + +## IBM Reference + +- [AIX 7.3 — `vminsw` (Vector Minimum Signed Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vminsw-vector-minimum-signed-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vminub.md b/migration/project-root/ppc-manual/vmx/vminub.md new file mode 100644 index 0000000..0007edd --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vminub.md @@ -0,0 +1,130 @@ +# `vminub` — Vector Minimum Unsigned Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000202` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vminub` | `vminub` | — | Vector Minimum Unsigned Byte | + +## Syntax + +```asm +vminub [VD], [VA], [VB] +``` + +## Encoding + +### `vminub` — form `VX` + +- **Opcode word:** `0x10000202` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `514` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vminub: read | Source A vector register. | +| `VB` | vminub: read | Source B vector register. | +| `VD` | vminub: write | Destination vector register. | + +## Register Effects + +### `vminub` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vminub`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminub"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:927`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L927) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:476`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L476) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4399-4406`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4399-L4406) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vminub => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u8; 16]; + for i in 0..16 { r[i] = a[i].min(b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-byte unsigned min.** Sixteen byte lanes; `VD[i] = min(uint8(VA[i]), uint8(VB[i]))`. Lane 0 is the most-significant byte. +- **Unsigned ordering.** `vminub(0xFF, 0x01) = 0x01`, opposite to [`vminsb`](vminsb.md). +- **No `VSCR` interaction, no XER, no exceptions.** +- **Common usage.** Pixel "darker of two" channel selection; alpha mask intersection. +- **Aliasing legal.** `vminub v3, v3, v4` clamps `v3`'s upper bound to `v4`. +- **No VMX128 sibling.** + +## Related Instructions + +- [`vmaxub`](vmaxub.md) — the matching maximum. +- [`vminsb`](vminsb.md) — same width, signed min. +- [`vminuh`](vminuh.md), [`vminuw`](vminuw.md) — unsigned min at half / word width. +- [`vcmpgtub`](vcmpgtub.md) — separate compare-and-mask path. +- [`vsel`](vsel.md) — alternative selection. + +## IBM Reference + +- [AIX 7.3 — `vminub` (Vector Minimum Unsigned Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vminub-vector-minimum-unsigned-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vminuh.md b/migration/project-root/ppc-manual/vmx/vminuh.md new file mode 100644 index 0000000..0fe7632 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vminuh.md @@ -0,0 +1,130 @@ +# `vminuh` — Vector Minimum Unsigned Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000242` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vminuh` | `vminuh` | — | Vector Minimum Unsigned Half Word | + +## Syntax + +```asm +vminuh [VD], [VA], [VB] +``` + +## Encoding + +### `vminuh` — form `VX` + +- **Opcode word:** `0x10000242` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `578` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vminuh: read | Source A vector register. | +| `VB` | vminuh: read | Source B vector register. | +| `VD` | vminuh: write | Destination vector register. | + +## Register Effects + +### `vminuh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vminuh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminuh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:935`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L935) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:483`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L483) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4431-4438`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4431-L4438) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vminuh => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u16; 8]; + for i in 0..8 { r[i] = a[i].min(b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-half unsigned min.** Eight half-word lanes; `VD[i] = min(uint16(VA[i]), uint16(VB[i]))`. Lane 0 is the most-significant half. +- **Unsigned ordering.** `vminuh(0xFFFF, 0x0001) = 0x0001`, opposite to [`vminsh`](vminsh.md). +- **No `VSCR` interaction, no XER, no exceptions.** +- **Common usage.** Audio sample magnitude floor; UTF-16 codepoint lower bound. +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vmaxuh`](vmaxuh.md) — the matching maximum. +- [`vminsh`](vminsh.md) — same width, signed min. +- [`vminub`](vminub.md), [`vminuw`](vminuw.md) — unsigned min at byte / word width. +- [`vcmpgtuh`](vcmpgtuh.md) — separate compare-and-mask path. +- [`vsel`](vsel.md) — alternative selection. + +## IBM Reference + +- [AIX 7.3 — `vminuh` (Vector Minimum Unsigned Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vminuh-vector-minimum-unsigned-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vminuw.md b/migration/project-root/ppc-manual/vmx/vminuw.md new file mode 100644 index 0000000..2270f30 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vminuw.md @@ -0,0 +1,130 @@ +# `vminuw` — Vector Minimum Unsigned Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000282` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vminuw` | `vminuw` | — | Vector Minimum Unsigned Word | + +## Syntax + +```asm +vminuw [VD], [VA], [VB] +``` + +## Encoding + +### `vminuw` — form `VX` + +- **Opcode word:** `0x10000282` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `642` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vminuw: read | Source A vector register. | +| `VB` | vminuw: read | Source B vector register. | +| `VD` | vminuw: write | Destination vector register. | + +## Register Effects + +### `vminuw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vminuw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vminuw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:943`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L943) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:103`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L103) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:490`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L490) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4463-4470`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4463-L4470) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vminuw => { + let a = ctx.vr[instr.ra()].as_u32x4(); + let b = ctx.vr[instr.rb()].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = a[i].min(b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-word unsigned min.** Four word lanes; `VD[i] = min(uint32(VA[i]), uint32(VB[i]))`. Lane 0 is the most-significant word. +- **Unsigned ordering.** `vminuw(0x8000_0000, 0x0000_0001) = 0x0000_0001`, opposite to [`vminsw`](vminsw.md). +- **No `VSCR` interaction, no XER, no exceptions.** +- **Common usage.** Hashtable bucket capacity floors, packed 32-bit ID lower bounds. +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vmaxuw`](vmaxuw.md) — the matching maximum. +- [`vminsw`](vminsw.md) — same width, signed min. +- [`vminub`](vminub.md), [`vminuh`](vminuh.md) — unsigned min at byte / half width. +- [`vcmpgtuw`](vcmpgtuw.md) — separate compare-and-mask path. +- [`vsel`](vsel.md) — alternative selection. + +## IBM Reference + +- [AIX 7.3 — `vminuw` (Vector Minimum Unsigned Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vminuw-vector-minimum-unsigned-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Min/Max](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmladduhm.md b/migration/project-root/ppc-manual/vmx/vmladduhm.md new file mode 100644 index 0000000..f8901b7 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmladduhm.md @@ -0,0 +1,141 @@ +# `vmladduhm` — Vector Multiply-Low and Add Unsigned Half Word Modulo + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000022` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmladduhm` | `vmladduhm` | — | Vector Multiply-Low and Add Unsigned Half Word Modulo | + +## Syntax + +```asm +vmladduhm [VD], [VA], [VB], [VC] +``` + +## Encoding + +### `vmladduhm` — form `VA` + +- **Opcode word:** `0x10000022` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `34` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21–25 | `VRC` | source C / shift | +| 26–31 | `XO` | extended opcode (6 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmladduhm: read | Source A vector register. | +| `VB` | vmladduhm: read | Source B vector register. | +| `VC` | vmladduhm: read | Source C vector register / 3-bit selector. | +| `VD` | vmladduhm: write | Destination vector register. | + +## Register Effects + +### `vmladduhm` + +- **Reads (always):** `VA`, `VB`, `VC` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmladduhm`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmladduhm"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:951`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L951) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:104`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L104) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:578`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L578) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3549-3560`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3549-L3560) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmladduhm => { + // Multiply-low add (modulo): vD[i] = u16(vA[i] * vB[i] + vC[i]). + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let c = ctx.vr[instr.rc()].as_u16x8(); + let mut r = [0u16; 8]; + for i in 0..8 { + r[i] = a[i].wrapping_mul(b[i]).wrapping_add(c[i]); + } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Modulo multiply-low add.** Eight half-word lanes; per lane: + ``` + VD[i] = (uint16(VA[i]) * uint16(VB[i]) + uint16(VC[i])) mod 2^16 + ``` + Only the **low** 16 bits of the 32-bit product survive — the "ml" in the mnemonic = "multiply low" (versus `vmh*` for "multiply high"). This is the fastest of the multiply-add family because nothing saturates and nothing rounds. +- **Sign-agnostic.** Modulo multiply for signed `int16` and unsigned `u16` is bit-identical at the low 16 bits, so this single instruction serves both. +- **No `VSCR[SAT]` change.** Wrap is silent. +- **No XER, no exceptions.** +- **Big-endian half lanes.** Lane 0 is the most-significant half. +- **Aliasing legal.** `vmladduhm v3, v3, v4, v3` is the standard accumulate idiom (same register as both `VA` and `VC`). +- **No VMX128 sibling.** +- **Common usage.** Stride / index computation in vector loops, RGBA8 component recombination after a [`vupkhsb`](vupkhsb.md), per-element polynomial evaluation at half precision. + +## Related Instructions + +- [`vmhaddshs`](vmhaddshs.md) — saturating high-half signed multiply-add (Q15). +- [`vmhraddshs`](vmhraddshs.md) — same, with rounding. +- [`vmsumuhm`](vmsumuhm.md), [`vmsummbm`](vmsummbm.md) — multiply-sum across pairs of lanes. +- [`vadduhm`](vadduhm.md), [`vmaxuh`](vmaxuh.md) — companion modulo / max ops at half width. + +## IBM Reference + +- [AIX 7.3 — `vmladduhm` (Vector Multiply-Low and Add Unsigned Half Word Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmladduhm-vector-multiply-low-add-unsigned-half-word-modulo-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Add Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmrghb.md b/migration/project-root/ppc-manual/vmx/vmrghb.md new file mode 100644 index 0000000..49c7d6c --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmrghb.md @@ -0,0 +1,131 @@ +# `vmrghb` — Vector Merge High Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000000c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmrghb` | `vmrghb` | — | Vector Merge High Byte | + +## Syntax + +```asm +vmrghb [VD], [VA], [VB] +``` + +## Encoding + +### `vmrghb` — form `VX` + +- **Opcode word:** `0x1000000c` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `12` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmrghb: read | Source A vector register. | +| `VB` | vmrghb: read | Source B vector register. | +| `VD` | vmrghb: write | Destination vector register. | + +## Register Effects + +### `vmrghb` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmrghb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrghb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:956`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L956) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:439`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L439) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3982-3989`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3982-L3989) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmrghb => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u8; 16]; + for i in 0..8 { r[2*i] = a[i]; r[2*i+1] = b[i]; } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Interleave the high (most-significant) eight bytes** of two vectors. After execution, `VD = {VA[0], VB[0], VA[1], VB[1], …, VA[7], VB[7]}`, i.e. the eight high-order bytes of `VA` are interleaved with the eight high-order bytes of `VB`. Because lane 0 is the most-significant byte (big-endian indexing), "high" means the byte that appears at the lowest address after `stvx`. +- **Pairs with [`vmrglb`](vmrglb.md).** Together they cover all 32 input bytes — `vmrghb` produces output of bytes 0..7 from each source, `vmrglb` of bytes 8..15. Two `vmrg*` instructions plus a [`stvx`](stvx.md) of each output produces the AoS-from-SoA transpose. +- **Useful for unpacking 8-bit channels.** `vmrghb vRG, vR, vG` followed by `vmrghb vRGBA, vRG, vBA` interleaves four byte-streams into RGBA pixels. +- **No `VSCR` interaction, no XER, no exceptions.** Pure permute. +- **Aliasing legal.** `vmrghb v3, v3, v3` doubles each high byte of `v3`. +- **No VMX128 sibling.** +- **Equivalent to x86 `_mm_unpackhi_epi8`** with operand orientation swapped (Altivec uses big-endian lane numbering, x86 little-endian, so "high" on PPC ↔ "low" lane indices on x86). + +## Related Instructions + +- [`vmrglb`](vmrglb.md) — the "low half" mirror. +- [`vmrghh`](vmrghh.md), [`vmrghw`](vmrghw.md) — high-half merge at half / word width. +- [`vperm`](vperm.md) — fully programmable permute when neither merge half fits. +- [`vsldoi`](vsldoi.md) — static-offset shift-double, often paired with `vmrg*` for AoS↔SoA conversions. +- [`vupkhsb`](vupkhsb.md) — sign-extending unpack of the high half. + +## IBM Reference + +- [AIX 7.3 — `vmrghb` (Vector Merge High Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmrghb-vector-merge-high-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute / Merge](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmrghh.md b/migration/project-root/ppc-manual/vmx/vmrghh.md new file mode 100644 index 0000000..d0e2fbd --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmrghh.md @@ -0,0 +1,131 @@ +# `vmrghh` — Vector Merge High Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000004c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmrghh` | `vmrghh` | — | Vector Merge High Half Word | + +## Syntax + +```asm +vmrghh [VD], [VA], [VB] +``` + +## Encoding + +### `vmrghh` — form `VX` + +- **Opcode word:** `0x1000004c` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `76` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmrghh: read | Source A vector register. | +| `VB` | vmrghh: read | Source B vector register. | +| `VD` | vmrghh: write | Destination vector register. | + +## Register Effects + +### `vmrghh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmrghh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrghh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:968`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L968) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:446`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L446) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3998-4005`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3998-L4005) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmrghh => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u16; 8]; + for i in 0..4 { r[2*i] = a[i]; r[2*i+1] = b[i]; } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Interleave the high (most-significant) four halves** of two vectors: `VD = {VA[0], VB[0], VA[1], VB[1], VA[2], VB[2], VA[3], VB[3]}`. +- **Pairs with [`vmrglh`](vmrglh.md)** to cover the eight halves of each source. Two-instruction transpose for half-word streams. +- **Common usage.** Interleave Q15 stereo audio: `vmrghh vL_R_high, vLeft, vRight` then `vmrglh vL_R_low, vLeft, vRight` and store to produce the natural L/R/L/R ordering. +- **Useful for half-precision colour split.** Merge two 4-channel half-precision streams. +- **No `VSCR` interaction, no XER, no exceptions.** Pure permute. +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vmrglh`](vmrglh.md) — the "low half" mirror. +- [`vmrghb`](vmrghb.md), [`vmrghw`](vmrghw.md) — high-half merge at byte / word width. +- [`vperm`](vperm.md) — programmable permute. +- [`vsldoi`](vsldoi.md) — static-offset shift-double. +- [`vupkhsh`](vupkhsh.md) — sign-extending unpack of the high half (4 halves → 4 words). + +## IBM Reference + +- [AIX 7.3 — `vmrghh` (Vector Merge High Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmrghh-vector-merge-high-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute / Merge](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmrghw.md b/migration/project-root/ppc-manual/vmx/vmrghw.md new file mode 100644 index 0000000..515d6ad --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmrghw.md @@ -0,0 +1,180 @@ +# `vmrghw` — Vector Merge High Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000008c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmrghw` | `vmrghw` | — | Vector Merge High Word | +| `vmrghw128` | `vmrghw128` | — | Vector128 Merge High Word | + +## Syntax + +```asm +vmrghw [VD], [VA], [VB] +vmrghw128 [VD], [VA], [VB] +``` + +## Encoding + +### `vmrghw` — form `VX` + +- **Opcode word:** `0x1000008c` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `140` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vmrghw128` — form `VX128` + +- **Opcode word:** `0x18000300` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `768` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmrghw: read; vmrghw128: read | Source A vector register. | +| `VB` | vmrghw: read; vmrghw128: read | Source B vector register. | +| `VD` | vmrghw: write; vmrghw128: write | Destination vector register. | + +## Register Effects + +### `vmrghw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vmrghw128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmrghw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrghw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:989`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L989) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:451`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L451) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2378-2385`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2378-L2385) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmrghw | PpcOpcode::vmrghw128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + // Merge high words: [a0, b0, a1, b1] + ctx.vr[vd] = xenia_types::Vec128::from_u32x4(a[0], b[0], a[1], b[1]); + ctx.pc += 4; + } +``` +
+ +**`vmrghw128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrghw128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:992`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L992) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:698`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L698) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2378-2385`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2378-L2385) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmrghw | PpcOpcode::vmrghw128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + // Merge high words: [a0, b0, a1, b1] + ctx.vr[vd] = xenia_types::Vec128::from_u32x4(a[0], b[0], a[1], b[1]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Interleave the high (most-significant) two words** of two vectors: `VD = {VA[0], VB[0], VA[1], VB[1]}`. Lane 0 (`VD[0..3]` after `stvx`) is the most-significant word. +- **Pairs with [`vmrglw`](vmrglw.md)** to cover the four words of each source. Two-instruction word-level transpose. +- **Common usage.** Interleave matrix rows during a 4×4 transpose: four `vmrgh*`/`vmrgl*` pairs swap rows and columns of a 4×4 packed-float matrix. +- **No `VSCR` interaction, no XER, no exceptions.** Pure permute. +- **Aliasing legal.** `vmrghw v3, v3, v3` doubles each high word. +- **VMX128 sibling (`vmrghw128`).** Identical semantics with the extended encoding; xenia routes via `vmx_reg_triple`. + +## Related Instructions + +- [`vmrglw`](vmrglw.md) — the "low half" mirror. +- [`vmrghb`](vmrghb.md), [`vmrghh`](vmrghh.md) — high-half merge at byte / half width. +- [`vperm`](vperm.md), [`vsldoi`](vsldoi.md) — programmable / static permute primitives. +- [`vupkhsh`](vupkhsh.md) — sign-extending unpack of the high half (4 halves → 4 words). +- [`vspltw`](vspltw.md) — broadcast a single word for blending tasks. + +## IBM Reference + +- [AIX 7.3 — `vmrghw` (Vector Merge High Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmrghw-vector-merge-high-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute / Merge](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmrglb.md b/migration/project-root/ppc-manual/vmx/vmrglb.md new file mode 100644 index 0000000..61cfbe7 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmrglb.md @@ -0,0 +1,130 @@ +# `vmrglb` — Vector Merge Low Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000010c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmrglb` | `vmrglb` | — | Vector Merge Low Byte | + +## Syntax + +```asm +vmrglb [VD], [VA], [VB] +``` + +## Encoding + +### `vmrglb` — form `VX` + +- **Opcode word:** `0x1000010c` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `268` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmrglb: read | Source A vector register. | +| `VB` | vmrglb: read | Source B vector register. | +| `VD` | vmrglb: write | Destination vector register. | + +## Register Effects + +### `vmrglb` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmrglb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrglb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:996`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L996) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:458`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L458) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3990-3997`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3990-L3997) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmrglb => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u8; 16]; + for i in 0..8 { r[2*i] = a[8+i]; r[2*i+1] = b[8+i]; } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Interleave the low (least-significant) eight bytes** of two vectors: `VD = {VA[8], VB[8], VA[9], VB[9], …, VA[15], VB[15]}`. "Low" in PPC big-endian terms means the eight bytes at the higher byte addresses after `stvx`. +- **Pairs with [`vmrghb`](vmrghb.md).** Together they cover all 32 input bytes — one `vmrghb` plus one `vmrglb` is the standard 16-byte "interleave-then-store" pattern. +- **Common usage.** Second half of an AoS-from-SoA transpose for 8-bit channels (the high half is produced by `vmrghb`, the low half by `vmrglb`). +- **No `VSCR` interaction, no XER, no exceptions.** Pure permute. +- **Aliasing legal.** `vmrglb v3, v3, v3` doubles each low byte of `v3`. +- **No VMX128 sibling.** +- **Equivalent to x86 `_mm_unpacklo_epi8`** modulo lane-numbering convention. + +## Related Instructions + +- [`vmrghb`](vmrghb.md) — the "high half" mirror. +- [`vmrglh`](vmrglh.md), [`vmrglw`](vmrglw.md) — low-half merge at half / word width. +- [`vperm`](vperm.md), [`vsldoi`](vsldoi.md) — programmable / static permute primitives. +- [`vupklsb`](vupklsb.md) — sign-extending unpack of the low half. + +## IBM Reference + +- [AIX 7.3 — `vmrglb` (Vector Merge Low Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmrglb-vector-merge-low-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute / Merge](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmrglh.md b/migration/project-root/ppc-manual/vmx/vmrglh.md new file mode 100644 index 0000000..2be233f --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmrglh.md @@ -0,0 +1,129 @@ +# `vmrglh` — Vector Merge Low Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000014c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmrglh` | `vmrglh` | — | Vector Merge Low Half Word | + +## Syntax + +```asm +vmrglh [VD], [VA], [VB] +``` + +## Encoding + +### `vmrglh` — form `VX` + +- **Opcode word:** `0x1000014c` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `332` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmrglh: read | Source A vector register. | +| `VB` | vmrglh: read | Source B vector register. | +| `VD` | vmrglh: write | Destination vector register. | + +## Register Effects + +### `vmrglh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmrglh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrglh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1008`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1008) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:464`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L464) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4006-4013`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4006-L4013) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmrglh => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u16; 8]; + for i in 0..4 { r[2*i] = a[4+i]; r[2*i+1] = b[4+i]; } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Interleave the low (least-significant) four halves** of two vectors: `VD = {VA[4], VB[4], VA[5], VB[5], VA[6], VB[6], VA[7], VB[7]}`. +- **Pairs with [`vmrghh`](vmrghh.md)** to interleave the entire 8-half source range. The two instructions plus a [`stvx`](stvx.md) of each result produces an interleaved 16-half stream from two 8-half streams. +- **Common usage.** Stereo Q15 audio interleave (low half of stream); paired with `vupklsh` for sign-extending unpack. +- **No `VSCR` interaction, no XER, no exceptions.** Pure permute. +- **Aliasing legal.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vmrghh`](vmrghh.md) — the "high half" mirror. +- [`vmrglb`](vmrglb.md), [`vmrglw`](vmrglw.md) — low-half merge at byte / word width. +- [`vperm`](vperm.md), [`vsldoi`](vsldoi.md) — programmable / static permute primitives. +- [`vupklsh`](vupklsh.md) — sign-extending unpack of the low half (4 halves → 4 words). + +## IBM Reference + +- [AIX 7.3 — `vmrglh` (Vector Merge Low Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmrglh-vector-merge-low-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute / Merge](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmrglw.md b/migration/project-root/ppc-manual/vmx/vmrglw.md new file mode 100644 index 0000000..5b6d6d0 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmrglw.md @@ -0,0 +1,179 @@ +# `vmrglw` — Vector Merge Low Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000018c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmrglw` | `vmrglw` | — | Vector Merge Low Word | +| `vmrglw128` | `vmrglw128` | — | Vector128 Merge Low Word | + +## Syntax + +```asm +vmrglw [VD], [VA], [VB] +vmrglw128 [VD], [VA], [VB] +``` + +## Encoding + +### `vmrglw` — form `VX` + +- **Opcode word:** `0x1000018c` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `396` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vmrglw128` — form `VX128` + +- **Opcode word:** `0x18000340` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `832` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmrglw: read; vmrglw128: read | Source A vector register. | +| `VB` | vmrglw: read; vmrglw128: read | Source B vector register. | +| `VD` | vmrglw: write; vmrglw128: write | Destination vector register. | + +## Register Effects + +### `vmrglw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vmrglw128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmrglw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrglw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1030`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1030) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:470`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L470) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2386-2393`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2386-L2393) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmrglw | PpcOpcode::vmrglw128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + // Merge low words: [a2, b2, a3, b3] + ctx.vr[vd] = xenia_types::Vec128::from_u32x4(a[2], b[2], a[3], b[3]); + ctx.pc += 4; + } +``` +
+ +**`vmrglw128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmrglw128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1033`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1033) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:105`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L105) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:699`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L699) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2386-2393`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2386-L2393) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmrglw | PpcOpcode::vmrglw128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + // Merge low words: [a2, b2, a3, b3] + ctx.vr[vd] = xenia_types::Vec128::from_u32x4(a[2], b[2], a[3], b[3]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Interleave the low (least-significant) two words** of two vectors: `VD = {VA[2], VB[2], VA[3], VB[3]}`. Lane 0 is the most-significant word. +- **Pairs with [`vmrghw`](vmrghw.md)** to cover the four words of each source. Two-instruction word-level transpose. +- **Common usage.** Bottom half of a 4×4 packed-float matrix transpose; second-half RGBA pixel re-pack after a `vmrghw`. +- **No `VSCR` interaction, no XER, no exceptions.** Pure permute. +- **Aliasing legal.** +- **VMX128 sibling (`vmrglw128`).** Identical semantics with the extended encoding; xenia routes both via `vmx_reg_triple`. + +## Related Instructions + +- [`vmrghw`](vmrghw.md) — the "high half" mirror. +- [`vmrglb`](vmrglb.md), [`vmrglh`](vmrglh.md) — low-half merge at byte / half width. +- [`vperm`](vperm.md), [`vsldoi`](vsldoi.md) — programmable / static permute primitives. +- [`vspltw`](vspltw.md) — broadcast a single word for blending tasks. + +## IBM Reference + +- [AIX 7.3 — `vmrglw` (Vector Merge Low Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmrglw-vector-merge-low-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute / Merge](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmsummbm.md b/migration/project-root/ppc-manual/vmx/vmsummbm.md new file mode 100644 index 0000000..353879c --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmsummbm.md @@ -0,0 +1,146 @@ +# `vmsummbm` — Vector Multiply-Sum Mixed-Sign Byte Modulo + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000025` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmsummbm` | `vmsummbm` | — | Vector Multiply-Sum Mixed-Sign Byte Modulo | + +## Syntax + +```asm +vmsummbm [VD], [VA], [VB], [VC] +``` + +## Encoding + +### `vmsummbm` — form `VA` + +- **Opcode word:** `0x10000025` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `37` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21–25 | `VRC` | source C / shift | +| 26–31 | `XO` | extended opcode (6 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmsummbm: read | Source A vector register. | +| `VB` | vmsummbm: read | Source B vector register. | +| `VC` | vmsummbm: read | Source C vector register / 3-bit selector. | +| `VD` | vmsummbm: write | Destination vector register. | + +## Register Effects + +### `vmsummbm` + +- **Reads (always):** `VA`, `VB`, `VC` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmsummbm`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsummbm"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1037`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1037) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:107`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L107) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:580`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L580) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3579-3594`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3579-L3594) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmsummbm => { + // signed bytes × unsigned bytes, signed accumulator + let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]); + let b = ctx.vr[instr.rb()].as_bytes(); + let c = crate::vmx::as_i32x4(ctx.vr[instr.rc()]); + let mut r = [0i32; 4]; + for i in 0..4 { + let mut s = c[i]; + for j in 0..4 { + s = s.wrapping_add(a[4*i+j] as i32 * b[4*i+j] as i32); + } + r[i] = s; + } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Mixed signed×unsigned multiply-sum, modulo.** The "m" / "b" / "m" of `vmsummbm` decode as: `m`=mixed (signed `VA` × unsigned `VB`), `b`=byte lanes, `m`=modulo accumulator. Per word lane: + ``` + VD[i] = (VC[i] + Σ_{j=0..3} int8(VA[4*i + j]) * uint8(VB[4*i + j])) mod 2^32 + ``` + Four signed-byte × unsigned-byte products are summed with a signed-word accumulator from `VC`, into a single signed word. +- **Mixed signedness is unique to this instruction** — it's the canonical "signed pixel weight × unsigned pixel value" combo for filter convolution. +- **No `VSCR[SAT]` change.** Modulo wrap; the saturating sibling for byte lanes does not exist (Altivec only provides a saturating `vmsum` for half-word widths). +- **Big-endian byte lanes.** Lane 0 is the most-significant byte; the four contributing bytes for output word `i` are bytes `4*i .. 4*i+3`. +- **No XER, no exceptions.** +- **Aliasing legal.** +- **No VMX128 sibling.** +- **Common usage.** Per-tile signed-weight pixel sums; H.264-style 4-tap signed filter on byte data. + +## Related Instructions + +- [`vmsumubm`](vmsumubm.md) — same shape, both sources unsigned (no signed weights). +- [`vmsumshm`](vmsumshm.md) / [`vmsumshs`](vmsumshs.md) — half-word × half-word multiply-sum (modulo / saturate). +- [`vmsumuhm`](vmsumuhm.md) / [`vmsumuhs`](vmsumuhs.md) — unsigned half-word multiply-sum. +- [`vmladduhm`](vmladduhm.md) — per-lane multiply-add at half width (no horizontal reduction). +- [`vsumsws`](vsumsws.md), [`vsum4sbs`](vsum4sbs.md) — pure horizontal sums. + +## IBM Reference + +- [AIX 7.3 — `vmsummbm` (Vector Multiply-Sum Mixed-Sign Byte Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmsummbm-vector-multiply-sum-mixed-sign-byte-modulo-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Sum Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmsumshm.md b/migration/project-root/ppc-manual/vmx/vmsumshm.md new file mode 100644 index 0000000..2ac9b6f --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmsumshm.md @@ -0,0 +1,144 @@ +# `vmsumshm` — Vector Multiply-Sum Signed Half Word Modulo + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000028` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmsumshm` | `vmsumshm` | — | Vector Multiply-Sum Signed Half Word Modulo | + +## Syntax + +```asm +vmsumshm [VD], [VA], [VB], [VC] +``` + +## Encoding + +### `vmsumshm` — form `VA` + +- **Opcode word:** `0x10000028` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `40` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21–25 | `VRC` | source C / shift | +| 26–31 | `XO` | extended opcode (6 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmsumshm: read | Source A vector register. | +| `VB` | vmsumshm: read | Source B vector register. | +| `VC` | vmsumshm: read | Source C vector register / 3-bit selector. | +| `VD` | vmsumshm: write | Destination vector register. | + +## Register Effects + +### `vmsumshm` + +- **Reads (always):** `VA`, `VB`, `VC` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmsumshm`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsumshm"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1042`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1042) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:107`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L107) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:583`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L583) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3625-3638`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3625-L3638) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmsumshm => { + let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]); + let c = crate::vmx::as_i32x4(ctx.vr[instr.rc()]); + let mut r = [0i32; 4]; + for i in 0..4 { + let s = (a[2*i] as i32 * b[2*i] as i32) + .wrapping_add(a[2*i+1] as i32 * b[2*i+1] as i32) + .wrapping_add(c[i]); + r[i] = s; + } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Signed half-word multiply-sum, modulo.** Per word lane: + ``` + VD[i] = (VC[i] + int16(VA[2*i]) * int16(VB[2*i]) + + int16(VA[2*i+1]) * int16(VB[2*i+1])) mod 2^32 + ``` + Two signed-half × signed-half products plus a signed-word accumulator → one signed word per output lane. +- **Modulo wrap, never saturates.** **`VSCR[SAT]` is not touched** — wraparound silently. Use [`vmsumshs`](vmsumshs.md) for the saturating variant. +- **Big-endian half lanes.** Lane 0 is the most-significant half; output word `i` consumes halves `2*i` and `2*i+1`. +- **No XER, no exceptions.** +- **Aliasing legal.** +- **No VMX128 sibling.** +- **Common usage.** Q15 dot products of paired audio samples, 2-tap signed FIR coefficients. + +## Related Instructions + +- [`vmsumshs`](vmsumshs.md) — same shape with saturating output. +- [`vmsumuhm`](vmsumuhm.md) / [`vmsumuhs`](vmsumuhs.md) — unsigned half multiply-sum. +- [`vmsumubm`](vmsumubm.md), [`vmsummbm`](vmsummbm.md) — multiply-sum at byte width. +- [`vmhaddshs`](vmhaddshs.md), [`vmhraddshs`](vmhraddshs.md) — per-lane multiply-add (no horizontal reduction). +- [`vsum2sws`](vsum2sws.md), [`vsumsws`](vsumsws.md) — pure horizontal sums. + +## IBM Reference + +- [AIX 7.3 — `vmsumshm` (Vector Multiply-Sum Signed Half Word Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmsumshm-vector-multiply-sum-signed-half-word-modulo-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Sum Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmsumshs.md b/migration/project-root/ppc-manual/vmx/vmsumshs.md new file mode 100644 index 0000000..15de23d --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmsumshs.md @@ -0,0 +1,149 @@ +# `vmsumshs` — Vector Multiply-Sum Signed Half Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000029` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmsumshs` | `vmsumshs` | — | Vector Multiply-Sum Signed Half Word Saturate | + +## Syntax + +```asm +vmsumshs [VD], [VA], [VB], [VC] +``` + +## Encoding + +### `vmsumshs` — form `VA` + +- **Opcode word:** `0x10000029` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `41` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21–25 | `VRC` | source C / shift | +| 26–31 | `XO` | extended opcode (6 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmsumshs: read | Source A vector register. | +| `VB` | vmsumshs: read | Source B vector register. | +| `VC` | vmsumshs: read | Source C vector register / 3-bit selector. | +| `VD` | vmsumshs: write | Destination vector register. | +| `VSCR` | vmsumshs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vmsumshs` + +- **Reads (always):** `VA`, `VB`, `VC` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vmsumshs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmsumshs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsumshs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1047`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1047) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:107`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L107) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:584`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L584) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3639-3655`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3639-L3655) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmsumshs => { + let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]); + let c = crate::vmx::as_i32x4(ctx.vr[instr.rc()]); + let mut r = [0i32; 4]; let mut sat = false; + for i in 0..4 { + // Running-sum saturation: accumulate in i64, clamp once at end. + let s = (a[2*i] as i64 * b[2*i] as i64) + + (a[2*i+1] as i64 * b[2*i+1] as i64) + + c[i] as i64; + let (v, o) = crate::vmx::sat_i64_to_i32(s); + r[i] = v; sat |= o; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Signed half-word multiply-sum, saturating.** Per word lane: + ``` + VD[i] = clamp(VC[i] + int16(VA[2*i]) * int16(VB[2*i]) + + int16(VA[2*i+1]) * int16(VB[2*i+1]), INT32_MIN, INT32_MAX) + ``` + Two signed-half × signed-half products plus a signed-word accumulator, clamped to `int32`. +- **Wide-then-clamp ordering.** Xenia accumulates into `i64` first and clamps the *final* sum to `int32`, exactly matching the IBM specification ([`crates/xenia-cpu/src/vmx.rs`](../../xenia-rs/crates/xenia-cpu/src/vmx.rs)). This avoids spurious mid-sum saturation that would happen if the products were clamped individually. +- **`VSCR[SAT]` is sticky-set** if any of the four lane sums saturates. Cleared only via [`mtvscr`](mtvscr.md). +- **Big-endian half lanes.** Lane 0 is the most-significant half. +- **No XER, no exceptions.** +- **Aliasing legal.** +- **No VMX128 sibling.** +- **Common usage.** High-precision dot products, audio FIR taps with overflow detection, signed-pixel filter convolution. + +## Related Instructions + +- [`vmsumshm`](vmsumshm.md) — same shape, modulo (no clamp, no SAT flag). +- [`vmsumuhs`](vmsumuhs.md) — unsigned half multiply-sum, saturating. +- [`vmsummbm`](vmsummbm.md), [`vmsumubm`](vmsumubm.md) — multiply-sum at byte width. +- [`vaddsws`](vaddsws.md) — saturating word add for further accumulation. +- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear `VSCR[SAT]`. + +## IBM Reference + +- [AIX 7.3 — `vmsumshs` (Vector Multiply-Sum Signed Half Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmsumshs-vector-multiply-sum-signed-half-word-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Sum Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmsumubm.md b/migration/project-root/ppc-manual/vmx/vmsumubm.md new file mode 100644 index 0000000..c27bf25 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmsumubm.md @@ -0,0 +1,143 @@ +# `vmsumubm` — Vector Multiply-Sum Unsigned Byte Modulo + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000024` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmsumubm` | `vmsumubm` | — | Vector Multiply-Sum Unsigned Byte Modulo | + +## Syntax + +```asm +vmsumubm [VD], [VA], [VB], [VC] +``` + +## Encoding + +### `vmsumubm` — form `VA` + +- **Opcode word:** `0x10000024` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `36` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21–25 | `VRC` | source C / shift | +| 26–31 | `XO` | extended opcode (6 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmsumubm: read | Source A vector register. | +| `VB` | vmsumubm: read | Source B vector register. | +| `VC` | vmsumubm: read | Source C vector register / 3-bit selector. | +| `VD` | vmsumubm: write | Destination vector register. | + +## Register Effects + +### `vmsumubm` + +- **Reads (always):** `VA`, `VB`, `VC` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmsumubm`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsumubm"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1052`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1052) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:107`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L107) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:579`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L579) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3564-3578`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3564-L3578) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmsumubm => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let c = ctx.vr[instr.rc()].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { + let mut s = c[i]; + for j in 0..4 { + s = s.wrapping_add(a[4*i+j] as u32 * b[4*i+j] as u32); + } + r[i] = s; + } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unsigned byte multiply-sum, modulo.** Per word lane: + ``` + VD[i] = (VC[i] + Σ_{j=0..3} uint8(VA[4*i + j]) * uint8(VB[4*i + j])) mod 2^32 + ``` + Four unsigned-byte × unsigned-byte products and an unsigned-word accumulator from `VC`, summed into one unsigned word per lane. +- **Modulo wrap, never saturates.** **`VSCR[SAT]` is not touched** — wraparound silently. +- **Big-endian byte lanes.** Lane 0 is the most-significant byte; output word `i` consumes bytes `4*i .. 4*i+3`. +- **No XER, no exceptions.** +- **Aliasing legal.** +- **No VMX128 sibling.** +- **Common usage.** Pixel-component dot products (RGBA × weights packed as bytes); 4-tap unsigned convolution; per-pixel "intensity sum" where the weights are byte-quantised. + +## Related Instructions + +- [`vmsummbm`](vmsummbm.md) — same shape, signed × unsigned (mixed-sign). +- [`vmsumuhm`](vmsumuhm.md) / [`vmsumuhs`](vmsumuhs.md) — unsigned half multiply-sum (modulo / saturate). +- [`vmsumshm`](vmsumshm.md) / [`vmsumshs`](vmsumshs.md) — signed half multiply-sum. +- [`vsum4ubs`](vsum4ubs.md) — pure horizontal sum of bytes into words. + +## IBM Reference + +- [AIX 7.3 — `vmsumubm` (Vector Multiply-Sum Unsigned Byte Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmsumubm-vector-multiply-sum-unsigned-byte-modulo-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Sum Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmsumuhm.md b/migration/project-root/ppc-manual/vmx/vmsumuhm.md new file mode 100644 index 0000000..8e6f1d2 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmsumuhm.md @@ -0,0 +1,143 @@ +# `vmsumuhm` — Vector Multiply-Sum Unsigned Half Word Modulo + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000026` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmsumuhm` | `vmsumuhm` | — | Vector Multiply-Sum Unsigned Half Word Modulo | + +## Syntax + +```asm +vmsumuhm [VD], [VA], [VB], [VC] +``` + +## Encoding + +### `vmsumuhm` — form `VA` + +- **Opcode word:** `0x10000026` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `38` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21–25 | `VRC` | source C / shift | +| 26–31 | `XO` | extended opcode (6 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmsumuhm: read | Source A vector register. | +| `VB` | vmsumuhm: read | Source B vector register. | +| `VC` | vmsumuhm: read | Source C vector register / 3-bit selector. | +| `VD` | vmsumuhm: write | Destination vector register. | + +## Register Effects + +### `vmsumuhm` + +- **Reads (always):** `VA`, `VB`, `VC` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmsumuhm`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsumuhm"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1057`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1057) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:107`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L107) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:581`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L581) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3595-3608`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3595-L3608) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmsumuhm => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let c = ctx.vr[instr.rc()].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { + let s = (a[2*i] as u32 * b[2*i] as u32) + .wrapping_add(a[2*i+1] as u32 * b[2*i+1] as u32) + .wrapping_add(c[i]); + r[i] = s; + } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unsigned half-word multiply-sum, modulo.** Per word lane: + ``` + VD[i] = (VC[i] + uint16(VA[2*i]) * uint16(VB[2*i]) + + uint16(VA[2*i+1]) * uint16(VB[2*i+1])) mod 2^32 + ``` + Two unsigned-half × unsigned-half products and an unsigned-word accumulator → one unsigned word per output lane. +- **Modulo wrap, never saturates.** **`VSCR[SAT]` is not touched.** +- **Big-endian half lanes.** Lane 0 is the most-significant half. +- **No XER, no exceptions.** +- **Aliasing legal.** +- **No VMX128 sibling.** +- **Common usage.** Unsigned 16-bit FIR taps; pair-wise component sums for half-precision colour data. + +## Related Instructions + +- [`vmsumuhs`](vmsumuhs.md) — same shape, saturating output. +- [`vmsumshm`](vmsumshm.md) / [`vmsumshs`](vmsumshs.md) — signed half multiply-sum. +- [`vmsumubm`](vmsumubm.md), [`vmsummbm`](vmsummbm.md) — multiply-sum at byte width. +- [`vmladduhm`](vmladduhm.md) — per-lane multiply-add at half width. + +## IBM Reference + +- [AIX 7.3 — `vmsumuhm` (Vector Multiply-Sum Unsigned Half Word Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmsumuhm-vector-multiply-sum-unsigned-half-word-modulo-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Sum Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmsumuhs.md b/migration/project-root/ppc-manual/vmx/vmsumuhs.md new file mode 100644 index 0000000..93508de --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmsumuhs.md @@ -0,0 +1,148 @@ +# `vmsumuhs` — Vector Multiply-Sum Unsigned Half Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x10000027` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmsumuhs` | `vmsumuhs` | — | Vector Multiply-Sum Unsigned Half Word Saturate | + +## Syntax + +```asm +vmsumuhs [VD], [VA], [VB], [VC] +``` + +## Encoding + +### `vmsumuhs` — form `VA` + +- **Opcode word:** `0x10000027` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `39` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21–25 | `VRC` | source C / shift | +| 26–31 | `XO` | extended opcode (6 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmsumuhs: read | Source A vector register. | +| `VB` | vmsumuhs: read | Source B vector register. | +| `VC` | vmsumuhs: read | Source C vector register / 3-bit selector. | +| `VD` | vmsumuhs: write | Destination vector register. | +| `VSCR` | vmsumuhs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vmsumuhs` + +- **Reads (always):** `VA`, `VB`, `VC` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vmsumuhs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmsumuhs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsumuhs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1062`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1062) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:107`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L107) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:582`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L582) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3609-3624`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3609-L3624) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmsumuhs => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let c = ctx.vr[instr.rc()].as_u32x4(); + let mut r = [0u32; 4]; let mut sat = false; + for i in 0..4 { + let s = (a[2*i] as u64 * b[2*i] as u64) + + (a[2*i+1] as u64 * b[2*i+1] as u64) + + c[i] as u64; + let (v, overflow) = if s > u32::MAX as u64 { (u32::MAX, true) } else { (s as u32, false) }; + r[i] = v; sat |= overflow; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unsigned half-word multiply-sum, saturating.** Per word lane: + ``` + VD[i] = clamp(VC[i] + uint16(VA[2*i]) * uint16(VB[2*i]) + + uint16(VA[2*i+1]) * uint16(VB[2*i+1]), 0, UINT32_MAX) + ``` + Two unsigned-half × unsigned-half products plus an unsigned-word accumulator, clamped to `uint32`. +- **Wide-then-clamp ordering.** Xenia accumulates into `u64` first and clamps the *final* sum to `u32` ([`crates/xenia-cpu/src/interpreter.rs`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs)) — matches the IBM spec. +- **`VSCR[SAT]` is sticky-set** if any lane clamps. Only the upper bound `0xFFFF_FFFF` ever triggers; unsigned overflow on the low side is impossible. +- **Big-endian half lanes.** Lane 0 is the most-significant half. +- **No XER, no exceptions.** +- **Aliasing legal.** +- **No VMX128 sibling.** +- **Common usage.** Per-pixel summed-area calculations with overflow detection; high-precision unsigned-half FIR convolution. + +## Related Instructions + +- [`vmsumuhm`](vmsumuhm.md) — same shape, modulo (no clamp, no SAT flag). +- [`vmsumshs`](vmsumshs.md) — signed half multiply-sum, saturating. +- [`vmsumubm`](vmsumubm.md), [`vmsummbm`](vmsummbm.md) — multiply-sum at byte width. +- [`vadduws`](vadduws.md) — unsigned saturating word add for further accumulation. +- [`mtvscr`](mtvscr.md) / [`mfvscr`](mfvscr.md) — read or clear `VSCR[SAT]`. + +## IBM Reference + +- [AIX 7.3 — `vmsumuhs` (Vector Multiply-Sum Unsigned Half Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmsumuhs-vector-multiply-sum-unsigned-half-word-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply-Sum Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmulesb.md b/migration/project-root/ppc-manual/vmx/vmulesb.md new file mode 100644 index 0000000..7301605 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmulesb.md @@ -0,0 +1,136 @@ +# `vmulesb` — Vector Multiply Even Signed Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000308` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmulesb` | `vmulesb` | — | Vector Multiply Even Signed Byte | + +## Syntax + +```asm +vmulesb [VD], [VA], [VB] +``` + +## Encoding + +### `vmulesb` — form `VX` + +- **Opcode word:** `0x10000308` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `776` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmulesb: read | Source A vector register. | +| `VB` | vmulesb: read | Source B vector register. | +| `VD` | vmulesb: write | Destination vector register. | + +## Register Effects + +### `vmulesb` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmulesb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmulesb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1086`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1086) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:108`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L108) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:501`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L501) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3469-3476`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3469-L3476) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmulesb => { + let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i8x16(ctx.vr[instr.rb()]); + let mut r = [0i16; 8]; + for i in 0..8 { r[i] = a[2 * i] as i16 * b[2 * i] as i16; } + ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Even-byte signed multiply, half-word result.** Per output half lane: + ``` + VD[i] = int16(int8(VA[2*i]) * int8(VB[2*i])) ; for i = 0..7 + ``` + Only the eight **even-indexed** byte lanes (lanes 0, 2, 4, …, 14 in big-endian) are read from each source. Each `int8 × int8` product is widened to `int16`, producing eight half-word results that fill all of `VD`. +- **No saturation, no `VSCR[SAT]`.** The full 16-bit product of two signed bytes always fits in `int16` (range `-127*-128 = +16256 .. +127*+127 = +16129` is well within `±32767`), so no clipping is needed — even at the bit-pattern extremes `(-128) * (-128) = +16384` is representable. +- **Pairs with [`vmulosb`](vmulosb.md)** (odd-byte sibling). Together they consume all 16 bytes; two instructions are needed for a "multiply every lane" 16×16-bit byte multiply. +- **Big-endian byte indexing.** Even byte indices `0, 2, 4, …, 14` correspond to the high-order halves of each half-word slot. +- **No XER, no exceptions.** +- **Aliasing legal.** +- **No VMX128 sibling.** +- **Common usage.** Signed-coefficient byte multiply for image filters; first half of a 16-byte signed multiply when paired with `vmulosb`. + +## Related Instructions + +- [`vmulosb`](vmulosb.md) — odd-byte sibling (lanes 1, 3, …, 15). +- [`vmuleub`](vmuleub.md), [`vmuloub`](vmuloub.md) — same split, unsigned. +- [`vmulesh`](vmulesh.md), [`vmulosh`](vmulosh.md) — same family at half width (→ word results). +- [`vmladduhm`](vmladduhm.md) — per-lane modulo multiply-add (low half only). +- [`vpkshus`](vpkshus.md) — saturating pack down from `int16` halves to `uint8` bytes (combine with `vmule*` for "scale + clamp" pipelines). + +## IBM Reference + +- [AIX 7.3 — `vmulesb` (Vector Multiply Even Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmulesb-vector-multiply-even-signed-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmulesh.md b/migration/project-root/ppc-manual/vmx/vmulesh.md new file mode 100644 index 0000000..9e00d6d --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmulesh.md @@ -0,0 +1,136 @@ +# `vmulesh` — Vector Multiply Even Signed Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000348` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmulesh` | `vmulesh` | — | Vector Multiply Even Signed Half Word | + +## Syntax + +```asm +vmulesh [VD], [VA], [VB] +``` + +## Encoding + +### `vmulesh` — form `VX` + +- **Opcode word:** `0x10000348` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `840` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmulesh: read | Source A vector register. | +| `VB` | vmulesh: read | Source B vector register. | +| `VD` | vmulesh: write | Destination vector register. | + +## Register Effects + +### `vmulesh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmulesh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmulesh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1091`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1091) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:108`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L108) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:508`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L508) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3501-3508`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3501-L3508) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmulesh => { + let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]); + let mut r = [0i32; 4]; + for i in 0..4 { r[i] = a[2 * i] as i32 * b[2 * i] as i32; } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Even-half signed multiply, word result.** Per output word lane: + ``` + VD[i] = int32(int16(VA[2*i]) * int16(VB[2*i])) ; for i = 0..3 + ``` + Only the four **even-indexed** half lanes (lanes 0, 2, 4, 6 in big-endian) are read. Each `int16 × int16` product is widened to `int32`, producing four word results. +- **No saturation, no `VSCR[SAT]`.** The full 32-bit product of two signed `int16` always fits — even `(-32768) * (-32768) = +1_073_741_824` is well within `INT32_MAX`. +- **Pairs with [`vmulosh`](vmulosh.md)** (odd-half sibling). Two instructions to cover all eight half lanes. +- **Big-endian half indexing.** Even half indices `0, 2, 4, 6` correspond to the high-order words of each word slot. +- **No XER, no exceptions.** +- **Aliasing legal.** +- **No VMX128 sibling.** +- **Common usage.** Q15 × Q15 dot products with full 32-bit precision; signed-half-coefficient FIR taps; first half of a "multiply every half lane" sequence when paired with `vmulosh`. + +## Related Instructions + +- [`vmulosh`](vmulosh.md) — odd-half sibling (lanes 1, 3, 5, 7). +- [`vmuleuh`](vmuleuh.md), [`vmulouh`](vmulouh.md) — same split, unsigned. +- [`vmulesb`](vmulesb.md), [`vmulosb`](vmulosb.md) — same family at byte width (→ half-word results). +- [`vmsumshm`](vmsumshm.md), [`vmsumshs`](vmsumshs.md) — signed half multiply-sum across pairs (different shape). +- [`vmladduhm`](vmladduhm.md) — per-lane modulo multiply-add (low half only). + +## IBM Reference + +- [AIX 7.3 — `vmulesh` (Vector Multiply Even Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmulesh-vector-multiply-even-signed-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Multiply Family](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmuleub.md b/migration/project-root/ppc-manual/vmx/vmuleub.md new file mode 100644 index 0000000..5e06b8b --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmuleub.md @@ -0,0 +1,131 @@ +# `vmuleub` — Vector Multiply Even Unsigned Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000208` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmuleub` | `vmuleub` | — | Vector Multiply Even Unsigned Byte | + +## Syntax + +```asm +vmuleub [VD], [VA], [VB] +``` + +## Encoding + +### `vmuleub` — form `VX` + +- **Opcode word:** `0x10000208` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `520` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmuleub: read | Source A vector register. | +| `VB` | vmuleub: read | Source B vector register. | +| `VD` | vmuleub: write | Destination vector register. | + +## Register Effects + +### `vmuleub` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmuleub`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmuleub"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1096`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1096) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:108`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L108) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:478`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L478) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3453-3460`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3453-L3460) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmuleub => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u16; 8]; + for i in 0..8 { r[i] = a[2 * i] as u16 * b[2 * i] as u16; } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Even-lane multiply.** Only the *even-indexed* bytes of `VA` and `VB` participate — lanes 0, 2, 4, 6, 8, 10, 12, 14 (big-endian indexing, MSB-first). Each unsigned-byte × unsigned-byte product widens to an unsigned 16-bit half-word and is written to the corresponding half-word of `VD`. The odd lanes are ignored. +- **Lane-count reduction.** Input has 16 byte lanes; output has 8 half-word lanes. The pairing is `VD.h[i] = VA.b[2*i] * VB.b[2*i]` for `i ∈ 0..7`. +- **No overflow possible.** 8-bit × 8-bit unsigned ≤ `0xFF * 0xFF = 0xFE01`, which fits in 16 bits. `VSCR[SAT]` is **not** touched; this is a modulo-equivalent op even though no modulo is needed. +- **Pair with [`vmuloub`](vmuloub.md) to get all 16 products.** Software that wants every byte × byte product typically issues `vmuleub` + `vmuloub` and then interleaves the two half-word vectors (`vmrghh`/`vmrglh`) or sums them (`vmsumubm`). +- **No `Rc`, no XER, no FPSCR.** VMX multiply never touches CR, CA, OV, or VSCR. +- **No VMX128 sibling.** Xbox 360 code that needs this pattern typically goes through [`vmsumubm`](vmsumubm.md) instead. + +## Related Instructions + +- [`vmuloub`](vmuloub.md) — odd-lane twin (bytes 1, 3, …, 15). +- [`vmulesb`](vmulesb.md), [`vmulosb`](vmulosb.md) — signed-byte even/odd multiplies. +- [`vmuleuh`](vmuleuh.md), [`vmulouh`](vmulouh.md) — unsigned-half-word even/odd multiplies (→ word lanes). +- [`vmulesh`](vmulesh.md), [`vmulosh`](vmulosh.md) — signed-half-word even/odd. +- [`vmsumubm`](vmsumubm.md) — fused multiply-sum unsigned-byte-modulo; often replaces the even/odd pair when the caller only needs the sum. +- [`vmrghh`](vmrghh.md), [`vmrglh`](vmrglh.md) — interleave the even/odd half-word results. + +## IBM Reference + +- [AIX 7.3 — `vmuleub` (Vector Multiply Even Unsigned Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmuleub-vector-multiply-even-unsigned-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmuleuh.md b/migration/project-root/ppc-manual/vmx/vmuleuh.md new file mode 100644 index 0000000..917fd61 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmuleuh.md @@ -0,0 +1,130 @@ +# `vmuleuh` — Vector Multiply Even Unsigned Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000248` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmuleuh` | `vmuleuh` | — | Vector Multiply Even Unsigned Half Word | + +## Syntax + +```asm +vmuleuh [VD], [VA], [VB] +``` + +## Encoding + +### `vmuleuh` — form `VX` + +- **Opcode word:** `0x10000248` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `584` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmuleuh: read | Source A vector register. | +| `VB` | vmuleuh: read | Source B vector register. | +| `VD` | vmuleuh: write | Destination vector register. | + +## Register Effects + +### `vmuleuh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmuleuh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmuleuh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1101`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1101) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:108`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L108) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:485`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L485) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3485-3492`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3485-L3492) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmuleuh => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = a[2 * i] as u32 * b[2 * i] as u32; } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Even-lane half-word multiply.** Only half-word lanes 0, 2, 4, 6 of `VA` and `VB` participate (big-endian indexing). Each 16×16 unsigned product widens to an unsigned 32-bit word and is written to the corresponding word lane of `VD`. The odd half-words are ignored. +- **Lane-count reduction.** 8 half-word input lanes → 4 word output lanes. Pairing is `VD.w[i] = VA.h[2*i] * VB.h[2*i]` for `i ∈ 0..3`. +- **No overflow possible.** `0xFFFF * 0xFFFF = 0xFFFE0001` — fits in 32 bits. `VSCR[SAT]` is untouched. +- **Pair with [`vmulouh`](vmulouh.md)** to multiply every half-word lane. Interleave the two vectors with `vmrghw`/`vmrglw` (word-granularity) to rebuild the full element order, or feed both into [`vmsumuhm`](vmsumuhm.md) variants. +- **No `Rc`, no XER, no FPSCR.** +- **No VMX128 sibling.** Xenon code that needs 16-bit lane multiplies usually goes through [`vmsumuhm`](vmsumuhm.md) / [`vmsumuhs`](vmsumuhs.md). + +## Related Instructions + +- [`vmulouh`](vmulouh.md) — odd-half-word twin. +- [`vmulesh`](vmulesh.md), [`vmulosh`](vmulosh.md) — signed-half-word even/odd. +- [`vmuleub`](vmuleub.md), [`vmuloub`](vmuloub.md) — byte-granularity even/odd (→ half-word lanes). +- [`vmsumuhm`](vmsumuhm.md), [`vmsumuhs`](vmsumuhs.md) — fused multiply-sum unsigned-half-word (modulo / saturating). +- [`vmrghw`](vmrghw.md), [`vmrglw`](vmrglw.md) — interleave results. + +## IBM Reference + +- [AIX 7.3 — `vmuleuh` (Vector Multiply Even Unsigned Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmuleuh-vector-multiply-even-unsigned-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmulosb.md b/migration/project-root/ppc-manual/vmx/vmulosb.md new file mode 100644 index 0000000..ac908fd --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmulosb.md @@ -0,0 +1,130 @@ +# `vmulosb` — Vector Multiply Odd Signed Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000108` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmulosb` | `vmulosb` | — | Vector Multiply Odd Signed Byte | + +## Syntax + +```asm +vmulosb [VD], [VA], [VB] +``` + +## Encoding + +### `vmulosb` — form `VX` + +- **Opcode word:** `0x10000108` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `264` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmulosb: read | Source A vector register. | +| `VB` | vmulosb: read | Source B vector register. | +| `VD` | vmulosb: write | Destination vector register. | + +## Register Effects + +### `vmulosb` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmulosb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmulosb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1106`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1106) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:109`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L109) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:456`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L456) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3477-3484`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3477-L3484) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmulosb => { + let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i8x16(ctx.vr[instr.rb()]); + let mut r = [0i16; 8]; + for i in 0..8 { r[i] = a[2 * i + 1] as i16 * b[2 * i + 1] as i16; } + ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Odd-lane signed-byte multiply.** Only the odd-indexed byte lanes (1, 3, 5, 7, 9, 11, 13, 15 — big-endian numbering) of `VA` and `VB` participate. Each pair is treated as signed 8-bit, multiplied, and sign-extended to a signed 16-bit result in the corresponding half-word of `VD`. Pairing: `VD.h[i] = (int8)VA.b[2*i+1] * (int8)VB.b[2*i+1]` for `i ∈ 0..7`. +- **Lane-count reduction.** 16 byte lanes → 8 half-word lanes. +- **No overflow.** `(-128) * (-128) = 0x4000`, `(127) * (127) = 0x3F01` — both fit in int16. `VSCR[SAT]` is **not** set. +- **Pair with [`vmulesb`](vmulesb.md)** to get every signed byte × byte product; interleave via `vmrghh`/`vmrglh`, or feed into [`vmsummbm`](vmsummbm.md) for a multiply-accumulate. +- **Signed vs. unsigned distinction.** The `s` in `vmulosb` makes the product arithmetic: negative operands sign-extend. Compare with [`vmuloub`](vmuloub.md) which zero-extends. +- **No `Rc`, no XER, no VSCR side-effect.** No VMX128 sibling. + +## Related Instructions + +- [`vmulesb`](vmulesb.md) — even-lane signed byte multiply. +- [`vmuloub`](vmuloub.md), [`vmuleub`](vmuleub.md) — unsigned byte twins. +- [`vmulosh`](vmulosh.md), [`vmulesh`](vmulesh.md) — signed half-word even/odd. +- [`vmsummbm`](vmsummbm.md) — fused signed-byte multiply-sum modulo. +- [`vmrghh`](vmrghh.md), [`vmrglh`](vmrglh.md) — interleave the even/odd half-word results. + +## IBM Reference + +- [AIX 7.3 — `vmulosb` (Vector Multiply Odd Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmulosb-vector-multiply-odd-signed-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmulosh.md b/migration/project-root/ppc-manual/vmx/vmulosh.md new file mode 100644 index 0000000..164977c --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmulosh.md @@ -0,0 +1,130 @@ +# `vmulosh` — Vector Multiply Odd Signed Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000148` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmulosh` | `vmulosh` | — | Vector Multiply Odd Signed Half Word | + +## Syntax + +```asm +vmulosh [VD], [VA], [VB] +``` + +## Encoding + +### `vmulosh` — form `VX` + +- **Opcode word:** `0x10000148` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `328` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmulosh: read | Source A vector register. | +| `VB` | vmulosh: read | Source B vector register. | +| `VD` | vmulosh: write | Destination vector register. | + +## Register Effects + +### `vmulosh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmulosh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmulosh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1111`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1111) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:109`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L109) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:462`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L462) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3509-3516`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3509-L3516) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmulosh => { + let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]); + let mut r = [0i32; 4]; + for i in 0..4 { r[i] = a[2 * i + 1] as i32 * b[2 * i + 1] as i32; } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Odd-lane signed half-word multiply.** Only half-word lanes 1, 3, 5, 7 of `VA` and `VB` (big-endian numbering) participate. Each pair is treated as signed 16-bit, multiplied, and sign-extended to a signed 32-bit word in `VD`. Pairing: `VD.w[i] = (int16)VA.h[2*i+1] * (int16)VB.h[2*i+1]` for `i ∈ 0..3`. +- **Lane-count reduction.** 8 half-word lanes → 4 word lanes. +- **No overflow.** `(-32768)*(-32768) = 0x40000000` — fits in int32. `VSCR[SAT]` is untouched. +- **Pair with [`vmulesh`](vmulesh.md)** for all eight products, then interleave with `vmrghw`/`vmrglw`. Feed into [`vmsumshm`](vmsumshm.md)/[`vmsumshs`](vmsumshs.md) for accumulation. +- **Signed arithmetic.** Negative inputs sign-extend before multiplication; contrast with [`vmulouh`](vmulouh.md). +- **No `Rc`, no XER.** No VMX128 sibling. + +## Related Instructions + +- [`vmulesh`](vmulesh.md) — even-lane signed half-word multiply. +- [`vmulouh`](vmulouh.md), [`vmuleuh`](vmuleuh.md) — unsigned half-word twins. +- [`vmulosb`](vmulosb.md), [`vmulesb`](vmulesb.md) — signed byte even/odd. +- [`vmhaddshs`](vmhaddshs.md), [`vmhraddshs`](vmhraddshs.md) — fused half-word fixed-point MAC variants. +- [`vmsumshm`](vmsumshm.md), [`vmsumshs`](vmsumshs.md) — signed multiply-sum modulo / saturating. + +## IBM Reference + +- [AIX 7.3 — `vmulosh` (Vector Multiply Odd Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmulosh-vector-multiply-odd-signed-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmuloub.md b/migration/project-root/ppc-manual/vmx/vmuloub.md new file mode 100644 index 0000000..8b3b527 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmuloub.md @@ -0,0 +1,130 @@ +# `vmuloub` — Vector Multiply Odd Unsigned Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000008` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmuloub` | `vmuloub` | — | Vector Multiply Odd Unsigned Byte | + +## Syntax + +```asm +vmuloub [VD], [VA], [VB] +``` + +## Encoding + +### `vmuloub` — form `VX` + +- **Opcode word:** `0x10000008` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `8` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmuloub: read | Source A vector register. | +| `VB` | vmuloub: read | Source B vector register. | +| `VD` | vmuloub: write | Destination vector register. | + +## Register Effects + +### `vmuloub` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmuloub`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmuloub"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1116`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1116) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:109`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L109) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:437`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L437) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3461-3468`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3461-L3468) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmuloub => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u16; 8]; + for i in 0..8 { r[i] = a[2 * i + 1] as u16 * b[2 * i + 1] as u16; } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Odd-lane unsigned byte multiply.** Only the odd-indexed byte lanes (1, 3, 5, 7, 9, 11, 13, 15 — big-endian) of `VA` and `VB` participate. Each 8×8 unsigned product widens to an unsigned 16-bit half-word in the corresponding half-word of `VD`. Pairing: `VD.h[i] = VA.b[2*i+1] * VB.b[2*i+1]` for `i ∈ 0..7`. +- **Lane-count reduction.** 16 byte lanes → 8 half-word lanes. +- **No overflow.** `0xFF * 0xFF = 0xFE01` fits in 16 bits. `VSCR[SAT]` is not touched. +- **Pair with [`vmuleub`](vmuleub.md)** to get every byte product; re-interleave with `vmrghh`/`vmrglh`, or use [`vmsumubm`](vmsumubm.md) for a fused multiply-accumulate. +- **Unsigned arithmetic.** No sign-extension; negatives don't exist for `b` lanes in this op. Contrast with [`vmulosb`](vmulosb.md). +- **No `Rc`, no XER.** No VMX128 sibling. + +## Related Instructions + +- [`vmuleub`](vmuleub.md) — even-lane unsigned byte twin. +- [`vmulosb`](vmulosb.md), [`vmulesb`](vmulesb.md) — signed byte even/odd. +- [`vmulouh`](vmulouh.md), [`vmuleuh`](vmuleuh.md) — unsigned half-word even/odd. +- [`vmsumubm`](vmsumubm.md) — fused unsigned-byte multiply-sum. +- [`vmrghh`](vmrghh.md), [`vmrglh`](vmrglh.md) — interleave even/odd half-word products. + +## IBM Reference + +- [AIX 7.3 — `vmuloub` (Vector Multiply Odd Unsigned Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmuloub-vector-multiply-odd-unsigned-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vmulouh.md b/migration/project-root/ppc-manual/vmx/vmulouh.md new file mode 100644 index 0000000..590154f --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vmulouh.md @@ -0,0 +1,130 @@ +# `vmulouh` — Vector Multiply Odd Unsigned Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000048` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmulouh` | `vmulouh` | — | Vector Multiply Odd Unsigned Half Word | + +## Syntax + +```asm +vmulouh [VD], [VA], [VB] +``` + +## Encoding + +### `vmulouh` — form `VX` + +- **Opcode word:** `0x10000048` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `72` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmulouh: read | Source A vector register. | +| `VB` | vmulouh: read | Source B vector register. | +| `VD` | vmulouh: write | Destination vector register. | + +## Register Effects + +### `vmulouh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmulouh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmulouh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1121`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1121) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:109`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L109) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:444`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L444) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3493-3500`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3493-L3500) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmulouh => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = a[2 * i + 1] as u32 * b[2 * i + 1] as u32; } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Odd-lane unsigned half-word multiply.** Only half-word lanes 1, 3, 5, 7 (big-endian) of `VA` and `VB` participate. Each 16×16 unsigned product widens to a 32-bit word in `VD`. Pairing: `VD.w[i] = VA.h[2*i+1] * VB.h[2*i+1]` for `i ∈ 0..3`. +- **Lane-count reduction.** 8 half-word lanes → 4 word lanes. +- **No overflow.** `0xFFFF * 0xFFFF = 0xFFFE0001` fits in uint32. `VSCR[SAT]` is untouched. +- **Pair with [`vmuleuh`](vmuleuh.md)** to multiply every half-word; interleave via `vmrghw`/`vmrglw`, or feed into [`vmsumuhm`](vmsumuhm.md)/[`vmsumuhs`](vmsumuhs.md). +- **Unsigned arithmetic.** Zero-extension; contrast with [`vmulosh`](vmulosh.md). +- **No `Rc`, no XER.** No VMX128 sibling. + +## Related Instructions + +- [`vmuleuh`](vmuleuh.md) — even-lane unsigned half-word twin. +- [`vmulosh`](vmulosh.md), [`vmulesh`](vmulesh.md) — signed half-word even/odd. +- [`vmuloub`](vmuloub.md), [`vmuleub`](vmuleub.md) — byte-granularity even/odd. +- [`vmsumuhm`](vmsumuhm.md), [`vmsumuhs`](vmsumuhs.md) — fused unsigned multiply-sum modulo / saturating. +- [`vmrghw`](vmrghw.md), [`vmrglw`](vmrglw.md) — interleave word results. + +## IBM Reference + +- [AIX 7.3 — `vmulouh` (Vector Multiply Odd Unsigned Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vmulouh-vector-multiply-odd-unsigned-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vnmsubfp.md b/migration/project-root/ppc-manual/vmx/vnmsubfp.md new file mode 100644 index 0000000..3479117 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vnmsubfp.md @@ -0,0 +1,194 @@ +# `vnmsubfp` — Vector Negative Multiply-Subtract Floating Point + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x1000002f` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vnmsubfp` | `vnmsubfp` | — | Vector Negative Multiply-Subtract Floating Point | +| `vnmsubfp128` | `vnmsubfp128` | — | Vector128 Negative Multiply-Subtract Floating Point | + +## Syntax + +```asm +vnmsubfp [VD], [VA], [VC], [VB] +vnmsubfp128 [VD], [VA], [VD], [VB] +``` + +## Encoding + +### `vnmsubfp` — form `VA` + +- **Opcode word:** `0x1000002f` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `47` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21–25 | `VRC` | source C / shift | +| 26–31 | `XO` | extended opcode (6 bits) | + +### `vnmsubfp128` — form `VX128` + +- **Opcode word:** `0x14000150` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `336` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vnmsubfp: read; vnmsubfp128: read | Source A vector register. | +| `VC` | vnmsubfp: read | Source C vector register / 3-bit selector. | +| `VB` | vnmsubfp: read; vnmsubfp128: read | Source B vector register. | +| `VD` | vnmsubfp: write; vnmsubfp128: read; vnmsubfp128: write | Destination vector register. | + +## Register Effects + +### `vnmsubfp` + +- **Reads (always):** `VA`, `VC`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vnmsubfp128` + +- **Reads (always):** `VA`, `VD`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +for each 32-bit float lane i in 0..3: + VD[i] <- −((VA[i] * VC[i]) − VB[i]) +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vnmsubfp`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vnmsubfp"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1154`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1154) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:110`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L110) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:589`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L589) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2074-2089`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2074-L2089) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vnmsubfp => { + // vD = -(vA * vC - vB) = vB - vA * vC. Same denorm-flush rule as vmaddfp. + let a = ctx.vr[instr.ra()].as_f32x4(); + let b = ctx.vr[instr.rb()].as_f32x4(); + let c = ctx.vr[instr.rc()].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { + let ai = vmx::flush_denorm(a[i]); + let bi = vmx::flush_denorm(b[i]); + let ci = vmx::flush_denorm(c[i]); + // PPCBUG-426: single FMA rounding instead of two-step (b - a*c). + r[i] = vmx::flush_denorm(-ai.mul_add(ci, -bi)); + } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vnmsubfp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vnmsubfp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1157`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1157) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:110`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L110) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:615`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L615) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2090-2107`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2090-L2107) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vnmsubfp128 => { + // VMX128 form: vD <- -((vA * vB) - vD) = vD - (vA * vB). Canary + // routes through `InstrEmit_vnmsubfp_` with the same arg-swap, + // which flushes all inputs unconditionally. + let a = ctx.vr[instr.va128()].as_f32x4(); + let b = ctx.vr[instr.vb128()].as_f32x4(); + let d = ctx.vr[instr.vd128()].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { + let ai = vmx::flush_denorm(a[i]); + let bi = vmx::flush_denorm(b[i]); + let di = vmx::flush_denorm(d[i]); + // PPCBUG-427: single FMA rounding. + r[i] = vmx::flush_denorm(-ai.mul_add(bi, -di)); + } + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Lane-wise negative multiply-subtract.** Each of the four lanes computes `VD[i] = −((VA[i] × VC[i]) − VB[i])`, i.e. `VB[i] − VA[i] × VC[i]`. The multiply and the subsequent add are **not** a single fused rounding step in xenia — they're a multiply, a subtract, then a negate — but the PowerPC ISA specifies the sequence to behave *as if* it were fused (single IEEE-754 rounding). Hardware Xenon indeed rounds only once. +- **IEEE-754 binary32 lanes.** Follows `VSCR[NJ]`: denormal inputs/outputs flush to zero when `NJ = 1`. +- **No VSCR[SAT] update.** VMX float ops never set saturation. +- **No FPSCR effect.** Unlike scalar `fnmsub[s]`, `vnmsubfp` does not touch FPSCR. +- **NaN propagation.** A NaN in any of `VA`, `VB`, or `VC` yields a NaN in the corresponding lane. Sign-of-NaN is unspecified but stable in xenia (matches the x86 host's `vfnmadd`-family output). +- **Big-endian lane indexing.** Lane 0 is the MSB-most 4 bytes. +- **VMX128 sibling: [`vnmsubfp128`](vnmsubfp128.md).** Identical operation with access to `v0..v127`. +- **No `Rc` bit** on this opcode; it never touches CR. + +## Related Instructions + +- [`vmaddfp`](vmaddfp.md) — the positive-rounded fused MAC `(VA × VC) + VB`. +- [`vaddfp`](vaddfp.md), [`vsubfp`](vsubfp.md) — the underlying adds/subs. +- [`vmulfp`](vmulfp.md) — xenia-convenience lane-wise float multiply (no native Altivec form; usually encoded as `vmaddfp VD, VA, VC, v0_zero`). +- [`vrefp`](vrefp.md), [`vrsqrtefp`](vrsqrtefp.md) — Newton iterations that pair with `vnmsubfp`. +- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — the other float-arithmetic primitives. + +## IBM Reference + +- [AIX 7.3 — `vnmsubfp` (Vector Negative Multiply-Subtract Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vnmsubfp-vector-negative-multiply-subtract-floating-point-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vnor.md b/migration/project-root/ppc-manual/vmx/vnor.md new file mode 100644 index 0000000..a4fbbcf --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vnor.md @@ -0,0 +1,182 @@ +# `vnor` — Vector Logical NOR + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000504` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vnor` | `vnor` | — | Vector Logical NOR | +| `vnor128` | `vnor128` | — | Vector128 Logical NOR | + +## Syntax + +```asm +vnor [VD], [VA], [VB] +vnor128 [VD], [VA], [VB] +``` + +## Encoding + +### `vnor` — form `VX` + +- **Opcode word:** `0x10000504` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1284` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vnor128` — form `VX128` + +- **Opcode word:** `0x14000290` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `656` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vnor: read; vnor128: read | Source A vector register. | +| `VB` | vnor: read; vnor128: read | Source B vector register. | +| `VD` | vnor: write; vnor128: write | Destination vector register. | + +## Register Effects + +### `vnor` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vnor128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vnor`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vnor"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1168`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1168) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:110`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L110) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:534`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L534) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2244-2252`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2244-L2252) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vnor | PpcOpcode::vnor128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = !(a[i] | b[i]); } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vnor128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vnor128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1171`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1171) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:110`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L110) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:623`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L623) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2244-2252`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2244-L2252) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vnor | PpcOpcode::vnor128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = !(a[i] | b[i]); } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Bitwise NOR across the full 128-bit register.** `VD = ~(VA | VB)`. The operation is lane-agnostic; PPC documents it per-bit, xenia implements it as four 32-bit lanes for convenience but the result is identical to 16-byte or 8-half-word decomposition. +- **`vnor VD, VA, VA` is the idiomatic `vnot`** (bitwise complement of `VA`). No dedicated `vnot` exists in base Altivec. +- **Aliasing is legal.** `vnor v3, v3, v4` or `vnor v3, v3, v3` are well-defined and common. +- **No flags.** No CR, XER, VSCR side-effect. +- **VMX128 sibling [`vnor128`](vnor128.md)** provides the same op with access to `v0..v127`; xenia shares the interpreter arm (`vmx_reg_triple` selects the right encoding helper). +- **Useful for mask inversion.** When a compare result needs to be inverted — e.g. "where not equal" — `vnor` of the compare result with itself is cheaper than a dedicated inversion. + +## Related Instructions + +- [`vand`](vand.md) — bitwise AND. +- [`vandc`](vandc.md) — `VA & ~VB` (useful as a fused inversion on the B side). +- [`vor`](vor.md), [`vxor`](vxor.md) — complete the boolean-primitive set. +- [`vsel`](vsel.md) — three-input bit-select; often fed by the inverse of a compare mask. +- [`vcmpequb`](vcmpequb.md) and related compares — produce the masks `vnor` is typically applied to. + +## IBM Reference + +- [AIX 7.3 — `vnor` (Vector Logical NOR)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vnor-vector-logical-nor-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 3 — Logical Operations](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vor.md b/migration/project-root/ppc-manual/vmx/vor.md new file mode 100644 index 0000000..479c4a3 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vor.md @@ -0,0 +1,181 @@ +# `vor` — Vector Logical OR + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000484` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vor` | `vor` | — | Vector Logical OR | +| `vor128` | `vor128` | — | Vector128 Logical OR | + +## Syntax + +```asm +vor [VD], [VA], [VB] +vor128 [VD], [VA], [VB] +``` + +## Encoding + +### `vor` — form `VX` + +- **Opcode word:** `0x10000484` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1156` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vor128` — form `VX128` + +- **Opcode word:** `0x140002d0` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `720` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vor: read; vor128: read | Source A vector register. | +| `VB` | vor: read; vor128: read | Source B vector register. | +| `VD` | vor: write; vor128: write | Destination vector register. | + +## Register Effects + +### `vor` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vor128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vor`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vor"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1186`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1186) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:111`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L111) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:531`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L531) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2226-2234`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2226-L2234) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vor | PpcOpcode::vor128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = a[i] | b[i]; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vor128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vor128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1189`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1189) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:111`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L111) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:625`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L625) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2226-2234`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2226-L2234) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vor | PpcOpcode::vor128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = a[i] | b[i]; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Bitwise OR across the full 128-bit register.** Lane-agnostic; xenia implements it as four 32-bit lanes but the result is identical at any granularity. +- **`vor VD, VA, VA` is the idiomatic register move.** No dedicated "vmr" exists in base Altivec; compilers recognise the `vor v3, v4, v4` pattern as a move and schedule accordingly. +- **Aliasing is legal.** `vor v3, v3, v4` merges the mask in `v4` into `v3`. +- **No flags, no VSCR effect.** +- **VMX128 sibling [`vor128`](vor128.md).** Same operation, wider register file. +- **Common pattern: ORing a compare mask with a data vector** to force specific lanes to all-ones without needing a select. + +## Related Instructions + +- [`vand`](vand.md), [`vandc`](vandc.md) — AND / AND-with-complement. +- [`vnor`](vnor.md) — NOR, includes the idiom for bitwise NOT. +- [`vxor`](vxor.md) — XOR; also a common "zero register" via `vxor vD, vD, vD`. +- [`vsel`](vsel.md) — three-operand bit-select, often combined with OR of masks. + +## IBM Reference + +- [AIX 7.3 — `vor` (Vector Logical OR)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vor-vector-logical-or-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 3 — Logical Operations](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vperm.md b/migration/project-root/ppc-manual/vmx/vperm.md new file mode 100644 index 0000000..e13be40 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vperm.md @@ -0,0 +1,216 @@ +# `vperm` — Vector Permute + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x1000002b` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vperm` | `vperm` | — | Vector Permute | +| `vperm128` | `vperm128` | — | Vector128 Permute | + +## Syntax + +```asm +vperm [VD], [VA], [VB], [VC] +vperm128 [VD], [VA], [VB], [VC] +``` + +## Encoding + +### `vperm` — form `VA` + +- **Opcode word:** `0x1000002b` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `43` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21–25 | `VRC` | source C / shift | +| 26–31 | `XO` | extended opcode (6 bits) | + +### `vperm128` — form `VX128_2` + +- **Opcode word:** `0x14000000` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `0` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 23–25 | `VC` | source C 3-bit field | +| 26 | `VA128h` | source A middle bit | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vperm: read; vperm128: read | Source A vector register. | +| `VB` | vperm: read; vperm128: read | Source B vector register. | +| `VC` | vperm: read; vperm128: read | Source C vector register / 3-bit selector. | +| `VD` | vperm: write; vperm128: write | Destination vector register. | + +## Register Effects + +### `vperm` + +- **Reads (always):** `VA`, `VB`, `VC` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vperm128` + +- **Reads (always):** `VA`, `VB`, `VC` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vperm`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vperm"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1199`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1199) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:112`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L112) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:586`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L586) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2278-2302`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2278-L2302) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vperm | PpcOpcode::vperm128 => { + let (va, vb, vd); + let vc; + if matches!(instr.opcode, PpcOpcode::vperm128) { + va = instr.va128(); + vb = instr.vb128(); + vd = instr.vd128(); + vc = instr.vc128_2(); + } else { + va = instr.ra(); + vb = instr.rb(); + vd = instr.rd(); + vc = instr.rc(); + } + let a_bytes = ctx.vr[va].as_bytes(); + let b_bytes = ctx.vr[vb].as_bytes(); + let c_bytes = ctx.vr[vc].as_bytes(); + let mut r = [0u8; 16]; + for i in 0..16 { + let idx = (c_bytes[i] & 0x1F) as usize; + r[i] = if idx < 16 { a_bytes[idx] } else { b_bytes[idx - 16] }; + } + ctx.vr[vd] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ +**`vperm128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vperm128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1202`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1202) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:112`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L112) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:605`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L605) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2278-2302`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2278-L2302) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vperm | PpcOpcode::vperm128 => { + let (va, vb, vd); + let vc; + if matches!(instr.opcode, PpcOpcode::vperm128) { + va = instr.va128(); + vb = instr.vb128(); + vd = instr.vd128(); + vc = instr.vc128_2(); + } else { + va = instr.ra(); + vb = instr.rb(); + vd = instr.rd(); + vc = instr.rc(); + } + let a_bytes = ctx.vr[va].as_bytes(); + let b_bytes = ctx.vr[vb].as_bytes(); + let c_bytes = ctx.vr[vc].as_bytes(); + let mut r = [0u8; 16]; + for i in 0..16 { + let idx = (c_bytes[i] & 0x1F) as usize; + r[i] = if idx < 16 { a_bytes[idx] } else { b_bytes[idx - 16] }; + } + ctx.vr[vd] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-byte selector drives a cross-vector permute.** Each byte of `VC` is a 5-bit selector (low 5 bits used, upper 3 bits ignored). Bit 3 of that 5-bit field (i.e. the "16 bit") chooses which source: 0 selects from `VA`, 1 selects from `VB`. The low 4 bits index a byte within the chosen 16-byte operand. +- **`vperm` is the universal "16-byte reshuffle" primitive.** It can express any byte-level permutation of 32 source bytes (`VA ‖ VB`) down to 16 destination bytes, including duplicates and drops. +- **Big-endian byte indexing.** `VC.b[0]` controls `VD.b[0]` (the MSB byte). Selector value 0 picks `VA.b[0]`, value 15 picks `VA.b[15]`, value 16 picks `VB.b[0]`, value 31 picks `VB.b[15]`. +- **Upper 3 bits of each `VC` byte are ignored.** Only bits 3..7 (the low 5) are consulted, so values like 0x1F and 0x5F both mean "byte 15 of VB". Software can use those upper bits for its own tagging. +- **Pair with [`lvsl`](lvsl.md) / [`lvsr`](lvsr.md) for unaligned 16-byte loads.** `lvsl` produces the selector that shifts "left" by `EA & 0xF` bytes; feeding that into `vperm` with two aligned `lvx` results yields the unaligned 16-byte view. +- **Aliasing legal.** `VD` may equal `VA` or `VB`. +- **VMX128 sibling [`vperm128`](vperm128.md).** Same shape with the 7-bit register file. The VMX128 encoding carries `VC` in the 3-bit `VC` sub-field of the `VX128_2` form — which only lets `VC` select one of **8** specific registers, not 128. In xenia's decoder this is `vc128()`. +- **No flags, no VSCR side-effect.** + +## Related Instructions + +- [`vsldoi`](vsldoi.md) — static-shift-by-`SHB` form; when the shift is a compile-time constant this is cheaper than `lvsl`+`vperm`. +- [`lvsl`](lvsl.md), [`lvsr`](lvsr.md) — generate the permute mask from an effective address. +- [`vmrghb`](vmrghb.md), [`vmrglb`](vmrglb.md), [`vmrghh`](vmrghh.md), [`vmrglh`](vmrglh.md), [`vmrghw`](vmrghw.md), [`vmrglw`](vmrglw.md) — dedicated merges that are a subset of `vperm`. +- [`vspltb`](vspltb.md), [`vsplth`](vsplth.md), [`vspltw`](vspltw.md) — splat-from-lane, also expressible via `vperm` + a constant mask. +- [`vpkuhum`](vpkuhum.md) and other `vpk*` — narrower-lane packs whose pattern can also be encoded in `vperm`. + +## IBM Reference + +- [AIX 7.3 — `vperm` (Vector Permute)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vperm-vector-permute-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vpkpx.md b/migration/project-root/ppc-manual/vmx/vpkpx.md new file mode 100644 index 0000000..28eb4a0 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vpkpx.md @@ -0,0 +1,130 @@ +# `vpkpx` — Vector Pack Pixel + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000030e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vpkp` | `vpkpx` | — | Vector Pack Pixel | + +## Syntax + +```asm +vpkpx [VD], [VA], [VB] +``` + +## Encoding + +### `vpkpx` — form `VX` + +- **Opcode word:** `0x1000030e` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `782` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vpkpx: read | Source A vector register. | +| `VB` | vpkpx: read | Source B vector register. | +| `VD` | vpkpx: write | Destination vector register. | + +## Register Effects + +### `vpkpx` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vpkpx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkpx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1810`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1810) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:113`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L113) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:504`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L504) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4123-4131`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4123-L4131) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkpx => { + let a = ctx.vr[instr.ra()].as_u32x4(); + let b = ctx.vr[instr.rb()].as_u32x4(); + let mut r = [0u16; 8]; + for i in 0..4 { r[i] = crate::vmx::pack_pixel_555(a[i]); } + for i in 0..4 { r[4 + i] = crate::vmx::pack_pixel_555(b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Pack 4×4 pixel words → 8×16-bit 1-5-5-5 pixels.** For each 32-bit word lane, three bit-fields are sampled and concatenated into a 16-bit `1.5.5.5` (A.R.G.B) format, losing precision but not saturating. +- **Bit-layout of each output half-word.** Bit 0 of the output = bit 7 of the source byte (alpha); the next 5 bits come from the red channel's top 5 bits (bits 8..12 of the source word); then 5 bits of green (bits 16..20); then 5 bits of blue (bits 24..28). Xenia's helper is `vmx::pack_pixel_555` (in `crates/xenia-cpu/src/vmx.rs`). +- **No saturation / no rounding.** The op truncates the lower bits of each channel; `VSCR[SAT]` is **not** affected. +- **Big-endian lane order.** `VA`'s 4 words produce the first 4 output half-words (`VD.h[0..3]`); `VB`'s 4 words fill `VD.h[4..7]`. +- **Paired with [`vupkhpx`](vupkhpx.md) / [`vupklpx`](vupklpx.md)** — these unpack a 1-5-5-5 pixel back into a word-lane `0x00RRGGBB`-like form for further arithmetic. +- **No `Rc`, no XER.** No VMX128 sibling — game code that needs 555-pixel packing on Xenon either uses the scalar path or runs into the `vpkd3d128` family for richer D3D formats. + +## Related Instructions + +- [`vupkhpx`](vupkhpx.md), [`vupklpx`](vupklpx.md) — the inverse unpacks. +- [`vpkuhum`](vpkuhum.md), [`vpkuhus`](vpkuhus.md), [`vpkuwum`](vpkuwum.md), [`vpkuwus`](vpkuwus.md) — lane-halving packs (unsigned modulo / saturating). +- [`vpkshss`](vpkshss.md), [`vpkshus`](vpkshus.md), [`vpkswss`](vpkswss.md), [`vpkswus`](vpkswus.md) — signed-input saturating packs. +- [`vpkd3d128`](../vmx128/vpkd3d128.md) — VMX128-exclusive D3D-format pack (richer pixel formats than 1-5-5-5). + +## IBM Reference + +- [AIX 7.3 — `vpkpx` (Vector Pack Pixel32)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkpx-vector-pack-pixel32-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vpkshss.md b/migration/project-root/ppc-manual/vmx/vpkshss.md new file mode 100644 index 0000000..b968c36 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vpkshss.md @@ -0,0 +1,193 @@ +# `vpkshss` — Vector Pack Signed Half Word Signed Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000018e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vpkshss` | `vpkshss` | — | Vector Pack Signed Half Word Signed Saturate | +| `vpkshss128` | `vpkshss128` | — | Vector128 Pack Signed Half Word Signed Saturate | + +## Syntax + +```asm +vpkshss [VD], [VA], [VB] +vpkshss128 [VD], [VA], [VB] +``` + +## Encoding + +### `vpkshss` — form `VX` + +- **Opcode word:** `0x1000018e` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `398` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vpkshss128` — form `VX128` + +- **Opcode word:** `0x14000200` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `512` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vpkshss: read; vpkshss128: read | Source A vector register. | +| `VB` | vpkshss: read; vpkshss128: read | Source B vector register. | +| `VD` | vpkshss: write; vpkshss128: write | Destination vector register. | +| `VSCR` | vpkshss: write; vpkshss128: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vpkshss` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +### `vpkshss128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vpkshss`: **VSCR[SAT]** may be stickied on saturating vector operations. +- `vpkshss128`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vpkshss`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkshss"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1845`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1845) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:113`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L113) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:471`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L471) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4070-4082`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4070-L4082) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkshss | PpcOpcode::vpkshss128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkshss128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = crate::vmx::as_i16x8(ctx.vr[ra]); + let b = crate::vmx::as_i16x8(ctx.vr[rb]); + let mut r = [0i8; 16]; let mut sat = false; + for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_i8(a[i]); r[i] = v; sat |= s; } + for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_i8(b[i]); r[8 + i] = v; sat |= s; } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[rd] = crate::vmx::from_i8x16(r); + ctx.pc += 4; + } +``` +
+ +**`vpkshss128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkshss128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1848`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1848) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:113`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L113) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:618`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L618) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4070-4082`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4070-L4082) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkshss | PpcOpcode::vpkshss128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkshss128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = crate::vmx::as_i16x8(ctx.vr[ra]); + let b = crate::vmx::as_i16x8(ctx.vr[rb]); + let mut r = [0i8; 16]; let mut sat = false; + for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_i8(a[i]); r[i] = v; sat |= s; } + for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_i8(b[i]); r[8 + i] = v; sat |= s; } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[rd] = crate::vmx::from_i8x16(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Signed half-word → signed byte saturating pack.** Each of the 16 input half-word lanes (8 from `VA`, 8 from `VB`) is clamped to the `int8` range `[−128, +127]`. Values outside that range produce the nearest extreme and **set the sticky `VSCR[SAT]`** bit. +- **Lane-count doubling.** 8+8 = 16 half-word lanes → 16 byte lanes in `VD`. +- **Big-endian ordering.** `VA`'s 8 half-words fill `VD.b[0..7]`; `VB`'s 8 fill `VD.b[8..15]`. +- **Signed vs. unsigned output.** `vpkshss` has signed input and signed output. Compare with [`vpkshus`](vpkshus.md), which keeps signed input but clamps to `uint8` (`[0, 255]`). +- **`VSCR[SAT]` is sticky.** Once set it remains set until an `mtvscr` clears it. Software that needs a per-block saturation signal must clear before the kernel and test after. +- **No `Rc`, no XER / FPSCR.** +- **VMX128 sibling [`vpkshss128`](vpkshss128.md).** Same semantics, wider register file. + +## Related Instructions + +- [`vpkshus`](vpkshus.md) — signed → unsigned saturating (same half-word input). +- [`vpkuhus`](vpkuhus.md), [`vpkuhum`](vpkuhum.md) — unsigned half-word input, unsigned byte output (saturating / modulo). +- [`vpkswss`](vpkswss.md), [`vpkswus`](vpkswus.md) — the word → half-word analogues. +- [`vupkhsb`](vupkhsb.md), [`vupklsb`](vupklsb.md) — the inverse unpacks that sign-extend bytes back to half-words. +- [`vaddsbs`](vaddsbs.md), [`vsubsbs`](vsubsbs.md) — other sources of byte-saturating arithmetic. + +## IBM Reference + +- [AIX 7.3 — `vpkshss` (Vector Pack Signed Half Word Signed Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkshss-vector-pack-signed-half-word-signed-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vpkshus.md b/migration/project-root/ppc-manual/vmx/vpkshus.md new file mode 100644 index 0000000..64723f9 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vpkshus.md @@ -0,0 +1,192 @@ +# `vpkshus` — Vector Pack Signed Half Word Unsigned Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000010e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vpkshus` | `vpkshus` | — | Vector Pack Signed Half Word Unsigned Saturate | +| `vpkshus128` | `vpkshus128` | — | Vector128 Pack Signed Half Word Unsigned Saturate | + +## Syntax + +```asm +vpkshus [VD], [VA], [VB] +vpkshus128 [VD], [VA], [VB] +``` + +## Encoding + +### `vpkshus` — form `VX` + +- **Opcode word:** `0x1000010e` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `270` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vpkshus128` — form `VX128` + +- **Opcode word:** `0x14000240` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `576` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vpkshus: read; vpkshus128: read | Source A vector register. | +| `VB` | vpkshus: read; vpkshus128: read | Source B vector register. | +| `VD` | vpkshus: write; vpkshus128: write | Destination vector register. | +| `VSCR` | vpkshus: write; vpkshus128: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vpkshus` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +### `vpkshus128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vpkshus`: **VSCR[SAT]** may be stickied on saturating vector operations. +- `vpkshus128`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vpkshus`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkshus"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1953`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1953) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:113`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L113) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:459`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L459) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4057-4069`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4057-L4069) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkshus | PpcOpcode::vpkshus128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkshus128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = crate::vmx::as_i16x8(ctx.vr[ra]); + let b = crate::vmx::as_i16x8(ctx.vr[rb]); + let mut r = [0u8; 16]; let mut sat = false; + for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_u8(a[i]); r[i] = v; sat |= s; } + for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_u8(b[i]); r[8 + i] = v; sat |= s; } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[rd] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ +**`vpkshus128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkshus128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1956`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1956) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:113`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L113) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:620`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L620) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4057-4069`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4057-L4069) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkshus | PpcOpcode::vpkshus128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkshus128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = crate::vmx::as_i16x8(ctx.vr[ra]); + let b = crate::vmx::as_i16x8(ctx.vr[rb]); + let mut r = [0u8; 16]; let mut sat = false; + for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_u8(a[i]); r[i] = v; sat |= s; } + for i in 0..8 { let (v, s) = crate::vmx::sat_i16_to_u8(b[i]); r[8 + i] = v; sat |= s; } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[rd] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Signed half-word → unsigned byte saturating pack.** Each of the 16 input half-word lanes (8 from `VA`, 8 from `VB`) is interpreted as `int16` and clamped to `[0, 255]`. Negative values → 0; values above 255 → 255. Clamping any lane sticky-sets `VSCR[SAT]`. +- **Lane-count doubling.** 16 half-word lanes → 16 byte lanes, `VA` filling `VD.b[0..7]` and `VB` filling `VD.b[8..15]`. +- **Difference from [`vpkshss`](vpkshss.md).** Both take signed half-words; `shss` clamps to `int8`, `shus` clamps to `uint8`. Choose `shus` when the signed negative half is not physically meaningful (e.g. after subtracting a clamped-at-zero value). +- **`VSCR[SAT]` is sticky.** +- **No `Rc`, no XER / FPSCR.** +- **VMX128 sibling [`vpkshus128`](vpkshus128.md).** Same behaviour with wider register file. + +## Related Instructions + +- [`vpkshss`](vpkshss.md) — signed → signed clamp. +- [`vpkuhus`](vpkuhus.md) — unsigned input → unsigned byte. +- [`vpkuhum`](vpkuhum.md) — unsigned input, truncating (modulo) pack. +- [`vpkswus`](vpkswus.md) — the word → half-word signed→unsigned analogue. +- [`vupkhub`](vupkhub.md)-family unpacks (if present) — the inverse. + +## IBM Reference + +- [AIX 7.3 — `vpkshus` (Vector Pack Signed Half Word Unsigned Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkshus-vector-pack-signed-half-word-unsigned-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vpkswss.md b/migration/project-root/ppc-manual/vmx/vpkswss.md new file mode 100644 index 0000000..0ae74c3 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vpkswss.md @@ -0,0 +1,193 @@ +# `vpkswss` — Vector Pack Signed Word Signed Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x100001ce` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vpkswss` | `vpkswss` | — | Vector Pack Signed Word Signed Saturate | +| `vpkswss128` | `vpkswss128` | — | Vector128 Pack Signed Word Signed Saturate | + +## Syntax + +```asm +vpkswss [VD], [VA], [VB] +vpkswss128 [VD], [VA], [VB] +``` + +## Encoding + +### `vpkswss` — form `VX` + +- **Opcode word:** `0x100001ce` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `462` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vpkswss128` — form `VX128` + +- **Opcode word:** `0x14000280` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `640` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vpkswss: read; vpkswss128: read | Source A vector register. | +| `VB` | vpkswss: read; vpkswss128: read | Source B vector register. | +| `VD` | vpkswss: write; vpkswss128: write | Destination vector register. | +| `VSCR` | vpkswss: write; vpkswss128: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vpkswss` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +### `vpkswss128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vpkswss`: **VSCR[SAT]** may be stickied on saturating vector operations. +- `vpkswss128`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vpkswss`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkswss"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1867`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1867) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:114`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L114) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:474`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L474) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4109-4121`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4109-L4121) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkswss | PpcOpcode::vpkswss128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkswss128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = crate::vmx::as_i32x4(ctx.vr[ra]); + let b = crate::vmx::as_i32x4(ctx.vr[rb]); + let mut r = [0i16; 8]; let mut sat = false; + for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_i16(a[i]); r[i] = v; sat |= s; } + for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_i16(b[i]); r[4 + i] = v; sat |= s; } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[rd] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ +**`vpkswss128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkswss128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1870`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1870) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:114`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L114) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:622`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L622) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4109-4121`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4109-L4121) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkswss | PpcOpcode::vpkswss128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkswss128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = crate::vmx::as_i32x4(ctx.vr[ra]); + let b = crate::vmx::as_i32x4(ctx.vr[rb]); + let mut r = [0i16; 8]; let mut sat = false; + for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_i16(a[i]); r[i] = v; sat |= s; } + for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_i16(b[i]); r[4 + i] = v; sat |= s; } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[rd] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Signed word → signed half-word saturating pack.** Each of the 8 input word lanes (4 from `VA`, 4 from `VB`) is clamped to `[−32768, +32767]`. Out-of-range clamping sticky-sets `VSCR[SAT]`. +- **Lane-count doubling.** 4+4 = 8 word lanes → 8 half-word lanes. +- **Ordering.** `VA`'s four words produce `VD.h[0..3]`; `VB`'s produce `VD.h[4..7]`. +- **Signed vs. unsigned output.** `vpkswss` preserves sign; [`vpkswus`](vpkswus.md) clamps the same signed-word input to `uint16`. +- **`VSCR[SAT]` is sticky.** +- **No `Rc`, no XER / FPSCR.** +- **VMX128 sibling [`vpkswss128`](vpkswss128.md).** + +## Related Instructions + +- [`vpkswus`](vpkswus.md) — signed → unsigned saturating. +- [`vpkuwus`](vpkuwus.md), [`vpkuwum`](vpkuwum.md) — unsigned word input. +- [`vpkshss`](vpkshss.md), [`vpkshus`](vpkshus.md) — half-word input → byte output analogues. +- [`vupkhsh`](vupkhsh.md), [`vupklsh`](vupklsh.md) — the signed-half-word → word unpacks that reverse a `vpkswss` pair. +- [`vaddsws`](vaddsws.md), [`vsubsws`](vsubsws.md) — word-saturating arithmetic producers. + +## IBM Reference + +- [AIX 7.3 — `vpkswss` (Vector Pack Signed Word Signed Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkswss-vector-pack-signed-word-signed-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vpkswus.md b/migration/project-root/ppc-manual/vmx/vpkswus.md new file mode 100644 index 0000000..e2863af --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vpkswus.md @@ -0,0 +1,192 @@ +# `vpkswus` — Vector Pack Signed Word Unsigned Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000014e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vpkswus` | `vpkswus` | — | Vector Pack Signed Word Unsigned Saturate | +| `vpkswus128` | `vpkswus128` | — | Vector128 Pack Signed Word Unsigned Saturate | + +## Syntax + +```asm +vpkswus [VD], [VA], [VB] +vpkswus128 [VD], [VA], [VB] +``` + +## Encoding + +### `vpkswus` — form `VX` + +- **Opcode word:** `0x1000014e` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `334` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vpkswus128` — form `VX128` + +- **Opcode word:** `0x140002c0` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `704` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vpkswus: read; vpkswus128: read | Source A vector register. | +| `VB` | vpkswus: read; vpkswus128: read | Source B vector register. | +| `VD` | vpkswus: write; vpkswus128: write | Destination vector register. | +| `VSCR` | vpkswus: write; vpkswus128: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vpkswus` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +### `vpkswus128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vpkswus`: **VSCR[SAT]** may be stickied on saturating vector operations. +- `vpkswus128`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vpkswus`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkswus"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1889`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1889) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:114`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L114) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:465`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L465) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4096-4108`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4096-L4108) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkswus | PpcOpcode::vpkswus128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkswus128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = crate::vmx::as_i32x4(ctx.vr[ra]); + let b = crate::vmx::as_i32x4(ctx.vr[rb]); + let mut r = [0u16; 8]; let mut sat = false; + for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_u16(a[i]); r[i] = v; sat |= s; } + for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_u16(b[i]); r[4 + i] = v; sat |= s; } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[rd] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ +**`vpkswus128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkswus128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1892`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1892) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:114`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L114) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:624`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L624) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4096-4108`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4096-L4108) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkswus | PpcOpcode::vpkswus128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkswus128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = crate::vmx::as_i32x4(ctx.vr[ra]); + let b = crate::vmx::as_i32x4(ctx.vr[rb]); + let mut r = [0u16; 8]; let mut sat = false; + for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_u16(a[i]); r[i] = v; sat |= s; } + for i in 0..4 { let (v, s) = crate::vmx::sat_i32_to_u16(b[i]); r[4 + i] = v; sat |= s; } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[rd] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Signed word → unsigned half-word saturating pack.** Each of the 8 input word lanes is interpreted as `int32` and clamped to `[0, 65535]`. Negatives → 0, values above 65535 → 65535, sticky-setting `VSCR[SAT]`. +- **Lane-count doubling.** 8 word lanes → 8 half-word lanes, ordered as `VA` then `VB`. +- **Choose over [`vpkswss`](vpkswss.md)** when negative results shouldn't survive — e.g. clamped colour or intensity values that happen to have arrived in `int32` form. +- **`VSCR[SAT]` is sticky.** +- **No `Rc`, no XER / FPSCR.** +- **VMX128 sibling [`vpkswus128`](vpkswus128.md).** + +## Related Instructions + +- [`vpkswss`](vpkswss.md) — signed → signed clamp. +- [`vpkuwus`](vpkuwus.md) — unsigned word input → unsigned half-word. +- [`vpkuwum`](vpkuwum.md) — modulo (truncate) pack. +- [`vpkshus`](vpkshus.md) — the half-word → byte analogue. +- [`vupkhsh`](vupkhsh.md), [`vupklsh`](vupklsh.md) — signed-half-word unpacks. + +## IBM Reference + +- [AIX 7.3 — `vpkswus` (Vector Pack Signed Word Unsigned Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkswus-vector-pack-signed-word-unsigned-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vpkuhum.md b/migration/project-root/ppc-manual/vmx/vpkuhum.md new file mode 100644 index 0000000..a673979 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vpkuhum.md @@ -0,0 +1,188 @@ +# `vpkuhum` — Vector Pack Unsigned Half Word Unsigned Modulo + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000000e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vpkuhum` | `vpkuhum` | — | Vector Pack Unsigned Half Word Unsigned Modulo | +| `vpkuhum128` | `vpkuhum128` | — | Vector128 Pack Unsigned Half Word Unsigned Modulo | + +## Syntax + +```asm +vpkuhum [VD], [VA], [VB] +vpkuhum128 [VD], [VA], [VB] +``` + +## Encoding + +### `vpkuhum` — form `VX` + +- **Opcode word:** `0x1000000e` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `14` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vpkuhum128` — form `VX128` + +- **Opcode word:** `0x14000300` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `768` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vpkuhum: read; vpkuhum128: read | Source A vector register. | +| `VB` | vpkuhum: read; vpkuhum128: read | Source B vector register. | +| `VD` | vpkuhum: write; vpkuhum128: write | Destination vector register. | + +## Register Effects + +### `vpkuhum` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vpkuhum128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vpkuhum`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuhum"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1909`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1909) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:115`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L115) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:440`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L440) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4019-4030`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4019-L4030) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkuhum | PpcOpcode::vpkuhum128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkuhum128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = ctx.vr[ra].as_u16x8(); + let b = ctx.vr[rb].as_u16x8(); + let mut r = [0u8; 16]; + for i in 0..8 { r[i] = a[i] as u8; } + for i in 0..8 { r[8 + i] = b[i] as u8; } + ctx.vr[rd] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ +**`vpkuhum128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuhum128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1912`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1912) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:115`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L115) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:626`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L626) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4019-4030`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4019-L4030) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkuhum | PpcOpcode::vpkuhum128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkuhum128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = ctx.vr[ra].as_u16x8(); + let b = ctx.vr[rb].as_u16x8(); + let mut r = [0u8; 16]; + for i in 0..8 { r[i] = a[i] as u8; } + for i in 0..8 { r[8 + i] = b[i] as u8; } + ctx.vr[rd] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unsigned half-word → byte modulo pack.** Each of the 16 input half-word lanes (8 from `VA`, 8 from `VB`) is truncated to its low 8 bits. No saturation; values above 255 wrap modulo 256. +- **`VSCR[SAT]` is never touched.** This is the `-m` (modulo) form. For saturation use [`vpkuhus`](vpkuhus.md). +- **Lane-count doubling.** 16 half-word lanes → 16 byte lanes, `VA`'s 8 half-words into `VD.b[0..7]` and `VB`'s 8 into `VD.b[8..15]`. +- **Cheap "low-byte extract" primitive.** Often used to repack per-channel results after a half-word arithmetic step. Contrast with shifting + masking. +- **No `Rc`, no XER.** +- **VMX128 sibling [`vpkuhum128`](vpkuhum128.md).** + +## Related Instructions + +- [`vpkuhus`](vpkuhus.md) — the saturating sibling. +- [`vpkuwum`](vpkuwum.md) — word → half-word modulo pack. +- [`vpkshss`](vpkshss.md), [`vpkshus`](vpkshus.md) — signed half-word packs. +- [`vupkhub`](vupkhub.md) / [`vupklub`](vupklub.md) (if present) — zero-extending byte → half-word unpacks that reverse this op. +- [`vperm`](vperm.md) — general-purpose alternative when the packing pattern is irregular. + +## IBM Reference + +- [AIX 7.3 — `vpkuhum` (Vector Pack Unsigned Half Word Unsigned Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkuhum-vector-pack-unsigned-half-word-unsigned-modulo-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vpkuhus.md b/migration/project-root/ppc-manual/vmx/vpkuhus.md new file mode 100644 index 0000000..d70d8d6 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vpkuhus.md @@ -0,0 +1,192 @@ +# `vpkuhus` — Vector Pack Unsigned Half Word Unsigned Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000008e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vpkuhus` | `vpkuhus` | — | Vector Pack Unsigned Half Word Unsigned Saturate | +| `vpkuhus128` | `vpkuhus128` | — | Vector128 Pack Unsigned Half Word Unsigned Saturate | + +## Syntax + +```asm +vpkuhus [VD], [VA], [VB] +vpkuhus128 [VD], [VA], [VB] +``` + +## Encoding + +### `vpkuhus` — form `VX` + +- **Opcode word:** `0x1000008e` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `142` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vpkuhus128` — form `VX128` + +- **Opcode word:** `0x14000340` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `832` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vpkuhus: read; vpkuhus128: read | Source A vector register. | +| `VB` | vpkuhus: read; vpkuhus128: read | Source B vector register. | +| `VD` | vpkuhus: write; vpkuhus128: write | Destination vector register. | +| `VSCR` | vpkuhus: write; vpkuhus128: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vpkuhus` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +### `vpkuhus128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vpkuhus`: **VSCR[SAT]** may be stickied on saturating vector operations. +- `vpkuhus128`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vpkuhus`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuhus"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1931`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1931) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:115`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L115) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:452`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L452) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4044-4056`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4044-L4056) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkuhus | PpcOpcode::vpkuhus128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkuhus128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = ctx.vr[ra].as_u16x8(); + let b = ctx.vr[rb].as_u16x8(); + let mut r = [0u8; 16]; let mut sat = false; + for i in 0..8 { let (v, s) = crate::vmx::sat_u16_to_u8(a[i]); r[i] = v; sat |= s; } + for i in 0..8 { let (v, s) = crate::vmx::sat_u16_to_u8(b[i]); r[8 + i] = v; sat |= s; } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[rd] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ +**`vpkuhus128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuhus128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1934`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1934) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:115`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L115) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:628`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L628) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4044-4056`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4044-L4056) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkuhus | PpcOpcode::vpkuhus128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkuhus128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = ctx.vr[ra].as_u16x8(); + let b = ctx.vr[rb].as_u16x8(); + let mut r = [0u8; 16]; let mut sat = false; + for i in 0..8 { let (v, s) = crate::vmx::sat_u16_to_u8(a[i]); r[i] = v; sat |= s; } + for i in 0..8 { let (v, s) = crate::vmx::sat_u16_to_u8(b[i]); r[8 + i] = v; sat |= s; } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[rd] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unsigned half-word → unsigned byte saturating pack.** Each of the 16 input half-word lanes (interpreted as `uint16`) is clamped to `[0, 255]`. Values above 255 produce 255 and sticky-set `VSCR[SAT]`. +- **Lane-count doubling.** 16 half-word lanes → 16 byte lanes; `VA` first, then `VB`. +- **Pair with [`vpkuhum`](vpkuhum.md)** when saturation is not desired (truncate the low byte instead). +- **`VSCR[SAT]` is sticky.** +- **No `Rc`, no XER.** +- **VMX128 sibling [`vpkuhus128`](vpkuhus128.md).** + +## Related Instructions + +- [`vpkuhum`](vpkuhum.md) — modulo counterpart. +- [`vpkshss`](vpkshss.md), [`vpkshus`](vpkshus.md) — signed-input half-word packs. +- [`vpkuwus`](vpkuwus.md), [`vpkuwum`](vpkuwum.md) — word → half-word analogues. +- [`vupkhsb`](vupkhsb.md), [`vupklsb`](vupklsb.md) — sign-extending byte → half-word unpacks. +- [`vperm`](vperm.md) — programmable alternative for irregular packs. + +## IBM Reference + +- [AIX 7.3 — `vpkuhus` (Vector Pack Unsigned Half Word Unsigned Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkuhus-vector-pack-unsigned-half-word-unsigned-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vpkuwum.md b/migration/project-root/ppc-manual/vmx/vpkuwum.md new file mode 100644 index 0000000..68747b4 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vpkuwum.md @@ -0,0 +1,188 @@ +# `vpkuwum` — Vector Pack Unsigned Word Unsigned Modulo + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000004e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vpkuwum` | `vpkuwum` | — | Vector Pack Unsigned Word Unsigned Modulo | +| `vpkuwum128` | `vpkuwum128` | — | Vector128 Pack Unsigned Word Unsigned Modulo | + +## Syntax + +```asm +vpkuwum [VD], [VA], [VB] +vpkuwum128 [VD], [VA], [VB] +``` + +## Encoding + +### `vpkuwum` — form `VX` + +- **Opcode word:** `0x1000004e` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `78` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vpkuwum128` — form `VX128` + +- **Opcode word:** `0x14000380` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `896` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vpkuwum: read; vpkuwum128: read | Source A vector register. | +| `VB` | vpkuwum: read; vpkuwum128: read | Source B vector register. | +| `VD` | vpkuwum: write; vpkuwum128: write | Destination vector register. | + +## Register Effects + +### `vpkuwum` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vpkuwum128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vpkuwum`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuwum"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1973`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1973) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:116`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L116) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:447`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L447) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4031-4042`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4031-L4042) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkuwum | PpcOpcode::vpkuwum128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkuwum128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = ctx.vr[ra].as_u32x4(); + let b = ctx.vr[rb].as_u32x4(); + let mut r = [0u16; 8]; + for i in 0..4 { r[i] = a[i] as u16; } + for i in 0..4 { r[4 + i] = b[i] as u16; } + ctx.vr[rd] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ +**`vpkuwum128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuwum128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1976`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1976) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:116`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L116) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:630`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L630) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4031-4042`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4031-L4042) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkuwum | PpcOpcode::vpkuwum128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkuwum128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = ctx.vr[ra].as_u32x4(); + let b = ctx.vr[rb].as_u32x4(); + let mut r = [0u16; 8]; + for i in 0..4 { r[i] = a[i] as u16; } + for i in 0..4 { r[4 + i] = b[i] as u16; } + ctx.vr[rd] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unsigned word → half-word modulo pack.** Each of the 8 input word lanes is truncated to its low 16 bits. +- **Lane-count doubling.** 4+4 word lanes → 8 half-word lanes; `VA`'s 4 words into `VD.h[0..3]`, `VB`'s into `VD.h[4..7]`. +- **`VSCR[SAT]` never touched** (modulo variant). Use [`vpkuwus`](vpkuwus.md) for saturation. +- **Cheap low-half extract.** Typical after a 32-bit lane accumulator is "narrowed" back down to 16-bit for storage. +- **No `Rc`, no XER.** +- **VMX128 sibling [`vpkuwum128`](vpkuwum128.md).** + +## Related Instructions + +- [`vpkuwus`](vpkuwus.md) — saturating counterpart. +- [`vpkuhum`](vpkuhum.md) — half-word → byte modulo. +- [`vpkswss`](vpkswss.md), [`vpkswus`](vpkswus.md) — signed word inputs. +- [`vupkhsh`](vupkhsh.md), [`vupklsh`](vupklsh.md) — signed-half-word → word unpacks. +- [`vperm`](vperm.md) — alternative for irregular patterns. + +## IBM Reference + +- [AIX 7.3 — `vpkuwum` (Vector Pack Unsigned Word Unsigned Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkuwum-vector-pack-unsigned-word-unsigned-modulo-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vpkuwus.md b/migration/project-root/ppc-manual/vmx/vpkuwus.md new file mode 100644 index 0000000..beef684 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vpkuwus.md @@ -0,0 +1,192 @@ +# `vpkuwus` — Vector Pack Unsigned Word Unsigned Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x100000ce` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vpkuwus` | `vpkuwus` | — | Vector Pack Unsigned Word Unsigned Saturate | +| `vpkuwus128` | `vpkuwus128` | — | Vector128 Pack Unsigned Word Unsigned Saturate | + +## Syntax + +```asm +vpkuwus [VD], [VA], [VB] +vpkuwus128 [VD], [VA], [VB] +``` + +## Encoding + +### `vpkuwus` — form `VX` + +- **Opcode word:** `0x100000ce` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `206` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vpkuwus128` — form `VX128` + +- **Opcode word:** `0x140003c0` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `960` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vpkuwus: read; vpkuwus128: read | Source A vector register. | +| `VB` | vpkuwus: read; vpkuwus128: read | Source B vector register. | +| `VD` | vpkuwus: write; vpkuwus128: write | Destination vector register. | +| `VSCR` | vpkuwus: write; vpkuwus128: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vpkuwus` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +### `vpkuwus128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vpkuwus`: **VSCR[SAT]** may be stickied on saturating vector operations. +- `vpkuwus128`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vpkuwus`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuwus"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1995`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1995) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:116`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L116) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:453`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L453) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4083-4095`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4083-L4095) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkuwus | PpcOpcode::vpkuwus128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkuwus128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = ctx.vr[ra].as_u32x4(); + let b = ctx.vr[rb].as_u32x4(); + let mut r = [0u16; 8]; let mut sat = false; + for i in 0..4 { let (v, s) = crate::vmx::sat_u32_to_u16(a[i]); r[i] = v; sat |= s; } + for i in 0..4 { let (v, s) = crate::vmx::sat_u32_to_u16(b[i]); r[4 + i] = v; sat |= s; } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[rd] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ +**`vpkuwus128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkuwus128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1998`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1998) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:116`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L116) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:632`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L632) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4083-4095`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4083-L4095) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkuwus | PpcOpcode::vpkuwus128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vpkuwus128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = ctx.vr[ra].as_u32x4(); + let b = ctx.vr[rb].as_u32x4(); + let mut r = [0u16; 8]; let mut sat = false; + for i in 0..4 { let (v, s) = crate::vmx::sat_u32_to_u16(a[i]); r[i] = v; sat |= s; } + for i in 0..4 { let (v, s) = crate::vmx::sat_u32_to_u16(b[i]); r[4 + i] = v; sat |= s; } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[rd] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unsigned word → unsigned half-word saturating pack.** Each of the 8 input word lanes (interpreted as `uint32`) is clamped to `[0, 65535]`. Overflow sticky-sets `VSCR[SAT]`. +- **Lane-count doubling.** 8 word lanes → 8 half-word lanes; `VA` then `VB`. +- **Pair with [`vpkuwum`](vpkuwum.md)** when a modulo wrap is required. +- **`VSCR[SAT]` is sticky.** +- **No `Rc`, no XER.** +- **VMX128 sibling [`vpkuwus128`](vpkuwus128.md).** + +## Related Instructions + +- [`vpkuwum`](vpkuwum.md) — modulo counterpart. +- [`vpkswus`](vpkswus.md), [`vpkswss`](vpkswss.md) — signed-word input. +- [`vpkuhus`](vpkuhus.md) — half-word → byte saturating pack. +- [`vupkhsh`](vupkhsh.md), [`vupklsh`](vupklsh.md) — inverse unpack (sign-extending). +- [`vperm`](vperm.md) — irregular pack. + +## IBM Reference + +- [AIX 7.3 — `vpkuwus` (Vector Pack Unsigned Word Unsigned Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vpkuwus-vector-pack-unsigned-word-unsigned-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vrefp.md b/migration/project-root/ppc-manual/vmx/vrefp.md new file mode 100644 index 0000000..15f9fec --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vrefp.md @@ -0,0 +1,176 @@ +# `vrefp` — Vector Reciprocal Estimate Floating Point + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000010a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vrefp` | `vrefp` | — | Vector Reciprocal Estimate Floating Point | +| `vrefp128` | `vrefp128` | — | Vector128 Reciprocal Estimate Floating Point | + +## Syntax + +```asm +vrefp [VD], [VB] +vrefp128 [VD], [VB] +``` + +## Encoding + +### `vrefp` — form `VX` + +- **Opcode word:** `0x1000010a` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `266` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vrefp128` — form `VX128_3` + +- **Opcode word:** `0x18000630` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `1584` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vrefp: read; vrefp128: read | Source B vector register. | +| `VD` | vrefp: write; vrefp128: write | Destination vector register. | + +## Register Effects + +### `vrefp` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vrefp128` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vrefp`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrefp"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1227`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1227) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:117`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L117) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:457`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L457) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2153-2161`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2153-L2161) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrefp | PpcOpcode::vrefp128 => { + let vb = if matches!(instr.opcode, PpcOpcode::vrefp128) { instr.vb128() } else { instr.rb() }; + let vd = if matches!(instr.opcode, PpcOpcode::vrefp128) { instr.vd128() } else { instr.rd() }; + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = 1.0 / b[i]; } + ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vrefp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrefp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1230`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1230) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:117`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L117) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:664`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L664) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2153-2161`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2153-L2161) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrefp | PpcOpcode::vrefp128 => { + let vb = if matches!(instr.opcode, PpcOpcode::vrefp128) { instr.vb128() } else { instr.rb() }; + let vd = if matches!(instr.opcode, PpcOpcode::vrefp128) { instr.vd128() } else { instr.rd() }; + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = 1.0 / b[i]; } + ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Lane-wise reciprocal *estimate*.** Each 32-bit float lane of `VB` is approximated by `1.0 / VB[i]`. The PowerPC spec permits an **estimate** accurate to about 1/4096 (≈12 bits); xenia-rs produces the *exact* IEEE-754 reciprocal by dividing, trading accuracy for simplicity. Game code that cares about bit-reproducible behaviour should Newton-iterate with [`vnmsubfp`](vnmsubfp.md) regardless of which backend computes the seed. +- **Standard Newton iteration.** `x₁ = x₀ * (2 − VB * x₀)`, expressible as `vnmsubfp x₁, x₀, VB, 2.0f` followed by `vmaddfp x₁, x₀, x₁, 0.0f` (or similar). One iteration roughly doubles the valid bit count. +- **IEEE-754 binary32 lanes; `VSCR[NJ]` honoured** (denormals flush to zero when `NJ = 1`). +- **No VSCR[SAT] update, no FPSCR update, no exception.** Division by zero yields ±∞; division of zero yields ±∞ too (same sign convention). +- **Big-endian lane indexing.** +- **VMX128 sibling [`vrefp128`](vrefp128.md).** + +## Related Instructions + +- [`vrsqrtefp`](vrsqrtefp.md) — reciprocal *square root* estimate, used with the same Newton scheme. +- [`vmaddfp`](vmaddfp.md), [`vnmsubfp`](vnmsubfp.md) — the building blocks of the Newton iteration. +- [`vexptefp`](vexptefp.md), [`vlogefp`](vlogefp.md) — other "estimate"-style transcendentals. +- [`vaddfp`](vaddfp.md), [`vsubfp`](vsubfp.md) — the float add/sub. + +## IBM Reference + +- [AIX 7.3 — `vrefp` (Vector Reciprocal Estimate Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrefp-vector-reciprocal-estimate-floating-point-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vrfim.md b/migration/project-root/ppc-manual/vmx/vrfim.md new file mode 100644 index 0000000..47163e5 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vrfim.md @@ -0,0 +1,177 @@ +# `vrfim` — Vector Round to Floating-Point Integer toward -Infinity + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x100002ca` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vrfim` | `vrfim` | — | Vector Round to Floating-Point Integer toward -Infinity | +| `vrfim128` | `vrfim128` | — | Vector128 Round to Floating-Point Integer toward -Infinity | + +## Syntax + +```asm +vrfim [VD], [VB] +vrfim128 [VD], [VB] +``` + +## Encoding + +### `vrfim` — form `VX` + +- **Opcode word:** `0x100002ca` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `714` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vrfim128` — form `VX128_3` + +- **Opcode word:** `0x18000330` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `816` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vrfim: read; vrfim128: read | Source B vector register. | +| `VD` | vrfim: write; vrfim128: write | Destination vector register. | + +## Register Effects + +### `vrfim` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vrfim128` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vrfim`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfim"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1240`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1240) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:496`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L496) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2493-2501`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2493-L2501) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrfim | PpcOpcode::vrfim128 => { + let vb = if matches!(instr.opcode, PpcOpcode::vrfim128) { instr.vb128() } else { instr.rb() }; + let vd = if matches!(instr.opcode, PpcOpcode::vrfim128) { instr.vd128() } else { instr.rd() }; + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = b[i].floor(); } + ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vrfim128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfim128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1243`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1243) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:660`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L660) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2493-2501`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2493-L2501) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrfim | PpcOpcode::vrfim128 => { + let vb = if matches!(instr.opcode, PpcOpcode::vrfim128) { instr.vb128() } else { instr.rb() }; + let vd = if matches!(instr.opcode, PpcOpcode::vrfim128) { instr.vd128() } else { instr.rd() }; + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = b[i].floor(); } + ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Round toward minus-infinity (floor).** Each 32-bit float lane of `VB` is rounded down to the nearest integer value still representable as a float. `3.2 → 3.0`, `−3.2 → −4.0`. +- **IEEE-754 binary32 output; `VSCR[NJ]` honoured** (denormal flush-to-zero). +- **Integer-too-big lanes are a no-op:** values already ≥ 2²³ in magnitude are all-integer and unchanged. +- **NaN propagation.** NaN input → NaN output. `±∞` → `±∞`. +- **No VSCR[SAT], no FPSCR update.** No "inexact" trap flag; this is the VMX rounding variant that deliberately ignores FPSCR's rounding mode. +- **Big-endian lane indexing.** +- **VMX128 sibling [`vrfim128`](vrfim128.md).** + +## Related Instructions + +- [`vrfin`](vrfin.md) — round to nearest (ties-to-even). +- [`vrfip`](vrfip.md) — round toward +∞ (ceiling). +- [`vrfiz`](vrfiz.md) — round toward zero (truncate). +- [`vctsxs`](vctsxs.md), [`vctuxs`](vctuxs.md) — float → fixed-point integer conversion with explicit scale. + +## IBM Reference + +- [AIX 7.3 — `vrfim` (Vector Round to Floating-Point Integer toward Minus Infinity)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrfim-vector-round-floating-point-integer-toward-minus-infinity-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vrfin.md b/migration/project-root/ppc-manual/vmx/vrfin.md new file mode 100644 index 0000000..a30db05 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vrfin.md @@ -0,0 +1,181 @@ +# `vrfin` — Vector Round to Floating-Point Integer Nearest + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000020a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vrfin` | `vrfin` | — | Vector Round to Floating-Point Integer Nearest | +| `vrfin128` | `vrfin128` | — | Vector128 Round to Floating-Point Integer Nearest | + +## Syntax + +```asm +vrfin [VD], [VB] +vrfin128 [VD], [VB] +``` + +## Encoding + +### `vrfin` — form `VX` + +- **Opcode word:** `0x1000020a` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `522` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vrfin128` — form `VX128_3` + +- **Opcode word:** `0x18000370` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `880` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vrfin: read; vrfin128: read | Source B vector register. | +| `VD` | vrfin: write; vrfin128: write | Destination vector register. | + +## Register Effects + +### `vrfin` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vrfin128` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vrfin`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfin"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1253`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1253) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:479`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L479) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2473-2483`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2473-L2483) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrfin | PpcOpcode::vrfin128 => { + // PPCBUG-432: ISA round-to-nearest-even, NOT Rust's `round()` + // (which is round-half-away-from-zero). + let vb = if matches!(instr.opcode, PpcOpcode::vrfin128) { instr.vb128() } else { instr.rb() }; + let vd = if matches!(instr.opcode, PpcOpcode::vrfin128) { instr.vd128() } else { instr.rd() }; + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = b[i].round_ties_even(); } + ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vrfin128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfin128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1256`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1256) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:661`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L661) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2473-2483`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2473-L2483) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrfin | PpcOpcode::vrfin128 => { + // PPCBUG-432: ISA round-to-nearest-even, NOT Rust's `round()` + // (which is round-half-away-from-zero). + let vb = if matches!(instr.opcode, PpcOpcode::vrfin128) { instr.vb128() } else { instr.rb() }; + let vd = if matches!(instr.opcode, PpcOpcode::vrfin128) { instr.vd128() } else { instr.rd() }; + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = b[i].round_ties_even(); } + ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Round to nearest integer.** Each 32-bit float lane of `VB` is rounded to the nearest representable integer value. Xenia-rs uses Rust's `f32::round`, which rounds half-away-from-zero; the hardware Xenon actually implements round-ties-to-even. This is a known small mismatch tracked in xenia. +- **IEEE-754 binary32 output; `VSCR[NJ]` honoured.** +- **Integer-too-big lanes are no-ops** (|x| ≥ 2²³). +- **NaN and ±∞** pass through unchanged. +- **No VSCR[SAT], no FPSCR update.** +- **Big-endian lane indexing.** +- **VMX128 sibling [`vrfin128`](vrfin128.md).** + +## Related Instructions + +- [`vrfim`](vrfim.md) — round toward −∞. +- [`vrfip`](vrfip.md) — round toward +∞. +- [`vrfiz`](vrfiz.md) — round toward zero. +- [`vctsxs`](vctsxs.md), [`vctuxs`](vctuxs.md) — float → fixed-point integer. + +## IBM Reference + +- [AIX 7.3 — `vrfin` (Vector Round to Floating-Point Integer to Nearest)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrfin-vector-round-floating-point-integer-nearest-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vrfip.md b/migration/project-root/ppc-manual/vmx/vrfip.md new file mode 100644 index 0000000..3c358b8 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vrfip.md @@ -0,0 +1,177 @@ +# `vrfip` — Vector Round to Floating-Point Integer toward +Infinity + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000028a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vrfip` | `vrfip` | — | Vector Round to Floating-Point Integer toward +Infinity | +| `vrfip128` | `vrfip128` | — | Vector128 Round to Floating-Point Integer toward +Infinity | + +## Syntax + +```asm +vrfip [VD], [VB] +vrfip128 [VD], [VB] +``` + +## Encoding + +### `vrfip` — form `VX` + +- **Opcode word:** `0x1000028a` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `650` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vrfip128` — form `VX128_3` + +- **Opcode word:** `0x180003b0` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `944` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vrfip: read; vrfip128: read | Source B vector register. | +| `VD` | vrfip: write; vrfip128: write | Destination vector register. | + +## Register Effects + +### `vrfip` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vrfip128` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vrfip`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfip"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1266`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1266) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:492`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L492) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2484-2492`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2484-L2492) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrfip | PpcOpcode::vrfip128 => { + let vb = if matches!(instr.opcode, PpcOpcode::vrfip128) { instr.vb128() } else { instr.rb() }; + let vd = if matches!(instr.opcode, PpcOpcode::vrfip128) { instr.vd128() } else { instr.rd() }; + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = b[i].ceil(); } + ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vrfip128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfip128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1269`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1269) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:662`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L662) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2484-2492`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2484-L2492) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrfip | PpcOpcode::vrfip128 => { + let vb = if matches!(instr.opcode, PpcOpcode::vrfip128) { instr.vb128() } else { instr.rb() }; + let vd = if matches!(instr.opcode, PpcOpcode::vrfip128) { instr.vd128() } else { instr.rd() }; + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = b[i].ceil(); } + ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Round toward plus-infinity (ceiling).** Each 32-bit float lane of `VB` is rounded up. `3.2 → 4.0`, `−3.2 → −3.0`. +- **IEEE-754 binary32 output; `VSCR[NJ]` honoured.** +- **Integer-too-big lanes are no-ops.** +- **NaN and ±∞** pass through. +- **No VSCR[SAT], no FPSCR update.** +- **Big-endian lane indexing.** +- **VMX128 sibling [`vrfip128`](vrfip128.md).** + +## Related Instructions + +- [`vrfim`](vrfim.md) — round toward −∞ (the symmetric partner). +- [`vrfin`](vrfin.md) — round to nearest. +- [`vrfiz`](vrfiz.md) — round toward zero. +- [`vctsxs`](vctsxs.md), [`vctuxs`](vctuxs.md) — float → fixed-point integer. + +## IBM Reference + +- [AIX 7.3 — `vrfip` (Vector Round to Floating-Point Integer toward Plus Infinity)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrfip-vector-round-floating-point-integer-toward-plus-infinity-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vrfiz.md b/migration/project-root/ppc-manual/vmx/vrfiz.md new file mode 100644 index 0000000..3e45e08 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vrfiz.md @@ -0,0 +1,175 @@ +# `vrfiz` — Vector Round to Floating-Point Integer toward Zero + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000024a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vrfiz` | `vrfiz` | — | Vector Round to Floating-Point Integer toward Zero | +| `vrfiz128` | `vrfiz128` | — | Vector128 Round to Floating-Point Integer toward Zero | + +## Syntax + +```asm +vrfiz [VD], [VB] +vrfiz128 [VD], [VB] +``` + +## Encoding + +### `vrfiz` — form `VX` + +- **Opcode word:** `0x1000024a` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `586` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vrfiz128` — form `VX128_3` + +- **Opcode word:** `0x180003f0` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `1008` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vrfiz: read; vrfiz128: read | Source B vector register. | +| `VD` | vrfiz: write; vrfiz128: write | Destination vector register. | + +## Register Effects + +### `vrfiz` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vrfiz128` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vrfiz`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfiz"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1279`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1279) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:486`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L486) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2464-2472`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2464-L2472) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrfiz | PpcOpcode::vrfiz128 => { + let vb = if matches!(instr.opcode, PpcOpcode::vrfiz128) { instr.vb128() } else { instr.rb() }; + let vd = if matches!(instr.opcode, PpcOpcode::vrfiz128) { instr.vd128() } else { instr.rd() }; + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = b[i].trunc(); } + ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vrfiz128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrfiz128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1282`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1282) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:118`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L118) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:663`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L663) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2464-2472`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2464-L2472) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrfiz | PpcOpcode::vrfiz128 => { + let vb = if matches!(instr.opcode, PpcOpcode::vrfiz128) { instr.vb128() } else { instr.rb() }; + let vd = if matches!(instr.opcode, PpcOpcode::vrfiz128) { instr.vd128() } else { instr.rd() }; + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = b[i].trunc(); } + ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Round toward zero (truncate).** Each 32-bit float lane of `VB` has its fractional part dropped. `3.7 → 3.0`, `−3.7 → −3.0`. +- **IEEE-754 binary32 output; `VSCR[NJ]` honoured.** +- **Integer-too-big lanes are no-ops.** +- **NaN and ±∞** pass through. +- **No VSCR[SAT], no FPSCR update.** `vrfiz` is the VMX analogue of C's `truncf`. +- **Big-endian lane indexing.** +- **VMX128 sibling [`vrfiz128`](vrfiz128.md).** + +## Related Instructions + +- [`vrfin`](vrfin.md), [`vrfim`](vrfim.md), [`vrfip`](vrfip.md) — the other three rounding modes. +- [`vctsxs`](vctsxs.md), [`vctuxs`](vctuxs.md) — float → signed / unsigned fixed-point (these truncate too, and also apply a `UIMM` power-of-two scale). + +## IBM Reference + +- [AIX 7.3 — `vrfiz` (Vector Round to Floating-Point Integer toward Zero)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrfiz-vector-round-floating-point-integer-toward-zero-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vrlb.md b/migration/project-root/ppc-manual/vmx/vrlb.md new file mode 100644 index 0000000..49e1440 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vrlb.md @@ -0,0 +1,129 @@ +# `vrlb` — Vector Rotate Left Integer Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000004` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vrlb` | `vrlb` | — | Vector Rotate Left Integer Byte | + +## Syntax + +```asm +vrlb [VD], [VA], [VB] +``` + +## Encoding + +### `vrlb` — form `VX` + +- **Opcode word:** `0x10000004` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `4` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vrlb: read | Source A vector register. | +| `VB` | vrlb: read | Source B vector register. | +| `VD` | vrlb: write | Destination vector register. | + +## Register Effects + +### `vrlb` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vrlb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrlb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1286`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1286) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:119`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L119) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:436`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L436) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3876-3883`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3876-L3883) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrlb => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u8; 16]; + for i in 0..16 { r[i] = a[i].rotate_left((b[i] & 7) as u32); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane left-rotate of bytes.** For each of the 16 byte lanes, `VD.b[i] = rotate_left(VA.b[i], VB.b[i] & 7)`. The low 3 bits of each shift-count byte are used; upper bits are ignored. +- **Shift counts are per-lane, not scalar.** Unlike most CPUs' vector rotate, Altivec's shift/rotate takes a whole vector as the shift-count. If you want a uniform rotate, splat first with [`vspltb`](vspltb.md). +- **Big-endian byte lanes.** `VA.b[0]` is the most significant byte. +- **No overflow, no sticky saturation.** Rotation is information-preserving. +- **No `Rc`, no XER, no VSCR side-effect.** +- **No VMX128 sibling.** Xenon software that needs per-byte rotation typically uses `vrlw` on pre-swizzled data. + +## Related Instructions + +- [`vrlh`](vrlh.md), [`vrlw`](vrlw.md) — half-word / word rotates (same "per-lane rotate count" pattern). +- [`vslb`](vslb.md), [`vsrb`](vsrb.md), [`vsrab`](vsrab.md) — byte shift-left / logical-right / arithmetic-right. +- [`vsl`](vsl.md), [`vsr`](vsr.md), [`vslo`](vslo.md), [`vsro`](vsro.md) — whole-register shifts. +- [`vspltb`](vspltb.md) — splat to build a uniform shift-count vector. + +## IBM Reference + +- [AIX 7.3 — `vrlb` (Vector Rotate Left Integer Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrlb-vector-rotate-left-integer-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vrlh.md b/migration/project-root/ppc-manual/vmx/vrlh.md new file mode 100644 index 0000000..1688528 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vrlh.md @@ -0,0 +1,129 @@ +# `vrlh` — Vector Rotate Left Integer Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000044` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vrlh` | `vrlh` | — | Vector Rotate Left Integer Half Word | + +## Syntax + +```asm +vrlh [VD], [VA], [VB] +``` + +## Encoding + +### `vrlh` — form `VX` + +- **Opcode word:** `0x10000044` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `68` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vrlh: read | Source A vector register. | +| `VB` | vrlh: read | Source B vector register. | +| `VD` | vrlh: write | Destination vector register. | + +## Register Effects + +### `vrlh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vrlh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrlh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1294`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1294) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:119`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L119) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:443`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L443) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3908-3915`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3908-L3915) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrlh => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u16; 8]; + for i in 0..8 { r[i] = a[i].rotate_left((b[i] & 0xF) as u32); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane left-rotate of half-words.** For each of the 8 half-word lanes, `VD.h[i] = rotate_left(VA.h[i], VB.h[i] & 0xF)`. Low 4 bits of each shift-count half-word are used. +- **Per-lane shift counts.** Splat first (`vsplth`) if a uniform rotate is needed. +- **Big-endian half-word lanes.** Lane 0 is the most significant pair of bytes. +- **No overflow, no saturation.** Rotation is information-preserving. +- **No `Rc`, no XER, no VSCR effect.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vrlb`](vrlb.md), [`vrlw`](vrlw.md) — byte / word rotate siblings. +- [`vslh`](vslh.md), [`vsrh`](vsrh.md), [`vsrah`](vsrah.md) — half-word shift-left / logical-right / arithmetic-right. +- [`vsl`](vsl.md), [`vsr`](vsr.md) — bit-level whole-register shifts. +- [`vsplth`](vsplth.md) — splat to build a uniform shift-count vector. + +## IBM Reference + +- [AIX 7.3 — `vrlh` (Vector Rotate Left Integer Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrlh-vector-rotate-left-integer-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vrlw.md b/migration/project-root/ppc-manual/vmx/vrlw.md new file mode 100644 index 0000000..73b65e9 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vrlw.md @@ -0,0 +1,189 @@ +# `vrlw` — Vector Rotate Left Integer Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000084` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vrlw` | `vrlw` | — | Vector Rotate Left Integer Word | +| `vrlw128` | `vrlw128` | — | Vector128 Rotate Left Word | + +## Syntax + +```asm +vrlw [VD], [VA], [VB] +vrlw128 [VD], [VA], [VB] +``` + +## Encoding + +### `vrlw` — form `VX` + +- **Opcode word:** `0x10000084` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `132` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vrlw128` — form `VX128` + +- **Opcode word:** `0x18000050` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `80` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vrlw: read; vrlw128: read | Source A vector register. | +| `VB` | vrlw: read; vrlw128: read | Source B vector register. | +| `VD` | vrlw: write; vrlw128: write | Destination vector register. | + +## Register Effects + +### `vrlw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vrlw128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vrlw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrlw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1308`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1308) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:119`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L119) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:450`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L450) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2450-2461`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2450-L2461) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrlw | PpcOpcode::vrlw128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { + let sh = b[i] & 0x1F; + r[i] = a[i].rotate_left(sh); + } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vrlw128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrlw128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1311`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1311) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:119`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L119) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:692`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L692) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2450-2461`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2450-L2461) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrlw | PpcOpcode::vrlw128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { + let sh = b[i] & 0x1F; + r[i] = a[i].rotate_left(sh); + } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane left-rotate of words.** For each of the 4 word lanes, `VD.w[i] = rotate_left(VA.w[i], VB.w[i] & 0x1F)`. Low 5 bits of each shift-count word are used. +- **Per-lane shift counts.** Splat with [`vspltw`](vspltw.md) or [`vspltisw`](vspltisw.md) for uniform rotation. +- **Big-endian word lanes.** Lane 0 is the most significant 4 bytes. +- **No overflow, no saturation.** +- **No `Rc`, no XER, no VSCR effect.** +- **VMX128 sibling [`vrlw128`](vrlw128.md)** — same op with the wider register file. +- **Building block for [`vrlimi128`](../vmx128/vrlimi128.md).** VMX128 fuses a rotate with an immediate-masked insert for cheaper bitfield shuffles; `vrlw` is the plain variant that the XDK uses for scalar-style 32-bit rotates. + +## Related Instructions + +- [`vrlb`](vrlb.md), [`vrlh`](vrlh.md) — byte / half-word rotate siblings. +- [`vslw`](vslw.md), [`vsrw`](vsrw.md), [`vsraw`](vsraw.md) — word shift-left / logical-right / arithmetic-right. +- [`vsl`](vsl.md), [`vsr`](vsr.md) — bit-level whole-register shifts. +- [`vspltw`](vspltw.md), [`vspltisw`](vspltisw.md) — splat sources for uniform shift counts. +- [`vrlimi128`](../vmx128/vrlimi128.md) — rotate + mask-insert (VMX128-exclusive). + +## IBM Reference + +- [AIX 7.3 — `vrlw` (Vector Rotate Left Integer Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrlw-vector-rotate-left-integer-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vrsqrtefp.md b/migration/project-root/ppc-manual/vmx/vrsqrtefp.md new file mode 100644 index 0000000..7a6173d --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vrsqrtefp.md @@ -0,0 +1,176 @@ +# `vrsqrtefp` — Vector Reciprocal Square Root Estimate Floating Point + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000014a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vrsqrtefp` | `vrsqrtefp` | — | Vector Reciprocal Square Root Estimate Floating Point | +| `vrsqrtefp128` | `vrsqrtefp128` | — | Vector128 Reciprocal Square Root Estimate Floating Point | + +## Syntax + +```asm +vrsqrtefp [VD], [VB] +vrsqrtefp128 [VD], [VB] +``` + +## Encoding + +### `vrsqrtefp` — form `VX` + +- **Opcode word:** `0x1000014a` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `330` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vrsqrtefp128` — form `VX128_3` + +- **Opcode word:** `0x18000670` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `1648` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vrsqrtefp: read; vrsqrtefp128: read | Source B vector register. | +| `VD` | vrsqrtefp: write; vrsqrtefp128: write | Destination vector register. | + +## Register Effects + +### `vrsqrtefp` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vrsqrtefp128` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vrsqrtefp`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrsqrtefp"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1371`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1371) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:120`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L120) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:463`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L463) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2162-2170`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2162-L2170) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrsqrtefp | PpcOpcode::vrsqrtefp128 => { + let vb = if matches!(instr.opcode, PpcOpcode::vrsqrtefp128) { instr.vb128() } else { instr.rb() }; + let vd = if matches!(instr.opcode, PpcOpcode::vrsqrtefp128) { instr.vd128() } else { instr.rd() }; + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = 1.0 / b[i].sqrt(); } + ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vrsqrtefp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrsqrtefp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1374`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1374) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:120`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L120) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:665`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L665) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2162-2170`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2162-L2170) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrsqrtefp | PpcOpcode::vrsqrtefp128 => { + let vb = if matches!(instr.opcode, PpcOpcode::vrsqrtefp128) { instr.vb128() } else { instr.rb() }; + let vd = if matches!(instr.opcode, PpcOpcode::vrsqrtefp128) { instr.vd128() } else { instr.rd() }; + let b = ctx.vr[vb].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = 1.0 / b[i].sqrt(); } + ctx.vr[vd] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Lane-wise reciprocal-square-root *estimate*.** Each 32-bit float lane of `VB` is approximated by `1.0 / sqrt(VB[i])`. The PowerPC spec permits a 12-bit estimate; xenia-rs computes the exact IEEE-754 result. Games that depend on Xenon's low-precision estimate may need a helper to truncate bits to match hardware. +- **Standard Newton iteration (Quake-style):** `x₁ = x₀ * (1.5 − 0.5 * VB * x₀²)`. One pass produces ~24 bits of precision — essentially indistinguishable from a true `1/sqrt`. +- **Negative input is a trap** in math terms but not in ISA terms: the hardware returns a QNaN. `sqrt(−x)` for `x > 0` → QNaN. Zero produces `+∞` (and may sticky-set no bits). +- **IEEE-754 binary32; `VSCR[NJ]` honoured.** +- **No VSCR[SAT], no FPSCR update, no exception.** +- **Big-endian lane indexing.** +- **VMX128 sibling [`vrsqrtefp128`](vrsqrtefp128.md).** + +## Related Instructions + +- [`vrefp`](vrefp.md) — plain reciprocal estimate. +- [`vmaddfp`](vmaddfp.md), [`vnmsubfp`](vnmsubfp.md) — the Newton iteration primitives. +- [`vexptefp`](vexptefp.md), [`vlogefp`](vlogefp.md) — the other transcendental estimates. + +## IBM Reference + +- [AIX 7.3 — `vrsqrtefp` (Vector Reciprocal Square Root Estimate Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vrsqrtefp-vector-reciprocal-square-root-estimate-floating-point-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsel.md b/migration/project-root/ppc-manual/vmx/vsel.md new file mode 100644 index 0000000..01b23c7 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsel.md @@ -0,0 +1,210 @@ +# `vsel` — Vector Conditional Select + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x1000002a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsel` | `vsel` | — | Vector Conditional Select | +| `vsel128` | `vsel128` | — | Vector128 Conditional Select | + +## Syntax + +```asm +vsel [VD], [VA], [VB], [VC] +vsel128 [VD], [VA], [VB], [VD] +``` + +## Encoding + +### `vsel` — form `VA` + +- **Opcode word:** `0x1000002a` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `42` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21–25 | `VRC` | source C / shift | +| 26–31 | `XO` | extended opcode (6 bits) | + +### `vsel128` — form `VX128` + +- **Opcode word:** `0x14000350` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `848` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsel: read; vsel128: read | Source A vector register. | +| `VB` | vsel: read; vsel128: read | Source B vector register. | +| `VC` | vsel: read | Source C vector register / 3-bit selector. | +| `VD` | vsel: write; vsel128: read; vsel128: write | Destination vector register. | + +## Register Effects + +### `vsel` + +- **Reads (always):** `VA`, `VB`, `VC` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vsel128` + +- **Reads (always):** `VA`, `VB`, `VD` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsel`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsel"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1386`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1386) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:121`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L121) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:585`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L585) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2253-2275`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2253-L2275) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsel | PpcOpcode::vsel128 => { + // vD = (vA & ~vC) | (vB & vC) + let (va, vb, vd); + let vc; + if matches!(instr.opcode, PpcOpcode::vsel128) { + va = instr.va128(); + vb = instr.vb128(); + vd = instr.vd128(); + vc = vd; // for 128, vC is encoded in vD field + } else { + va = instr.ra(); + vb = instr.rb(); + vd = instr.rd(); + vc = instr.rc(); + } + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let c = ctx.vr[vc].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = (a[i] & !c[i]) | (b[i] & c[i]); } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vsel128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsel128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1389`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1389) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:121`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L121) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:629`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L629) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2253-2275`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2253-L2275) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsel | PpcOpcode::vsel128 => { + // vD = (vA & ~vC) | (vB & vC) + let (va, vb, vd); + let vc; + if matches!(instr.opcode, PpcOpcode::vsel128) { + va = instr.va128(); + vb = instr.vb128(); + vd = instr.vd128(); + vc = vd; // for 128, vC is encoded in vD field + } else { + va = instr.ra(); + vb = instr.rb(); + vd = instr.rd(); + vc = instr.rc(); + } + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let c = ctx.vr[vc].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = (a[i] & !c[i]) | (b[i] & c[i]); } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-bit select.** `VD = (VA & ~VC) | (VB & VC)`. Evaluated bit-by-bit across the full 128-bit register — not per-lane. Any granularity (byte / half / word) is valid because each bit is independent. +- **Classic "bitwise conditional move".** The canonical use is: compare produces an all-ones / all-zeros mask in `VC`, then `vsel` picks between two data vectors. Because the mask is all-or-nothing per lane, `vsel` behaves identically to a per-lane conditional move in that common case. +- **Mask does not need to be all-ones / all-zeros.** Partial masks produce interleaved bits, which is useful for bitfield merges. +- **`vsel128` read pattern is atypical:** the destination `VD` is **also an input**. The VMX128 encoding reuses the destination's 7 bits to carry one of the three source operands (xenia's interpreter arm handles this — see `vsel128` Register Effects above). Compilers express this as `vsel v3, v4, v5, v3` even though `v3` is also the destination. +- **No flags, no VSCR.** No dedicated VMX128 separate-control-register sibling; `vsel128` covers the VMX128 case. +- **Cheaper than `vand` + `vandc` + `vor`.** `vsel` is a single-cycle primitive on Xenon. + +## Related Instructions + +- [`vand`](vand.md), [`vandc`](vandc.md), [`vor`](vor.md), [`vnor`](vnor.md), [`vxor`](vxor.md) — the boolean primitives `vsel` replaces when composed. +- [`vcmpequb`](vcmpequb.md), [`vcmpequh`](vcmpequh.md), [`vcmpequw`](vcmpequw.md), [`vcmpgtsb`](vcmpgtsb.md) and relatives — the usual source of the select mask. +- [`vperm`](vperm.md) — byte-level permute; uses an index vector rather than a boolean mask. + +## IBM Reference + +- [AIX 7.3 — `vsel` (Vector Conditional Select)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsel-vector-conditional-select-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 3 — Logical Operations](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsl.md b/migration/project-root/ppc-manual/vmx/vsl.md new file mode 100644 index 0000000..db5ffe0 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsl.md @@ -0,0 +1,128 @@ +# `vsl` — Vector Shift Left + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x100001c4` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsl` | `vsl` | — | Vector Shift Left | + +## Syntax + +```asm +vsl [VD], [VA], [VB] +``` + +## Encoding + +### `vsl` — form `VX` + +- **Opcode word:** `0x100001c4` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `452` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsl: read | Source A vector register. | +| `VB` | vsl: read | Source B vector register. | +| `VD` | vsl: write | Destination vector register. | + +## Register Effects + +### `vsl` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsl`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsl"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1402`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1402) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:122`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L122) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:472`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L472) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3920-3926`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3920-L3926) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsl => { + let a = u128::from_be_bytes(ctx.vr[instr.ra()].as_bytes()); + let shift = (ctx.vr[instr.rb()].as_bytes()[15] & 7) as u32; + let r = if shift == 0 { a } else { a << shift }; + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r.to_be_bytes()); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Whole-register bit shift-left.** The 128-bit value `VA` is shifted left by `N` bits, where `N = VB.b[15] & 7` — i.e. the low 3 bits of the **last** (least-significant) byte of `VB`. Bits shifted out the top are discarded; zero-fill on the right. +- **Shift count constraint.** The ISA requires the same 3-bit shift count in all 16 bytes of `VB`; behaviour is undefined otherwise (xenia-rs reads only byte 15 as above). Compilers guarantee this by splatting the shift count first. +- **Combine with [`vslo`](vslo.md) for up to 127-bit shifts.** `vslo` handles the byte-granular component; `vsl` picks up the remaining 0..7 bits. The canonical 128-bit shift-left is `vslo` followed by `vsl`. +- **Big-endian.** "Left" means toward the MSB end of the register. +- **No flags, no VSCR.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vslo`](vslo.md) — whole-register shift-left by *octets* (bytes). +- [`vsr`](vsr.md), [`vsro`](vsro.md) — the right-shift counterparts. +- [`vsldoi`](vsldoi.md) — static-immediate byte shift of `VA ‖ VB`. +- [`vslb`](vslb.md), [`vslh`](vslh.md), [`vslw`](vslw.md) — per-lane logical shifts. + +## IBM Reference + +- [AIX 7.3 — `vsl` (Vector Shift Left)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsl-vector-shift-left-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vslb.md b/migration/project-root/ppc-manual/vmx/vslb.md new file mode 100644 index 0000000..ee7c334 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vslb.md @@ -0,0 +1,131 @@ +# `vslb` — Vector Shift Left Integer Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000104` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vslb` | `vslb` | — | Vector Shift Left Integer Byte | + +## Syntax + +```asm +vslb [VD], [VA], [VB] +``` + +## Encoding + +### `vslb` — form `VX` + +- **Opcode word:** `0x10000104` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `260` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vslb: read | Source A vector register. | +| `VB` | vslb: read | Source B vector register. | +| `VD` | vslb: write | Destination vector register. | + +## Register Effects + +### `vslb` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vslb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vslb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1413`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1413) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:122`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L122) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:455`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L455) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3852-3859`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3852-L3859) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vslb => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u8; 16]; + for i in 0..16 { r[i] = a[i] << (b[i] & 7); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane logical shift-left of bytes.** `VD.b[i] = VA.b[i] << (VB.b[i] & 7)` for `i ∈ 0..15`. Only the low 3 bits of each shift-count byte are honoured. +- **Per-lane shift counts.** To shift every byte by the same amount, splat with [`vspltb`](vspltb.md) or use an immediate splat [`vspltisb`](vspltisb.md). +- **Zero-fill.** Bits shifted out the top are discarded; the right end is zero-filled. Contrast with the rotate [`vrlb`](vrlb.md) which wraps. +- **Big-endian byte indexing.** +- **No flags, no VSCR.** No overflow signal — bits are silently lost. +- **No VMX128 sibling.** Xenon software uses `vslw`-on-prepackaged-data or [`vrlimi128`](../vmx128/vrlimi128.md) for common cases. + +## Related Instructions + +- [`vsrb`](vsrb.md) — logical-right twin. +- [`vsrab`](vsrab.md) — arithmetic-right (sign-extending) byte shift. +- [`vrlb`](vrlb.md) — left-rotate byte. +- [`vslh`](vslh.md), [`vslw`](vslw.md) — half-word / word logical-left shifts. +- [`vsl`](vsl.md), [`vslo`](vslo.md) — bit- and octet-level whole-register shifts. +- [`vspltb`](vspltb.md), [`vspltisb`](vspltisb.md) — splat sources for uniform shift counts. + +## IBM Reference + +- [AIX 7.3 — `vslb` (Vector Shift Left Integer Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vslb-vector-shift-left-integer-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsldoi.md b/migration/project-root/ppc-manual/vmx/vsldoi.md new file mode 100644 index 0000000..43f45f5 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsldoi.md @@ -0,0 +1,190 @@ +# `vsldoi` — Vector Shift Left Double by Octet Immediate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VA](../forms/VA.md) · **Opcode:** `0x1000002c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsldoi` | `vsldoi` | — | Vector Shift Left Double by Octet Immediate | +| `vsldoi128` | `vsldoi128` | — | Vector128 Shift Left Double by Octet Immediate | + +## Syntax + +```asm +vsldoi [VD], [VA], [VB], [SHB] +vsldoi128 [VD], [VA], [VB], [SHB] +``` + +## Encoding + +### `vsldoi` — form `VA` + +- **Opcode word:** `0x1000002c` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `44` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT` | destination vector register | +| 11–15 | `VRA` | source A | +| 16–20 | `VRB` | source B | +| 21–25 | `VRC` | source C / shift | +| 26–31 | `XO` | extended opcode (6 bits) | + +### `vsldoi128` — form `VX128_5` + +- **Opcode word:** `0x10000010` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `16` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22–25 | `SH` | 4-bit shift amount | +| 26 | `VA128h` | source A middle bit | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsldoi: read; vsldoi128: read | Source A vector register. | +| `VB` | vsldoi: read; vsldoi128: read | Source B vector register. | +| `SHB` | vsldoi: read; vsldoi128: read | Shift amount (byte granularity, `vsldoi`). | +| `VD` | vsldoi: write; vsldoi128: write | Destination vector register. | + +## Register Effects + +### `vsldoi` + +- **Reads (always):** `VA`, `VB`, `SHB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vsldoi128` + +- **Reads (always):** `VA`, `VB`, `SHB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsldoi`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsldoi"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1477`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1477) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:122`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L122) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:587`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L587) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2303-2314`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2303-L2314) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsldoi => { + let a_bytes = ctx.vr[instr.ra()].as_bytes(); + let b_bytes = ctx.vr[instr.rb()].as_bytes(); + let sh = ((instr.raw >> 6) & 0xF) as usize; // SH field bits 6-9 + let mut concat = [0u8; 32]; + concat[..16].copy_from_slice(&a_bytes); + concat[16..].copy_from_slice(&b_bytes); + let mut r = [0u8; 16]; + r.copy_from_slice(&concat[sh..sh + 16]); + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ +**`vsldoi128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsldoi128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1480`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1480) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:122`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L122) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:595`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L595) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2315-2327`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2315-L2327) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsldoi128 => { + let a_bytes = ctx.vr[instr.va128()].as_bytes(); + let b_bytes = ctx.vr[instr.vb128()].as_bytes(); + let sh = instr.vx128_5_sh() as usize; + let mut concat = [0u8; 32]; + concat[..16].copy_from_slice(&a_bytes); + concat[16..].copy_from_slice(&b_bytes); + let mut r = [0u8; 16]; + let sh = sh.min(16); + r.copy_from_slice(&concat[sh..sh + 16]); + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Static byte-level shift of `VA ‖ VB`.** The 4-bit `SHB` immediate names a byte offset into the 32-byte concatenation `VA ‖ VB`. The destination `VD` is the 16-byte window starting at that offset. Equivalently: `VD = (VA << (8 * SHB)) | (VB >> (8 * (16 − SHB)))`, treating the 32-byte concatenation as a single big-endian value. +- **`SHB = 0` is a register move** from `VA` to `VD`. `SHB = 16` is ill-formed; the field is 4 bits (0..15) so the range is `SHB ∈ 0..=15`. +- **Compile-time shift only.** Unlike `vperm` / `vslo` / `vsro`, the shift is an immediate. When the shift is known at compile time, `vsldoi` is strictly cheaper than an `lvsl` + `vperm` pair. +- **Unaligned-load idiom.** `vsldoi` is the static-offset counterpart to the dynamic `lvsl` + `vperm` pattern. When the misalignment is known, emit `vsldoi vD, vAL, vAH, SHB` after two aligned `lvx` loads. +- **Big-endian byte indexing.** Lane 0 is the MSB. +- **No flags, no VSCR.** +- **VMX128 sibling [`vsldoi128`](vsldoi128.md)** with the wider register file; same 4-bit `SHB` immediate. + +## Related Instructions + +- [`vslo`](vslo.md), [`vsro`](vsro.md) — byte-level (octet) shifts using a per-register count, dynamic. +- [`vsl`](vsl.md), [`vsr`](vsr.md) — bit-level whole-register shifts. +- [`vperm`](vperm.md) — general-purpose programmable byte permute. +- [`lvsl`](lvsl.md), [`lvsr`](lvsr.md) — dynamic permute-control generators. +- [`vmrghb`](vmrghb.md), [`vmrglb`](vmrglb.md) — byte-granularity merges. + +## IBM Reference + +- [AIX 7.3 — `vsldoi` (Vector Shift Left Double by Octet Immediate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsldoi-vector-shift-left-double-by-octet-immediate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vslh.md b/migration/project-root/ppc-manual/vmx/vslh.md new file mode 100644 index 0000000..42a342c --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vslh.md @@ -0,0 +1,130 @@ +# `vslh` — Vector Shift Left Integer Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000144` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vslh` | `vslh` | — | Vector Shift Left Integer Half Word | + +## Syntax + +```asm +vslh [VD], [VA], [VB] +``` + +## Encoding + +### `vslh` — form `VX` + +- **Opcode word:** `0x10000144` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `324` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vslh: read | Source A vector register. | +| `VB` | vslh: read | Source B vector register. | +| `VD` | vslh: write | Destination vector register. | + +## Register Effects + +### `vslh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vslh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vslh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1419`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1419) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:122`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L122) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:461`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L461) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3884-3891`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3884-L3891) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vslh => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u16; 8]; + for i in 0..8 { r[i] = a[i] << (b[i] & 0xF); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane logical shift-left of half-words.** `VD.h[i] = VA.h[i] << (VB.h[i] & 0xF)` for `i ∈ 0..7`. Low 4 bits of each shift-count half-word are honoured. +- **Per-lane shift counts.** Splat with [`vsplth`](vsplth.md) / [`vspltish`](vspltish.md) for uniform shifts. +- **Zero-fill on the right.** Bits lost off the top. No sign propagation — use [`vsrah`](vsrah.md) if you need arithmetic right shift. +- **Big-endian half-word indexing.** +- **No flags, no VSCR.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vsrh`](vsrh.md) — logical-right half-word. +- [`vsrah`](vsrah.md) — arithmetic-right half-word. +- [`vrlh`](vrlh.md) — half-word rotate. +- [`vslb`](vslb.md), [`vslw`](vslw.md) — byte / word logical-left shifts. +- [`vsplth`](vsplth.md), [`vspltish`](vspltish.md) — splats for shift counts. + +## IBM Reference + +- [AIX 7.3 — `vslh` (Vector Shift Left Integer Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vslh-vector-shift-left-integer-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vslo.md b/migration/project-root/ppc-manual/vmx/vslo.md new file mode 100644 index 0000000..9cffe32 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vslo.md @@ -0,0 +1,183 @@ +# `vslo` — Vector Shift Left by Octet + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000040c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vslo` | `vslo` | — | Vector Shift Left by Octet | +| `vslo128` | `vslo128` | — | Vector128 Shift Left Octet | + +## Syntax + +```asm +vslo [VD], [VA], [VB] +vslo128 [VD], [VA], [VB] +``` + +## Encoding + +### `vslo` — form `VX` + +- **Opcode word:** `0x1000040c` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1036` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vslo128` — form `VX128` + +- **Opcode word:** `0x14000390` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `912` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vslo: read; vslo128: read | Source A vector register. | +| `VB` | vslo: read; vslo128: read | Source B vector register. | +| `VD` | vslo: write; vslo128: write | Destination vector register. | + +## Register Effects + +### `vslo` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vslo128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vslo`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vslo"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1496`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1496) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:122`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L122) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:523`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L523) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3935-3944`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3935-L3944) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vslo | PpcOpcode::vslo128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vslo128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = u128::from_be_bytes(ctx.vr[ra].as_bytes()); + let nbytes = ((ctx.vr[rb].as_bytes()[15] >> 3) & 0xF) as u32; + let r = if nbytes == 0 { a } else { a << (nbytes * 8) }; + ctx.vr[rd] = xenia_types::Vec128::from_bytes(r.to_be_bytes()); + ctx.pc += 4; + } +``` +
+ +**`vslo128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vslo128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1499`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1499) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:122`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L122) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:631`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L631) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3935-3944`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3935-L3944) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vslo | PpcOpcode::vslo128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vslo128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = u128::from_be_bytes(ctx.vr[ra].as_bytes()); + let nbytes = ((ctx.vr[rb].as_bytes()[15] >> 3) & 0xF) as u32; + let r = if nbytes == 0 { a } else { a << (nbytes * 8) }; + ctx.vr[rd] = xenia_types::Vec128::from_bytes(r.to_be_bytes()); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Whole-register shift-left by octets (bytes).** `VA` is shifted left by `N` bytes, where `N = (VB.b[15] >> 3) & 0xF` — bits 1..4 of the last byte of `VB`. Right end is zero-filled. `N` saturates at 15 because only 4 bits are honoured. +- **Shift count constraint.** The ISA mandates a uniform 4-bit count across all of `VB`; xenia-rs reads only byte 15. Splat with [`vspltb`](vspltb.md) before invoking when the count is derived dynamically. +- **Pair with [`vsl`](vsl.md) for full bit-level shifts.** `vslo` contributes the byte-granular part; `vsl` contributes the 0..7 residual bits. +- **Big-endian.** "Left" = toward MSB = toward `VD.b[0]`. +- **No flags, no VSCR.** +- **VMX128 sibling [`vslo128`](vslo128.md).** + +## Related Instructions + +- [`vsl`](vsl.md) — the bit-level whole-register shift-left. +- [`vsro`](vsro.md) — shift-right by octets. +- [`vsldoi`](vsldoi.md) — static-immediate variant. +- [`vslb`](vslb.md), [`vslh`](vslh.md), [`vslw`](vslw.md) — per-lane shifts. + +## IBM Reference + +- [AIX 7.3 — `vslo` (Vector Shift Left by Octet)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vslo-vector-shift-left-by-octet-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vslw.md b/migration/project-root/ppc-manual/vmx/vslw.md new file mode 100644 index 0000000..b936316 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vslw.md @@ -0,0 +1,188 @@ +# `vslw` — Vector Shift Left Integer Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000184` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vslw` | `vslw` | — | Vector Shift Left Integer Word | +| `vslw128` | `vslw128` | — | Vector128 Shift Left Integer Word | + +## Syntax + +```asm +vslw [VD], [VA], [VB] +vslw128 [VD], [VA], [VB] +``` + +## Encoding + +### `vslw` — form `VX` + +- **Opcode word:** `0x10000184` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `388` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vslw128` — form `VX128` + +- **Opcode word:** `0x180000d0` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `208` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vslw: read; vslw128: read | Source A vector register. | +| `VB` | vslw: read; vslw128: read | Source B vector register. | +| `VD` | vslw: write; vslw128: write | Destination vector register. | + +## Register Effects + +### `vslw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vslw128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vslw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vslw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1433`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1433) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:122`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L122) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:468`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L468) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2414-2425`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2414-L2425) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vslw | PpcOpcode::vslw128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { + let sh = b[i] & 0x1F; + r[i] = a[i] << sh; + } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vslw128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vslw128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1436`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1436) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:122`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L122) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:693`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L693) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2414-2425`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2414-L2425) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vslw | PpcOpcode::vslw128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { + let sh = b[i] & 0x1F; + r[i] = a[i] << sh; + } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane logical shift-left of words.** `VD.w[i] = VA.w[i] << (VB.w[i] & 0x1F)` for `i ∈ 0..3`. Low 5 bits of each shift-count word are honoured. +- **Per-lane shift counts.** Splat with [`vspltw`](vspltw.md) / [`vspltisw`](vspltisw.md) for uniform shifts. +- **Zero-fill right.** Arithmetic right shift is [`vsraw`](vsraw.md). +- **Big-endian word indexing.** +- **No flags, no VSCR.** +- **VMX128 sibling [`vslw128`](vslw128.md).** + +## Related Instructions + +- [`vsrw`](vsrw.md) — logical-right word. +- [`vsraw`](vsraw.md) — arithmetic-right word. +- [`vrlw`](vrlw.md) — word rotate. +- [`vslb`](vslb.md), [`vslh`](vslh.md) — byte / half-word logical-left. +- [`vspltw`](vspltw.md), [`vspltisw`](vspltisw.md) — splats for shift counts. + +## IBM Reference + +- [AIX 7.3 — `vslw` (Vector Shift Left Integer Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vslw-vector-shift-left-integer-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vspltb.md b/migration/project-root/ppc-manual/vmx/vspltb.md new file mode 100644 index 0000000..4ea9198 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vspltb.md @@ -0,0 +1,127 @@ +# `vspltb` — Vector Splat Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000020c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vspltb` | `vspltb` | — | Vector Splat Byte | + +## Syntax + +```asm +vspltb [VD], [VB], [UIMM] +``` + +## Encoding + +### `vspltb` — form `VX` + +- **Opcode word:** `0x1000020c` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `524` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vspltb: read | Source B vector register. | +| `UIMM` | vspltb: read | 16-bit unsigned immediate. Zero-extended. | +| `VD` | vspltb: write | Destination vector register. | + +## Register Effects + +### `vspltb` + +- **Reads (always):** `VB`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vspltb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vspltb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1503`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1503) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:123`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L123) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:480`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L480) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2349-2355`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2349-L2355) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vspltb => { + let uimm = ((instr.raw >> 16) & 0xF) as usize; + let b = ctx.vr[instr.rb()].as_bytes(); + let val = b[uimm]; + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes([val; 16]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Splat one byte across all 16 lanes.** The `UIMM` field (bits 11–15) selects which byte of `VB` to replicate — `UIMM = 0` picks the MSB lane, `UIMM = 15` picks the LSB. Only the low 4 bits are meaningful. +- **Big-endian index.** `UIMM = 0` → `VB.b[0]`, the most significant byte. This matches the layout after a `stvx` / `lvx` round-trip. +- **Typical use: broadcast a comparison selector or a shift count** so that a per-lane op (e.g. [`vslb`](vslb.md)) behaves as a scalar-style shift. +- **No flags, no VSCR.** +- **No VMX128 sibling.** Xenon replaces this with [`vspltisb`](vspltisb.md) for immediate constants, or with [`vpermwi128`](../vmx128/vpermwi128.md) / [`vperm`](vperm.md) for more complex splats. + +## Related Instructions + +- [`vsplth`](vsplth.md), [`vspltw`](vspltw.md) — half-word / word splat siblings. +- [`vspltisb`](vspltisb.md), [`vspltish`](vspltish.md), [`vspltisw`](vspltisw.md) — immediate splats (no source register needed). +- [`vperm`](vperm.md) — programmable byte permute; a splat is the special case where `VC = {k, k, …, k}`. +- [`vpermwi128`](../vmx128/vpermwi128.md) — word-level 4-way permute via 8-bit immediate (VMX128-only). + +## IBM Reference + +- [AIX 7.3 — `vspltb` (Vector Splat Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vspltb-vector-splat-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsplth.md b/migration/project-root/ppc-manual/vmx/vsplth.md new file mode 100644 index 0000000..857422f --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsplth.md @@ -0,0 +1,127 @@ +# `vsplth` — Vector Splat Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000024c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsplth` | `vsplth` | — | Vector Splat Half Word | + +## Syntax + +```asm +vsplth [VD], [VB], [UIMM] +``` + +## Encoding + +### `vsplth` — form `VX` + +- **Opcode word:** `0x1000024c` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `588` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vsplth: read | Source B vector register. | +| `UIMM` | vsplth: read | 16-bit unsigned immediate. Zero-extended. | +| `VD` | vsplth: write | Destination vector register. | + +## Register Effects + +### `vsplth` + +- **Reads (always):** `VB`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsplth`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsplth"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1513`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1513) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:123`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L123) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:487`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L487) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2342-2348`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2342-L2348) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsplth => { + let uimm = ((instr.raw >> 16) & 0x7) as usize; + let b = ctx.vr[instr.rb()].as_u16x8(); + let val = b[uimm]; + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array([val; 8]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Splat one half-word across all 8 lanes.** `UIMM` (bits 11–15, low 3 bits honoured) selects which of `VB`'s 8 half-word lanes is replicated. +- **Big-endian index.** `UIMM = 0` → `VB.h[0]`, the most significant half-word. +- **Typical use: broadcast a 16-bit shift count or comparison operand.** +- **No flags, no VSCR.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vspltb`](vspltb.md), [`vspltw`](vspltw.md) — byte / word splat siblings. +- [`vspltish`](vspltish.md) — immediate-operand splat (no source register). +- [`vperm`](vperm.md) — programmable permute; a half-word splat maps to a per-byte selector of `{2k, 2k+1, 2k, 2k+1, …}`. +- [`vpermwi128`](../vmx128/vpermwi128.md) — word-level permute (VMX128-only). + +## IBM Reference + +- [AIX 7.3 — `vsplth` (Vector Splat Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsplth-vector-splat-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vspltisb.md b/migration/project-root/ppc-manual/vmx/vspltisb.md new file mode 100644 index 0000000..8b2134b --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vspltisb.md @@ -0,0 +1,125 @@ +# `vspltisb` — Vector Splat Immediate Signed Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000030c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vspltisb` | `vspltisb` | — | Vector Splat Immediate Signed Byte | + +## Syntax + +```asm +vspltisb [VD], [SIMM] +``` + +## Encoding + +### `vspltisb` — form `VX` + +- **Opcode word:** `0x1000030c` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `780` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `SIMM` | vspltisb: read | 16-bit signed immediate. Sign-extended to 64 bits before use. | +| `VD` | vspltisb: write | Destination vector register. | + +## Register Effects + +### `vspltisb` + +- **Reads (always):** `SIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vspltisb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vspltisb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1536`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1536) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:123`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L123) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:503`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L503) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2364-2369`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2364-L2369) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vspltisb => { + let simm = ((instr.raw >> 16) & 0x1F) as i8; + let simm = if simm & 0x10 != 0 { simm | !0x1F } else { simm }; + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes([simm as u8; 16]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Splat a 5-bit signed immediate across all 16 byte lanes.** The `SIMM` field (bits 11–15) is sign-extended from 5 bits to 8 — so the representable range is `[−16, +15]`. Values `0x10..0x1F` decode as negative (`−16..−1`). +- **Constant-generation primitive.** `vspltisb vD, 0` is the canonical "all-bytes-zero" vector (same net effect as `vxor vD, vD, vD`). `vspltisb vD, -1` is the all-ones mask. `vspltisb vD, 1` broadcasts `{1, 1, …, 1}` for vector-increment tricks. +- **No source register.** The op doesn't read `VA` / `VB`; this saves a register-file read port and keeps constant-generation cheap. +- **Big-endian lane order** (all lanes identical anyway). +- **No flags, no VSCR.** +- **No VMX128 sibling.** Xenon uses the same encoding. + +## Related Instructions + +- [`vspltish`](vspltish.md), [`vspltisw`](vspltisw.md) — half-word and word immediate splats (still sign-extended from 5 bits). +- [`vspltb`](vspltb.md), [`vsplth`](vsplth.md), [`vspltw`](vspltw.md) — register-indexed splats. +- [`vxor`](vxor.md) — alternative "zero vector" idiom when `vD` is already known. + +## IBM Reference + +- [AIX 7.3 — `vspltisb` (Vector Splat Immediate Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vspltisb-vector-splat-immediate-signed-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vspltish.md b/migration/project-root/ppc-manual/vmx/vspltish.md new file mode 100644 index 0000000..82e009a --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vspltish.md @@ -0,0 +1,124 @@ +# `vspltish` — Vector Splat Immediate Signed Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000034c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vspltish` | `vspltish` | — | Vector Splat Immediate Signed Half Word | + +## Syntax + +```asm +vspltish [VD], [SIMM] +``` + +## Encoding + +### `vspltish` — form `VX` + +- **Opcode word:** `0x1000034c` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `844` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `SIMM` | vspltish: read | 16-bit signed immediate. Sign-extended to 64 bits before use. | +| `VD` | vspltish: write | Destination vector register. | + +## Register Effects + +### `vspltish` + +- **Reads (always):** `SIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vspltish`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vspltish"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1551`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1551) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:123`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L123) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:510`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L510) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2370-2375`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2370-L2375) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vspltish => { + let simm = ((instr.raw >> 16) & 0x1F) as i16; + let simm = if simm & 0x10 != 0 { simm | !0x1F } else { simm }; + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array([simm as u16; 8]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Splat a 5-bit signed immediate across all 8 half-word lanes.** `SIMM` is sign-extended from 5 bits to 16, so the representable range is `[−16, +15]`. +- **Constant-generation primitive.** `vspltish vD, 0` is "all-zero half-words"; `vspltish vD, -1` is `{0xFFFF, …}`; `vspltish vD, 1` is `{0x0001, …}` (typical for "increment every lane" patterns). +- **No source register.** +- **No flags, no VSCR.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vspltisb`](vspltisb.md), [`vspltisw`](vspltisw.md) — byte / word immediate splats. +- [`vsplth`](vsplth.md) — register-indexed half-word splat. +- [`vxor`](vxor.md) — alternative zero-vector idiom. + +## IBM Reference + +- [AIX 7.3 — `vspltish` (Vector Splat Immediate Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vspltish-vector-splat-immediate-signed-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vspltisw.md b/migration/project-root/ppc-manual/vmx/vspltisw.md new file mode 100644 index 0000000..0aa243b --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vspltisw.md @@ -0,0 +1,172 @@ +# `vspltisw` — Vector Splat Immediate Signed Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000038c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vspltisw` | `vspltisw` | — | Vector Splat Immediate Signed Word | +| `vspltisw128` | `vspltisw128` | — | Vector128 Splat Immediate Signed Word | + +## Syntax + +```asm +vspltisw [VD], [SIMM] +vspltisw128 [VD], [SIMM] +``` + +## Encoding + +### `vspltisw` — form `VX` + +- **Opcode word:** `0x1000038c` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `908` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vspltisw128` — form `VX128_3` + +- **Opcode word:** `0x18000770` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `1904` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `SIMM` | vspltisw: read; vspltisw128: read | 16-bit signed immediate. Sign-extended to 64 bits before use. | +| `VD` | vspltisw: write; vspltisw128: write | Destination vector register. | + +## Register Effects + +### `vspltisw` + +- **Reads (always):** `SIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vspltisw128` + +- **Reads (always):** `SIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vspltisw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vspltisw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1580`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1580) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:123`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L123) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:516`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L516) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2356-2363`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2356-L2363) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vspltisw | PpcOpcode::vspltisw128 => { + let simm = ((instr.raw >> 16) & 0x1F) as i32; + let simm = if simm & 0x10 != 0 { simm | !0x1F } else { simm }; // sign extend 5-bit + let val = simm as u32; + let vd = if matches!(instr.opcode, PpcOpcode::vspltisw128) { instr.vd128() } else { instr.rd() }; + ctx.vr[vd] = xenia_types::Vec128::from_u32x4(val, val, val, val); + ctx.pc += 4; + } +``` +
+ +**`vspltisw128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vspltisw128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1583`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1583) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:123`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L123) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:669`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L669) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2356-2363`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2356-L2363) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vspltisw | PpcOpcode::vspltisw128 => { + let simm = ((instr.raw >> 16) & 0x1F) as i32; + let simm = if simm & 0x10 != 0 { simm | !0x1F } else { simm }; // sign extend 5-bit + let val = simm as u32; + let vd = if matches!(instr.opcode, PpcOpcode::vspltisw128) { instr.vd128() } else { instr.rd() }; + ctx.vr[vd] = xenia_types::Vec128::from_u32x4(val, val, val, val); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Splat a 5-bit signed immediate across all 4 word lanes.** `SIMM` is sign-extended from 5 bits to 32, so the representable range is `[−16, +15]`. +- **Constant-generation primitive.** `vspltisw vD, 0` zeroes every lane; `vspltisw vD, -1` generates `{0xFFFFFFFF, …}` (the all-ones vector); `vspltisw vD, 1` is `{1, 1, 1, 1}` — useful for "lane index = 0, 1, 2, 3" constructions via an `lvewx`-style preload followed by this. +- **No source register.** +- **No flags, no VSCR.** +- **VMX128 sibling [`vspltisw128`](vspltisw128.md).** + +## Related Instructions + +- [`vspltisb`](vspltisb.md), [`vspltish`](vspltish.md) — byte / half-word immediate splats. +- [`vspltw`](vspltw.md) — register-indexed word splat. +- [`vxor`](vxor.md) — alternative zero-vector idiom. + +## IBM Reference + +- [AIX 7.3 — `vspltisw` (Vector Splat Immediate Signed Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vspltisw-vector-splat-immediate-signed-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vspltw.md b/migration/project-root/ppc-manual/vmx/vspltw.md new file mode 100644 index 0000000..5ff1a51 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vspltw.md @@ -0,0 +1,173 @@ +# `vspltw` — Vector Splat Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000028c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vspltw` | `vspltw` | — | Vector Splat Word | +| `vspltw128` | `vspltw128` | — | Vector128 Splat Word | + +## Syntax + +```asm +vspltw [VD], [VB], [UIMM] +vspltw128 [VD], [VB], [UIMM] +``` + +## Encoding + +### `vspltw` — form `VX` + +- **Opcode word:** `0x1000028c` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `652` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vspltw128` — form `VX128_3` + +- **Opcode word:** `0x18000730` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `1840` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vspltw: read; vspltw128: read | Source B vector register. | +| `UIMM` | vspltw: read; vspltw128: read | 16-bit unsigned immediate. Zero-extended. | +| `VD` | vspltw: write; vspltw128: write | Destination vector register. | + +## Register Effects + +### `vspltw` + +- **Reads (always):** `VB`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vspltw128` + +- **Reads (always):** `VB`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vspltw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vspltw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1529`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1529) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:123`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L123) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:493`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L493) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2328-2334`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2328-L2334) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vspltw => { + let uimm = ((instr.raw >> 16) & 0x3) as usize; // UIMM (2 bits for word index) + let b = ctx.vr[instr.rb()].as_u32x4(); + let val = b[uimm]; + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4(val, val, val, val); + ctx.pc += 4; + } +``` +
+ +**`vspltw128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vspltw128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1532`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1532) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:123`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L123) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:668`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L668) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2335-2341`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2335-L2341) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vspltw128 => { + let uimm = ((instr.raw >> 16) & 0x3) as usize; + let b = ctx.vr[instr.vb128()].as_u32x4(); + let val = b[uimm]; + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_u32x4(val, val, val, val); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Splat one word across all 4 lanes.** `UIMM` (bits 11–15, low 2 bits honoured) picks which of `VB`'s 4 word lanes is replicated. +- **Big-endian index.** `UIMM = 0` → `VB.w[0]` (most significant word). +- **Typical use: broadcast a float or a 32-bit constant** (e.g. splatting a scalar result before feeding it to a per-lane multiply). +- **No flags, no VSCR.** +- **VMX128 sibling [`vspltw128`](vspltw128.md).** +- **Compares with [`vpermwi128`](../vmx128/vpermwi128.md):** `vpermwi128` generalises word splat to any 4-of-4 permutation using an 8-bit immediate (2 bits per output word). + +## Related Instructions + +- [`vspltb`](vspltb.md), [`vsplth`](vsplth.md) — byte / half-word splat siblings. +- [`vspltisw`](vspltisw.md) — immediate splat counterpart. +- [`vpermwi128`](../vmx128/vpermwi128.md) — VMX128 full word permute. +- [`vperm`](vperm.md) — byte-granular programmable permute. + +## IBM Reference + +- [AIX 7.3 — `vspltw` (Vector Splat Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vspltw-vector-splat-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsr.md b/migration/project-root/ppc-manual/vmx/vsr.md new file mode 100644 index 0000000..4fc18de --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsr.md @@ -0,0 +1,128 @@ +# `vsr` — Vector Shift Right + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x100002c4` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsr` | `vsr` | — | Vector Shift Right | + +## Syntax + +```asm +vsr [VD], [VA], [VB] +``` + +## Encoding + +### `vsr` — form `VX` + +- **Opcode word:** `0x100002c4` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `708` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsr: read | Source A vector register. | +| `VB` | vsr: read | Source B vector register. | +| `VD` | vsr: write | Destination vector register. | + +## Register Effects + +### `vsr` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsr`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsr"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1587`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1587) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:124`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L124) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:495`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L495) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3927-3933`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3927-L3933) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsr => { + let a = u128::from_be_bytes(ctx.vr[instr.ra()].as_bytes()); + let shift = (ctx.vr[instr.rb()].as_bytes()[15] & 7) as u32; + let r = if shift == 0 { a } else { a >> shift }; + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r.to_be_bytes()); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Whole-register bit shift-right.** The 128-bit value `VA` is shifted right (toward the LSB end) by `N` bits, where `N = VB.b[15] & 7` — the low 3 bits of the last byte of `VB`. Bits shifted out the bottom are discarded; zero-fill on the top. +- **Shift count constraint.** The ISA mandates the same 3-bit count across all of `VB`; xenia-rs reads only byte 15. Splat the count before use. +- **Pair with [`vsro`](vsro.md) for up to 127-bit shifts.** `vsro` contributes the byte-granular component; `vsr` the 0..7 residual bits. +- **Big-endian.** "Right" means toward the LSB end (`VD.b[15]`). +- **No flags, no VSCR.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vsro`](vsro.md) — whole-register shift-right by octets. +- [`vsl`](vsl.md), [`vslo`](vslo.md) — left-shift counterparts. +- [`vsldoi`](vsldoi.md) — static-immediate byte shift of `VA ‖ VB`. +- [`vsrb`](vsrb.md), [`vsrh`](vsrh.md), [`vsrw`](vsrw.md) — per-lane logical shifts. + +## IBM Reference + +- [AIX 7.3 — `vsr` (Vector Shift Right)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsr-vector-shift-right-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsrab.md b/migration/project-root/ppc-manual/vmx/vsrab.md new file mode 100644 index 0000000..3b5c2af --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsrab.md @@ -0,0 +1,129 @@ +# `vsrab` — Vector Shift Right Algebraic Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000304` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsrab` | `vsrab` | — | Vector Shift Right Algebraic Byte | + +## Syntax + +```asm +vsrab [VD], [VA], [VB] +``` + +## Encoding + +### `vsrab` — form `VX` + +- **Opcode word:** `0x10000304` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `772` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsrab: read | Source A vector register. | +| `VB` | vsrab: read | Source B vector register. | +| `VD` | vsrab: write | Destination vector register. | + +## Register Effects + +### `vsrab` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsrab`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsrab"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1599`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1599) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:124`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L124) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:500`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L500) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3868-3875`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3868-L3875) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsrab => { + let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0i8; 16]; + for i in 0..16 { r[i] = a[i] >> (b[i] & 7); } + ctx.vr[instr.rd()] = crate::vmx::from_i8x16(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane arithmetic right-shift of signed bytes.** `VD.b[i] = (int8)VA.b[i] >> (VB.b[i] & 7)` — sign bit propagates into vacated high-order bits. Low 3 bits of each shift-count byte are honoured. +- **Per-lane shift counts.** Splat via [`vspltb`](vspltb.md) / [`vspltisb`](vspltisb.md) for uniform shifts. +- **Sign extension.** Negative inputs produce values that stay negative (toward `-1`), unlike logical shift [`vsrb`](vsrb.md) which zero-fills. +- **Big-endian byte lanes.** +- **No flags, no VSCR.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vsrb`](vsrb.md) — logical-right byte shift. +- [`vslb`](vslb.md) — byte logical-left. +- [`vrlb`](vrlb.md) — byte rotate. +- [`vsrah`](vsrah.md), [`vsraw`](vsraw.md) — arithmetic-right half-word / word. + +## IBM Reference + +- [AIX 7.3 — `vsrab` (Vector Shift Right Arithmetic Integer Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsrab-vector-shift-right-algebraic-integer-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsrah.md b/migration/project-root/ppc-manual/vmx/vsrah.md new file mode 100644 index 0000000..576c6dc --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsrah.md @@ -0,0 +1,129 @@ +# `vsrah` — Vector Shift Right Algebraic Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000344` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsrah` | `vsrah` | — | Vector Shift Right Algebraic Half Word | + +## Syntax + +```asm +vsrah [VD], [VA], [VB] +``` + +## Encoding + +### `vsrah` — form `VX` + +- **Opcode word:** `0x10000344` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `836` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsrah: read | Source A vector register. | +| `VB` | vsrah: read | Source B vector register. | +| `VD` | vsrah: write | Destination vector register. | + +## Register Effects + +### `vsrah` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsrah`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsrah"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1606`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1606) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:124`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L124) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:507`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L507) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3900-3907`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3900-L3907) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsrah => { + let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0i16; 8]; + for i in 0..8 { r[i] = a[i] >> (b[i] & 0xF); } + ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane arithmetic right-shift of signed half-words.** `VD.h[i] = (int16)VA.h[i] >> (VB.h[i] & 0xF)` — sign bit propagates. Low 4 bits of each shift-count half-word are honoured. +- **Per-lane shift counts.** Splat via [`vsplth`](vsplth.md) / [`vspltish`](vspltish.md) for uniform shifts. +- **Sign extension.** Distinct from the logical [`vsrh`](vsrh.md) which zero-fills. +- **Big-endian half-word lanes.** +- **No flags, no VSCR.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vsrh`](vsrh.md) — logical-right half-word. +- [`vslh`](vslh.md) — half-word logical-left. +- [`vrlh`](vrlh.md) — half-word rotate. +- [`vsrab`](vsrab.md), [`vsraw`](vsraw.md) — arithmetic-right byte / word. + +## IBM Reference + +- [AIX 7.3 — `vsrah` (Vector Shift Right Arithmetic Integer Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsrah-vector-shift-right-algebraic-integer-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsraw.md b/migration/project-root/ppc-manual/vmx/vsraw.md new file mode 100644 index 0000000..69b66ea --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsraw.md @@ -0,0 +1,187 @@ +# `vsraw` — Vector Shift Right Algebraic Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000384` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsraw` | `vsraw` | — | Vector Shift Right Algebraic Word | +| `vsraw128` | `vsraw128` | — | Vector128 Shift Right Arithmetic Word | + +## Syntax + +```asm +vsraw [VD], [VA], [VB] +vsraw128 [VD], [VA], [VB] +``` + +## Encoding + +### `vsraw` — form `VX` + +- **Opcode word:** `0x10000384` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `900` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vsraw128` — form `VX128` + +- **Opcode word:** `0x18000150` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `336` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsraw: read; vsraw128: read | Source A vector register. | +| `VB` | vsraw: read; vsraw128: read | Source B vector register. | +| `VD` | vsraw: write; vsraw128: write | Destination vector register. | + +## Register Effects + +### `vsraw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vsraw128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsraw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsraw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1619`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1619) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:124`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L124) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:514`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L514) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2438-2449`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2438-L2449) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsraw | PpcOpcode::vsraw128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { + let sh = b[i] & 0x1F; + r[i] = (a[i] as i32 >> sh) as u32; + } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vsraw128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsraw128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1622`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1622) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:124`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L124) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:694`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L694) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2438-2449`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2438-L2449) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsraw | PpcOpcode::vsraw128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { + let sh = b[i] & 0x1F; + r[i] = (a[i] as i32 >> sh) as u32; + } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane arithmetic right-shift of signed words.** `VD.w[i] = (int32)VA.w[i] >> (VB.w[i] & 0x1F)` — sign bit propagates. Low 5 bits of each shift-count word are honoured. +- **Per-lane shift counts.** Splat via [`vspltw`](vspltw.md) / [`vspltisw`](vspltisw.md) for uniform shifts. +- **Sign extension** — distinct from [`vsrw`](vsrw.md) (zero-fill). +- **Big-endian word lanes.** +- **No flags, no VSCR.** +- **VMX128 sibling [`vsraw128`](vsraw128.md).** + +## Related Instructions + +- [`vsrw`](vsrw.md) — logical-right word. +- [`vslw`](vslw.md) — word logical-left. +- [`vrlw`](vrlw.md) — word rotate. +- [`vsrab`](vsrab.md), [`vsrah`](vsrah.md) — byte / half-word arithmetic-right. + +## IBM Reference + +- [AIX 7.3 — `vsraw` (Vector Shift Right Arithmetic Integer Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsraw-vector-shift-right-algebraic-integer-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsrb.md b/migration/project-root/ppc-manual/vmx/vsrb.md new file mode 100644 index 0000000..bb9c41d --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsrb.md @@ -0,0 +1,129 @@ +# `vsrb` — Vector Shift Right Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000204` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsrb` | `vsrb` | — | Vector Shift Right Byte | + +## Syntax + +```asm +vsrb [VD], [VA], [VB] +``` + +## Encoding + +### `vsrb` — form `VX` + +- **Opcode word:** `0x10000204` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `516` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsrb: read | Source A vector register. | +| `VB` | vsrb: read | Source B vector register. | +| `VD` | vsrb: write | Destination vector register. | + +## Register Effects + +### `vsrb` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsrb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsrb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1626`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1626) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:124`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L124) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:477`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L477) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3860-3867`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3860-L3867) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsrb => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u8; 16]; + for i in 0..16 { r[i] = a[i] >> (b[i] & 7); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane logical right-shift of bytes.** `VD.b[i] = VA.b[i] >> (VB.b[i] & 7)` — zero-fills the vacated high-order bits. Low 3 bits of each shift-count byte are honoured. +- **Per-lane shift counts.** Splat via [`vspltb`](vspltb.md) / [`vspltisb`](vspltisb.md) for uniform shifts. +- **Zero-fill.** For sign-preserving shift use [`vsrab`](vsrab.md). +- **Big-endian byte lanes.** +- **No flags, no VSCR.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vsrab`](vsrab.md) — arithmetic-right (sign-extending) byte shift. +- [`vslb`](vslb.md) — byte logical-left. +- [`vrlb`](vrlb.md) — byte rotate. +- [`vsrh`](vsrh.md), [`vsrw`](vsrw.md) — half-word / word logical-right shifts. + +## IBM Reference + +- [AIX 7.3 — `vsrb` (Vector Shift Right Integer Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsrb-vector-shift-right-integer-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsrh.md b/migration/project-root/ppc-manual/vmx/vsrh.md new file mode 100644 index 0000000..821ab6c --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsrh.md @@ -0,0 +1,129 @@ +# `vsrh` — Vector Shift Right Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000244` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsrh` | `vsrh` | — | Vector Shift Right Half Word | + +## Syntax + +```asm +vsrh [VD], [VA], [VB] +``` + +## Encoding + +### `vsrh` — form `VX` + +- **Opcode word:** `0x10000244` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `580` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsrh: read | Source A vector register. | +| `VB` | vsrh: read | Source B vector register. | +| `VD` | vsrh: write | Destination vector register. | + +## Register Effects + +### `vsrh` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsrh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsrh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1633`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1633) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:124`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L124) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:484`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L484) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3892-3899`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3892-L3899) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsrh => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u16; 8]; + for i in 0..8 { r[i] = a[i] >> (b[i] & 0xF); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane logical right-shift of half-words.** `VD.h[i] = VA.h[i] >> (VB.h[i] & 0xF)` — zero-fill. Low 4 bits of each shift-count half-word honoured. +- **Per-lane shift counts.** Splat via [`vsplth`](vsplth.md) / [`vspltish`](vspltish.md) for uniform shifts. +- **Zero-fill.** Use [`vsrah`](vsrah.md) for sign-preserving right shift. +- **Big-endian half-word lanes.** +- **No flags, no VSCR.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vsrah`](vsrah.md) — arithmetic-right half-word. +- [`vslh`](vslh.md) — logical-left half-word. +- [`vrlh`](vrlh.md) — half-word rotate. +- [`vsrb`](vsrb.md), [`vsrw`](vsrw.md) — byte / word logical-right. + +## IBM Reference + +- [AIX 7.3 — `vsrh` (Vector Shift Right Integer Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsrh-vector-shift-right-integer-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsro.md b/migration/project-root/ppc-manual/vmx/vsro.md new file mode 100644 index 0000000..d0d7431 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsro.md @@ -0,0 +1,183 @@ +# `vsro` — Vector Shift Right Octet + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000044c` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsro` | `vsro` | — | Vector Shift Right Octet | +| `vsro128` | `vsro128` | — | Vector128 Shift Right Octet | + +## Syntax + +```asm +vsro [VD], [VA], [VB] +vsro128 [VD], [VA], [VB] +``` + +## Encoding + +### `vsro` — form `VX` + +- **Opcode word:** `0x1000044c` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1100` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vsro128` — form `VX128` + +- **Opcode word:** `0x140003d0` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `976` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsro: read; vsro128: read | Source A vector register. | +| `VB` | vsro: read; vsro128: read | Source B vector register. | +| `VD` | vsro: write; vsro128: write | Destination vector register. | + +## Register Effects + +### `vsro` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vsro128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsro`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsro"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1651`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1651) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:124`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L124) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:528`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L528) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3945-3954`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3945-L3954) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsro | PpcOpcode::vsro128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vsro128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = u128::from_be_bytes(ctx.vr[ra].as_bytes()); + let nbytes = ((ctx.vr[rb].as_bytes()[15] >> 3) & 0xF) as u32; + let r = if nbytes == 0 { a } else { a >> (nbytes * 8) }; + ctx.vr[rd] = xenia_types::Vec128::from_bytes(r.to_be_bytes()); + ctx.pc += 4; + } +``` +
+ +**`vsro128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsro128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1654`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1654) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:124`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L124) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:633`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L633) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3945-3954`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3945-L3954) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsro | PpcOpcode::vsro128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vsro128); + let (ra, rb, rd) = if is_128 { (instr.va128(), instr.vb128(), instr.vd128()) } + else { (instr.ra(), instr.rb(), instr.rd()) }; + let a = u128::from_be_bytes(ctx.vr[ra].as_bytes()); + let nbytes = ((ctx.vr[rb].as_bytes()[15] >> 3) & 0xF) as u32; + let r = if nbytes == 0 { a } else { a >> (nbytes * 8) }; + ctx.vr[rd] = xenia_types::Vec128::from_bytes(r.to_be_bytes()); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Whole-register shift-right by octets (bytes).** `VA` is shifted right by `N` bytes, where `N = (VB.b[15] >> 3) & 0xF`. Top end is zero-filled. +- **Shift count constraint.** Uniform 4-bit count required across `VB`; xenia reads only byte 15. +- **Pair with [`vsr`](vsr.md) for full bit-level shifts.** `vsro` handles bytes; `vsr` handles the 0..7 residual. +- **Big-endian.** "Right" = toward LSB end (`VD.b[15]`). +- **No flags, no VSCR.** +- **VMX128 sibling [`vsro128`](vsro128.md).** + +## Related Instructions + +- [`vsr`](vsr.md) — bit-level whole-register shift-right. +- [`vslo`](vslo.md) — shift-left by octet. +- [`vsldoi`](vsldoi.md) — static-immediate variant. +- [`vsrb`](vsrb.md), [`vsrh`](vsrh.md), [`vsrw`](vsrw.md) — per-lane logical shifts. + +## IBM Reference + +- [AIX 7.3 — `vsro` (Vector Shift Right by Octet)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsro-vector-shift-right-by-octet-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsrw.md b/migration/project-root/ppc-manual/vmx/vsrw.md new file mode 100644 index 0000000..90f0468 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsrw.md @@ -0,0 +1,187 @@ +# `vsrw` — Vector Shift Right Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000284` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsrw` | `vsrw` | — | Vector Shift Right Word | +| `vsrw128` | `vsrw128` | — | Vector128 Shift Right Word | + +## Syntax + +```asm +vsrw [VD], [VA], [VB] +vsrw128 [VD], [VA], [VB] +``` + +## Encoding + +### `vsrw` — form `VX` + +- **Opcode word:** `0x10000284` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `644` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vsrw128` — form `VX128` + +- **Opcode word:** `0x180001d0` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `464` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsrw: read; vsrw128: read | Source A vector register. | +| `VB` | vsrw: read; vsrw128: read | Source B vector register. | +| `VD` | vsrw: write; vsrw128: write | Destination vector register. | + +## Register Effects + +### `vsrw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vsrw128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsrw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsrw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1664`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1664) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:124`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L124) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:491`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L491) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2426-2437`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2426-L2437) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsrw | PpcOpcode::vsrw128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { + let sh = b[i] & 0x1F; + r[i] = a[i] >> sh; + } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vsrw128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsrw128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1667`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1667) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:124`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L124) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:695`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L695) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2426-2437`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2426-L2437) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsrw | PpcOpcode::vsrw128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { + let sh = b[i] & 0x1F; + r[i] = a[i] >> sh; + } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-lane logical right-shift of words.** `VD.w[i] = VA.w[i] >> (VB.w[i] & 0x1F)` — zero-fill. Low 5 bits of each shift-count word honoured. +- **Per-lane shift counts.** Splat via [`vspltw`](vspltw.md) / [`vspltisw`](vspltisw.md) for uniform shifts. +- **Zero-fill.** Use [`vsraw`](vsraw.md) for sign-preserving right shift. +- **Big-endian word lanes.** +- **No flags, no VSCR.** +- **VMX128 sibling [`vsrw128`](vsrw128.md).** + +## Related Instructions + +- [`vsraw`](vsraw.md) — arithmetic-right word. +- [`vslw`](vslw.md) — logical-left word. +- [`vrlw`](vrlw.md) — word rotate. +- [`vsrb`](vsrb.md), [`vsrh`](vsrh.md) — byte / half-word logical-right. + +## IBM Reference + +- [AIX 7.3 — `vsrw` (Vector Shift Right Integer Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsrw-vector-shift-right-integer-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsubcuw.md b/migration/project-root/ppc-manual/vmx/vsubcuw.md new file mode 100644 index 0000000..cd1d854 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsubcuw.md @@ -0,0 +1,128 @@ +# `vsubcuw` — Vector Subtract Carryout Unsigned Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000580` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsubcuw` | `vsubcuw` | — | Vector Subtract Carryout Unsigned Word | + +## Syntax + +```asm +vsubcuw [VD], [VA], [VB] +``` + +## Encoding + +### `vsubcuw` — form `VX` + +- **Opcode word:** `0x10000580` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1408` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsubcuw: read | Source A vector register. | +| `VB` | vsubcuw: read | Source B vector register. | +| `VD` | vsubcuw: write | Destination vector register. | + +## Register Effects + +### `vsubcuw` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsubcuw`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsubcuw"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1671`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1671) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:125`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L125) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:536`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L536) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3391-3399`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3391-L3399) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsubcuw => { + // "Subtract Carryout": r = 1 if a >= b (no borrow), 0 otherwise. + let a = ctx.vr[instr.ra()].as_u32x4(); + let b = ctx.vr[instr.rb()].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = if a[i] >= b[i] { 1 } else { 0 }; } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **"Borrow out" producer for unsigned word subtract.** Each of the 4 lanes produces `1` if `VA.w[i] >= VB.w[i]` (no borrow) and `0` otherwise. This is an **inverted** borrow — conventional borrow would be `1` on underflow, but Altivec's `vsubcuw` returns the opposite to match the `XER[CA]` convention used by scalar `subfc` / `subfe`. +- **Complements [`vadduwm`](vadduwm.md) / [`vaddcuw`](vaddcuw.md)** for 256-bit (or wider) multi-precision arithmetic. After the lane subtract, chain the 4-bit borrow vector into the next word via [`vsubeuwm`](vsubeuwm.md)-style helpers (or manual software glue, since Altivec has no direct `sube`). +- **No saturation, no flags, no VSCR effect.** Despite being in the "carry" family, `vsubcuw` doesn't touch `VSCR[SAT]`. +- **Big-endian word lanes.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vsubuwm`](vsubuwm.md) — the difference value (unsigned modulo). +- [`vaddcuw`](vaddcuw.md) — the paired carry-out producer for unsigned-add. +- [`vadduwm`](vadduwm.md) — unsigned word add (modulo). + +## IBM Reference + +- [AIX 7.3 — `vsubcuw` (Vector Subtract and Write Carry-Out Unsigned Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsubcuw-vector-subtract-write-carry-out-unsigned-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsubfp.md b/migration/project-root/ppc-manual/vmx/vsubfp.md new file mode 100644 index 0000000..c473a8e --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsubfp.md @@ -0,0 +1,183 @@ +# `vsubfp` — Vector Subtract Floating Point + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000004a` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsubfp` | `vsubfp` | — | Vector Subtract Floating Point | +| `vsubfp128` | `vsubfp128` | — | Vector128 Subtract Floating Point | + +## Syntax + +```asm +vsubfp [VD], [VA], [VB] +vsubfp128 [VD], [VA], [VB] +``` + +## Encoding + +### `vsubfp` — form `VX` + +- **Opcode word:** `0x1000004a` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `74` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vsubfp128` — form `VX128` + +- **Opcode word:** `0x14000050` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `80` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsubfp: read; vsubfp128: read | Source A vector register. | +| `VB` | vsubfp: read; vsubfp128: read | Source B vector register. | +| `VD` | vsubfp: write; vsubfp128: write | Destination vector register. | + +## Register Effects + +### `vsubfp` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vsubfp128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +for each 32-bit float lane i in 0..3: + VD[i] <- VA[i] − VB[i] +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsubfp`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsubfp"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1686`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1686) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:125`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L125) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:445`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L445) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2012-2024`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2012-L2024) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsubfp => { + // PPCBUG-435. + let a = ctx.vr[instr.ra()].as_f32x4(); + let b = ctx.vr[instr.rb()].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { + let ai = vmx::flush_denorm(a[i]); + let bi = vmx::flush_denorm(b[i]); + r[i] = vmx::flush_denorm(ai - bi); + } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vsubfp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsubfp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1689`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1689) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:125`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L125) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:611`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L611) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2025-2037`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2025-L2037) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsubfp128 => { + // PPCBUG-435. + let a = ctx.vr[instr.va128()].as_f32x4(); + let b = ctx.vr[instr.vb128()].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { + let ai = vmx::flush_denorm(a[i]); + let bi = vmx::flush_denorm(b[i]); + r[i] = vmx::flush_denorm(ai - bi); + } + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Lane-wise IEEE-754 binary32 subtract.** Each of the four lanes computes `VD[i] = VA[i] − VB[i]`, rounded to nearest. +- **`VSCR[NJ]` honoured.** Denormals flushed to zero when `NJ = 1` (the Xenon boot default). +- **NaN propagation.** A NaN in either operand propagates to the destination lane. +- **`±∞ − ±∞` → NaN.** No exception, no VSCR[SAT] set. +- **No FPSCR update.** VMX float ops are independent of the scalar FPU's status register. +- **Big-endian lane indexing.** +- **VMX128 sibling [`vsubfp128`](vsubfp128.md).** +- **Aliasing legal.** `vsubfp v3, v3, v4` is fine. + +## Related Instructions + +- [`vaddfp`](vaddfp.md) — lane-wise float add. +- [`vmaddfp`](vmaddfp.md), [`vnmsubfp`](vnmsubfp.md) — fused multiply-accumulate variants. +- [`vmaxfp`](vmaxfp.md), [`vminfp`](vminfp.md) — IEEE-754-aware max/min. +- [`vcmpeqfp`](vcmpeqfp.md), [`vcmpgtfp`](vcmpgtfp.md), [`vcmpgefp`](vcmpgefp.md), [`vcmpbfp`](vcmpbfp.md) — compares. + +## IBM Reference + +- [AIX 7.3 — `vsubfp` (Vector Subtract Floating Point)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsubfp-vector-subtract-floating-point-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsubsbs.md b/migration/project-root/ppc-manual/vmx/vsubsbs.md new file mode 100644 index 0000000..b645dd3 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsubsbs.md @@ -0,0 +1,133 @@ +# `vsubsbs` — Vector Subtract Signed Byte Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000700` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsubsbs` | `vsubsbs` | — | Vector Subtract Signed Byte Saturate | + +## Syntax + +```asm +vsubsbs [VD], [VA], [VB] +``` + +## Encoding + +### `vsubsbs` — form `VX` + +- **Opcode word:** `0x10000700` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1792` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsubsbs: read | Source A vector register. | +| `VB` | vsubsbs: read | Source B vector register. | +| `VD` | vsubsbs: write | Destination vector register. | +| `VSCR` | vsubsbs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vsubsbs` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vsubsbs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsubsbs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsubsbs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1693`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1693) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:125`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L125) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:546`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L546) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3270-3281`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3270-L3281) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsubsbs => { + let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i8x16(ctx.vr[instr.rb()]); + let mut r = [0i8; 16]; let mut sat = false; + for i in 0..16 { + let (v, s) = crate::vmx::sat_sub_i8(a[i], b[i]); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = crate::vmx::from_i8x16(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Signed-byte saturating subtract.** `VD.b[i] = clamp_int8(VA.b[i] − VB.b[i])`. Each lane computed as `int8`; any lane that goes below `−128` or above `+127` is clamped and sticky-sets `VSCR[SAT]`. Xenia's helper is `vmx::sat_sub_i8`. +- **Sticky VSCR[SAT].** Once set it remains set until explicit `mtvscr` clear. +- **Big-endian byte lanes.** +- **No `Rc`, no XER.** +- **No VMX128 sibling.** +- **Compare with [`vsububs`](vsububs.md)** for the unsigned variant, or [`vsububm`](vsububm.md) for modulo wrap. + +## Related Instructions + +- [`vaddsbs`](vaddsbs.md) — signed byte saturating add. +- [`vsububs`](vsububs.md), [`vsububm`](vsububm.md) — unsigned byte sub (saturating / modulo). +- [`vsubshs`](vsubshs.md), [`vsubsws`](vsubsws.md) — signed saturating subtracts at wider lane widths. + +## IBM Reference + +- [AIX 7.3 — `vsubsbs` (Vector Subtract Signed Byte Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsubsbs-vector-subtract-signed-byte-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsubshs.md b/migration/project-root/ppc-manual/vmx/vsubshs.md new file mode 100644 index 0000000..58382e0 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsubshs.md @@ -0,0 +1,132 @@ +# `vsubshs` — Vector Subtract Signed Half Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000740` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsubshs` | `vsubshs` | — | Vector Subtract Signed Half Word Saturate | + +## Syntax + +```asm +vsubshs [VD], [VA], [VB] +``` + +## Encoding + +### `vsubshs` — form `VX` + +- **Opcode word:** `0x10000740` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1856` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsubshs: read | Source A vector register. | +| `VB` | vsubshs: read | Source B vector register. | +| `VD` | vsubshs: write | Destination vector register. | +| `VSCR` | vsubshs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vsubshs` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vsubshs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsubshs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsubshs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1702`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1702) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:125`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L125) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:548`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L548) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3318-3329`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3318-L3329) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsubshs => { + let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]); + let mut r = [0i16; 8]; let mut sat = false; + for i in 0..8 { + let (v, s) = crate::vmx::sat_sub_i16(a[i], b[i]); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Signed half-word saturating subtract.** `VD.h[i] = clamp_int16(VA.h[i] − VB.h[i])` for 8 lanes. Overflow clamps to `±0x7FFF` and sticky-sets `VSCR[SAT]`. Xenia uses `vmx::sat_sub_i16`. +- **Sticky VSCR[SAT].** +- **Big-endian half-word lanes.** +- **No `Rc`, no XER.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vaddshs`](vaddshs.md) — signed half-word saturating add. +- [`vsubuhs`](vsubuhs.md), [`vsubuhm`](vsubuhm.md) — unsigned half-word sub (sat / mod). +- [`vsubsbs`](vsubsbs.md), [`vsubsws`](vsubsws.md) — signed saturating subs at byte / word width. + +## IBM Reference + +- [AIX 7.3 — `vsubshs` (Vector Subtract Signed Half Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsubshs-vector-subtract-signed-half-word-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsubsws.md b/migration/project-root/ppc-manual/vmx/vsubsws.md new file mode 100644 index 0000000..e287432 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsubsws.md @@ -0,0 +1,132 @@ +# `vsubsws` — Vector Subtract Signed Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000780` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsubsws` | `vsubsws` | — | Vector Subtract Signed Word Saturate | + +## Syntax + +```asm +vsubsws [VD], [VA], [VB] +``` + +## Encoding + +### `vsubsws` — form `VX` + +- **Opcode word:** `0x10000780` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1920` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsubsws: read | Source A vector register. | +| `VB` | vsubsws: read | Source B vector register. | +| `VD` | vsubsws: write | Destination vector register. | +| `VSCR` | vsubsws: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vsubsws` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vsubsws`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsubsws`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsubsws"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1711`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1711) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:125`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L125) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:549`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L549) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3366-3377`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3366-L3377) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsubsws => { + let a = crate::vmx::as_i32x4(ctx.vr[instr.ra()]); + let b = crate::vmx::as_i32x4(ctx.vr[instr.rb()]); + let mut r = [0i32; 4]; let mut sat = false; + for i in 0..4 { + let (v, s) = crate::vmx::sat_sub_i32(a[i], b[i]); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Signed word saturating subtract.** `VD.w[i] = clamp_int32(VA.w[i] − VB.w[i])` for 4 lanes. Overflow clamps to `±0x7FFFFFFF` and sticky-sets `VSCR[SAT]`. Xenia uses `vmx::sat_sub_i32`. +- **Sticky VSCR[SAT].** +- **Big-endian word lanes.** +- **No `Rc`, no XER.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vaddsws`](vaddsws.md) — signed word saturating add. +- [`vsubuws`](vsubuws.md), [`vsubuwm`](vsubuwm.md) — unsigned word sub (sat / mod). +- [`vsubsbs`](vsubsbs.md), [`vsubshs`](vsubshs.md) — byte / half-word signed saturating subs. + +## IBM Reference + +- [AIX 7.3 — `vsubsws` (Vector Subtract Signed Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsubsws-vector-subtract-signed-word-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsububm.md b/migration/project-root/ppc-manual/vmx/vsububm.md new file mode 100644 index 0000000..e5f84c8 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsububm.md @@ -0,0 +1,128 @@ +# `vsububm` — Vector Subtract Unsigned Byte Modulo + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000400` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsububm` | `vsububm` | — | Vector Subtract Unsigned Byte Modulo | + +## Syntax + +```asm +vsububm [VD], [VA], [VB] +``` + +## Encoding + +### `vsububm` — form `VX` + +- **Opcode word:** `0x10000400` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1024` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsububm: read | Source A vector register. | +| `VB` | vsububm: read | Source B vector register. | +| `VD` | vsububm: write | Destination vector register. | + +## Register Effects + +### `vsububm` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsububm`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsububm"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1720`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1720) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:126`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L126) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:519`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L519) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3206-3213`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3206-L3213) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsububm => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u8; 16]; + for i in 0..16 { r[i] = a[i].wrapping_sub(b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unsigned-byte modulo subtract.** `VD.b[i] = VA.b[i] − VB.b[i]` as `u8` wrapping; underflow wraps silently. No saturation, no `VSCR[SAT]` update. +- **Useful as a signed sub too** because 8-bit two's-complement sub is bit-identical to unsigned modulo sub. The `m` suffix signals the modulo/wrap semantics regardless of signed interpretation. +- **Big-endian byte lanes.** +- **No `Rc`, no XER.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vadduwm`](vadduwm.md)/[`vadduhm`](vadduhm.md)/[`vaddubm`](vaddubm.md) — modulo adds at various widths. +- [`vsububs`](vsububs.md) — saturating sibling. +- [`vsubsbs`](vsubsbs.md) — signed saturating byte sub. +- [`vsubuhm`](vsubuhm.md), [`vsubuwm`](vsubuwm.md) — half-word / word modulo subs. + +## IBM Reference + +- [AIX 7.3 — `vsububm` (Vector Subtract Unsigned Byte Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsububm-vector-subtract-unsigned-byte-modulo-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsububs.md b/migration/project-root/ppc-manual/vmx/vsububs.md new file mode 100644 index 0000000..896c003 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsububs.md @@ -0,0 +1,134 @@ +# `vsububs` — Vector Subtract Unsigned Byte Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000600` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsububs` | `vsububs` | — | Vector Subtract Unsigned Byte Saturate | + +## Syntax + +```asm +vsububs [VD], [VA], [VB] +``` + +## Encoding + +### `vsububs` — form `VX` + +- **Opcode word:** `0x10000600` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1536` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsububs: read | Source A vector register. | +| `VB` | vsububs: read | Source B vector register. | +| `VD` | vsububs: write | Destination vector register. | +| `VSCR` | vsububs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vsububs` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vsububs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsububs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsububs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1744`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1744) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:126`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L126) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:538`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L538) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3246-3257`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3246-L3257) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsububs => { + let a = ctx.vr[instr.ra()].as_bytes(); + let b = ctx.vr[instr.rb()].as_bytes(); + let mut r = [0u8; 16]; let mut sat = false; + for i in 0..16 { + let (v, s) = crate::vmx::sat_sub_u8(a[i], b[i]); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_bytes(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unsigned-byte saturating subtract.** `VD.b[i] = clamp_u8(VA.b[i] − VB.b[i])` per lane. Negative results clamp to 0 and sticky-set `VSCR[SAT]`. Xenia uses `vmx::sat_sub_u8`. +- **Sticky VSCR[SAT].** +- **Common image-processing primitive.** "Floor at zero" for per-channel differences (alpha compositing, edge detection, etc.). +- **Big-endian byte lanes.** +- **No `Rc`, no XER.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vsububm`](vsububm.md) — modulo sibling. +- [`vsubsbs`](vsubsbs.md) — signed-byte saturating sub. +- [`vsubuhs`](vsubuhs.md), [`vsubuws`](vsubuws.md) — half-word / word unsigned saturating subs. +- [`vadduhs`](vadduhs.md), [`vadduws`](vadduws.md) — the add counterparts. + +## IBM Reference + +- [AIX 7.3 — `vsububs` (Vector Subtract Unsigned Byte Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsububs-vector-subtract-unsigned-byte-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsubuhm.md b/migration/project-root/ppc-manual/vmx/vsubuhm.md new file mode 100644 index 0000000..f1b0904 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsubuhm.md @@ -0,0 +1,128 @@ +# `vsubuhm` — Vector Subtract Unsigned Half Word Modulo + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000440` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsubuhm` | `vsubuhm` | — | Vector Subtract Unsigned Half Word Modulo | + +## Syntax + +```asm +vsubuhm [VD], [VA], [VB] +``` + +## Encoding + +### `vsubuhm` — form `VX` + +- **Opcode word:** `0x10000440` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1088` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsubuhm: read | Source A vector register. | +| `VB` | vsubuhm: read | Source B vector register. | +| `VD` | vsubuhm: write | Destination vector register. | + +## Register Effects + +### `vsubuhm` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsubuhm`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsubuhm"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1728`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1728) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:126`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L126) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:524`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L524) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3222-3229`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3222-L3229) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsubuhm => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u16; 8]; + for i in 0..8 { r[i] = a[i].wrapping_sub(b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unsigned half-word modulo subtract.** `VD.h[i] = VA.h[i] − VB.h[i]` as `u16` wrapping. No saturation, no `VSCR[SAT]`. +- **Signed-or-unsigned.** Two's-complement 16-bit sub is bit-identical modulo either interpretation. +- **Big-endian half-word lanes.** +- **No `Rc`, no XER.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vadduhm`](vadduhm.md) — modulo add counterpart. +- [`vsubuhs`](vsubuhs.md) — saturating sibling. +- [`vsubshs`](vsubshs.md) — signed saturating half-word sub. +- [`vsububm`](vsububm.md), [`vsubuwm`](vsubuwm.md) — byte / word modulo subs. + +## IBM Reference + +- [AIX 7.3 — `vsubuhm` (Vector Subtract Unsigned Half Word Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsubuhm-vector-subtract-unsigned-half-word-modulo-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsubuhs.md b/migration/project-root/ppc-manual/vmx/vsubuhs.md new file mode 100644 index 0000000..7ee2438 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsubuhs.md @@ -0,0 +1,133 @@ +# `vsubuhs` — Vector Subtract Unsigned Half Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000640` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsubuhs` | `vsubuhs` | — | Vector Subtract Unsigned Half Word Saturate | + +## Syntax + +```asm +vsubuhs [VD], [VA], [VB] +``` + +## Encoding + +### `vsubuhs` — form `VX` + +- **Opcode word:** `0x10000640` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1600` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsubuhs: read | Source A vector register. | +| `VB` | vsubuhs: read | Source B vector register. | +| `VD` | vsubuhs: write | Destination vector register. | +| `VSCR` | vsubuhs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vsubuhs` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vsubuhs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsubuhs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsubuhs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1753`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1753) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:126`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L126) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:541`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L541) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3294-3305`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3294-L3305) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsubuhs => { + let a = ctx.vr[instr.ra()].as_u16x8(); + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u16; 8]; let mut sat = false; + for i in 0..8 { + let (v, s) = crate::vmx::sat_sub_u16(a[i], b[i]); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u16x8_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unsigned half-word saturating subtract.** `VD.h[i] = clamp_u16(VA.h[i] − VB.h[i])` per lane. Negative results clamp to 0 and sticky-set `VSCR[SAT]`. Xenia uses `vmx::sat_sub_u16`. +- **Sticky VSCR[SAT].** +- **Big-endian half-word lanes.** +- **No `Rc`, no XER.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vsubuhm`](vsubuhm.md) — modulo sibling. +- [`vsubshs`](vsubshs.md) — signed half-word saturating sub. +- [`vsububs`](vsububs.md), [`vsubuws`](vsubuws.md) — byte / word unsigned saturating subs. +- [`vadduhs`](vadduhs.md) — the unsigned half-word saturating add. + +## IBM Reference + +- [AIX 7.3 — `vsubuhs` (Vector Subtract Unsigned Half Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsubuhs-vector-subtract-unsigned-half-word-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsubuwm.md b/migration/project-root/ppc-manual/vmx/vsubuwm.md new file mode 100644 index 0000000..7284ea1 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsubuwm.md @@ -0,0 +1,129 @@ +# `vsubuwm` — Vector Subtract Unsigned Word Modulo + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000480` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsubuwm` | `vsubuwm` | — | Vector Subtract Unsigned Word Modulo | + +## Syntax + +```asm +vsubuwm [VD], [VA], [VB] +``` + +## Encoding + +### `vsubuwm` — form `VX` + +- **Opcode word:** `0x10000480` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1152` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsubuwm: read | Source A vector register. | +| `VB` | vsubuwm: read | Source B vector register. | +| `VD` | vsubuwm: write | Destination vector register. | + +## Register Effects + +### `vsubuwm` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsubuwm`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsubuwm"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1736`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1736) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:126`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L126) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:529`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L529) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2404-2411`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2404-L2411) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsubuwm => { + let a = ctx.vr[instr.ra()].as_u32x4(); + let b = ctx.vr[instr.rb()].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = a[i].wrapping_sub(b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unsigned word modulo subtract.** `VD.w[i] = VA.w[i] − VB.w[i]` as `u32` wrapping. +- **Signed-or-unsigned.** Two's-complement 32-bit sub is bit-identical modulo either interpretation. +- **Big-endian word lanes.** +- **No `Rc`, no XER, no VSCR effect.** +- **No VMX128 sibling.** Note `vsubcuw` is the paired borrow-producer. + +## Related Instructions + +- [`vadduwm`](vadduwm.md) — modulo add counterpart. +- [`vsubuws`](vsubuws.md) — saturating sibling. +- [`vsubsws`](vsubsws.md) — signed saturating word sub. +- [`vsubcuw`](vsubcuw.md) — produces the borrow-out mask (multi-precision subtract helper). +- [`vsububm`](vsububm.md), [`vsubuhm`](vsubuhm.md) — byte / half-word modulo subs. + +## IBM Reference + +- [AIX 7.3 — `vsubuwm` (Vector Subtract Unsigned Word Modulo)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsubuwm-vector-subtract-unsigned-word-modulo-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsubuws.md b/migration/project-root/ppc-manual/vmx/vsubuws.md new file mode 100644 index 0000000..d2abb51 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsubuws.md @@ -0,0 +1,133 @@ +# `vsubuws` — Vector Subtract Unsigned Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000680` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsubuws` | `vsubuws` | — | Vector Subtract Unsigned Word Saturate | + +## Syntax + +```asm +vsubuws [VD], [VA], [VB] +``` + +## Encoding + +### `vsubuws` — form `VX` + +- **Opcode word:** `0x10000680` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1664` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsubuws: read | Source A vector register. | +| `VB` | vsubuws: read | Source B vector register. | +| `VD` | vsubuws: write | Destination vector register. | +| `VSCR` | vsubuws: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vsubuws` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vsubuws`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsubuws`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsubuws"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1762`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1762) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:126`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L126) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:544`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L544) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3342-3353`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3342-L3353) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsubuws => { + let a = ctx.vr[instr.ra()].as_u32x4(); + let b = ctx.vr[instr.rb()].as_u32x4(); + let mut r = [0u32; 4]; let mut sat = false; + for i in 0..4 { + let (v, s) = crate::vmx::sat_sub_u32(a[i], b[i]); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unsigned word saturating subtract.** `VD.w[i] = clamp_u32(VA.w[i] − VB.w[i])` per lane. Negative results clamp to 0 and sticky-set `VSCR[SAT]`. Xenia uses `vmx::sat_sub_u32`. +- **Sticky VSCR[SAT].** +- **Big-endian word lanes.** +- **No `Rc`, no XER.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vsubuwm`](vsubuwm.md) — modulo sibling. +- [`vsubsws`](vsubsws.md) — signed word saturating sub. +- [`vsububs`](vsububs.md), [`vsubuhs`](vsubuhs.md) — byte / half-word unsigned saturating subs. +- [`vadduws`](vadduws.md) — the unsigned word saturating add. + +## IBM Reference + +- [AIX 7.3 — `vsubuws` (Vector Subtract Unsigned Word Saturate)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsubuws-vector-subtract-unsigned-word-saturate-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsum2sws.md b/migration/project-root/ppc-manual/vmx/vsum2sws.md new file mode 100644 index 0000000..0742994 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsum2sws.md @@ -0,0 +1,133 @@ +# `vsum2sws` — Vector Sum Across Partial (1/2) Signed Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000688` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsum2sws` | `vsum2sws` | — | Vector Sum Across Partial (1/2) Signed Word Saturate | + +## Syntax + +```asm +vsum2sws [VD], [VA], [VB] +``` + +## Encoding + +### `vsum2sws` — form `VX` + +- **Opcode word:** `0x10000688` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1672` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsum2sws: read | Source A vector register. | +| `VB` | vsum2sws: read | Source B vector register. | +| `VD` | vsum2sws: write | Destination vector register. | +| `VSCR` | vsum2sws: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vsum2sws` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vsum2sws`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsum2sws`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsum2sws"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1776`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1776) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:127`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L127) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:545`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L545) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3668-3679`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3668-L3679) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsum2sws => { + // Two 2-word partial sums at lanes 1 and 3. + let a = crate::vmx::as_i32x4(ctx.vr[instr.ra()]); + let c = crate::vmx::as_i32x4(ctx.vr[instr.rb()]); + let s0 = a[0] as i64 + a[1] as i64 + c[1] as i64; + let s1 = a[2] as i64 + a[3] as i64 + c[3] as i64; + let (v0, sat0) = crate::vmx::sat_i64_to_i32(s0); + let (v1, sat1) = crate::vmx::sat_i64_to_i32(s1); + if sat0 | sat1 { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4([0, v0, 0, v1]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Two 2-word partial sums.** The four signed-word lanes of `VA` are split into two pairs: `{VA.w[0], VA.w[1]}` and `{VA.w[2], VA.w[3]}`. Each pair is summed, then added to the matching "anchor" word of `VB` (`VB.w[1]` and `VB.w[3]` respectively). Each 33-bit intermediate result is saturated to `int32`. +- **Output lane placement.** `VD.w[0] = 0`, `VD.w[1] = sat(VA.w[0] + VA.w[1] + VB.w[1])`, `VD.w[2] = 0`, `VD.w[3] = sat(VA.w[2] + VA.w[3] + VB.w[3])`. The zero lanes are specified in the ISA — software that wants a contiguous pair must `vmrglw` / `vmrghw` afterwards. +- **Sticky VSCR[SAT]** set when either saturating truncation occurs. +- **Big-endian word lanes.** +- **No `Rc`, no XER.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vsumsws`](vsumsws.md) — full 4-lane sum. +- [`vsum4sbs`](vsum4sbs.md), [`vsum4shs`](vsum4shs.md), [`vsum4ubs`](vsum4ubs.md) — per-word partial sums at narrower input widths. +- [`vaddsws`](vaddsws.md), [`vsubsws`](vsubsws.md) — word-saturating arithmetic. + +## IBM Reference + +- [AIX 7.3 — `vsum2sws` (Vector Sum across Partial (1/2) Saturated Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsum2sws-vector-sum-across-partial-12-saturated-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsum4sbs.md b/migration/project-root/ppc-manual/vmx/vsum4sbs.md new file mode 100644 index 0000000..06e5d2e --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsum4sbs.md @@ -0,0 +1,135 @@ +# `vsum4sbs` — Vector Sum Across Partial (1/4) Signed Byte Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000708` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsum4sbs` | `vsum4sbs` | — | Vector Sum Across Partial (1/4) Signed Byte Saturate | + +## Syntax + +```asm +vsum4sbs [VD], [VA], [VB] +``` + +## Encoding + +### `vsum4sbs` — form `VX` + +- **Opcode word:** `0x10000708` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1800` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsum4sbs: read | Source A vector register. | +| `VB` | vsum4sbs: read | Source B vector register. | +| `VD` | vsum4sbs: write | Destination vector register. | +| `VSCR` | vsum4sbs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vsum4sbs` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vsum4sbs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsum4sbs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsum4sbs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1781`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1781) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:127`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L127) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:547`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L547) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3680-3692`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3680-L3692) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsum4sbs => { + let a = crate::vmx::as_i8x16(ctx.vr[instr.ra()]); + let c = crate::vmx::as_i32x4(ctx.vr[instr.rb()]); + let mut r = [0i32; 4]; let mut sat = false; + for i in 0..4 { + let s = a[4*i] as i64 + a[4*i+1] as i64 + a[4*i+2] as i64 + a[4*i+3] as i64 + c[i] as i64; + let (v, o) = crate::vmx::sat_i64_to_i32(s); + r[i] = v; sat |= o; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-word 4-byte partial sum (signed).** For each of the 4 output word lanes, sum 4 signed bytes of `VA` plus the matching signed word of `VB`, then saturate to `int32`. Input byte layout: `VD.w[i] = sat(VA.b[4*i] + VA.b[4*i+1] + VA.b[4*i+2] + VA.b[4*i+3] + VB.w[i])`. +- **Sticky VSCR[SAT]** set on overflow. +- **Typical use: accumulate per-channel sums** (e.g. for a colour-averaging or luminance operator). +- **Big-endian byte / word lanes.** +- **No `Rc`, no XER.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vsum4shs`](vsum4shs.md) — signed half-word variant. +- [`vsum4ubs`](vsum4ubs.md) — unsigned byte variant. +- [`vsumsws`](vsumsws.md), [`vsum2sws`](vsum2sws.md) — reductions that accumulate fewer output lanes. +- [`vmsummbm`](vmsummbm.md) — fused signed-byte multiply-sum (different shape, same "horizontal reduce" flavour). + +## IBM Reference + +- [AIX 7.3 — `vsum4sbs` (Vector Sum across Partial (1/4) Saturated Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsum4sbs-vector-sum-across-partial-14-saturated-signed-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsum4shs.md b/migration/project-root/ppc-manual/vmx/vsum4shs.md new file mode 100644 index 0000000..d645e28 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsum4shs.md @@ -0,0 +1,135 @@ +# `vsum4shs` — Vector Sum Across Partial (1/4) Signed Half Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000648` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsum4shs` | `vsum4shs` | — | Vector Sum Across Partial (1/4) Signed Half Word Saturate | + +## Syntax + +```asm +vsum4shs [VD], [VA], [VB] +``` + +## Encoding + +### `vsum4shs` — form `VX` + +- **Opcode word:** `0x10000648` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1608` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsum4shs: read | Source A vector register. | +| `VB` | vsum4shs: read | Source B vector register. | +| `VD` | vsum4shs: write | Destination vector register. | +| `VSCR` | vsum4shs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vsum4shs` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vsum4shs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsum4shs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsum4shs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1786`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1786) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:127`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L127) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:543`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L543) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3706-3718`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3706-L3718) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsum4shs => { + let a = crate::vmx::as_i16x8(ctx.vr[instr.ra()]); + let c = crate::vmx::as_i32x4(ctx.vr[instr.rb()]); + let mut r = [0i32; 4]; let mut sat = false; + for i in 0..4 { + let s = a[2*i] as i64 + a[2*i+1] as i64 + c[i] as i64; + let (v, o) = crate::vmx::sat_i64_to_i32(s); + r[i] = v; sat |= o; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-word 2-half-word partial sum (signed).** For each of the 4 output word lanes, sum 2 signed half-words of `VA` plus the matching signed word of `VB`, then saturate to `int32`. `VD.w[i] = sat(VA.h[2*i] + VA.h[2*i+1] + VB.w[i])`. +- **Sticky VSCR[SAT]** set on overflow. +- **Useful bridge between 16-bit multiply results and 32-bit accumulators.** Often pairs with `vmulesh` / `vmulosh`. +- **Big-endian half-word / word lanes.** +- **No `Rc`, no XER.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vsum4sbs`](vsum4sbs.md) — signed byte variant. +- [`vsum4ubs`](vsum4ubs.md) — unsigned byte variant. +- [`vsum2sws`](vsum2sws.md), [`vsumsws`](vsumsws.md) — wider reductions. +- [`vmhaddshs`](vmhaddshs.md), [`vmsumshm`](vmsumshm.md), [`vmsumshs`](vmsumshs.md) — fused multiply-sum cousins. + +## IBM Reference + +- [AIX 7.3 — `vsum4shs` (Vector Sum across Partial (1/4) Saturated Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsum4shs-vector-sum-across-partial-14-saturated-signed-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsum4ubs.md b/migration/project-root/ppc-manual/vmx/vsum4ubs.md new file mode 100644 index 0000000..c6499ba --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsum4ubs.md @@ -0,0 +1,135 @@ +# `vsum4ubs` — Vector Sum Across Partial (1/4) Unsigned Byte Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000608` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsum4ubs` | `vsum4ubs` | — | Vector Sum Across Partial (1/4) Unsigned Byte Saturate | + +## Syntax + +```asm +vsum4ubs [VD], [VA], [VB] +``` + +## Encoding + +### `vsum4ubs` — form `VX` + +- **Opcode word:** `0x10000608` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1544` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsum4ubs: read | Source A vector register. | +| `VB` | vsum4ubs: read | Source B vector register. | +| `VD` | vsum4ubs: write | Destination vector register. | +| `VSCR` | vsum4ubs: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vsum4ubs` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vsum4ubs`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsum4ubs`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsum4ubs"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1791`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1791) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:127`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L127) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:540`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L540) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3693-3705`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3693-L3705) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsum4ubs => { + let a = ctx.vr[instr.ra()].as_bytes(); + let c = ctx.vr[instr.rb()].as_u32x4(); + let mut r = [0u32; 4]; let mut sat = false; + for i in 0..4 { + let s = a[4*i] as u64 + a[4*i+1] as u64 + a[4*i+2] as u64 + a[4*i+3] as u64 + c[i] as u64; + let (v, o) = if s > u32::MAX as u64 { (u32::MAX, true) } else { (s as u32, false) }; + r[i] = v; sat |= o; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Per-word 4-byte partial sum (unsigned).** For each of the 4 output word lanes, sum 4 unsigned bytes of `VA` plus the matching unsigned word of `VB`, saturating at `uint32::MAX`. `VD.w[i] = sat_u32(VA.b[4*i] + VA.b[4*i+1] + VA.b[4*i+2] + VA.b[4*i+3] + VB.w[i])`. +- **Sticky VSCR[SAT]** set on overflow — rare in practice, since `4 * 255 = 1020` plus any 32-bit `VB.w[i]` overflows only when `VB.w[i] > 0xFFFF_FBFF`. +- **Typical "count pixels" / "sum byte channels" primitive.** +- **Big-endian byte / word lanes.** +- **No `Rc`, no XER.** +- **No VMX128 sibling.** + +## Related Instructions + +- [`vsum4sbs`](vsum4sbs.md) — signed byte variant. +- [`vsum4shs`](vsum4shs.md) — signed half-word variant. +- [`vsumsws`](vsumsws.md), [`vsum2sws`](vsum2sws.md) — wider reductions. +- [`vmsumubm`](vmsumubm.md) — fused unsigned-byte multiply-sum. + +## IBM Reference + +- [AIX 7.3 — `vsum4ubs` (Vector Sum across Partial (1/4) Saturated Unsigned Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsum4ubs-vector-sum-across-partial-14-saturated-unsigned-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vsumsws.md b/migration/project-root/ppc-manual/vmx/vsumsws.md new file mode 100644 index 0000000..169639f --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vsumsws.md @@ -0,0 +1,133 @@ +# `vsumsws` — Vector Sum Across Signed Word Saturate + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x10000788` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vsumsws` | `vsumsws` | — | Vector Sum Across Signed Word Saturate | + +## Syntax + +```asm +vsumsws [VD], [VA], [VB] +``` + +## Encoding + +### `vsumsws` — form `VX` + +- **Opcode word:** `0x10000788` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1928` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vsumsws: read | Source A vector register. | +| `VB` | vsumsws: read | Source B vector register. | +| `VD` | vsumsws: write | Destination vector register. | +| `VSCR` | vsumsws: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vsumsws` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vsumsws`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vsumsws`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vsumsws"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1771`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1771) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:127`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L127) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:550`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L550) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3658-3667`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3658-L3667) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vsumsws => { + // vD[3] = sat_i32(vC[3] + sum over i in 0..4 of vA[i]) + let a = crate::vmx::as_i32x4(ctx.vr[instr.ra()]); + let c = crate::vmx::as_i32x4(ctx.vr[instr.rb()]); + let s = a.iter().map(|&x| x as i64).sum::() + c[3] as i64; + let (v, sat) = crate::vmx::sat_i64_to_i32(s); + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4([0, 0, 0, v]); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Full 4-lane reduction (signed word).** Sum all four signed words of `VA` plus `VB.w[3]`; saturate to `int32`. Output: `VD.w[3] = sat(VA.w[0]+VA.w[1]+VA.w[2]+VA.w[3]+VB.w[3])`; `VD.w[0..2] = 0`. +- **Only the last lane is meaningful.** The specification writes zeros into the first three lanes so software cannot accidentally consume them. +- **Sticky VSCR[SAT]** set on overflow. +- **Equivalent to a horizontal add-reduce**, common terminator for a multi-step dot-product or sum-of-products pipeline. +- **Big-endian word lanes.** +- **No `Rc`, no XER.** +- **No VMX128 sibling.** VMX128 instead uses `vmsum3fp128` / `vmsum4fp128` for float dot-products. + +## Related Instructions + +- [`vsum2sws`](vsum2sws.md) — two 2-word partial sums. +- [`vsum4sbs`](vsum4sbs.md), [`vsum4shs`](vsum4shs.md), [`vsum4ubs`](vsum4ubs.md) — per-word partial sums at narrower input widths. +- [`vmsumshm`](vmsumshm.md), [`vmsumshs`](vmsumshs.md) — fused multiply-sum variants. +- [`vmsum3fp128`](../vmx128/vmsum3fp128.md), [`vmsum4fp128`](../vmx128/vmsum4fp128.md) — VMX128 float dot-product helpers. + +## IBM Reference + +- [AIX 7.3 — `vsumsws` (Vector Sum across Saturated Signed Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vsumsws-vector-sum-across-saturated-signed-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vupkhpx.md b/migration/project-root/ppc-manual/vmx/vupkhpx.md new file mode 100644 index 0000000..bcead0a --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vupkhpx.md @@ -0,0 +1,127 @@ +# `vupkhpx` — Vector Unpack High Pixel + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000034e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vupkhp` | `vupkhpx` | — | Vector Unpack High Pixel | + +## Syntax + +```asm +vupkhpx [VD], [VB] +``` + +## Encoding + +### `vupkhpx` — form `VX` + +- **Opcode word:** `0x1000034e` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `846` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vupkhpx: read | Source B vector register. | +| `VD` | vupkhpx: write | Destination vector register. | + +## Register Effects + +### `vupkhpx` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vupkhpx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vupkhpx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:2002`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L2002) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:128`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L128) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:511`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L511) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4168-4174`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4168-L4174) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vupkhpx => { + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = crate::vmx::unpack_pixel_555(b[i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unpack the high 4 of 8 pixel half-words.** The upper half-words of `VB` (`VB.h[0..3]`) are each decoded from 1-5-5-5 pixel format into a 32-bit word (`1.5.5.5 → 1.8.8.8` with sign-extension on the alpha bit and zero-extension of each colour channel into the high 5 bits of a byte). Xenia uses `vmx::unpack_pixel_555`. +- **Output layout.** `VD.w[0..3]` receive the 4 decoded pixels in big-endian order. +- **Inverse of the high half of [`vpkpx`](vpkpx.md).** Unpacking loses no information beyond what the 1-5-5-5 format allows. +- **No saturation, no flags, no VSCR.** +- **No VMX128 sibling.** VMX128 code uses [`vupkd3d128`](../vmx128/vupkd3d128.md) for richer D3D formats. + +## Related Instructions + +- [`vupklpx`](vupklpx.md) — unpack the low 4 pixel half-words. +- [`vpkpx`](vpkpx.md) — the inverse pack. +- [`vupkhsb`](vupkhsb.md), [`vupklsb`](vupklsb.md) — byte → half-word sign-extending unpacks. +- [`vupkhsh`](vupkhsh.md), [`vupklsh`](vupklsh.md) — half-word → word sign-extending unpacks. +- [`vupkd3d128`](../vmx128/vupkd3d128.md) — VMX128 D3D-format unpack. + +## IBM Reference + +- [AIX 7.3 — `vupkhpx` (Vector Unpack High Pixel16)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vupkhpx-vector-unpack-high-pixel16-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vupkhsb.md b/migration/project-root/ppc-manual/vmx/vupkhsb.md new file mode 100644 index 0000000..5bb90ca --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vupkhsb.md @@ -0,0 +1,181 @@ +# `vupkhsb` — Vector Unpack High Signed Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000020e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vupkhsb` | `vupkhsb` | — | Vector Unpack High Signed Byte | +| `vupkhsb128` | `vupkhsb128` | — | Vector128 Unpack High Signed Byte | + +## Syntax + +```asm +vupkhsb [VD], [VB] +vupkhsb128 [VD], [VB] +``` + +## Encoding + +### `vupkhsb` — form `VX` + +- **Opcode word:** `0x1000020e` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `526` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vupkhsb128` — form `VX128` + +- **Opcode word:** `0x18000380` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `896` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vupkhsb: read; vupkhsb128: read | Source B vector register. | +| `VD` | vupkhsb: write; vupkhsb128: write | Destination vector register. | + +## Register Effects + +### `vupkhsb` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vupkhsb128` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vupkhsb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vupkhsb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:2055`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L2055) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:128`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L128) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:481`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L481) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4134-4143`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4134-L4143) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vupkhsb | PpcOpcode::vupkhsb128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vupkhsb128); + let (rb, rd) = if is_128 { (instr.vb128(), instr.vd128()) } + else { (instr.rb(), instr.rd()) }; + let b = crate::vmx::as_i8x16(ctx.vr[rb]); + let mut r = [0i16; 8]; + for i in 0..8 { r[i] = b[i] as i16; } + ctx.vr[rd] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ +**`vupkhsb128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vupkhsb128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:2058`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L2058) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:128`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L128) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:700`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L700) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4134-4143`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4134-L4143) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vupkhsb | PpcOpcode::vupkhsb128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vupkhsb128); + let (rb, rd) = if is_128 { (instr.vb128(), instr.vd128()) } + else { (instr.rb(), instr.rd()) }; + let b = crate::vmx::as_i8x16(ctx.vr[rb]); + let mut r = [0i16; 8]; + for i in 0..8 { r[i] = b[i] as i16; } + ctx.vr[rd] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Sign-extend the high 8 of 16 bytes into half-words.** `VB.b[0..7]` are each reinterpreted as `int8` and sign-extended to `int16` in `VD.h[0..7]`. +- **Inverse of the high half of [`vpkshss`](vpkshss.md) / [`vpkshus`](vpkshus.md)** (within the `int8` range — larger `int16`s cannot survive a round-trip through a saturating pack). +- **Big-endian lane ordering** — `VB.b[0]` becomes `VD.h[0]`. +- **No saturation, no flags, no VSCR effect.** +- **VMX128 sibling [`vupkhsb128`](vupkhsb128.md).** + +## Related Instructions + +- [`vupklsb`](vupklsb.md) — unpacks the low 8 bytes (same sign-extension). +- [`vupkhsh`](vupkhsh.md), [`vupklsh`](vupklsh.md) — half-word → word sign-extending unpacks. +- [`vpkshss`](vpkshss.md), [`vpkshus`](vpkshus.md) — the inverse pack (saturating). +- [`vupkhpx`](vupkhpx.md), [`vupklpx`](vupklpx.md) — pixel-format unpacks. + +## IBM Reference + +- [AIX 7.3 — `vupkhsb` (Vector Unpack High Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vupkhsb-vector-unpack-high-signed-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vupkhsh.md b/migration/project-root/ppc-manual/vmx/vupkhsh.md new file mode 100644 index 0000000..8996625 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vupkhsh.md @@ -0,0 +1,125 @@ +# `vupkhsh` — Vector Unpack High Signed Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000024e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vupkhsh` | `vupkhsh` | — | Vector Unpack High Signed Half Word | + +## Syntax + +```asm +vupkhsh [VD], [VB] +``` + +## Encoding + +### `vupkhsh` — form `VX` + +- **Opcode word:** `0x1000024e` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `590` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vupkhsh: read | Source B vector register. | +| `VD` | vupkhsh: write | Destination vector register. | + +## Register Effects + +### `vupkhsh` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vupkhsh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vupkhsh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:2021`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L2021) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:128`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L128) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:488`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L488) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4154-4160`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4154-L4160) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vupkhsh => { + let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]); + let mut r = [0i32; 4]; + for i in 0..4 { r[i] = b[i] as i32; } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Sign-extend the high 4 of 8 half-words into words.** `VB.h[0..3]` are each reinterpreted as `int16` and sign-extended to `int32` in `VD.w[0..3]`. +- **Inverse of the high half of [`vpkswss`](vpkswss.md)** (within the `int16` range). +- **Big-endian lane ordering.** +- **No saturation, no flags, no VSCR effect.** +- **VMX128 sibling [`vupkhsh128`](vupkhsh128.md).** + +## Related Instructions + +- [`vupklsh`](vupklsh.md) — unpacks the low 4 half-words. +- [`vupkhsb`](vupkhsb.md), [`vupklsb`](vupklsb.md) — byte → half-word sign-extending unpacks. +- [`vpkswss`](vpkswss.md), [`vpkswus`](vpkswus.md) — the inverse pack (saturating). + +## IBM Reference + +- [AIX 7.3 — `vupkhsh` (Vector Unpack High Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vupkhsh-vector-unpack-high-signed-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vupklpx.md b/migration/project-root/ppc-manual/vmx/vupklpx.md new file mode 100644 index 0000000..fb5c346 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vupklpx.md @@ -0,0 +1,126 @@ +# `vupklpx` — Vector Unpack Low Pixel + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x100003ce` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vupklp` | `vupklpx` | — | Vector Unpack Low Pixel | + +## Syntax + +```asm +vupklpx [VD], [VB] +``` + +## Encoding + +### `vupklpx` — form `VX` + +- **Opcode word:** `0x100003ce` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `974` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vupklpx: read | Source B vector register. | +| `VD` | vupklpx: write | Destination vector register. | + +## Register Effects + +### `vupklpx` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vupklpx`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vupklpx"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:2007`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L2007) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:129`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L129) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:518`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L518) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4175-4181`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4175-L4181) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vupklpx => { + let b = ctx.vr[instr.rb()].as_u16x8(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = crate::vmx::unpack_pixel_555(b[4 + i]); } + ctx.vr[instr.rd()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unpack the low 4 of 8 pixel half-words.** The lower half-words of `VB` (`VB.h[4..7]`) are each decoded from 1-5-5-5 pixel format into a 32-bit word (`1.5.5.5 → 1.8.8.8`). Xenia uses `vmx::unpack_pixel_555`. +- **Output layout.** `VD.w[0..3]` receive the 4 decoded pixels in big-endian order. +- **Inverse of the low half of [`vpkpx`](vpkpx.md).** +- **No saturation, no flags, no VSCR.** +- **No VMX128 sibling.** VMX128 uses [`vupkd3d128`](../vmx128/vupkd3d128.md) for richer formats. + +## Related Instructions + +- [`vupkhpx`](vupkhpx.md) — high-half pixel unpack. +- [`vpkpx`](vpkpx.md) — the inverse pack. +- [`vupkhsb`](vupkhsb.md), [`vupklsb`](vupklsb.md), [`vupkhsh`](vupkhsh.md), [`vupklsh`](vupklsh.md) — sign-extending unpacks. +- [`vupkd3d128`](../vmx128/vupkd3d128.md) — VMX128 D3D-format unpack. + +## IBM Reference + +- [AIX 7.3 — `vupklpx` (Vector Unpack Low Pixel16)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vupklpx-vector-unpack-low-pixel16-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vupklsb.md b/migration/project-root/ppc-manual/vmx/vupklsb.md new file mode 100644 index 0000000..06d8747 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vupklsb.md @@ -0,0 +1,181 @@ +# `vupklsb` — Vector Unpack Low Signed Byte + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x1000028e` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vupklsb` | `vupklsb` | — | Vector Unpack Low Signed Byte | +| `vupklsb128` | `vupklsb128` | — | Vector128 Unpack Low Signed Byte | + +## Syntax + +```asm +vupklsb [VD], [VB] +vupklsb128 [VD], [VB] +``` + +## Encoding + +### `vupklsb` — form `VX` + +- **Opcode word:** `0x1000028e` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `654` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vupklsb128` — form `VX128` + +- **Opcode word:** `0x180003c0` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `960` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vupklsb: read; vupklsb128: read | Source B vector register. | +| `VD` | vupklsb: write; vupklsb128: write | Destination vector register. | + +## Register Effects + +### `vupklsb` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vupklsb128` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vupklsb`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vupklsb"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:2076`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L2076) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:129`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L129) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:494`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L494) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4144-4153`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4144-L4153) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vupklsb | PpcOpcode::vupklsb128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vupklsb128); + let (rb, rd) = if is_128 { (instr.vb128(), instr.vd128()) } + else { (instr.rb(), instr.rd()) }; + let b = crate::vmx::as_i8x16(ctx.vr[rb]); + let mut r = [0i16; 8]; + for i in 0..8 { r[i] = b[8 + i] as i16; } + ctx.vr[rd] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ +**`vupklsb128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vupklsb128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:2079`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L2079) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:129`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L129) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:701`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L701) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4144-4153`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4144-L4153) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vupklsb | PpcOpcode::vupklsb128 => { + let is_128 = matches!(instr.opcode, PpcOpcode::vupklsb128); + let (rb, rd) = if is_128 { (instr.vb128(), instr.vd128()) } + else { (instr.rb(), instr.rd()) }; + let b = crate::vmx::as_i8x16(ctx.vr[rb]); + let mut r = [0i16; 8]; + for i in 0..8 { r[i] = b[8 + i] as i16; } + ctx.vr[rd] = crate::vmx::from_i16x8(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Sign-extend the low 8 of 16 bytes into half-words.** `VB.b[8..15]` are each reinterpreted as `int8` and sign-extended to `int16` in `VD.h[0..7]`. +- **Inverse of the low half of [`vpkshss`](vpkshss.md) / [`vpkshus`](vpkshus.md)** (within the `int8` range). +- **Big-endian lane ordering.** +- **No saturation, no flags, no VSCR effect.** +- **VMX128 sibling [`vupklsb128`](vupklsb128.md).** + +## Related Instructions + +- [`vupkhsb`](vupkhsb.md) — high-half byte unpack. +- [`vupkhsh`](vupkhsh.md), [`vupklsh`](vupklsh.md) — half-word → word unpacks. +- [`vpkshss`](vpkshss.md), [`vpkshus`](vpkshus.md) — the inverse packs. +- [`vupkhpx`](vupkhpx.md), [`vupklpx`](vupklpx.md) — pixel-format unpacks. + +## IBM Reference + +- [AIX 7.3 — `vupklsb` (Vector Unpack Low Signed Byte)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vupklsb-vector-unpack-low-signed-byte-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vupklsh.md b/migration/project-root/ppc-manual/vmx/vupklsh.md new file mode 100644 index 0000000..a865999 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vupklsh.md @@ -0,0 +1,125 @@ +# `vupklsh` — Vector Unpack Low Signed Half Word + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x100002ce` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vupklsh` | `vupklsh` | — | Vector Unpack Low Signed Half Word | + +## Syntax + +```asm +vupklsh [VD], [VB] +``` + +## Encoding + +### `vupklsh` — form `VX` + +- **Opcode word:** `0x100002ce` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `718` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vupklsh: read | Source B vector register. | +| `VD` | vupklsh: write | Destination vector register. | + +## Register Effects + +### `vupklsh` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vupklsh`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vupklsh"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:2038`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L2038) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:129`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L129) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:497`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L497) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4161-4167`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4161-L4167) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vupklsh => { + let b = crate::vmx::as_i16x8(ctx.vr[instr.rb()]); + let mut r = [0i32; 4]; + for i in 0..4 { r[i] = b[4 + i] as i32; } + ctx.vr[instr.rd()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Sign-extend the low 4 of 8 half-words into words.** `VB.h[4..7]` are each reinterpreted as `int16` and sign-extended to `int32` in `VD.w[0..3]`. +- **Inverse of the low half of [`vpkswss`](vpkswss.md)** (within the `int16` range). +- **Big-endian lane ordering.** +- **No saturation, no flags, no VSCR effect.** +- **VMX128 sibling [`vupklsh128`](vupklsh128.md).** + +## Related Instructions + +- [`vupkhsh`](vupkhsh.md) — high-half half-word unpack. +- [`vupkhsb`](vupkhsb.md), [`vupklsb`](vupklsb.md) — byte → half-word unpacks. +- [`vpkswss`](vpkswss.md), [`vpkswus`](vpkswus.md) — the inverse packs. + +## IBM Reference + +- [AIX 7.3 — `vupklsh` (Vector Unpack Low Signed Half Word)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vupklsh-vector-unpack-low-signed-half-word-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx/vxor.md b/migration/project-root/ppc-manual/vmx/vxor.md new file mode 100644 index 0000000..acba949 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx/vxor.md @@ -0,0 +1,180 @@ +# `vxor` — Vector Logical XOR + +> **Category:** [VMX (Altivec)](../categories/vmx.md) · **Form:** [VX](../forms/VX.md) · **Opcode:** `0x100004c4` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vxor` | `vxor` | — | Vector Logical XOR | +| `vxor128` | `vxor128` | — | Vector128 Logical XOR | + +## Syntax + +```asm +vxor [VD], [VA], [VB] +vxor128 [VD], [VA], [VB] +``` + +## Encoding + +### `vxor` — form `VX` + +- **Opcode word:** `0x100004c4` +- **Primary opcode (bits 0–5):** `4` +- **Extended opcode:** `1220` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4) | +| 6–10 | `VRT/VD` | destination vector register | +| 11–15 | `VRA/VA` | source A vector register | +| 16–20 | `VRB/VB` | source B vector register | +| 21–31 | `XO` | extended opcode (11 bits) | + +### `vxor128` — form `VX128` + +- **Opcode word:** `0x14000310` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `784` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vxor: read; vxor128: read | Source A vector register. | +| `VB` | vxor: read; vxor128: read | Source B vector register. | +| `VD` | vxor: write; vxor128: write | Destination vector register. | + +## Register Effects + +### `vxor` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +### `vxor128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vxor`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vxor"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:2246`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L2246) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:130`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L130) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:532`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L532) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2235-2243`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2235-L2243) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vxor | PpcOpcode::vxor128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = a[i] ^ b[i]; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ +**`vxor128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vxor128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:2249`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L2249) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:130`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L130) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:627`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L627) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2235-2243`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2235-L2243) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vxor | PpcOpcode::vxor128 => { + let (va, vb, vd) = vmx_reg_triple(instr); + let a = ctx.vr[va].as_u32x4(); + let b = ctx.vr[vb].as_u32x4(); + let mut r = [0u32; 4]; + for i in 0..4 { r[i] = a[i] ^ b[i]; } + ctx.vr[vd] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Bitwise XOR across the full 128-bit register.** Lane-agnostic. +- **`vxor VD, VD, VD` is the canonical "vector zero" idiom.** Every Xenon compiler uses this to materialise the all-zero vector; xenia-rs's interpreter does not special-case it (it still reads the register), but the JIT / translator can fold it at emit time. +- **Aliasing legal.** `vxor v3, v3, v4` toggles bits from `v4` into `v3`. +- **No flags, no VSCR.** +- **VMX128 sibling [`vxor128`](vxor128.md).** Identical semantics; wider register file. +- **Compare-then-XOR** is the cheapest "mask flip" when `vnor` is not wanted (e.g. to toggle only certain bits, not every bit). + +## Related Instructions + +- [`vor`](vor.md), [`vand`](vand.md), [`vandc`](vandc.md), [`vnor`](vnor.md) — the rest of the Altivec boolean primitives. +- [`vsel`](vsel.md) — three-input bit-select; equivalent to `(VA & ~VC) | (VB & VC)`. +- [`vcmpequb`](vcmpequb.md), [`vcmpequh`](vcmpequh.md), [`vcmpequw`](vcmpequw.md) — produce masks XORed against. + +## IBM Reference + +- [AIX 7.3 — `vxor` (Vector Logical XOR)](https://www.ibm.com/docs/en/aix/7.3.0?topic=set-vxor-vector-logical-xor-instruction) +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 3 — Logical Operations](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) diff --git a/migration/project-root/ppc-manual/vmx128/vcfpsxws128.md b/migration/project-root/ppc-manual/vmx128/vcfpsxws128.md new file mode 100644 index 0000000..8d62be8 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx128/vcfpsxws128.md @@ -0,0 +1,137 @@ +# `vcfpsxws128` — Vector128 Convert From Floating-Point to Signed Fixed-Point Word Saturate + +> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_3](../forms/VX128_3.md) · **Opcode:** `0x18000230` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcfpsxws128` | `vcfpsxws128` | — | Vector128 Convert From Floating-Point to Signed Fixed-Point Word Saturate | + +## Syntax + +```asm +vcfpsxws128 [VD], [VB], [UIMM] +``` + +## Encoding + +### `vcfpsxws128` — form `VX128_3` + +- **Opcode word:** `0x18000230` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `560` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vcfpsxws128: read | Source B vector register. | +| `UIMM` | vcfpsxws128: read | 16-bit unsigned immediate. Zero-extended. | +| `VD` | vcfpsxws128: write | Destination vector register. | +| `VSCR` | vcfpsxws128: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vcfpsxws128` + +- **Reads (always):** `VB`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vcfpsxws128`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcfpsxws128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcfpsxws128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:539`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L539) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:93`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L93) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:656`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L656) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4323-4334`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4323-L4334) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcfpsxws128 => { + let uimm = (instr.raw >> 16) & 0x1F; + let b = ctx.vr[instr.vb128()].as_f32x4(); + let mut r = [0i32; 4]; let mut sat = false; + for i in 0..4 { + let (v, s) = crate::vmx::cvt_f32_to_i32_sat(b[i], uimm); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.vd128()] = crate::vmx::from_i32x4(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Float → signed fixed-point (int32) with explicit scale.** Each lane computes `VD.w[i] = sat_int32(VB[i] * 2^UIMM)`, truncating toward zero and clamping to `[−2^31, 2^31−1]`. `UIMM` is a 5-bit unsigned bias (range 0..31) that specifies a power-of-two pre-scale on the float value. +- **Use case: fixed-point pipelines.** The `UIMM` pre-scale lets game code convert a `[0.0, 1.0]` float channel into a `uint16`-range fixed-point value in one instruction (e.g. `UIMM = 15` → scale by 32768). +- **Sticky VSCR[SAT]** set whenever a lane clamps (including NaN inputs, which xenia's `cvt_f32_to_i32_sat` treats as 0 and flags saturation). +- **`VSCR[NJ]` honoured** on the float input side. +- **VMX128 register-fusion** applies to `VD` and `VB`: 7-bit register IDs via `VD128l ‖ VD128h` and `VB128l ‖ VB128h`. +- **No IBM AIX entry** — this is Xenon-only. The closest standard Altivec op is [`vctsxs`](../vmx/vctsxs.md). +- **No `Rc`, no XER / FPSCR.** + +## Related Instructions + +- [`vctsxs`](../vmx/vctsxs.md) — the standard Altivec equivalent (same semantics, 32-register file). +- [`vcfpuxws128`](vcfpuxws128.md) — unsigned variant (clamps to `uint32`). +- [`vcsxwfp128`](vcsxwfp128.md), [`vcuxwfp128`](vcuxwfp128.md) — the inverse (int → float with scale). +- [`vrfiz`](../vmx/vrfiz.md) — plain truncate-to-float-integer without scale. + +## IBM Reference + +- No IBM AIX entry — this instruction is exclusive to the Xbox 360's VMX128 extension. +- Xbox 360 XDK, Altivec-128 (VMX128) extensions (Microsoft internal documentation); semantics cross-referenced with [IBM AltiVec Technology Programmer's Interface Manual §`vctsxs`](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf). diff --git a/migration/project-root/ppc-manual/vmx128/vcfpuxws128.md b/migration/project-root/ppc-manual/vmx128/vcfpuxws128.md new file mode 100644 index 0000000..dfc183e --- /dev/null +++ b/migration/project-root/ppc-manual/vmx128/vcfpuxws128.md @@ -0,0 +1,137 @@ +# `vcfpuxws128` — Vector128 Convert From Floating-Point to Unsigned Fixed-Point Word Saturate + +> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_3](../forms/VX128_3.md) · **Opcode:** `0x18000270` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcfpuxws128` | `vcfpuxws128` | — | Vector128 Convert From Floating-Point to Unsigned Fixed-Point Word Saturate | + +## Syntax + +```asm +vcfpuxws128 [VD], [VB], [UIMM] +``` + +## Encoding + +### `vcfpuxws128` — form `VX128_3` + +- **Opcode word:** `0x18000270` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `624` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vcfpuxws128: read | Source B vector register. | +| `UIMM` | vcfpuxws128: read | 16-bit unsigned immediate. Zero-extended. | +| `VD` | vcfpuxws128: write | Destination vector register. | +| `VSCR` | vcfpuxws128: write | Vector Status and Control Register (NJ/SAT bits). | + +## Register Effects + +### `vcfpuxws128` + +- **Reads (always):** `VB`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD`, `VSCR` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +- `vcfpuxws128`: **VSCR[SAT]** may be stickied on saturating vector operations. + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcfpuxws128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcfpuxws128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:557`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L557) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:93`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L93) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:657`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L657) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4335-4346`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4335-L4346) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcfpuxws128 => { + let uimm = (instr.raw >> 16) & 0x1F; + let b = ctx.vr[instr.vb128()].as_f32x4(); + let mut r = [0u32; 4]; let mut sat = false; + for i in 0..4 { + let (v, s) = crate::vmx::cvt_f32_to_u32_sat(b[i], uimm); + r[i] = v; sat |= s; + } + if sat { ctx.set_vscr_sat(true); } + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Float → unsigned fixed-point (uint32) with explicit scale.** Each lane computes `VD.w[i] = sat_uint32(VB[i] * 2^UIMM)`, truncating toward zero and clamping to `[0, 2^32−1]`. `UIMM` is a 5-bit unsigned bias (range 0..31). +- **Negative floats clamp to 0** and sticky-set `VSCR[SAT]`. +- **NaN inputs** → 0 with `VSCR[SAT]` set (xenia's `cvt_f32_to_u32_sat`). +- **`VSCR[NJ]` honoured** for denormal inputs. +- **VMX128 register-fusion** applies to `VD` and `VB` (7-bit IDs). +- **No IBM AIX entry** — Xenon-only. +- **No `Rc`, no XER / FPSCR.** + +## Related Instructions + +- [`vctuxs`](../vmx/vctuxs.md) — the standard Altivec equivalent (uint32 clamp with scale). +- [`vcfpsxws128`](vcfpsxws128.md) — signed variant. +- [`vcuxwfp128`](vcuxwfp128.md) — the inverse (uint → float with scale). +- [`vrfiz`](../vmx/vrfiz.md) — plain truncate-to-integer-float without scale. + +## IBM Reference + +- No IBM AIX entry — this instruction is exclusive to the Xbox 360's VMX128 extension. +- Xbox 360 XDK, Altivec-128 (VMX128) extensions (Microsoft internal documentation); cross-referenced with [IBM AltiVec Technology Programmer's Interface Manual §`vctuxs`](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf). diff --git a/migration/project-root/ppc-manual/vmx128/vcsxwfp128.md b/migration/project-root/ppc-manual/vmx128/vcsxwfp128.md new file mode 100644 index 0000000..4eb1303 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx128/vcsxwfp128.md @@ -0,0 +1,131 @@ +# `vcsxwfp128` — Vector128 Convert From Signed Fixed-Point Word to Floating-Point + +> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_3](../forms/VX128_3.md) · **Opcode:** `0x180002b0` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcsxwfp128` | `vcsxwfp128` | — | Vector128 Convert From Signed Fixed-Point Word to Floating-Point | + +## Syntax + +```asm +vcsxwfp128 [VD], [VB], [UIMM] +``` + +## Encoding + +### `vcsxwfp128` — form `VX128_3` + +- **Opcode word:** `0x180002b0` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `688` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vcsxwfp128: read | Source B vector register. | +| `UIMM` | vcsxwfp128: read | 16-bit unsigned immediate. Zero-extended. | +| `VD` | vcsxwfp128: write | Destination vector register. | + +## Register Effects + +### `vcsxwfp128` + +- **Reads (always):** `VB`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcsxwfp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcsxwfp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:503`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L503) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:98`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L98) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:658`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L658) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4347-4354`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4347-L4354) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcsxwfp128 => { + let uimm = (instr.raw >> 16) & 0x1F; + let b = crate::vmx::as_i32x4(ctx.vr[instr.vb128()]); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = crate::vmx::cvt_i32_to_f32(b[i], uimm); } + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Signed fixed-point (int32) → float with explicit scale.** Each lane computes `VD[i] = (float)VB.w[i] * 2^-UIMM` (equivalently `(int32)VB.w[i] / 2^UIMM`). `UIMM` is a 5-bit unsigned bias that specifies a post-scale — the inverse direction of [`vcfpsxws128`](vcfpsxws128.md), so the `UIMM`s should match for a round-trip. +- **IEEE-754 binary32 output, round-to-nearest.** Values outside the exactly-representable range (`|x| > 2^24`) lose low-order bits; no saturation on the float side. +- **No `VSCR[SAT]` effect** — conversion in this direction never saturates. +- **`VSCR[NJ]` does not affect the int → float path.** +- **VMX128 register-fusion** applies (7-bit register IDs). +- **No IBM AIX entry** — Xenon-only. Closest standard Altivec op is [`vcfsx`](../vmx/vcfsx.md). +- **No `Rc`, no XER / FPSCR.** + +## Related Instructions + +- [`vcfsx`](../vmx/vcfsx.md) — the standard Altivec `int32 → float` with scale. +- [`vcuxwfp128`](vcuxwfp128.md) — unsigned-int variant. +- [`vcfpsxws128`](vcfpsxws128.md), [`vcfpuxws128`](vcfpuxws128.md) — the inverse (float → int with scale). + +## IBM Reference + +- No IBM AIX entry — Xbox 360 VMX128 extension only. +- Xbox 360 XDK, Altivec-128 (VMX128) extensions; cross-referenced with [IBM AltiVec Technology Programmer's Interface Manual §`vcfsx`](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf). diff --git a/migration/project-root/ppc-manual/vmx128/vcuxwfp128.md b/migration/project-root/ppc-manual/vmx128/vcuxwfp128.md new file mode 100644 index 0000000..99e1542 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx128/vcuxwfp128.md @@ -0,0 +1,131 @@ +# `vcuxwfp128` — Vector128 Convert From Unsigned Fixed-Point Word to Floating-Point + +> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_3](../forms/VX128_3.md) · **Opcode:** `0x180002f0` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vcuxwfp128` | `vcuxwfp128` | — | Vector128 Convert From Unsigned Fixed-Point Word to Floating-Point | + +## Syntax + +```asm +vcuxwfp128 [VD], [VB], [UIMM] +``` + +## Encoding + +### `vcuxwfp128` — form `VX128_3` + +- **Opcode word:** `0x180002f0` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `752` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vcuxwfp128: read | Source B vector register. | +| `UIMM` | vcuxwfp128: read | 16-bit unsigned immediate. Zero-extended. | +| `VD` | vcuxwfp128: write | Destination vector register. | + +## Register Effects + +### `vcuxwfp128` + +- **Reads (always):** `VB`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vcuxwfp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vcuxwfp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:521`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L521) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:98`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L98) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:659`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L659) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4355-4362`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4355-L4362) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vcuxwfp128 => { + let uimm = (instr.raw >> 16) & 0x1F; + let b = ctx.vr[instr.vb128()].as_u32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { r[i] = crate::vmx::cvt_u32_to_f32(b[i], uimm); } + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unsigned fixed-point (uint32) → float with explicit scale.** Each lane computes `VD[i] = (float)VB.w[i] * 2^-UIMM`. Treats the 32-bit input as unsigned, so values ≥ `0x80000000` produce positive floats (unlike `vcsxwfp128` which would produce negatives). +- **IEEE-754 binary32 output, round-to-nearest.** Precision loss above `2^24`. +- **No `VSCR[SAT]` effect.** +- **`VSCR[NJ]` does not affect the uint → float path.** +- **VMX128 register-fusion** applies. +- **No IBM AIX entry** — Xenon-only. Closest standard is [`vcfux`](../vmx/vcfux.md). +- **No `Rc`, no XER / FPSCR.** + +## Related Instructions + +- [`vcfux`](../vmx/vcfux.md) — the standard Altivec `uint32 → float` with scale. +- [`vcsxwfp128`](vcsxwfp128.md) — signed-int variant. +- [`vcfpuxws128`](vcfpuxws128.md), [`vcfpsxws128`](vcfpsxws128.md) — the inverse (float → int with scale). + +## IBM Reference + +- No IBM AIX entry — Xbox 360 VMX128 extension only. +- Xbox 360 XDK, Altivec-128 (VMX128) extensions; cross-referenced with [IBM AltiVec Technology Programmer's Interface Manual §`vcfux`](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf). diff --git a/migration/project-root/ppc-manual/vmx128/vmaddcfp128.md b/migration/project-root/ppc-manual/vmx128/vmaddcfp128.md new file mode 100644 index 0000000..1db1a62 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx128/vmaddcfp128.md @@ -0,0 +1,148 @@ +# `vmaddcfp128` — Vector128 Multiply Add Floating Point + +> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128](../forms/VX128.md) · **Opcode:** `0x14000110` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmaddcfp128` | `vmaddcfp128` | — | Vector128 Multiply Add Floating Point | + +## Syntax + +```asm +vmaddcfp128 [VD], [VA], [VD], [VB] +``` + +## Encoding + +### `vmaddcfp128` — form `VX128` + +- **Opcode word:** `0x14000110` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `272` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmaddcfp128: read | Source A vector register. | +| `VD` | vmaddcfp128: read; vmaddcfp128: write | Destination vector register. | +| `VB` | vmaddcfp128: read | Source B vector register. | + +## Register Effects + +### `vmaddcfp128` + +- **Reads (always):** `VA`, `VD`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmaddcfp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmaddcfp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:812`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L812) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:100`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L100) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:614`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L614) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4492-4509`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4492-L4509) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmaddcfp128 => { + // ISA: (VD) <- (VA × VD) + VB. Canary InstrEmit_vmaddcfp128 (cc:819): MulAdd(VA, VD, VB). + // Previous code computed di.mul_add(bi, ai) = VD×VB+VA — both operands wrong + // (PPCBUG-425). Fix: ai.mul_add(di, bi) = VA×VD+VB. + let a = ctx.vr[instr.va128()].as_f32x4(); + let b = ctx.vr[instr.vb128()].as_f32x4(); + let d = ctx.vr[instr.vd128()].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { + let ai = vmx::flush_denorm(a[i]); + let bi = vmx::flush_denorm(b[i]); + let di = vmx::flush_denorm(d[i]); + // PPCBUG-437: flush subnormal output too. + r[i] = vmx::flush_denorm(ai.mul_add(di, bi)); + } + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Xbox-specific fused multiply-add variant.** Each lane computes `VD[i] = VD[i] * VB[i] + VA[i]` — note that `VD` is both source and destination (xenia reads `VD` first, then writes). This is *not* the standard [`vmaddfp`](../vmx/vmaddfp.md) operand order: the "addend" position is `VA`, the other factor is `VB`, and `VD` carries the on-going accumulator. The mnemonic's trailing `c` denotes "accumulator-in-VD" rather than a separate `VC` operand. +- **Fused, single-rounding.** Xenia uses `f32::mul_add`, which maps to a host FMA instruction when available. Bit-for-bit result depends on host support; xenia-canary's LLVM path emits the equivalent IR node. +- **IEEE-754 binary32 lanes; `VSCR[NJ]` honoured.** +- **No VSCR[SAT], no FPSCR update.** +- **NaN propagation** per IEEE-754. +- **VMX128 register-fusion** (7-bit IDs on `VA`, `VB`, `VD`). +- **No IBM AIX entry** — Xenon-only. +- **No `Rc`, no XER.** + +## Related Instructions + +- [`vmaddfp`](../vmx/vmaddfp.md), [`vmaddfp128`](../vmx/vmaddfp.md) — standard fused `(VA × VC) + VB`. +- [`vmulfp128`](vmulfp128.md) — plain lane-wise float multiply. +- [`vnmsubfp`](../vmx/vnmsubfp.md) — negative-multiply-subtract. +- [`vmsum3fp128`](vmsum3fp128.md), [`vmsum4fp128`](vmsum4fp128.md) — dot-product reductions. + +## IBM Reference + +- No IBM AIX entry — this is an Xbox 360 VMX128 extension. Its semantics differ from the base Altivec [`vmaddfp`](../vmx/vmaddfp.md) in the operand order (accumulator in `VD`, not `VC`). +- Xbox 360 XDK, Altivec-128 (VMX128) extensions. +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) for the base FMA semantics. diff --git a/migration/project-root/ppc-manual/vmx128/vmsum3fp128.md b/migration/project-root/ppc-manual/vmx128/vmsum3fp128.md new file mode 100644 index 0000000..66c50fc --- /dev/null +++ b/migration/project-root/ppc-manual/vmx128/vmsum3fp128.md @@ -0,0 +1,141 @@ +# `vmsum3fp128` — Vector128 Multiply Sum 3-way Floating Point + +> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128](../forms/VX128.md) · **Opcode:** `0x14000190` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmsum3fp128` | `vmsum3fp128` | — | Vector128 Multiply Sum 3-way Floating Point | + +## Syntax + +```asm +vmsum3fp128 [VD], [VA], [VB] +``` + +## Encoding + +### `vmsum3fp128` — form `VX128` + +- **Opcode word:** `0x14000190` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `400` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmsum3fp128: read | Source A vector register. | +| `VB` | vmsum3fp128: read | Source B vector register. | +| `VD` | vmsum3fp128: write | Destination vector register. | + +## Register Effects + +### `vmsum3fp128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmsum3fp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsum3fp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1067`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1067) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:106`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L106) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:616`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L616) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4513-4523`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4513-L4523) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmsum3fp128 => { + // PPCBUG-436: flush per-product intermediates (not just the final sum). + let a = ctx.vr[instr.va128()].as_f32x4(); + let b = ctx.vr[instr.vb128()].as_f32x4(); + let p0 = vmx::flush_denorm(a[0] * b[0]); + let p1 = vmx::flush_denorm(a[1] * b[1]); + let p2 = vmx::flush_denorm(a[2] * b[2]); + let s = vmx::flush_denorm(p0 + p1 + p2); + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4(s, s, s, s); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **3-way float dot product.** Computes `s = VA[0]*VB[0] + VA[1]*VB[1] + VA[2]*VB[2]` (ignoring lane 3 — the "w" component of a homogeneous vector) and **broadcasts `s` to every lane of `VD`**. Typical call site: 3D vector dot products where the w-component is padding. +- **Scalar-result-splatted-across-lanes.** Consuming code can then use any lane of `VD` as the dot-product result. +- **Rounding.** Xenia performs two adds in sequence (no fused triple-add in Rust). The order matches the spec but the summation order affects round-off by ~1 ulp. Games that need deterministic cross-host behaviour typically pre-scale their inputs. +- **IEEE-754 binary32; `VSCR[NJ]` honoured.** +- **No VSCR[SAT], no FPSCR update.** +- **VMX128 register-fusion** (7-bit IDs on `VA`, `VB`, `VD`). +- **No IBM AIX entry** — Xenon-only. +- **No `Rc`, no XER.** + +## Related Instructions + +- [`vmsum4fp128`](vmsum4fp128.md) — 4-way dot-product (includes the w-lane). +- [`vmulfp128`](vmulfp128.md), [`vaddfp`](../vmx/vaddfp.md) — the building blocks. +- [`vmaddcfp128`](vmaddcfp128.md), [`vmaddfp`](../vmx/vmaddfp.md) — fused MAC variants. +- [`vsumsws`](../vmx/vsumsws.md) — integer sum-reduction analogue. + +## IBM Reference + +- No IBM AIX entry — Xbox 360 VMX128 extension only. +- Xbox 360 XDK, Altivec-128 (VMX128) extensions. A 3-way dot product is a direct mirror of D3D9's `float3 dot`. +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) for the base float arithmetic semantics. diff --git a/migration/project-root/ppc-manual/vmx128/vmsum4fp128.md b/migration/project-root/ppc-manual/vmx128/vmsum4fp128.md new file mode 100644 index 0000000..84638d5 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx128/vmsum4fp128.md @@ -0,0 +1,142 @@ +# `vmsum4fp128` — Vector128 Multiply Sum 4-way Floating-Point + +> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128](../forms/VX128.md) · **Opcode:** `0x140001d0` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmsum4fp128` | `vmsum4fp128` | — | Vector128 Multiply Sum 4-way Floating-Point | + +## Syntax + +```asm +vmsum4fp128 [VD], [VA], [VB] +``` + +## Encoding + +### `vmsum4fp128` — form `VX128` + +- **Opcode word:** `0x140001d0` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `464` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmsum4fp128: read | Source A vector register. | +| `VB` | vmsum4fp128: read | Source B vector register. | +| `VD` | vmsum4fp128: write | Destination vector register. | + +## Register Effects + +### `vmsum4fp128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmsum4fp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmsum4fp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1077`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1077) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:106`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L106) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:617`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L617) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4524-4535`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4524-L4535) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmsum4fp128 => { + // PPCBUG-436. + let a = ctx.vr[instr.va128()].as_f32x4(); + let b = ctx.vr[instr.vb128()].as_f32x4(); + let p0 = vmx::flush_denorm(a[0] * b[0]); + let p1 = vmx::flush_denorm(a[1] * b[1]); + let p2 = vmx::flush_denorm(a[2] * b[2]); + let p3 = vmx::flush_denorm(a[3] * b[3]); + let s = vmx::flush_denorm(p0 + p1 + p2 + p3); + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4(s, s, s, s); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **4-way float dot product.** Computes `s = VA[0]*VB[0] + VA[1]*VB[1] + VA[2]*VB[2] + VA[3]*VB[3]` (the full xyzw dot) and **broadcasts `s` to every lane of `VD`**. +- **Scalar-result-splatted-across-lanes.** Direct mirror of HLSL/GLSL's `float4 dot`. +- **Rounding.** Three sequential adds; round-off order affects result by ~1 ulp. Not an FMA in xenia. +- **IEEE-754 binary32; `VSCR[NJ]` honoured.** +- **No VSCR[SAT], no FPSCR update.** +- **VMX128 register-fusion** (7-bit IDs on `VA`, `VB`, `VD`). +- **No IBM AIX entry** — Xenon-only. +- **No `Rc`, no XER.** + +## Related Instructions + +- [`vmsum3fp128`](vmsum3fp128.md) — 3-way dot-product (ignores the w-lane). +- [`vmulfp128`](vmulfp128.md), [`vaddfp`](../vmx/vaddfp.md) — the building blocks. +- [`vmaddcfp128`](vmaddcfp128.md), [`vmaddfp`](../vmx/vmaddfp.md) — fused MAC variants. +- [`vsumsws`](../vmx/vsumsws.md) — integer sum-reduction analogue. + +## IBM Reference + +- No IBM AIX entry — Xbox 360 VMX128 extension only. +- Xbox 360 XDK, Altivec-128 (VMX128) extensions. Directly mirrors D3D9's `float4 dot`. +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 5 — Floating-Point Arithmetic](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) for base float semantics. diff --git a/migration/project-root/ppc-manual/vmx128/vmulfp128.md b/migration/project-root/ppc-manual/vmx128/vmulfp128.md new file mode 100644 index 0000000..1816a41 --- /dev/null +++ b/migration/project-root/ppc-manual/vmx128/vmulfp128.md @@ -0,0 +1,142 @@ +# `vmulfp128` — Vector128 Multiply Floating-Point + +> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128](../forms/VX128.md) · **Opcode:** `0x14000090` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vmulfp128` | `vmulfp128` | — | Vector128 Multiply Floating-Point | + +## Syntax + +```asm +vmulfp128 [VD], [VA], [VB] +``` + +## Encoding + +### `vmulfp128` — form `VX128` + +- **Opcode word:** `0x14000090` +- **Primary opcode (bits 0–5):** `5` +- **Extended opcode:** `144` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (4 or 5) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `VA128l` | source A low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21 | `VA128H` | source A high bit | +| 22 | `—` | reserved | +| 23–25 | `VC` | optional VC / XO sub-field | +| 26 | `VA128h` | source A middle bit | +| 27 | `—` | reserved | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VA` | vmulfp128: read | Source A vector register. | +| `VB` | vmulfp128: read | Source B vector register. | +| `VD` | vmulfp128: write | Destination vector register. | + +## Register Effects + +### `vmulfp128` + +- **Reads (always):** `VA`, `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vmulfp128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vmulfp128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1126`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1126) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:108`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L108) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:612`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L612) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:2108-2120`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L2108-L2120) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vmulfp128 => { + // PPCBUG-435 + PPCBUG-437. + let a = ctx.vr[instr.va128()].as_f32x4(); + let b = ctx.vr[instr.vb128()].as_f32x4(); + let mut r = [0f32; 4]; + for i in 0..4 { + let ai = vmx::flush_denorm(a[i]); + let bi = vmx::flush_denorm(b[i]); + r[i] = vmx::flush_denorm(ai * bi); + } + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_f32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Lane-wise float multiply — Xenon-only.** Base Altivec has no dedicated `vmulfp`; the pattern on traditional PowerPC is `vmaddfp vD, vA, vC, v_zero`. Xenon adds this direct instruction, saving the zero-register setup. +- **IEEE-754 binary32, round-to-nearest.** Each of the four lanes computes `VD[i] = VA[i] * VB[i]`. +- **`VSCR[NJ]` honoured** (denormals flush-to-zero). +- **NaN propagation** per IEEE-754. +- **No VSCR[SAT], no FPSCR update, no exceptions.** +- **VMX128 register-fusion** (7-bit IDs). +- **No IBM AIX entry** — Xbox-specific; contrast with the `vmaddfp`-with-zero workaround used on non-Xenon Altivec. +- **No `Rc`, no XER.** + +## Related Instructions + +- [`vmaddfp`](../vmx/vmaddfp.md), [`vmaddcfp128`](vmaddcfp128.md) — fused MAC forms. +- [`vaddfp`](../vmx/vaddfp.md), [`vsubfp`](../vmx/vsubfp.md) — lane-wise float add/sub. +- [`vmsum3fp128`](vmsum3fp128.md), [`vmsum4fp128`](vmsum4fp128.md) — dot-product reductions. + +## IBM Reference + +- No IBM AIX entry — this instruction is exclusive to the Xbox 360's VMX128 extension. +- Xbox 360 XDK, Altivec-128 (VMX128) extensions. Non-Xenon Altivec code emits `vmaddfp vD, vA, vC, v_zero` to achieve the same effect. +- [IBM AltiVec Technology Programmer's Interface Manual §`vmaddfp`](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) for the underlying float semantics. diff --git a/migration/project-root/ppc-manual/vmx128/vpermwi128.md b/migration/project-root/ppc-manual/vmx128/vpermwi128.md new file mode 100644 index 0000000..d1b5a7c --- /dev/null +++ b/migration/project-root/ppc-manual/vmx128/vpermwi128.md @@ -0,0 +1,139 @@ +# `vpermwi128` — Vector128 Permutate Word Immediate + +> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_P](../forms/VX128_P.md) · **Opcode:** `0x18000210` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vpermwi128` | `vpermwi128` | — | Vector128 Permutate Word Immediate | + +## Syntax + +```asm +vpermwi128 [VD], [VB], [UIMM] +``` + +## Encoding + +### `vpermwi128` — form `VX128_P` + +- **Opcode word:** `0x18000210` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `528` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `PERMl` | permute selector low 5 bits | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–22 | `—` | reserved | +| 23–25 | `PERMh` | permute selector high 3 bits | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vpermwi128: read | Source B vector register. | +| `UIMM` | vpermwi128: read | 16-bit unsigned immediate. Zero-extended. | +| `VD` | vpermwi128: write | Destination vector register. | + +## Register Effects + +### `vpermwi128` + +- **Reads (always):** `VB`, `UIMM` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vpermwi128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpermwi128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1207`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1207) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:112`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L112) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:642`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L642) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4537-4548`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4537-L4548) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpermwi128 => { + let imm = instr.vx128_p_perm(); + let b = ctx.vr[instr.vb128()].as_u32x4(); + let mut r = [0u32; 4]; + // Output lane i ← b[(imm >> (2 * (3-i))) & 3] + for i in 0..4 { + let sel = ((imm >> (2 * (3 - i))) & 3) as usize; + r[i] = b[sel]; + } + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Word-level 4-way permute via an 8-bit immediate.** The 8-bit `PERM` immediate (carried in fields `PERMh ‖ PERMl` of the encoding) is treated as **four 2-bit selectors**, one per output word lane. Each 2-bit field selects which of `VB`'s 4 word lanes is copied to the corresponding output lane. +- **Bit layout of the immediate.** Output lane 0 (big-endian MSB word) is selected by bits 6–7 of `PERM`; lane 1 by bits 4–5; lane 2 by bits 2–3; lane 3 by bits 0–1. (In xenia: `sel = (imm >> (2 * (3-i))) & 3`.) +- **Super-set of [`vspltw`](../vmx/vspltw.md).** A splat is `vpermwi128 vD, vB, 0x00` (all lanes = word 0), `0x55` (all = word 1), `0xAA` (all = word 2), `0xFF` (all = word 3). Arbitrary shuffles like "xyzw → wzyx" are a single-instruction operation. +- **Immediate-only.** No dynamic selector vector; contrast with [`vperm`](../vmx/vperm.md). +- **Single-source.** Unlike `vperm`/`vperm128`, `vpermwi128` only reshuffles one register (`VB`); it cannot interleave two operands. +- **VMX128 register-fusion** on `VD` and `VB` (7-bit IDs). +- **No IBM AIX entry** — Xenon-only. +- **No `Rc`, no XER, no VSCR.** + +## Related Instructions + +- [`vperm`](../vmx/vperm.md), [`vperm128`](../vmx/vperm.md) — general byte-granularity permute (two-source). +- [`vspltw`](../vmx/vspltw.md), [`vspltw128`](../vmx/vspltw.md) — single-word splat (special case of `vpermwi128`). +- [`vsldoi`](../vmx/vsldoi.md) — static-immediate byte rotate of two registers. +- [`vrlimi128`](vrlimi128.md) — rotate + mask-insert (per-word rotate with an insert mask). + +## IBM Reference + +- No IBM AIX entry — this instruction is exclusive to the Xbox 360's VMX128 extension. +- Xbox 360 XDK, Altivec-128 (VMX128) extensions. Functionally equivalent to HLSL's `.xyzw`-suffix swizzle on `float4`. +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 6 — Permute and Formatting](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) for the base permute semantics. diff --git a/migration/project-root/ppc-manual/vmx128/vpkd3d128.md b/migration/project-root/ppc-manual/vmx128/vpkd3d128.md new file mode 100644 index 0000000..d3e232e --- /dev/null +++ b/migration/project-root/ppc-manual/vmx128/vpkd3d128.md @@ -0,0 +1,185 @@ +# `vpkd3d128` — Vector128 Pack D3Dtype, Rotate Left Immediate and Mask Insert + +> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_4](../forms/VX128_4.md) · **Opcode:** `0x18000610` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vpkd3d128` | `vpkd3d128` | — | Vector128 Pack D3Dtype, Rotate Left Immediate and Mask Insert | + +## Syntax + +```asm +(no disassembly template) +``` + +## Encoding + +### `vpkd3d128` — form `VX128_4` + +- **Opcode word:** `0x18000610` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `1552` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–23 | `XO` | extended opcode | +| 24–25 | `z` | sub-operation selector | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vpkd3d128: read | Source B vector register. | +| `VD` | vpkd3d128: write | Destination vector register. | + +## Register Effects + +### `vpkd3d128` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vpkd3d128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vpkd3d128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:2088`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L2088) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:112`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L112) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:648`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L648) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4191-4248`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4191-L4248) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vpkd3d128 => { + use crate::vmx::D3dPackType; + let uimm = crate::decoder::extract_vx128_uimm5(instr.raw); + let pack = (uimm & 3) as usize; + let shift = instr.vx128_4_z() as usize; + let ty = D3dPackType::from_immediate(uimm >> 2); + let src = ctx.vr[instr.vb128()]; + let out = match ty { + D3dPackType::D3dColor => crate::vmx::pack_d3dcolor(src), + D3dPackType::NormShort2 => crate::vmx::pack_normshort2(src), + D3dPackType::NormPacked32 => crate::vmx::pack_normpacked32(src), + D3dPackType::Float16_2 => crate::vmx::pack_float16_2(src), + D3dPackType::NormShort4 => crate::vmx::pack_normshort4(src), + D3dPackType::Float16_4 => crate::vmx::pack_float16_4(src), + D3dPackType::NormPacked64 => crate::vmx::pack_normpacked64(src), + D3dPackType::Other(t) => { + tracing::warn!( + raw = format_args!("{:#010x}", instr.raw), + uimm, + ty = t, + "vpkd3d128: unhandled pack type at {:#010x}", + ctx.pc, + ); + src + } + }; + // Post-pack permutation: merge packed `out` into previous `vd` + // per canary ppc_emit_altivec.cc:2126-2188 MakePermuteMask tables. + // MakePermuteMask(r0,l0, r1,l1, r2,l2, r3,l3): result[i] = if ri==0 { prev[li] } else { out[li] } + let result = if pack == 0 { + out + } else { + // (source_reg, lane): 0=prev vd, 1=packed out + const PERM: [[[(u8, u8); 4]; 4]; 3] = [ + // pack=1 (VPACK_32): places out[3] at lane (3-shift) + [[(0,0),(0,1),(0,2),(1,3)], [(0,0),(0,1),(1,3),(0,3)], + [(0,0),(1,3),(0,2),(0,3)], [(1,3),(0,1),(0,2),(0,3)]], + // pack=2 (64-bit): places out[2..3] at lanes (2-shift)..(3-shift) + [[(0,0),(0,1),(1,2),(1,3)], [(0,0),(1,2),(1,3),(0,3)], + [(1,2),(1,3),(0,2),(0,3)], [(1,3),(0,1),(0,2),(0,3)]], + // pack=3 (64-bit): same as pack=2 except shift=3 selects out[2] at lane 3 + [[(0,0),(0,1),(1,2),(1,3)], [(0,0),(1,2),(1,3),(0,3)], + [(1,2),(1,3),(0,2),(0,3)], [(0,0),(0,1),(0,2),(1,2)]], + ]; + let prev = ctx.vr[instr.vd128()]; + let pw = prev.as_u32x4(); + let ow = out.as_u32x4(); + let sel = PERM[pack - 1][shift]; + xenia_types::Vec128::from_u32x4_array([ + if sel[0].0 == 0 { pw[sel[0].1 as usize] } else { ow[sel[0].1 as usize] }, + if sel[1].0 == 0 { pw[sel[1].1 as usize] } else { ow[sel[1].1 as usize] }, + if sel[2].0 == 0 { pw[sel[2].1 as usize] } else { ow[sel[2].1 as usize] }, + if sel[3].0 == 0 { pw[sel[3].1 as usize] } else { ow[sel[3].1 as usize] }, + ]) + }; + ctx.vr[instr.vd128()] = result; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Pack four float lanes into a single D3D-format 32-bit word.** The `IMM` field and the `z` sub-operation selector (together carried in bits 6–10 of the encoding in xenia's layout) choose *which* D3D format to emit: + - `D3dColor` — pack 4×float `[0.0, 1.0]` lanes into a 32-bit RGBA8 (A in high byte, B in low byte) — the canonical Direct3D 9 `D3DCOLOR` format. Xenia's helper is `vmx::pack_d3dcolor`. + - Other formats (RGBA16, compressed colour, etc.) are not yet implemented in xenia-rs; the interpreter logs a warning and passes through unchanged. +- **Also performs rotate-left-immediate and mask-insert.** The mnemonic is "Pack D3Dtype, Rotate Left Immediate and Mask Insert": the result of the pack step is rotated and merged into an existing `VD` under an immediate mask. Xenia currently emits only the pack step and overwrites `VD` wholesale; games rarely rely on the rotate-and-insert aspect. +- **Sub-operation via the `z` field** (2 bits) + `IMM` (5 bits) gives 7 bits of format selection; the practical set used by Xenon games is small (D3DCOLOR is the dominant one). +- **No saturation signal.** The packer saturates floats beyond `[0.0, 1.0]` silently; `VSCR[SAT]` is not touched. +- **VMX128 register-fusion** on `VD` and `VB`. +- **No IBM AIX entry** — Xenon-only. +- **No `Rc`, no XER.** + +## Related Instructions + +- [`vupkd3d128`](vupkd3d128.md) — the inverse (unpack a D3D-format word back into 4 floats). +- [`vpkpx`](../vmx/vpkpx.md) — the standard Altivec 1-5-5-5 pixel pack. +- [`vpkshus`](../vmx/vpkshus.md), [`vpkuhus`](../vmx/vpkuhus.md) — byte-range saturating packs (an alternative colour-packing path). +- [`vcfpsxws128`](vcfpsxws128.md), [`vcfpuxws128`](vcfpuxws128.md) — conversion with explicit scale; software sometimes pre-scales floats to `[0, 255]` before using these in place of `vpkd3d128`. + +## IBM Reference + +- No IBM AIX entry — Xbox 360 VMX128 extension only. The "D3D" in the mnemonic refers directly to Direct3D 9 vertex/pixel formats (the `D3DDECLTYPE_*` enumeration). +- Xbox 360 XDK, Altivec-128 (VMX128) extensions. +- Microsoft D3D9 documentation: `D3DDECLTYPE_D3DCOLOR`, `D3DDECLTYPE_UBYTE4N`, etc. diff --git a/migration/project-root/ppc-manual/vmx128/vrlimi128.md b/migration/project-root/ppc-manual/vmx128/vrlimi128.md new file mode 100644 index 0000000..b5fdfba --- /dev/null +++ b/migration/project-root/ppc-manual/vmx128/vrlimi128.md @@ -0,0 +1,141 @@ +# `vrlimi128` — Vector128 Rotate Left Immediate and Mask Insert + +> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_4](../forms/VX128_4.md) · **Opcode:** `0x18000710` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vrlimi128` | `vrlimi128` | — | Vector128 Rotate Left Immediate and Mask Insert | + +## Syntax + +```asm +vrlimi128 [VD], [VB], [IMM], [z] +``` + +## Encoding + +### `vrlimi128` — form `VX128_4` + +- **Opcode word:** `0x18000710` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `1808` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–23 | `XO` | extended opcode | +| 24–25 | `z` | sub-operation selector | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vrlimi128: read | Source B vector register. | +| `VD` | vrlimi128: write | Destination vector register. | + +## Register Effects + +### `vrlimi128` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vrlimi128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vrlimi128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:1315`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L1315) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:119`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L119) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:649`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L649) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:3962-3977`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L3962-L3977) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vrlimi128 => { + let shift = instr.vx128_4_z() as usize; + let mask = instr.vx128_4_imm(); + let b = ctx.vr[instr.vb128()].as_u32x4(); + let d = ctx.vr[instr.vd128()].as_u32x4(); + let rot = [b[shift % 4], b[(shift + 1) % 4], b[(shift + 2) % 4], b[(shift + 3) % 4]]; + let mut r = [0u32; 4]; + for i in 0..4 { + // mask bit 3 corresponds to word 0 (BE-first). Use rot when + // the corresponding mask bit is set. + let use_rot = (mask >> (3 - i)) & 1 == 1; + r[i] = if use_rot { rot[i] } else { d[i] }; + } + ctx.vr[instr.vd128()] = xenia_types::Vec128::from_u32x4_array(r); + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Rotate-left-word + mask-insert in one step.** `VB` is rotated left by `IMM & 3` word positions (word-granular, 0..3 — not bits). The resulting rotated vector is merged into the pre-existing `VD` under control of a 4-bit "insert mask" (`fmask`, from bits 26–29 of the encoding in xenia's layout): mask bit `i` = 1 keeps lane `i` from the rotated `VB`; mask bit = 0 keeps lane `i` from the old `VD`. +- **Destructive destination.** `VD` is both source and destination — software must preserve its value or pre-initialise it. +- **Typical use: selective-lane overwrite.** Games use this to "rewrite lane `n` of a vector with a shuffled component" without a full permute. A common pattern is "insert a scalar into lane `i` of a vector" where the scalar has been pre-loaded to a known word of `VB`. +- **Mask bit ↔ lane mapping.** Big-endian: mask bit 3 (MSB of the 4-bit mask) controls lane 0; bit 0 controls lane 3. (In xenia: `use_rot = (mask >> (3 − i)) & 1`.) +- **VMX128 register-fusion** on `VD` and `VB`. +- **No IBM AIX entry** — Xenon-only. +- **No `Rc`, no XER, no VSCR.** + +## Related Instructions + +- [`vrlw`](../vmx/vrlw.md), [`vrlw128`](../vmx/vrlw.md) — per-lane bit-level rotate (word-granular shift, not lane-granular). +- [`vpermwi128`](vpermwi128.md) — immediate 4-way word permute (no merge). +- [`vsel`](../vmx/vsel.md), [`vsel128`](../vmx/vsel.md) — general bit-select; `vrlimi128` is the specialised "rotate + insert" equivalent. +- [`vsldoi`](../vmx/vsldoi.md) — byte-level immediate shift. + +## IBM Reference + +- No IBM AIX entry — this instruction is exclusive to the Xbox 360's VMX128 extension. The mnemonic is an adaptation of the scalar `rlwimi` (rotate-left-word-immediate-mask-insert) pattern for vectors. +- Xbox 360 XDK, Altivec-128 (VMX128) extensions. +- [IBM AltiVec Technology Programmer's Interface Manual, Chapter 4 — Integer Shift / Rotate](https://www.nxp.com/docs/en/reference-manual/ALTIVECPIM.pdf) for the base rotate semantics. diff --git a/migration/project-root/ppc-manual/vmx128/vupkd3d128.md b/migration/project-root/ppc-manual/vmx128/vupkd3d128.md new file mode 100644 index 0000000..e8c4aeb --- /dev/null +++ b/migration/project-root/ppc-manual/vmx128/vupkd3d128.md @@ -0,0 +1,154 @@ +# `vupkd3d128` — Vector128 Unpack D3Dtype + +> **Category:** [VMX128](../categories/vmx128.md) · **Form:** [VX128_3](../forms/VX128_3.md) · **Opcode:** `0x180007f0` + + + +## Assembler Mnemonics + +| Mnemonic | XML entry | Flags | Description | +| --- | --- | --- | --- | +| `vupkd3d128` | `vupkd3d128` | — | Vector128 Unpack D3Dtype | + +## Syntax + +```asm +(no disassembly template) +``` + +## Encoding + +### `vupkd3d128` — form `VX128_3` + +- **Opcode word:** `0x180007f0` +- **Primary opcode (bits 0–5):** `6` +- **Extended opcode:** `2032` +- **Synchronising:** no + +| Bits | Field | Meaning | +| --- | --- | --- | +| 0–5 | `OPCD` | primary opcode (6) | +| 6–10 | `VD128l` | destination low 5 bits | +| 11–15 | `IMM` | 5-bit immediate | +| 16–20 | `VB128l` | source B low 5 bits | +| 21–27 | `XO` | extended opcode | +| 28–29 | `VD128h` | destination high 2 bits | +| 30–31 | `VB128h` | source B high 2 bits | + +## Operands + +| Field | Role | Description | +| --- | --- | --- | +| `VB` | vupkd3d128: read | Source B vector register. | +| `VD` | vupkd3d128: write | Destination vector register. | + +## Register Effects + +### `vupkd3d128` + +- **Reads (always):** `VB` +- **Reads (conditional):** _none_ +- **Writes (always):** `VD` +- **Writes (conditional):** _none_ + +## Status-Register Effects + +_No condition-register or status-register effects._ + +## Operation (pseudocode) + +``` +; Pseudocode derives directly from the xenia-rs interpreter +; arm (see Implementation References). Operation semantics: +; - Read source operands from the fields listed under Operands. +; - Apply the arithmetic / logical / memory action described +; in the Description field above. +; - Write results to the destination register(s); update any +; status bits enumerated under Status-Register Effects. +; Consult the IBM AIX reference link under IBM Reference for +; canonical PPC-style pseudocode where xenia's expression is +; terse. +``` + +## C Translation Example + +```c +/* C translation: the xenia-rs interpreter arm below in */ +/* Implementation References is the authoritative semantic */ +/* snapshot. Translate it line-by-line: */ +/* - ctx.gpr[N] -> r[N] (or f[]/v[] for FPRs/VRs) */ +/* - mem.read_u*/write_u* -> mem_read_u*_be / mem_write_u*_be */ +/* - ctx.update_cr_signed(fld, v) -> update_cr_signed(fld, v) */ +/* - ctx.xer_ca / xer_ov / xer_so -> xer.CA / xer.OV / xer.SO */ +/* The Register Effects and Status-Register Effects tables above */ +/* enumerate every side effect a faithful translation must emit. */ +``` + +## Implementation References + +**`vupkd3d128`** +- xenia-canary XML: [`tools/ppc-instructions.xml` — search for `mnem="vupkd3d128"`](../../xenia-canary/tools/ppc-instructions.xml) +- xenia-canary emit: [`src/xenia/cpu/ppc/ppc_emit_altivec.cc:2194`](../../xenia-canary/src/xenia/cpu/ppc/ppc_emit_altivec.cc#L2194) +- xenia-rs opcode: [`crates/xenia-cpu/src/opcode.rs:128`](../../xenia-rs/crates/xenia-cpu/src/opcode.rs#L128) +- xenia-rs decoder: [`crates/xenia-cpu/src/decoder.rs:670`](../../xenia-rs/crates/xenia-cpu/src/decoder.rs#L670) +- xenia-rs interpreter: [`crates/xenia-cpu/src/interpreter.rs:4249-4275`](../../xenia-rs/crates/xenia-cpu/src/interpreter.rs#L4249-L4275) +
xenia-rs interpreter body (frozen snapshot) + +```rust + PpcOpcode::vupkd3d128 => { + use crate::vmx::D3dPackType; + let uimm = crate::decoder::extract_vx128_uimm5(instr.raw); + let ty = D3dPackType::from_immediate(uimm >> 2); + let src = ctx.vr[instr.vb128()]; + let out = match ty { + D3dPackType::D3dColor => crate::vmx::unpack_d3dcolor(src), + D3dPackType::NormShort2 => crate::vmx::unpack_normshort2(src), + D3dPackType::NormPacked32 => crate::vmx::unpack_normpacked32(src), + D3dPackType::Float16_2 => crate::vmx::unpack_float16_2(src), + D3dPackType::NormShort4 => crate::vmx::unpack_normshort4(src), + D3dPackType::Float16_4 => crate::vmx::unpack_float16_4(src), + D3dPackType::NormPacked64 => crate::vmx::unpack_normpacked64(src), + D3dPackType::Other(t) => { + tracing::warn!( + raw = format_args!("{:#010x}", instr.raw), + uimm, + ty = t, + "vupkd3d128: unhandled pack type at {:#010x}", + ctx.pc, + ); + src + } + }; + ctx.vr[instr.vd128()] = out; + ctx.pc += 4; + } +``` +
+ + + +## Special Cases & Edge Conditions + +- **Unpack a D3D-format word into 4 float lanes.** The `IMM` field in the encoding selects the target format: + - `D3dColor` — decode a 32-bit RGBA8 (`D3DCOLOR`) into 4 float lanes in `[0.0, 1.0]`. Xenia's helper is `vmx::unpack_d3dcolor`. + - Other formats (UBYTE4N, SHORT2N, etc.) are not yet implemented in xenia-rs; the interpreter logs a warning and passes `VB` through unchanged. +- **Inverse of [`vpkd3d128`](vpkd3d128.md).** The same format code used to pack must be used to unpack. +- **Source-width is a single 32-bit word** of `VB` (typically lane 0; the helpers read the appropriate component). The other three input word lanes are ignored for `D3DCOLOR`. +- **IEEE-754 binary32 outputs,** already normalised to `[0.0, 1.0]` (integer value divided by 255, then cast to float). +- **No `VSCR[SAT]` effect**, no FPSCR, no exceptions. +- **VMX128 register-fusion** on `VD` and `VB`. +- **No IBM AIX entry** — Xenon-only. +- **No `Rc`, no XER.** + +## Related Instructions + +- [`vpkd3d128`](vpkd3d128.md) — the inverse pack. +- [`vupkhpx`](../vmx/vupkhpx.md), [`vupklpx`](../vmx/vupklpx.md) — standard Altivec 1-5-5-5 pixel unpacks. +- [`vupkhsb`](../vmx/vupkhsb.md), [`vupklsb`](../vmx/vupklsb.md) — sign-extending byte→half-word unpacks (the integer analogue). +- [`vcsxwfp128`](vcsxwfp128.md), [`vcuxwfp128`](vcuxwfp128.md) — int → float with scale; sometimes used as an alternate decode path. + +## IBM Reference + +- No IBM AIX entry — Xbox 360 VMX128 extension only. "D3D" denotes the Direct3D 9 vertex/pixel format catalogue (`D3DDECLTYPE_*`). +- Xbox 360 XDK, Altivec-128 (VMX128) extensions. +- Microsoft D3D9 documentation: `D3DDECLTYPE_D3DCOLOR`, `D3DDECLTYPE_UBYTE4N`, etc. diff --git a/migration/project-root/run-canary.sh b/migration/project-root/run-canary.sh new file mode 100755 index 0000000..8c187aa --- /dev/null +++ b/migration/project-root/run-canary.sh @@ -0,0 +1,4 @@ +/home/fabi/RE\ Project\ Sylpheed/xenia-canary/build/bin/Linux/Debug/xenia_canary \ + '/home/fabi/RE Project Sylpheed/xenia-rs/sylpheed.iso' \ + --log_level=3 \ + --disable_instruction_infocache=true diff --git a/migration/setup.sh b/migration/setup.sh new file mode 100755 index 0000000..7c2de8f --- /dev/null +++ b/migration/setup.sh @@ -0,0 +1,136 @@ +#!/usr/bin/env bash +# Idempotent installer for the cross-machine snapshot bundle. +# Run from inside xenia-rs/migration/ on a fresh clone. +# +# Restores the parts of the working state that live OUTSIDE the xenia-rs +# git repo: +# 1. Claude auto-memory directory +# 2. project-root .claude/settings.json (Stop hook + permissions) +# 3. project-root ppc-manual/ +# 4. project-root run-canary.sh +# 5. xenia-canary clone (if missing) +# +# Does NOT restore the Sylpheed ISO (manual) or sylpheed.db (regenerate +# via analysis tooling after first build). See migration/README.md. + +set -euo pipefail + +# Resolve paths from this script's location. +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +REPO_DIR="$(cd "$SCRIPT_DIR/.." && pwd)" # xenia-rs/ +ROOT_DIR="$(cd "$REPO_DIR/.." && pwd)" # project root (parent of xenia-rs) + +echo "==> Cross-machine setup" +echo " script: $SCRIPT_DIR" +echo " repo: $REPO_DIR" +echo " root: $ROOT_DIR" +echo + +# ---------- 1. Auto-memory ---------- +# Path is derived from Claude Code's cwd-sanitized scheme: every '/' in +# the absolute working directory becomes '-'. If the project root path +# differs from the original (/home/fabi/RE Project Sylpheed), the memory +# directory name MUST differ to match this machine's cwd. +CWD_SANITIZED="$(printf '%s' "$REPO_DIR" | tr '/' '-' | sed 's/^-//')" +MEMORY_TARGET="$HOME/.claude/projects/-${CWD_SANITIZED}/memory" + +# But for backward-compat (memory files reference the original absolute +# paths), the original directory name is also restored unconditionally. +MEMORY_TARGET_ORIG="$HOME/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory" + +for target in "$MEMORY_TARGET" "$MEMORY_TARGET_ORIG"; do + if [ -d "$target" ] && [ "$(ls -A "$target" 2>/dev/null | wc -l)" -gt 0 ]; then + echo "==> [skip] memory dir already populated: $target" + else + echo "==> Installing auto-memory: $target" + mkdir -p "$target" + cp -a "$SCRIPT_DIR/claude-memory/." "$target/" + echo " -> $(ls "$target" | wc -l) files restored" + fi +done + +# ---------- 2. Project-root .claude/settings.json ---------- +if [ -f "$ROOT_DIR/.claude/settings.json" ]; then + echo "==> [skip] $ROOT_DIR/.claude/settings.json already exists" +else + echo "==> Installing project-root .claude/settings.json (Stop hook + permissions)" + mkdir -p "$ROOT_DIR/.claude" + cp "$SCRIPT_DIR/project-root/dot-claude/settings.json" "$ROOT_DIR/.claude/settings.json" +fi + +# ---------- 3. ppc-manual ---------- +if [ -d "$ROOT_DIR/ppc-manual" ]; then + echo "==> [skip] $ROOT_DIR/ppc-manual already exists" +else + echo "==> Installing ppc-manual (PowerPC reference docs)" + cp -a "$SCRIPT_DIR/project-root/ppc-manual" "$ROOT_DIR/ppc-manual" +fi + +# ---------- 4. run-canary.sh ---------- +if [ -f "$ROOT_DIR/run-canary.sh" ]; then + echo "==> [skip] $ROOT_DIR/run-canary.sh already exists" +else + echo "==> Installing run-canary.sh helper" + cp "$SCRIPT_DIR/project-root/run-canary.sh" "$ROOT_DIR/run-canary.sh" + chmod +x "$ROOT_DIR/run-canary.sh" +fi + +# ---------- 5. xenia-canary clone ---------- +CANARY_DIR="$ROOT_DIR/xenia-canary" +CANARY_REMOTE="https://git.mc02.dev/fabi/Xenia-Canary.git" +CANARY_HEAD="6de80dffe261b368ecefee36c9b2b337335228c0" + +if [ -d "$CANARY_DIR/.git" ]; then + echo "==> [skip] xenia-canary already cloned at $CANARY_DIR" + echo " HEAD: $(git -C "$CANARY_DIR" rev-parse HEAD)" +else + echo "==> Cloning xenia-canary into $CANARY_DIR" + if git clone "$CANARY_REMOTE" "$CANARY_DIR"; then + git -C "$CANARY_DIR" checkout "$CANARY_HEAD" || \ + echo " [warn] could not check out pinned HEAD $CANARY_HEAD; using whatever HEAD the clone defaulted to" + else + echo " [warn] xenia-canary clone failed; you can clone manually later" + echo " git clone $CANARY_REMOTE $CANARY_DIR" + echo " git -C $CANARY_DIR checkout $CANARY_HEAD" + fi +fi + +# ---------- Final checklist ---------- +cat < Setup complete. Remaining manual steps: + +1. Sylpheed ISO (cannot ship via git): + Copy your local copy of + "Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso" + into $ROOT_DIR/ + +2. Build xenia-rs: + cd "$REPO_DIR" + cargo build --release + +3. Build xenia-canary Debug (only if you intend to run cross-runtime probes): + cd "$CANARY_DIR" + cmake --preset linux-debug # or whatever invocation your canary tree expects + cmake --build build/ + +4. Regenerate sylpheed.db (analyzer reads XEX from the ISO): + cd "$REPO_DIR" + # Check current --help for exact subcommand; the analysis crates are + # under crates/xenia-analysis/ and crates/xenia-app/src/main.rs + cargo run --release --bin xenia-rs -- analyze \\ + "$ROOT_DIR/Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso" + +5. Switch back to whatever branch you want to continue work on. The audit + history (audit-findings.md + audit-runs/) and migration/ bundle are + already on chore/portable-snapshot. Either: + - keep working on chore/portable-snapshot (simplest), OR + - merge it into master and continue on master. + +6. Start Claude Code. Memory loads automatically when cwd matches: + cd "$REPO_DIR" + claude + Read MEMORY.md first; the last paused audit is AUDIT-058 with + AUDIT-059 recommended in its memory file. + +EOF