chore: add migration/ bundle for cross-machine setup
Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on
another machine can be brought up to identical configuration via
migration/setup.sh:
- claude-memory/ -> ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/
  (103 files, 1.1 MB: MEMORY.md plus every project_xenia_rs_*.md
  from audits addis_signext through audit-058)
- project-root/dot-claude/ -> <project-root>/.claude/settings.json
  (Stop hook + permissions)
- project-root/ppc-manual/ -> <project-root>/ppc-manual/
  (PowerPC reference docs, 397 files, 3.7 MB)
- project-root/run-canary.sh -> <project-root>/run-canary.sh
- README.md: human-readable setup checklist
- setup.sh: idempotent installer (also reclones xenia-canary at
  pinned HEAD 6de80dffe)
- MANIFEST.md: per-file mapping, plus a restoration recipe for each
  file that is not bundled
Excluded from bundle (not shippable via git):
- Sylpheed ISO (7.8 GB; copyright; manual copy required)
- sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
- target/ build artifacts (rebuild on target)
- audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
- audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
- xenia-canary checkout (setup.sh reclones from
  git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
migration/project-root/ppc-manual/generator/README.md
@@ -0,0 +1,114 @@
# Manual generator

Python scripts that build the `ppc-manual/` tree from sources in this
repository. The first two are authoritative. The C++ emitters are only
cross-referenced:

- `xenia-canary/tools/ppc-instructions.xml`: metadata for all 455
  Xbox 360 PPC instructions (mnemonic, form, group, opcode, in/out
  fields, disasm template).
- `xenia-rs/crates/xenia-cpu/src/`: the Rust interpreter. Individual
  instruction semantics live in `interpreter.rs` match arms. `opcode.rs`
  and `decoder.rs` are also scanned for line references.
- `xenia-canary/src/xenia/cpu/ppc/ppc_emit_*.cc`: the C++ emit
  functions, referenced by line number only.

## Files

| File | Purpose |
| --- | --- |
| `generate_manual.py` | Main entry point. Parses the XML, builds families, renders pages, and writes `index.json`. |
| `xml_model.py` | XML parser plus `expand_runtime_variants()`, which produces the set of Rc/OE/LK-expanded mnemonics that a single XML entry covers. |
| `bit_layout.py` | Per-form bit-field tables, rendered into the Encoding section of every page and into `forms/*.md`. |
| `rust_scraper.py` | Locates each `PpcOpcode::<mnem>` enum variant, decoder arm, and interpreter match-arm line range. |
| `cxx_scraper.py` | Locates `InstrEmit_<mnem>` in the xenia-canary emit `.cc` files. |

## Running

```bash
python3 ppc-manual/generator/generate_manual.py                  # full generate
python3 ppc-manual/generator/generate_manual.py --dry-run        # parse + consistency checks only
python3 ppc-manual/generator/generate_manual.py --out /tmp/out   # alternate output root
python3 ppc-manual/generator/generate_manual.py --xml /path/to/ppc-instructions.xml  # alternate XML source
```

No third-party dependencies; Python 3.10+ standard library only.

## Idempotency

The generator is re-runnable without data loss:

1. Each page has a pair of sentinel comments:
   - `<!-- GENERATED: BEGIN -->`
   - `<!-- GENERATED: END -->`
2. On re-run, only the text **between** the sentinels is rewritten.
   Everything after `END` (Special Cases, Related Instructions, IBM
   Reference) is preserved verbatim.
3. If the `END` sentinel is missing, the generator assumes a reviewer
   has fully taken over the file and skips it entirely.

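The following is a minimal sketch of the splice step described above. It assumes pages are plain strings and that any text before `BEGIN` also belongs to the reviewer. `splice_generated` is a hypothetical helper, not the generator's actual function. A missing `END` sentinel returns `None`, which signals "skip this file".

```python
from __future__ import annotations

BEGIN = "<!-- GENERATED: BEGIN -->"
END = "<!-- GENERATED: END -->"


def splice_generated(existing: str | None, generated: str) -> str | None:
    """Hypothetical helper: rewrite only the text between the sentinels.

    Returns the new page text, or None when the page must be left alone.
    """
    if existing is None:
        # Brand-new page: emit the generated block wrapped in both sentinels.
        return f"{BEGIN}\n{generated}\n{END}\n"
    end = existing.find(END)
    if end == -1:
        # Rule 3: no END sentinel means a reviewer owns the whole file.
        return None
    begin = existing.find(BEGIN)
    # Assumption: text before BEGIN (if BEGIN is present) is kept as-is.
    head = existing[:begin] if 0 <= begin < end else ""
    tail = existing[end:]  # END sentinel plus reviewer sections, verbatim
    return f"{head}{BEGIN}\n{generated}\n{tail}"


page = splice_generated(None, "## addx\nold body")
page += "\n## Special Cases\nReviewer notes.\n"
page = splice_generated(page, "## addx\nnew body")
print(page)
```

Running the splice twice with the same generated text returns the same page, which is the property the section relies on.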
## Consistency checks (also enforced by `--dry-run`)

- **XML entry count ≡ 455**: warns if the count differs, which
  indicates the XML has been modified.
- **Family membership total ≡ XML entry count**: every XML entry
  must land in exactly one family.
- **Index coverage ≡ runtime-expanded mnemonic count**: the JSON
  index must contain a key for every runtime variant (`add`, `add.`,
  `addo`, `addo.`, `bclr`, `bclrl`, …).

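The index-coverage check can be sketched as follows. The sketch assumes each XML entry has already been reduced to a base spelling plus its runtime flags (`oe`, `rc`, `lk`). `runtime_variants` and `missing_index_keys` are hypothetical names. The real expansion lives in `xml_model.expand_runtime_variants`, whose body is not shown here.

```python
from itertools import product


def runtime_variants(base: str, *, oe: bool = False, rc: bool = False,
                     lk: bool = False) -> list[str]:
    """Spell every runtime variant of one base mnemonic (hypothetical helper)."""
    oe_opts = ("", "o") if oe else ("",)  # OE=1 appends 'o'
    rc_opts = ("", ".") if rc else ("",)  # Rc=1 appends '.'
    lk_opts = ("", "l") if lk else ("",)  # LK=1 appends 'l'
    # Order used here: base + 'l' + 'o' + '.', e.g. addo., bclrl
    return [base + lk_s + oe_s + rc_s
            for oe_s, rc_s, lk_s in product(oe_opts, rc_opts, lk_opts)]


def missing_index_keys(index_keys: set[str], entries: dict[str, dict]) -> set[str]:
    """Return expected runtime variants that have no key in the JSON index."""
    expected: set[str] = set()
    for base, flags in entries.items():
        expected.update(runtime_variants(base, **flags))
    return expected - index_keys


print(runtime_variants("add", oe=True, rc=True))  # ['add', 'add.', 'addo', 'addo.']
print(runtime_variants("bclr", lk=True))          # ['bclr', 'bclrl']
```

A non-empty result from `missing_index_keys` corresponds to a failed coverage check.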
## Family grouping rules

Three rules are applied in order (see `_family_head` in
`generate_manual.py`):

1. If a mnemonic ends in `128` and the non-128 sibling exists, it
   joins the sibling's family. So `vaddfp128` is consolidated into
   the `vaddfp` page.
2. For memory ops (group `m`), a trailing `u`, `x`, or `ux` suffix
   is stripped when the base exists. So `lwz`, `lwzu`, `lwzx`, and
   `lwzux` all land on the `lwz` page.
3. Otherwise the mnemonic is its own family head.

All other flag variants (`Rc`, `OE`, `LK`) are **runtime** variants.
They are NOT separate XML entries; they are listed in the page's
"Assembler Mnemonics" table instead.

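The three rules can be sketched as a single function. This is a hypothetical re-implementation based on the rules above, not the actual `_family_head`, whose source in `generate_manual.py` is not shown. `known` stands for the set of all XML mnemonics. Checking `ux` before `u` and `x` is an assumption; it keeps `lwzux` from stopping at `lwzu`.

```python
def family_head(mnem: str, group: str, known: set[str]) -> str:
    """Hypothetical sketch of the family-grouping rules."""
    # Rule 1: fold `<base>128` into `<base>` when the sibling exists.
    if mnem.endswith("128") and mnem[:-3] in known:
        return mnem[:-3]
    # Rule 2: memory ops drop a trailing ux/u/x when the base exists.
    if group == "m":
        for suffix in ("ux", "u", "x"):
            base = mnem[: -len(suffix)]
            if mnem.endswith(suffix) and base in known:
                return base
    # Rule 3: otherwise the mnemonic heads its own family.
    return mnem


known = {"vaddfp", "vaddfp128", "lwz", "lwzu", "lwzx", "lwzux", "lhbrx", "addx"}
for m, g in [("vaddfp128", "v"), ("lwzux", "m"), ("lhbrx", "m"), ("addx", "i")]:
    print(m, "->", family_head(m, g, known))
```

`lhbrx` stays on its own page because `lhbr` is not a known mnemonic, which is the "when the base exists" guard at work.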
## Category mapping

| XML group | Category dir | Notes |
| --- | --- | --- |
| `i` (integer) | `alu/` | |
| `m` (memory) | `memory/` | |
| `b` (branch) | `branch/` | Includes `sc` and traps |
| `c` (control) | `control/` | CR logical, SPR, sync |
| `f` (fpu) | `fpu/` | |
| `v` (vector) | `vmx/` or `vmx128/` | Split by form: `VX128*` → `vmx128/` |

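The sketch below shows the mapping in code. It reuses the `GROUP_TO_DIR` table from `xml_model.py`. The `VX128` prefix test is an assumption based on the table's Notes column, not a copy of the code in `generate_manual.py`.

```python
GROUP_TO_DIR = {"i": "alu", "m": "memory", "b": "branch",
                "c": "control", "f": "fpu", "v": "vmx"}


def category_dir(group: str, form: str) -> str:
    """Return the on-disk category directory for one XML entry (sketch)."""
    if group == "v" and form.startswith("VX128"):
        return "vmx128"  # VX128, VX128_1, ..., VX128_R
    return GROUP_TO_DIR[group]


print(category_dir("v", "VA"), category_dir("v", "VX128_P"))  # vmx vmx128
```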
## Extending the generator

- **Pseudocode seeds.** The `PSEUDOCODE_SEEDS` dict in
  `generate_manual.py` maps an XML mnemonic to a PPC-style pseudocode
  block. Add entries here to pre-fill the Operation section for
  additional mnemonics. Phase 2 reviewers can still override the
  result by writing content outside the sentinels.
- **C translation seeds.** A similar dict of C snippets, keyed by
  family head.
- **Field descriptions.** `FIELD_DESCRIPTIONS` maps XML field names to
  IBM-style prose. Missing entries are marked "_Phase 2: document
  this field._"

## Known limitations

- Extended-opcode extraction in `xml_model.Instruction.extended_opcode`
  is best-effort per form. For VMX128 variants the extracted value may
  not match the exact pattern used by xenia's decoder tree. The page
  still shows the value for reference, but the decoder source (linked
  on every page) is authoritative.
- `rust_scraper` uses a naive brace counter to delimit interpreter
  match arms. This works for the current interpreter because its match
  arms use balanced braces and contain no string or char literals with
  unbalanced braces. If the interpreter ever adopts such literals, the
  scraper will need a Rust-aware parser.
- The generator treats a trailing `x` on a mnemonic as a xenia naming
  convention ("extended/XO form") and strips it for assembly display.
  The exception is the memory group, where `x` is the natural
  indexed-form suffix. If future xenia XML adds another group where `x`
  is structural, the heuristic in `xml_model.expand_runtime_variants`
  needs updating.

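The `x`-stripping heuristic from the last item can be sketched as follows. This is a hypothetical helper that assumes the rule looks only at the XML mnemonic and its group. Under this rule, any mnemonic outside group `m` whose trailing `x` is structural would lose it, which is exactly the case the limitation warns about.

```python
def display_mnemonic(xml_mnem: str, group: str) -> str:
    """Hypothetical sketch: drop xenia's trailing 'x' except for memory ops."""
    if group != "m" and xml_mnem.endswith("x"):
        return xml_mnem[:-1]  # addx -> add, bclrx -> bclr
    return xml_mnem  # lwzx stays lwzx: 'x' is the indexed-form suffix


for mnem, group in [("addx", "i"), ("bclrx", "b"), ("faddx", "f"), ("lwzx", "m")]:
    print(f"{mnem:6s} -> {display_mnemonic(mnem, group)}")
```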
migration/project-root/ppc-manual/generator/bit_layout.py
@@ -0,0 +1,266 @@
"""
Canonical bit layout per PPC instruction form.

Tables derived from xenia-canary/src/xenia/cpu/ppc/ppc_instr.h (struct
PPCOpcodeBits union). In PPC notation bit 0 is the MSB of the 32-bit
word (big-endian bit numbering).

Each entry is a list of (bit_start, bit_end_inclusive, field_name,
notes) tuples laid out from MSB (bit 0) to LSB (bit 31).
"""


# NOTE: bit ranges use PPC big-endian numbering (0 = MSB, 31 = LSB).

FORM_LAYOUTS: dict[str, list[tuple[int, int, str, str]]] = {
    "I": [
        (0, 5, "OPCD", "primary opcode"),
        (6, 29, "LI", "signed 24-bit word-offset target"),
        (30, 30, "AA", "absolute-address flag"),
        (31, 31, "LK", "link flag (bl/ba/bla)"),
    ],
    "B": [
        (0, 5, "OPCD", "primary opcode"),
        (6, 10, "BO", "branch options"),
        (11, 15, "BI", "CR bit to test"),
        (16, 29, "BD", "signed 14-bit word-offset target"),
        (30, 30, "AA", "absolute-address flag"),
        (31, 31, "LK", "link flag"),
    ],
    "SC": [
        (0, 5, "OPCD", "primary opcode (17)"),
        (6, 19, "—", "reserved"),
        (20, 26, "LEV", "exception level"),
        (27, 29, "—", "reserved"),
        (30, 30, "1", "fixed 1"),
        (31, 31, "—", "reserved"),
    ],
    "D": [
        (0, 5, "OPCD", "primary opcode"),
        (6, 10, "RT", "destination GPR (or RS when storing)"),
        (11, 15, "RA", "source GPR (0 ⇒ literal 0 for RA0 forms)"),
        (16, 31, "D/SI/UI", "16-bit signed or unsigned immediate"),
    ],
    "DS": [
        (0, 5, "OPCD", "primary opcode"),
        (6, 10, "RT", "destination GPR (or RS)"),
        (11, 15, "RA", "source GPR (0 ⇒ literal 0)"),
        (16, 29, "DS", "14-bit signed word-scaled displacement"),
        (30, 31, "XO", "extended opcode"),
    ],
    "X": [
        (0, 5, "OPCD", "primary opcode"),
        (6, 10, "RT/FRT/VRT", "destination"),
        (11, 15, "RA/FRA/VRA", "source A"),
        (16, 20, "RB/FRB/VRB", "source B"),
        (21, 30, "XO", "extended opcode (10 bits)"),
        (31, 31, "Rc", "record-form flag"),
    ],
    "XL": [
        (0, 5, "OPCD", "primary opcode (19)"),
        (6, 10, "BT/BO", "target / branch options"),
        (11, 15, "BA/BI", "source A / CR bit to test"),
        (16, 20, "BB", "source B"),
        (21, 30, "XO", "extended opcode (10 bits)"),
        (31, 31, "LK", "link flag"),
    ],
    "XFX": [
        (0, 5, "OPCD", "primary opcode (31)"),
        (6, 10, "RT", "destination / source GPR"),
        (11, 20, "spr/tbr/FXM", "SPR/TBR number (5-bit halves swapped) or CR field mask"),
        (21, 30, "XO", "extended opcode"),
        (31, 31, "—", "reserved"),
    ],
    "XFL": [
        (0, 5, "OPCD", "primary opcode (63)"),
        (6, 6, "L", "field-select behaviour"),
        (7, 14, "FM", "FPSCR field mask"),
        (15, 15, "W", "immediate-value flag"),
        (16, 20, "FRB", "source FPR"),
        (21, 30, "XO", "extended opcode"),
        (31, 31, "Rc", "record-form flag (updates CR1)"),
    ],
    "XS": [
        (0, 5, "OPCD", "primary opcode (31)"),
        (6, 10, "RS", "source GPR"),
        (11, 15, "RA", "destination GPR"),
        (16, 20, "sh", "shift amount low 5 bits"),
        (21, 29, "XO", "extended opcode (9 bits)"),
        (30, 30, "sh5", "shift amount high bit"),
        (31, 31, "Rc", "record-form flag"),
    ],
    "XO": [
        (0, 5, "OPCD", "primary opcode (31)"),
        (6, 10, "RT", "destination GPR"),
        (11, 15, "RA", "source A"),
        (16, 20, "RB", "source B"),
        (21, 21, "OE", "overflow-enable flag"),
        (22, 30, "XO", "extended opcode (9 bits)"),
        (31, 31, "Rc", "record-form flag"),
    ],
    "A": [
        (0, 5, "OPCD", "primary opcode (59 or 63)"),
        (6, 10, "FRT", "destination FPR"),
        (11, 15, "FRA", "source A FPR"),
        (16, 20, "FRB", "source B FPR"),
        (21, 25, "FRC", "source C FPR (multiplier for madd-style ops)"),
        (26, 30, "XO", "extended opcode (5 bits)"),
        (31, 31, "Rc", "record-form flag (updates CR1)"),
    ],
    "M": [
        (0, 5, "OPCD", "primary opcode"),
        (6, 10, "RS", "source GPR"),
        (11, 15, "RA", "destination GPR"),
        (16, 20, "SH/RB", "shift amount or source B"),
        (21, 25, "MB", "mask begin"),
        (26, 30, "ME", "mask end"),
        (31, 31, "Rc", "record-form flag"),
    ],
    "MD": [
        (0, 5, "OPCD", "primary opcode (30)"),
        (6, 10, "RS", "source GPR"),
        (11, 15, "RA", "destination GPR"),
        (16, 20, "sh", "shift amount low 5 bits"),
        (21, 26, "mb/me", "6-bit mask field (swapped halves)"),
        (27, 29, "XO", "extended opcode"),
        (30, 30, "sh5", "shift amount high bit"),
        (31, 31, "Rc", "record-form flag"),
    ],
    "MDS": [
        (0, 5, "OPCD", "primary opcode (30)"),
        (6, 10, "RS", "source GPR"),
        (11, 15, "RA", "destination GPR"),
        (16, 20, "RB", "source B GPR"),
        (21, 26, "mb/me", "6-bit mask field (swapped halves)"),
        (27, 30, "XO", "extended opcode"),
        (31, 31, "Rc", "record-form flag"),
    ],
    "DCBZ": [
        (0, 5, "OPCD", "primary opcode (31)"),
        (6, 10, "—", "reserved (bit 10 = 1 selects dcbz128)"),
        (11, 15, "RA", "base register (0 ⇒ literal 0)"),
        (16, 20, "RB", "offset register"),
        (21, 30, "XO", "extended opcode (1014 for both dcbz and dcbz128)"),
        (31, 31, "—", "reserved"),
    ],
    "VX": [
        (0, 5, "OPCD", "primary opcode (4)"),
        (6, 10, "VRT/VD", "destination vector register"),
        (11, 15, "VRA/VA", "source A vector register"),
        (16, 20, "VRB/VB", "source B vector register"),
        (21, 31, "XO", "extended opcode (11 bits)"),
    ],
    "VA": [
        (0, 5, "OPCD", "primary opcode (4)"),
        (6, 10, "VRT", "destination vector register"),
        (11, 15, "VRA", "source A"),
        (16, 20, "VRB", "source B"),
        (21, 25, "VRC", "source C / shift"),
        (26, 31, "XO", "extended opcode (6 bits)"),
    ],
    "VC": [
        (0, 5, "OPCD", "primary opcode (4)"),
        (6, 10, "VRT", "destination vector register"),
        (11, 15, "VRA", "source A"),
        (16, 20, "VRB", "source B"),
        (21, 21, "Rc", "record-form flag (updates CR6)"),
        (22, 31, "XO", "extended opcode (10 bits)"),
    ],
    "VX128": [
        (0, 5, "OPCD", "primary opcode (4 or 5)"),
        (6, 10, "VD128l", "destination low 5 bits"),
        (11, 15, "VA128l", "source A low 5 bits"),
        (16, 20, "VB128l", "source B low 5 bits"),
        (21, 21, "VA128H", "source A high bit"),
        (22, 22, "—", "reserved"),
        (23, 25, "VC", "optional VC / XO sub-field"),
        (26, 26, "VA128h", "source A middle bit"),
        (27, 27, "—", "reserved"),
        (28, 29, "VD128h", "destination high 2 bits"),
        (30, 31, "VB128h", "source B high 2 bits"),
    ],
    "VX128_1": [
        (0, 5, "OPCD", "primary opcode (4)"),
        (6, 10, "VD128l", "destination low 5 bits"),
        (11, 15, "RA", "address register"),
        (16, 20, "RB", "offset register"),
        (21, 27, "XO", "extended opcode"),
        (28, 29, "VD128h", "destination high 2 bits"),
        (30, 31, "—", "reserved"),
    ],
    "VX128_2": [
        (0, 5, "OPCD", "primary opcode (5)"),
        (6, 10, "VD128l", "destination low 5 bits"),
        (11, 15, "VA128l", "source A low 5 bits"),
        (16, 20, "VB128l", "source B low 5 bits"),
        (21, 21, "VA128H", "source A high bit"),
        (23, 25, "VC", "source C 3-bit field"),
        (26, 26, "VA128h", "source A middle bit"),
        (28, 29, "VD128h", "destination high 2 bits"),
        (30, 31, "VB128h", "source B high 2 bits"),
    ],
    "VX128_3": [
        (0, 5, "OPCD", "primary opcode (6)"),
        (6, 10, "VD128l", "destination low 5 bits"),
        (11, 15, "IMM", "5-bit immediate"),
        (16, 20, "VB128l", "source B low 5 bits"),
        (21, 27, "XO", "extended opcode"),
        (28, 29, "VD128h", "destination high 2 bits"),
        (30, 31, "VB128h", "source B high 2 bits"),
    ],
    "VX128_4": [
        (0, 5, "OPCD", "primary opcode (6)"),
        (6, 10, "VD128l", "destination low 5 bits"),
        (11, 15, "IMM", "5-bit immediate"),
        (16, 20, "VB128l", "source B low 5 bits"),
        (21, 23, "XO", "extended opcode"),
        (24, 25, "z", "sub-operation selector"),
        (28, 29, "VD128h", "destination high 2 bits"),
        (30, 31, "VB128h", "source B high 2 bits"),
    ],
    "VX128_5": [
        (0, 5, "OPCD", "primary opcode (4)"),
        (6, 10, "VD128l", "destination low 5 bits"),
        (11, 15, "VA128l", "source A low 5 bits"),
        (16, 20, "VB128l", "source B low 5 bits"),
        (21, 21, "VA128H", "source A high bit"),
        (22, 25, "SH", "4-bit shift amount"),
        (26, 26, "VA128h", "source A middle bit"),
        (28, 29, "VD128h", "destination high 2 bits"),
        (30, 31, "VB128h", "source B high 2 bits"),
    ],
    "VX128_P": [
        (0, 5, "OPCD", "primary opcode (6)"),
        (6, 10, "VD128l", "destination low 5 bits"),
        (11, 15, "PERMl", "permute selector low 5 bits"),
        (16, 20, "VB128l", "source B low 5 bits"),
        (21, 22, "—", "reserved"),
        (23, 25, "PERMh", "permute selector high 3 bits"),
        (28, 29, "VD128h", "destination high 2 bits"),
        (30, 31, "VB128h", "source B high 2 bits"),
    ],
    "VX128_R": [
        (0, 5, "OPCD", "primary opcode (6)"),
        (6, 10, "VD128l", "destination low 5 bits"),
        (11, 15, "VA128l", "source A low 5 bits"),
        (16, 20, "VB128l", "source B low 5 bits"),
        (21, 21, "VA128H", "source A high bit"),
        (22, 25, "XO", "extended opcode (compare)"),
        (26, 26, "VA128h", "source A middle bit"),
        (27, 27, "Rc", "record-form flag (updates CR6)"),
        (28, 29, "VD128h", "destination high 2 bits"),
        (30, 31, "VB128h", "source B high 2 bits"),
    ],
}


def render_bit_table(form: str) -> str:
    """Return a markdown table of the form's bit layout."""
    layout = FORM_LAYOUTS.get(form)
    if not layout:
        return f"_Unknown form_ `{form}` _— see `forms/` for details._"
    rows = ["| Bits | Field | Meaning |", "| --- | --- | --- |"]
    for start, end, name, notes in layout:
        span = f"{start}" if start == end else f"{start}–{end}"
        rows.append(f"| {span} | `{name}` | {notes} |")
    return "\n".join(rows)

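The big-endian bit numbering used in these tables translates to shifts as shown below. `field` and `uncovered_bits` are hypothetical helpers, not part of `bit_layout.py`. The `addi r3, r1, 16` word `0x38610010` was worked out by hand from the D-form layout (OPCD 14, RT 3, RA 1, SI 16). Running `uncovered_bits` over `FORM_LAYOUTS` flags bits that a table leaves unlisted; for example, it reports bits 22 and 27 in `VX128_2`.

```python
Layout = list[tuple[int, int, str, str]]


def field(word: int, start: int, end: int) -> int:
    """Extract PPC bits start..end (inclusive, bit 0 = MSB) from a 32-bit word."""
    width = end - start + 1
    # PPC bit `end` is LSB-numbered bit (31 - end); shift it down to bit 0.
    return (word >> (31 - end)) & ((1 << width) - 1)


def uncovered_bits(layout: Layout) -> list[int]:
    """Return the bit positions 0..31 that no entry in `layout` covers."""
    covered: set[int] = set()
    for start, end, _name, _notes in layout:
        covered.update(range(start, end + 1))
    return sorted(set(range(32)) - covered)


D_FORM: Layout = [
    (0, 5, "OPCD", "primary opcode"),
    (6, 10, "RT", "destination GPR"),
    (11, 15, "RA", "source GPR"),
    (16, 31, "D/SI/UI", "16-bit immediate"),
]

word = 0x38610010  # addi r3, r1, 16
for start, end, name, _ in D_FORM:
    print(f"{name:8s} bits {start}-{end}: {field(word, start, end)}")
print(uncovered_bits(D_FORM))  # []
```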
migration/project-root/ppc-manual/generator/cxx_scraper.py
@@ -0,0 +1,75 @@
"""
Scrapes xenia-canary's emit files for the location of each instruction's
semantic implementation function `InstrEmit_<mnem>`.

The files are:
    src/xenia/cpu/ppc/ppc_emit_alu.cc      (integer ALU)
    src/xenia/cpu/ppc/ppc_emit_memory.cc   (loads/stores/cache/sync)
    src/xenia/cpu/ppc/ppc_emit_altivec.cc  (VMX + VMX128)
    src/xenia/cpu/ppc/ppc_emit_fpu.cc      (floating-point)
    src/xenia/cpu/ppc/ppc_emit_control.cc  (branch/CR/SPR/syscall/trap)

Returns, for each mnemonic, the relative file path and the starting line
of the `int InstrEmit_<mnem>(...)` definition.
"""

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
import re


CXX_EMIT_FILES = [
    "src/xenia/cpu/ppc/ppc_emit_alu.cc",
    "src/xenia/cpu/ppc/ppc_emit_memory.cc",
    "src/xenia/cpu/ppc/ppc_emit_altivec.cc",
    "src/xenia/cpu/ppc/ppc_emit_fpu.cc",
    "src/xenia/cpu/ppc/ppc_emit_control.cc",
]


@dataclass
class CxxRef:
    mnem: str
    emit_file: str | None = None  # relative to xenia-canary/
    emit_line: int | None = None


def _cxx_ident(mnem: str) -> str:
    """Canary maps '.' in the mnemonic to a trailing 'x' in the C++ symbol
    (e.g. addic. → InstrEmit_addicx)."""
    return mnem.replace(".", "x")


class CxxScraper:
    def __init__(self, repo_root: Path):
        self.canary_root = repo_root / "xenia-canary"
        # emitter suffix (e.g. "addicx") -> (path relative to canary_root, 1-indexed line)
        self._index: dict[str, tuple[str, int]] = {}
        fn_pat = re.compile(r"^\s*int\s+InstrEmit_([A-Za-z_][A-Za-z0-9_]*)\s*\(")
        for rel in CXX_EMIT_FILES:
            path = self.canary_root / rel
            if not path.is_file():
                continue  # missing checkout: lookups return no location
            for i, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
                m = fn_pat.match(line)
                if not m:
                    continue
                name = m.group(1)
                # keep the first definition if a name appears more than once
                self._index.setdefault(name, (rel, i))

    def lookup(self, mnem: str) -> CxxRef:
        ident = _cxx_ident(mnem)
        hit = self._index.get(ident)
        if hit is None:
            return CxxRef(mnem=mnem)
        return CxxRef(mnem=mnem, emit_file=hit[0], emit_line=hit[1])


if __name__ == "__main__":
    # generator/ -> ppc-manual/ -> project root (which holds xenia-canary/)
    root = Path(__file__).resolve().parent.parent.parent
    s = CxxScraper(root)
    for m in ("addx", "addic.", "lwz", "bclrx", "mfspr", "stvx", "vaddfp",
              "vaddfp128", "faddx", "lvsl"):
        r = s.lookup(m)
        print(f"{m:12s} {r.emit_file}:{r.emit_line}")

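The indexing step in `CxxScraper.__init__` amounts to one anchored regex per line. The sketch below runs the same pattern over an in-memory sample. The C++ lines are illustrative stand-ins for canary's emit files, and `index_emitters` is a hypothetical name. The sketch also shows that the `.` → `x` rewrite from `_cxx_ident` resolves `addic.`, and that a definition split across two lines is not matched.

```python
import re

# Same pattern that CxxScraper.__init__ uses.
FN_PAT = re.compile(r"^\s*int\s+InstrEmit_([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def index_emitters(lines: list[str]) -> dict[str, int]:
    """Map emitter suffix -> 1-indexed line of its first definition."""
    index: dict[str, int] = {}
    for lineno, line in enumerate(lines, start=1):
        m = FN_PAT.match(line)
        if m:
            index.setdefault(m.group(1), lineno)
    return index


sample = [
    "// illustrative excerpt, not copied from xenia-canary",
    "int InstrEmit_addx(PPCHIRBuilder& f, const InstrData& i) {",
    "  return 0;",
    "}",
    "int InstrEmit_addicx(PPCHIRBuilder& f, const InstrData& i) {",
    "  return 0;",
    "}",
    "int",
    "InstrEmit_subfx(PPCHIRBuilder& f, const InstrData& i) {",
    "}",
]
idx = index_emitters(sample)
print(idx)                                  # {'addx': 2, 'addicx': 5}
print(idx.get("addic.".replace(".", "x")))  # 5
```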
migration/project-root/ppc-manual/generator/generate_manual.py
File diff suppressed because it is too large (1093 lines)
migration/project-root/ppc-manual/generator/rust_scraper.py
@@ -0,0 +1,184 @@
"""
Scrapes xenia-rs source files for per-instruction references and
snippets of the interpreter semantics.

Outputs produced for each mnemonic:
  - opcode_line:  line in crates/xenia-cpu/src/opcode.rs where the
                  PpcOpcode variant is declared (1-indexed)
  - decoder_line: line in crates/xenia-cpu/src/decoder.rs where the
                  variant is produced from raw bits
  - interp_start: line in crates/xenia-cpu/src/interpreter.rs where
                  the match arm `PpcOpcode::<mnem> =>` begins
  - interp_end:   line where the arm closes (matching brace, naive)
  - interp_body:  raw text of the arm body (for reviewer reference)

The xenia-rs opcode identifier often keeps the trailing `x` of the XML
mnemonic (PpcOpcode::addx). Lookups use the XML mnemonic with '.'
replaced by 'x' (see `_rust_ident`); no other spelling is tried.
"""

from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
import re


@dataclass
class RustRef:
    mnem: str
    opcode_line: int | None = None
    decoder_line: int | None = None
    interp_start: int | None = None
    interp_end: int | None = None
    interp_body: str = ""


# '.' is not legal in a Rust identifier, so dotted XML mnemonics such as
# `addic.` cannot be used verbatim as PpcOpcode variant names.
# `_rust_ident` assumes xenia-canary's convention ('.' → 'x', so
# `addic.` → `addicx`). If a dotted mnemonic finds no match, check
# opcode.rs for the spelling xenia-rs actually uses.


def _rust_ident(mnem: str) -> str:
    """Convert an XML mnemonic to the xenia-rs PpcOpcode variant name."""
    # xenia-rs is assumed to reuse xenia-canary's opcode enum names, which
    # mirror ppc-instructions.xml except that '.' becomes 'x'
    # (e.g. 'addic.' → 'addicx'). That is the only rewrite applied.
    return mnem.replace(".", "x")


class RustScraper:
    def __init__(self, repo_root: Path):
        self.repo_root = repo_root
        self.cpu_root = repo_root / "xenia-rs" / "crates" / "xenia-cpu" / "src"
        self._opcode_lines = self._read_lines(self.cpu_root / "opcode.rs")
        self._decoder_lines = self._read_lines(self.cpu_root / "decoder.rs")
        self._interp_lines = self._read_lines(self.cpu_root / "interpreter.rs")
        self._opcode_index: dict[str, int] = self._index_opcode_enum()
        self._decoder_index: dict[str, int] = self._index_decoder()
        self._interp_index: dict[str, tuple[int, int]] = self._index_interpreter()

    @staticmethod
    def _read_lines(path: Path) -> list[str]:
        if not path.is_file():
            return []
        return path.read_text(encoding="utf-8").splitlines()

    def _index_opcode_enum(self) -> dict[str, int]:
        """Map rust-identifier → 1-indexed line in opcode.rs. The enum uses
        comma-separated identifiers (often many per line) so we extract
        every identifier match inside the enum body."""
        idx: dict[str, int] = {}
        token = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\b")
        in_enum = False
        for i, line in enumerate(self._opcode_lines, start=1):
            if "pub enum PpcOpcode" in line:
                in_enum = True
                continue
            if not in_enum:
                continue
            if line.startswith("}"):
                break
            stripped = line.strip()
            # skip blank / comment-only lines
            if not stripped or stripped.startswith("//"):
                continue
            # split off any trailing line comment
            code = stripped.split("//", 1)[0]
            for m in token.finditer(code):
                idx.setdefault(m.group(1), i)
        return idx

    def _index_decoder(self) -> dict[str, int]:
        """Map rust-identifier → 1-indexed line of its `PpcOpcode::<name>` producer."""
        idx: dict[str, int] = {}
        pat = re.compile(r"PpcOpcode::([A-Za-z_][A-Za-z0-9_]*)")
        for i, line in enumerate(self._decoder_lines, start=1):
            for m in pat.finditer(line):
                name = m.group(1)
                # keep the FIRST occurrence (the match-arm line where it's
                # produced, not any later references)
                idx.setdefault(name, i)
        return idx

    def _index_interpreter(self) -> dict[str, tuple[int, int]]:
        """Map rust-identifier → (start, end) lines of the match arm.

        An arm starts at a `PpcOpcode::<name> =>` header line and ends at
        the line where the brace depth returns to zero. Multi-variant arms
        of the form `PpcOpcode::a | PpcOpcode::b => {` record the same
        (start, end) for every named variant. Only headers ending in `=>`
        or `=> {` are recognised; one-line arms such as
        `PpcOpcode::a => foo(),` are not indexed.
        """
        arm_header = re.compile(
            r"^(\s*)"                                            # indent
            r"((?:PpcOpcode::[A-Za-z_][A-Za-z0-9_]*"             # first variant
            r"(?:\s*\|\s*PpcOpcode::[A-Za-z_][A-Za-z0-9_]*)*))"  # more variants
            r"\s*=>\s*\{?\s*$"
        )
        var_re = re.compile(r"PpcOpcode::([A-Za-z_][A-Za-z0-9_]*)")
        idx: dict[str, tuple[int, int]] = {}
        i = 0
        n = len(self._interp_lines)
        while i < n:
            line = self._interp_lines[i]
            m = arm_header.match(line)
            if not m:
                i += 1
                continue
            names = var_re.findall(m.group(2))
            start = i + 1  # 1-indexed header line
            end = start
            if line.rstrip().endswith("{"):
                # Block arm: count braces until the depth returns to zero.
                # A naive brace counter suffices for this codebase because
                # the interpreter arms use balanced braces and no string
                # literals containing stray braces.
                depth = 1
                j = i + 1
                while j < n:
                    body_line = self._interp_lines[j]
                    depth += body_line.count("{") - body_line.count("}")
                    if depth == 0:
                        end = j + 1  # 1-indexed
                        break
                    j += 1
            else:
                # Header ends in `=>` with the expression on the next line:
                # record the header line only and skip that expression line.
                j = i + 1
            for name in names:
                idx.setdefault(name, (start, end))
            i = j + 1
        return idx

    def lookup(self, mnem: str) -> RustRef:
        ident = _rust_ident(mnem)
        ref = RustRef(mnem=mnem)
        ref.opcode_line = self._opcode_index.get(ident)
        ref.decoder_line = self._decoder_index.get(ident)
        rng = self._interp_index.get(ident)
        if rng:
            ref.interp_start, ref.interp_end = rng
            body = "\n".join(self._interp_lines[ref.interp_start - 1: ref.interp_end])
            ref.interp_body = body
        return ref


if __name__ == "__main__":
    root = Path(__file__).resolve().parent.parent.parent
    s = RustScraper(root)
    for m in ("addx", "addic.", "lwz", "bclrx", "mfspr", "stvx", "vaddfp",
              "vaddfp128", "faddx", "lvsl"):
        r = s.lookup(m)
        print(f"{m:12s} opcode@{r.opcode_line} decoder@{r.decoder_line} "
              f"interp@{r.interp_start}-{r.interp_end}")

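The naive brace counter behind `_index_interpreter`, and the limitation noted in the README, can be reproduced in a few lines. `arm_end` is a hypothetical helper run over illustrative Rust-like text, not xenia-rs source. The second sample shows the failure mode: a single string literal containing `{` keeps the depth from ever returning to zero, so the arm never closes and swallows the arms after it.

```python
from __future__ import annotations


def arm_end(lines: list[str], header: int) -> int | None:
    """Return the 0-based index of the line that closes the block arm opened
    on `lines[header]`, or None if the braces never balance."""
    depth = 0
    for k in range(header, len(lines)):
        # Naive: every '{' and '}' counts, even inside string literals.
        depth += lines[k].count("{") - lines[k].count("}")
        if depth == 0:
            return k
    return None


clean = [
    "PpcOpcode::addx => {",
    "    let v = a.wrapping_add(b);",
    "    if rc { set_cr0(v); }",
    "}",
    "PpcOpcode::subfx => {",
]
tricky = [
    "PpcOpcode::tw => {",
    '    let marker = "{";',
    "}",
    "PpcOpcode::twi => {",
    "}",
]
print(arm_end(clean, 0))   # 3
print(arm_end(tricky, 0))  # None
```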
migration/project-root/ppc-manual/generator/xml_model.py
@@ -0,0 +1,231 @@
"""
Parses xenia-canary's tools/ppc-instructions.xml into typed records.

The XML is the authoritative catalogue of Xbox 360 PPC instructions
(455 <insn> entries). Each entry carries:
  - mnem:   mnemonic (e.g. "addx", "lwzu", "vaddfp128")
  - opcode: 32-bit hex encoding (primary + extended opcode bits)
  - form:   instruction format (XO, D, DS, X, XL, XFX, ..., VX, VX128_*)
  - group:  functional group (i=integer, m=memory, b=branch,
            c=control, f=fpu, v=vmx)
  - desc:   short human-readable description
  - <in>/<out> fields with optional conditional="true" flag
  - <disasm>: template string used by the canary disassembler
"""

from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from pathlib import Path


GROUP_NAMES = {
    "i": "integer",
    "m": "memory",
    "b": "branch",
    "c": "control",
    "f": "fpu",
    "v": "vmx",
}

# Maps the short group code to the manual's on-disk category directory.
# VMX entries are split by form in generate_manual.py (VX128_* → vmx128/).
GROUP_TO_DIR = {
    "i": "alu",
    "m": "memory",
    "b": "branch",
    "c": "control",
    "f": "fpu",
    "v": "vmx",
}


@dataclass
class Field:
    name: str
    conditional: bool = False


@dataclass
class Instruction:
    mnem: str
    opcode_hex: str  # lowercase, no "0x" prefix
    form: str
    group: str  # one-letter code
    desc: str
    sync: bool
    reads: list[Field] = field(default_factory=list)
    writes: list[Field] = field(default_factory=list)
    disasm: str = ""

    @property
    def opcode_int(self) -> int:
        return int(self.opcode_hex, 16)

    @property
    def primary_opcode(self) -> int:
        # PPC: bits 0-5 of a big-endian 32-bit word are the top 6 bits.
        return (self.opcode_int >> 26) & 0x3F

|
||||
@property
|
||||
def extended_opcode(self) -> int | None:
|
||||
"""Best-effort extended opcode extraction by form.
|
||||
Returns None for forms where "extended opcode" is not meaningful
|
||||
(I, B, D, DS, SC, M, MD, MDS, DCBZ) — those pages will omit it."""
|
||||
code = self.opcode_int
|
||||
form = self.form
|
||||
if form in ("X", "XL", "XFX", "XFL", "XS", "DCBZ"):
|
||||
return (code >> 1) & 0x3FF # bits 21-30
|
||||
if form == "XO":
|
||||
return (code >> 1) & 0x1FF # bits 22-30 (bit 21 = OE)
|
||||
if form == "A":
|
||||
return (code >> 1) & 0x1F # bits 26-30
|
||||
if form in ("VX", "VX128_2", "VX128_5"):
|
||||
return code & 0x7FF # bits 21-31
|
||||
if form == "VA":
|
||||
return code & 0x3F # bits 26-31
|
||||
if form == "VC":
|
||||
return code & 0x3FF # bits 22-31 (bit 21 = Rc)
|
||||
if form in ("VX128", "VX128_R"):
|
||||
# complex split; best-effort — not used for lookup, just display
|
||||
return code & 0x7FF
|
||||
if form in ("VX128_1", "VX128_3", "VX128_4", "VX128_P"):
|
||||
return code & 0x7FF
|
||||
return None

    @property
    def group_name(self) -> str:
        return GROUP_NAMES.get(self.group, "unknown")

    @property
    def has_rc(self) -> bool:
        """Does this instruction have a runtime Rc bit (record form)?"""
        return any(w.name == "CR" and w.conditional for w in self.writes)

    @property
    def has_oe(self) -> bool:
        """Does this instruction have a runtime OE bit (overflow enable)?"""
        return any(w.name == "OE" and w.conditional for w in self.writes)

    @property
    def has_lk(self) -> bool:
        """Does this instruction have a runtime LK bit (branch link)?"""
        return any(r.name == "LK" for r in self.reads)

    @property
    def rc_is_mandatory(self) -> bool:
        """Instructions like `addic.` where CR is written unconditionally."""
        return any(w.name == "CR" and not w.conditional for w in self.writes)
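The following standalone check makes the shift-and-mask arithmetic in `primary_opcode` and `extended_opcode` concrete. It uses three standard encodings with every operand field zeroed: `add` (XO-form, primary 31, XO 266), `lwzx` (X-form, primary 31, XO 23) and `fadd` (A-form, primary 63, XO 21). The words are built by hand from those field values, not read from ppc-instructions.xml. `ppc_bit` and the `xo_*` helpers are hypothetical names. The last part sets the runtime OE (bit 21) and Rc (bit 31) flags to form an `addo.` word. These are the flags that `has_oe` and `has_rc` report.

```python
# Standard encodings with operand fields zeroed: (primary << 26) | (XO << 1).
ADD = (31 << 26) | (266 << 1)   # XO-form
LWZX = (31 << 26) | (23 << 1)   # X-form
FADD = (63 << 26) | (21 << 1)   # A-form


def ppc_bit(n: int) -> int:
    """Mask for PPC bit n of a 32-bit word; PPC numbers bit 0 as the MSB."""
    return 1 << (31 - n)


def primary_opcode(word: int) -> int:
    return (word >> 26) & 0x3F  # bits 0-5


def xo_x(word: int) -> int:
    return (word >> 1) & 0x3FF  # X-form: bits 21-30


def xo_xo(word: int) -> int:
    return (word >> 1) & 0x1FF  # XO-form: bits 22-30 (bit 21 is OE)


def xo_a(word: int) -> int:
    return (word >> 1) & 0x1F  # A-form: bits 26-30


# addo. = add with the runtime OE (bit 21) and Rc (bit 31) bits set.
ADDO_DOT = ADD | ppc_bit(21) | ppc_bit(31)

print(hex(ADD), primary_opcode(ADD), xo_xo(ADD))
print(hex(LWZX), primary_opcode(LWZX), xo_x(LWZX))
print(hex(FADD), primary_opcode(FADD), xo_a(FADD))
print(hex(ADDO_DOT), xo_xo(ADDO_DOT), xo_x(ADDO_DOT))
```

For `addo.`, the 9-bit XO-form mask still yields 266. The 10-bit X-form mask yields 778 (266 + 512) because the OE bit leaks into the field. This is why XO-form gets its own, narrower mask.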


def load_instructions(xml_path: Path | str) -> list[Instruction]:
    """Parse every <insn> element of ppc-instructions.xml into an Instruction."""
    tree = ET.parse(str(xml_path))
    root = tree.getroot()
    insns: list[Instruction] = []
    for node in root.iter("insn"):
        reads = [Field(x.get("field", ""), x.get("conditional") == "true")
                 for x in node.findall("in")]
        writes = [Field(x.get("field", ""), x.get("conditional") == "true")
                  for x in node.findall("out")]
        disasm_node = node.find("disasm")
        disasm = (disasm_node.text or "").strip() if disasm_node is not None else ""
        insns.append(Instruction(
            mnem=node.get("mnem", ""),
            opcode_hex=node.get("opcode", "").lower(),
            form=node.get("form", ""),
            group=node.get("group", ""),
            desc=node.get("desc", ""),
            sync=node.get("sync") == "true",
            reads=reads,
            writes=writes,
            disasm=disasm,
        ))
    return insns
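The loader relies on only a few attributes and child elements: `mnem`, `opcode`, `form`, `group`, `desc` and `sync`, plus `<in>`/`<out>` children with an optional `conditional`, plus `<disasm>`. The snippet below is a hypothetical `<insn>` entry built around exactly those names. Its values and the disasm text are illustrative, not copied from ppc-instructions.xml. It is parsed with the same ElementTree calls the loader uses, to show how conditional CR and OE writes become the `has_rc` and `has_oe` flags. `parse_insns` is a hypothetical helper that returns plain dicts, so the example runs without the `Instruction` class.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry; real ppc-instructions.xml content may differ in detail.
SAMPLE = """
<insns>
  <insn mnem="addx" opcode="7C000214" form="XO" group="i" desc="Add" sync="false">
    <in field="RA"/>
    <in field="RB"/>
    <out field="RD"/>
    <out field="OE" conditional="true"/>
    <out field="CR" conditional="true"/>
    <disasm>add [RD], [RA], [RB]</disasm>
  </insn>
</insns>
"""


def parse_insns(xml_text: str) -> list[dict]:
    root = ET.fromstring(xml_text)
    entries = []
    for node in root.iter("insn"):
        # Same lookups as load_instructions: (field, conditional) per child.
        reads = [(x.get("field", ""), x.get("conditional") == "true")
                 for x in node.findall("in")]
        writes = [(x.get("field", ""), x.get("conditional") == "true")
                  for x in node.findall("out")]
        disasm_node = node.find("disasm")
        entries.append({
            "mnem": node.get("mnem", ""),
            "opcode_hex": node.get("opcode", "").lower(),
            "sync": node.get("sync") == "true",
            "reads": reads,
            "writes": writes,
            "disasm": (disasm_node.text or "").strip() if disasm_node is not None else "",
            # A conditional CR/OE write means a runtime Rc/OE bit exists.
            "has_rc": any(name == "CR" and cond for name, cond in writes),
            "has_oe": any(name == "OE" and cond for name, cond in writes),
        })
    return entries


for entry in parse_insns(SAMPLE):
    print(entry)
```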


def expand_runtime_variants(insn: Instruction) -> list[dict]:
    """
    Return one dict per concrete assembly mnemonic this XML entry represents
    under different runtime flag settings. Each dict has the keys "mnem",
    "flags" (e.g. {"OE": 1, "Rc": 1}) and "is_primary" (True only for the
    first variant, which carries no flags). Flags: Rc (record) → append '.',
    OE (overflow) → insert 'o' before any '.', LK (link) → append 'l'.

    The display mnemonic is derived from the XML mnem by stripping a trailing
    'x' if present (xenia uses a trailing x to mark X/XO-form entries; the
    assembly mnemonic omits it). Mnemonics that do not end in 'x' (e.g.
    'addic.', 'vaddfp128') are kept as-is.
    """
    raw = insn.mnem
    # Xenia convention: a trailing 'x' on XO/X/A/M/MD/MDS/XFL/XS/VX/VA-form
    # entries marks the "extended form" and is dropped in assembly display.
    # Exception: memory indexed forms (lbzx, lwzx, ...) are separate XML
    # entries whose 'x' is part of the real mnemonic and must be kept.
    # The group code decides: group=m → keep the trailing x; every other
    # group (i, f, c, b, v) → strip it.
    def strip_x(m: str) -> str:
        if not m.endswith("x"):
            return m
        # Memory mnemonics: 'x' is part of the assembly name (indexed form).
        if insn.group == "m":
            return m
        # All other groups, including branches (bx/bcx/bcctrx/bclrx):
        # the 'x' is xenia's marker, so strip it.
        return m[:-1]

    base = strip_x(raw)
    variants: list[dict] = []

    if insn.rc_is_mandatory:
        # e.g. addic.: the dot is already baked into the XML mnemonic.
        variants.append({"mnem": raw, "flags": {}, "is_primary": True})
        return variants

    has_rc = insn.has_rc
    has_oe = insn.has_oe
    has_lk = insn.has_lk

    if not (has_rc or has_oe or has_lk):
        variants.append({"mnem": base, "flags": {}, "is_primary": True})
        return variants

    # Enumerate all combinations of the runtime flags that apply.
    def insert_o(name: str) -> str:
        # 'addo' / 'addo.': insert 'o' before any trailing '.'
        if name.endswith("."):
            return name[:-1] + "o."
        return name + "o"

    combos: list[tuple[str, dict]] = [(base, {})]
    if has_oe:
        combos += [(insert_o(n), {**f, "OE": 1}) for (n, f) in combos]
    if has_rc:
        combos += [(n + ".", {**f, "Rc": 1}) for (n, f) in combos]
    if has_lk:
        # Branch link: PPC appends 'l' to the base mnemonic (bl, bcl, bclrl,
        # bcctrl). Branches have no Rc/OE bit, so only combos without OE/Rc
        # get an l-variant.
        combos += [(n + "l", {**f, "LK": 1}) for (n, f) in combos
                   if "Rc" not in f and "OE" not in f]

    for i, (name, flags) in enumerate(combos):
        variants.append({"mnem": name, "flags": flags, "is_primary": i == 0})
    return variants
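The flag-combination step can be run in isolation. The sketch below re-implements only the `insert_o` and combination logic of `expand_runtime_variants`. `expand(base, has_oe, has_rc, has_lk)` is a hypothetical standalone function that runs without the XML. It assumes the base mnemonic is already x-stripped and that the three flags have already been derived from the entry's in/out fields.

```python
def insert_o(name: str) -> str:
    # Insert 'o' before a trailing '.', e.g. "add." -> "addo."
    if name.endswith("."):
        return name[:-1] + "o."
    return name + "o"


def expand(base: str, has_oe: bool, has_rc: bool, has_lk: bool) -> list[tuple[str, dict]]:
    """Enumerate (mnemonic, flags) pairs in the same order as expand_runtime_variants."""
    combos: list[tuple[str, dict]] = [(base, {})]
    if has_oe:
        # The right-hand list is built from the current combos before += extends them.
        combos += [(insert_o(n), {**f, "OE": 1}) for (n, f) in combos]
    if has_rc:
        combos += [(n + ".", {**f, "Rc": 1}) for (n, f) in combos]
    if has_lk:
        # Only flag-free combos (no OE/Rc) get the link variant.
        combos += [(n + "l", {**f, "LK": 1}) for (n, f) in combos
                   if "Rc" not in f and "OE" not in f]
    return combos


print([n for n, _ in expand("add", has_oe=True, has_rc=True, has_lk=False)])
# ['add', 'addo', 'add.', 'addo.']
print([n for n, _ in expand("bclr", has_oe=False, has_rc=False, has_lk=True)])
# ['bclr', 'bclrl']
```

Each flag doubles the list, so an XO-form entry with both OE and Rc yields four mnemonics. The LK filter keeps branches from picking up dotted or 'o' forms they cannot encode.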


if __name__ == "__main__":
    # Smoke test: print a summary of what we loaded.
    repo_root = Path(__file__).resolve().parent.parent.parent
    xml = repo_root / "xenia-canary" / "tools" / "ppc-instructions.xml"
    insns = load_instructions(xml)
    print(f"Loaded {len(insns)} instructions from {xml}")
    total_mnems = sum(len(expand_runtime_variants(i)) for i in insns)
    print(f"Total runtime-expanded mnemonics: {total_mnems}")
    # Show the variants for a few representative entries.
    for mnem in ("addx", "lwz", "bclrx", "mfspr", "stvx", "vaddfp", "vaddfp128", "addic."):
        for i in insns:
            if i.mnem == mnem:
                vs = expand_runtime_variants(i)
                print(f" {mnem:12s} form={i.form:7s} group={i.group} "
                      f"variants={[v['mnem'] for v in vs]}")
                break
        else:
            print(f" {mnem:12s} NOT FOUND")