chore: add migration/ bundle for cross-machine setup

Bundles state that lives OUTSIDE the xenia-rs repo so a fresh clone on
another machine can be brought up to identical configuration via
migration/setup.sh:

  - claude-memory/             ~/.claude/projects/-home-fabi-RE-Project-Sylpheed/memory/
                               (103 files, 1.1 MB - MEMORY.md + every
                                project_xenia_rs_*.md from audits
                                addis_signext through audit-058)
  - project-root/dot-claude/   <project-root>/.claude/settings.json
                               (Stop hook + permissions)
  - project-root/ppc-manual/   <project-root>/ppc-manual/
                               (PowerPC reference docs, 397 files, 3.7 MB)
  - project-root/run-canary.sh <project-root>/run-canary.sh
  - README.md                  Human-readable setup checklist
  - setup.sh                   Idempotent installer (also reclones
                               xenia-canary at pinned HEAD 6de80dffe)
  - MANIFEST.md                Per-file mapping, plus a restoration
                               recipe for each file not bundled

Excluded from bundle (not shippable via git):
  - Sylpheed ISO (7.8 GB; copyright; manual copy required)
  - sylpheed.db (395 MB; regenerable from XEX via analysis tooling)
  - target/ build artifacts (rebuild on target)
  - audit-runs probe firehoses (.log/.stdout/.stderr ~11 GB; rerun if needed)
  - audit-runs memory dumps (.bin ~4.5 GB; rerun audit-026/027/029 if needed)
  - xenia-canary checkout (setup.sh reclones from
    git.mc02.dev/fabi/Xenia-Canary.git at HEAD 6de80dffe)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-10 21:38:38 +02:00
parent 8e709b0a24
commit e6d43a23ac
505 changed files with 86028 additions and 0 deletions


@@ -0,0 +1,114 @@
# Manual generator
Python scripts that build the `ppc-manual/` tree from the three
authoritative sources in this repository:
- `xenia-canary/tools/ppc-instructions.xml` — metadata for all 455
Xbox 360 PPC instructions (mnemonic, form, group, opcode, in/out
fields, disasm template).
- `xenia-rs/crates/xenia-cpu/src/` — the Rust interpreter. Individual
instruction semantics live in `interpreter.rs` match arms.
- `xenia-canary/src/xenia/cpu/ppc/ppc_emit_*.cc` — the C++ emit
functions; referenced by line number only.
## Files
| File | Purpose |
| --- | --- |
| `generate_manual.py` | Main entry point. Parses XML, builds families, renders pages, writes `index.json`. |
| `xml_model.py` | XML parser + `expand_runtime_variants()` (produces the set of Rc/OE/LK-expanded mnemonics a single XML entry covers). |
| `bit_layout.py` | Per-form bit-field tables (rendered into the Encoding section of every page and into `forms/*.md`). |
| `rust_scraper.py` | Locates each `PpcOpcode::<mnem>` enum variant, decoder arm, and interpreter match-arm line range. |
| `cxx_scraper.py` | Locates `InstrEmit_<mnem>` in the xenia-canary emit `.cc` files. |
## Running
```bash
python3 ppc-manual/generator/generate_manual.py # full generate
python3 ppc-manual/generator/generate_manual.py --dry-run # parse + consistency checks only
python3 ppc-manual/generator/generate_manual.py --out /tmp/out # alternate output root
python3 ppc-manual/generator/generate_manual.py --xml /path/to/ppc-instructions.xml
```
No third-party dependencies; Python 3.10+ standard library only.
## Idempotency
The generator is re-runnable without data loss:
1. Each page has a pair of sentinel comments:
   - `<!-- GENERATED: BEGIN -->`
   - `<!-- GENERATED: END -->`
2. On re-run, only the text **between** the sentinels is rewritten.
   Everything after `END` (Special Cases, Related Instructions, IBM
   Reference) is preserved verbatim.
3. If the `END` sentinel is missing, the generator assumes a reviewer
   has fully taken over the file and skips it entirely.
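
The three rules above can be sketched as a single merge function. This is a minimal illustration, not the code in `generate_manual.py`: the name `merge_page`, its parameters, and the handling of text before `BEGIN` (kept as-is) are assumptions.

```python
from __future__ import annotations

BEGIN = "<!-- GENERATED: BEGIN -->"
END = "<!-- GENERATED: END -->"


def merge_page(existing: str | None, generated: str) -> str | None:
    """Return the new page text, or None when the file must be left alone.

    `existing` is the current file content (None if the file is new) and
    `generated` is the freshly rendered body. Both names are hypothetical.
    """
    block = f"{BEGIN}\n{generated.rstrip()}\n{END}"
    if existing is None:
        # First run: nothing on disk yet, so emit just the generated block.
        return block + "\n"
    end_at = existing.find(END)
    if end_at == -1:
        # Rule 3: no END sentinel means a reviewer owns the file; skip it.
        return None
    # Rule 2: everything after END is hand-written and kept verbatim.
    tail = existing[end_at + len(END):]
    begin_at = existing.find(BEGIN)
    # Assumption: text before BEGIN (if any) is also kept untouched.
    head = existing[:begin_at] if 0 <= begin_at < end_at else ""
    return head + block + tail
```

A caller would write the result back only when it is not `None`, which is what makes repeated runs safe for reviewer-edited pages.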
## Consistency checks (enforced by `--dry-run` as well)
- **XML entry count ≡ 455** — warns if the XML has been modified.
- **family membership total ≡ XML entry count** — every XML entry
must land in exactly one family.
- **index coverage ≡ runtime-expanded mnemonic count** — the JSON
index must contain a key for every runtime variant (`add`, `add.`,
`addo`, `addo.`, `bclr`, `bclrl`, …).
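
As a rough sketch of how these three checks fit together, the function below collects problems instead of raising. The function name, its parameters, and the message wording are hypothetical; only the three conditions come from the list above.

```python
from __future__ import annotations

EXPECTED_XML_ENTRIES = 455


def check_consistency(
    xml_mnems: list[str],
    families: dict[str, list[str]],
    runtime_mnems: set[str],
    index_keys: set[str],
) -> list[str]:
    """Return human-readable problems; an empty list means all checks pass."""
    problems: list[str] = []
    # Check 1: the XML should still hold the expected number of entries.
    if len(xml_mnems) != EXPECTED_XML_ENTRIES:
        problems.append(
            f"XML has {len(xml_mnems)} entries, expected {EXPECTED_XML_ENTRIES}"
        )
    # Check 2: every XML entry lands in exactly one family (multiset equality).
    members = [m for group in families.values() for m in group]
    if sorted(members) != sorted(xml_mnems):
        problems.append("family membership does not match the XML entries")
    # Check 3: the index has a key for every runtime-expanded mnemonic.
    missing = runtime_mnems - index_keys
    if missing:
        problems.append(f"index is missing {len(missing)} runtime variant(s)")
    return problems
```

Returning a list rather than failing on the first mismatch lets a `--dry-run` style invocation report every inconsistency in one pass.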
## Family grouping rules
Three rules applied in order (see `_family_head` in
`generate_manual.py`):
1. If a mnemonic ends in `128` and the non-128 sibling exists, it
joins the sibling's family. So `vaddfp128` is consolidated into
the `vaddfp` page.
2. For memory ops (group `m`), trailing `u`, `x`, or `ux` suffixes
are stripped when the base exists. So `lwz`, `lwzu`, `lwzx`,
`lwzux` all land on the `lwz` page.
3. Otherwise the mnemonic is its own family head.
All other flag variants (`Rc`, `OE`, `LK`) are **runtime** — they are
NOT separate XML entries; they are listed in the page's "Assembler
Mnemonics" table.
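
The grouping rules can be illustrated with a short sketch. It is written from the description above rather than copied from `_family_head`, so the signature and the suffix ordering (checking `ux` before `x` and `u`) are assumptions.

```python
from __future__ import annotations


def family_head(mnem: str, group: str, all_mnems: set[str]) -> str:
    """Return the mnemonic whose page documents `mnem`."""
    # Rule 1: a "128" variant joins its non-128 sibling when one exists.
    if mnem.endswith("128") and mnem[:-3] in all_mnems:
        return mnem[:-3]
    # Rule 2: memory ops drop a trailing "ux", "x", or "u" when the base
    # exists. "ux" is tried first so that lwzux maps to lwz, not lwzu.
    if group == "m":
        for suffix in ("ux", "x", "u"):
            base = mnem[: -len(suffix)]
            if mnem.endswith(suffix) and base in all_mnems:
                return base
    # Rule 3: everything else heads its own family.
    return mnem


mnems = {"lwz", "lwzu", "lwzx", "lwzux", "vaddfp", "vaddfp128", "addx"}
heads = {m: family_head(m, "m" if m.startswith("lwz") else "x", mnems) for m in mnems}
```

With this input, all four `lwz*` entries map to `lwz`, `vaddfp128` maps to `vaddfp`, and `addx` stays its own head.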
## Category mapping
| XML group | Category dir | Notes |
| --- | --- | --- |
| `i` (integer) | `alu/` | |
| `m` (memory) | `memory/` | |
| `b` (branch) | `branch/` | Includes `sc` and traps |
| `c` (control) | `control/` | CR logical, SPR, sync |
| `f` (fpu) | `fpu/` | |
| `v` (vector) | `vmx/` or `vmx128/` | Split by form: `VX128*` → `vmx128/` |
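
A minimal sketch of the lookup the table describes. `GROUP_TO_DIR` mirrors the dict in `xml_model.py`; the helper name `category_dir` and the use of `startswith("VX128")` for the form split are assumptions about how `generate_manual.py` does it.

```python
GROUP_TO_DIR = {
    "i": "alu",
    "m": "memory",
    "b": "branch",
    "c": "control",
    "f": "fpu",
    "v": "vmx",
}


def category_dir(group: str, form: str) -> str:
    """Map an XML group code and instruction form to a category directory."""
    # Vector entries are split by form: VX128* forms go to vmx128/.
    if group == "v" and form.startswith("VX128"):
        return "vmx128"
    return GROUP_TO_DIR[group]
```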
## Extending the generator
- **Pseudocode seeds.** The `PSEUDOCODE_SEEDS` dict in
`generate_manual.py` maps an XML mnemonic to a PPC-style pseudocode
block. Add entries here to pre-fill the Operation section for
additional mnemonics. Phase 2 reviewers can still override by
writing content outside the sentinels.
- **C translation seeds.** Similar dict of C snippets keyed by family
head.
- **Field descriptions.** `FIELD_DESCRIPTIONS` maps XML field names to
IBM-style prose. Missing entries are marked "_Phase 2: document
this field._"
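
For orientation, a seed entry and a field description might look like the sketch below. The dict names come from the text above; the example entries, the helper functions, and the `OPERATION_PLACEHOLDER` string are illustrative assumptions, not the real contents of `generate_manual.py`.

```python
# Hypothetical example entries; the real dicts live in generate_manual.py.
PSEUDOCODE_SEEDS: dict[str, str] = {
    "addx": "RT <- (RA) + (RB)",
}

FIELD_DESCRIPTIONS: dict[str, str] = {
    "RA": "General-purpose register used as source operand A.",
}

FIELD_PLACEHOLDER = "_Phase 2: document this field._"
OPERATION_PLACEHOLDER = "_Phase 2: write pseudocode._"  # assumed wording


def describe_field(name: str) -> str:
    """Return the seeded prose for a field, or the Phase 2 marker."""
    return FIELD_DESCRIPTIONS.get(name, FIELD_PLACEHOLDER)


def operation_section(mnem: str) -> str:
    """Return the seeded pseudocode for an XML mnemonic, if any."""
    return PSEUDOCODE_SEEDS.get(mnem, OPERATION_PLACEHOLDER)
```

Keeping seeds in plain dicts means adding coverage for another mnemonic is a one-line change that the next generator run picks up.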
## Known limitations
- Extended-opcode extraction in `xml_model.Instruction.extended_opcode`
is best-effort per form. For VMX128 variants the extracted value may
not match the exact pattern used by xenia's decoder tree — the page
still shows it as a reference but the decoder source (linked on
every page) is authoritative.
- `rust_scraper` uses a naive brace counter to delimit interpreter
match arms. It works for the current interpreter because the match
arms use balanced braces and no string literals with unbalanced
braces. If the interpreter ever adopts such literals the scraper
will need a Rust-aware parser.
- The generator treats a trailing `x` on a mnemonic as the xenia
convention for "extended/XO form" and strips it for assembly display,
except in the memory group, where `x` is the natural indexed-form
suffix. If a future xenia XML adds another group where `x` is
structural, the heuristic in `xml_model.expand_runtime_variants`
needs updating.
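
The brace-counter limitation is easiest to see on a toy input. The sketch below is a simplified stand-in for the scraper's loop (the function name `arm_end` and the sample Rust lines are made up for illustration); it shows how one unbalanced brace inside a string literal prevents the counter from ever returning to zero.

```python
from __future__ import annotations


def arm_end(lines: list[str], start: int) -> int | None:
    """Return the 0-based index of the line closing the arm opened at `start`.

    Counts raw '{' and '}' characters, so braces inside string literals
    are (incorrectly) counted too.
    """
    depth = 0
    for i in range(start, len(lines)):
        depth += lines[i].count("{") - lines[i].count("}")
        if depth == 0:
            return i
    return None


balanced = [
    "PpcOpcode::addx => {",
    "    let v = a + b;",
    "}",
]
stray = [
    "PpcOpcode::addx => {",
    '    log("unbalanced { in a literal");',
    "}",
    "PpcOpcode::subfx => {",
    "}",
]
```

On `balanced` the arm closes on line index 2; on `stray` the counter never reaches zero, so every later arm would be swallowed or lost.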


@@ -0,0 +1,266 @@
"""
Canonical bit layout per PPC instruction form.
Tables derived from xenia-canary/src/xenia/cpu/ppc/ppc_instr.h (struct
PPCOpcodeBits union). In PPC notation bit 0 is the MSB of the 32-bit
word (big-endian bit numbering).
Each entry is a list of (bit_start, bit_end_inclusive, field_name,
notes) tuples laid out from MSB (bit 0) to LSB (bit 31).
"""
# NOTE: bit ranges use PPC big-endian numbering (0 = MSB, 31 = LSB).
FORM_LAYOUTS: dict[str, list[tuple[int, int, str, str]]] = {
"I": [
(0, 5, "OPCD", "primary opcode"),
(6, 29, "LI", "signed 24-bit word-offset target"),
(30, 30, "AA", "absolute-address flag"),
(31, 31, "LK", "link flag (bl/ba/bla)"),
],
"B": [
(0, 5, "OPCD", "primary opcode"),
(6, 10, "BO", "branch options"),
(11, 15, "BI", "CR bit to test"),
(16, 29, "BD", "signed 14-bit word-offset target"),
(30, 30, "AA", "absolute-address flag"),
(31, 31, "LK", "link flag"),
],
"SC": [
(0, 5, "OPCD", "primary opcode (17)"),
(6, 19, "", "reserved"),
(20, 26, "LEV", "exception level"),
(27, 29, "", "reserved"),
(30, 30, "1", "fixed 1"),
(31, 31, "", "reserved"),
],
"D": [
(0, 5, "OPCD", "primary opcode"),
(6, 10, "RT", "destination GPR (or RS when storing)"),
(11, 15, "RA", "source GPR (0 ⇒ literal 0 for RA0 forms)"),
(16, 31, "D/SI/UI", "16-bit signed or unsigned immediate"),
],
"DS": [
(0, 5, "OPCD", "primary opcode"),
(6, 10, "RT", "destination GPR (or RS)"),
(11, 15, "RA", "source GPR (0 ⇒ literal 0)"),
(16, 29, "DS", "14-bit signed word-scaled displacement"),
(30, 31, "XO", "extended opcode"),
],
"X": [
(0, 5, "OPCD", "primary opcode"),
(6, 10, "RT/FRT/VRT", "destination"),
(11, 15, "RA/FRA/VRA", "source A"),
(16, 20, "RB/FRB/VRB", "source B"),
(21, 30, "XO", "extended opcode (10 bits)"),
(31, 31, "Rc", "record-form flag"),
],
"XL": [
(0, 5, "OPCD", "primary opcode (19)"),
(6, 10, "BT/BO", "target / branch options"),
(11, 15, "BA/BI", "source A / CR bit to test"),
(16, 20, "BB", "source B"),
(21, 30, "XO", "extended opcode (10 bits)"),
(31, 31, "LK", "link flag"),
],
"XFX": [
(0, 5, "OPCD", "primary opcode (31)"),
(6, 10, "RT", "destination / source GPR"),
(11, 20, "spr/tbr/FXM", "SPR/TBR number (byte-swapped halves) or CR field mask"),
(21, 30, "XO", "extended opcode"),
(31, 31, "", "reserved"),
],
"XFL": [
(0, 5, "OPCD", "primary opcode (63)"),
(6, 6, "L", "field-select behaviour"),
(7, 14, "FM", "FPSCR field mask"),
(15, 15, "W", "immediate-value flag"),
(16, 20, "FRB", "source FPR"),
(21, 30, "XO", "extended opcode"),
(31, 31, "Rc", "record-form flag (updates CR1)"),
],
"XS": [
(0, 5, "OPCD", "primary opcode (31)"),
(6, 10, "RS", "source GPR"),
(11, 15, "RA", "destination GPR"),
(16, 20, "sh", "shift amount low 5 bits"),
(21, 29, "XO", "extended opcode (9 bits)"),
(30, 30, "sh5", "shift amount high bit"),
(31, 31, "Rc", "record-form flag"),
],
"XO": [
(0, 5, "OPCD", "primary opcode (31)"),
(6, 10, "RT", "destination GPR"),
(11, 15, "RA", "source A"),
(16, 20, "RB", "source B"),
(21, 21, "OE", "overflow-enable flag"),
(22, 30, "XO", "extended opcode (9 bits)"),
(31, 31, "Rc", "record-form flag"),
],
"A": [
(0, 5, "OPCD", "primary opcode (59 or 63)"),
(6, 10, "FRT", "destination FPR"),
(11, 15, "FRA", "source A FPR"),
(16, 20, "FRB", "source B FPR"),
(21, 25, "FRC", "source C FPR (multiplier for madd-style ops)"),
(26, 30, "XO", "extended opcode (5 bits)"),
(31, 31, "Rc", "record-form flag (updates CR1)"),
],
"M": [
(0, 5, "OPCD", "primary opcode"),
(6, 10, "RS", "source GPR"),
(11, 15, "RA", "destination GPR"),
(16, 20, "SH/RB", "shift amount or source B"),
(21, 25, "MB", "mask begin"),
(26, 30, "ME", "mask end"),
(31, 31, "Rc", "record-form flag"),
],
"MD": [
(0, 5, "OPCD", "primary opcode (30)"),
(6, 10, "RS", "source GPR"),
(11, 15, "RA", "destination GPR"),
(16, 20, "sh", "shift amount low 5 bits"),
(21, 26, "mb/me", "6-bit mask field (swapped halves)"),
(27, 29, "XO", "extended opcode"),
(30, 30, "sh5", "shift amount high bit"),
(31, 31, "Rc", "record-form flag"),
],
"MDS": [
(0, 5, "OPCD", "primary opcode (30)"),
(6, 10, "RS", "source GPR"),
(11, 15, "RA", "destination GPR"),
(16, 20, "RB", "source B GPR"),
(21, 26, "mb/me", "6-bit mask field (swapped halves)"),
(27, 30, "XO", "extended opcode"),
(31, 31, "Rc", "record-form flag"),
],
"DCBZ": [
(0, 5, "OPCD", "primary opcode (31)"),
(6, 10, "", "reserved"),
(11, 15, "RA", "base register (0 ⇒ literal 0)"),
(16, 20, "RB", "offset register"),
(21, 30, "XO", "extended opcode (1014 for dcbz / 1010 for dcbz128)"),
(31, 31, "", "reserved"),
],
"VX": [
(0, 5, "OPCD", "primary opcode (4)"),
(6, 10, "VRT/VD", "destination vector register"),
(11, 15, "VRA/VA", "source A vector register"),
(16, 20, "VRB/VB", "source B vector register"),
(21, 31, "XO", "extended opcode (11 bits)"),
],
"VA": [
(0, 5, "OPCD", "primary opcode (4)"),
(6, 10, "VRT", "destination vector register"),
(11, 15, "VRA", "source A"),
(16, 20, "VRB", "source B"),
(21, 25, "VRC", "source C / shift"),
(26, 31, "XO", "extended opcode (6 bits)"),
],
"VC": [
(0, 5, "OPCD", "primary opcode (4)"),
(6, 10, "VRT", "destination vector register"),
(11, 15, "VRA", "source A"),
(16, 20, "VRB", "source B"),
(21, 21, "Rc", "record-form flag (updates CR6)"),
(22, 31, "XO", "extended opcode (10 bits)"),
],
"VX128": [
(0, 5, "OPCD", "primary opcode (4 or 5)"),
(6, 10, "VD128l", "destination low 5 bits"),
(11, 15, "VA128l", "source A low 5 bits"),
(16, 20, "VB128l", "source B low 5 bits"),
(21, 21, "VA128H", "source A high bit"),
(22, 22, "", "reserved"),
(23, 25, "VC", "optional VC / XO sub-field"),
(26, 26, "VA128h", "source A middle bit"),
(27, 27, "", "reserved"),
(28, 29, "VD128h", "destination high 2 bits"),
(30, 31, "VB128h", "source B high 2 bits"),
],
"VX128_1": [
(0, 5, "OPCD", "primary opcode (4)"),
(6, 10, "VD128l", "destination low 5 bits"),
(11, 15, "RA", "address register"),
(16, 20, "RB", "offset register"),
(21, 27, "XO", "extended opcode"),
(28, 29, "VD128h", "destination high 2 bits"),
(30, 31, "", "reserved"),
],
"VX128_2": [
(0, 5, "OPCD", "primary opcode (5)"),
(6, 10, "VD128l", "destination low 5 bits"),
(11, 15, "VA128l", "source A low 5 bits"),
(16, 20, "VB128l", "source B low 5 bits"),
(21, 21, "VA128H", "source A high bit"),
(23, 25, "VC", "source C 3-bit field"),
(26, 26, "VA128h", "source A middle bit"),
(28, 29, "VD128h", "destination high 2 bits"),
(30, 31, "VB128h", "source B high 2 bits"),
],
"VX128_3": [
(0, 5, "OPCD", "primary opcode (6)"),
(6, 10, "VD128l", "destination low 5 bits"),
(11, 15, "IMM", "5-bit immediate"),
(16, 20, "VB128l", "source B low 5 bits"),
(21, 27, "XO", "extended opcode"),
(28, 29, "VD128h", "destination high 2 bits"),
(30, 31, "VB128h", "source B high 2 bits"),
],
"VX128_4": [
(0, 5, "OPCD", "primary opcode (6)"),
(6, 10, "VD128l", "destination low 5 bits"),
(11, 15, "IMM", "5-bit immediate"),
(16, 20, "VB128l", "source B low 5 bits"),
(21, 23, "XO", "extended opcode"),
(24, 25, "z", "sub-operation selector"),
(28, 29, "VD128h", "destination high 2 bits"),
(30, 31, "VB128h", "source B high 2 bits"),
],
"VX128_5": [
(0, 5, "OPCD", "primary opcode (4)"),
(6, 10, "VD128l", "destination low 5 bits"),
(11, 15, "VA128l", "source A low 5 bits"),
(16, 20, "VB128l", "source B low 5 bits"),
(21, 21, "VA128H", "source A high bit"),
(22, 25, "SH", "4-bit shift amount"),
(26, 26, "VA128h", "source A middle bit"),
(28, 29, "VD128h", "destination high 2 bits"),
(30, 31, "VB128h", "source B high 2 bits"),
],
"VX128_P": [
(0, 5, "OPCD", "primary opcode (6)"),
(6, 10, "VD128l", "destination low 5 bits"),
(11, 15, "PERMl", "permute selector low 5 bits"),
(16, 20, "VB128l", "source B low 5 bits"),
(21, 22, "", "reserved"),
(23, 25, "PERMh", "permute selector high 3 bits"),
(28, 29, "VD128h", "destination high 2 bits"),
(30, 31, "VB128h", "source B high 2 bits"),
],
"VX128_R": [
(0, 5, "OPCD", "primary opcode (4)"),
(6, 10, "VD128l", "destination low 5 bits"),
(11, 15, "VA128l", "source A low 5 bits"),
(16, 20, "VB128l", "source B low 5 bits"),
(21, 21, "VA128H", "source A high bit"),
(22, 25, "XO", "extended opcode (compare)"),
(26, 26, "VA128h", "source A middle bit"),
(27, 27, "Rc", "record-form flag (updates CR6)"),
(28, 29, "VD128h", "destination high 2 bits"),
(30, 31, "VB128h", "source B high 2 bits"),
],
}


def render_bit_table(form: str) -> str:
    """Return a markdown table of the form's bit layout."""
    layout = FORM_LAYOUTS.get(form)
    if not layout:
        return f"_Unknown form_ `{form}` _— see `forms/` for details._"
    rows = ["| Bits | Field | Meaning |", "| --- | --- | --- |"]
    for start, end, name, notes in layout:
        # Single-bit fields print as "30"; multi-bit fields as "6-10".
        span = f"{start}" if start == end else f"{start}-{end}"
        rows.append(f"| {span} | `{name}` | {notes} |")
    return "\n".join(rows)


@@ -0,0 +1,75 @@
"""
Scrapes xenia-canary's emit files for the location of each instruction's
semantic implementation function `InstrEmit_<mnem>`.
The files are:
src/xenia/cpu/ppc/ppc_emit_alu.cc (integer ALU)
src/xenia/cpu/ppc/ppc_emit_memory.cc (loads/stores/cache/sync)
src/xenia/cpu/ppc/ppc_emit_altivec.cc (VMX + VMX128)
src/xenia/cpu/ppc/ppc_emit_fpu.cc (floating-point)
src/xenia/cpu/ppc/ppc_emit_control.cc (branch/CR/SPR/syscall/trap)
Returns, for each mnemonic, the relative file path and the starting line
of the `int InstrEmit_<mnem>(...)` definition.
"""
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
import re
CXX_EMIT_FILES = [
"src/xenia/cpu/ppc/ppc_emit_alu.cc",
"src/xenia/cpu/ppc/ppc_emit_memory.cc",
"src/xenia/cpu/ppc/ppc_emit_altivec.cc",
"src/xenia/cpu/ppc/ppc_emit_fpu.cc",
"src/xenia/cpu/ppc/ppc_emit_control.cc",
]


@dataclass
class CxxRef:
    mnem: str
    emit_file: str | None = None  # relative to xenia-canary/
    emit_line: int | None = None


def _cxx_ident(mnem: str) -> str:
    """Canary maps '.' in the mnemonic to a trailing 'x' in the C++ symbol
    (e.g. addic. → InstrEmit_addicx)."""
    return mnem.replace(".", "x")


class CxxScraper:
    def __init__(self, repo_root: Path):
        self.canary_root = repo_root / "xenia-canary"
        self._index: dict[str, tuple[str, int]] = {}
        fn_pat = re.compile(r"^\s*int\s+InstrEmit_([A-Za-z_][A-Za-z0-9_]*)\s*\(")
        for rel in CXX_EMIT_FILES:
            path = self.canary_root / rel
            if not path.is_file():
                continue
            for i, line in enumerate(path.read_text(encoding="utf-8").splitlines(), start=1):
                m = fn_pat.match(line)
                if not m:
                    continue
                name = m.group(1)
                # Keep the first definition seen for each symbol.
                self._index.setdefault(name, (rel, i))

    def lookup(self, mnem: str) -> CxxRef:
        ident = _cxx_ident(mnem)
        hit = self._index.get(ident)
        if hit is None:
            return CxxRef(mnem=mnem)
        return CxxRef(mnem=mnem, emit_file=hit[0], emit_line=hit[1])


if __name__ == "__main__":
    root = Path(__file__).resolve().parent.parent.parent
    s = CxxScraper(root)
    for m in ("addx", "addic.", "lwz", "bclrx", "mfspr", "stvx", "vaddfp",
              "vaddfp128", "faddx", "lvsl"):
        r = s.lookup(m)
        print(f"{m:12s} {r.emit_file}:{r.emit_line}")

File diff suppressed because it is too large


@@ -0,0 +1,184 @@
"""
Scrapes xenia-rs source files for per-instruction references and
snippets of the interpreter semantics.
Outputs produced for each mnemonic:
- opcode_line: line in crates/xenia-cpu/src/opcode.rs where the
PpcOpcode variant is declared (1-indexed)
- decoder_line: line in crates/xenia-cpu/src/decoder.rs where the
variant is produced from raw bits
- interp_start: line in crates/xenia-cpu/src/interpreter.rs where
the match arm `PpcOpcode::<mnem> =>` begins
- interp_end: line where the arm closes (matching brace, naive)
- interp_body: raw text of the arm body (for reviewer reference)
The xenia-rs opcode identifier usually keeps the XML's trailing `x`
(PpcOpcode::addx), so this scraper looks up the XML mnemonic directly;
the only translation applied is '.' to 'x' (see `_rust_ident`).
"""
from __future__ import annotations
from dataclasses import dataclass
from pathlib import Path
import re


@dataclass
class RustRef:
    mnem: str
    opcode_line: int | None = None
    decoder_line: int | None = None
    interp_start: int | None = None
    interp_end: int | None = None
    interp_body: str = ""


# PpcOpcode identifiers in xenia-rs are expected to match the XML mnemonic
# exactly, except that '.' is illegal in a Rust identifier and is replaced
# with 'x' (e.g. addic. → addicx). If lookups for dotted mnemonics miss,
# check opcode.rs for an alternative spelling such as addic_.
def _rust_ident(mnem: str) -> str:
    """Convert XML mnemonic to the xenia-rs PpcOpcode variant name."""
    # xenia-rs uses the same names as xenia-canary's opcode enum, which
    # mirrors ppc-instructions.xml. The only character that needs
    # translating is '.', which becomes 'x'.
    return mnem.replace(".", "x")


class RustScraper:
    def __init__(self, repo_root: Path):
        self.repo_root = repo_root
        self.cpu_root = repo_root / "xenia-rs" / "crates" / "xenia-cpu" / "src"
        self._opcode_lines = self._read_lines(self.cpu_root / "opcode.rs")
        self._decoder_lines = self._read_lines(self.cpu_root / "decoder.rs")
        self._interp_lines = self._read_lines(self.cpu_root / "interpreter.rs")
        self._opcode_index: dict[str, int] = self._index_opcode_enum()
        self._decoder_index: dict[str, int] = self._index_decoder()
        self._interp_index: dict[str, tuple[int, int]] = self._index_interpreter()

    @staticmethod
    def _read_lines(path: Path) -> list[str]:
        if not path.is_file():
            return []
        return path.read_text(encoding="utf-8").splitlines()

    def _index_opcode_enum(self) -> dict[str, int]:
        """Map rust-identifier → 1-indexed line in opcode.rs. The enum uses
        comma-separated identifiers (often many per line) so we extract
        every identifier match inside the enum body."""
        idx: dict[str, int] = {}
        token = re.compile(r"\b([A-Za-z_][A-Za-z0-9_]*)\b")
        in_enum = False
        for i, line in enumerate(self._opcode_lines, start=1):
            if "pub enum PpcOpcode" in line:
                in_enum = True
                continue
            if not in_enum:
                continue
            if line.startswith("}"):
                break
            stripped = line.strip()
            # skip blank / comment-only lines
            if not stripped or stripped.startswith("//"):
                continue
            # split off any trailing line comment
            code = stripped.split("//", 1)[0]
            for m in token.finditer(code):
                idx.setdefault(m.group(1), i)
        return idx

    def _index_decoder(self) -> dict[str, int]:
        """Map rust-identifier → 1-indexed line of its `PpcOpcode::<name>` producer."""
        idx: dict[str, int] = {}
        pat = re.compile(r"PpcOpcode::([A-Za-z_][A-Za-z0-9_]*)")
        for i, line in enumerate(self._decoder_lines, start=1):
            for m in pat.finditer(line):
                name = m.group(1)
                # keep the FIRST occurrence (the match-arm line where it's
                # produced, not any later references)
                idx.setdefault(name, i)
        return idx

    def _index_interpreter(self) -> dict[str, tuple[int, int]]:
        """Map rust-identifier → (start, end) lines of the match arm.

        An arm starts at `PpcOpcode::<name>` and ends at the `}` that
        balances its opening brace. We accept multi-variant arms of the
        form `PpcOpcode::a | PpcOpcode::b => {` by recording the same
        (start, end) for every named variant.
        """
        arm_header = re.compile(
            r"^(\s*)"                                             # indent
            r"((?:PpcOpcode::[A-Za-z_][A-Za-z0-9_]*"              # first variant
            r"(?:\s*\|\s*PpcOpcode::[A-Za-z_][A-Za-z0-9_]*)*))"   # more variants
            r"\s*=>\s*\{?\s*$"
        )
        var_re = re.compile(r"PpcOpcode::([A-Za-z_][A-Za-z0-9_]*)")
        idx: dict[str, tuple[int, int]] = {}
        i = 0
        n = len(self._interp_lines)
        while i < n:
            line = self._interp_lines[i]
            m = arm_header.match(line)
            if not m:
                i += 1
                continue
            names = var_re.findall(m.group(2))
            start = i + 1  # 1-indexed
            # Default for a header that ends in `=>` with no opening
            # brace: treat the header line itself as the whole arm.
            end = start
            j = i + 1
            if line.rstrip().endswith("{"):
                depth = 1
                while j < n:
                    body_line = self._interp_lines[j]
                    # A naive brace counter suffices for this codebase:
                    # the interpreter arms use balanced braces and no
                    # string literals containing stray braces.
                    depth += body_line.count("{") - body_line.count("}")
                    if depth == 0:
                        end = j + 1  # 1-indexed
                        break
                    j += 1
            for name in names:
                idx.setdefault(name, (start, end))
            i = j + 1
        return idx

    def lookup(self, mnem: str) -> RustRef:
        ident = _rust_ident(mnem)
        ref = RustRef(mnem=mnem)
        ref.opcode_line = self._opcode_index.get(ident)
        ref.decoder_line = self._decoder_index.get(ident)
        rng = self._interp_index.get(ident)
        if rng:
            ref.interp_start, ref.interp_end = rng
            body = "\n".join(self._interp_lines[ref.interp_start - 1: ref.interp_end])
            ref.interp_body = body
        return ref


if __name__ == "__main__":
    root = Path(__file__).resolve().parent.parent.parent
    s = RustScraper(root)
    for m in ("addx", "addic.", "lwz", "bclrx", "mfspr", "stvx", "vaddfp",
              "vaddfp128", "faddx", "lvsl"):
        r = s.lookup(m)
        print(f"{m:12s} opcode@{r.opcode_line} decoder@{r.decoder_line} "
              f"interp@{r.interp_start}-{r.interp_end}")


@@ -0,0 +1,231 @@
"""
Parses xenia-canary's tools/ppc-instructions.xml into typed records.
The XML is the authoritative catalogue of Xbox 360 PPC instructions
(455 <insn> entries). Each entry carries:
- mnem: mnemonic (e.g. "addx", "lwzu", "vaddfp128")
- opcode: 32-bit hex encoding (primary + extended opcode bits)
- form: instruction format (XO, D, DS, X, XL, XFX, ..., VX, VX128_*)
- group: functional group (i=integer, m=memory, b=branch,
c=control, f=fpu, v=vmx)
- desc: short human-readable description
- <in>/<out> fields with optional conditional="true" flag
- <disasm>: template string used by the canary disassembler
"""
from __future__ import annotations
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from pathlib import Path
GROUP_NAMES = {
"i": "integer",
"m": "memory",
"b": "branch",
"c": "control",
"f": "fpu",
"v": "vmx",
}
# Maps the short group code to the manual's on-disk category directory.
# VMX entries are split by form in generate_manual.py (VX128_* → vmx128/).
GROUP_TO_DIR = {
"i": "alu",
"m": "memory",
"b": "branch",
"c": "control",
"f": "fpu",
"v": "vmx",
}


@dataclass
class Field:
    name: str
    conditional: bool = False


@dataclass
class Instruction:
    mnem: str
    opcode_hex: str  # lowercase, no "0x" prefix
    form: str
    group: str  # one-letter code
    desc: str
    sync: bool
    reads: list[Field] = field(default_factory=list)
    writes: list[Field] = field(default_factory=list)
    disasm: str = ""

    @property
    def opcode_int(self) -> int:
        return int(self.opcode_hex, 16)

    @property
    def primary_opcode(self) -> int:
        # PPC: bits 0-5 of a big-endian 32-bit word are the top 6 bits.
        return (self.opcode_int >> 26) & 0x3F

    @property
    def extended_opcode(self) -> int | None:
        """Best-effort extended opcode extraction by form.

        Returns None for forms where "extended opcode" is not meaningful
        (I, B, D, DS, SC, M, MD, MDS); those pages will omit it."""
        code = self.opcode_int
        form = self.form
        if form in ("X", "XL", "XFX", "XFL", "XS", "DCBZ"):
            return (code >> 1) & 0x3FF  # bits 21-30
        if form == "XO":
            return (code >> 1) & 0x1FF  # bits 22-30 (bit 21 = OE)
        if form == "A":
            return (code >> 1) & 0x1F  # bits 26-30
        if form in ("VX", "VX128_2", "VX128_5"):
            return code & 0x7FF  # bits 21-31
        if form == "VA":
            return code & 0x3F  # bits 26-31
        if form == "VC":
            return code & 0x3FF  # bits 22-31 (bit 21 = Rc)
        if form in ("VX128", "VX128_R"):
            # complex split; best-effort, not used for lookup, just display
            return code & 0x7FF
        if form in ("VX128_1", "VX128_3", "VX128_4", "VX128_P"):
            return code & 0x7FF
        return None

    @property
    def group_name(self) -> str:
        return GROUP_NAMES.get(self.group, "unknown")

    @property
    def has_rc(self) -> bool:
        """Does this instruction have a runtime Rc bit (record form)?"""
        return any(w.name == "CR" and w.conditional for w in self.writes)

    @property
    def has_oe(self) -> bool:
        """Does this instruction have a runtime OE bit (overflow enable)?"""
        return any(w.name == "OE" and w.conditional for w in self.writes)

    @property
    def has_lk(self) -> bool:
        """Does this instruction have a runtime LK bit (branch link)?"""
        return any(r.name == "LK" for r in self.reads)

    @property
    def rc_is_mandatory(self) -> bool:
        """Instructions like `addic.` where CR is written unconditionally."""
        return any(w.name == "CR" and not w.conditional for w in self.writes)


def load_instructions(xml_path: Path | str) -> list[Instruction]:
    tree = ET.parse(str(xml_path))
    root = tree.getroot()
    insns: list[Instruction] = []
    for node in root.iter("insn"):
        reads = [Field(x.get("field", ""), x.get("conditional") == "true")
                 for x in node.findall("in")]
        writes = [Field(x.get("field", ""), x.get("conditional") == "true")
                  for x in node.findall("out")]
        disasm_node = node.find("disasm")
        disasm = (disasm_node.text or "").strip() if disasm_node is not None else ""
        insns.append(Instruction(
            mnem=node.get("mnem", ""),
            opcode_hex=node.get("opcode", "").lower(),
            form=node.get("form", ""),
            group=node.get("group", ""),
            desc=node.get("desc", ""),
            sync=node.get("sync") == "true",
            reads=reads,
            writes=writes,
            disasm=disasm,
        ))
    return insns


def expand_runtime_variants(insn: Instruction) -> list[dict]:
    """
    Return the set of concrete assembly mnemonics this XML entry represents
    under different runtime flag settings. Flags: Rc (record) → append '.',
    OE (overflow) → insert 'o' before any '.', LK (link) → append 'l'.

    The display mnemonic is derived from the XML mnem by stripping a trailing
    'x' if present (xenia uses trailing x to mark X/XO form entries; the
    assembly mnemonic omits it). Mnemonics ending in '.' or digits are kept.
    """
    raw = insn.mnem

    # Xenia convention: trailing 'x' on XO/X/A/M/MD/MDS/XFL/XS/VX/VA form
    # marks "extended form" but is dropped in assembly display.
    # Memory indexed forms (lbzx, lwzx, ...) are separate XML entries whose
    # 'x' is part of the assembly name, so group=m keeps it. Every other
    # group, including branches (bx, bcx, bcctrx, bclrx), has it stripped.
    def strip_x(m: str) -> str:
        if not m.endswith("x"):
            return m
        # Memory mnemonics: 'x' is part of the assembly name (indexed form).
        if insn.group == "m":
            return m
        # Everything else carries xenia's trailing x; strip it.
        return m[:-1]

    base = strip_x(raw)
    variants: list[dict] = []
    if insn.rc_is_mandatory:
        # e.g. addic., which already has the dot baked in
        variants.append({"mnem": raw, "flags": {}, "is_primary": True})
        return variants
    has_rc = insn.has_rc
    has_oe = insn.has_oe
    has_lk = insn.has_lk
    if not (has_rc or has_oe or has_lk):
        variants.append({"mnem": base, "flags": {}, "is_primary": True})
        return variants

    # Enumerate all combinations of the runtime flags that apply.
    def insert_o(name: str) -> str:
        # 'addo' / 'addo.': insert 'o' before any trailing '.'
        if name.endswith("."):
            return name[:-1] + "o."
        return name + "o"

    combos: list[tuple[str, dict]] = [(base, {})]
    if has_oe:
        combos += [(insert_o(n), {**f, "OE": 1}) for (n, f) in combos]
    if has_rc:
        combos += [(n + ".", {**f, "Rc": 1}) for (n, f) in combos]
    if has_lk:
        # PPC convention for branch link (bl, bcl, bclrl, bcctrl): 'l' is
        # appended to the base mnemonic, and branches have no Rc or OE.
        # Add the l-variant only to combos without those flags.
        combos += [(n + "l", {**f, "LK": 1}) for (n, f) in combos
                   if "Rc" not in f and "OE" not in f]
    for i, (name, flags) in enumerate(combos):
        variants.append({"mnem": name, "flags": flags, "is_primary": i == 0})
    return variants


if __name__ == "__main__":
    # Smoke test: print summary of what we loaded.
    repo_root = Path(__file__).resolve().parent.parent.parent
    xml = repo_root / "xenia-canary" / "tools" / "ppc-instructions.xml"
    insns = load_instructions(xml)
    print(f"Loaded {len(insns)} instructions from {xml}")
    total_mnems = sum(len(expand_runtime_variants(i)) for i in insns)
    print(f"Total runtime-expanded mnemonics: {total_mnems}")
    # show a few examples
    for mnem in ("addx", "lwz", "bclrx", "mfspr", "stvx", "vaddfp", "vaddfp128", "addic."):
        for i in insns:
            if i.mnem == mnem:
                vs = expand_runtime_variants(i)
                print(f"  {mnem:12s} form={i.form:7s} group={i.group} "
                      f"variants={[v['mnem'] for v in vs]}")
                break
        else:
            print(f"  {mnem:12s} NOT FOUND")