# Manual generator

Python scripts that build the `ppc-manual/` tree from two authoritative sources in this repository, plus one secondary source that is only cross-referenced:

- `xenia-canary/tools/ppc-instructions.xml`: metadata for all 455 Xbox 360 PPC instructions (mnemonic, form, group, opcode, in/out fields, disasm template).
- `xenia-rs/crates/xenia-cpu/src/`: the Rust interpreter. Per-instruction semantics live in the match arms of `interpreter.rs`.
- `xenia-canary/src/xenia/cpu/ppc/ppc_emit_*.cc`: the C++ emit functions. These are referenced by line number only.

## Files

| File | Purpose |
| --- | --- |
| `generate_manual.py` | Main entry point. Parses the XML, builds families, renders pages, and writes `index.json`. |
| `xml_model.py` | XML parser + `expand_runtime_variants()` (produces the set of Rc/OE/LK-expanded mnemonics that a single XML entry covers). |
| `bit_layout.py` | Per-form bit-field tables (rendered into the Encoding section of every page and into `forms/*.md`). |
| `rust_scraper.py` | Locates each `PpcOpcode::` enum variant, its decoder arm, and its interpreter match-arm line range. |
| `cxx_scraper.py` | Locates each `InstrEmit_` function in the xenia-canary emit `.cc` files. |

## Running

```bash
python3 ppc-manual/generator/generate_manual.py                 # full generate
python3 ppc-manual/generator/generate_manual.py --dry-run       # parse + consistency checks only
python3 ppc-manual/generator/generate_manual.py --out /tmp/out  # alternate output root
python3 ppc-manual/generator/generate_manual.py --xml /path/to/ppc-instructions.xml
```

No third-party dependencies: Python 3.10+ standard library only.

## Idempotency

The generator can be re-run without losing hand-written content:

1. Each page has a pair of sentinel comments:
   - ``
   - ``
2. On re-run, only the text **between** the sentinels is rewritten. Everything after `END` (Special Cases, Related Instructions, IBM Reference) is preserved verbatim.
3. If the `END` sentinel is missing, the generator assumes a reviewer has fully taken over the file and skips it entirely.
## Consistency checks (also run by `--dry-run`)

- **XML entry count ≡ 455**: warns if the XML has been modified.
- **Family membership total ≡ XML entry count**: every XML entry must land in exactly one family.
- **Index coverage ≡ runtime-expanded mnemonic count**: the JSON index must contain a key for every runtime variant (`add`, `add.`, `addo`, `addo.`, `bclr`, `bclrl`, …).

## Family grouping rules

Three rules are applied in order (see `_family_head` in `generate_manual.py`):

1. If a mnemonic ends in `128` and its non-128 sibling exists, it joins the sibling's family. For example, `vaddfp128` is consolidated into the `vaddfp` page.
2. For memory ops (group `m`), a trailing `u`, `x`, or `ux` suffix is stripped when the base mnemonic exists. For example, `lwz`, `lwzu`, `lwzx`, and `lwzux` all land on the `lwz` page.
3. Otherwise, the mnemonic is its own family head.

All other flag variants (`Rc`, `OE`, `LK`) are **runtime** variants: they are NOT separate XML entries, and they are listed in the page's "Assembler Mnemonics" table.

## Category mapping

| XML group | Category dir | Notes |
| --- | --- | --- |
| `i` (integer) | `alu/` | |
| `m` (memory) | `memory/` | |
| `b` (branch) | `branch/` | Includes `sc` and traps |
| `c` (control) | `control/` | CR logical, SPR, sync |
| `f` (fpu) | `fpu/` | |
| `v` (vector) | `vmx/` or `vmx128/` | Split by form: `VX128*` → `vmx128/` |

## Extending the generator

- **Pseudocode seeds.** The `PSEUDOCODE_SEEDS` dict in `generate_manual.py` maps an XML mnemonic to a PPC-style pseudocode block. Add entries here to pre-fill the Operation section for more mnemonics. Phase 2 reviewers can still override a seed by writing content outside the sentinels.
- **C translation seeds.** A similar dict of C snippets, keyed by family head.
- **Field descriptions.** `FIELD_DESCRIPTIONS` maps XML field names to IBM-style prose.
  Missing entries are marked "_Phase 2: document this field._"

## Known limitations

- Extended-opcode extraction in `xml_model.Instruction.extended_opcode` is best-effort per form. For VMX128 variants, the extracted value may not match the exact pattern used by xenia's decoder tree. The page still shows it for reference, but the decoder source (linked on every page) is authoritative.
- `rust_scraper` uses a naive brace counter to delimit interpreter match arms. This works for the current interpreter because its match arms have balanced braces and contain no string literals with unbalanced braces. If the interpreter ever adopts such literals, the scraper will need a Rust-aware parser.
- The generator treats a trailing `x` on a mnemonic as a xenia naming convention ("extended/XO form") and strips it for assembly display, except in the memory group, where `x` is the natural indexed-form suffix. If a future version of the xenia XML adds another group where `x` is structural, the heuristic in `xml_model.expand_runtime_variants` will need updating.
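The brace-counting limitation above is easiest to see in a small sketch. This is not `rust_scraper`'s actual code: `naive_arm_end` is a hypothetical helper, and the `PpcOpcode::` arms below are made-up Rust fragments. It counts every `{` and `}` on each line, so a brace inside a string literal shifts the arm boundary.

```python
def naive_arm_end(lines: list[str], start: int) -> int:
    """Return the index of the line that closes the block opened at or after `start`.

    Every '{' and '}' is counted, including ones inside string literals.
    """
    depth = 0
    opened = False
    for i in range(start, len(lines)):
        for ch in lines[i]:
            if ch == "{":
                depth += 1
                opened = True
            elif ch == "}":
                depth -= 1
        if opened and depth == 0:
            return i
    raise ValueError("match arm never closes")


balanced = [
    "PpcOpcode::Addi => {",
    "    let v = ra + simm;",
    "}",
    "PpcOpcode::Addis => {",
]

# The "{" inside the string literal is counted as a real brace, so the arm
# appears to run on until the `}` that closes the whole match.
tricky = [
    "PpcOpcode::Sc => {",
    '    let open = "{";',
    "}",
    "PpcOpcode::Tw => {",
    "}",
    "}",
]

print(naive_arm_end(balanced, 0))  # 2
print(naive_arm_end(tricky, 0))    # 5, not 2: the Tw arm is swallowed
```

A Rust-aware fix would need to skip string, raw-string, and char literals (and comments) before counting.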
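The `x`-suffix heuristic and the Rc/OE/LK runtime expansion can be sketched together. This is a hypothetical illustration, not the signature of `xml_model.expand_runtime_variants`: `display_base` and `expand_variants_sketch` are made-up names, the boolean flags stand in for however the real code reads Rc/OE/LK from the XML entry, and the suffix order (OE `o`, then Rc `.`, then LK `l`) is an assumption chosen to match the index examples above.

```python
from itertools import product


def display_base(xml_mnemonic: str, group: str) -> str:
    """Strip xenia's trailing 'x' convention except in the memory group.

    In group 'm' the 'x' is the real indexed-form suffix (e.g. lwzx), so it stays.
    """
    if group != "m" and xml_mnemonic.endswith("x"):
        return xml_mnemonic[:-1]
    return xml_mnemonic


def expand_variants_sketch(
    xml_mnemonic: str,
    group: str,
    has_oe: bool = False,
    has_rc: bool = False,
    has_lk: bool = False,
) -> list[str]:
    """List every runtime mnemonic one XML entry covers (hypothetical helper)."""
    base = display_base(xml_mnemonic, group)
    # Each flag either contributes its suffix or nothing; OE ('o') precedes Rc ('.').
    oe = ("", "o") if has_oe else ("",)
    rc = ("", ".") if has_rc else ("",)
    lk = ("", "l") if has_lk else ("",)
    return [base + o + r + l for o, r, l in product(oe, rc, lk)]


print(expand_variants_sketch("addx", "i", has_oe=True, has_rc=True))
# ['add', 'add.', 'addo', 'addo.']
```

The single `group != "m"` test is exactly where the limitation bites: a mnemonic in any other group whose `x` is structural would be stripped incorrectly.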