---
name: Disassembler unification — Phase 2 complete (2026-04-27)
description: Iterator + 3 sinks (text/JSON/DuckDB) layered over Phase 1's format(). New `--json` flag on `xenia dis`. db.rs and formatter.rs both drive through enrich_section.
type: project
originSessionId: 680cc54c-e77a-4d2d-a11b-ca562e9a68ec
---

**Phase 2 of disassembler unification is COMPLETE** (2026-04-27, same session as Phase 1).

## What's in place

### xenia-cpu (decoder + iterator)

- **[crates/xenia-cpu/src/disasm.rs](crates/xenia-cpu/src/disasm.rs)** adds:
  - `pub struct DisasmItem { addr, raw, opcode, text }` — yielded by the iterator.
  - `pub fn iter_disasm(image, image_base, va_start, va_end) -> impl Iterator<Item = DisasmItem>`: walks the bytes as big-endian PPC words, decodes each word via `decoder::decode`, formats it via `format`, and yields one `DisasmItem` per 4-byte word. Stops on a truncated tail.
  - 2 new unit tests: `iter_disasm_walks_byte_slice_in_order`, `iter_disasm_stops_on_truncated_tail`.
- **[crates/xenia-cpu/src/lib.rs](crates/xenia-cpu/src/lib.rs)** re-exports `DisasmItem` and `iter_disasm`.

### xenia-analysis (enrichment + sinks)

- **[crates/xenia-analysis/src/disasm.rs](crates/xenia-analysis/src/disasm.rs)** (NEW, ~50 LOC):
  - `pub struct RichDisasmItem<'a> { item, section, function, label }` — adds analysis context.
  - `pub fn enrich_section(image, image_base, section_name, va_start, va_end, func_analysis, labels) -> impl Iterator<Item = RichDisasmItem<'a>>`: wraps `iter_disasm` with rolling-window function tracking and label lookup.
- **[crates/xenia-analysis/src/sinks/mod.rs](crates/xenia-analysis/src/sinks/mod.rs)** (NEW): module declarations.
- **[crates/xenia-analysis/src/sinks/duckdb.rs](crates/xenia-analysis/src/sinks/duckdb.rs)** (NEW, ~30 LOC): `append_instructions(appender, items) -> Result` makes one DuckDB Appender call per row.
- **[crates/xenia-analysis/src/sinks/json.rs](crates/xenia-analysis/src/sinks/json.rs)** (NEW, ~60 LOC): `write_jsonl(out, items) -> io::Result` writes one JSON object per line.
  The internal `JsonRow<'a>` derives `Serialize` and uses `#[serde(skip_serializing_if = "Option::is_none")]` to keep rows compact.
- **[crates/xenia-analysis/src/sinks/text.rs](crates/xenia-analysis/src/sinks/text.rs)** (NEW, ~50 LOC): `write_instr_line(out, item, labels, sections, image_base, data_annotation)` renders one .asm line with a branch-target or data-ref annotation. It uses the structured `branch_target` field (not a regex over the disasm string — cleaner than the old `annotate_branch`).
- **[crates/xenia-analysis/src/lib.rs](crates/xenia-analysis/src/lib.rs)** declares the `disasm` and `sinks` modules and re-exports `RichDisasmItem` and `enrich_section`.
- **[crates/xenia-analysis/Cargo.toml](crates/xenia-analysis/Cargo.toml)** adds the `serde_json = { workspace = true }` dependency.

### Refactored call sites

- **[crates/xenia-analysis/src/db.rs](crates/xenia-analysis/src/db.rs)**: `insert_instructions_streaming` collapsed from a 50-line byte loop into 12 lines: `for section { let items = enrich_section(...); total += sinks::duckdb::append_instructions(&mut appender, items)?; }`.
- **[crates/xenia-analysis/src/formatter.rs](crates/xenia-analysis/src/formatter.rs)**: the code-section loop collapsed. It now iterates `enrich_section` and calls `write_instr_line` to render each line. Orchestration (function headers, labels, xref comments, import annotations) stays in formatter.rs. The old `annotate_branch` helper is **deleted** — branch-target annotation lives in the text sink and uses `branch_target: Option` from `DisasmText`.

### CLI

- **[crates/xenia-app/src/main.rs](crates/xenia-app/src/main.rs)**: new `--json` flag on the `dis` subcommand. Writes JSON Lines via `sinks::json::write_jsonl`, once per code section. The flag is threaded through the `cmd_dis` signature.
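The iterator/enrichment split above can be sketched in std-only Rust. This is a minimal sketch, not the crate's code. `iter_disasm_sketch`, `FuncSpan` and `FuncCursor` are hypothetical names. The `text` field is a placeholder rather than a real `decoder::decode` + `format` call. The forward-only cursor is one assumed way to implement the "rolling-window function tracking" that `enrich_section` performs.

```rust
/// Hypothetical stand-in for `xenia_cpu::DisasmItem`. Field names follow the
/// note; `text` is a plain String here, not the crate's structured `DisasmText`.
#[derive(Debug, Clone, PartialEq)]
pub struct DisasmItem {
    pub addr: u32,
    pub raw: u32,
    pub opcode: u32, // PPC primary opcode: the top 6 bits of the word
    pub text: String,
}

/// Hypothetical walker in the spirit of `iter_disasm`. Assumes
/// `image_base <= va_start <= va_end`; the end is clamped to the image.
/// `chunks_exact(4)` never yields a short chunk, so a truncated tail
/// (fewer than 4 leftover bytes) is simply not emitted.
pub fn iter_disasm_sketch(
    image: &[u8],
    image_base: u32,
    va_start: u32,
    va_end: u32,
) -> impl Iterator<Item = DisasmItem> + '_ {
    let start = (va_start - image_base) as usize;
    let end = ((va_end - image_base) as usize).min(image.len());
    // An empty or out-of-range window yields no items instead of panicking.
    let bytes = image.get(start..end).unwrap_or(&[]);
    bytes.chunks_exact(4).enumerate().map(move |(i, w)| {
        // PPC instructions are stored big-endian.
        let raw = u32::from_be_bytes([w[0], w[1], w[2], w[3]]);
        DisasmItem {
            addr: va_start + 4 * i as u32,
            raw,
            opcode: raw >> 26,
            // Placeholder for the real decode + format step.
            text: format!(".long 0x{raw:08x}"),
        }
    })
}

/// Hypothetical function extent `[start, end)` with a display name.
pub struct FuncSpan {
    pub start: u32,
    pub end: u32,
    pub name: String,
}

/// Rolling-window function lookup (hypothetical). Spans must be sorted by
/// `start` and non-overlapping, and queries must arrive in increasing address
/// order, as they do during a section walk. The cursor only moves forward,
/// so a whole walk costs O(instructions + functions).
pub struct FuncCursor<'a> {
    funcs: &'a [FuncSpan],
    idx: usize,
}

impl<'a> FuncCursor<'a> {
    pub fn new(funcs: &'a [FuncSpan]) -> Self {
        Self { funcs, idx: 0 }
    }

    pub fn lookup(&mut self, addr: u32) -> Option<&'a str> {
        let funcs = self.funcs;
        // Skip every function that ends at or before `addr`.
        while self.idx < funcs.len() && funcs[self.idx].end <= addr {
            self.idx += 1;
        }
        let f = funcs.get(self.idx)?;
        // `addr` may fall in a gap before the next function starts.
        (f.start <= addr).then(|| f.name.as_str())
    }
}
```

Because a section walk visits addresses in increasing order, the cursor never has to move backward. Out-of-order queries would need a fresh cursor or a binary search over the spans instead.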
## Architecture

```
                 ┌──────────────┐
                 │ image bytes  │
                 │ + image_base │
                 └──────┬───────┘
                        │
                        ▼
    xenia-cpu::iter_disasm(image, base, range)
                        │ yields DisasmItem
                        ▼
    xenia-analysis::enrich_section(...)
        .map(|i| RichDisasmItem { i, section, function, label })
                        │ yields RichDisasmItem
                        ▼
    ┌───────────────────┼───────────────────┐
    ▼                   ▼                   ▼
sinks::duckdb       sinks::json         sinks::text
append_instructions write_jsonl         write_instr_line
    │                   │                   │
    ▼                   ▼                   ▼
instructions        .jsonl              .asm
table (DuckDB)      (one row/line)      (formatted)
```

`DecodedInstr` (8 bytes, in the decode cache) is unchanged. `DisasmItem` and `RichDisasmItem` exist only in the iterator/sink layer.

## Constraint #1 honored: DecodedInstr unchanged

Same as Phase 1 — `DecodedInstr` is still the 8-byte cache-resident struct; `DisasmItem` is allocated only in the iterator/sink layer.

## Verification

- `cargo build --workspace` is clean (one pre-existing analysis warning was fixed during the refactor).
- `cargo test -p xenia-cpu`: all 168 tests + 10 disasm tests pass (2 new for `iter_disasm`).
- `cargo test -p xenia-analysis`: all 9 audit tests pass.
- `xenia disasm -n 8` smoke test: same extended-mnemonic output as Phase 1.
- `xenia dis --db --json --quiet` end-to-end smoke test: PENDING (still running when this note was written).

## LOC delta (Phase 2)

- xenia-cpu/src/disasm.rs: +60 (DisasmItem + iter_disasm + 2 tests)
- xenia-cpu/src/lib.rs: +1
- xenia-analysis/src/disasm.rs: +50 (new file)
- xenia-analysis/src/sinks/{mod,duckdb,json,text}.rs: +160 (new files)
- xenia-analysis/src/db.rs: −38 (collapsed loop)
- xenia-analysis/src/formatter.rs: −15 (annotate_branch deleted, inner loop replaced)
- xenia-analysis/Cargo.toml: +1 (serde_json dep)
- xenia-app/src/main.rs: +20 (--json flag + sink call)
- **Net: ~+240 LOC** (+292 / −53). Compared with the plan's "+250 / −250, net 0" estimate, additions are close but removals so far are much smaller. Part of the difference is the new JSON sink, which had no prior counterpart.

## Behavior changes visible to users

1. **New `xenia dis --json` flag** — emits one structured JSON object per instruction.
   Schema: `addr, raw, mnemonic, operands, disasm, ext_mnemonic?, ext_operands?, ext_disasm?, branch_target?, section, function?, label?` (`?` marks optional fields, omitted when absent).
2. Branch-target annotation in the .asm text output is now driven by the structured `branch_target` field. Previously it used a regex search for "0x" in the disasm string. The two are functionally equivalent for direct branches, and the new approach is immune to false-positive matches on hex values in non-branch operands.
3. All three sinks reuse the same decode/format code path. However, producing db + json + asm output in one run still decodes each instruction 3 times, once per sink, because each sink drives its own iterator. Phase 7 / future work could fan out from a single iterator if needed.

## What's next (Phases 3-4)

Per the [plan](/home/fabi/.claude/plans/ok-execute-your-proposed-refactored-dolphin.md):

- **Phase 3**: Split `db.rs` into `ingest_instructions` + `write_analysis_results`. Add a `target_hex BIGINT` column on `instructions`. Add `crates/xenia-analysis/src/sql_views.rs` with `v_branch_xrefs`/`v_call_graph`/`v_reachability_from_entry`/`v_function_first_instruction`/`v_imports_called`. Add an `--analyze=rust|sql|both` flag (default `rust`). The Rust passes (`func.rs`, `xref.rs`) remain the default.
- **Phase 4**: Replace println-only audits with assert-based JSON-fixture goldens. Expand coverage to base + extended + VMX128 (silent-bug area) + DB schema + ISO-gated end-to-end tests.
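Behavior change 3 leaves single-pass fan-out as future work. Below is one way it could look, assuming a small sink trait. Every name here is hypothetical: `Row`, `Sink`, `TextSink`, `CountSink` and `fan_out`. `CountSink` only stands in for the DuckDB appender, and `Row` is a trimmed-down substitute for `RichDisasmItem`. The text sink also shows the structured `branch_target` annotation from behavior change 2, in place of scanning the string for "0x".

```rust
use std::io::{self, Write};

/// Hypothetical, trimmed-down row; the real pipeline would pass `RichDisasmItem`.
pub struct Row {
    pub addr: u32,
    pub disasm: String,
    pub branch_target: Option<u32>,
}

/// Hypothetical sink trait: each sink sees every row once, by reference.
pub trait Sink {
    fn accept(&mut self, row: &Row) -> io::Result<()>;
    /// Flush/close hook, e.g. where an appender would be flushed.
    fn finish(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Text sink that annotates branches from the structured `branch_target`
/// field rather than scanning the disassembly string for "0x".
pub struct TextSink<W: Write>(pub W);

impl<W: Write> Sink for TextSink<W> {
    fn accept(&mut self, row: &Row) -> io::Result<()> {
        match row.branch_target {
            Some(t) => writeln!(self.0, "{:08x}  {:<24} ; -> {:08x}", row.addr, row.disasm, t),
            None => writeln!(self.0, "{:08x}  {}", row.addr, row.disasm),
        }
    }

    fn finish(&mut self) -> io::Result<()> {
        self.0.flush()
    }
}

/// Stand-in for a database sink: just counts rows.
pub struct CountSink(pub usize);

impl Sink for CountSink {
    fn accept(&mut self, _row: &Row) -> io::Result<()> {
        self.0 += 1;
        Ok(())
    }
}

/// Pull each row from one iterator and hand it to every sink, so upstream
/// decode/enrich work runs once per instruction instead of once per sink.
pub fn fan_out<I>(rows: I, sinks: &mut [&mut dyn Sink]) -> io::Result<usize>
where
    I: IntoIterator<Item = Row>,
{
    let mut n = 0;
    for row in rows {
        for sink in sinks.iter_mut() {
            sink.accept(&row)?;
        }
        n += 1;
    }
    for sink in sinks.iter_mut() {
        sink.finish()?;
    }
    Ok(n)
}
```

In this sketch each row is produced once and borrowed by every sink. The cost is that all sinks share one loop and one error path, so an error from any sink stops the whole pass.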