---
name: Disassembler unification — Phase 3 complete (2026-04-27)
description: db.rs split into ingest_instructions + write_analysis_results; new sql_views.rs with 5 views; --analyze=rust|sql|both CLI flag; target_hex column on instructions; Rust/SQL cross-check warning in `both` mode.
type: project
originSessionId: 680cc54c-e77a-4d2d-a11b-ca562e9a68ec
---

**Phase 3 of disassembler unification is COMPLETE** (2026-04-27, same session).

## What's in place

### Schema
- **`instructions` table** gains a `target_hex BIGINT NULL` column populated from `DisasmText.branch_target`. Indexed via `idx_instructions_target_hex`. Documented in the [crates/xenia-analysis/src/db.rs](crates/xenia-analysis/src/db.rs) module docstring.
- **DuckDB sink** ([crates/xenia-analysis/src/sinks/duckdb.rs](crates/xenia-analysis/src/sinks/duckdb.rs)) writes the new column.

### Split DbWriter API ([crates/xenia-analysis/src/db.rs](crates/xenia-analysis/src/db.rs))
- `pub fn ingest_instructions(pe, info, func_analysis, labels)`: creates the `instructions` table + indices and streams rows via the iterator + duckdb sink. **No analysis tables.**
- `pub fn write_analysis_results(pe, info, func_analysis, labels, xrefs)`: creates the `functions`, `labels`, `xrefs` tables + indices. Populated from Rust pass output.
- `pub fn write_disasm(...)`: back-compat wrapper that calls both. Existing callers (e.g. `cmd_exec`) keep working unchanged.
- `pub fn create_sql_views(&mut self)`: runs the SQL view definitions from `crate::sql_views::ALL_VIEWS`.
- `pub fn cross_check_branch_xrefs(&self) -> Result<(u64, u64)>`: returns `(sql_only, rust_only)` row counts for the symmetric difference between `v_branch_xrefs` and the call/jump/branch rows of `xrefs` (stored under the short tags `'call'` / `'j'` / `'br'`).

### SQL views ([crates/xenia-analysis/src/sql_views.rs](crates/xenia-analysis/src/sql_views.rs)): 5 views
- `v_branch_xrefs`: derived from an `instructions.target_hex` self-join. The CASE on mnemonic mirrors the `xref.rs` kind logic: `bl`/`bla` → call, `b`/`ba` → jump, `bc*` → branch, emitted as the short tags `'call'` / `'j'` / `'br'` that `xrefs.kind` stores.
- `v_call_graph`: `xrefs ⨝ functions` filtered to `kind = 'call'`. Surfaces caller/callee names.
- `v_reachability_from_entry`: recursive CTE seeded from the function containing the `entry_point` label, transitive over call/jump/branch xrefs. `UNION` (not `UNION ALL`) deduplicates rows, so the recursion terminates on call-graph cycles.
- `v_function_first_instruction`: `functions ⨝ instructions ON address`. Convenience for inspecting prologues.
- `v_imports_called`: `xrefs ⨝ labels` filtered to `xrefs.kind = 'call' AND labels.kind = 'import'`. Per-function import call summary.

All views are `CREATE OR REPLACE`, so re-running is idempotent.

### CLI ([crates/xenia-app/src/main.rs](crates/xenia-app/src/main.rs))
- New `AnalyzeMode` enum (`Rust` / `Sql` / `Both`) deriving `ValueEnum`.
- `Dis { ..., analyze: AnalyzeMode }` field with `default_value_t = AnalyzeMode::Rust`.
- `cmd_dis` routes through:
  - Always: `write_base` → `ingest_instructions` → `write_analysis_results` (Rust passes always run, honoring constraint #3).
  - `Sql` or `Both`: also `create_sql_views`.
  - `Both`: also `cross_check_branch_xrefs`, logging the result (info if both counts are zero, warn otherwise).

## Constraint #3 honored: Rust analysis stays default and functional
- Default flag value is `rust`.
- Rust passes (`func.rs` + `xref.rs`) ALWAYS run when `--db` is set. The `analyze` flag only controls whether SQL views are *additionally* created.
- The `xrefs` table is always populated by Rust passes. `v_branch_xrefs` is an alternative read surface, not a replacement.
- Data-ref pass (xref.rs lis+addi/ori register tracking) and function detection (func.rs prologue patterns) remain Rust-only because they are not cleanly relational.

## Verification
- `cargo build --workspace`: clean.
- `cargo test -p xenia-cpu` / `-p xenia-analysis`: all green (10 disasm tests + 9 audit + 168 cpu).
- `xenia dis --analyze=both --db ` smoke verified end-to-end: 1.87M instructions written, 299,615 with `target_hex` (16%, the direct branches), all 5 views queryable, cross-check returns `(0, 0)`: Rust and SQL agree on every (source, target, kind) tuple.
- Sample reachability: 7,557 of 12,156 functions reachable from entry_point (62%), sensible for a game with significant dead/unused code.

### Bugs found and fixed during verification
1. **Kind-tag mismatch.** `XrefKind::tag()` ([xref.rs:21-29](crates/xenia-analysis/src/xref.rs)) returns the SHORT tags `"call"` / `"j"` / `"br"` (and `"read"` / `"write"` / `"ref"`). The first version of `v_branch_xrefs` and `cross_check_branch_xrefs` used the LONG names (`'call'` / `'jump'` / `'branch'`), which the comment in [db.rs](crates/xenia-analysis/src/db.rs) describes for the *trace* table, not `xrefs`. Cross-check returned 195K SQL-only rows. Fixed by changing CASE to `'call'` / `'j'` / `'br'`. **Don't trust the docstring at the top of db.rs**: `branch_trace.kind` uses long names but `xrefs.kind` uses short tags.
2. **Reachability view collapsed to 1 row.** First version seeded with `labels.address` (a single instruction VA) and looked for `xrefs.source = r.addr`. But the entry-point address (`mflr r12`) has no outgoing xref; branches happen at later instructions of the function. Fixed by reformulating as function-level reachability: seed with the function containing the entry_point label, then walk `function → instructions → xrefs → target's enclosing function`. `UNION` handles call-graph cycles.

## LOC delta (Phase 3)
- xenia-analysis/src/db.rs: +60 (split write_disasm; new methods)
- xenia-analysis/src/sql_views.rs: +120 (NEW)
- xenia-analysis/src/sinks/duckdb.rs: +1 line (target_hex column write)
- xenia-analysis/src/lib.rs: +1 line (mod sql_views)
- xenia-app/src/main.rs: +35 (AnalyzeMode enum + flag + routing + cross-check log)
- **Net: +217 LOC**.

## Behavior changes visible to users
1. **New `--analyze=rust|sql|both` flag on `xenia dis`**, default `rust`. Backward compatible: existing scripts behave the same.
2. **New `target_hex BIGINT` column on `instructions` table**. Existing queries work; new column adds query power for SQL-side branch xref derivation.
3. **5 SQL views** available when `--analyze` is `sql` or `both`. Read-only, idempotent.
4. **Cross-check warning** in `both` mode flags any drift between formatter mnemonic strings and `xref.rs` kind classification.

## What's next (Phase 4)
Per [plan](/home/fabi/.claude/plans/ok-execute-your-proposed-refactored-dolphin.md):
- **Phase 4**: Replace println-only audits with assert-based JSON-fixture goldens. Expand coverage to base + extended + VMX128 (silent-bug area) + DB schema golden + ISO-gated end-to-end consistency.
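
## Appendix: cross-check logic, sketched in plain Rust

The `both`-mode cross-check and the kind-tag mismatch from Phase 3 can be illustrated without a database. This is a minimal sketch, not the code in `db.rs` or `sql_views.rs`: the real comparison runs as SQL over `v_branch_xrefs` and `xrefs`. The names `Xref`, `kind_tag`, `derive_sql_side`, and `cross_check` are hypothetical, and the `bc*` rule is approximated with a simple prefix match.

```rust
use std::collections::HashSet;

/// One branch xref as compared by the cross-check: (source, target, kind).
type Xref = (u64, u64, &'static str);

/// Hypothetical stand-in for the CASE expression in `v_branch_xrefs`.
/// Returns the SHORT tags that `xrefs.kind` stores, or None for non-branches.
fn kind_tag(mnemonic: &str) -> Option<&'static str> {
    match mnemonic {
        "bl" | "bla" => Some("call"),
        "b" | "ba" => Some("j"),
        m if m.starts_with("bc") => Some("br"), // simplified `bc*` match
        _ => None,
    }
}

/// Derive xrefs the way the view does: only rows with a non-NULL target
/// and a branch mnemonic contribute a tuple.
fn derive_sql_side(rows: &[(u64, &str, Option<u64>)]) -> HashSet<Xref> {
    rows.iter()
        .filter_map(|&(addr, mnemonic, target)| {
            let kind = kind_tag(mnemonic)?;
            Some((addr, target?, kind))
        })
        .collect()
}

/// Symmetric difference reported as `(sql_only, rust_only)` counts.
fn cross_check(sql: &HashSet<Xref>, rust: &HashSet<Xref>) -> (u64, u64) {
    let sql_only = sql.difference(rust).count() as u64;
    let rust_only = rust.difference(sql).count() as u64;
    (sql_only, rust_only)
}

fn main() {
    // Made-up instruction rows: (address, mnemonic, target_hex).
    let rows: [(u64, &str, Option<u64>); 4] = [
        (0x8200_0000, "mflr", None),
        (0x8200_0004, "bl", Some(0x8200_1000)),
        (0x8200_0008, "b", Some(0x8200_0020)),
        (0x8200_000c, "bc", Some(0x8200_0030)),
    ];
    let sql = derive_sql_side(&rows);

    // What the Rust pass would have written to `xrefs` (short tags).
    let rust: HashSet<Xref> = HashSet::from([
        (0x8200_0004, 0x8200_1000, "call"),
        (0x8200_0008, 0x8200_0020, "j"),
        (0x8200_000c, 0x8200_0030, "br"),
    ]);
    assert_eq!(cross_check(&sql, &rust), (0, 0));

    // Reproduce the kind-tag mismatch: long names on the SQL side only.
    let sql_long: HashSet<Xref> = sql
        .iter()
        .map(|&(s, t, k)| match k {
            "j" => (s, t, "jump"),
            "br" => (s, t, "branch"),
            _ => (s, t, k),
        })
        .collect();
    // Only the "call" row still matches; jump and branch differ on both sides.
    assert_eq!(cross_check(&sql_long, &rust), (2, 2));
    println!("cross-check sketch ok");
}
```

Comparing whole `(source, target, kind)` tuples rather than just addresses is what makes a tag-naming drift visible: the addresses still line up, but every jump and branch row falls out of the intersection.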