xenia-rs/crates/xenia-analysis/src/db.rs
MechaCat02 e428ce33aa M9.5 + M11.5 + VMX + SJIS/UTF-8: close the post-M5.5 deferred set
Closes the four remaining deferred follow-up items in one bundle.
All four are smaller-scope and additive; lockstep determinism
unaffected (analyzer-only changes).

## M9.5 — __CxxFrameHandler scope-table parsing

- New `xenia_analysis::eh_scope` module. Magic-scans .rdata for the
  three documented MSVC FuncInfo signatures (0x19930520/21/22) on
  4-byte alignment. Each match is parsed as the documented struct
  (BE u32 fields), with sanity caps on max_state / n_try_blocks /
  pointer validity.
- Walks pUnwindMap (UnwindMapEntry, 8 bytes) and pTryBlockMap
  (TryBlockMapEntry, 20 bytes) into one row each.
- New tables eh_funcinfo, eh_unwind_map, eh_try_blocks.
- Sylpheed yield: 2,588 FuncInfo (all version 0x19930522) /
  10,019 unwind entries / 315 try-blocks.
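
The magic scan described above could look roughly like the following. This is a minimal sketch, not the `eh_scope` implementation: `FuncInfoHeader`, `scan_funcinfo`, and both cap constants are hypothetical names and values, only the first five fields are parsed, and the pointer-validity check is left out.

```rust
/// Hypothetical, simplified view of the leading FuncInfo fields.
#[derive(Debug, PartialEq)]
struct FuncInfoHeader {
    offset: usize, // byte offset of the match inside the scanned slice
    magic: u32,
    max_state: u32,
    p_unwind_map: u32,
    n_try_blocks: u32,
    p_try_block_map: u32,
}

const MAGICS: [u32; 3] = [0x1993_0520, 0x1993_0521, 0x1993_0522];
// Assumed sanity caps; the commit does not state the real thresholds.
const MAX_STATE_CAP: u32 = 0x1_0000;
const MAX_TRY_BLOCKS_CAP: u32 = 0x1000;

/// Read a big-endian u32, returning None when the read would run off the end.
fn be_u32(data: &[u8], off: usize) -> Option<u32> {
    let b = data.get(off..off.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Try to parse a FuncInfo header at `off`; None on magic or sanity failure.
fn parse_at(data: &[u8], off: usize) -> Option<FuncInfoHeader> {
    let magic = be_u32(data, off)?;
    if !MAGICS.contains(&magic) {
        return None;
    }
    let header = FuncInfoHeader {
        offset: off,
        magic,
        max_state: be_u32(data, off + 4)?,
        p_unwind_map: be_u32(data, off + 8)?,
        n_try_blocks: be_u32(data, off + 12)?,
        p_try_block_map: be_u32(data, off + 16)?,
    };
    if header.max_state > MAX_STATE_CAP || header.n_try_blocks > MAX_TRY_BLOCKS_CAP {
        return None; // a magic-looking word followed by implausible counts
    }
    Some(header)
}

/// Scan every 4-byte-aligned offset for a plausible FuncInfo header.
fn scan_funcinfo(rdata: &[u8]) -> Vec<FuncInfoHeader> {
    (0..rdata.len())
        .step_by(4)
        .filter_map(|off| parse_at(rdata, off))
        .collect()
}
```

The caps matter because the three magic values can also occur as ordinary data; rejecting implausible `max_state` / `n_try_blocks` values is what keeps such hits out of the result.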

## M11.5 — Static-init driver chain detection

- New `xenia_analysis::static_init` module. Walks every function
  looking for the canonical _initterm loop: lwz cursor; mtctr;
  bcctrl; addi cursor, cursor, 4 bounded by a compare against another
  constant register. Extracts (array_start, array_end) and reads
  the array.
- Reuses `function_pointer_arrays` table — drivers' arrays land with
  kind='static_init' (replacing M11's prologue-heuristic output where
  the structurally-grounded pattern fires).
- Sylpheed yield: 0 drivers detected — the binary's static-init
  structure does not match the canonical CRT loop. Infrastructure
  ready; a future M11.6 can relax the pattern.
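
To illustrate the shape of the loop match, here is a rough sketch over a toy instruction model. The `Insn` enum and `find_initterm_loop` are hypothetical and much looser than a real detector would be: the sketch tolerates arbitrary unrelated instructions between the steps and does not check that the bound register holds a constant.

```rust
/// Hypothetical, heavily simplified instruction model; a real pass would
/// work on decoded PowerPC instruction words.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Insn {
    Lwz { rt: u8, ra: u8, disp: i16 },
    Mtctr { rs: u8 },
    Bcctrl,
    Addi { rt: u8, ra: u8, imm: i16 },
    Cmplw { ra: u8, rb: u8 },
    Other,
}

/// Look for: lwz fn, 0(cursor); mtctr fn; bcctrl; addi cursor, cursor, 4;
/// followed by a compare of the cursor against another register.
/// Returns (cursor_reg, end_reg) for the first match.
fn find_initterm_loop(body: &[Insn]) -> Option<(u8, u8)> {
    for i in 0..body.len() {
        // Step 1: load of the next function pointer through the cursor.
        let (fn_reg, cursor) = match body[i] {
            Insn::Lwz { rt, ra, disp: 0 } => (rt, ra),
            _ => continue,
        };
        // Steps 2-5 must appear in order after the load.
        let mut stage = 0;
        for insn in &body[i + 1..] {
            match (stage, *insn) {
                (0, Insn::Mtctr { rs }) if rs == fn_reg => stage = 1,
                (1, Insn::Bcctrl) => stage = 2,
                (2, Insn::Addi { rt, ra, imm: 4 }) if rt == cursor && ra == cursor => stage = 3,
                (3, Insn::Cmplw { ra, rb }) if ra == cursor => return Some((cursor, rb)),
                (3, Insn::Cmplw { ra, rb }) if rb == cursor => return Some((cursor, ra)),
                _ => {}
            }
        }
    }
    None
}
```

In this sketch the returned register pair only names the cursor and the bound; turning them into `(array_start, array_end)` would additionally need the constant values loaded into those registers.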

## VMX vector-store xrefs (M6 follow-up)

- Adds AltiVec/VMX X-form load/store XOs to the M6 opcode-31
  dispatch: lvx/lvxl/lvebx/lvehx/lvewx (reads) and
  stvx/stvxl/stvebx/stvehx/stvewx (writes), all addr_mode=
  'x_form_indexed'. Static resolution still requires both rA and rB
  constant.
- Sylpheed yield: 110 newly-detected stvx writes.
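
As an illustration of the opcode-31 dispatch, the sketch below classifies the ten X-form extended opcodes listed above. `classify_vmx_xform`, `resolve_vector_ea`, and `Access` are hypothetical names, not the M6 code; the XO values are the standard AltiVec encodings, and the 16-byte masking in `resolve_vector_ea` applies to the `lvx`/`stvx` family only (element variants align to their element size instead).

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Access {
    Read,
    Write,
}

/// Classify an AltiVec/VMX X-form load/store under primary opcode 31.
/// Returns (mnemonic, access, rA, rB), or None for any other instruction.
fn classify_vmx_xform(raw: u32) -> Option<(&'static str, Access, u8, u8)> {
    if raw >> 26 != 31 {
        return None;
    }
    let xo = (raw >> 1) & 0x3FF; // 10-bit extended opcode
    let (name, access) = match xo {
        7 => ("lvebx", Access::Read),
        39 => ("lvehx", Access::Read),
        71 => ("lvewx", Access::Read),
        103 => ("lvx", Access::Read),
        359 => ("lvxl", Access::Read),
        135 => ("stvebx", Access::Write),
        167 => ("stvehx", Access::Write),
        199 => ("stvewx", Access::Write),
        231 => ("stvx", Access::Write),
        487 => ("stvxl", Access::Write),
        _ => return None,
    };
    let ra = ((raw >> 16) & 0x1F) as u8;
    let rb = ((raw >> 11) & 0x1F) as u8;
    Some((name, access, ra, rb))
}

/// Effective address for lvx/lvxl/stvx/stvxl: (rA|0) + rB with the low four
/// bits cleared. None unless every register that is actually read is a
/// known constant.
fn resolve_vector_ea(ra: u8, ra_val: Option<u32>, rb_val: Option<u32>) -> Option<u32> {
    let base = if ra == 0 { 0 } else { ra_val? };
    Some(base.wrapping_add(rb_val?) & !0xF)
}
```

For example, `0x7C0321CE` encodes `stvx v0, r3, r4`, so `classify_vmx_xform(0x7C0321CE)` yields `("stvx", Access::Write, 3, 4)`.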

## Shift_JIS + UTF-8 localised-string detection (M7 follow-up)

- Extends `xenia_analysis::strings::analyze` with scan_shift_jis (JIS
  X 0208 lead/trail byte ranges + half-width katakana pass-through)
  and scan_utf8 (2- and 3-byte sequences). At least one multi-byte
  unit required so pure-ASCII strings aren't double-counted.
- SJIS bytes rendered as \xHH escapes for diagnostic readability;
  full SJIS→UTF-8 decoding deferred.
- Sylpheed yield: 790 Shift_JIS strings (Japanese debug + UI text)
  + 39 UTF-8.
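
A minimal sketch of the Shift_JIS pass, assuming NUL-terminated candidates and the common JIS X 0208 byte ranges (lead 0x81-0x9F / 0xE0-0xEF, trail 0x40-0x7E / 0x80-0xFC). The function below is a hypothetical stand-in that only shares its name with the real `scan_shift_jis`; its signature, the printable-ASCII range, and the uppercase escape format are assumptions.

```rust
fn is_sjis_lead(b: u8) -> bool {
    matches!(b, 0x81..=0x9F | 0xE0..=0xEF)
}

fn is_sjis_trail(b: u8) -> bool {
    matches!(b, 0x40..=0x7E | 0x80..=0xFC)
}

fn is_halfwidth_katakana(b: u8) -> bool {
    matches!(b, 0xA1..=0xDF)
}

/// Scan a NUL-terminated candidate starting at data[0].
/// Returns (byte length without the NUL, escaped rendering) only when the
/// run is well-formed and contains at least one double-byte unit.
fn scan_shift_jis(data: &[u8]) -> Option<(usize, String)> {
    let mut i = 0;
    let mut double_byte_units = 0;
    let mut rendered = String::new();
    while i < data.len() {
        let b = data[i];
        if b == 0 {
            // Terminator reached: accept only if something was multi-byte,
            // so pure-ASCII strings are left to the ASCII scanner.
            return if double_byte_units > 0 { Some((i, rendered)) } else { None };
        } else if (0x20..=0x7E).contains(&b) {
            rendered.push(b as char);
            i += 1;
        } else if is_halfwidth_katakana(b) {
            // Single-byte katakana passes through but does not count.
            rendered.push_str(&format!("\\x{:02X}", b));
            i += 1;
        } else if is_sjis_lead(b) && i + 1 < data.len() && is_sjis_trail(data[i + 1]) {
            rendered.push_str(&format!("\\x{:02X}\\x{:02X}", b, data[i + 1]));
            double_byte_units += 1;
            i += 2;
        } else {
            return None; // control byte, stray trail byte, or truncated pair
        }
    }
    None // ran off the end without a terminator
}
```

Escaping both bytes of a pair, rather than each byte independently, avoids printing a trail byte such as 0x65 as a misleading ASCII `e`.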

## Tests

- +2 EH (parses_minimal_funcinfo_v0, rejects_bogus_max_state)
- +2 static_init (detects_canonical_initterm_loop, rejects_function_without_pattern)
- +2 strings (detects_shift_jis_string, detects_utf8_multibyte_string)

Tests 649→655 (+6 unit tests). DB schema golden + write_analysis_results
signature updated for new EH parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 00:36:53 +02:00

//! DuckDB writer for xenia-rs.
//!
//! Layered, streaming writes shared by `extract`, `dis`, and `exec`.
//! Each command's output is a superset of the previous:
//! - `extract --db` -> base tables (metadata, sections, imports)
//! - `dis --db` -> base + disasm tables (functions, labels, instructions, xrefs)
//! - `exec --db` -> base + disasm + opt-in trace tables (exec_trace, import_calls, branch_trace)
//!
//! Bulk inserts use the DuckDB Appender API, which bypasses the SQL layer and
//! writes directly to columnar storage — no transaction batching required.
//!
//! Trace kind values for `branch_trace.kind`:
//! - `"call"` : any branch with LK set (raw & 1 == 1)
//! - `"return"` : bclrx without LK
//! - `"jump"` : bcctrx without LK
//! - `"branch"` : bx/bcx without LK
//!
//! # Schema
//!
//! ## `metadata`
//! Key-value table. One row per XEX header field. Values are strings.
//!
//! | key | value format | meaning |
//! |--------------------|------------------|----------------------------------------------------|
//! | `image_base` | `"0xXXXXXXXX"` | Virtual address where the PE image is mapped |
//! | `entry_point` | `"0xXXXXXXXX"` | Absolute VA of the XEX entry point |
//! | `original_pe_name` | string | Original PE filename from XEX optional headers |
//! | `title_id` | `"0xXXXXXXXX"` | Xbox 360 Title ID (identifies the game) |
//! | `media_id` | `"0xXXXXXXXX"` | Disc/media ID (identifies the specific disc build) |
//!
//! ## `sections`
//! One row per PE section (`.text`, `.data`, etc.).
//! - `name` — PE section name
//! - `virtual_address` — RVA relative to `image_base` where the section is mapped in memory
//! - `virtual_size` — Size in memory; may exceed `raw_size` due to BSS zero-fill
//! - `raw_offset` — Byte offset of section data within the XEX/PE file
//! - `raw_size` — Size of section data on disk
//! - `flags` — `IMAGE_SCN_*` characteristics bit field
//! - `is_code` — `true` if `IMAGE_SCN_CNT_CODE` is set
//!
//! ## `imports`
//! One row per import record from the XEX import descriptor table.
//! - `library` — Module name (e.g. `xboxkrnl.exe`, `xam.xex`)
//! - `ordinal` — Numeric ordinal identifying the export within the library
//! - `name` — Resolved human-readable symbol name; `NULL` if not in symbol table
//! - `record_type` — XEX import record type: `0` = function thunk, `1` = variable
//! - `address` — Absolute VA of the import thunk or variable in the binary
//!
//! ## `functions`
//! One row per detected function (from prologue analysis).
//! - `address` — Absolute VA of the function entry point (PK)
//! - `name` — Symbol name, or `sub_XXXXXXXX` if unresolved
//! - `end_address` — Absolute VA of last instruction + 4 (exclusive end)
//! - `frame_size` — Stack frame size in bytes (from prologue)
//! - `saved_gprs` — Bitmask of GPRs saved in prologue (bit N set ⇒ rN is saved)
//! - `is_leaf` — `true` if the function has no outgoing calls (no linking branches such as `bl`; a plain `blr` is the return)
//! - `is_saverestore` — `true` if this is a `__savegprlr_*`/`__restgprlr_*` compiler stub
//!
//! ## `labels`
//! One row per named address; superset of functions.
//! - `address` — Absolute VA (PK)
//! - `name` — Symbol name
//! - `kind` — One of: `function`, `import`, `saverestore`, `local`, `data`, `other`
//!
//! ## `instructions`
//! One row per disassembled instruction.
//! - `address` — Absolute VA (PK)
//! - `raw` — 4-byte big-endian instruction word as integer
//! - `mnemonic` — Base mnemonic (e.g. `stw`, `bl`, `cmpwi`)
//! - `operands` — Operand string from base disassembly
//! - `disasm` — Full base disassembly string (`mnemonic + " " + operands`)
//! - `ext_mnemonic` — Simplified mnemonic (e.g. `mr` for `or rX,rY,rY`); `NULL` if none
//! - `ext_operands` — Operands for the extended form; `NULL` if none
//! - `ext_disasm` — Full extended disassembly string; `NULL` if none
//! - `target_hex` — Resolved absolute branch target for `b`/`bc` (and link/AA variants); `NULL` for indirect or non-branch instructions. SQL views (`v_branch_xrefs`) self-join on this column.
//! - `section` — Name of the PE section containing this instruction
//! - `function` — VA of the enclosing function; `NULL` if not inside a detected function
//! - `label` — Label name at this address; `NULL` if none
//!
//! ## `xrefs`
//! One row per cross-reference edge (call, jump, data access).
//! - `source` — Absolute VA of the instruction making the reference
//! - `target` — Absolute VA of the referenced destination
//! - `kind` — Reference type as the short tag from [`crate::xref::XrefKind::tag`]:
//! `call`, `ind_call` (typed indirect call, M5.5), `j` (jump), `br` (branch),
//! `read` (data_read), `write` (data_write), `ref` (data_ref).
//! Note: this is a different convention from `branch_trace.kind`,
//! which uses the long names (`call` / `return` / `jump` / `branch`).
//! - `instruction` — Mnemonic of the source instruction; `NULL` if address is not in binary
//! - `source_func` — VA of the function containing `source`; `NULL` if unknown
//! - `source_label` — Label at `source`; `NULL` if none
//! - `target_label` — Label at `target`; `NULL` if none
//!
//! ## `exec_trace` *(opt-in: `--trace-instructions`)*
//! One row per executed instruction.
//! - `address` — Absolute VA of the instruction
//! - `cycle` — Monotonic instruction counter (execution order)
//! - `r3`, `r4`, `lr`, `sp` — Snapshot of key GPRs at time of execution
//!
//! ## `import_calls` *(opt-in: `--trace-imports`)*
//! One row per intercepted kernel/import call.
//! - `address` — VA of the import thunk
//! - `cycle` — Instruction counter at point of interception
//! - `module` — Library name (e.g. `xboxkrnl.exe`)
//! - `ordinal` — Numeric ordinal within the module
//! - `name` — Resolved symbol name
//! - `arg_r3`–`arg_r6` — First four call arguments (PowerPC ABI: r3–r6)
//! - `return_value` — Value in r3 after the call returns
//!
//! ## `branch_trace` *(opt-in: `--trace-branches`)*
//! One row per taken branch.
//! - `cycle` — Instruction counter
//! - `source` — VA of the branch instruction
//! - `target` — VA of the branch destination
//! - `kind` — `call`, `return`, `jump`, or `branch` (see top-level doc)
//! - `lr` — Link register value at time of branch
use std::collections::HashMap;
use std::path::Path;
use duckdb::{Connection, params};
use crate::func::FuncAnalysis;
use crate::xref::{XrefMap, resolve_source_label};
use crate::formatter::DisasmInfo;
/// Fallback rows-per-flush when `XENIA_DB_BATCH_SIZE` is unset, unparsable, or zero.
const DEFAULT_BATCH_SIZE: u64 = 100_000;
/// Rows per trace buffer flush. Configurable via `XENIA_DB_BATCH_SIZE` env var (default 100_000).
/// Applies to `exec_trace` and `branch_trace` buffer thresholds.
/// `import_calls` always flushes at 1000 — low volume, not worth scaling.
fn batch_size() -> u64 {
use std::sync::OnceLock;
static CACHED: OnceLock<u64> = OnceLock::new();
*CACHED.get_or_init(|| {
std::env::var("XENIA_DB_BATCH_SIZE")
.ok()
.and_then(|s| s.parse::<u64>().ok())
.filter(|&n| n > 0)
.unwrap_or(DEFAULT_BATCH_SIZE)
})
}
/// One buffered `exec_trace` row: the executed address, its cycle, and a GPR snapshot.
pub struct ExecTraceEntry {
pub address: u32,
pub cycle: u64,
pub r3: u64,
pub r4: u64,
pub lr: u64,
pub sp: u64,
}
/// One buffered `import_calls` row describing an intercepted import call.
pub struct ImportCallEntry {
pub address: u32,
pub cycle: u64,
pub module: String,
pub ordinal: u16,
pub name: String,
pub arg_r3: u64,
pub arg_r4: u64,
pub arg_r5: u64,
pub arg_r6: u64,
pub return_value: u64,
}
/// One buffered `branch_trace` row. `kind` is `call`, `return`, `jump`, or `branch`.
pub struct BranchTraceEntry {
pub source: u32,
pub target: u32,
pub cycle: u64,
pub kind: &'static str,
pub lr: u64,
}
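/// Owns the DuckDB connection plus the per-table trace buffers and row counters.
/// The trace flags stay `false` until [`Self::prepare_trace_tables`] sets them.
pub struct DbWriter {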
conn: Connection,
exec_buffer: Vec<ExecTraceEntry>,
import_buffer: Vec<ImportCallEntry>,
branch_buffer: Vec<BranchTraceEntry>,
exec_count: u64,
import_count: u64,
branch_count: u64,
trace_instructions: bool,
trace_imports: bool,
trace_branches: bool,
}
impl DbWriter {
/// Open a fresh database at `path`, removing any existing file first.
pub fn open_fresh(path: &Path) -> anyhow::Result<Self> {
if path.exists() {
std::fs::remove_file(path)?;
}
let conn = Connection::open(path)?;
let cap = batch_size() as usize;
Ok(Self {
conn,
exec_buffer: Vec::with_capacity(cap),
import_buffer: Vec::with_capacity(1024),
branch_buffer: Vec::with_capacity(cap),
exec_count: 0,
import_count: 0,
branch_count: 0,
trace_instructions: false,
trace_imports: false,
trace_branches: false,
})
}
// ── Base layer (written by extract/dis/exec) ─────────────────────────────
/// Write metadata, sections, imports tables and their indices.
#[tracing::instrument(skip_all, name = "db.write_base")]
pub fn write_base(&mut self, info: &DisasmInfo) -> anyhow::Result<()> {
self.conn.execute_batch("
CREATE TABLE metadata (
key VARCHAR PRIMARY KEY, -- header field name
value VARCHAR NOT NULL -- hex-formatted or plain string value
);
CREATE TABLE sections (
name VARCHAR NOT NULL, -- PE section name (e.g. .text, .rdata)
virtual_address BIGINT NOT NULL, -- RVA relative to image_base
virtual_size BIGINT NOT NULL, -- size in memory; may exceed raw_size (BSS)
raw_offset BIGINT NOT NULL, -- byte offset of section data in the file
raw_size BIGINT NOT NULL, -- size of section data on disk
flags BIGINT NOT NULL, -- IMAGE_SCN_* characteristics bit field
is_code BOOLEAN NOT NULL -- true if IMAGE_SCN_CNT_CODE is set
);
CREATE TABLE imports (
library VARCHAR NOT NULL, -- module name (e.g. xboxkrnl.exe, xam.xex)
ordinal BIGINT NOT NULL, -- ordinal identifying the export within the library
name VARCHAR, -- resolved symbol name; NULL if not in symbol table
record_type BIGINT NOT NULL, -- 0 = function thunk, 1 = variable
address BIGINT NOT NULL -- absolute VA of the thunk or variable
);
")?;
insert_metadata(&self.conn, info)?;
insert_sections(&self.conn, info.sections)?;
insert_imports(&self.conn, info)?;
self.conn.execute_batch("
CREATE INDEX idx_imports_library ON imports(library);
CREATE INDEX idx_imports_name ON imports(name);
")?;
Ok(())
}
// ── Disasm layer (written by dis/exec) ───────────────────────────────────
/// Phase-3 ingest pass — purely mechanical disasm rows. Creates the
/// `instructions` table (and its indices) and streams every code-section
/// instruction through the iterator + DuckDB sink. Does NOT touch
/// `functions` / `labels` / `xrefs` — that's [`Self::write_analysis_results`].
///
/// `func_analysis` and `labels` are still required at this layer because
/// each row carries the rolling-window `function` and `label` columns for
/// downstream queries.
#[tracing::instrument(skip_all, name = "db.ingest_instructions")]
pub fn ingest_instructions(
&mut self,
pe: &[u8],
info: &DisasmInfo,
func_analysis: &FuncAnalysis,
labels: &HashMap<u32, String>,
) -> anyhow::Result<()> {
self.conn.execute_batch("
CREATE TABLE instructions (
address BIGINT PRIMARY KEY, -- absolute VA
raw BIGINT NOT NULL, -- 4-byte big-endian instruction word as integer
mnemonic VARCHAR NOT NULL, -- base mnemonic (e.g. stw, bl, cmpwi)
operands VARCHAR NOT NULL, -- operand string from base disassembly
disasm VARCHAR NOT NULL, -- full base disassembly (mnemonic + operands)
ext_mnemonic VARCHAR, -- simplified mnemonic (e.g. mr); NULL if none
ext_operands VARCHAR, -- operands for the extended form; NULL if none
ext_disasm VARCHAR, -- full extended disassembly string; NULL if none
target_hex BIGINT, -- resolved absolute target for direct branches; NULL for indirect/non-branch
section VARCHAR NOT NULL, -- PE section name containing this instruction
function BIGINT, -- VA of the enclosing function; NULL if unknown
label VARCHAR -- label at this address; NULL if none
);
")?;
insert_instructions_streaming(&self.conn, pe, info, func_analysis, labels)?;
let indices = [
("idx_instructions_function", "CREATE INDEX idx_instructions_function ON instructions(function)"),
("idx_instructions_mnemonic", "CREATE INDEX idx_instructions_mnemonic ON instructions(mnemonic)"),
("idx_instructions_ext_mnemonic", "CREATE INDEX idx_instructions_ext_mnemonic ON instructions(ext_mnemonic)"),
("idx_instructions_section", "CREATE INDEX idx_instructions_section ON instructions(section)"),
("idx_instructions_label", "CREATE INDEX idx_instructions_label ON instructions(label)"),
("idx_instructions_target_hex", "CREATE INDEX idx_instructions_target_hex ON instructions(target_hex)"),
];
for (name, sql) in indices {
tracing::debug!(index = name, "creating instructions index");
self.conn.execute_batch(sql)?;
}
Ok(())
}
/// Phase-3 analyze pass — writes the Rust-pass-derived tables
/// (`functions`, `labels`, `xrefs`) and their indices. Always executes
/// in `--analyze=rust` and `--analyze=both` modes; skipped only when
/// the caller deliberately chooses a Rust-free DB layout.
///
/// `vtables` is the M3 result; pass an empty slice when the caller has
/// not run the vtable scan (the tables are still created, just empty).
/// `strings` is the M7 result; same convention. `funcptr_arrays` is the
/// M8/M11 result. `typed_ind` is the M5.5 result. `eh_records` is the
/// M9.5 result.
#[tracing::instrument(skip_all, name = "db.write_analysis_results")]
pub fn write_analysis_results(
&mut self,
pe: &[u8],
info: &DisasmInfo,
func_analysis: &FuncAnalysis,
labels: &HashMap<u32, String>,
xrefs: &XrefMap,
vtables: &[crate::vtables::Vtable],
strings: &[crate::strings::DetectedString],
funcptr_arrays: &[crate::funcptr_arrays::FuncPtrArray],
typed_ind: Option<&crate::ind_dispatch_typed::TypedIndirectResult>,
eh_records: &[crate::eh_scope::EhFuncInfo],
) -> anyhow::Result<()> {
self.conn.execute_batch("
CREATE TABLE functions (
address BIGINT PRIMARY KEY, -- absolute VA of entry point
name VARCHAR NOT NULL, -- symbol name, or sub_XXXXXXXX if unresolved
end_address BIGINT NOT NULL, -- VA of last instruction + 4 (exclusive end)
frame_size BIGINT NOT NULL, -- stack frame size in bytes (from prologue)
saved_gprs BIGINT NOT NULL, -- bitmask of GPRs saved in prologue (bit N = rN)
is_leaf BOOLEAN NOT NULL, -- true if the function has no outgoing calls
is_saverestore BOOLEAN NOT NULL, -- true if __savegprlr_* / __restgprlr_* stub
pdata_validated BOOLEAN NOT NULL, -- true if .pdata RUNTIME_FUNCTION exists at this VA
pdata_length BIGINT, -- length in bytes per .pdata; NULL if no pdata entry
has_eh BOOLEAN NOT NULL -- M9: pdata exception-flag bit set; function has C++ EH/SEH
);
CREATE TABLE pdata_entries (
begin_address BIGINT PRIMARY KEY, -- absolute VA of function start (RUNTIME_FUNCTION.BeginAddress)
end_address BIGINT NOT NULL, -- begin_address + function_length (exclusive)
function_length BIGINT NOT NULL, -- function size in bytes
prolog_length BIGINT NOT NULL, -- prolog size in bytes
flags BIGINT NOT NULL -- raw 2-bit flags (bit 1=32-bit-code, bit 0=exception)
);
CREATE TABLE labels (
address BIGINT PRIMARY KEY, -- absolute VA
name VARCHAR NOT NULL, -- symbol name
kind VARCHAR NOT NULL -- function | import | saverestore | local | data | other
);
CREATE TABLE vtables (
address BIGINT PRIMARY KEY, -- absolute VA of vtable[0]
length BIGINT NOT NULL, -- number of method slots
col_address BIGINT, -- VA of CompleteObjectLocator (NULL when no RTTI)
class_name VARCHAR NOT NULL, -- demangled class name OR ANON_Class_<hash> when stripped
rtti_present BOOLEAN NOT NULL, -- true when COL → TypeDescriptor walk succeeded
base_classes_json VARCHAR -- JSON array of base class names (NULL if none / parse failure)
);
CREATE TABLE methods (
vtable_address BIGINT NOT NULL, -- vtable this slot belongs to
slot BIGINT NOT NULL, -- 0-based slot index
function_address BIGINT NOT NULL, -- VA of the function this slot points at
mangled_name VARCHAR, -- raw label name when mangled (?...)
demangled_name VARCHAR, -- LLVM-style demangled output
PRIMARY KEY (vtable_address, slot)
);
CREATE TABLE classes (
name VARCHAR PRIMARY KEY, -- class name (demangled or ANON_*)
vtable_address BIGINT NOT NULL, -- representative vtable (first detected)
rtti_present BOOLEAN NOT NULL,
base_classes_json VARCHAR -- JSON of base class names (NULL when stripped)
);
CREATE TABLE strings (
address BIGINT PRIMARY KEY, -- absolute VA of first byte
encoding VARCHAR NOT NULL, -- 'ascii' or 'utf16le'
length BIGINT NOT NULL, -- length in bytes (excluding NUL terminator)
content VARCHAR NOT NULL -- UTF-8 representation of the string
);
CREATE TABLE tls_info (
raw_data_start BIGINT NOT NULL, -- VA of TLS template start
raw_data_end BIGINT NOT NULL, -- VA one-past-end of TLS template
index_address BIGINT NOT NULL, -- VA of u32 the loader writes the assigned slot index into
callback_array BIGINT NOT NULL, -- VA of zero-terminated callback array (0 if none)
zero_fill_size BIGINT NOT NULL, -- bytes of zero-fill appended after raw template
characteristics BIGINT NOT NULL -- IMAGE_TLS_DIRECTORY characteristics flags
);
CREATE TABLE tls_callbacks (
slot BIGINT PRIMARY KEY, -- 0-based index in the callback array
address BIGINT NOT NULL -- VA of callback function
);
CREATE TABLE function_pointer_arrays (
address BIGINT PRIMARY KEY, -- absolute VA of the array's first slot
length BIGINT NOT NULL, -- number of slots
kind VARCHAR NOT NULL -- 'vtable' (M3) | 'dispatch_table' (M8) | 'static_init' (M11)
);
CREATE TABLE function_pointer_array_entries (
array_address BIGINT NOT NULL, -- FK to function_pointer_arrays.address
slot BIGINT NOT NULL, -- 0-based slot index
function_address BIGINT NOT NULL, -- VA of the function this slot points at
PRIMARY KEY (array_address, slot)
);
-- M5.5 — typed indirect-dispatch resolutions. Each row is one
-- bcctrl site that matched the canonical lwz vt, off(this);
-- lwz fn, slot(vt); mtctr; bcctrl pattern. candidate_count > 1
-- means the analysis could not pick a single class; downstream
-- queries should treat such rows as reachability-only.
CREATE TABLE indirect_dispatch_sites (
dispatch_pc BIGINT PRIMARY KEY,
vptr_offset BIGINT NOT NULL,
slot BIGINT NOT NULL,
candidate_count BIGINT NOT NULL
);
-- M5.5 — one row per (dispatch site × candidate vtable). The
-- ind_call xref edges in the `xrefs` table are derived from
-- this; this table lets you join back to vtable / method info.
CREATE TABLE indirect_dispatch_candidates (
dispatch_pc BIGINT NOT NULL,
vtable_address BIGINT NOT NULL,
method_address BIGINT NOT NULL,
PRIMARY KEY (dispatch_pc, vtable_address)
);
-- M5.5 — every detected `stw rVtable, vptr_off(rThis)` writer
-- found in any function. Useful for diagnosing why a class
-- has (or does not have) coverage in the dispatch resolver.
CREATE TABLE vptr_writes (
writer_pc BIGINT NOT NULL,
vtable_address BIGINT NOT NULL,
vptr_offset BIGINT NOT NULL,
writer_function BIGINT NOT NULL,
PRIMARY KEY (writer_pc, vtable_address, vptr_offset)
);
-- M9.5 — MSVC __CxxFrameHandler scope-table records found by
-- magic-number scan in .rdata.
CREATE TABLE eh_funcinfo (
address BIGINT PRIMARY KEY,
magic BIGINT NOT NULL, -- 0x19930520/21/22
max_state BIGINT NOT NULL,
p_unwind_map BIGINT NOT NULL,
n_try_blocks BIGINT NOT NULL,
p_try_block_map BIGINT NOT NULL,
n_ip_map_entries BIGINT NOT NULL,
p_ip_to_state_map BIGINT NOT NULL,
p_es_type_list BIGINT,
eh_flags BIGINT
);
CREATE TABLE eh_unwind_map (
funcinfo_address BIGINT NOT NULL, -- FK to eh_funcinfo.address
state_index BIGINT NOT NULL,
to_state BIGINT NOT NULL,
action_pc BIGINT NOT NULL,
PRIMARY KEY (funcinfo_address, state_index)
);
CREATE TABLE eh_try_blocks (
funcinfo_address BIGINT NOT NULL, -- FK to eh_funcinfo.address
try_index BIGINT NOT NULL,
try_low BIGINT NOT NULL,
try_high BIGINT NOT NULL,
catch_high BIGINT NOT NULL,
n_catches BIGINT NOT NULL,
p_handler_array BIGINT NOT NULL,
PRIMARY KEY (funcinfo_address, try_index)
);
CREATE TABLE demangled_names (
address BIGINT, -- VA the mangled name is associated with; NULL when from a non-address source (e.g. RTTI-only string)
mangled VARCHAR NOT NULL, -- original mangled symbol (e.g. ?Foo@Bar@@QEAAXXZ)
raw_demangled VARCHAR NOT NULL, -- LLVM-style demangled output (or mangled string on parse failure)
namespace_path VARCHAR, -- e.g. xe::apu (NULL = global / parser failure)
class_name VARCHAR, -- e.g. AudioSystem (NULL = free function / parser failure)
method_name VARCHAR, -- e.g. Setup (NULL on parser failure)
params_signature VARCHAR -- contents of the outermost (...) (NULL = not a function)
);
CREATE TABLE xrefs (
source BIGINT NOT NULL, -- VA of the referencing instruction
target BIGINT NOT NULL, -- VA of the referenced destination
kind VARCHAR NOT NULL, -- call | ind_call | j | br | read | write | ref
addr_mode VARCHAR, -- M6 sub-classification of how source computes target (NULL for control-flow)
instruction VARCHAR, -- mnemonic of source instruction; NULL if not in binary
source_func BIGINT, -- VA of the function containing source; NULL if unknown
source_label VARCHAR, -- label at source; NULL if none
target_label VARCHAR -- label at target; NULL if none
);
")?;
insert_functions(&self.conn, func_analysis, labels)?;
insert_pdata_entries(&self.conn, &func_analysis.pdata_entries)?;
insert_labels(&self.conn, labels)?;
insert_demangled_from_labels(&self.conn, labels, info.import_libraries)?;
insert_vtables(&self.conn, vtables, pe, info.image_base)?;
insert_methods_and_classes(&self.conn, vtables, labels)?;
insert_strings(&self.conn, strings)?;
insert_funcptr_arrays(&self.conn, funcptr_arrays)?;
if let Some(t) = typed_ind {
insert_typed_ind_dispatch(&self.conn, t)?;
}
insert_eh_records(&self.conn, eh_records)?;
insert_xrefs_streaming(&self.conn, xrefs, pe, info.image_base, func_analysis, labels)?;
let indices = [
("idx_functions_name", "CREATE INDEX idx_functions_name ON functions(name)"),
("idx_functions_pdata_validated", "CREATE INDEX idx_functions_pdata_validated ON functions(pdata_validated)"),
("idx_functions_has_eh", "CREATE INDEX idx_functions_has_eh ON functions(has_eh)"),
("idx_labels_kind", "CREATE INDEX idx_labels_kind ON labels(kind)"),
("idx_labels_name", "CREATE INDEX idx_labels_name ON labels(name)"),
("idx_demangled_address", "CREATE INDEX idx_demangled_address ON demangled_names(address)"),
("idx_demangled_class", "CREATE INDEX idx_demangled_class ON demangled_names(class_name)"),
("idx_demangled_method", "CREATE INDEX idx_demangled_method ON demangled_names(method_name)"),
("idx_methods_function", "CREATE INDEX idx_methods_function ON methods(function_address)"),
("idx_classes_rtti", "CREATE INDEX idx_classes_rtti ON classes(rtti_present)"),
("idx_strings_encoding", "CREATE INDEX idx_strings_encoding ON strings(encoding)"),
("idx_xrefs_addr_mode", "CREATE INDEX idx_xrefs_addr_mode ON xrefs(addr_mode)"),
("idx_fparrays_kind", "CREATE INDEX idx_fparrays_kind ON function_pointer_arrays(kind)"),
("idx_fpentries_function", "CREATE INDEX idx_fpentries_function ON function_pointer_array_entries(function_address)"),
("idx_indcand_method", "CREATE INDEX idx_indcand_method ON indirect_dispatch_candidates(method_address)"),
("idx_indcand_vtable", "CREATE INDEX idx_indcand_vtable ON indirect_dispatch_candidates(vtable_address)"),
("idx_indsites_offset_slot", "CREATE INDEX idx_indsites_offset_slot ON indirect_dispatch_sites(vptr_offset, slot)"),
("idx_vptrw_vtable", "CREATE INDEX idx_vptrw_vtable ON vptr_writes(vtable_address)"),
("idx_vptrw_offset", "CREATE INDEX idx_vptrw_offset ON vptr_writes(vptr_offset)"),
("idx_xrefs_target", "CREATE INDEX idx_xrefs_target ON xrefs(target)"),
("idx_xrefs_source", "CREATE INDEX idx_xrefs_source ON xrefs(source)"),
("idx_xrefs_source_func", "CREATE INDEX idx_xrefs_source_func ON xrefs(source_func)"),
("idx_xrefs_kind", "CREATE INDEX idx_xrefs_kind ON xrefs(kind)"),
("idx_xrefs_instruction", "CREATE INDEX idx_xrefs_instruction ON xrefs(instruction)"),
("idx_xrefs_target_label", "CREATE INDEX idx_xrefs_target_label ON xrefs(target_label)"),
];
for (name, sql) in indices {
tracing::debug!(index = name, "creating analysis index");
self.conn.execute_batch(sql)?;
}
Ok(())
}
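/// Back-compat wrapper for callers that want the full pre-Phase-3
/// "everything in one shot" behaviour. Equivalent to
/// `ingest_instructions` + `write_analysis_results` with every optional
/// analysis input left empty: no M3 vtables, M7 strings, M8/M11 function
/// pointer arrays, M5.5 typed indirect dispatch, or M9.5 EH records.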
#[tracing::instrument(skip_all, name = "db.write_disasm")]
pub fn write_disasm(
&mut self,
pe: &[u8],
info: &DisasmInfo,
func_analysis: &FuncAnalysis,
labels: &HashMap<u32, String>,
xrefs: &XrefMap,
) -> anyhow::Result<()> {
self.ingest_instructions(pe, info, func_analysis, labels)?;
self.write_analysis_results(pe, info, func_analysis, labels, xrefs, &[], &[], &[], None, &[])?;
Ok(())
}
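/// M10 — write the parsed `.tls` directory + callback array. No-op
/// when `tls` is `None` (binary has no `.tls` section). Must run after
/// [`Self::write_analysis_results`], which creates the `tls_info` and
/// `tls_callbacks` tables.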
#[tracing::instrument(skip_all, name = "db.write_tls")]
pub fn write_tls(
&mut self,
tls: Option<&xenia_xex::tls::TlsInfo>,
) -> anyhow::Result<()> {
let Some(t) = tls else { return Ok(()); };
self.conn.execute(
"INSERT INTO tls_info (raw_data_start, raw_data_end, index_address,
callback_array, zero_fill_size, characteristics)
VALUES (?, ?, ?, ?, ?, ?)",
params![
t.raw_data_start as i64,
t.raw_data_end as i64,
t.index_address as i64,
t.callback_array as i64,
t.zero_fill_size as i64,
t.characteristics as i64,
],
)?;
let mut stmt = self.conn.prepare(
"INSERT INTO tls_callbacks (slot, address) VALUES (?, ?)"
)?;
for (i, cb) in t.callbacks.iter().enumerate() {
stmt.execute(params![i as i64, cb.address as i64])?;
}
metrics::counter!("db.rows", "table" => "tls_callbacks").increment(t.callbacks.len() as u64);
tracing::info!(rows = t.callbacks.len(), table = "tls_callbacks", "tls write complete");
Ok(())
}
/// Phase-3 SQL-views layer — defines additive read-only views over
/// `instructions` (and optionally `xrefs`/`functions`/`labels`).
/// See [`crate::sql_views`] for the SQL definitions.
///
/// Called when `--analyze=sql` or `--analyze=both` is in effect.
#[tracing::instrument(skip_all, name = "db.create_sql_views")]
pub fn create_sql_views(&mut self) -> anyhow::Result<()> {
for (name, sql) in crate::sql_views::ALL_VIEWS {
tracing::debug!(view = name, "creating SQL view");
self.conn.execute_batch(sql)?;
}
Ok(())
}
/// Cross-check: count branch xrefs found by the SQL view that are absent
/// from the Rust-pass `xrefs` table (and vice versa). Returns
/// `(sql_only, rust_only)` row counts. Both should be zero — the two
/// surfaces produce identical edges by construction. A non-zero count
/// signals drift between the formatter's `mnemonic` column and
/// `xref.rs`'s opcode classification, and is logged as a warning by the
/// caller.
#[tracing::instrument(skip_all, name = "db.cross_check_branch_xrefs")]
pub fn cross_check_branch_xrefs(&self) -> anyhow::Result<(u64, u64)> {
let sql_only: i64 = self.conn.query_row(
"SELECT COUNT(*) FROM v_branch_xrefs vb \
LEFT JOIN xrefs x \
ON x.source = vb.source AND x.target = vb.target AND x.kind = vb.kind \
WHERE x.source IS NULL",
[], |row| row.get(0)
)?;
let rust_only: i64 = self.conn.query_row(
"SELECT COUNT(*) FROM xrefs x \
LEFT JOIN v_branch_xrefs vb \
ON vb.source = x.source AND vb.target = x.target AND vb.kind = x.kind \
WHERE x.kind IN ('call','j','br') AND vb.source IS NULL",
[], |row| row.get(0)
)?;
Ok((sql_only as u64, rust_only as u64))
}
// ── Trace layer (written by exec when flags enabled) ─────────────────────
/// Create the opt-in trace tables. No-op if all flags are false.
pub fn prepare_trace_tables(
&mut self,
trace_instructions: bool,
trace_imports: bool,
trace_branches: bool,
) -> anyhow::Result<()> {
self.trace_instructions = trace_instructions;
self.trace_imports = trace_imports;
self.trace_branches = trace_branches;
if trace_instructions {
self.conn.execute_batch("
CREATE TABLE exec_trace (
address BIGINT NOT NULL, -- absolute VA of the instruction
cycle BIGINT NOT NULL, -- monotonic instruction counter (execution order)
r3 BIGINT NOT NULL, -- r3 at time of execution
r4 BIGINT NOT NULL, -- r4 at time of execution
lr BIGINT NOT NULL, -- link register
sp BIGINT NOT NULL -- stack pointer
);
")?;
}
if trace_imports {
self.conn.execute_batch("
CREATE TABLE import_calls (
address BIGINT NOT NULL, -- VA of the import thunk
cycle BIGINT NOT NULL, -- instruction counter at interception
module VARCHAR NOT NULL, -- library name (e.g. xboxkrnl.exe)
ordinal BIGINT NOT NULL, -- ordinal within the module
name VARCHAR NOT NULL, -- resolved symbol name
arg_r3 BIGINT NOT NULL, -- first argument (r3)
arg_r4 BIGINT NOT NULL, -- second argument (r4)
arg_r5 BIGINT NOT NULL, -- third argument (r5)
arg_r6 BIGINT NOT NULL, -- fourth argument (r6)
return_value BIGINT NOT NULL -- r3 after the call returns
);
")?;
}
if trace_branches {
self.conn.execute_batch("
CREATE TABLE branch_trace (
cycle BIGINT NOT NULL, -- instruction counter
source BIGINT NOT NULL, -- VA of the branch instruction
target BIGINT NOT NULL, -- VA of the branch destination
kind VARCHAR NOT NULL, -- call | return | jump | branch
lr BIGINT NOT NULL -- link register at time of branch
);
")?;
}
Ok(())
}
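/// Buffer one `exec_trace` row, flushing once the buffer holds `batch_size()` rows.
/// No-op unless instruction tracing was enabled via `prepare_trace_tables`.
pub fn log_instruction(&mut self, entry: ExecTraceEntry) {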
if !self.trace_instructions { return; }
self.exec_buffer.push(entry);
if self.exec_buffer.len() as u64 >= batch_size() {
self.flush_exec();
}
}
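/// Buffer one `import_calls` row, flushing at a fixed 1000-row threshold.
/// No-op unless import tracing was enabled via `prepare_trace_tables`.
pub fn log_import_call(&mut self, entry: ImportCallEntry) {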
if !self.trace_imports { return; }
self.import_buffer.push(entry);
if self.import_buffer.len() >= 1000 {
self.flush_imports();
}
}
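/// Buffer one branch-trace row; flushes to `branch_trace` once the buffer
/// holds `batch_size()` rows. No-op unless branch tracing is enabled.
pub fn log_branch(&mut self, entry: BranchTraceEntry) {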
if !self.trace_branches { return; }
self.branch_buffer.push(entry);
if self.branch_buffer.len() as u64 >= batch_size() {
self.flush_branches();
}
}
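/// Append the buffered rows to `exec_trace` and clear the buffer.
/// Panics if the appender cannot be created. `append_row` and `flush` errors
/// are discarded, so `exec_count` counts buffered rows, not confirmed writes.
fn flush_exec(&mut self) {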
if self.exec_buffer.is_empty() { return; }
let mut appender = self.conn.appender("exec_trace").unwrap();
for e in &self.exec_buffer {
appender.append_row(params![
e.address as i64,
e.cycle as i64,
e.r3 as i64,
e.r4 as i64,
e.lr as i64,
e.sp as i64,
]).ok();
}
appender.flush().ok();
self.exec_count += self.exec_buffer.len() as u64;
self.exec_buffer.clear();
}
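/// Append the buffered rows to `import_calls` and clear the buffer.
/// Panics if the appender cannot be created. `append_row` and `flush` errors
/// are discarded, so `import_count` counts buffered rows, not confirmed writes.
fn flush_imports(&mut self) {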
if self.import_buffer.is_empty() { return; }
let mut appender = self.conn.appender("import_calls").unwrap();
for e in &self.import_buffer {
appender.append_row(params![
e.address as i64,
e.cycle as i64,
e.module.as_str(),
e.ordinal as i64,
e.name.as_str(),
e.arg_r3 as i64,
e.arg_r4 as i64,
e.arg_r5 as i64,
e.arg_r6 as i64,
e.return_value as i64,
]).ok();
}
appender.flush().ok();
self.import_count += self.import_buffer.len() as u64;
self.import_buffer.clear();
}
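/// Append the buffered rows to `branch_trace` and clear the buffer.
/// Panics if the appender cannot be created. `append_row` and `flush` errors
/// are discarded, so `branch_count` counts buffered rows, not confirmed writes.
fn flush_branches(&mut self) {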
if self.branch_buffer.is_empty() { return; }
let mut appender = self.conn.appender("branch_trace").unwrap();
for e in &self.branch_buffer {
appender.append_row(params![
e.cycle as i64,
e.source as i64,
e.target as i64,
e.kind,
e.lr as i64,
]).ok();
}
appender.flush().ok();
self.branch_count += self.branch_buffer.len() as u64;
self.branch_buffer.clear();
}
/// Flush remaining trace buffers and create their indices.
#[tracing::instrument(skip_all, name = "db.finalize_traces")]
pub fn finalize_traces(&mut self) -> anyhow::Result<()> {
self.flush_exec();
self.flush_imports();
self.flush_branches();
if self.trace_instructions {
tracing::debug!("creating idx_exec_trace_address");
self.conn.execute_batch("CREATE INDEX idx_exec_trace_address ON exec_trace(address);")?;
tracing::debug!("creating idx_exec_trace_cycle");
self.conn.execute_batch("CREATE INDEX idx_exec_trace_cycle ON exec_trace(cycle);")?;
}
if self.trace_imports {
tracing::debug!("creating idx_import_calls_name");
self.conn.execute_batch("CREATE INDEX idx_import_calls_name ON import_calls(name);")?;
tracing::debug!("creating idx_import_calls_cycle");
self.conn.execute_batch("CREATE INDEX idx_import_calls_cycle ON import_calls(cycle);")?;
}
if self.trace_branches {
tracing::debug!("creating idx_branch_trace_source");
self.conn.execute_batch("CREATE INDEX idx_branch_trace_source ON branch_trace(source);")?;
tracing::debug!("creating idx_branch_trace_target");
self.conn.execute_batch("CREATE INDEX idx_branch_trace_target ON branch_trace(target);")?;
tracing::debug!("creating idx_branch_trace_kind");
self.conn.execute_batch("CREATE INDEX idx_branch_trace_kind ON branch_trace(kind);")?;
tracing::debug!("creating idx_branch_trace_cycle");
self.conn.execute_batch("CREATE INDEX idx_branch_trace_cycle ON branch_trace(cycle);")?;
}
metrics::counter!("db.rows", "table" => "exec_trace").increment(self.exec_count);
metrics::counter!("db.rows", "table" => "import_calls").increment(self.import_count);
metrics::counter!("db.rows", "table" => "branch_trace").increment(self.branch_count);
tracing::info!(
instructions = self.exec_count,
imports = self.import_count,
branches = self.branch_count,
"trace totals"
);
Ok(())
}
}
/// Backwards-compatible wrapper that writes the full base + disasm layers.
/// `_import_map` is kept in the signature but is not used.
pub fn write_db(
path: &Path,
pe: &[u8],
info: &DisasmInfo,
func_analysis: &FuncAnalysis,
labels: &HashMap<u32, String>,
_import_map: &HashMap<u32, String>,
xrefs: &XrefMap,
) -> anyhow::Result<()> {
let mut w = DbWriter::open_fresh(path)?;
w.write_base(info)?;
w.write_disasm(pe, info, func_analysis, labels, xrefs)?;
Ok(())
}
// ── Helpers ────────────────────────────────────────────────────────────────
fn insert_metadata(conn: &Connection, info: &DisasmInfo) -> anyhow::Result<()> {
let mut stmt = conn.prepare("INSERT INTO metadata (key, value) VALUES (?, ?)")?;
stmt.execute(params!["image_base", format!("0x{:08X}", info.image_base)])?;
stmt.execute(params!["entry_point", format!("0x{:08X}", info.entry_point)])?;
if let Some(name) = info.original_pe_name {
stmt.execute(params!["original_pe_name", name])?;
}
if let Some(title_id) = info.title_id {
stmt.execute(params!["title_id", format!("0x{:08X}", title_id)])?;
}
if let Some(media_id) = info.media_id {
stmt.execute(params!["media_id", format!("0x{:08X}", media_id)])?;
}
Ok(())
}
fn insert_sections(conn: &Connection, sections: &[xenia_xex::pe::PeSection]) -> anyhow::Result<()> {
let mut stmt = conn.prepare(
"INSERT INTO sections (name, virtual_address, virtual_size, raw_offset, raw_size, flags, is_code)
VALUES (?, ?, ?, ?, ?, ?, ?)"
)?;
for s in sections {
stmt.execute(params![
s.name,
s.virtual_address as i64,
s.virtual_size as i64,
s.raw_offset as i64,
s.raw_size as i64,
s.flags as i64,
s.is_code(),
])?;
}
Ok(())
}
fn insert_imports(conn: &Connection, info: &DisasmInfo) -> anyhow::Result<()> {
let mut stmt = conn.prepare(
"INSERT INTO imports (library, ordinal, name, record_type, address)
VALUES (?, ?, ?, ?, ?)"
)?;
for lib in info.import_libraries {
for imp in &lib.imports {
let resolved = crate::resolve_ordinal(&lib.name, imp.ordinal);
stmt.execute(params![
lib.name,
imp.ordinal as i64,
resolved,
imp.record_type as i64,
imp.address as i64,
])?;
}
}
Ok(())
}
fn insert_functions(
conn: &Connection,
func_analysis: &FuncAnalysis,
labels: &HashMap<u32, String>,
) -> anyhow::Result<()> {
let mut stmt = conn.prepare(
"INSERT INTO functions
(address, name, end_address, frame_size, saved_gprs, is_leaf, is_saverestore,
pdata_validated, pdata_length, has_eh)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)"
)?;
for (&addr, fi) in &func_analysis.functions {
let name = labels.get(&addr)
.cloned()
.unwrap_or_else(|| format!("sub_{addr:08X}"));
stmt.execute(params![
addr as i64,
name,
fi.end as i64,
fi.frame_size as i64,
fi.saved_gprs as i64,
fi.is_leaf,
fi.is_saverestore,
fi.pdata_validated,
fi.pdata_length.map(|n| n as i64),
fi.has_eh,
])?;
}
Ok(())
}
fn insert_vtables(
conn: &Connection,
vtables: &[crate::vtables::Vtable],
_pe: &[u8],
_image_base: u32,
) -> anyhow::Result<()> {
if vtables.is_empty() { return Ok(()); }
let mut stmt = conn.prepare(
"INSERT INTO vtables
(address, length, col_address, class_name, rtti_present, base_classes_json)
VALUES (?, ?, ?, ?, ?, ?)
ON CONFLICT DO NOTHING"
)?;
let mut count = 0u64;
for v in vtables {
stmt.execute(params![
v.address as i64,
v.length as i64,
v.col_address.map(|a| a as i64),
v.class_name.as_str(),
v.rtti_present,
v.base_classes_json.as_deref(),
])?;
count += 1;
}
metrics::counter!("db.rows", "table" => "vtables").increment(count);
tracing::info!(rows = count, table = "vtables", "bulk insert complete");
Ok(())
}
fn insert_methods_and_classes(
conn: &Connection,
vtables: &[crate::vtables::Vtable],
labels: &HashMap<u32, String>,
) -> anyhow::Result<()> {
if vtables.is_empty() { return Ok(()); }
// methods rows
let methods = crate::vtables::methods_table(vtables, labels);
if !methods.is_empty() {
let mut stmt = conn.prepare(
"INSERT INTO methods
(vtable_address, slot, function_address, mangled_name, demangled_name)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT DO NOTHING"
)?;
for (vt_addr, slot, fn_addr, mangled, demangled) in &methods {
stmt.execute(params![
*vt_addr as i64,
*slot as i64,
*fn_addr as i64,
mangled.as_deref(),
demangled.as_deref(),
])?;
}
metrics::counter!("db.rows", "table" => "methods").increment(methods.len() as u64);
tracing::info!(rows = methods.len(), table = "methods", "bulk insert complete");
}
// classes rows (deduped by class_name, first-detected wins)
let classes = crate::vtables::classes_table(vtables);
if !classes.is_empty() {
let mut stmt = conn.prepare(
"INSERT INTO classes
(name, vtable_address, rtti_present, base_classes_json)
VALUES (?, ?, ?, ?)
ON CONFLICT DO NOTHING"
)?;
for (name, vt_addr, rtti, bases) in &classes {
stmt.execute(params![
name.as_str(),
*vt_addr as i64,
*rtti,
bases.as_deref(),
])?;
}
metrics::counter!("db.rows", "table" => "classes").increment(classes.len() as u64);
tracing::info!(rows = classes.len(), table = "classes", "bulk insert complete");
}
Ok(())
}
fn insert_strings(
conn: &Connection,
strings: &[crate::strings::DetectedString],
) -> anyhow::Result<()> {
if strings.is_empty() { return Ok(()); }
let mut stmt = conn.prepare(
"INSERT INTO strings (address, encoding, length, content) VALUES (?, ?, ?, ?)
ON CONFLICT DO NOTHING"
)?;
let mut count = 0u64;
for s in strings {
stmt.execute(params![
s.address as i64,
s.encoding,
s.length as i64,
s.content.as_str(),
])?;
count += 1;
}
metrics::counter!("db.rows", "table" => "strings").increment(count);
tracing::info!(rows = count, table = "strings", "bulk insert complete");
Ok(())
}
fn insert_eh_records(
conn: &Connection,
records: &[crate::eh_scope::EhFuncInfo],
) -> anyhow::Result<()> {
if records.is_empty() { return Ok(()); }
let mut stmt_fi = conn.prepare(
"INSERT INTO eh_funcinfo
(address, magic, max_state, p_unwind_map, n_try_blocks,
p_try_block_map, n_ip_map_entries, p_ip_to_state_map,
p_es_type_list, eh_flags)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)
ON CONFLICT DO NOTHING"
)?;
let mut stmt_unwind = conn.prepare(
"INSERT INTO eh_unwind_map
(funcinfo_address, state_index, to_state, action_pc)
VALUES (?, ?, ?, ?) ON CONFLICT DO NOTHING"
)?;
let mut stmt_try = conn.prepare(
"INSERT INTO eh_try_blocks
(funcinfo_address, try_index, try_low, try_high, catch_high,
n_catches, p_handler_array)
VALUES (?, ?, ?, ?, ?, ?, ?)
ON CONFLICT DO NOTHING"
)?;
let mut n_fi = 0u64;
let mut n_unwind = 0u64;
let mut n_try = 0u64;
for r in records {
stmt_fi.execute(params![
r.address as i64, r.magic as i64, r.max_state as i64,
r.p_unwind_map as i64, r.n_try_blocks as i64,
r.p_try_block_map as i64, r.n_ip_map_entries as i64,
r.p_ip_to_state_map as i64,
r.p_es_type_list.map(|p| p as i64),
r.eh_flags.map(|f| f as i64),
])?;
n_fi += 1;
for (i, e) in r.unwind_map.iter().enumerate() {
stmt_unwind.execute(params![
r.address as i64, i as i64, e.to_state as i64, e.action_pc as i64,
])?;
n_unwind += 1;
}
for (i, t) in r.try_blocks.iter().enumerate() {
stmt_try.execute(params![
r.address as i64, i as i64,
t.try_low as i64, t.try_high as i64, t.catch_high as i64,
t.n_catches as i64, t.p_handler_array as i64,
])?;
n_try += 1;
}
}
metrics::counter!("db.rows", "table" => "eh_funcinfo").increment(n_fi);
metrics::counter!("db.rows", "table" => "eh_unwind_map").increment(n_unwind);
metrics::counter!("db.rows", "table" => "eh_try_blocks").increment(n_try);
tracing::info!(
funcinfo = n_fi, unwind = n_unwind, try_blocks = n_try,
"EH scope-table insert complete"
);
Ok(())
}
fn insert_typed_ind_dispatch(
conn: &Connection,
t: &crate::ind_dispatch_typed::TypedIndirectResult,
) -> anyhow::Result<()> {
if !t.dispatches.is_empty() {
let mut stmt_site = conn.prepare(
"INSERT INTO indirect_dispatch_sites
(dispatch_pc, vptr_offset, slot, candidate_count)
VALUES (?, ?, ?, ?) ON CONFLICT DO NOTHING"
)?;
let mut stmt_cand = conn.prepare(
"INSERT INTO indirect_dispatch_candidates
(dispatch_pc, vtable_address, method_address)
VALUES (?, ?, ?) ON CONFLICT DO NOTHING"
)?;
let mut n_sites = 0u64;
let mut n_cand = 0u64;
for d in &t.dispatches {
stmt_site.execute(params![
d.dispatch_pc as i64,
d.vptr_offset as i64,
d.slot as i64,
d.candidate_vtables.len() as i64,
])?;
n_sites += 1;
for (vt, m) in d.candidate_vtables.iter().zip(d.method_pcs.iter()) {
stmt_cand.execute(params![
d.dispatch_pc as i64, *vt as i64, *m as i64,
])?;
n_cand += 1;
}
}
metrics::counter!("db.rows", "table" => "indirect_dispatch_sites").increment(n_sites);
metrics::counter!("db.rows", "table" => "indirect_dispatch_candidates").increment(n_cand);
tracing::info!(sites = n_sites, candidates = n_cand, "typed indirect-dispatch insert complete");
}
if !t.vptr_writes.is_empty() {
let mut stmt = conn.prepare(
"INSERT INTO vptr_writes
(writer_pc, vtable_address, vptr_offset, writer_function)
VALUES (?, ?, ?, ?) ON CONFLICT DO NOTHING"
)?;
let mut n = 0u64;
for w in &t.vptr_writes {
stmt.execute(params![
w.writer_pc as i64,
w.vtable_addr as i64,
w.vptr_offset as i64,
w.writer_function as i64,
])?;
n += 1;
}
metrics::counter!("db.rows", "table" => "vptr_writes").increment(n);
tracing::info!(rows = n, "vptr_writes insert complete");
}
Ok(())
}
fn insert_funcptr_arrays(
conn: &Connection,
arrays: &[crate::funcptr_arrays::FuncPtrArray],
) -> anyhow::Result<()> {
if arrays.is_empty() { return Ok(()); }
let mut stmt_arr = conn.prepare(
"INSERT INTO function_pointer_arrays (address, length, kind) VALUES (?, ?, ?)
ON CONFLICT DO NOTHING"
)?;
let mut stmt_ent = conn.prepare(
"INSERT INTO function_pointer_array_entries (array_address, slot, function_address)
VALUES (?, ?, ?) ON CONFLICT DO NOTHING"
)?;
let mut n_arr = 0u64;
let mut n_ent = 0u64;
for a in arrays {
let inserted = stmt_arr.execute(params![
a.address as i64, a.length as i64, a.kind,
])?;
if inserted > 0 { n_arr += 1; }
for (i, &fn_va) in a.entries.iter().enumerate() {
// Count only rows actually inserted, matching `n_arr`; a conflict reports 0.
n_ent += stmt_ent.execute(params![a.address as i64, i as i64, fn_va as i64])? as u64;
}
}
metrics::counter!("db.rows", "table" => "function_pointer_arrays").increment(n_arr);
metrics::counter!("db.rows", "table" => "function_pointer_array_entries").increment(n_ent);
tracing::info!(arrays = n_arr, entries = n_ent, "function-pointer arrays insert complete");
Ok(())
}
fn insert_demangled_from_labels(
conn: &Connection,
labels: &HashMap<u32, String>,
import_libraries: &[xenia_xex::header::ImportLibrary],
) -> anyhow::Result<()> {
let mut stmt = conn.prepare(
"INSERT INTO demangled_names
(address, mangled, raw_demangled, namespace_path, class_name,
method_name, params_signature)
VALUES (?, ?, ?, ?, ?, ?, ?)"
)?;
let mut count = 0u64;
for (&addr, name) in labels {
// The label table holds raw symbol names (`?...@...`). Imports are
// wrapped as `__imp_<lib>_<sym>`; skip them here instead of stripping
// the prefix, because the `import_libraries` loop below demangles any
// import whose resolved name is mangled (rare for kernel imports).
if name.starts_with("__imp_") {
continue;
}
if let Some(d) = crate::demangle::demangle(name) {
stmt.execute(params![
addr as i64,
d.mangled,
d.raw_demangled,
d.namespace_path,
d.class_name,
d.method_name,
d.params_signature,
])?;
count += 1;
}
}
// Defensive: also demangle any import name that happens to be mangled.
for lib in import_libraries {
for imp in &lib.imports {
let resolved = crate::resolve_ordinal(&lib.name, imp.ordinal);
if let Some(name) = resolved
&& let Some(d) = crate::demangle::demangle(name)
{
stmt.execute(params![
imp.address as i64,
d.mangled,
d.raw_demangled,
d.namespace_path,
d.class_name,
d.method_name,
d.params_signature,
])?;
count += 1;
}
}
}
metrics::counter!("db.rows", "table" => "demangled_names").increment(count);
tracing::info!(rows = count, table = "demangled_names", "demangler complete");
Ok(())
}
fn insert_pdata_entries(
conn: &Connection,
entries: &[xenia_xex::pdata::PdataEntry],
) -> anyhow::Result<()> {
if entries.is_empty() {
return Ok(());
}
let mut stmt = conn.prepare(
"INSERT INTO pdata_entries
(begin_address, end_address, function_length, prolog_length, flags)
VALUES (?, ?, ?, ?, ?)
ON CONFLICT DO NOTHING"
)?;
for e in entries {
stmt.execute(params![
e.begin_address as i64,
e.end_address() as i64,
e.function_length as i64,
e.prolog_length as i64,
e.flags as i64,
])?;
}
Ok(())
}
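/// Insert every label, deriving `kind` from its name: `sub_*` and `entry_point`
/// are functions, `__imp_*` imports, `__savegprlr_*`/`__restgprlr_*` save/restore
/// stubs, `loc_*` locals, `dat_*` data, and anything else `other`.
fn insert_labels(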
conn: &Connection,
labels: &HashMap<u32, String>,
) -> anyhow::Result<()> {
let mut stmt = conn.prepare(
"INSERT INTO labels (address, name, kind) VALUES (?, ?, ?) ON CONFLICT DO NOTHING"
)?;
for (&addr, name) in labels {
let kind = if name.starts_with("sub_") || name == "entry_point" {
"function"
} else if name.starts_with("__imp_") {
"import"
} else if name.starts_with("__savegprlr_") || name.starts_with("__restgprlr_") {
"saverestore"
} else if name.starts_with("loc_") {
"local"
} else if name.starts_with("dat_") {
"data"
} else {
"other"
};
stmt.execute(params![addr as i64, name, kind])?;
}
Ok(())
}
fn insert_instructions_streaming(
conn: &Connection,
pe: &[u8],
info: &DisasmInfo,
func_analysis: &FuncAnalysis,
labels: &HashMap<u32, String>,
) -> anyhow::Result<()> {
let mut appender = conn.appender("instructions")?;
let mut total: u64 = 0;
for section in info.sections {
if !section.is_code() { continue; }
let va_start = info.image_base + section.virtual_address;
let va_end = info.image_base + section.virtual_address + section.virtual_size;
let items = crate::disasm::enrich_section(
pe, info.image_base, &section.name, va_start, va_end, func_analysis, labels,
);
total += crate::sinks::duckdb::append_instructions(&mut appender, items)?;
}
appender.flush()?;
metrics::counter!("db.rows", "table" => "instructions").increment(total);
tracing::info!(rows = total, table = "instructions", "bulk insert complete");
Ok(())
}
fn insert_xrefs_streaming(
conn: &Connection,
xrefs: &XrefMap,
pe: &[u8],
image_base: u32,
func_analysis: &FuncAnalysis,
labels: &HashMap<u32, String>,
) -> anyhow::Result<()> {
let mut appender = conn.appender("xrefs")?;
let mut count: u64 = 0;
for (&target, refs) in xrefs {
let target_label = labels.get(&target).map(|s| s.as_str());
for xref in refs {
let kind = xref.kind.db_tag();
let instruction: Option<String> = {
let off = xref.source.wrapping_sub(image_base) as usize;
if off + 4 <= pe.len() {
let raw = u32::from_be_bytes([pe[off], pe[off+1], pe[off+2], pe[off+3]]);
let d = xenia_cpu::decode(raw, xref.source);
let t = xenia_cpu::disasm::format(&d);
// Prefer the simplified mnemonic when present (matches what
// a human reading the .asm file sees for that line).
Some(t.ext_mnemonic.unwrap_or(t.mnemonic))
} else {
None
}
};
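// Nearest function start at or below the source address; this lookup does
// not check the source against that function's end address.
let source_func = func_analysis.functions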
.range(..=xref.source)
.next_back()
.map(|(&a, _)| a as i64);
let source_label = resolve_source_label(
xref.source, func_analysis, labels,
);
let addr_mode = xref.addr_mode.map(|m| m.tag());
appender.append_row(params![
xref.source as i64,
target as i64,
kind,
addr_mode,
instruction.as_deref(),
source_func,
source_label.as_str(),
target_label,
])?;
count += 1;
}
}
appender.flush()?;
metrics::counter!("db.rows", "table" => "xrefs").increment(count);
tracing::info!(rows = count, table = "xrefs", "bulk insert complete");
Ok(())
}