xenia-analysis schema reference
Authoritative documentation for the DuckDB tables and SQL views produced by
xenia-rs dis --db sylpheed.db. Track schema changes here alongside any
update to the db_schema_golden test fixture.
The base + disasm tables (metadata, sections, imports, functions,
labels, instructions, xrefs, opt-in exec_trace / import_calls /
branch_trace) are documented inline in the src/db.rs doc comment. This file
collects layered analysis additions and forward-work notes.
Layer M1 — .pdata boundary correction (landed)
Schema additions
- functions.pdata_validated BOOLEAN NOT NULL: true when the row's address matches a RUNTIME_FUNCTION.BeginAddress from .pdata. Linker ground truth.
- functions.pdata_length BIGINT NULL: function_length (bytes) from the matching pdata entry; NULL when the row is prologue-only.
- New table pdata_entries(begin_address BIGINT PRIMARY KEY, end_address BIGINT, function_length BIGINT, prolog_length BIGINT, flags BIGINT): every parsed .pdata RUNTIME_FUNCTION entry (raw, before any merge with prologue analysis).
- Index idx_functions_pdata_validated on functions(pdata_validated).
What this layer does
- Parses .pdata 8-byte RUNTIME_FUNCTION entries (PowerPC PE32 layout): word 0 BeginAddress (absolute VA), word 1 packed {prolog_length:8, function_length:22, flags:2}, both big-endian.
- Unions pdata BeginAddress values into the function-candidate set fed to the prologue walker, so functions our prologue heuristic missed still get rows.
- When pdata supplies a longer function_length than the prologue walk found, extends end_address to the pdata-implied end (catches mis-split where the walker stopped at an early blr).
- After the walker, performs a forward pass that trims function.end to the next start when they overlap (catches mis-merge where one row spanned two prologues, as in the audit-031 sub_824D23B0 / sub_824D29F0 case).
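The decode and trim steps above can be sketched in Rust as follows. This is a minimal illustration, not the crate's actual code: the struct and function names are hypothetical, and the bit positions inside word 1 (prolog_length in the low 8 bits, function_length in the next 22, flags in the top 2) are an assumption that should be checked against the real parser. The sketch leaves function_length in whatever unit the entry stores and does not scale it.

```rust
/// One decoded .pdata RUNTIME_FUNCTION entry. Hypothetical struct; the field
/// names mirror the pdata_entries columns.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PdataEntry {
    pub begin_address: u32,
    pub prolog_length: u32,
    pub function_length: u32,
    pub flags: u32,
}

/// Decodes one 8-byte entry made of two big-endian 32-bit words.
/// ASSUMPTION: within word 1, prolog_length occupies the low 8 bits,
/// function_length the next 22 bits, and flags the top 2 bits.
pub fn decode_pdata_entry(raw: [u8; 8]) -> PdataEntry {
    let word0 = u32::from_be_bytes([raw[0], raw[1], raw[2], raw[3]]);
    let word1 = u32::from_be_bytes([raw[4], raw[5], raw[6], raw[7]]);
    PdataEntry {
        begin_address: word0,
        prolog_length: word1 & 0xFF,
        function_length: (word1 >> 8) & 0x3F_FFFF,
        flags: word1 >> 30,
    }
}

/// Forward pass: sorts by start address, then trims each function's end to
/// the next function's start when the two ranges overlap.
/// `funcs` holds (start, end) pairs with an exclusive end.
pub fn trim_overlaps(funcs: &mut [(u32, u32)]) {
    funcs.sort_unstable_by_key(|f| f.0);
    for i in 0..funcs.len().saturating_sub(1) {
        let next_start = funcs[i + 1].0;
        if funcs[i].1 > next_start {
            funcs[i].1 = next_start;
        }
    }
}
```

Calling trim_overlaps on a row that spans both sub_824D23B0 and sub_824D29F0 would cut the first row at 0x824D29F0, which is the shape of the audit-031 fix described above.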
What this layer does NOT do
- Does not adjust prolog-derived frame_size / saved_gprs from .pdata's prolog_length field; those remain prologue-only inferences.
- Does not classify functions further than the existing is_leaf / is_saverestore columns. Class membership is M3.
- Does not detect functions whose entries are missing from BOTH .pdata and the bl-target scan (extremely rare; would require executable-byte linear sweep).
Reference docs
- Microsoft PE32+ exception data spec for PowerPC RUNTIME_FUNCTION.
- xenia-canary src/xenia/cpu/xex_module.cc:1570-1587: canary's reference parser (extracts BeginAddress only; we additionally decode word 1).
Validation queries
-- All pdata entries found
SELECT COUNT(*) FROM pdata_entries; -- ~23073 for Sylpheed
-- Functions cross-validated against pdata
SELECT COUNT(*) FROM functions WHERE pdata_validated;
-- Functions detected ONLY by prologue (orphans of pdata)
SELECT COUNT(*) FROM functions WHERE NOT pdata_validated;
-- Pdata orphans NOT yet in functions (should be 0 after this layer)
SELECT COUNT(*) FROM pdata_entries p
LEFT JOIN functions f ON f.address = p.begin_address
WHERE f.address IS NULL;
-- Audit-031 mis-merge resolved: 0x824D29F0 should have its own row
SELECT name FROM functions WHERE address = 2186095088; -- 0x824D29F0
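Address columns are BIGINT, so queries take a guest address in decimal form. A minimal Rust sketch of the conversion in both directions follows; the helper names are hypothetical and not part of the crate. With it, 0x824D29F0 maps to 2186095088.

```rust
/// Converts a 32-bit guest virtual address into the signed 64-bit value a
/// BIGINT column holds. Zero-extension keeps addresses above 0x7FFF_FFFF
/// positive. Hypothetical helper, for illustration only.
pub fn va_to_bigint(va: u32) -> i64 {
    i64::from(va)
}

/// Formats a non-negative BIGINT address back into 0x-prefixed hex.
pub fn bigint_to_hex(addr: i64) -> String {
    format!("0x{:08X}", addr)
}
```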
Layer M2 — MSVC C++ name demangler (landed)
Schema additions
- New table demangled_names(address BIGINT NULL, mangled VARCHAR NOT NULL, raw_demangled VARCHAR NOT NULL, namespace_path VARCHAR NULL, class_name VARCHAR NULL, method_name VARCHAR NULL, params_signature VARCHAR NULL).
- Indices on address, class_name, method_name.
What this layer does
- Wraps msvc_demangler::demangle (a Rust port of LLVM's MicrosoftDemangle.cpp) and splits the formatted output into structured fields via a heuristic top-level parser (handles templates and nested parens correctly).
- Populates demangled_names from any label whose name starts with ? plus any import name that happens to be mangled (defensive; typical kernel imports use C names).
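The prefix check and the top-level split described above can be sketched as follows. This is an illustrative approximation under stated assumptions, not the crate's implementation: all function names are hypothetical, the input is an already-demangled qualified name without its parameter list, and operator names containing < or > (such as operator<) would confuse the depth counter.

```rust
/// Returns true when `name` looks MSVC-mangled (leading '?').
/// Hypothetical helper mirroring the prefix check.
pub fn looks_mangled(name: &str) -> bool {
    name.starts_with('?')
}

/// Splits a qualified name on `::` separators that sit outside any
/// `<...>` or `(...)` nesting.
pub fn split_top_level(qname: &str) -> Vec<&str> {
    let bytes = qname.as_bytes();
    let mut parts = Vec::new();
    let mut depth = 0usize;
    let mut start = 0usize;
    let mut i = 0usize;
    while i < bytes.len() {
        match bytes[i] {
            b'<' | b'(' => depth += 1,
            b'>' | b')' => depth = depth.saturating_sub(1),
            b':' if depth == 0 && bytes.get(i + 1) == Some(&b':') => {
                // Top-level separator: close the current part, skip both colons.
                parts.push(&qname[start..i]);
                i += 2;
                start = i;
                continue;
            }
            _ => {}
        }
        i += 1;
    }
    parts.push(&qname[start..]);
    parts
}

/// Maps the split parts to (namespace_path, class_name, method_name):
/// last part is the method, the one before it the class, the rest the namespace.
pub fn split_fields(qname: &str) -> (Option<String>, Option<String>, String) {
    let mut parts = split_top_level(qname);
    let method = parts.pop().unwrap_or("").to_string();
    let class = parts.pop().map(str::to_string);
    let ns = if parts.is_empty() { None } else { Some(parts.join("::")) };
    (ns, class, method)
}
```

For a::b<c::d>::e the split yields a, b<c::d>, and e, so the `::` inside the template argument does not produce a spurious class boundary. A name with no top-level `::` leaves the namespace and class fields empty, which corresponds to NULL structured columns.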
What this layer does NOT do
- Does not parse the AST returned by msvc_demangler::parse; it uses the formatted string and a heuristic split. Adequate for typical class member functions and RTTI strings; exotic template / lambda forms still get raw_demangled populated but may have NULL structured fields.
- Does not yet ingest RTTI strings discovered in .rdata. That's M3's job: M3 will append rows to this table at the addresses where it finds RTTI TypeDescriptors.
Reference docs
- msvc-demangler crate (https://docs.rs/msvc-demangler/0.11).
- LLVM MicrosoftDemangle.cpp (the parser this crate ports).
Layer M3 — Vtable + RTTI detection (planned)
Adds vtables, methods, classes tables. Heuristic vtable scan over
.rdata + .data, optional MSVC RTTI CompleteObjectLocator → TypeDescriptor
walk, anonymous-class fallback when RTTI is stripped. See
crates/xenia-analysis/src/vtables.rs (when landed).
Layer M4 — Class-aware probe targeting (planned)
CLI extension only — no schema changes. --pc-probe=Class::method and
--pc-probe-class=ClassName resolve via M3's tables. See
crates/xenia-analysis/src/lookup.rs (when landed).
Forward work (M5–M12, not yet landed)
- M5: indirect-dispatch reachability via vtable+CTR dataflow.
- M6: extended xrefs.kind='write' for indexed/byte-reverse/multiword/VMX/DCBZ/atomic stores with addr_mode column.
- M7: .rdata ASCII / UTF-16 string pool detection cross-referenced with PCs.
- M8: dispatch-table heuristics beyond vtables (e.g. function-pointer arrays in .data).
- M9: __CxxFrameHandler exception scope-table parsing.
- M10: .tls section / TLS slot tracking.
- M11: __xc_a / __xc_z static-initializer driver detection.
- M12: comparative-PC-trace mode for canary diff (runtime side, not analyzer).