xenia-rs/crates/xenia-analysis/SCHEMA.md
MechaCat02 70120465a3 M1: parse .pdata RUNTIME_FUNCTION; cross-validate function boundaries
Adds an authoritative function-boundary source from the linker:
- New `xenia_xex::pdata` parses .pdata 8-byte entries (BeginAddress + packed
  prolog/length/flags). Bit layout per Microsoft PE32 PowerPC spec: prolog in
  bits 0..7, function_length in bits 8..29, flags in 30..31.
- `func::analyze_with_pdata` unions pdata BeginAddresses into the candidate
  set, attaches `pdata_validated`/`pdata_length` to each `FuncInfo`, and trims
  any function whose `end` overlaps the next start (catches mis-merge where
  one row spanned two prologues — the audit-031 sub_824D23B0/sub_824D29F0
  case).
- DB: extends `functions` with `pdata_validated BOOLEAN`, `pdata_length BIGINT`;
  new table `pdata_entries`; index on pdata_validated.
- New `crates/xenia-analysis/SCHEMA.md` documents M1 layer + forward work.

Validation on Sylpheed: 25481 functions (was 12156) / 23073 pdata_validated /
0 orphans / 0 mis-merges. Audit-031 mis-merge resolved: sub_824D29F0 now has
its own row with `pdata_length=280` (70 dwords); sub_824D23B0 now correctly
ends at 0x824D2878 (`pdata_length=1224` matches prologue walk).

Tests 605→610. New 5-test pdata unit suite covers bit layout + sentinel +
out-of-range filtering + real-world layout round-trip.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-08 19:44:02 +02:00


xenia-analysis schema reference

Authoritative documentation for the DuckDB tables and SQL views produced by xenia-rs dis --db sylpheed.db. Track schema changes here alongside any update to the db_schema_golden test fixture.

The base + disasm tables (metadata, sections, imports, functions, labels, instructions, xrefs, opt-in exec_trace / import_calls / branch_trace) are documented inline in the src/db.rs doc comment. This file collects layered analysis additions and forward-work notes.


Layer M1 — .pdata boundary correction (landed)

Schema additions

  • functions.pdata_validated BOOLEAN NOT NULL — true when the row's address matches a RUNTIME_FUNCTION.BeginAddress from .pdata. Linker ground truth.
  • functions.pdata_length BIGINT NULL — function_length (bytes) from the matching pdata entry; NULL when the row is prologue-only.
  • New table pdata_entries(begin_address BIGINT PRIMARY KEY, end_address BIGINT, function_length BIGINT, prolog_length BIGINT, flags BIGINT) — every parsed .pdata RUNTIME_FUNCTION entry (raw, before any merge with prologue analysis).
  • Index idx_functions_pdata_validated on functions(pdata_validated).

What this layer does

  • Parses .pdata 8-byte RUNTIME_FUNCTION entries (PowerPC PE32 layout): word 0 BeginAddress (absolute VA), word 1 packed {prolog_length:8, function_length:22, flags:2}, both big-endian.
  • Unions pdata BeginAddress values into the function-candidate set fed to the prologue walker, so functions our prologue heuristic missed still get rows.
  • When pdata supplies a longer function_length than the prologue walk found, extends end_address to the pdata-implied end (catches mis-split where the walker stopped at an early blr).
  • After the walker, performs a forward pass that trims function.end to the next start when they overlap (catches mis-merge where one row spanned two prologues — the audit-031 sub_824D23B0 / sub_824D29F0 case).
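The decode and trim steps above can be sketched in Rust as follows. This is a minimal illustration, not the code in `xenia_xex::pdata` or `func::analyze_with_pdata`: the names `PdataEntry`, `decode_entry`, `FuncRange`, and `trim_overlaps` are hypothetical, bit positions are assumed to be LSB-0 (bit 0 is the least significant bit of word 1), and function ranges are assumed to be half-open `[start, end)`. The raw `function_length` field is returned unscaled. The commit note's `pdata_length=280` (70 dwords) suggests the raw field counts 4-byte instructions, but that scaling is left out here.

```rust
/// One decoded `.pdata` entry (hypothetical type; fields mirror `pdata_entries`).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PdataEntry {
    pub begin_address: u32,
    pub prolog_length: u32,   // raw 8-bit field
    pub function_length: u32, // raw 22-bit field, unscaled
    pub flags: u32,           // raw 2-bit field
}

/// Decode one 8-byte entry made of two big-endian words.
/// Assumption: LSB-0 numbering, so prolog = bits 0..7,
/// function_length = bits 8..29, flags = bits 30..31.
pub fn decode_entry(raw: [u8; 8]) -> PdataEntry {
    let begin_address = u32::from_be_bytes([raw[0], raw[1], raw[2], raw[3]]);
    let packed = u32::from_be_bytes([raw[4], raw[5], raw[6], raw[7]]);
    PdataEntry {
        begin_address,
        prolog_length: packed & 0xFF,
        function_length: (packed >> 8) & 0x3F_FFFF,
        flags: packed >> 30,
    }
}

/// Hypothetical function row covering the half-open range [start, end).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FuncRange {
    pub start: u32,
    pub end: u32,
}

/// Forward pass: sort by start address, then clamp any `end` that
/// runs past the start of the next function.
pub fn trim_overlaps(funcs: &mut [FuncRange]) {
    funcs.sort_by_key(|f| f.start);
    for i in 0..funcs.len().saturating_sub(1) {
        let next_start = funcs[i + 1].start;
        if funcs[i].end > next_start {
            funcs[i].end = next_start;
        }
    }
}
```

The trim only gives an upper bound on `end`. In the audit-031 case, sub_824D23B0 ends at 0x824D2878 (0x824D23B0 + 1224), which lies before the next start at 0x824D29F0, so the pdata length is the tighter source whenever an entry exists.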

What this layer does NOT do

  • Does not adjust prolog-derived frame_size / saved_gprs from .pdata's prolog_length field — those remain prologue-only inferences.
  • Does not classify functions further than the existing is_leaf / is_saverestore columns. Class membership is M3.
  • Does not detect functions whose entries are missing from BOTH .pdata and the bl-target scan (extremely rare; would require executable-byte linear sweep).

Reference docs

  • Microsoft PE32 exception data spec for PowerPC RUNTIME_FUNCTION.
  • xenia-canary src/xenia/cpu/xex_module.cc:1570-1587 — canary's reference parser (extracts BeginAddress only; we additionally decode word 1).

Validation queries

-- All pdata entries found
SELECT COUNT(*) FROM pdata_entries;            -- ~23073 for Sylpheed
-- Functions cross-validated against pdata
SELECT COUNT(*) FROM functions WHERE pdata_validated;
-- Functions detected ONLY by the prologue walker (no matching pdata entry)
SELECT COUNT(*) FROM functions WHERE NOT pdata_validated;
-- Pdata orphans NOT yet in functions (should be 0 after this layer)
SELECT COUNT(*) FROM pdata_entries p
LEFT JOIN functions f ON f.address = p.begin_address
WHERE f.address IS NULL;
-- Audit-031 mis-merge resolved: 0x824D29F0 should have its own row
SELECT name FROM functions WHERE address = 2186095088;  -- 0x824D29F0
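
Because the address columns are BIGINT, queries take decimal literals while the audit notes use hex. The small Rust sketch below shows the conversion, assuming addresses are stored zero-extended (never sign-extended) and that default labels follow the `sub_XXXXXXXX` pattern seen in this file. Both helper names are hypothetical.

```rust
/// Convert a 32-bit guest virtual address to the value expected in a
/// BIGINT address column. Assumption: stored zero-extended.
pub fn va_to_bigint(va: u32) -> i64 {
    i64::from(va)
}

/// Render the `sub_XXXXXXXX` style label used in the audit notes
/// (uppercase hex, zero-padded to 8 digits).
pub fn sub_label(va: u32) -> String {
    format!("sub_{va:08X}")
}
```

For example, `va_to_bigint(0x824D_29F0)` yields 2186095088 and `sub_label(0x824D_29F0)` yields `sub_824D29F0`.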

Layer M2 — MSVC C++ name demangler (planned)

Adds demangled_names(address, mangled, namespace_path, class_name, method_name, params_signature, raw_demangled). Populates from any label / import / RTTI string starting with ?. Falls back to raw_demangled = mangled when the parser cannot decode (e.g. exotic templates). See crates/xenia-analysis/src/demangle.rs (when landed).
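
The filter and fallback rule described above might look like the following Rust sketch. Since this layer has not landed, everything here is hypothetical: `Decoded`, `DemangledRow`, and `row_for` are illustrative names, only a subset of the planned columns is shown, and the real parser is represented by a caller-supplied `decode` closure that returns `None` when it cannot decode a name.

```rust
/// Pieces a successful decode would yield (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Decoded {
    pub class_name: Option<String>,
    pub method_name: String,
    pub raw_demangled: String,
}

/// Subset of the planned `demangled_names` columns, for illustration only.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct DemangledRow {
    pub address: u32,
    pub mangled: String,
    pub class_name: Option<String>,
    pub method_name: Option<String>,
    pub raw_demangled: String,
}

/// Build a row for `symbol`, or `None` if it is not an MSVC-mangled name.
/// `decode` stands in for the real parser.
pub fn row_for<F>(address: u32, symbol: &str, decode: F) -> Option<DemangledRow>
where
    F: Fn(&str) -> Option<Decoded>,
{
    // Only strings starting with '?' are candidates.
    if !symbol.starts_with('?') {
        return None;
    }
    let row = match decode(symbol) {
        Some(d) => DemangledRow {
            address,
            mangled: symbol.to_string(),
            class_name: d.class_name,
            method_name: Some(d.method_name),
            raw_demangled: d.raw_demangled,
        },
        // Fallback: the parser gave up, so raw_demangled = mangled.
        None => DemangledRow {
            address,
            mangled: symbol.to_string(),
            class_name: None,
            method_name: None,
            raw_demangled: symbol.to_string(),
        },
    };
    Some(row)
}
```

Keeping a row even on decode failure means every `?`-prefixed symbol stays queryable by address, and failed decodes can be found by comparing `raw_demangled` with `mangled`.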

Layer M3 — Vtable + RTTI detection (planned)

Adds vtables, methods, classes tables. Heuristic vtable scan over .rdata + .data, optional MSVC RTTI CompleteObjectLocator → TypeDescriptor walk, anonymous-class fallback when RTTI is stripped. See crates/xenia-analysis/src/vtables.rs (when landed).
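
One common form of heuristic vtable scan is to look for runs of consecutive pointers that all land on known function starts. The Rust sketch below illustrates that idea only. It is not the planned implementation: `VtableCandidate` and `scan_vtables` are hypothetical names, the section bytes are assumed to start on a 4-byte boundary at `base_va`, pointers are assumed to be big-endian 32-bit values, and RTTI handling is omitted.

```rust
use std::collections::HashSet;

/// A candidate vtable: address of the first slot and the slot count.
#[derive(Debug, PartialEq, Eq)]
pub struct VtableCandidate {
    pub address: u32,
    pub slots: usize,
}

/// Report every run of at least `min_slots` consecutive big-endian words
/// in `data` that are all members of `func_starts`.
pub fn scan_vtables(
    data: &[u8],
    base_va: u32,
    func_starts: &HashSet<u32>,
    min_slots: usize,
) -> Vec<VtableCandidate> {
    let mut out = Vec::new();
    let mut run_start: Option<usize> = None; // word index where the run began
    let words = data.len() / 4;
    // Iterate one past the end so a run touching the end is closed too.
    for i in 0..=words {
        let is_fn_ptr = i < words && {
            let o = 4 * i;
            let w = u32::from_be_bytes([data[o], data[o + 1], data[o + 2], data[o + 3]]);
            func_starts.contains(&w)
        };
        match (is_fn_ptr, run_start) {
            (true, None) => run_start = Some(i),
            (false, Some(s)) => {
                if i - s >= min_slots {
                    out.push(VtableCandidate {
                        address: base_va + (4 * s) as u32,
                        slots: i - s,
                    });
                }
                run_start = None;
            }
            _ => {}
        }
    }
    out
}
```

Using pdata-validated function starts as `func_starts` would keep false positives down, since plain function-pointer arrays and vtables look identical to this scan. Telling them apart is where the RTTI walk or further heuristics would come in.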

Layer M4 — Class-aware probe targeting (planned)

CLI extension only — no schema changes. --pc-probe=Class::method and --pc-probe-class=ClassName resolve via M3's tables. See crates/xenia-analysis/src/lookup.rs (when landed).


Forward work (M5–M12, not yet landed)

  • M5 — indirect-dispatch reachability via vtable+CTR dataflow.
  • M6 — extended xrefs.kind='write' for indexed/byte-reverse/multiword/VMX/DCBZ/atomic stores with addr_mode column.
  • M7 — .rdata ASCII / UTF-16 string pool detection cross-referenced with PCs.
  • M8 — dispatch-table heuristics beyond vtables (e.g. function-pointer arrays in .data).
  • M9 — __CxxFrameHandler exception scope-table parsing.
  • M10 — .tls section / TLS slot tracking.
  • M11 — __xc_a / __xc_z static-initializer driver detection.
  • M12 — comparative-PC-trace mode for canary diff (runtime side, not analyzer).