M3: vtable scan + MSVC RTTI walk + 3 new tables

Adds detection of statically allocated MSVC vtables in .rdata/.data:
- New `xenia_analysis::vtables` walks `.rdata` and `.data` looking for runs of
  ≥3 contiguous big-endian u32 values where each value lands on a known
  function start (from M1's corrected functions table). Runs of one or two
  slots are rejected to keep the false-positive rate down.
- For each candidate, attempts the MSVC RTTI walk vtable[-1] → CompleteObjectLocator
  → TypeDescriptor → mangled name. On success, records the demangled class
  name and does a best-effort RTTIClassHierarchyDescriptor walk to fill
  base_classes_json. On failure (RTTI stripped, which is common for shipped
  game binaries), names the class ANON_Class_<fnv1a-hash>, keyed by the sorted
  method-PC list, so identical vtables collapse to one entry.
- DB: new tables `vtables`, `methods`, `classes` with indices on
  function_address and rtti_present. `write_analysis_results` takes a
  `&[Vtable]` slice; `write_disasm` (back-compat) passes an empty slice.
- cmd_dis wires the scan after xref analysis using
  `func_analysis.functions.keys()` as the function-start oracle.

Validation on Sylpheed (RTTI stripped, as expected): 722 vtables / 499
unique classes / 5571 methods. Sanity invariant: every methods.function_address
joins to functions.address (0 broken refs). Largest vtable: 131 slots.

Tests 617→621 (+4 vtable unit tests covering 3-slot detect, 2-slot reject,
synth name stability, and synth name divergence).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-08 20:17:45 +02:00
parent bd5753311e
commit 1d6c51fbf8
6 changed files with 620 additions and 8 deletions


- `msvc-demangler` crate (`https://docs.rs/msvc-demangler/0.11`).
- LLVM `MicrosoftDemangle.cpp` (the parser this crate ports).
## Layer M3 — Vtable + RTTI detection (landed)
Adds the `vtables`, `methods` and `classes` tables: a heuristic vtable scan over
`.rdata` + `.data`, a best-effort MSVC RTTI `CompleteObjectLocator → TypeDescriptor`
walk, and an anonymous-class fallback when RTTI is stripped. See
`crates/xenia-analysis/src/vtables.rs`.
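The run-detection step of the scan can be sketched as follows. This is a minimal illustration and makes several assumptions. The section bytes are already loaded at a known virtual `base`. The M1 function starts are available as a `HashSet<u32>`. Words are read at 4-byte steps from the section start. `Candidate`, `scan_section` and `flush` are hypothetical names and are not the crate's actual API.

```rust
use std::collections::HashSet;

/// Hypothetical candidate type: vtable start address plus its slot targets.
#[derive(Debug)]
struct Candidate {
    address: u32,
    slots: Vec<u32>,
}

/// Emit the current run as a candidate if it is long enough, then reset it.
fn flush(out: &mut Vec<Candidate>, run: &mut Vec<u32>, start: u32, min_slots: usize) {
    if run.len() >= min_slots {
        out.push(Candidate { address: start, slots: std::mem::take(run) });
    } else {
        run.clear();
    }
}

/// Scan a section loaded at `base` for runs of at least `min_slots`
/// consecutive big-endian u32 words that are all known function starts.
fn scan_section(bytes: &[u8], base: u32, funcs: &HashSet<u32>, min_slots: usize) -> Vec<Candidate> {
    let mut out = Vec::new();
    let mut run = Vec::new();
    let mut run_start = base;
    // Step in 4-byte words from the section start; a trailing partial word is ignored.
    for (i, w) in bytes.chunks_exact(4).enumerate() {
        let value = u32::from_be_bytes([w[0], w[1], w[2], w[3]]);
        if funcs.contains(&value) {
            if run.is_empty() {
                run_start = base + 4 * i as u32;
            }
            run.push(value);
        } else {
            // A non-function word ends the current run.
            flush(&mut out, &mut run, run_start, min_slots);
        }
    }
    // A run may also end at the end of the section.
    flush(&mut out, &mut run, run_start, min_slots);
    out
}
```

Calling `scan_section(bytes, base, &funcs, 3)` applies the ≥3-slot rule. In this sketch, two vtables laid out back-to-back with no non-function word between them would merge into a single run.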
### Schema additions
- `vtables(address PK, length, col_address NULL, class_name, rtti_present,
base_classes_json NULL)` — every detected static vtable.
- `methods(vtable_address, slot, function_address, mangled_name NULL,
demangled_name NULL, PRIMARY KEY (vtable_address, slot))` — one row per
method slot.
- `classes(name PK, vtable_address, rtti_present, base_classes_json NULL)` —
deduped by class name (first-detected vtable wins).
- Indices: `methods.function_address`, `classes.rtti_present`.
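The `classes` dedup rule ("first-detected vtable wins") can be sketched as below. This assumes the vtables are passed in detection order. `VtableRow`, `ClassRow` and `classes_from_vtables` are hypothetical in-memory mirrors of the columns listed above, not the crate's types, and the actual database insert is omitted.

```rust
use std::collections::BTreeMap;

/// Hypothetical in-memory mirror of a `vtables` row.
#[derive(Debug)]
struct VtableRow {
    address: u32,
    class_name: String,
    rtti_present: bool,
    base_classes_json: Option<String>,
}

/// Hypothetical in-memory mirror of a `classes` row.
#[derive(Debug)]
struct ClassRow {
    name: String,
    vtable_address: u32,
    rtti_present: bool,
    base_classes_json: Option<String>,
}

/// Dedupe vtables into class rows by name. `vtables` must be in detection
/// order, so the first vtable seen for a name is the one that is kept.
fn classes_from_vtables(vtables: &[VtableRow]) -> Vec<ClassRow> {
    let mut by_name: BTreeMap<&str, ClassRow> = BTreeMap::new();
    for v in vtables {
        // or_insert_with only runs when the name is not already present.
        by_name.entry(v.class_name.as_str()).or_insert_with(|| ClassRow {
            name: v.class_name.clone(),
            vtable_address: v.address,
            rtti_present: v.rtti_present,
            base_classes_json: v.base_classes_json.clone(),
        });
    }
    // BTreeMap yields the rows sorted by class name.
    by_name.into_values().collect()
}
```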
### What this layer does
- Walks `.rdata` and `.data` looking for runs of ≥3 consecutive 4-byte BE
values where each value is a known function start (from M1's corrected
`functions` table). Runs of only one or two slots are intentionally rejected
to control the false-positive rate, so genuine vtables with fewer than three
slots are not detected.
- Attempts the MSVC RTTI walk `vtable[-1] → CompleteObjectLocator → TypeDescriptor`
for each candidate. When successful, the demangled `class ClassName`
string fills `class_name` and a best-effort
`RTTIClassHierarchyDescriptor` walk fills `base_classes_json` (JSON array
of base class names).
- Falls back to `ANON_Class_<8-hex>`, keyed by the FNV-1a hash of the sorted
method-PC tuple, when RTTI is absent (typical for shipped game binaries).
Identical vtables across the binary (multiple instances) collapse to the
same anonymous name.
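The anonymous-name fallback can be sketched with a 32-bit FNV-1a hash, which uses offset basis `0x811c9dc5` and prime `0x01000193`. How the PCs are serialized into bytes is an assumption here. This sketch feeds each sorted PC as big-endian u32 bytes, so its hash values will not necessarily match the names the crate actually emits. `fnv1a32` and `anon_class_name` are hypothetical helpers.

```rust
/// 32-bit FNV-1a over a byte stream.
fn fnv1a32(bytes: impl IntoIterator<Item = u8>) -> u32 {
    let mut hash: u32 = 0x811c_9dc5; // FNV-1a 32-bit offset basis
    for b in bytes {
        hash ^= b as u32; // xor first, then multiply (the "1a" variant)
        hash = hash.wrapping_mul(0x0100_0193); // FNV 32-bit prime
    }
    hash
}

/// Synthesize `ANON_Class_<8-hex>` from a vtable's method PCs.
/// Assumption: PCs are sorted, then hashed as big-endian u32 bytes.
fn anon_class_name(method_pcs: &[u32]) -> String {
    let mut pcs = method_pcs.to_vec();
    pcs.sort_unstable();
    let hash = fnv1a32(pcs.iter().flat_map(|pc| pc.to_be_bytes()));
    format!("ANON_Class_{:08x}", hash)
}
```

Because the PCs are sorted before hashing, the name depends only on the set of methods and not on slot order. Two vtables that list the same functions in different slots would therefore share a name.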
### What this layer does NOT do
- Vtables built at runtime in heap-allocated memory (e.g. by ctors copying
static templates) are out of scope — only static `.rdata`/`.data` content.
- Multiple-inheritance "extra" vftables (one per base subobject) are detected
as independent vtables with no link between them.
- Inheritance-tree walking beyond `RTTIClassHierarchyDescriptor`'s direct
base list is not attempted.
### Reference docs
- openrce.org "Reversing Microsoft Visual C++" — RTTI layout articles
(CompleteObjectLocator at vtable[-1]; TypeDescriptor at COL+0xC; mangled
name at TD+0x8).
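The pointer chase described by these offsets can be sketched as below. This assumes 32-bit absolute pointers in a flat big-endian image. `Image`, `read_u32`, `read_cstr` and `rtti_mangled_name` are hypothetical helpers, and the 256-byte cap on name length is arbitrary. The `.?A` prefix check is only a plausibility filter, since MSVC class and struct type names take the form `.?AV…@@` and `.?AU…@@`. Demangling the returned name is a separate step and is not shown.

```rust
/// Hypothetical flat memory image: `bytes` mapped starting at virtual address `base`.
struct Image<'a> {
    base: u32,
    bytes: &'a [u8],
}

impl<'a> Image<'a> {
    /// Translate a virtual address to an offset into `bytes`, if mapped.
    fn offset(&self, addr: u32) -> Option<usize> {
        let off = addr.checked_sub(self.base)? as usize;
        if off < self.bytes.len() { Some(off) } else { None }
    }

    /// Read a big-endian u32 at `addr`.
    fn read_u32(&self, addr: u32) -> Option<u32> {
        let off = self.offset(addr)?;
        let w = self.bytes.get(off..off + 4)?;
        Some(u32::from_be_bytes([w[0], w[1], w[2], w[3]]))
    }

    /// Read a NUL-terminated string at `addr`, scanning at most `max` bytes.
    fn read_cstr(&self, addr: u32, max: usize) -> Option<&'a str> {
        let bytes: &'a [u8] = self.bytes;
        let tail = &bytes[self.offset(addr)?..];
        let len = tail.iter().take(max).position(|&b| b == 0)?;
        std::str::from_utf8(&tail[..len]).ok()
    }
}

/// vtable[-1] -> CompleteObjectLocator; COL+0xC -> TypeDescriptor; TD+0x8 -> mangled name.
fn rtti_mangled_name<'a>(img: &Image<'a>, vtable: u32) -> Option<&'a str> {
    let col = img.read_u32(vtable.checked_sub(4)?)?;
    let td = img.read_u32(col.checked_add(0xC)?)?;
    let name = img.read_cstr(td.checked_add(0x8)?, 256)?;
    // Plausibility filter: reject anything that does not look like an MSVC type name.
    if name.starts_with(".?A") { Some(name) } else { None }
}
```

Every read returns `Option`, so a stripped or bogus vtable[-1] simply yields `None`. That `None` is the signal to fall back to the anonymous class name.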
## Layer M4 — Class-aware probe targeting (planned)