xenia-rs/crates/xenia-analysis/SCHEMA.md
MechaCat02 e428ce33aa M9.5 + M11.5 + VMX + SJIS/UTF-8: close the post-M5.5 deferred set
Closes the four remaining deferred follow-up items in one bundle.
All four are smaller-scope and additive; lockstep determinism
unaffected (analyzer-only changes).

## M9.5 — __CxxFrameHandler scope-table parsing

- New `xenia_analysis::eh_scope` module. Magic-scans .rdata for the
  three documented MSVC FuncInfo signatures (0x19930520/21/22) on
  4-byte alignment. Each match is parsed as the documented struct
  (BE u32 fields), with sanity caps on max_state / n_try_blocks /
  pointer validity.
- Walks pUnwindMap (UnwindMapEntry, 8 bytes) and pTryBlockMap
  (TryBlockMapEntry, 20 bytes) into one row each.
- New tables eh_funcinfo, eh_unwind_map, eh_try_blocks.
- Sylpheed yield: 2,588 FuncInfo (all version 0x19930522) /
  10,019 unwind entries / 315 try-blocks.

## M11.5 — Static-init driver chain detection

- New `xenia_analysis::static_init` module. Walks every function
  looking for the canonical _initterm loop: lwz cursor; mtctr;
  bcctrl; addi cursor, cursor, 4 bounded by a compare against another
  constant register. Extracts (array_start, array_end) and reads
  the array.
- Reuses `function_pointer_arrays` table — drivers' arrays land with
  kind='static_init' (replacing M11's prologue-heuristic output where
  the structurally-grounded pattern fires).
- Sylpheed yield: 0 drivers detected — the binary's static-init
  structure does not match the canonical CRT loop. Infrastructure
  ready; future M11.6 can relax.

## VMX vector-store xrefs (M6 follow-up)

- Adds AltiVec/VMX X-form load/store XOs to the M6 opcode-31
  dispatch: lvx/lvxl/lvebx/lvehx/lvewx (reads) and
  stvx/stvxl/stvebx/stvehx/stvewx (writes), all addr_mode=
  'x_form_indexed'. Static resolution still requires both rA and rB
  constant.
- Sylpheed yield: 110 newly-detected stvx writes.

## Shift_JIS + UTF-8 localised-string detection (M7 follow-up)

- Extends `xenia_analysis::strings::analyze` with scan_shift_jis (JIS
  X 0208 lead/trail byte ranges + half-width katakana pass-through)
  and scan_utf8 (2- and 3-byte sequences). At least one multi-byte
  unit required so pure-ASCII strings aren't double-counted.
- SJIS bytes rendered as \xHH escapes for diagnostic readability;
  full SJIS→UTF-8 decoding deferred.
- Sylpheed yield: 790 Shift_JIS strings (Japanese debug + UI text)
  + 39 UTF-8.

## Tests

- +2 EH (parses_minimal_funcinfo_v0, rejects_bogus_max_state)
- +2 static_init (detects_canonical_initterm_loop, rejects_function_without_pattern)
- +2 strings (detects_shift_jis_string, detects_utf8_multibyte_string)

Tests 649→655 (+6 unit tests). DB schema golden + write_analysis_results
signature updated for new EH parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 00:36:53 +02:00


xenia-analysis schema reference

Authoritative documentation for the DuckDB tables and SQL views produced by xenia-rs dis --db sylpheed.db. Track schema changes here alongside any update to the db_schema_golden test fixture.

The base + disasm tables (metadata, sections, imports, functions, labels, instructions, xrefs, opt-in exec_trace / import_calls / branch_trace) are documented inline in the src/db.rs doc comment. This file collects the layered analysis additions and forward-work notes.


Layer M1 — .pdata boundary correction (landed)

Schema additions

  • functions.pdata_validated BOOLEAN NOT NULL: true when the row's address matches a RUNTIME_FUNCTION.BeginAddress from .pdata. Linker ground truth.
  • functions.pdata_length BIGINT NULL: function_length (bytes) from the matching pdata entry; NULL when the row is prologue-only.
  • New table pdata_entries(begin_address BIGINT PRIMARY KEY, end_address BIGINT, function_length BIGINT, prolog_length BIGINT, flags BIGINT) — every parsed .pdata RUNTIME_FUNCTION entry (raw, before any merge with prologue analysis).
  • Index idx_functions_pdata_validated on functions(pdata_validated).

What this layer does

  • Parses .pdata 8-byte RUNTIME_FUNCTION entries (PowerPC PE32 layout): word 0 BeginAddress (absolute VA), word 1 packed {prolog_length:8, function_length:22, flags:2}, both big-endian.
  • Unions pdata BeginAddress values into the function-candidate set fed to the prologue walker, so functions our prologue heuristic missed still get rows.
  • When pdata supplies a longer function_length than the prologue walk found, extends end_address to the pdata-implied end (catches mis-split where the walker stopped at an early blr).
  • After the walker, performs a forward pass that trims function.end to the next start when they overlap (catches mis-merge where one row spanned two prologues — the audit-031 sub_824D23B0 / sub_824D29F0 case).
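
The entry decode described above can be sketched as follows. This is a minimal illustration, not the crate's parser: the struct and function names are hypothetical, and the bit positions assume the fields are packed from the least-significant bit upward (prolog_length in bits 0..=7, function_length in bits 8..=29, flags in bits 30..=31), which is consistent with M9 placing the exception bit at bit 31 of word 1. The sketch returns raw field values and does no unit conversion.

```rust
/// One decoded .pdata entry (hypothetical struct; field names follow pdata_entries).
#[derive(Debug, PartialEq)]
pub struct PdataEntry {
    pub begin_address: u32,
    pub prolog_length: u32,
    pub function_length: u32,
    pub flags: u32,
}

/// Decode one 8-byte RUNTIME_FUNCTION entry made of two big-endian words.
/// ASSUMPTION: word 1 is packed LSB-first as {prolog:8, length:22, flags:2}.
pub fn decode_pdata_entry(raw: &[u8; 8]) -> PdataEntry {
    let begin_address = u32::from_be_bytes([raw[0], raw[1], raw[2], raw[3]]);
    let word1 = u32::from_be_bytes([raw[4], raw[5], raw[6], raw[7]]);
    PdataEntry {
        begin_address,
        prolog_length: word1 & 0xFF,
        function_length: (word1 >> 8) & 0x003F_FFFF,
        flags: word1 >> 30,
    }
}

/// Bit 31 of word 1 is the high bit of the 2-bit flags field.
pub fn has_exception_flag(entry: &PdataEntry) -> bool {
    (entry.flags & 0b10) != 0
}
```

Under these assumptions, has_exception_flag is the same test that M9 later surfaces as functions.has_eh.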

What this layer does NOT do

  • Does not adjust prolog-derived frame_size / saved_gprs from .pdata's prolog_length field — those remain prologue-only inferences.
  • Does not classify functions further than the existing is_leaf / is_saverestore columns. Class membership is M3.
  • Does not detect functions whose entries are missing from BOTH .pdata and the bl-target scan (extremely rare; would require executable-byte linear sweep).

Reference docs

  • Microsoft PE32+ exception data spec for PowerPC RUNTIME_FUNCTION.
  • xenia-canary src/xenia/cpu/xex_module.cc:1570-1587 — canary's reference parser (extracts BeginAddress only; we additionally decode word 1).

Validation queries

-- All pdata entries found
SELECT COUNT(*) FROM pdata_entries;            -- ~23073 for Sylpheed
-- Functions cross-validated against pdata
SELECT COUNT(*) FROM functions WHERE pdata_validated;
-- Functions detected ONLY by prologue (orphans of pdata)
SELECT COUNT(*) FROM functions WHERE NOT pdata_validated;
-- Pdata orphans NOT yet in functions (should be 0 after this layer)
SELECT COUNT(*) FROM pdata_entries p
LEFT JOIN functions f ON f.address = p.begin_address
WHERE f.address IS NULL;
-- Audit-031 mis-merge resolved: 0x824D29F0 should have its own row
SELECT name FROM functions WHERE address = 2186095088;  -- 0x824D29F0

Layer M2 — MSVC C++ name demangler (landed)

Schema additions

  • New table demangled_names(address BIGINT NULL, mangled VARCHAR NOT NULL, raw_demangled VARCHAR NOT NULL, namespace_path VARCHAR NULL, class_name VARCHAR NULL, method_name VARCHAR NULL, params_signature VARCHAR NULL).
  • Indices on address, class_name, method_name.

What this layer does

  • Wraps msvc_demangler::demangle (a Rust port of LLVM's MicrosoftDemangle.cpp) and splits the formatted output into structured fields via a heuristic top-level parser (handles templates and nested parens correctly).
  • Populates demangled_names from any label whose name starts with ? plus any import name that happens to be mangled (defensive — typical kernel imports use C names).

What this layer does NOT do

  • Does not parse the AST returned by msvc_demangler::parse — uses the formatted string and a heuristic split. Adequate for typical class member functions and RTTI strings; exotic template / lambda forms still get raw_demangled populated but may have NULL structured fields.
  • Does not yet ingest RTTI strings discovered in .rdata — that's M3's job; M3 will append rows to this table at the addresses where it finds RTTI TypeDescriptors.

Reference docs

  • msvc-demangler crate (https://docs.rs/msvc-demangler/0.11).
  • LLVM MicrosoftDemangle.cpp (the parser this crate ports).

Layer M3 — Vtable + RTTI detection (landed)

Schema additions

  • vtables(address PK, length, col_address NULL, class_name, rtti_present, base_classes_json NULL) — every detected static vtable.
  • methods(vtable_address, slot, function_address, mangled_name NULL, demangled_name NULL, PRIMARY KEY (vtable_address, slot)) — one row per method slot.
  • classes(name PK, vtable_address, rtti_present, base_classes_json NULL) — deduped by class name (first-detected vtable wins).
  • Indices: methods.function_address, classes.rtti_present.

What this layer does

  • Walks .rdata and .data looking for runs of ≥3 consecutive 4-byte BE values where each value is a known function start (from M1's corrected functions table). One- and two-method vtables are intentionally rejected to control the false-positive rate.
  • Attempts the MSVC RTTI walk vtable[-1] → CompleteObjectLocator → TypeDescriptor for each candidate. When successful, the demangled class ClassName string fills class_name and a best-effort RTTIClassHierarchyDescriptor walk fills base_classes_json (JSON array of base class names).
  • Falls back to ANON_Class_<8-hex> keyed by FNV-1a hash of the sorted method-PC tuple when RTTI is absent (typical for shipped game binaries). Identical vtables across the binary (multiple instances) collapse to the same anonymous name.
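
A minimal sketch of the anonymous-name fallback, for illustration only. The document specifies FNV-1a over the sorted method-PC tuple; the 32-bit variant, the big-endian serialisation of each PC, and the upper-case hex rendering are assumptions made here, and both function names are hypothetical.

```rust
/// 32-bit FNV-1a over a byte slice (standard offset basis and prime).
pub fn fnv1a_32(bytes: &[u8]) -> u32 {
    let mut hash: u32 = 0x811C_9DC5;
    for &b in bytes {
        hash ^= u32::from(b);
        hash = hash.wrapping_mul(0x0100_0193);
    }
    hash
}

/// Build an ANON_Class_<8-hex> name from a vtable's method PCs.
/// ASSUMPTION: PCs are sorted, then hashed as big-endian u32 values.
pub fn anon_class_name(method_pcs: &[u32]) -> String {
    let mut sorted = method_pcs.to_vec();
    sorted.sort_unstable();
    let mut bytes = Vec::with_capacity(sorted.len() * 4);
    for pc in sorted {
        bytes.extend_from_slice(&pc.to_be_bytes());
    }
    format!("ANON_Class_{:08X}", fnv1a_32(&bytes))
}
```

Because the PCs are sorted before hashing, two vtables holding the same set of methods get the same name regardless of slot order, which is what makes identical vtables collapse to one anonymous class.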

What this layer does NOT do

  • Vtables built at runtime in heap-allocated memory (e.g. by ctors copying static templates) are out of scope — only static .rdata/.data content.
  • Multiple-inheritance "extra" vftables (one per base subobject) are detected as independent vtables with no link between them.
  • Inheritance-tree walking beyond RTTIClassHierarchyDescriptor's direct base list is not attempted.

Reference docs

  • openrce.org "Reversing Microsoft Visual C++" — RTTI layout articles (CompleteObjectLocator at vtable[-1]; TypeDescriptor at COL+0xC; mangled name at TD+0x8).

Layer M4 — Class-aware probe targeting (landed)

CLI extension only — no schema changes. The probe-token grammar adds three symbolic forms on top of the existing 0xADDR literal:

  • Class::method — joins classes × methods × demangled_names to find every PC whose vtable belongs to that class and whose demangled method_name matches.
  • Class::* — joins classes × methods to find every method PC of that class.
  • function_name — falls back to functions.name lookup for free functions / saverestore stubs / labels.

Numeric tokens never touch the DB (preserves zero-IO fast path; lockstep digest unaffected). Symbolic tokens require the DuckDB at --probe-db PATH or XENIA_PROBE_DB; default is sylpheed.db next to the .iso when present.

Resolution happens BEFORE guest exec begins, so it cannot affect the lockstep digest.
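
The token grammar can be illustrated with a small classifier. This is a sketch under assumptions, not the code in lookup.rs: the enum, its variant names, and the choice to split on the last `::` (so a namespaced class keeps its full path) are all hypothetical. Only the Address variant can be resolved without touching the DB.

```rust
/// The four probe-token forms (hypothetical type for illustration).
#[derive(Debug, PartialEq)]
pub enum ProbeToken {
    Address(u32),
    ClassMethod { class: String, method: String },
    ClassAll { class: String },
    FunctionName(String),
}

/// Classify one token; symbolic forms would then be resolved against the DB.
pub fn classify_probe_token(token: &str) -> ProbeToken {
    let hex = token.strip_prefix("0x").or_else(|| token.strip_prefix("0X"));
    if let Some(digits) = hex {
        if let Ok(addr) = u32::from_str_radix(digits, 16) {
            return ProbeToken::Address(addr);
        }
    }
    // ASSUMPTION: split on the last "::" so "ns::Class::m" yields class "ns::Class".
    if let Some((class, method)) = token.rsplit_once("::") {
        if method == "*" {
            return ProbeToken::ClassAll { class: class.to_string() };
        }
        return ProbeToken::ClassMethod {
            class: class.to_string(),
            method: method.to_string(),
        };
    }
    ProbeToken::FunctionName(token.to_string())
}
```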

See crates/xenia-analysis/src/lookup.rs.


Layer M5 — Indirect-dispatch reachability (landed)

Schema additions

  • New value 'ind_call' in the xrefs.kind set.
  • New SQL view v_indirect_reachability_from_entry: a strict superset of v_reachability_from_entry whose BFS also follows ind_call edges.

What this layer does

  • Walks each FuncAnalysis.functions entry with a per-basic-block register tracker. Recognises the canonical static-vtable pattern: lis+addi → lwz off(rA) → mtctr → bcctrl, where rA ends up holding a known vtable's start address from M3.
  • Honours the PowerPC ABI: bl-style calls (op 18 / 16 with LK=1) clobber volatile r0..r12 + ctr but preserve non-volatile r13..r31, so a vtable pointer parked in r30/r31 before a call survives.
  • Treats every M3 loc_* label as a basic-block boundary (kills register state) so jump-IN paths cannot induce false positives.
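
The register-tracking rules above can be sketched as a small state machine. This is an illustrative model only (the type and method names are hypothetical): constants are tracked through lis and addi, a call clobbers r0..r12 and ctr while r13..r31 survive, and a label boundary kills everything.

```rust
/// Per-basic-block constant tracker (hypothetical sketch).
#[derive(Clone)]
pub struct RegTracker {
    gpr: [Option<u32>; 32],
    ctr: Option<u32>,
}

impl RegTracker {
    pub fn new() -> Self {
        Self { gpr: [None; 32], ctr: None }
    }

    /// lis rD, imm: rD = imm << 16.
    pub fn lis(&mut self, rd: usize, imm: u16) {
        self.gpr[rd] = Some(u32::from(imm) << 16);
    }

    /// addi rD, rA, simm: known only if rA is known (rA = 0 means literal 0).
    pub fn addi(&mut self, rd: usize, ra: usize, simm: i16) {
        let imm = simm as i32 as u32; // sign-extend the 16-bit immediate
        self.gpr[rd] = if ra == 0 {
            Some(imm)
        } else {
            self.gpr[ra].map(|base| base.wrapping_add(imm))
        };
    }

    /// mtctr rS: ctr takes whatever is known about rS.
    pub fn mtctr(&mut self, rs: usize) {
        self.ctr = self.gpr[rs];
    }

    /// A call clobbers volatile r0..r12 and ctr; r13..r31 are preserved.
    pub fn clobber_for_call(&mut self) {
        for r in 0..=12 {
            self.gpr[r] = None;
        }
        self.ctr = None;
    }

    /// A label (basic-block boundary) kills all tracked state.
    pub fn kill_all(&mut self) {
        *self = Self::new();
    }

    pub fn get(&self, r: usize) -> Option<u32> {
        self.gpr[r]
    }

    pub fn ctr(&self) -> Option<u32> {
        self.ctr
    }
}
```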

What this layer does NOT do (and observed impact)

  • Vtable pointer loaded from a this-pointer field (lwz r_vt, off(rA) where rA = this) — by far the dominant pattern in real C++ — is unresolvable without alias / points-to analysis.
  • On Sylpheed: the layer detects 0 edges. The binary's 1,001 lis+addi references into vtables are mostly constructor-side vptr writes (stw rVtable, vptr_offset(this)), not direct dispatches. The renderer hunt's audit-009 cluster therefore needs a future M5.5 with this-flow tracking before this layer surfaces it.

Reference docs

  • IBM PowerPC ABI: register-save convention (volatile r0..r12 + ctr, non-volatile r13..r31).

Layer M7 — String / constant-pool detection (landed)

Schema additions

  • New table strings(address PK, encoding, length, content).
  • Index idx_strings_encoding.

What this layer does

  • Scans .rdata for runs of length ≥ 6 of printable ASCII bytes followed by a NUL terminator.
  • Scans .rdata for UTF-16LE runs of length ≥ 6 code units (printable-ASCII basic plane only) followed by a u16 NUL terminator.
  • Cross-reference is implicit: existing xrefs.kind='ref' rows whose target falls in strings.address's exact match set name the referencing PCs. SQL: SELECT s.content, x.source FROM xrefs x JOIN strings s ON s.address = x.target WHERE x.kind='ref'.
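
A minimal sketch of the ASCII pass, assuming "printable" means bytes 0x20..=0x7E (the real scanner may also accept tabs and newlines, given that it captures whole shader sources). The function name and signature are hypothetical.

```rust
/// Find NUL-terminated runs of printable ASCII of at least `min_len` bytes.
/// Returns (virtual address, text) pairs.
/// ASSUMPTION: printable = 0x20..=0x7E.
pub fn scan_ascii(section: &[u8], base_va: u32, min_len: usize) -> Vec<(u32, String)> {
    let mut out = Vec::new();
    let mut start = 0usize;
    for (i, &b) in section.iter().enumerate() {
        if (0x20..=0x7E).contains(&b) {
            continue; // still inside a printable run
        }
        // Any non-printable byte ends the run; only a NUL makes it a string.
        if b == 0 && i - start >= min_len {
            let text = String::from_utf8_lossy(&section[start..i]).into_owned();
            out.push((base_va + start as u32, text));
        }
        start = i + 1;
    }
    out
}
```

A run that reaches the end of the section without a terminator is dropped, matching the "followed by a NUL terminator" requirement.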

What this layer does NOT do

  • No UTF-8 multibyte / non-ASCII basic plane in either encoding.
  • No .data scan (read-only-section bias).
  • No multi-byte CJK encodings — Japanese text in localised builds may be represented in shift_jis / utf-8 with non-printable bytes that this scanner skips.

Sylpheed yield

  • 6,311 ASCII strings (including full embedded HLSL shader source).
  • 0 UTF-16LE strings (binary uses ASCII / native CJK encoding).
  • 9,132 lis+addi sites cross-reference into the detected strings — names the source PCs that reference each string.

Layer M6 — Extended store-class xrefs + addr_mode column (landed)

Schema additions

  • xrefs.addr_mode VARCHAR NULL — sub-classifies how the source instruction computes its target. NULL for control-flow edges (call / ind_call / j / br); one of the following tags for data edges:
    • d_form: standard signed-16 displacement (lwz/stw/lfs/stfs/etc.)
    • lis_addi: address materialised via lis + addi register tracking
    • lis_ori: address materialised via lis + ori
    • multiword: lmw / stmw (one xref per slot; up to 32-rS slots)
    • x_form_indexed: stwx / stbx / sthx / stwux / stbux / sthux / stdx / stdux / lwzx / lbzx / lhzx / lhax / lwzux / lbzux / lhzux / lhaux / ldx / ldux; emitted only when both rA and rB are tracked constants
    • x_form_byterev: stwbrx / sthbrx / lwbrx / lhbrx
    • atomic: stwcx. / stdcx. reservation-conditional stores
    • dcbz: cache-line clear (32-byte zero at rA+rB)
  • Index idx_xrefs_addr_mode.

What this layer does

  • Tags every existing data xref with its addressing mode (d_form for the bulk; lis_addi / lis_ori for the lift-and-add cases that produce DataRef rows).
  • Adds new dispatch for opcode 47 (stmw) and 46 (lmw), expanding to per-slot DataWrite / DataRead rows.
  • Adds new dispatch for opcode 31 X-form: stores, atomic, byte-reverse, dcbz. X-form rows are emitted ONLY when both rA and rB resolve to known constants (otherwise the address is runtime-dependent and we skip).
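
The two address computations this layer adds can be sketched as below. Both helpers are hypothetical illustrations: the first expands stmw/lmw into one (register, address) pair per slot, the second shows why X-form rows exist only when both operands are tracked constants.

```rust
/// Expand `stmw rS, disp(rA)` / `lmw rD, disp(rA)` into per-slot targets.
/// Registers first_reg..=31 map to consecutive 4-byte words, so there are
/// 32 - first_reg slots. `base` is the tracked constant value of rA.
pub fn multiword_slots(first_reg: u32, base: u32, disp: i16) -> Vec<(u32, u32)> {
    let ea = base.wrapping_add(disp as i32 as u32); // sign-extended displacement
    (first_reg..32)
        .map(|reg| (reg, ea.wrapping_add((reg - first_reg) * 4)))
        .collect()
}

/// X-form effective address rA + rB, resolvable only when both are known.
pub fn x_form_target(ra: Option<u32>, rb: Option<u32>) -> Option<u32> {
    Some(ra?.wrapping_add(rb?))
}
```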

What this layer does NOT do

  • VMX / VMX128 vector stores (opcode 31 with vector XO codes) are not emitted — they always have register-indexed addresses that the lis+addi tracker can't usually resolve, and detecting them adds noise without improving target resolution.
  • The dominant runtime stwx pattern (rA = base, rB = runtime index) is not resolved, by design; mem-watch covers the runtime side per VERIFY-B.

Sylpheed yield

  • 28,834 lis_addi refs, 18,485 d_form reads, 3,288 d_form writes — the existing baseline now properly tagged.
  • 442 newly-detected x_form_indexed reads — primarily lwzx/lhzx reads from in-table dispatch (each pair (rA,rB) resolved statically).
  • 40 newly-detected atomic writes — every stwcx. site with a resolvable address; useful for reservation-table audits.
  • 9 lis_ori refs.
  • 0 multiword / dcbz / byterev — these instructions exist in the binary but are not in lis+addi-tracked code paths.

Layer M8 + M11 — Function-pointer arrays beyond vtables (landed)

Schema additions

  • New table function_pointer_arrays(address PK, length, kind) where kind is 'vtable' (M3 re-emit), 'dispatch_table' (M8), or 'static_init' (M11).
  • New table function_pointer_array_entries(array_address, slot, function_address, PRIMARY KEY (array_address, slot)) — one row per slot of every detected array (vtable + non-vtable).
  • Indices on function_pointer_arrays.kind and function_pointer_array_entries.function_address.

What this layer does

  • Walks .rdata (only — .data produces too many false positives) for runs of ≥ 2 consecutive 4-byte BE values where each value is a known function entry from M1's functions table.
  • Skips runs whose start matches an M3 vtable head — those are re-emitted in this table with kind='vtable' for unified queries but not re-classified.
  • Heuristically classifies non-vtable runs:
    • static_init (M11): every entry's first instruction is mfspr r12, LR AND the next is stwu r1, -N(r1) with N ≤ 0x80 (or a save-stub bl). Mirrors the typical C++ static-initialiser prologue.
    • dispatch_table (M8): everything else.
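
The run detection shared by M3 (minimum 3) and this layer (minimum 2) can be sketched as follows. It is a simplified illustration with a hypothetical name and signature; it assumes the section starts on a 4-byte boundary and leaves the vtable-head skip and the static_init / dispatch_table classification to the caller.

```rust
use std::collections::HashSet;

/// Find runs of consecutive big-endian words that are all known function
/// entries. Returns (run start VA, function addresses) per run.
/// ASSUMPTION: `section` begins 4-byte aligned at `base_va`.
pub fn find_pointer_runs(
    section: &[u8],
    base_va: u32,
    functions: &HashSet<u32>,
    min_len: usize,
) -> Vec<(u32, Vec<u32>)> {
    let mut runs = Vec::new();
    let mut current: Vec<u32> = Vec::new();
    let mut run_start = base_va;
    for (i, chunk) in section.chunks_exact(4).enumerate() {
        let value = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        if functions.contains(&value) {
            if current.is_empty() {
                run_start = base_va + (i as u32) * 4;
            }
            current.push(value);
        } else {
            // A non-function word closes the run; keep it only if long enough.
            let finished = std::mem::take(&mut current);
            if finished.len() >= min_len.max(1) {
                runs.push((run_start, finished));
            }
        }
    }
    if current.len() >= min_len.max(1) {
        runs.push((run_start, current));
    }
    runs
}
```

Raising min_len from 2 to 3 in a call like this has the same effect as the WHERE length >= 3 filter suggested for high-precision queries.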

What this layer does NOT do

  • Does not parse symbol-table-bracketed regions like __xc_a / __xc_z / __xi_a / __xi_z directly — Sylpheed's symbol table is stripped.
  • Does not chain multi-segment static-init drivers; future M11.5 could walk the entry-point's static-init driver call chain to surface ground-truth ctor PCs.
  • 2-slot runs in .rdata may be false positives where two struct fields happen to alias function VAs; downstream queries should use a length filter (WHERE length >= 3) when high precision matters.

Sylpheed yield

  • 722 vtables (M3 re-emit) + 388 dispatch_tables = 1,110 arrays in function_pointer_arrays.
  • 0 static_init detected — Sylpheed's ctors don't all match the conservative prologue heuristic. Lengths concentrate at 2 slots (typical of switch-case jump tables).

Layer M9 — has_eh from .pdata exception flag (landed)

Schema additions

  • functions.has_eh BOOLEAN NOT NULL: true when .pdata's exception-handler-present bit (bit 31 of word 1, the high bit) is set.
  • Index idx_functions_has_eh.

What this layer does

  • Derived directly from M1's already-parsed pdata.flags bit field (no new parsing). The bit was always available in pdata_entries.flags; this layer surfaces it as a first-class column on functions.

What this layer does NOT do

  • Does not parse the actual __CxxFrameHandler / __C_specific_handler scope-table records that the exception bit gates. Walking those tables would let us name try/catch ranges and per-state cleanup actions, but is out of scope for a derive-only milestone.

Sylpheed yield

  • 2,975 of 23,073 pdata-validated functions have has_eh=true (12.9%) — plausible MSVC C++ EH coverage rate. Largest EH function: 26,328 bytes (sub_823518F0).

Layer M10 — .tls section / TLS directory (landed)

Schema additions

  • New table tls_info(raw_data_start, raw_data_end, index_address, callback_array, zero_fill_size, characteristics) — at most one row (the IMAGE_TLS_DIRECTORY32).
  • New table tls_callbacks(slot PK, address) — one row per resolved TLS callback function.

What this layer does

  • Reads the first 24 bytes of the .tls section as an IMAGE_TLS_DIRECTORY32 and walks the zero-terminated callback array.
  • All addresses stored as absolute VAs.
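
A sketch of the directory read, for illustration. The field order follows the standard IMAGE_TLS_DIRECTORY32 layout (six 32-bit fields, 24 bytes) and matches the tls_info columns; big-endian byte order is an assumption carried over from the other Xbox 360 structures in this document, and the type and function names are hypothetical.

```rust
/// IMAGE_TLS_DIRECTORY32 fields, named after the tls_info columns.
#[derive(Debug, PartialEq)]
pub struct TlsDirectory32 {
    pub raw_data_start: u32,
    pub raw_data_end: u32,
    pub index_address: u32,
    pub callback_array: u32,
    pub zero_fill_size: u32,
    pub characteristics: u32,
}

/// Parse the first 24 bytes of the section. ASSUMPTION: big-endian fields.
pub fn parse_tls_directory(section: &[u8]) -> Option<TlsDirectory32> {
    if section.len() < 24 {
        return None;
    }
    let word = |i: usize| {
        let o = i * 4;
        u32::from_be_bytes([section[o], section[o + 1], section[o + 2], section[o + 3]])
    };
    Some(TlsDirectory32 {
        raw_data_start: word(0),
        raw_data_end: word(1),
        index_address: word(2),
        callback_array: word(3),
        zero_fill_size: word(4),
        characteristics: word(5),
    })
}

/// Walk a zero-terminated array of callback VAs.
pub fn walk_callbacks(array: &[u8]) -> Vec<u32> {
    array
        .chunks_exact(4)
        .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .take_while(|&va| va != 0)
        .collect()
}
```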

What this layer does NOT do

  • Does not parse the raw TLS template content (the variable initialiser block); just records its start/end VAs.

Sylpheed yield

  • 0 rows — Sylpheed has no .tls section. Infrastructure ready for any binary that uses __declspec(thread) storage.

Layer M12 — --lr-trace runtime canary-diff harness (landed)

Runtime additions (no DB)

  • New CLI flag --lr-trace=PC[,PC,...] on exec — comma-separated PCs to capture as JSONL records on every fire. Symbolic tokens (Class::method) resolve via M4's lookup against --probe-db. Settable via XENIA_LR_TRACE.
  • New CLI flag --lr-trace-out=PATH — writes JSONL to a file (one record per line). Stdout when omitted. Settable via XENIA_LR_TRACE_OUT.
  • New kernel state fields lr_trace_pcs: HashSet<u32> + lr_trace_writer: Option<Mutex<File>> and helper KernelState::fire_lr_trace_if_match(hw_id) invoked from the per-instruction probe slot.

JSONL record fields

pc, tid, hw, cycle, r3, r4, r5, r6, lr — superset of what xenia-canary's --log_lr_on_pc patch emits, with a cycle counter added for cross-run reproducibility.

What this layer does NOT do

  • Does not capture VMX / FP register state (only GPRs r3..r6).
  • Does not buffer / batch records: one write_all per fire. For high-frequency probes (e.g. tight loops at >1M fires/sec), redirect the output to a file on an SSD.

Determinism

Lockstep digest unaffected: probe firing happens after the per-instr hooks for ctor/branch probes and only emits side-channel output. Verified end-of-session: check sylpheed.iso --stable-digest -n 2M ×2 produced byte-identical digests (instructions=2000005).


Layer M5.5 — this-flow indirect-dispatch resolution (landed)

Schema additions

  • New table vptr_writes(writer_pc, vtable_address, vptr_offset, writer_function) — every detected stw rVtable, vptr_off(rThis) site.
  • New table indirect_dispatch_sites(dispatch_pc PK, vptr_offset, slot, candidate_count) — one row per resolved dispatch.
  • New table indirect_dispatch_candidates(dispatch_pc, vtable_address, method_address) — one row per (dispatch × candidate vtable). Joined to existing xrefs.kind='ind_call' edges (one ind_call row per candidate).
  • New indices on vptr_writes.vtable_address, vptr_writes.vptr_offset, indirect_dispatch_candidates.method_address, indirect_dispatch_candidates.vtable_address, indirect_dispatch_sites.(vptr_offset, slot).

What this layer does (class-membership inference)

  1. Phase 1 — vptr-write scan: walk every function with the lis+addi tracker; whenever stw rA, off(rB) writes a known M3 vtable address, record (vtable_addr, vptr_offset, writer_pc).
  2. Phase 2 — invert: build vtables_by_offset[vptr_off] = {V} for the set of vtables ever written at that offset.
  3. Phase 3 — dispatch detection: walk back ≤16 instructions from each bcctrl/bctr LK=1, find the canonical lwz vt, off(this); lwz fn, slot*4(vt); mtctr fn chain. Extract (vptr_off, slot). Bail on register clobber, branch, or label boundary.
  4. Phase 4 — emit: for each (dispatch_pc, vptr_off, slot), emit one xrefs.kind='ind_call' row per candidate vtable that has a matching slot. Multi-candidate rows are an over-approximation.
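
Phases 2 and 4 can be sketched as a map inversion followed by a slot lookup. This is an illustration under assumptions: the struct mirrors the vptr_writes columns, vtable_methods stands in for M3's methods table (vtable address to slot-ordered method PCs), and both function names are hypothetical.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One Phase 1 result row (mirrors the vptr_writes table).
pub struct VptrWrite {
    pub writer_pc: u32,
    pub vtable_address: u32,
    pub vptr_offset: i32,
}

/// Phase 2: invert the writes into vptr_offset -> set of vtables seen there.
pub fn vtables_by_offset(writes: &[VptrWrite]) -> BTreeMap<i32, BTreeSet<u32>> {
    let mut map: BTreeMap<i32, BTreeSet<u32>> = BTreeMap::new();
    for w in writes {
        map.entry(w.vptr_offset).or_default().insert(w.vtable_address);
    }
    map
}

/// Phase 4: for a dispatch through (vptr_offset, slot), list every
/// (vtable, method) candidate whose vtable actually has that slot.
pub fn dispatch_candidates(
    by_offset: &BTreeMap<i32, BTreeSet<u32>>,
    vtable_methods: &BTreeMap<u32, Vec<u32>>,
    vptr_offset: i32,
    slot: usize,
) -> Vec<(u32, u32)> {
    by_offset
        .get(&vptr_offset)
        .into_iter()
        .flatten()
        .filter_map(|vt| {
            let method = vtable_methods.get(vt)?.get(slot)?;
            Some((*vt, *method))
        })
        .collect()
}
```

The length of the returned vector corresponds to candidate_count: one entry is a high-confidence edge, several entries are the over-approximation described above.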

What this layer does NOT do

  • No alias resolution at multi-candidate sites — emits one edge per matching vtable. Downstream queries should filter indirect_dispatch_sites WHERE candidate_count=1 for high-confidence edges.
  • No flow-sensitive analysis: register state is killed at every label (basic-block boundary) and at bl/bcl calls (volatile r0..r12 + ctr). We do NOT propagate values across calls in the chain-walker.
  • No tracking of vptr writes via X-form indexed (stwx), VMX, or multiword stores. Only D-form stw rA, off(rB).
  • Does not synthesise vptr writes for inlined / elided constructors. If a class never has a writer at offset vptr_off, dispatches through that offset find no candidates.

Sylpheed yield

  • 567 vptr writes covering 214 distinct vtables (~30% of M3's 722).
  • 29 distinct vptr offsets used; offset 0 dominates (501/567 = 88%, single-inheritance).
  • 6,842 dispatch sites resolved: 97 single-candidate (high-confidence) + 6,745 multi-candidate (over-approximation).
  • 687,963 ind_call xref rows total.
  • 2,746 newly-reachable functions via the M5 BFS view (v_indirect_reachability_from_entry) compared to call/j/br alone.
  • Audit-009 cluster (renderer plateau): functions newly visible include 0x823BC9E0, 0x823BC290, 0x823BC5A0, 0x823BB158, 0x823BB1E0, 0x823BCAF0, 0x823BC4C8 — actionable starting points for the cluster's reachability hunt.

Reference docs

  • IBM PowerPC ABI (volatile/non-volatile register partition).
  • Itanium C++ ABI on vtable layout (offset-from-this model adapted by MSVC for Win32 PPC).

Layer M9.5 — __CxxFrameHandler scope-table parsing (landed)

Schema additions

  • New table eh_funcinfo(address PK, magic, max_state, p_unwind_map, n_try_blocks, p_try_block_map, n_ip_map_entries, p_ip_to_state_map, p_es_type_list, eh_flags).
  • New table eh_unwind_map(funcinfo_address, state_index, to_state, action_pc, PRIMARY KEY (funcinfo_address, state_index)).
  • New table eh_try_blocks(funcinfo_address, try_index, try_low, try_high, catch_high, n_catches, p_handler_array, PRIMARY KEY (funcinfo_address, try_index)).

What this layer does

  • Magic-scans .rdata for the documented MSVC FuncInfo signatures (0x19930520 / 0x19930521 / 0x19930522), reading 4-byte BE values on 4-byte alignment.
  • Sanity-checks max_state ≤ 10,000, n_try_blocks ≤ 1,000, all internal pointers landing in valid sections.
  • Walks pUnwindMap (8-byte UnwindMapEntry) and pTryBlockMap (20-byte TryBlockMapEntry) into one row each.
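
A minimal sketch of the header check and the unwind-entry decode. The names are hypothetical; the field order follows the eh_funcinfo columns, only the first five fields are read, and the pointer-in-valid-section check is omitted because it needs the section map.

```rust
/// The three FuncInfo signatures the scanner looks for.
pub const FUNCINFO_MAGICS: [u32; 3] = [0x1993_0520, 0x1993_0521, 0x1993_0522];

/// Leading FuncInfo fields (hypothetical struct for illustration).
#[derive(Debug, PartialEq)]
pub struct FuncInfoHeader {
    pub magic: u32,
    pub max_state: u32,
    pub p_unwind_map: u32,
    pub n_try_blocks: u32,
    pub p_try_block_map: u32,
}

fn be_word(bytes: &[u8], index: usize) -> u32 {
    let o = index * 4;
    u32::from_be_bytes([bytes[o], bytes[o + 1], bytes[o + 2], bytes[o + 3]])
}

/// Parse a candidate at a 4-byte-aligned offset; None if it fails the
/// magic check or the sanity caps (max_state <= 10,000, n_try_blocks <= 1,000).
pub fn parse_funcinfo(bytes: &[u8]) -> Option<FuncInfoHeader> {
    if bytes.len() < 20 {
        return None;
    }
    let magic = be_word(bytes, 0);
    if !FUNCINFO_MAGICS.contains(&magic) {
        return None;
    }
    let header = FuncInfoHeader {
        magic,
        max_state: be_word(bytes, 1),
        p_unwind_map: be_word(bytes, 2),
        n_try_blocks: be_word(bytes, 3),
        p_try_block_map: be_word(bytes, 4),
    };
    if header.max_state > 10_000 || header.n_try_blocks > 1_000 {
        return None;
    }
    Some(header)
}

/// UnwindMapEntry (8 bytes): signed to_state, then the action PC.
pub fn parse_unwind_entry(bytes: &[u8; 8]) -> (i32, u32) {
    (be_word(bytes, 0) as i32, be_word(bytes, 1))
}
```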

What this layer does NOT do

  • Does not associate FuncInfo records with their owning function via the bl __CxxFrameHandler registration site — joins to functions by best-effort PC-range queries. A future M9.6 can chase the registration to make the link explicit.
  • Does not parse pHandlerArray (per-try-block catch type info).

Sylpheed yield

  • 2,588 FuncInfo records (all version 0x19930522).
  • 10,019 unwind-map entries.
  • 315 try-blocks across the binary.

Layer M11.5 — Static-init driver chain detection (landed)

Schema additions

  • Reuses existing function_pointer_arrays table — drivers' arrays are emitted with kind='static_init', replacing M11's prologue-heuristic output where the structurally-grounded pattern fires.

What this layer does

  • Walks every detected function looking for the canonical _initterm-style loop: lwz cursor; mtctr; bcctrl; addi cursor, cursor, 4, bounded by a comparison against another constant register.
  • Extracts (array_start, array_end) from the cursor's initial constant value and the end-comparand register.
  • Reads the array, validates each entry against func_analysis.functions, and emits the array as static_init.

What this layer does NOT do

  • Doesn't handle drivers with multiple back-to-back trampoline loops.
  • Doesn't follow _initterm_e return-status semantics — both _initterm and _initterm_e match if the loop body matches.

Sylpheed yield

  • 0 drivers detected. Sylpheed's static-init structure does not match the canonical CRT loop pattern; the binary likely calls ctors via another mechanism (inline at the entry point, or via a different driver shape). Infrastructure ready for any binary with the documented MSVC pattern.

Layer VMX — Vector-store xrefs (M6 follow-up, landed)

Extends the M6 X-form opcode-31 dispatch in xref.rs with AltiVec/VMX vector loads and stores. New entries (XO codes):

  • lvx (103), lvxl (359), lvebx (7), lvehx (39), lvewx (71) — addr_mode='x_form_indexed', kind='read'.
  • stvx (231), stvxl (487), stvebx (135), stvehx (167), stvewx (199) — addr_mode='x_form_indexed', kind='write'.

Same constraint as M6: rows emitted only when both rA and rB resolve to known constants (rare but useful).
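
The XO table above can be expressed as a small classifier. This is an illustrative sketch, not the dispatch in xref.rs: the enum and function names are hypothetical. It extracts the primary opcode from the top 6 bits and the 10-bit extended opcode from bits 1..=10 (counting from the least-significant bit).

```rust
/// Whether the instruction reads or writes memory.
#[derive(Debug, PartialEq, Clone, Copy)]
pub enum Access {
    Read,
    Write,
}

/// Classify an opcode-31 X-form word as one of the ten VMX loads/stores.
pub fn classify_vmx_xform(insn: u32) -> Option<(&'static str, Access)> {
    if insn >> 26 != 31 {
        return None;
    }
    let xo = (insn >> 1) & 0x3FF;
    match xo {
        7 => Some(("lvebx", Access::Read)),
        39 => Some(("lvehx", Access::Read)),
        71 => Some(("lvewx", Access::Read)),
        103 => Some(("lvx", Access::Read)),
        359 => Some(("lvxl", Access::Read)),
        135 => Some(("stvebx", Access::Write)),
        167 => Some(("stvehx", Access::Write)),
        199 => Some(("stvewx", Access::Write)),
        231 => Some(("stvx", Access::Write)),
        487 => Some(("stvxl", Access::Write)),
        _ => None,
    }
}
```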

Sylpheed yield

  • 110 stvx writes newly resolved.

Layer SJIS+UTF-8 — Localised-string detection (M7 follow-up, landed)

Extends xenia_analysis::strings::analyze with two additional scanners.

Shift_JIS detection

Per JIS X 0208: lead byte ∈ [0x81, 0x9F] ∪ [0xE0, 0xEF]; trail byte ∈ [0x40, 0x7E] ∪ [0x80, 0xFC]. Single-byte ASCII and JIS half-width katakana (0xA1..=0xDF) are passed through. At least one multi-byte pair must be present (so we don't double-count pure ASCII). SJIS bytes are rendered as \xHH escapes in the content column for diagnostic readability; full SJIS→UTF-8 decoding is a future enhancement.
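
The byte-range rules can be sketched as a validator that also produces the escaped content. It is an illustration with hypothetical names; treating printable ASCII as 0x20..=0x7E and escaping half-width katakana bytes as well as double-byte pairs are assumptions made here.

```rust
fn is_sjis_lead(b: u8) -> bool {
    matches!(b, 0x81..=0x9F | 0xE0..=0xEF)
}

fn is_sjis_trail(b: u8) -> bool {
    matches!(b, 0x40..=0x7E | 0x80..=0xFC)
}

/// Validate a candidate run and render it with \xHH escapes.
/// Returns None for invalid input or when no double-byte pair is present.
pub fn scan_shift_jis_run(bytes: &[u8]) -> Option<String> {
    let mut out = String::new();
    let mut pairs = 0usize;
    let mut i = 0usize;
    while i < bytes.len() {
        let b = bytes[i];
        if is_sjis_lead(b) {
            // A lead byte must be followed by a valid trail byte.
            let trail = *bytes.get(i + 1)?;
            if !is_sjis_trail(trail) {
                return None;
            }
            out.push_str(&format!("\\x{:02X}\\x{:02X}", b, trail));
            pairs += 1;
            i += 2;
        } else if (0x20..=0x7E).contains(&b) {
            out.push(b as char); // ASCII passes through unchanged
            i += 1;
        } else if (0xA1..=0xDF).contains(&b) {
            out.push_str(&format!("\\x{:02X}", b)); // half-width katakana
            i += 1;
        } else {
            return None;
        }
    }
    if pairs > 0 { Some(out) } else { None }
}
```

Escaping is done per pair rather than per byte because many trail bytes fall inside the printable ASCII range and would otherwise be rendered as unrelated letters.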

UTF-8 detection

Validates 2-byte (110xxxxx 10xxxxxx) and 3-byte (1110xxxx 10xxxxxx 10xxxxxx) sequences plus printable ASCII. Skips 4-byte (supplementary plane) which is rare in game text.
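
A sketch of the structural check described above, with a hypothetical function name. It accepts printable ASCII (assumed to be 0x20..=0x7E) plus 2- and 3-byte sequences, rejects 4-byte leads, and requires at least one multi-byte unit. It checks only the bit patterns, so overlong encodings and surrogate code points are not rejected here.

```rust
/// True when `bytes` is printable ASCII mixed with well-formed 2- or 3-byte
/// UTF-8 sequences and contains at least one multi-byte sequence.
pub fn looks_like_utf8_text(bytes: &[u8]) -> bool {
    let mut i = 0usize;
    let mut multibyte = 0usize;
    while i < bytes.len() {
        let b = bytes[i];
        let extra = if (0x20..=0x7E).contains(&b) {
            0
        } else if b & 0xE0 == 0xC0 {
            1 // 110xxxxx: one continuation byte follows
        } else if b & 0xF0 == 0xE0 {
            2 // 1110xxxx: two continuation bytes follow
        } else {
            return false; // control byte, stray continuation, or 4-byte lead
        };
        for k in 1..=extra {
            match bytes.get(i + k) {
                Some(&c) if c & 0xC0 == 0x80 => {}
                _ => return false,
            }
        }
        if extra > 0 {
            multibyte += 1;
        }
        i += 1 + extra;
    }
    multibyte > 0
}
```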

Sylpheed yield

  • 790 Shift_JIS strings (Japanese debug + UI text, including [WARNING] ードに割り当てるエフェクトIDの指定がない ノードデータが見つからない style mission strings).
  • 39 UTF-8 strings.
  • 6,311 ASCII strings (unchanged from M7).

Forward work (not yet landed)

  • M9.6 — link eh_funcinfo records back to their owning functions via bl __CxxFrameHandler registration sites + per-try-block pHandlerArray parsing.
  • M11.6 — relax M11.5 to detect non-canonical static-init driver shapes (_initterm_e with status return, custom drivers).
  • Full SJIS → UTF-8 decoding in the strings.content column.
  • VMX128 (opcode 4) vector-store xrefs — separate encoding space, low ROI; document if Sylpheed's renderer cluster uses it.