xenia-rs/crates/xenia-analysis/SCHEMA.md
MechaCat02 e428ce33aa M9.5 + M11.5 + VMX + SJIS/UTF-8: close the post-M5.5 deferred set
Closes the four remaining deferred follow-up items in one bundle.
All four are smaller-scope and additive; lockstep determinism
unaffected (analyzer-only changes).

## M9.5 — __CxxFrameHandler scope-table parsing

- New `xenia_analysis::eh_scope` module. Magic-scans .rdata for the
  three documented MSVC FuncInfo signatures (0x19930520/21/22) on
  4-byte alignment. Each match is parsed as the documented struct
  (BE u32 fields), with sanity caps on max_state / n_try_blocks /
  pointer validity.
- Walks pUnwindMap (UnwindMapEntry, 8 bytes) and pTryBlockMap
  (TryBlockMapEntry, 20 bytes) into one row each.
- New tables eh_funcinfo, eh_unwind_map, eh_try_blocks.
- Sylpheed yield: 2,588 FuncInfo (all version 0x19930522) /
  10,019 unwind entries / 315 try-blocks.

## M11.5 — Static-init driver chain detection

- New `xenia_analysis::static_init` module. Walks every function
  looking for the canonical _initterm loop: lwz cursor; mtctr;
  bcctrl; addi cursor, cursor, 4 bounded by a compare against another
  constant register. Extracts (array_start, array_end) and reads
  the array.
- Reuses `function_pointer_arrays` table — drivers' arrays land with
  kind='static_init' (replacing M11's prologue-heuristic output where
  the structurally-grounded pattern fires).
- Sylpheed yield: 0 drivers detected — the binary's static-init
  structure does not match the canonical CRT loop. Infrastructure
  ready; future M11.6 can relax.

## VMX vector-store xrefs (M6 follow-up)

- Adds AltiVec/VMX X-form load/store XOs to the M6 opcode-31
  dispatch: lvx/lvxl/lvebx/lvehx/lvewx (reads) and
  stvx/stvxl/stvebx/stvehx/stvewx (writes), all addr_mode=
  'x_form_indexed'. Static resolution still requires both rA and rB
  constant.
- Sylpheed yield: 110 newly-detected stvx writes.

## Shift_JIS + UTF-8 localised-string detection (M7 follow-up)

- Extends `xenia_analysis::strings::analyze` with scan_shift_jis (JIS
  X 0208 lead/trail byte ranges + half-width katakana pass-through)
  and scan_utf8 (2- and 3-byte sequences). At least one multi-byte
  unit required so pure-ASCII strings aren't double-counted.
- SJIS bytes rendered as \xHH escapes for diagnostic readability;
  full SJIS→UTF-8 decoding deferred.
- Sylpheed yield: 790 Shift_JIS strings (Japanese debug + UI text)
  + 39 UTF-8.

## Tests

- +2 EH (parses_minimal_funcinfo_v0, rejects_bogus_max_state)
- +2 static_init (detects_canonical_initterm_loop, rejects_function_without_pattern)
- +2 strings (detects_shift_jis_string, detects_utf8_multibyte_string)

Tests 649→655 (+6 unit tests). DB schema golden + write_analysis_results
signature updated for new EH parameter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-10 00:36:53 +02:00

# `xenia-analysis` schema reference
Authoritative documentation for the DuckDB tables and SQL views produced by
`xenia-rs dis --db sylpheed.db`. Track schema changes here alongside any
update to the `db_schema_golden` test fixture.
The base + disasm tables (`metadata`, `sections`, `imports`, `functions`,
`labels`, `instructions`, `xrefs`, opt-in `exec_trace` / `import_calls` /
`branch_trace`) are documented inline in the doc comments of `src/db.rs`. This file
collects layered analysis additions and forward-work notes.
---
## Layer M1 — `.pdata` boundary correction (landed)
### Schema additions
- `functions.pdata_validated BOOLEAN NOT NULL`: `true` when the row's
`address` matches a `RUNTIME_FUNCTION.BeginAddress` from `.pdata`. Linker
ground truth.
- `functions.pdata_length BIGINT NULL`: `function_length` (bytes) from the
matching pdata entry; `NULL` when the row is prologue-only.
- New table `pdata_entries(begin_address BIGINT PRIMARY KEY, end_address
BIGINT, function_length BIGINT, prolog_length BIGINT, flags BIGINT)` — every
parsed `.pdata` `RUNTIME_FUNCTION` entry (raw, before any merge with
prologue analysis).
- Index `idx_functions_pdata_validated` on `functions(pdata_validated)`.
### What this layer does
- Parses `.pdata` 8-byte `RUNTIME_FUNCTION` entries (PowerPC PE32 layout):
word 0 `BeginAddress` (absolute VA), word 1 packed
`{prolog_length:8, function_length:22, flags:2}`, both big-endian.
- Unions pdata `BeginAddress` values into the function-candidate set fed to
the prologue walker, so functions our prologue heuristic missed still get
rows.
- When pdata supplies a longer `function_length` than the prologue walk
found, extends `end_address` to the pdata-implied end (catches mis-split
where the walker stopped at an early `blr`).
- After the walker, performs a forward pass that trims `function.end` to the
next start when they overlap (catches mis-merge where one row spanned two
prologues — the audit-031 `sub_824D23B0` / `sub_824D29F0` case).
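The entry layout described above can be sketched as a small decoder. This is a minimal illustration, not the crate's parser: the `PdataEntry` struct and `parse_pdata_entry` name are hypothetical, and the bit order inside word 1 (`prolog_length` in the low 8 bits, `flags` in the top 2) is an assumption chosen to agree with M9's "bit 31 is the exception flag" description. `function_length` is left in raw field units; any scaling to bytes is outside this sketch.

```rust
/// One decoded `.pdata` entry (field names mirror the `pdata_entries` table).
/// Hypothetical type for illustration only.
#[derive(Debug, PartialEq, Eq)]
pub struct PdataEntry {
    pub begin_address: u32,
    pub prolog_length: u32,
    pub function_length: u32,
    pub flags: u32,
}

/// Decode one 8-byte big-endian RUNTIME_FUNCTION entry.
/// ASSUMPTION: word 1 packs prolog_length in bits 0..=7,
/// function_length in bits 8..=29 and flags in bits 30..=31.
pub fn parse_pdata_entry(raw: &[u8; 8]) -> PdataEntry {
    let begin_address = u32::from_be_bytes([raw[0], raw[1], raw[2], raw[3]]);
    let word1 = u32::from_be_bytes([raw[4], raw[5], raw[6], raw[7]]);
    PdataEntry {
        begin_address,
        prolog_length: word1 & 0xFF,
        function_length: (word1 >> 8) & 0x003F_FFFF,
        flags: word1 >> 30,
    }
}
```

Under that assumed packing, the bytes `82 4D 29 F0 80 01 00 04` decode to `begin_address = 0x824D29F0`, `prolog_length = 4`, `function_length = 0x100`, `flags = 0b10`.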
### What this layer does NOT do
- Does not adjust prolog-derived `frame_size` / `saved_gprs` from `.pdata`'s
`prolog_length` field — those remain prologue-only inferences.
- Does not classify functions further than the existing `is_leaf` /
`is_saverestore` columns. Class membership is M3.
- Does not detect functions whose entries are missing from BOTH `.pdata`
and the bl-target scan (extremely rare; would require executable-byte
linear sweep).
### Reference docs
- Microsoft PE32+ exception data spec for PowerPC RUNTIME_FUNCTION.
- xenia-canary `src/xenia/cpu/xex_module.cc:1570-1587` — canary's reference
parser (extracts `BeginAddress` only; we additionally decode word 1).
### Validation queries
```sql
-- All pdata entries found
SELECT COUNT(*) FROM pdata_entries; -- ~23073 for Sylpheed
-- Functions cross-validated against pdata
SELECT COUNT(*) FROM functions WHERE pdata_validated;
-- Functions detected ONLY by prologue (orphans of pdata)
SELECT COUNT(*) FROM functions WHERE NOT pdata_validated;
-- Pdata orphans NOT yet in functions (should be 0 after this layer)
SELECT COUNT(*) FROM pdata_entries p
LEFT JOIN functions f ON f.address = p.begin_address
WHERE f.address IS NULL;
-- Audit-031 mis-merge resolved: 0x824D29F0 should have its own row
SELECT name FROM functions WHERE address = 2186095088; -- 0x824D29F0
```
---
## Layer M2 — MSVC C++ name demangler (landed)
### Schema additions
- New table `demangled_names(address BIGINT NULL, mangled VARCHAR NOT NULL,
raw_demangled VARCHAR NOT NULL, namespace_path VARCHAR NULL,
class_name VARCHAR NULL, method_name VARCHAR NULL,
params_signature VARCHAR NULL)`.
- Indices on `address`, `class_name`, `method_name`.
### What this layer does
- Wraps `msvc_demangler::demangle` (a Rust port of LLVM's
`MicrosoftDemangle.cpp`) and splits the formatted output into structured
fields via a heuristic top-level parser (handles templates and nested parens
correctly).
- Populates `demangled_names` from any label whose name starts with `?` plus
any import name that happens to be mangled (defensive — typical kernel
imports use C names).
### What this layer does NOT do
- Does not parse the AST returned by `msvc_demangler::parse` — uses the formatted
string and a heuristic split. Adequate for typical class member functions
and RTTI strings; exotic template / lambda forms still get `raw_demangled`
populated but may have NULL structured fields.
- Does not yet ingest RTTI strings discovered in `.rdata` — that's M3's job;
M3 will append rows to this table at the addresses where it finds RTTI
TypeDescriptors.
### Reference docs
- `msvc-demangler` crate (`https://docs.rs/msvc-demangler/0.11`).
- LLVM `MicrosoftDemangle.cpp` (the parser this crate ports).
## Layer M3 — Vtable + RTTI detection (landed)
### Schema additions
- `vtables(address PK, length, col_address NULL, class_name, rtti_present,
base_classes_json NULL)` — every detected static vtable.
- `methods(vtable_address, slot, function_address, mangled_name NULL,
demangled_name NULL, PRIMARY KEY (vtable_address, slot))` — one row per
method slot.
- `classes(name PK, vtable_address, rtti_present, base_classes_json NULL)` —
deduped by class name (first-detected vtable wins).
- Indices: `methods.function_address`, `classes.rtti_present`.
### What this layer does
- Walks `.rdata` and `.data` looking for runs of ≥3 consecutive 4-byte BE
values where each value is a known function start (from M1's corrected
`functions` table). Runs of only one or two methods are intentionally
rejected to control the false-positive rate.
- Attempts the MSVC RTTI walk `vtable[-1] → CompleteObjectLocator → TypeDescriptor`
for each candidate. When successful, the demangled `class ClassName`
string fills `class_name` and a best-effort
`RTTIClassHierarchyDescriptor` walk fills `base_classes_json` (JSON array
of base class names).
- Falls back to `ANON_Class_<8-hex>` keyed by FNV-1a hash of the sorted
method-PC tuple when RTTI is absent (typical for shipped game binaries).
Identical vtables across the binary (multiple instances) collapse to the
same anonymous name.
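The run detection and the anonymous-name fallback can be sketched as follows. Both functions are hypothetical illustrations: `find_pointer_runs` assumes the section starts 4-byte aligned and treats each maximal run as one candidate, and `anon_class_name` assumes a 32-bit FNV-1a fed with the big-endian bytes of each sorted method PC (the hash width and byte feeding order are not stated in this document).

```rust
use std::collections::HashSet;

/// Find runs of at least `min_len` consecutive big-endian u32 words that are
/// all known function starts. Returns (byte offset of the run, its words).
pub fn find_pointer_runs(
    section: &[u8],
    function_starts: &HashSet<u32>,
    min_len: usize,
) -> Vec<(usize, Vec<u32>)> {
    let mut runs = Vec::new();
    let mut current: Vec<u32> = Vec::new();
    let mut start = 0usize;
    for (i, chunk) in section.chunks_exact(4).enumerate() {
        let word = u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        if function_starts.contains(&word) {
            if current.is_empty() {
                start = i * 4;
            }
            current.push(word);
            continue;
        }
        // A non-function word ends the current run.
        let finished = std::mem::take(&mut current);
        if finished.len() >= min_len {
            runs.push((start, finished));
        }
    }
    if current.len() >= min_len {
        runs.push((start, current));
    }
    runs
}

/// Fallback class name when RTTI is absent.
/// ASSUMPTION: 32-bit FNV-1a over the big-endian bytes of the sorted PCs.
pub fn anon_class_name(methods: &[u32]) -> String {
    let mut sorted = methods.to_vec();
    sorted.sort_unstable();
    let mut hash: u32 = 0x811C_9DC5; // FNV-1a 32-bit offset basis
    for pc in sorted {
        for byte in pc.to_be_bytes() {
            hash ^= u32::from(byte);
            hash = hash.wrapping_mul(0x0100_0193); // FNV-1a 32-bit prime
        }
    }
    format!("ANON_Class_{:08X}", hash)
}
```

Because the PCs are sorted before hashing, two vtables with the same method set receive the same anonymous name regardless of slot order, which is the collapse behaviour described above.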
### What this layer does NOT do
- Vtables built at runtime in heap-allocated memory (e.g. by ctors copying
static templates) are out of scope — only static `.rdata`/`.data` content.
- Multiple-inheritance "extra" vftables (one per base subobject) are detected
as independent vtables with no link between them.
- Inheritance-tree walking beyond `RTTIClassHierarchyDescriptor`'s direct
base list is not attempted.
### Reference docs
- openrce.org "Reversing Microsoft Visual C++" — RTTI layout articles
(CompleteObjectLocator at vtable[-1]; TypeDescriptor at COL+0xC; mangled
name at TD+0x8).
## Layer M4 — Class-aware probe targeting (landed)
CLI extension only — no schema changes. The probe-token grammar adds three
symbolic forms on top of the existing `0xADDR` literal:
- `Class::method` — joins `classes` × `methods` × `demangled_names` to find
every PC whose vtable belongs to that class and whose demangled
`method_name` matches.
- `Class::*` — joins `classes` × `methods` to find every method PC of that
class.
- `function_name` — falls back to `functions.name` lookup for free functions
/ saverestore stubs / labels.
Numeric tokens never touch the DB (preserves zero-IO fast path; lockstep
digest unaffected). Symbolic tokens require the DuckDB at `--probe-db PATH`
or `XENIA_PROBE_DB`; default is `sylpheed.db` next to the .iso when present.
Resolution happens BEFORE guest exec begins, so it cannot affect the
lockstep digest.
See `crates/xenia-analysis/src/lookup.rs`.
---
## Layer M5 — Indirect-dispatch reachability (landed)
### Schema additions
- New value `'ind_call'` in the `xrefs.kind` set.
- New SQL view `v_indirect_reachability_from_entry` — strict superset of
`v_reachability_from_entry`, taking `ind_call` edges in the BFS.
### What this layer does
- Walks each `FuncAnalysis.functions` entry with a per-basic-block register
tracker. Recognises the canonical static-vtable pattern:
`lis+addi → lwz off(rA) → mtctr → bcctrl`, where `rA` ends up holding a
known vtable's start address from M3.
- Honours the PowerPC ABI: `bl`-style calls (op 18 / 16 with LK=1) clobber
volatile r0..r12 + ctr but preserve non-volatile r13..r31, so a vtable
pointer parked in r30/r31 before a call survives.
- Treats every M3 `loc_*` label as a basic-block boundary (kills register
state) so jump-IN paths cannot induce false positives.
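A per-basic-block constant tracker of the kind described above might look like the sketch below. `RegTracker` and its methods are hypothetical names; the sketch tracks only the 32 GPRs (not `ctr`) and only `lis`, `addi` and `ori`, which is narrower than what the real tracker presumably handles.

```rust
/// Hypothetical per-basic-block constant tracker for the 32 GPRs.
#[derive(Debug, Clone)]
pub struct RegTracker {
    regs: [Option<u32>; 32],
}

impl RegTracker {
    pub fn new() -> Self {
        RegTracker { regs: [None; 32] }
    }

    /// `lis rD, imm`: rD = imm << 16.
    pub fn lis(&mut self, rd: usize, imm: i16) {
        self.regs[rd] = Some((imm as u16 as u32) << 16);
    }

    /// `addi rD, rA, imm`: rA = 0 means the literal value 0 (the `li` form).
    pub fn addi(&mut self, rd: usize, ra: usize, imm: i16) {
        let base = if ra == 0 { Some(0) } else { self.regs[ra] };
        self.regs[rd] = base.map(|v| v.wrapping_add(imm as i32 as u32));
    }

    /// `ori rA, rS, imm`: rA = rS | zero-extended imm.
    pub fn ori(&mut self, ra: usize, rs: usize, imm: u16) {
        self.regs[ra] = self.regs[rs].map(|v| v | u32::from(imm));
    }

    /// A `bl`-style call clobbers volatile r0..=r12; r13..=r31 survive.
    pub fn clobber_call(&mut self) {
        for r in 0..=12 {
            self.regs[r] = None;
        }
    }

    /// A label (basic-block boundary) kills all tracked state.
    pub fn kill_all(&mut self) {
        self.regs = [None; 32];
    }

    pub fn get(&self, r: usize) -> Option<u32> {
        self.regs[r]
    }
}
```

With this model, a vtable address built in r30 via `lis` + `addi` is still known after `clobber_call`, while the same address built in r11 is not, which matches the ABI rule in the second bullet.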
### What this layer does NOT do (and observed impact)
- Vtable pointer loaded from a `this`-pointer field
(`lwz r_vt, off(rA)` where `rA = this`) — by far the dominant pattern in
real C++ — is unresolvable without alias / points-to analysis.
- On Sylpheed: the layer detects 0 edges. The binary's 1,001 lis+addi
references into vtables are mostly constructor-side **vptr writes**
(`stw rVtable, vptr_offset(this)`), not direct dispatches. The renderer
hunt's audit-009 cluster therefore needs a future M5.5 with `this`-flow
tracking before this layer surfaces it.
### Reference docs
- IBM PowerPC ABI: register-save convention (volatile r0..r12 + ctr,
non-volatile r13..r31).
## Layer M7 — String / constant-pool detection (landed)
### Schema additions
- New table `strings(address PK, encoding, length, content)`.
- Index `idx_strings_encoding`.
### What this layer does
- Scans `.rdata` for runs of length ≥ 6 of printable ASCII bytes followed by
a NUL terminator.
- Scans `.rdata` for UTF-16LE runs of length ≥ 6 code units (printable-ASCII
basic plane only) followed by a u16 NUL terminator.
- Cross-reference is implicit: existing `xrefs.kind='ref'` rows whose
`target` falls in `strings.address`'s exact match set name the referencing
PCs. SQL: `SELECT s.content, x.source FROM xrefs x JOIN strings s
ON s.address = x.target WHERE x.kind='ref'`.
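The ASCII pass can be sketched as a single linear scan. `scan_ascii` is a hypothetical helper, and "printable" is assumed here to mean bytes `0x20..=0x7E`; the real scanner may accept more (for example tabs or newlines inside the embedded shader source).

```rust
/// Scan `section` for NUL-terminated runs of at least `min_len` printable
/// ASCII bytes. Returns (virtual address, text) pairs.
/// ASSUMPTION: printable means 0x20..=0x7E.
pub fn scan_ascii(section: &[u8], base_va: u32, min_len: usize) -> Vec<(u32, String)> {
    let mut out = Vec::new();
    let mut start = 0usize; // first byte of the current printable run
    for (i, &b) in section.iter().enumerate() {
        if (0x20..=0x7E).contains(&b) {
            continue;
        }
        // Any non-printable byte ends the run; only a NUL makes it a string.
        if b == 0 && i - start >= min_len {
            let text = String::from_utf8_lossy(&section[start..i]).into_owned();
            out.push((base_va + start as u32, text));
        }
        start = i + 1;
    }
    out
}
```

A run that ends in a non-NUL byte, or at the end of the section without a terminator, is dropped rather than reported.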
### What this layer does NOT do
- No UTF-8 multibyte / non-ASCII basic plane in either encoding.
- No `.data` scan (read-only-section bias).
- No multi-byte CJK encodings — Japanese text in localised builds may be
represented in shift_jis / utf-8 with non-printable bytes that this
scanner skips.
### Sylpheed yield
- 6,311 ASCII strings (including full embedded HLSL shader source).
- 0 UTF-16LE strings (binary uses ASCII / native CJK encoding).
- 9,132 lis+addi sites cross-reference into the detected strings, which
names the source PCs that reference each string.
## Layer M6 — Extended store-class xrefs + `addr_mode` column (landed)
### Schema additions
- `xrefs.addr_mode VARCHAR NULL` — sub-classifies how the source instruction
computes its target. NULL for control-flow edges (call / ind_call / j /
br); one of the following tags for data edges:
- `d_form` — standard signed-16 displacement (lwz/stw/lfs/stfs/etc.)
- `lis_addi` — address materialised via `lis + addi` register tracking
- `lis_ori` — address materialised via `lis + ori`
- `multiword` — `lmw / stmw` (one xref per slot; up to 32-rS slots)
- `x_form_indexed` — `stwx / stbx / sthx / stwux / stbux / sthux / stdx /
stdux / lwzx / lbzx / lhzx / lhax / lwzux / lbzux / lhzux / lhaux / ldx /
ldux` — emitted only when both rA and rB are tracked constants
- `x_form_byterev` — `stwbrx / sthbrx / lwbrx / lhbrx`
- `atomic` — `stwcx. / stdcx.` reservation-conditional stores
- `dcbz` — cache-line clear (32-byte zero at rA+rB)
- Index `idx_xrefs_addr_mode`.
### What this layer does
- Tags every existing data xref with its addressing mode (`d_form` for the
bulk; `lis_addi` / `lis_ori` for the lift-and-add cases that produce
DataRef rows).
- Adds new dispatch for opcode 47 (`stmw`) and 46 (`lmw`), expanding to
per-slot DataWrite / DataRead rows.
- Adds new dispatch for opcode 31 X-form: stores, atomic, byte-reverse,
dcbz. X-form rows are emitted ONLY when both rA and rB resolve to known
constants (otherwise the address is runtime-dependent and we skip).
### What this layer does NOT do
- VMX / VMX128 vector stores (opcode 31 with vector XO codes) are not
emitted — they always have register-indexed addresses that the
lis+addi tracker can't usually resolve, and detecting them adds noise
without improving target resolution.
- The dominant runtime `stwx` pattern (rA = base, rB = runtime index) is
not resolved, by design; mem-watch covers the runtime side per VERIFY-B.
### Sylpheed yield
- 28,834 `lis_addi` refs, 18,485 `d_form` reads, 3,288 `d_form` writes —
the existing baseline now properly tagged.
- **442 newly-detected `x_form_indexed` reads** — primarily lwzx/lhzx
reads from in-table dispatch (each pair (rA,rB) resolved statically).
- **40 newly-detected `atomic` writes** — every `stwcx.` site with a
resolvable address; useful for reservation-table audits.
- 9 `lis_ori` refs.
- 0 multiword / dcbz / byterev — these instructions exist in the binary
but are not in lis+addi-tracked code paths.
## Layer M8 + M11 — Function-pointer arrays beyond vtables (landed)
### Schema additions
- New table `function_pointer_arrays(address PK, length, kind)` where
`kind` is `'vtable'` (M3 re-emit), `'dispatch_table'` (M8), or
`'static_init'` (M11).
- New table `function_pointer_array_entries(array_address, slot,
function_address, PRIMARY KEY (array_address, slot))` — one row per
slot of every detected array (vtable + non-vtable).
- Indices on `function_pointer_arrays.kind` and
`function_pointer_array_entries.function_address`.
### What this layer does
- Walks `.rdata` (only — `.data` produces too many false positives) for
runs of ≥ 2 consecutive 4-byte BE values where each value is a known
function entry from M1's `functions` table.
- Skips runs whose start matches an M3 vtable head — those are re-emitted
in this table with `kind='vtable'` for unified queries but not
re-classified.
- Heuristically classifies non-vtable runs:
- `static_init` (M11): every entry's first instruction is `mfspr r12, LR`
AND the next is `stwu r1, -N(r1)` with `N ≤ 0x80` (or a save-stub `bl`).
Mirrors the typical C++ static-initialiser prologue.
- `dispatch_table` (M8): everything else.
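The `static_init` prologue test can be sketched on raw instruction words. The helper names are hypothetical, and the sketch omits the save-stub `bl` alternative mentioned above; it only checks `mfspr r12, LR` (encoded `0x7D8802A6`) followed by `stwu r1, -N(r1)` (encoded `0x9421xxxx` with a negative 16-bit displacement).

```rust
const MFLR_R12: u32 = 0x7D88_02A6; // mfspr r12, LR

/// True when `word` encodes `stwu r1, -N(r1)` with 0 < N <= max_frame.
pub fn is_small_stwu_frame(word: u32, max_frame: u32) -> bool {
    if (word & 0xFFFF_0000) != 0x9421_0000 {
        return false;
    }
    let disp = (word & 0xFFFF) as u16 as i16;
    disp < 0 && (-(i32::from(disp))) as u32 <= max_frame
}

/// Prologue check for one array entry (first two instruction words).
pub fn looks_like_static_init(first: u32, second: u32) -> bool {
    first == MFLR_R12 && is_small_stwu_frame(second, 0x80)
}

/// Classify a non-vtable run from the first two words of every entry.
pub fn classify(entries: &[(u32, u32)]) -> &'static str {
    let all_match = !entries.is_empty()
        && entries.iter().all(|&(a, b)| looks_like_static_init(a, b));
    if all_match {
        "static_init"
    } else {
        "dispatch_table"
    }
}
```

Requiring every entry to match keeps the heuristic conservative: a single entry with a different prologue demotes the whole array to `dispatch_table`.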
### What this layer does NOT do
- Does not parse symbol-table-bracketed regions like `__xc_a` / `__xc_z`
/ `__xi_a` / `__xi_z` directly — Sylpheed's symbol table is stripped.
- Does not chain multi-segment static-init drivers; future M11.5 could
walk the entry-point's static-init driver call chain to surface
ground-truth ctor PCs.
- 2-slot runs in `.rdata` may be false positives where two struct fields
happen to alias function VAs; downstream queries should use a length
filter (`WHERE length >= 3`) when high precision matters.
### Sylpheed yield
- 722 vtables (M3 re-emit) + 388 dispatch_tables = 1,110 arrays in
`function_pointer_arrays`.
- 0 static_init detected — Sylpheed's ctors don't all match the
conservative prologue heuristic. Lengths concentrate at 2 slots
(typical of switch-case jump tables).
## Layer M9 — `has_eh` from `.pdata` exception flag (landed)
### Schema additions
- `functions.has_eh BOOLEAN NOT NULL` — true when `.pdata`'s exception-
handler-present bit (bit 31 of word 1, the high bit) is set.
- Index `idx_functions_has_eh`.
### What this layer does
- Derived directly from M1's already-parsed `pdata.flags` bit field (no
new parsing). The bit was always available in `pdata_entries.flags`;
this layer surfaces it as a first-class column on `functions`.
### What this layer does NOT do
- Does not parse the actual `__CxxFrameHandler` / `__C_specific_handler`
scope-table records that the exception bit gates. Walking those tables
would let us name try/catch ranges and per-state cleanup actions, but
is out of scope for a derive-only milestone.
### Sylpheed yield
- 2,975 of 23,073 pdata-validated functions have `has_eh=true` (12.9%) —
plausible MSVC C++ EH coverage rate. Largest EH function: 26,328 bytes
(`sub_823518F0`).
## Layer M10 — `.tls` section / TLS directory (landed)
### Schema additions
- New table `tls_info(raw_data_start, raw_data_end, index_address,
callback_array, zero_fill_size, characteristics)` — at most one row
(the IMAGE_TLS_DIRECTORY32).
- New table `tls_callbacks(slot PK, address)` — one row per resolved TLS
callback function.
### What this layer does
- Reads the first 24 bytes of the `.tls` section as an
`IMAGE_TLS_DIRECTORY32` and walks the zero-terminated callback array.
- All addresses stored as absolute VAs.
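A sketch of the directory read and callback walk follows. The `TlsInfo` struct mirrors the `tls_info` columns, but the type, the function names, the `read_u32` callback, and the `max` safety cap are hypothetical, and the big-endian field order is an assumption based on the guest being a big-endian PowerPC image.

```rust
/// Hypothetical mirror of the `tls_info` row (six u32 fields, 24 bytes).
#[derive(Debug, PartialEq, Eq)]
pub struct TlsInfo {
    pub raw_data_start: u32,
    pub raw_data_end: u32,
    pub index_address: u32,
    pub callback_array: u32,
    pub zero_fill_size: u32,
    pub characteristics: u32,
}

fn be_u32(bytes: &[u8], offset: usize) -> Option<u32> {
    let s = bytes.get(offset..offset + 4)?;
    Some(u32::from_be_bytes([s[0], s[1], s[2], s[3]]))
}

/// Read the first 24 bytes of `.tls` as an IMAGE_TLS_DIRECTORY32.
/// ASSUMPTION: fields are stored big-endian.
pub fn parse_tls_directory(section: &[u8]) -> Option<TlsInfo> {
    Some(TlsInfo {
        raw_data_start: be_u32(section, 0)?,
        raw_data_end: be_u32(section, 4)?,
        index_address: be_u32(section, 8)?,
        callback_array: be_u32(section, 12)?,
        zero_fill_size: be_u32(section, 16)?,
        characteristics: be_u32(section, 20)?,
    })
}

/// Walk the zero-terminated callback array. `read_u32` maps a VA to the
/// word stored there; `max` caps the walk on malformed input.
pub fn walk_callbacks<F: Fn(u32) -> Option<u32>>(
    array_va: u32,
    read_u32: F,
    max: usize,
) -> Vec<u32> {
    let mut out = Vec::new();
    if array_va == 0 {
        return out;
    }
    for slot in 0..max {
        match read_u32(array_va.wrapping_add(4 * slot as u32)) {
            Some(0) | None => break,
            Some(pc) => out.push(pc),
        }
    }
    out
}
```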
### What this layer does NOT do
- Does not parse the raw TLS template content (the variable initialiser
block); just records its start/end VAs.
### Sylpheed yield
- 0 rows — Sylpheed has no `.tls` section. Infrastructure ready for any
binary that uses `__declspec(thread)` storage.
## Layer M12 — `--lr-trace` runtime canary-diff harness (landed)
### Runtime additions (no DB)
- New CLI flag `--lr-trace=PC[,PC,...]` on `exec` — comma-separated PCs
to capture as JSONL records on every fire. Symbolic tokens (`Class::method`)
resolve via M4's lookup against `--probe-db`. Settable via
`XENIA_LR_TRACE`.
- New CLI flag `--lr-trace-out=PATH` — writes JSONL to a file (one
record per line). Stdout when omitted. Settable via `XENIA_LR_TRACE_OUT`.
- New kernel state fields `lr_trace_pcs: HashSet<u32>` +
`lr_trace_writer: Option<Mutex<File>>` and helper
`KernelState::fire_lr_trace_if_match(hw_id)` invoked from the
per-instruction probe slot.
### JSONL record fields
`pc, tid, hw, cycle, r3, r4, r5, r6, lr` — superset of what
xenia-canary's `--log_lr_on_pc` patch emits, with a cycle counter added
for cross-run reproducibility.
### What this layer does NOT do
- Does not capture VMX / FP register state (only GPRs r3..r6).
- Does not buffer / batch records — one `write_all` per fire. For
high-frequency probes (e.g. tight loops at >1M fires/sec), redirect
to a file and use an SSD.
### Determinism
Lockstep digest unaffected: probe firing happens after the per-instr
hooks for ctor/branch probes and only emits side-channel output. Verified
end-of-session: `check sylpheed.iso --stable-digest -n 2M` ×2 produced
byte-identical digests (`instructions=2000005`).
---
## Layer M5.5 — `this`-flow indirect-dispatch resolution (landed)
### Schema additions
- New table `vptr_writes(writer_pc, vtable_address, vptr_offset, writer_function)` —
every detected `stw rVtable, vptr_off(rThis)` site.
- New table `indirect_dispatch_sites(dispatch_pc PK, vptr_offset, slot, candidate_count)` —
one row per resolved dispatch.
- New table `indirect_dispatch_candidates(dispatch_pc, vtable_address, method_address)` —
one row per (dispatch × candidate vtable). Joined to existing
`xrefs.kind='ind_call'` edges (one ind_call row per candidate).
- New indices on `vptr_writes.vtable_address`, `vptr_writes.vptr_offset`,
`indirect_dispatch_candidates.method_address`,
`indirect_dispatch_candidates.vtable_address`,
`indirect_dispatch_sites(vptr_offset, slot)`.
### What this layer does (class-membership inference)
1. **Phase 1 — vptr-write scan**: walk every function with the lis+addi
tracker; whenever `stw rA, off(rB)` writes a known M3 vtable address,
record `(vtable_addr, vptr_offset, writer_pc)`.
2. **Phase 2 — invert**: build `vtables_by_offset[vptr_off] = {V}` for the
set of vtables ever written at that offset.
3. **Phase 3 — dispatch detection**: walk back ≤16 instructions from each
`bcctrl`/`bctr LK=1`, find the canonical
`lwz vt, off(this); lwz fn, slot*4(vt); mtctr fn` chain. Extract
`(vptr_off, slot)`. Bail on register clobber, branch, or label
boundary.
4. **Phase 4 — emit**: for each `(dispatch_pc, vptr_off, slot)`, emit one
`xrefs.kind='ind_call'` row per candidate vtable that has a
matching slot. Multi-candidate rows are an over-approximation.
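Phases 2 and 4 reduce to an inversion followed by a lookup, sketched below. `VptrWrite`, `vtables_by_offset` and `candidates` are hypothetical names; the `methods` map (vtable address to slot-ordered method PCs) stands in for M3's `methods` table, and ordered collections are used only to keep the output deterministic.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// One phase-1 result: `stw rVtable, vptr_offset(rThis)` at `writer_pc`.
pub struct VptrWrite {
    pub writer_pc: u32,
    pub vtable_address: u32,
    pub vptr_offset: i32,
}

/// Phase 2: invert the writes into offset -> set of vtables seen there.
pub fn vtables_by_offset(writes: &[VptrWrite]) -> BTreeMap<i32, BTreeSet<u32>> {
    let mut map = BTreeMap::new();
    for w in writes {
        map.entry(w.vptr_offset)
            .or_insert_with(BTreeSet::new)
            .insert(w.vtable_address);
    }
    map
}

/// Phase 4: for a dispatch through (vptr_offset, slot), list every
/// (vtable, method) pair whose vtable is long enough to have that slot.
pub fn candidates(
    by_offset: &BTreeMap<i32, BTreeSet<u32>>,
    methods: &BTreeMap<u32, Vec<u32>>,
    vptr_offset: i32,
    slot: usize,
) -> Vec<(u32, u32)> {
    let mut out = Vec::new();
    if let Some(vtables) = by_offset.get(&vptr_offset) {
        for &vt in vtables {
            if let Some(&target) = methods.get(&vt).and_then(|m| m.get(slot)) {
                out.push((vt, target));
            }
        }
    }
    out
}
```

The length of the returned vector corresponds to `candidate_count`: a single element is a high-confidence edge, while several elements are the over-approximation noted in phase 4.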
### What this layer does NOT do
- No alias resolution at multi-candidate sites — emits one edge per
matching vtable. Downstream queries should filter
`indirect_dispatch_sites WHERE candidate_count=1` for high-confidence
edges.
- No flow-sensitive analysis: register state is killed at every label
(basic-block boundary) and at `bl`/`bcl` calls (volatile r0..r12 +
ctr). We do NOT propagate values across calls in the chain-walker.
- No tracking of vptr writes via X-form indexed (`stwx`), VMX, or
multiword stores. Only D-form `stw rA, off(rB)`.
- Does not synthesise vptr writes for inlined / elided constructors.
If a class never has a writer at offset `vptr_off`, dispatches
through that offset find no candidates.
### Sylpheed yield
- 567 vptr writes covering 214 distinct vtables (~30% of M3's 722).
- 29 distinct vptr offsets used; offset 0 dominates (501/567 = 88%,
single-inheritance).
- **6,842 dispatch sites resolved**: 97 single-candidate
(high-confidence) + 6,745 multi-candidate (over-approximation).
- 687,963 `ind_call` xref rows total.
- **2,746 newly-reachable functions** via the M5 BFS view
(`v_indirect_reachability_from_entry`) compared to call/j/br alone.
- Audit-009 cluster (renderer plateau): functions newly visible
include `0x823BC9E0`, `0x823BC290`, `0x823BC5A0`, `0x823BB158`,
`0x823BB1E0`, `0x823BCAF0`, `0x823BC4C8` — actionable starting
points for the cluster's reachability hunt.
### Reference docs
- IBM PowerPC ABI (volatile/non-volatile register partition).
- Itanium C++ ABI on vtable layout (offset-from-`this` model adapted
by MSVC for Win32 PPC).
## Layer M9.5 — `__CxxFrameHandler` scope-table parsing (landed)
### Schema additions
- New table `eh_funcinfo(address PK, magic, max_state, p_unwind_map,
n_try_blocks, p_try_block_map, n_ip_map_entries, p_ip_to_state_map,
p_es_type_list, eh_flags)`.
- New table `eh_unwind_map(funcinfo_address, state_index, to_state, action_pc,
PRIMARY KEY (funcinfo_address, state_index))`.
- New table `eh_try_blocks(funcinfo_address, try_index, try_low, try_high,
catch_high, n_catches, p_handler_array,
PRIMARY KEY (funcinfo_address, try_index))`.
### What this layer does
- Magic-scans `.rdata` for the documented MSVC FuncInfo signatures
(0x19930520 / 0x19930521 / 0x19930522), reading 4-byte BE values
on 4-byte alignment.
- Sanity-checks `max_state` ≤ 10,000, `n_try_blocks` ≤ 1,000, all
internal pointers landing in valid sections.
- Walks `pUnwindMap` (8-byte UnwindMapEntry) and `pTryBlockMap`
(20-byte TryBlockMapEntry) into one row each.
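The magic scan and sanity caps can be sketched as below. The `FuncInfo` struct mirrors the `eh_funcinfo` columns, but the type and function names are hypothetical. The sketch assumes that `p_es_type_list` exists only from magic 0x19930521 and `eh_flags` only from 0x19930522 (absent fields are reported as 0), and it omits the pointer-validity check against section ranges.

```rust
pub const FUNCINFO_MAGICS: [u32; 3] = [0x1993_0520, 0x1993_0521, 0x1993_0522];

/// Hypothetical mirror of one `eh_funcinfo` row (all fields BE u32).
#[derive(Debug, PartialEq, Eq)]
pub struct FuncInfo {
    pub magic: u32,
    pub max_state: u32,
    pub p_unwind_map: u32,
    pub n_try_blocks: u32,
    pub p_try_block_map: u32,
    pub n_ip_map_entries: u32,
    pub p_ip_to_state_map: u32,
    pub p_es_type_list: u32,
    pub eh_flags: u32,
}

fn be_u32(bytes: &[u8], offset: usize) -> Option<u32> {
    let s = bytes.get(offset..offset + 4)?;
    Some(u32::from_be_bytes([s[0], s[1], s[2], s[3]]))
}

/// Try to parse a FuncInfo at `off`; None on bad magic, failed sanity
/// caps, or truncated data.
pub fn parse_funcinfo(bytes: &[u8], off: usize) -> Option<FuncInfo> {
    let magic = be_u32(bytes, off)?;
    if !FUNCINFO_MAGICS.contains(&magic) {
        return None;
    }
    let max_state = be_u32(bytes, off + 4)?;
    let n_try_blocks = be_u32(bytes, off + 12)?;
    if max_state > 10_000 || n_try_blocks > 1_000 {
        return None;
    }
    // ASSUMPTION: the two trailing fields are version-gated.
    let p_es_type_list = if magic >= 0x1993_0521 { be_u32(bytes, off + 28)? } else { 0 };
    let eh_flags = if magic >= 0x1993_0522 { be_u32(bytes, off + 32)? } else { 0 };
    Some(FuncInfo {
        magic,
        max_state,
        p_unwind_map: be_u32(bytes, off + 8)?,
        n_try_blocks,
        p_try_block_map: be_u32(bytes, off + 16)?,
        n_ip_map_entries: be_u32(bytes, off + 20)?,
        p_ip_to_state_map: be_u32(bytes, off + 24)?,
        p_es_type_list,
        eh_flags,
    })
}

/// Scan on 4-byte alignment; returns (byte offset, record) per hit.
pub fn scan_funcinfo(bytes: &[u8]) -> Vec<(usize, FuncInfo)> {
    (0..bytes.len())
        .step_by(4)
        .filter_map(|off| parse_funcinfo(bytes, off).map(|fi| (off, fi)))
        .collect()
}
```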
### What this layer does NOT do
- Does not associate FuncInfo records with their owning function via
the `bl __CxxFrameHandler` registration site — joins to `functions`
by best-effort PC-range queries. A future M9.6 can chase the
registration to make the link explicit.
- Does not parse `pHandlerArray` (per-try-block catch type info).
### Sylpheed yield
- 2,588 FuncInfo records (all version 0x19930522).
- 10,019 unwind-map entries.
- 315 try-blocks across the binary.
## Layer M11.5 — Static-init driver chain detection (landed)
### Schema additions
- Reuses existing `function_pointer_arrays` table — drivers' arrays are
emitted with `kind='static_init'`, replacing M11's prologue-heuristic
output where the structurally-grounded pattern fires.
### What this layer does
- Walks every detected function looking for the canonical `_initterm`-
style loop: `lwz cursor; mtctr; bcctrl; addi cursor, cursor, 4`
bounded by a comparison against another constant register.
- Extracts `(array_start, array_end)` from the cursor's initial
constant value and the end-comparand register.
- Reads the array, validates each entry against
`func_analysis.functions`, and emits the array as `static_init`.
### What this layer does NOT do
- Doesn't handle drivers with multiple back-to-back trampoline loops.
- Doesn't follow `_initterm_e` return-status semantics — both
`_initterm` and `_initterm_e` match if the loop body matches.
### Sylpheed yield
- 0 drivers detected. Sylpheed's static-init structure does not match
the canonical CRT loop pattern; the binary likely calls ctors via
another mechanism (inline at the entry point, or via a different
driver shape). Infrastructure ready for any binary with the
documented MSVC pattern.
## Layer VMX — Vector-store xrefs (M6 follow-up, landed)
Extends the M6 X-form opcode-31 dispatch in `xref.rs` with AltiVec/VMX
vector loads and stores. New entries (XO codes):
- `lvx` (103), `lvxl` (359), `lvebx` (7), `lvehx` (39), `lvewx` (71)
— `addr_mode='x_form_indexed'`, `kind='read'`.
- `stvx` (231), `stvxl` (487), `stvebx` (135), `stvehx` (167),
`stvewx` (199) — `addr_mode='x_form_indexed'`, `kind='write'`.
Same constraint as M6: rows emitted only when both `rA` and `rB`
resolve to known constants (rare but useful).
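The XO table above maps directly onto a match on the extended-opcode field. The sketch below is illustrative only: `XrefKind` and `classify_vmx_xform` are hypothetical names, and the address-resolution step (both `rA` and `rB` constant) is not shown.

```rust
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum XrefKind {
    Read,
    Write,
}

/// Classify an instruction word as one of the AltiVec/VMX X-form loads or
/// stores listed above. Returns None for anything else.
pub fn classify_vmx_xform(insn: u32) -> Option<(&'static str, XrefKind)> {
    if insn >> 26 != 31 {
        return None; // not primary opcode 31
    }
    let xo = (insn >> 1) & 0x3FF; // 10-bit extended opcode
    let hit = match xo {
        7 => ("lvebx", XrefKind::Read),
        39 => ("lvehx", XrefKind::Read),
        71 => ("lvewx", XrefKind::Read),
        103 => ("lvx", XrefKind::Read),
        359 => ("lvxl", XrefKind::Read),
        135 => ("stvebx", XrefKind::Write),
        167 => ("stvehx", XrefKind::Write),
        199 => ("stvewx", XrefKind::Write),
        231 => ("stvx", XrefKind::Write),
        487 => ("stvxl", XrefKind::Write),
        _ => return None,
    };
    Some(hit)
}
```

For example, `0x7C0320CE` (`lvx v0, r3, r4`) classifies as a read and `0x7C2321CE` (`stvx v1, r3, r4`) as a write, while `0x7CA3202E` (`lwzx`, XO 23) is not matched by this table.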
### Sylpheed yield
- 110 `stvx` writes newly resolved.
## Layer SJIS+UTF-8 — Localised-string detection (M7 follow-up, landed)
Extends `xenia_analysis::strings::analyze` with two additional scanners.
### Shift_JIS detection
Per JIS X 0208: lead byte ∈ [0x81, 0x9F] ∪ [0xE0, 0xEF];
trail byte ∈ [0x40, 0x7E] ∪ [0x80, 0xFC]. Single-byte ASCII and JIS
half-width katakana (0xA1..=0xDF) are passed through. At least one
multi-byte pair must be present (so we don't double-count pure ASCII).
SJIS bytes are rendered as `\xHH` escapes in the `content` column for
diagnostic readability — full SJIS→UTF-8 decoding is a future
enhancement.
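The byte-range rules and the escape rendering can be combined into one pass, sketched below. `render_shift_jis` is a hypothetical helper; treating ASCII as `0x20..=0x7E` and using uppercase hex in the escapes are assumptions, and the input is taken to be a candidate run with its NUL terminator already stripped.

```rust
fn is_lead(b: u8) -> bool {
    (0x81..=0x9F).contains(&b) || (0xE0..=0xEF).contains(&b)
}

fn is_trail(b: u8) -> bool {
    (0x40..=0x7E).contains(&b) || (0x80..=0xFC).contains(&b)
}

/// Validate a candidate run as Shift_JIS and render it for the `content`
/// column. Returns None if a byte breaks the rules or if the run contains
/// no double-byte pair (so pure ASCII is not counted twice).
pub fn render_shift_jis(bytes: &[u8]) -> Option<String> {
    let mut out = String::new();
    let mut pairs = 0usize;
    let mut i = 0usize;
    while i < bytes.len() {
        let b = bytes[i];
        if (0x20..=0x7E).contains(&b) {
            out.push(b as char); // plain ASCII passes through
            i += 1;
        } else if (0xA1..=0xDF).contains(&b) {
            out.push_str(&format!("\\x{:02X}", b)); // half-width katakana
            i += 1;
        } else if is_lead(b) && i + 1 < bytes.len() && is_trail(bytes[i + 1]) {
            out.push_str(&format!("\\x{:02X}\\x{:02X}", b, bytes[i + 1]));
            pairs += 1;
            i += 2;
        } else {
            return None;
        }
    }
    if pairs > 0 {
        Some(out)
    } else {
        None
    }
}
```

The pair is escaped as a unit because a trail byte can fall in the printable ASCII range; `83 65` must render as `\x83\x65`, not as `\x83` followed by the letter `e`.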
### UTF-8 detection
Validates 2-byte (`110xxxxx 10xxxxxx`) and 3-byte
(`1110xxxx 10xxxxxx 10xxxxxx`) sequences plus printable ASCII. Skips
4-byte (supplementary plane) which is rare in game text.
### Sylpheed yield
- 790 Shift_JIS strings (Japanese debug + UI text, including
`[WARNING] ードに割り当てるエフェクトIDの指定がない ノードデータが見つからない` style mission strings).
- 39 UTF-8 strings.
- 6,311 ASCII strings (unchanged from M7).
## Forward work (not yet landed)
- **M9.6** — link `eh_funcinfo` records back to their owning functions
via `bl __CxxFrameHandler` registration sites + per-try-block
`pHandlerArray` parsing.
- **M11.6** — relax M11.5 to detect non-canonical static-init driver
shapes (`_initterm_e` with status return, custom drivers).
- Full SJIS → UTF-8 decoding in the `strings.content` column.
- VMX128 (opcode 4) vector-store xrefs — separate encoding space, low
ROI; document if Sylpheed's renderer cluster uses it.