M5+M7: indirect-dispatch reachability + .rdata string detection
Two MEDIUM milestones bundled (both opportunistic per plan; both small).
## M5 — indirect-dispatch reachability
- `xenia_analysis::indirect`: per-basic-block register tracker over each
detected function. Recognises the canonical static-vtable pattern
`lis+addi → lwz off(rA) → mtctr → bcctrl` where rA holds a known M3
vtable address. Emits one `Xref { kind: IndirectCall }` per resolvable
bcctrl site.
- PowerPC ABI awareness: `bl`-style calls clobber volatile r0..r12 + ctr
but preserve non-volatile r13..r31, so a vtable pointer parked in r30/r31
before a call survives.
- Label-based basic-block boundaries kill register state, which bounds the
false-positive risk from jump-IN paths.
- New `XrefKind::IndirectCall` variant (DB tag `'ind_call'`).
- New SQL view `v_indirect_reachability_from_entry` — strict superset of
`v_reachability_from_entry`, taking `ind_call` edges in the BFS.
Sylpheed yield: 0 edges detected. The binary's 1,001 static lis+addi
references into vtables are nearly all constructor-side vptr writes, not
dispatches; real method dispatch goes through `this->vptr` which requires
alias analysis we explicitly don't do. Documented in SCHEMA.md as the
expected limitation. Three unit tests cover the synthetic-correctness path.
## M7 — string / constant-pool detection
- `xenia_analysis::strings`: scans `.rdata` for runs of ≥ 6 printable
ASCII bytes (NUL-terminated) and ≥ 6 UTF-16LE code units (basic-plane
printable ASCII, NUL u16 terminator).
- New `strings(address PK, encoding, length, content)` table + encoding index.
- Implicit cross-ref via existing `xrefs.kind='ref'` rows whose target
matches a strings.address.
Sylpheed yield: 6,311 ASCII strings (including embedded HLSL shader source
and AS_CB_SURFACE_SWIZZLE_* assertion strings). 9,132 lis+addi sites
cross-reference detected strings, so a single query names the source PCs
that reference each string. Four unit tests cover encoding detection, NUL
termination, and short-run rejection.
Tests 626→633 (+3 indirect, +4 strings).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -166,11 +166,70 @@ See `crates/xenia-analysis/src/lookup.rs`.

---

## Layer M5 — Indirect-dispatch reachability (landed)

### Schema additions
- New value `'ind_call'` in the `xrefs.kind` set.
- New SQL view `v_indirect_reachability_from_entry`, a strict superset of
  `v_reachability_from_entry` that also follows `ind_call` edges in the BFS.

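The superset relationship between the two views can be illustrated outside SQL. The following is a minimal Rust sketch, not the view definition itself: `Edge`, `reachable`, the function-level granularity, and the `"call"` tag for direct calls are all assumptions made for illustration; only `'ind_call'` is a tag named above.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical in-memory stand-in for one `xrefs` row, reduced to function granularity.
#[derive(Clone, Copy)]
pub struct Edge {
    pub source_fn: u32,
    pub target_fn: u32,
    /// `"ind_call"` is the tag this layer adds; `"call"` is an assumed name
    /// for the pre-existing direct-call tag.
    pub kind: &'static str,
}

/// Breadth-first search from `entry`, following only edges whose kind is in `kinds`.
pub fn reachable(entry: u32, edges: &[Edge], kinds: &[&str]) -> HashSet<u32> {
    // Build an adjacency list restricted to the requested edge kinds.
    let mut adj: HashMap<u32, Vec<u32>> = HashMap::new();
    for e in edges.iter().filter(|e| kinds.contains(&e.kind)) {
        adj.entry(e.source_fn).or_default().push(e.target_fn);
    }
    let mut seen = HashSet::from([entry]);
    let mut queue = VecDeque::from([entry]);
    while let Some(f) = queue.pop_front() {
        for &next in adj.get(&f).map(Vec::as_slice).unwrap_or(&[]) {
            // `insert` returns true only the first time a node is seen.
            if seen.insert(next) {
                queue.push_back(next);
            }
        }
    }
    seen
}
```

Calling `reachable(entry, &edges, &["call"])` models the original view, and adding `"ind_call"` to the kind list models the new one. Because the second call can only follow more edges, its result always contains the first.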
### What this layer does
- Walks each `FuncAnalysis.functions` entry with a per-basic-block register
  tracker. Recognises the canonical static-vtable pattern
  `lis+addi → lwz off(rA) → mtctr → bcctrl`, where `rA` ends up holding a
  known vtable's start address from M3.
- Honours the PowerPC ABI: `bl`-style calls (op 18 / 16 with LK=1) clobber
  volatile r0..r12 + ctr but preserve non-volatile r13..r31, so a vtable
  pointer parked in r30/r31 before a call survives.
- Treats every M3 `loc_*` label as a basic-block boundary (kills register
  state) so jump-IN paths cannot induce false positives.

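The three rules above can be sketched as a small forward pass over pre-decoded instructions. This is an illustrative reconstruction, not the code in `xenia_analysis::indirect`: the `Insn` enum, the `Val` lattice, `find_indirect_calls`, and the vtable map shape (start address to slot targets) are hypothetical, and the sketch assumes valid register numbers (0..=31) and treats only labels, not branches, as block boundaries.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical pre-decoded instruction subset (the real decoder is not shown here).
#[derive(Clone, Copy)]
pub enum Insn {
    Lis { rd: u8, imm: u16 },          // rd = imm << 16
    Addi { rd: u8, ra: u8, imm: i16 }, // rd = ra + sign_extend(imm); ra = 0 means `li`
    Lwz { rd: u8, ra: u8, off: i16 },  // rd = mem32[ra + off]
    Mtctr { rs: u8 },                  // ctr = rs
    Bl,                                // direct call with LK=1
    Bcctrl,                            // indirect call through ctr
    Other { rd: Option<u8> },          // anything else; may overwrite one register
}

/// What the tracker knows about a register: a constant, or a resolved vtable slot target.
#[derive(Clone, Copy)]
enum Val {
    Const(u32),
    Target(u32),
}

/// Returns `(bcctrl_pc, target)` for every indirect call the tracker can resolve.
pub fn find_indirect_calls(
    code: &[(u32, Insn)],
    labels: &HashSet<u32>,
    vtables: &HashMap<u32, Vec<u32>>,
) -> Vec<(u32, u32)> {
    let mut regs: [Option<Val>; 32] = [None; 32];
    let mut ctr: Option<Val> = None;
    let mut edges = Vec::new();
    for &(pc, insn) in code {
        // A label starts a new basic block: forget everything.
        if labels.contains(&pc) {
            regs = [None; 32];
            ctr = None;
        }
        match insn {
            Insn::Lis { rd, imm } => regs[rd as usize] = Some(Val::Const((imm as u32) << 16)),
            Insn::Addi { rd, ra, imm } => {
                let simm = imm as i32 as u32; // sign-extend the 16-bit immediate
                regs[rd as usize] = match (ra, regs[ra as usize]) {
                    (0, _) => Some(Val::Const(simm)),
                    (_, Some(Val::Const(base))) => Some(Val::Const(base.wrapping_add(simm))),
                    _ => None,
                };
            }
            Insn::Lwz { rd, ra, off } => {
                // Resolve only aligned, in-range loads from a known vtable start.
                regs[rd as usize] = match regs[ra as usize] {
                    Some(Val::Const(base)) if ra != 0 && off >= 0 && off % 4 == 0 => vtables
                        .get(&base)
                        .and_then(|slots| slots.get(off as usize / 4))
                        .map(|&target| Val::Target(target)),
                    _ => None,
                };
            }
            Insn::Mtctr { rs } => ctr = regs[rs as usize],
            Insn::Bl | Insn::Bcctrl => {
                if let (Insn::Bcctrl, Some(Val::Target(target))) = (insn, ctr) {
                    edges.push((pc, target));
                }
                // Any call clobbers volatile r0..r12 and ctr; r13..r31 survive.
                regs[..=12].fill(None);
                ctr = None;
            }
            Insn::Other { rd } => {
                if let Some(rd) = rd {
                    regs[rd as usize] = None;
                }
            }
        }
    }
    edges
}
```

In this sketch a vtable address built in r30 survives an intervening `Bl` and resolves at the `Bcctrl`, while the same sequence using r11, or with a label between the `Bl` and the `Lwz`, yields no edge.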
### What this layer does NOT do (and observed impact)
- Vtable pointer loaded from a `this`-pointer field
  (`lwz r_vt, off(rA)` where `rA = this`), by far the dominant pattern in
  real C++, is unresolvable without alias / points-to analysis.
- On Sylpheed: the layer detects 0 edges. The binary's 1,001 lis+addi
  references into vtables are mostly constructor-side **vptr writes**
  (`stw rVtable, vptr_offset(this)`), not direct dispatches. The renderer
  hunt's audit-009 cluster therefore needs a future M5.5 with `this`-flow
  tracking before this layer surfaces it.

### Reference docs
- IBM PowerPC ABI: register-save convention (volatile r0..r12 + ctr,
  non-volatile r13..r31).

## Layer M7 — String / constant-pool detection (landed)

### Schema additions
- New table `strings(address PK, encoding, length, content)`.
- Index `idx_strings_encoding`.

### What this layer does
- Scans `.rdata` for runs of length ≥ 6 of printable ASCII bytes followed by
  a NUL terminator.
- Scans `.rdata` for UTF-16LE runs of length ≥ 6 code units (printable-ASCII
  basic plane only) followed by a u16 NUL terminator.
- Cross-reference is implicit: existing `xrefs.kind='ref'` rows whose
  `target` exactly matches a `strings.address` name the referencing PCs.
  SQL: `SELECT s.content, x.source FROM xrefs x JOIN strings s
  ON s.address = x.target WHERE x.kind='ref'`.

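The two scans above can be sketched as follows. This is a minimal illustration rather than the code in `xenia_analysis::strings`: `Found`, `scan_ascii`, `scan_utf16le`, and the encoding labels `"ascii"` / `"utf16le"` are hypothetical names, "printable" is assumed to mean 0x20..=0x7E, and the UTF-16LE pass assumes code units start at even offsets within the section.

```rust
/// Hypothetical result row, loosely mirroring the `strings` table columns.
#[derive(Debug, PartialEq)]
pub struct Found {
    pub offset: usize,
    pub encoding: &'static str,
    pub content: String,
}

const MIN_LEN: usize = 6;

/// Assumed definition of "printable ASCII": space through tilde.
fn printable(unit: u16) -> bool {
    (0x20..=0x7e).contains(&unit)
}

/// Finds runs of at least MIN_LEN printable bytes that end in a NUL byte.
pub fn scan_ascii(data: &[u8]) -> Vec<Found> {
    let mut out = Vec::new();
    let mut i = 0;
    while i < data.len() {
        let start = i;
        while i < data.len() && printable(data[i] as u16) {
            i += 1;
        }
        // Keep the run only if it is long enough and NUL-terminated.
        if i - start >= MIN_LEN && data.get(i) == Some(&0) {
            let content = String::from_utf8_lossy(&data[start..i]).into_owned();
            out.push(Found { offset: start, encoding: "ascii", content });
        }
        i += 1; // step over the terminator or the non-printable byte
    }
    out
}

/// Finds runs of at least MIN_LEN printable UTF-16LE code units that end in a u16 NUL.
pub fn scan_utf16le(data: &[u8]) -> Vec<Found> {
    let unit = |i: usize| u16::from_le_bytes([data[i], data[i + 1]]);
    let mut out = Vec::new();
    let mut i = 0;
    while i + 1 < data.len() {
        let start = i;
        while i + 1 < data.len() && printable(unit(i)) {
            i += 2;
        }
        let units = (i - start) / 2;
        if units >= MIN_LEN && i + 1 < data.len() && unit(i) == 0 {
            // Every accepted unit is below 0x7F, so its low byte is the character.
            let content = (start..i).step_by(2).map(|j| data[j] as char).collect();
            out.push(Found { offset: start, encoding: "utf16le", content });
        }
        i += 2; // step over the terminator or the non-printable unit
    }
    out
}
```

Requiring the terminator is what rejects printable runs that merely happen to sit inside binary data, and the byte-wise pass does not fire on UTF-16LE text because the interleaved zero bytes cut every run to length one.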
### What this layer does NOT do
- No UTF-8 multibyte sequences and no non-ASCII basic-plane characters in
  either encoding.
- No `.data` scan (read-only-section bias).
- No multi-byte CJK encodings: Japanese text in localised builds may be
  represented in shift_jis / utf-8 with non-printable bytes that this
  scanner skips.

### Sylpheed yield
- 6,311 ASCII strings (including full embedded HLSL shader source).
- 0 UTF-16LE strings (binary uses ASCII / native CJK encoding).
- 9,132 lis+addi sites cross-reference into the detected strings, which
  names the source PCs that reference each string.

## Forward work (M6, M8–M12, not yet landed)
- **M6** — extended `xrefs.kind='write'` for indexed/byte-reverse/multiword/VMX/DCBZ/atomic stores with `addr_mode` column.
- **M8** — dispatch-table heuristics beyond vtables (e.g. function-pointer arrays in `.data`).
- **M9** — `__CxxFrameHandler` exception scope-table parsing.
- **M10** — `.tls` section / TLS slot tracking.