M6: addr_mode column on xrefs + extended store/load classes

Adds finer-grained addressing-mode classification to every data xref row
plus new dispatch for instruction families not previously emitted:

- New `xrefs.addr_mode VARCHAR NULL` column. NULL for control-flow edges
  (call / ind_call / j / br); one of d_form / lis_addi / lis_ori /
  multiword / x_form_indexed / x_form_byterev / atomic / dcbz for data
  edges. Index idx_xrefs_addr_mode.
- New `xenia_analysis::xref::AddrMode` enum + Xref::addr_mode field.
- Opcode 46/47 (lmw/stmw) expand to one xref per slot — D-form multi-word
  load/store now resolves all (32-rS) consecutive addresses.
- Opcode 31 X-form dispatch — stwx/stbx/sthx/stwux/stbux/sthux/stdx/stdux,
  lwzx/lbzx/lhzx/lhax/lwzux/lbzux/lhzux/lhaux/ldx/ldux,
  stwcx./stdcx. (atomic), stwbrx/sthbrx/lwbrx/lhbrx (byte-reverse),
  dcbz (cache-line clear).
- X-form rows are emitted ONLY when both rA and rB resolve to known
  constants (rare but present); the dominant runtime-indexed pattern
  remains correctly skipped.

Sylpheed yield (regen on master + merge):

- 442 newly-detected x_form_indexed reads (lwzx/lhzx into static tables).
- 40 newly-detected atomic writes (stwcx./stdcx. with resolvable address).
- 28,834 lis_addi refs, 18,485 d_form reads, 3,288 d_form writes — every
  pre-existing data row now tagged.
- 0 multiword / dcbz / byterev (these instructions exist but aren't on
  lis+addi-tracked code paths).

Tests 633→636 (+3 xref unit tests covering AddrMode tag uniqueness,
data-edge addr_mode round-trip, control-edge None invariant).
Schema golden updated (xrefs gains addr_mode column).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -227,9 +227,55 @@ See `crates/xenia-analysis/src/lookup.rs`.
- 9,132 lis+addi sites cross-reference into the detected strings — names
  the source PCs that reference each string.

## Forward work (M6, M8–M12, not yet landed)
## Layer M6 — Extended store-class xrefs + `addr_mode` column (landed)

### Schema additions
- `xrefs.addr_mode VARCHAR NULL` — sub-classifies how the source instruction
  computes its target. NULL for control-flow edges (call / ind_call / j /
  br); one of the following tags for data edges:
  - `d_form` — standard signed 16-bit displacement (lwz/stw/lfs/stfs/etc.)
  - `lis_addi` — address materialised via `lis + addi` register tracking
  - `lis_ori` — address materialised via `lis + ori`
  - `multiword` — `lmw / stmw` (one xref per slot; up to 32-rS slots for
    `stmw`, 32-rT for `lmw`)
  - `x_form_indexed` — `stwx / stbx / sthx / stwux / stbux / sthux / stdx /
    stdux / lwzx / lbzx / lhzx / lhax / lwzux / lbzux / lhzux / lhaux / ldx /
    ldux` — emitted only when both rA and rB are tracked constants
  - `x_form_byterev` — `stwbrx / sthbrx / lwbrx / lhbrx`
  - `atomic` — `stwcx. / stdcx.` reservation-conditional stores
  - `dcbz` — cache-line clear (32-byte zero at rA+rB)
- Index `idx_xrefs_addr_mode`.

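As a rough illustration of the tag set above, the following sketch models the column values as a Rust enum. Only the tag strings come from this document; the variant names, `as_tag`, `from_tag`, `column_value`, and the two address helpers are hypothetical and are not taken from `xenia_analysis::xref`. The helpers assume 32-bit effective addresses and show why `lis + addi` and `lis + ori` are tagged separately: `addi` sign-extends its immediate while `ori` zero-extends it.

```rust
/// Hypothetical mirror of the `xrefs.addr_mode` tags (names are illustrative).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum AddrMode {
    DForm,
    LisAddi,
    LisOri,
    Multiword,
    XFormIndexed,
    XFormByterev,
    Atomic,
    Dcbz,
}

impl AddrMode {
    /// Every variant, in the order the tags are listed in the schema notes.
    pub const ALL: [AddrMode; 8] = [
        AddrMode::DForm,
        AddrMode::LisAddi,
        AddrMode::LisOri,
        AddrMode::Multiword,
        AddrMode::XFormIndexed,
        AddrMode::XFormByterev,
        AddrMode::Atomic,
        AddrMode::Dcbz,
    ];

    /// String stored in the `addr_mode` column for this variant.
    pub fn as_tag(self) -> &'static str {
        match self {
            AddrMode::DForm => "d_form",
            AddrMode::LisAddi => "lis_addi",
            AddrMode::LisOri => "lis_ori",
            AddrMode::Multiword => "multiword",
            AddrMode::XFormIndexed => "x_form_indexed",
            AddrMode::XFormByterev => "x_form_byterev",
            AddrMode::Atomic => "atomic",
            AddrMode::Dcbz => "dcbz",
        }
    }

    /// Inverse of `as_tag`; returns `None` for an unknown tag.
    pub fn from_tag(tag: &str) -> Option<AddrMode> {
        Self::ALL.iter().copied().find(|m| m.as_tag() == tag)
    }
}

/// Column value for a row. Control-flow edges carry no addressing mode,
/// so `None` stands in for SQL NULL.
pub fn column_value(mode: Option<AddrMode>) -> Option<&'static str> {
    mode.map(AddrMode::as_tag)
}

/// Address built by `lis rD, hi` then `addi rD, rD, lo`.
/// `addi` sign-extends its 16-bit immediate.
pub fn lis_addi(hi: u16, lo: u16) -> u32 {
    ((hi as u32) << 16).wrapping_add(lo as i16 as i32 as u32)
}

/// Address built by `lis rD, hi` then `ori rD, rD, lo`.
/// `ori` zero-extends its 16-bit immediate.
pub fn lis_ori(hi: u16, lo: u16) -> u32 {
    ((hi as u32) << 16) | lo as u32
}
```

With these definitions, a low half with its top bit set lands one 64 KiB page lower under `lis + addi` than under `lis + ori`, which is the case a tracker has to get right when materialising the target.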
### What this layer does
- Tags every existing data xref with its addressing mode (`d_form` for the
  bulk; `lis_addi` / `lis_ori` for the lift-and-add cases that produce
  DataRef rows).
- Adds new dispatch for opcode 47 (`stmw`) and 46 (`lmw`), expanding to
  per-slot DataWrite / DataRead rows.
- Adds new dispatch for opcode 31 X-form: indexed loads and stores, atomic,
  byte-reverse, dcbz. X-form rows are emitted ONLY when both rA and rB
  resolve to known constants (otherwise the address is runtime-dependent
  and the row is skipped).

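The two dispatch rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code in `xenia-analysis`: the function names `multiword_slots` and `x_form_target` are hypothetical, register values are modelled as `Option<u32>` (`None` meaning "not a tracked constant"), and the caller is assumed to substitute 0 for rA when the instruction encodes rA = 0.

```rust
/// Per-slot effective addresses for `lmw rT, d(rA)` / `stmw rS, d(rA)`.
/// `first_reg` is rT or rS (0..=31); `base` is the tracked constant in rA.
/// Registers first_reg..=31 are transferred, one 4-byte word each.
pub fn multiword_slots(first_reg: u8, base: u32, disp: i16) -> Vec<u32> {
    assert!(first_reg < 32, "register field is 5 bits");
    // D-form displacement is sign-extended before being added to the base.
    let start = base.wrapping_add(disp as i32 as u32);
    (0..(32 - first_reg as u32))
        .map(|slot| start.wrapping_add(4 * slot))
        .collect()
}

/// Effective address of an X-form access (rA + rB), or `None` when either
/// operand is not a tracked constant. `None` corresponds to the
/// runtime-indexed case, for which no row is emitted.
pub fn x_form_target(ra: Option<u32>, rb: Option<u32>) -> Option<u32> {
    Some(ra?.wrapping_add(rb?))
}
```

For example, `stmw r29, 8(rA)` with rA tracked as `0x82000000` would yield three write slots at `0x82000008`, `0x8200000C`, and `0x82000010`, while an `stwx` whose rB is a loop counter yields `None` and is skipped.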
### What this layer does NOT do
- VMX / VMX128 vector stores (opcode 31 with vector XO codes) are not
  emitted — they always have register-indexed addresses that the
  lis+addi tracker can't usually resolve, and detecting them adds noise
  without improving target resolution.
- The dominant runtime-indexed `stwx` pattern (rA = base, rB = runtime
  index) is not resolved — by design; mem-watch covers the runtime side
  per VERIFY-B.

### Sylpheed yield
- 28,834 `lis_addi` refs, 18,485 `d_form` reads, 3,288 `d_form` writes —
  the existing baseline now properly tagged.
- **442 newly-detected `x_form_indexed` reads** — primarily lwzx/lhzx
  reads from in-table dispatch (each pair (rA,rB) resolved statically).
- **40 newly-detected `atomic` writes** — every `stwcx.` site with a
  resolvable address; useful for reservation-table audits.
- 9 `lis_ori` refs.
- 0 multiword / dcbz / byterev — these instructions exist in the binary
  but are not in lis+addi-tracked code paths.

## Forward work (M8–M12, not yet landed)

- **M6** — extended `xrefs.kind='write'` for indexed/byte-reverse/multiword/VMX/DCBZ/atomic stores with `addr_mode` column.
- **M8** — dispatch-table heuristics beyond vtables (e.g. function-pointer arrays in `.data`).
- **M9** — `__CxxFrameHandler` exception scope-table parsing.
- **M10** — `.tls` section / TLS slot tracking.