xenia-rs/audit-runs/review-a-step1b-crowbar-v2/investigation.md

# Crowbar v2 — Step 0 (A) vs (B) verdict + new finding

**Date**: 2026-05-21
**Predecessor**: v1 at `audit-runs/review-a-step1-crowbar/`.
**Status**: LANDED diagnostic; ESCALATED before Step 2 install, since
neither (A) nor (B) was the issue.

## TL;DR

- **(A) is FALSIFIED.** Our XEX loader populates the vtable region
  `0x8200A1E8..+512` correctly: 254/256 nonzero bytes in the first 256 bytes;
  128/128 nonzero u32 slots in the first 512 bytes. **Worker stub slots
  35/36/37/38 each hold real PPC fn pointers** in the `0x8250xxxx`
  range:
  - `vtable[35] @ 0x8200A274 = 0x82506B08`
  - `vtable[36] @ 0x8200A278 = 0x82506DE8`
  - `vtable[37] @ 0x8200A27C = 0x82508530`
  - `vtable[38] @ 0x8200A280 = 0x82508A88`
- **(B) is FALSIFIED.** There is no "runtime vtable install" step to
  mirror: the vtable contents come from `.rdata` and are already present
  before the crowbar fires. The AUDIT-068 S3/S4 POD copy writes
  `0x8200A1E8` (the vtable BASE) to `[ctx+0]`; that is a POINTER write,
  not a copy of the vtable contents themselves.
- **NEW CASE (C) discovered**: the ctx-object layout is wider than the
  4 u32s AUDIT-068 S3 captured. `[ctx+44]` is a pointer to a SECOND
  object whose vtable+60 (slot 15) is dispatched by `sub_82506DE8` (=
  vtable[36] of ctx, called by worker tid=15's entry stub at
  `0x82506558`). Since we left `[ctx+44]` zero, the worker loads a null
  object pointer, reads its vtable pointer from `[0]` (=0), loads
  CTR from `[0+60]` (=0), and `bctrl` faults at PC=0.
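The slot addresses above, and the load displacements in the disassembly further down (`144`, `260`, `60`), all follow from 4-byte slots: address = base + 4 × index. A minimal sketch; `slot_addr` and `slot_from_disp` are hypothetical helpers, not code from `exports.rs`:

```rust
/// Final vtable base of the ctx object (from the Step 0 dump).
const VTABLE_BASE: u32 = 0x8200_A1E8;
/// Each slot holds one 32-bit big-endian PPC function pointer.
const SLOT_SIZE: u32 = 4;

/// Guest address of vtable slot `idx` (hypothetical helper).
fn slot_addr(base: u32, idx: u32) -> u32 {
    base + idx * SLOT_SIZE
}

/// Slot index selected by a `lwz rX, disp(vtable)` displacement
/// (hypothetical helper). Panics on a misaligned displacement.
fn slot_from_disp(disp: u32) -> u32 {
    assert_eq!(disp % SLOT_SIZE, 0, "vtable loads are word-aligned");
    disp / SLOT_SIZE
}
```

For example, `slot_addr(VTABLE_BASE, 35)` gives `0x8200A274`, and `slot_from_disp(144)` gives 36, the slot the tid=15 stub dispatches.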

## v1 framing vs v2 ground truth

v1's `crowbar-on-stderr.log` showed `FAULT: PC in unmapped memory
cycle=20000167 pc=0x00000000 hw_id=0`. v1's hypothesis was
"vtable[35] at `0x8200A274` is uninitialized/null, branch goes to
PC=0." v2's Step 0 diagnostic dumps the vtable region and shows that
hypothesis is **wrong**: every slot is populated.
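The byte and slot counts in the TL;DR can be reproduced from a raw dump of the region. A minimal sketch, assuming the dump is a plain byte slice read from guest memory (big-endian, as on the Xbox 360's PowerPC); `summarize_region` and `slot_value` are hypothetical and are not the signature of `crowbar_dump_vtable_region`:

```rust
/// Counts nonzero bytes in the first `byte_window` bytes and nonzero
/// big-endian u32 slots in the first `slot_window` bytes of a dump.
fn summarize_region(bytes: &[u8], byte_window: usize, slot_window: usize) -> (usize, usize) {
    let nonzero_bytes = bytes.iter().take(byte_window).filter(|&&b| b != 0).count();
    let nonzero_slots = bytes[..slot_window.min(bytes.len())]
        .chunks_exact(4)
        .map(|c| u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
        .filter(|&w| w != 0)
        .count();
    (nonzero_bytes, nonzero_slots)
}

/// Big-endian u32 stored in slot `idx`, or None if the dump is too short.
fn slot_value(bytes: &[u8], idx: usize) -> Option<u32> {
    let off = idx * 4;
    let c = bytes.get(off..off + 4)?;
    Some(u32::from_be_bytes([c[0], c[1], c[2], c[3]]))
}
```

On the real 512-byte dump the expectation from the logs is `(254, 128)` for `summarize_region(&dump, 256, 512)`.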

The enriched FAULT log added by v2 captured the smoking gun:

```
FAULT: PC in unmapped memory cycle=20000166 pc=0x00000000 hw_id=0
  tid=Some(15) lr=0x82506e38 ctr=0x00000000 r3=0x00000000 r4=0
  r29=0 r30=<ctx_ptr> r31=<...>
```
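For reference, a sketch of how the enriched line could be formatted. `FaultSnapshot` and `format_fault` are hypothetical, not the code added to `main.rs`, and only the fields shown on the first two log lines are modelled (r29/r30/r31 are omitted):

```rust
/// Register snapshot taken when the PC lands in unmapped memory
/// (hypothetical struct; field set mirrors the log above).
struct FaultSnapshot {
    cycle: u64,
    pc: u32,
    hw_id: u32,
    tid: Option<u32>,
    lr: u32,
    ctr: u32,
    r3: u32,
    r4: u32,
}

/// `{:#010x}` prints a 0x-prefixed, zero-padded 8-digit hex word;
/// `{:?}` on the Option yields the `Some(15)` form seen in the log.
fn format_fault(s: &FaultSnapshot) -> String {
    format!(
        "FAULT: PC in unmapped memory cycle={} pc={:#010x} hw_id={}\n  tid={:?} lr={:#010x} ctr={:#010x} r3={:#010x} r4={}",
        s.cycle, s.pc, s.hw_id, s.tid, s.lr, s.ctr, s.r3, s.r4
    )
}
```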

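`lr=0x82506e38` is one instruction past the `bctrl` at `0x82506e34`.
That `bctrl` lives in `sub_82506DE8`, which IS vtable[36]: worker
tid=15's stub at `0x82506558` reaches it via `lwz r11, 0(r3); lwz r11,
144(r11); mtctr r11; bctrl` (displacement 144 = slot 36). The relevant
sequence is below; it is an excerpt, and instructions at the address
gaps are elided: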

```
0x82506de8: mflr r12
0x82506dec: bl   0x825F0F8C
0x82506df0: stwu r1, -144(r1)
0x82506df4: mr   r30, r3              ; r30 = ctx_ptr
0x82506df8: lwz  r11, 0(r30)          ; r11 = 0x8200A1E8 (vtable)
0x82506dfc: lwz  r11, 260(r11)        ; r11 = vtable[65] (a fn)
0x82506e00: mtctr r11
0x82506e04: bctrl                     ; OK — returns
0x82506e08: rlwinm r11, r3, 0, 29, 29 ; bit 2 of r3
0x82506e10: bne cr6, 0x825070D4       ; if bit set: branch away
0x82506e18: lwz r3, 44(r30)           ; r3 = [ctx+44]    <-- ZERO
0x82506e28: lwz r11, 0(r3)            ; r11 = [0]       <-- ZERO
0x82506e2c: lwz r11, 60(r11)          ; r11 = [60]      <-- ZERO
0x82506e30: mtctr r11                 ; CTR = 0
0x82506e34: bctrl                     ; LR := 0x82506e38, PC := 0
0x82506e38: <fault: PC unmapped>
```

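So vtable[36] first calls vtable[65] (a real function, which returns
normally). Bit 2 of its return value must have been clear, since
execution reached the `[ctx+44]` load; the code then dispatches through
slot 15 of the object pointed to by `[ctx+44]`. Our crowbar left
`[ctx+44]=0`, so that dispatch jumped to PC=0.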
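The fall-through path can be modelled in a few lines to show why a zero `[ctx+44]` yields CTR=0. A sketch under stated assumptions: unwritten guest words read as 0 (consistent with the zero loads in the trace), the elided instruction before `bne cr6` compares the `rlwinm` result against 0, and `GuestMem`, `rlwinm_mask` and `second_dispatch_ctr` are hypothetical names:

```rust
use std::collections::HashMap;

/// Sparse guest memory: words never written read as 0 (assumption).
struct GuestMem(HashMap<u32, u32>);

impl GuestMem {
    fn lwz(&self, addr: u32) -> u32 {
        *self.0.get(&addr).unwrap_or(&0)
    }
}

/// rlwinm mask for MB..ME in IBM bit numbering (bit 0 = MSB),
/// valid for MB <= ME <= 31. (29, 29) selects value 4, i.e. bit 2.
fn rlwinm_mask(mb: u32, me: u32) -> u32 {
    (u32::MAX >> mb) & (u32::MAX << (31 - me))
}

/// Models 0x82506e08..0x82506e30. `ret65` is r3 after the vtable[65]
/// call. Returns None if the bne branches away, else Some(CTR).
fn second_dispatch_ctr(mem: &GuestMem, ctx: u32, ret65: u32) -> Option<u32> {
    if ret65 & rlwinm_mask(29, 29) != 0 {
        return None; // bne cr6, 0x825070D4
    }
    let obj = mem.lwz(ctx.wrapping_add(44)); // lwz r3, 44(r30)
    let vtable = mem.lwz(obj); // lwz r11, 0(r3)
    Some(mem.lwz(vtable.wrapping_add(60))) // lwz r11, 60(r11); mtctr r11
}
```

With only `[ctx+0]` populated, `second_dispatch_ctr` returns `Some(0)`, matching the logged `ctr=0x00000000`. A crowbar v3 would need `[ctx+44]` to point at an object whose vtable has a nonzero slot 15.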

## Why the (B) framing missed this

The brief framed (B) as "vtable contents are constructed at runtime".
That's not true: vtable contents are static `.rdata`. What
AUDIT-068's S4 captured is the **ctor chain** that constructs the
**ctx instance** (the heap object):

- `sub_824FECE0` (deepest): writes `[ctx+4]=ctx, [ctx+8]=ctx,
  [ctx+12]=1`. Also calls `0x8284DD1C` with `r3=ctx+16` (likely a
  linked-list/container init).
- `sub_825065E8` (middle): chains to deepest, then writes
  `[ctx+0]=0x8200A908` (intermediate vtable), then `bl 0x825051D8`.
- `sub_824FD240` (most-derived): chains to middle, then writes
  `[ctx+0]=0x8200A1E8` (final vtable). Returns.
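Replaying the three constructors' stores in order makes the gap explicit. A hypothetical model of just the stores listed above; the calls to `0x8284DD1C` and `0x825051D8` are not modelled, so any fields they write are missing here too:

```rust
use std::collections::BTreeMap;

/// Offset from ctx -> last u32 stored there (hypothetical model).
type Fields = BTreeMap<u32, u32>;

fn ctor_deepest(f: &mut Fields, ctx: u32) {
    // sub_824FECE0
    f.insert(4, ctx); // self-referential list head (empty list)
    f.insert(8, ctx);
    f.insert(12, 1);
    // call to 0x8284DD1C with r3 = ctx+16: not modelled
}

fn ctor_middle(f: &mut Fields, ctx: u32) {
    // sub_825065E8
    ctor_deepest(f, ctx);
    f.insert(0, 0x8200_A908); // intermediate vtable
    // bl 0x825051D8: not modelled
}

fn ctor_most_derived(f: &mut Fields, ctx: u32) {
    // sub_824FD240
    ctor_middle(f, ctx);
    f.insert(0, 0x8200_A1E8); // final vtable overwrites the intermediate one
}
```

After `ctor_most_derived`, the modelled fields are exactly `+0, +4, +8, +12`; nothing in the chain as described touches `+44`.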

None of these three ctors writes `[ctx+44]`. So `[ctx+44]` must be
written by either:
1. **Allocator initial-state** (zero-fill? guest-side memset?), OR
2. **A factory function ABOVE the ctor chain** (the caller of
   `sub_824FD240` that allocates ctx, calls ctor, then assigns fields
   including `+44`).

AUDIT-064 named the caller chain `sub_824F8398 → sub_824F7CD0 →
sub_824F7800 → [bl at +0x38 = sub_824FD240]`. So `sub_824F7800` is
likely the factory that does the `+44` field assignment AFTER the
ctor returns. Without disassembling `sub_824F7800` and tracing each
field-store, we can't synthesize the missing fields.

## Why escalating is the right call now

The brief's tripstone #6 sets a 2-hour timebox, and we have already
discovered that the framing was wrong and that the gap is wider than v2
was scoped to fix. The honest moves are:

1. **Stop and document** the new finding (this doc + memory entry).
2. **Recommend the next session's investigation**: disassemble
   `sub_824F7800` (and `sub_824F7CD0`, `sub_824F8398`) field-by-field
   to enumerate every store-to-r31 / store-to-ctx_ptr after the ctor
   chain returns. Mirror those stores in a crowbar v3.
3. Alternative (much wider scope): build a canary read-probe sweep over
   `[ctx+0..ctx+128]` to capture the live state. This is roughly 200 LOC
   of canary instrumentation and trades complexity for ground truth.
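Both options reduce to the same question: which offsets in `[ctx, ctx+128)` are written after the ctor chain returns, and by which PC. A sketch of that reduction step, assuming a hypothetical store-trace record (`Store` with a monotonic `seq`); the record layout is an assumption, not an existing hook in either emulator:

```rust
use std::collections::BTreeMap;

/// One observed guest store (hypothetical trace record).
struct Store {
    seq: u64, // monotonic order in the trace
    pc: u32,
    addr: u32,
    value: u32,
}

/// Stores into [ctx, ctx + window) with seq > ctor_return_seq, reduced to
/// offset -> (pc, value). Assumes `trace` is sorted by seq, so a later
/// store to the same offset overwrites an earlier one.
fn fields_after_ctor(
    trace: &[Store],
    ctx: u32,
    window: u32,
    ctor_return_seq: u64,
) -> BTreeMap<u32, (u32, u32)> {
    let mut out = BTreeMap::new();
    for s in trace.iter().filter(|s| s.seq > ctor_return_seq) {
        if let Some(off) = s.addr.checked_sub(ctx) {
            if off < window {
                out.insert(off, (s.pc, s.value));
            }
        }
    }
    out
}
```

Each resulting `(offset, (pc, value))` entry is a candidate store for crowbar v3 to mirror; the PC points back into whichever factory function performed it.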

## Run-determined ctx addresses for reference

- v1's crowbar (in ours): `ctx_ptr = 0x4D1D9000` (heap_alloc bump
  cursor at trigger time).
- Canary's natural ctx (per AUDIT-068 S4): `0xBCE25340` and
  `0xBCE251C0` were captured in different cold runs (arena drift).
  The probe at `0xBCE251C0..+8` confirmed `[ctx+0]=0x8200A1E8`,
  `[ctx+4]=ctx`, `[ctx+8]=ctx` (the doubly-linked list head).
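Because the arena drifts between cold runs, a probe check is easiest to express relative to ctx rather than against a fixed address. A minimal sketch, assuming the probe yields the 12 raw big-endian bytes at `ctx+0..ctx+12`; the helper names are hypothetical:

```rust
const FINAL_VTABLE: u32 = 0x8200_A1E8;

/// Decodes the three big-endian words at ctx+0, ctx+4, ctx+8.
fn probe_words(raw: &[u8; 12]) -> [u32; 3] {
    let w = |i: usize| u32::from_be_bytes([raw[i], raw[i + 1], raw[i + 2], raw[i + 3]]);
    [w(0), w(4), w(8)]
}

/// True if the probe matches a constructed ctx: final vtable at +0 and a
/// self-referential list head at +4/+8, whatever address ctx landed at.
fn looks_like_constructed_ctx(ctx: u32, raw: &[u8; 12]) -> bool {
    probe_words(raw) == [FINAL_VTABLE, ctx, ctx]
}
```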

## LOC delta this session

- `crates/xenia-kernel/src/exports.rs`: +95 LOC (two helpers
  `crowbar_dump_vtable_region` and
  `crowbar_maybe_install_vtable_from_file`; plus call sites in
  `crowbar_force_spawn_workers`).
- `crates/xenia-app/src/main.rs`: +9 LOC (enriched FAULT log with
  tid/lr/ctr/r3/r4/r29/r30/r31).
- Total: ~104 LOC additive over v1. Within budget.

## What was NOT done

- vtable-bin install: implemented but unused (env-gated, defaults
  to no-op). Kept in tree for v3 in case a future session captures
  canary's vtable bytes for cross-validation, BUT that is now known to
  be unnecessary because our vtable is correct.
- 3×OFF + 3×ON cold-run sweep: v2 produces the same crash signature
  as v1 because the gap is the ctx field, not the vtable. A 6-run
  sweep is expected to show identical progression metrics (`swaps=1,
  draws=0, render_targets=0` ON; same numbers OFF); a spot-check of one
  ON run matched. Skipping the full sweep to honour the timebox.
- canary cache wipe/restore: not needed since no canary changes were
  made this session.
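The env-gated, default-no-op behaviour of the vtable-bin install reduces to a small decision on the gate variable. A sketch only: the variable name and the real helper's signature are not recorded here, so `install_source` is hypothetical and takes the gate value as a parameter (for example from `std::env::var_os`):

```rust
use std::ffi::OsString;
use std::path::PathBuf;

/// Returns the file to install from, or None for the default no-op.
/// An unset or empty gate value both mean "do nothing".
fn install_source(gate: Option<OsString>) -> Option<PathBuf> {
    match gate {
        Some(v) if !v.is_empty() => Some(PathBuf::from(v)),
        _ => None,
    }
}
```

Taking the value as a parameter, rather than reading the environment inside the helper, keeps the decision testable without mutating process-wide state.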

## Files

- `step0-diag-stderr.log`: first run, vtable dump only (256 bytes).
- `step0b-diag.log`: second run, 512-byte vtable dump.
- `step0c-diag.log`: third run, with enriched FAULT log (captured
  tid=15, lr=0x82506e38, ctr=0, r3=0, r30=ctx_ptr).