handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
157
audit-runs/review-a-step1b-crowbar-v2/investigation.md
Normal file
@@ -0,0 +1,157 @@
# Crowbar v2: Step 0 (A) vs (B) verdict + new finding

**Date**: 2026-05-21
**Predecessor**: v1 at `audit-runs/review-a-step1-crowbar/`.
**Status**: LANDED diagnostic; ESCALATED before Step 2 install because neither
(A) nor (B) was the issue.

## TL;DR

- **(A) is FALSIFIED.** Our XEX loader populates the vtable region
  `0x8200A1E8..+512` correctly. 254/256 nonzero bytes in the first 256;
  128/128 nonzero u32 slots in the first 512 bytes. **Worker stub slots
  35/36/37/38 each hold real PPC fn pointers** in the `0x8250xxxx`
  range:
  - `vtable[35] @ 0x8200A274 = 0x82506B08`
  - `vtable[36] @ 0x8200A278 = 0x82506DE8`
  - `vtable[37] @ 0x8200A27C = 0x82508530`
  - `vtable[38] @ 0x8200A280 = 0x82508A88`
- **(B) is FALSIFIED.** There is no "runtime vtable install" step to
  mirror: the vtable contents come from `.rdata` and are present
  before the crowbar fires. The AUDIT-068 S3/S4 POD-copy writes
  `0x8200A1E8` (vtable BASE) at `[ctx+0]`, which is a POINTER write,
  not a copy of the vtable contents themselves.
- **NEW CASE (C) discovered**: the ctx-object layout is wider than the
  4 u32s AUDIT-068 S3 captured. `[ctx+44]` is a pointer to a SECOND
  object whose vtable+60 (slot 15) is dispatched by `sub_82506DE8` (=
  vtable[36] of ctx, called by worker tid=15's entry stub at
  `0x82506558`). Since we left `[ctx+44]` zero, the worker loads
  r3=`[ctx+44]`=0, reads `[0]=0` as the vtable pointer, computes
  CTR=`[vtable+60]`=0, and `bctrl` faults at PC=0.
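
The slot addresses and dump statistics above can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the code in `crowbar_dump_vtable_region`: the helper names are hypothetical, and it assumes 4-byte big-endian slots, as the `lwz` displacements in this note imply.

```rust
/// Base of the most-derived vtable, as quoted in this note.
const VTABLE_BASE: u32 = 0x8200_A1E8;

/// Guest address of vtable slot `index`, assuming 4-byte slots.
fn slot_addr(base: u32, index: u32) -> u32 {
    base.wrapping_add(index * 4)
}

/// Slot index addressed by an `lwz rX, offset(rVtable)` displacement.
fn slot_index(offset: u32) -> u32 {
    offset / 4
}

/// Returns (nonzero bytes, nonzero big-endian u32 slots) for a raw dump.
fn nonzero_counts(dump: &[u8]) -> (usize, usize) {
    let bytes = dump.iter().filter(|&&b| b != 0).count();
    let slots = dump
        .chunks_exact(4)
        .filter(|w| u32::from_be_bytes([w[0], w[1], w[2], w[3]]) != 0)
        .count();
    (bytes, slots)
}
```

With these, `slot_addr(VTABLE_BASE, 35)` gives `0x8200A274`, and the displacements 144, 260 and 60 map to slots 36, 65 and 15 respectively, matching the figures quoted in this note.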

## v1 framing vs v2 ground truth

v1's `crowbar-on-stderr.log` showed `FAULT: PC in unmapped memory
cycle=20000167 pc=0x00000000 hw_id=0`. v1's hypothesis was
"vtable[35] at `0x8200A274` is uninitialized/null, branch goes to
PC=0." v2 Step 0 diagnostic dumps the vtable region and shows that
hypothesis is **wrong**: every slot is populated.

The enriched FAULT log added by v2 captured the smoking gun:

```
FAULT: PC in unmapped memory cycle=20000166 pc=0x00000000 hw_id=0
tid=Some(15) lr=0x82506e38 ctr=0x00000000 r3=0x00000000 r4=0
r29=0 r30=<ctx_ptr> r31=<...>
```

`lr=0x82506e38` is one instruction past `bctrl` at `0x82506e34`. The
sequence in `sub_82506DE8` (which IS vtable[36], reached by worker
tid=15's stub at `0x82506558` → `lwz r11, 0(r3); lwz r11, 144(r11);
mtctr r11; bctrl`) is listed below; gaps in the addresses are
instructions omitted from the listing:

```
0x82506de8: mflr r12
0x82506dec: bl 0x825F0F8C
0x82506df0: stwu r1, -144(r1)
0x82506df4: mr r30, r3                 ; r30 = ctx_ptr
0x82506df8: lwz r11, 0(r30)            ; r11 = 0x8200A1E8 (vtable)
0x82506dfc: lwz r11, 260(r11)          ; r11 = vtable[65] (a fn)
0x82506e00: mtctr r11
0x82506e04: bctrl                      ; OK, returns
0x82506e08: rlwinm r11, r3, 0, 29, 29  ; bit 2 of r3
0x82506e10: bne cr6, 0x825070D4        ; if bit set: branch away
0x82506e18: lwz r3, 44(r30)            ; r3 = [ctx+44]  <-- ZERO
0x82506e28: lwz r11, 0(r3)             ; r11 = [0]      <-- ZERO
0x82506e2c: lwz r11, 60(r11)           ; r11 = [60]     <-- ZERO
0x82506e30: mtctr r11                  ; CTR = 0
0x82506e34: bctrl                      ; LR := 0x82506e38, PC := 0
0x82506e38: <never reached; the fault is raised at PC=0>
```

So vtable[36] called vtable[65] (a real fn that returns OK), then
dispatched through `[ctx+44]`, treating it as a pointer to another
object. Our crowbar left `[ctx+44]=0`, so the dispatch faulted.
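
The failing load chain can be replayed against a toy memory model. This is a sketch for illustration only: `GuestMem` and `second_dispatch` are hypothetical names, and the model assumes that any address never written reads back as zero, which mirrors the observed `[0]=0` and `[60]=0` reads but is not a statement about how the emulator maps low memory.

```rust
use std::collections::HashMap;

/// Toy guest memory: unwritten addresses read as zero (assumption).
struct GuestMem {
    words: HashMap<u32, u32>,
}

impl GuestMem {
    fn new() -> Self {
        Self { words: HashMap::new() }
    }

    fn write_u32(&mut self, addr: u32, value: u32) {
        self.words.insert(addr, value);
    }

    fn read_u32(&self, addr: u32) -> u32 {
        self.words.get(&addr).copied().unwrap_or(0)
    }
}

/// Mirrors 0x82506e18..0x82506e34 and returns (CTR, LR) after the `bctrl`.
fn second_dispatch(mem: &GuestMem, ctx: u32) -> (u32, u32) {
    const BCTRL_ADDR: u32 = 0x8250_6E34;
    let obj = mem.read_u32(ctx.wrapping_add(44)); // lwz r3, 44(r30)
    let vtable = mem.read_u32(obj); // lwz r11, 0(r3)
    let target = mem.read_u32(vtable.wrapping_add(60)); // lwz r11, 60(r11)
    (target, BCTRL_ADDR + 4) // bctrl sets LR to the next instruction
}
```

With only `[ctx+0]` populated, the function returns CTR=0 and LR=`0x82506e38`, the same pair the enriched FAULT log reports.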

## Why (B) framing missed this

The brief framed (B) as "vtable contents are constructed at runtime".
That's not true: vtable contents are static `.rdata`. What
AUDIT-068's S4 captured is the **ctor chain** that constructs the
**ctx instance** (the heap object):

- `sub_824FECE0` (deepest): writes `[ctx+4]=ctx, [ctx+8]=ctx,
  [ctx+12]=1`. Also calls `0x8284DD1C` with `r3=ctx+16` (likely a
  linked-list/container init).
- `sub_825065E8` (middle): chains to deepest, then writes
  `[ctx+0]=0x8200A908` (intermediate vtable), then `bl 0x825051D8`.
- `sub_824FD240` (most-derived): chains to middle, then writes
  `[ctx+0]=0x8200A1E8` (final vtable). Returns.

None of these three ctors writes `[ctx+44]`. So `[ctx+44]` must be
written by either:

1. **Allocator initial-state** (zero-fill? guest-side memset?), OR
2. **A factory function ABOVE the ctor chain** (the caller of
   `sub_824FD240` that allocates ctx, calls ctor, then assigns fields
   including `+44`).

AUDIT-064 named the caller chain `sub_824F8398 → sub_824F7CD0 →
sub_824F7800 → [bl at +0x38 = sub_824FD240]`. So `sub_824F7800` is
likely the factory that does the `+44` field assignment AFTER the
ctor returns. Without disassembling `sub_824F7800` and tracing each
field-store, we can't synthesize the missing fields.
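
To make the gap concrete, the direct stores attributed to the three ctors can be replayed in base-first order. This is a sketch under stated assumptions: `replay_ctor_chain` is a hypothetical helper, and it models only the stores listed above, not anything the callees `0x8284DD1C` and `0x825051D8` may write.

```rust
use std::collections::BTreeMap;

const INTERMEDIATE_VTABLE: u32 = 0x8200_A908;
const FINAL_VTABLE: u32 = 0x8200_A1E8;

/// Replays the direct ctor stores and returns offset -> final value.
fn replay_ctor_chain(ctx: u32) -> BTreeMap<u32, u32> {
    let mut fields = BTreeMap::new();
    // sub_824FECE0 (deepest): self-referential links plus a constant.
    fields.insert(4, ctx);
    fields.insert(8, ctx);
    fields.insert(12, 1);
    // sub_825065E8 (middle): intermediate vtable pointer.
    fields.insert(0, INTERMEDIATE_VTABLE);
    // sub_824FD240 (most-derived): overwrites with the final vtable pointer.
    fields.insert(0, FINAL_VTABLE);
    fields
}
```

The resulting map has entries for offsets 0, 4, 8 and 12 only, so a crowbar that mirrors just these stores leaves offset 44 untouched.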

## Why escalating is the right call now

Per the brief's tripstone #6 (2-hour timebox): we've already
discovered the framing was wrong and the gap is wider than v2 was
scoped to fix. The honest moves are:

1. **Stop and document** the new finding (this doc + memory entry).
2. **Recommend the next session's investigation**: disassemble
   `sub_824F7800` (and `sub_824F7CD0`, `sub_824F8398`) field-by-field
   to enumerate every store-to-r31 / store-to-ctx_ptr after the ctor
   chain returns. Mirror those stores in a crowbar v3.
3. Alternative, much wider: build a canary read-probe sweep over
   `[ctx+0..ctx+128]` to capture the live state. ~200 LOC canary
   instrumentation; trades complexity for ground-truth.
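
The read-probe sweep in option 3 might look roughly like the sketch below. It is written against a caller-supplied `read_u32` closure rather than any real canary API, and both function names are hypothetical.

```rust
/// Reads `len` bytes starting at `ctx` as u32 words via `read_u32`,
/// returning (offset, value) pairs in ascending offset order.
fn sweep_ctx<F: Fn(u32) -> u32>(read_u32: F, ctx: u32, len: u32) -> Vec<(u32, u32)> {
    (0..len)
        .step_by(4)
        .map(|off| (off, read_u32(ctx.wrapping_add(off))))
        .collect()
}

/// Offsets whose captured value is zero, i.e. fields that are either
/// genuinely zero or not yet written when the probe fired.
fn zero_offsets(snapshot: &[(u32, u32)]) -> Vec<u32> {
    snapshot
        .iter()
        .filter(|&&(_, value)| value == 0)
        .map(|&(off, _)| off)
        .collect()
}
```

Running the same sweep on both sides and diffing the snapshots would show which offsets the crowbar still needs to populate, with `+44` as the first one to check.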

## Run-determined ctx addresses for reference

- v1's crowbar (in ours): `ctx_ptr = 0x4D1D9000` (heap_alloc bump
  cursor at trigger time).
- Canary's natural ctx (per AUDIT-068 S4): `0xBCE25340` and
  `0xBCE251C0` were captured in different cold runs (arena drift).
  The probe at `0xBCE251C0..+8` confirmed `[ctx+0]=0x8200A1E8`,
  `[ctx+4]=ctx`, `[ctx+8]=ctx` (the doubly-linked list head).

## LOC delta this session

- `crates/xenia-kernel/src/exports.rs`: +95 LOC (two helpers
  `crowbar_dump_vtable_region` and
  `crowbar_maybe_install_vtable_from_file`; plus call sites in
  `crowbar_force_spawn_workers`).
- `crates/xenia-app/src/main.rs`: +9 LOC (enriched FAULT log with
  tid/lr/ctr/r3/r4/r29/r30/r31).
- Total: ~104 LOC additive over v1. Within budget.
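
For orientation, an enriched FAULT line of this shape could be produced as sketched below. This is not the code in `main.rs`: the `FaultRegs` struct and `format_fault` function are hypothetical, and the uniform `0x%08x` register formatting is an assumption (the captured log prints some zero registers as a bare `0`).

```rust
/// Register snapshot at fault time; the field set follows the list above.
struct FaultRegs {
    tid: Option<u32>,
    lr: u32,
    ctr: u32,
    r3: u32,
    r4: u32,
    r29: u32,
    r30: u32,
    r31: u32,
}

/// Formats one FAULT line with the extra thread and register context.
fn format_fault(cycle: u64, pc: u32, hw_id: u32, r: &FaultRegs) -> String {
    format!(
        "FAULT: PC in unmapped memory cycle={} pc={:#010x} hw_id={} \
         tid={:?} lr={:#010x} ctr={:#010x} r3={:#010x} r4={:#010x} \
         r29={:#010x} r30={:#010x} r31={:#010x}",
        cycle, pc, hw_id, r.tid, r.lr, r.ctr, r.r3, r.r4, r.r29, r.r30, r.r31
    )
}
```

Logging LR and CTR together is what made the diagnosis possible here: LR pins the faulting `bctrl`, and CTR=0 shows the branch target itself was null rather than the PC being corrupted later.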

## What was NOT done

- vtable-bin install: implemented but unused (env-gated, defaults
  to no-op). Kept in tree for v3 in case a future session captures
  canary's vtable bytes for cross-validation, although we now know
  that is unnecessary because our vtable is correct.
- 3×OFF + 3×ON cold-run sweep: v2 produces the same crash signature
  as v1 because the gap is the ctx-field, not the vtable. A 6-run
  sweep is expected to show identical progression metrics (`swaps=1,
  draws=0, render_targets=0` ON; same numbers OFF); a spot-check of
  one ON run matched. Skipping the full sweep to honour the timebox.
- canary cache wipe/restore: not needed since no canary changes were
  made this session.
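
The "env-gated, defaults to no-op" behaviour of the vtable-bin install can be pictured as below. This is a sketch, not the body of `crowbar_maybe_install_vtable_from_file`: the function name is hypothetical, and so is any environment variable name a caller would pass in.

```rust
use std::path::PathBuf;

/// Maps the raw value of an opt-in environment variable to the file to
/// install from. `None` means the install step is skipped entirely.
fn vtable_bin_path(raw: Option<String>) -> Option<PathBuf> {
    raw.filter(|s| !s.is_empty()).map(PathBuf::from)
}
```

A caller would feed it something like `std::env::var("SOME_CROWBAR_VAR").ok()`, where the variable name is a placeholder. Treating an unset or empty variable as "skip" keeps default runs identical to a build without the helper.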

## Files

- `step0-diag-stderr.log`: first run, vtable dump only (256 bytes).
- `step0b-diag.log`: second run, 512-byte vtable dump.
- `step0c-diag.log`: third run, with enriched FAULT log (captured
  tid=15, lr=0x82506e38, ctr=0, r3=0, r30=ctx_ptr).