feat(kernel): KRNBUG-AUDIT-003 — vtable/RTTI class probe at handle creation + wait

Adds a read-only MSVC RTTI traversal helper (`read_class_at_this`)
and a `probe_create_stack_classes` integration that walks each
captured back-chain frame for handle creates in `--trace-handles-focus`
and probes each frame's most-likely `this` candidate (live r31/r30/r3
for frame 0; saved-r31/r30 from the prologue spill area at [fp-12]/
[fp-16] for deeper frames). False-positive guard rejects the CRT
static-init iterator pattern (vtable's first two slots must be image-
range function pointers — PPC instruction words like `mflr r12` are
not in 0x82xxxxxx).

`dump_thread_diagnostic` now takes `&GuestMemory` so the FOCUS report
prints, for each parked waiter, a WAIT-THREAD block with full back-
chain frames and per-slot saved-register dump for offline lookup.

End-to-end finding (-n 500M producer-trace):
  * Handle 0x100c dispatcher = 0x828F3D08 (image rdata; verified by
    sub_82181750 disasm + xref table). [this+0] = -1 sentinel — POD
    job queue, NOT a C++ polymorphic class.
  * Handle 0x15e0 dispatcher = 0x828F4070 (same shape).
  * Handle 0x1004's 8-instance pool members still TBD (MSVC ctors
    didn't preserve `this` in r31).
  * 0x42450b5c is a separate bug class (heap-allocated; parks via a
    non-`do_wait_single` path).

Decisive xref audit: every reference to 0x828F3D08 / 0x828F4070 in
the static analysis is in a ctor or the CRT init driver. NO producer
code references either dispatcher base. Confirms `signal_attempts=0`
is unreachable-producer, not broken-producer.

Tests: 581 → 586 green (+5: RTTI-intact / RTTI-stripped / non-object
/ cstring / probe_create_stack integration). `--stable-digest -n
100M` instructions=100000002 unchanged. Master HEAD prior: 6440261.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-03 21:14:56 +02:00
parent 6440261e2e
commit f84e947547
4 changed files with 636 additions and 7 deletions


@@ -4308,3 +4308,152 @@ matching message-push code path**. Steps:
The walker is reusable: any handle added to `--trace-handles-focus`
will get a 6-frame stack at creation time. Add new candidates
freely — cost on the unfocused hot path is one `HashSet::contains`.
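
A minimal sketch of the focus-gated walk. It assumes a PowerPC back chain where the word at `[sp]` points to the caller's frame, and it reads r31 from `[fp-12]`, the spill offset the commit message gives for deeper frames. `FakeMem`, `Frame`, and `capture_stack` are hypothetical names rather than the crate's API, and LR capture is left out:

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for guest memory: 32-bit words keyed by address.
struct FakeMem(HashMap<u32, u32>);

impl FakeMem {
    fn read_u32(&self, addr: u32) -> Option<u32> {
        self.0.get(&addr).copied()
    }
}

/// One captured frame: the back-chain pointer and the r31 spilled below it.
#[derive(Debug, PartialEq)]
struct Frame {
    fp: u32,
    saved_r31: Option<u32>,
}

const MAX_FRAMES: usize = 6;
const SAVED_R31_OFFSET: u32 = 12; // assumption: prologue spills r31 at [fp-12]

/// Walks the back chain from `sp`, but only for handles in the focus set.
fn capture_stack(focus: &HashSet<u32>, handle: u32, sp: u32, mem: &FakeMem) -> Vec<Frame> {
    // Unfocused hot path: one hash lookup, no guest-memory reads.
    if !focus.contains(&handle) {
        return Vec::new();
    }
    let mut frames = Vec::new();
    let mut cur = sp;
    while frames.len() < MAX_FRAMES {
        // The stack grows down, so a valid caller frame sits above the current one.
        let fp = match mem.read_u32(cur) {
            Some(fp) if fp != 0 && fp > cur => fp,
            _ => break, // end of chain or unreadable memory
        };
        let saved_r31 = mem.read_u32(fp.wrapping_sub(SAVED_R31_OFFSET));
        frames.push(Frame { fp, saved_r31 });
        cur = fp;
    }
    frames
}
```
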
### KRNBUG-AUDIT-003 — vtable/RTTI class probe + dispatcher identification
**Status:** landed (diagnostic only; no behaviour change). Verified
end-to-end against 5 unit tests + the producer-trace pass at -n 500M.

**Site:**
- `crates/xenia-kernel/src/state.rs`: new `ClassReadout` enum,
  `read_class_at_this(this, mem)`,
  `probe_create_stack_classes(ctx, frames, mem)`, and private helpers
  (`is_likely_guest_heap_ptr`, `is_likely_image_ptr`, `read_ascii_cstring`).
- `crates/xenia-kernel/src/audit.rs`: extends `HandleAuditTrail` with
  `created_class_probes: Vec<String>` and adds
  `record_create_with_stack_and_probes`.
- `crates/xenia-app/src/main.rs`: `dump_thread_diagnostic` now takes
  `&GuestMemory`; the FOCUS report prints WAIT-THREAD blocks with the
  per-frame back chain, saved register slots, and class probes.

**Why it exists:** AUDIT-002 gave us back-chain frames at handle
creation. AUDIT-003's promise was "recover the dispatcher's MSVC C++
class name via vtable[-4] → COL → TypeDescriptor" so the producer
hunt could read "who should call `Class::Submit` but doesn't"
instead of "who should signal handle X."
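**Probe correctness:** MSVC RTTI traversal: `[vtable-4]` (the word
immediately before the vtable) = COL, `[COL+0x0c]` = TypeDescriptor,
`TypeDescriptor+8` = NUL-terminated mangled name starting `.?A`.
False-positive guard: at least the first two vtable slots must be
image-range function pointers. This rejects the CRT static-init
iterator pattern, where `r31` holds a pointer into the init-fn array
and `[r31]` is a function PC, not a vtable. The "slots" read from that
PC are PPC instruction words, which are typically outside the
0x82xxxxxx image range (`mflr r12`, for example, encodes as
0x7D8802A6); requiring two consecutive slots to pass makes a
coincidental match unlikely.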
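
The traversal and guard can be sketched as below. This is an illustrative reimplementation, not the crate's `read_class_at_this`: `GuestMem` is a hypothetical byte map standing in for `GuestMemory`, the image range is assumed to be 0x82xxxxxx, and the 256-byte name limit is an assumed bound.

```rust
use std::collections::HashMap;

/// Hypothetical guest-memory stand-in: byte-addressed, big-endian like the Xbox 360.
struct GuestMem(HashMap<u32, u8>);

impl GuestMem {
    fn read_u8(&self, addr: u32) -> Option<u8> {
        self.0.get(&addr).copied()
    }
    fn read_u32(&self, addr: u32) -> Option<u32> {
        let mut v = 0u32;
        for i in 0..4 {
            v = (v << 8) | self.read_u8(addr.checked_add(i)?)? as u32;
        }
        Some(v)
    }
}

/// Assumption: image-range pointers look like 0x82xxxxxx.
fn is_image_ptr(p: u32) -> bool {
    p >> 24 == 0x82
}

/// Reads a NUL-terminated printable-ASCII string, giving up after `max` bytes.
fn read_cstring(mem: &GuestMem, addr: u32, max: u32) -> Option<String> {
    let mut s = String::new();
    for i in 0..max {
        match mem.read_u8(addr.checked_add(i)?)? {
            0 => return Some(s),
            b if b.is_ascii_graphic() => s.push(b as char),
            _ => return None, // non-printable byte: not a mangled name
        }
    }
    None // no terminator within `max` bytes
}

/// [this] = vtable, [vtable-4] = COL, [COL+0x0c] = TypeDescriptor,
/// TypeDescriptor+8 = mangled name beginning ".?A".
fn probe_class(mem: &GuestMem, this: u32) -> Option<String> {
    let vtable = mem.read_u32(this)?;
    if !is_image_ptr(vtable) {
        return None; // e.g. the -1 sentinel of a POD queue
    }
    // False-positive guard: the first two slots must be image-range code
    // pointers, which instruction words such as 0x7D8802A6 are not.
    for slot in 0..2 {
        if !is_image_ptr(mem.read_u32(vtable + 4 * slot)?) {
            return None;
        }
    }
    let col = mem.read_u32(vtable.checked_sub(4)?)?;
    let type_desc = mem.read_u32(col.checked_add(0x0c)?)?;
    let name = read_cstring(mem, type_desc.checked_add(8)?, 256)?;
    name.starts_with(".?A").then_some(name)
}
```

Every read returns `Option`, so a stripped or unmapped COL simply yields `None` instead of a bogus name.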
**Verification:**
- Workspace tests: 581 → **586** (+5: 4 new in `state.rs` exercising
RTTI-intact / RTTI-stripped / non-object / `read_ascii_cstring`
termination + 1 integration test for `probe_create_stack_classes`).
- `--stable-digest -n 100M` lockstep oracle:
`instructions=100000002` (unchanged).
- `sylpheed_n50m` golden: passes.
- End-to-end: 500M producer-trace run captured at
`audit-runs/audit-003/run-500m-v4.txt`. RC=0.
### KRNBUG-AUDIT-003 finding — dispatcher addresses + decisive xref audit
**Run:** `exec sylpheed.iso --halt-on-deadlock --trace-handles-focus=0x1004,0x100c,0x15e0,0x42450b5c -n 500_000_000`.
**Handle 0x100c — dispatcher at `0x828F3D08`:**
Confirmed three ways:
1. Per-frame saved-r31 capture at handle creation:
```
frame=1 lr=0x821817c0 saved-r31=0x828f3d08 ← per-instance ctor
frame=2 lr=0x82180114 saved-r31=0x828f3d08 ← bridge ctor (same value)
```
2. Disassembly of `sub_82181750` at +0x14:
`addis r11, r0, 0x828F; addi r31, r11, 15624` (15624 = 0x3D08) ⇒
`r31 = 0x828F3D08` (the `this` for the per-instance ctor).
3. Field-level write tracking via `xrefs.kind=write`:
`pc=0x82181778 in sub_82181750 — stw r11, 0(r31)` writes -1 to
`[this+0]`.
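
The constant in step 2 can be checked mechanically. A minimal sketch, modelling only the low 32 bits (on the 64-bit Xenon core `lis` sign-extends, but the address is the low word): `addis rX, r0, HI` acts as `lis` because rA=r0 reads as a literal 0, and `addi` sign-extends its 16-bit immediate, so when the low half of an address is 0x8000 or above a compiler must bump the high half by one. `lis_addi` and `split_ha_l` are hypothetical helper names.

```rust
/// Value produced by `addis rX, r0, HI; addi rY, rX, LO` (low 32 bits).
fn lis_addi(hi: u16, lo: i16) -> u32 {
    let upper = (hi as u32) << 16;
    // addi sign-extends LO, so a negative LO subtracts from the upper half.
    upper.wrapping_add(lo as i32 as u32)
}

/// Inverse: the (HI, LO) immediates needed to build `addr` (the @ha/@l split).
fn split_ha_l(addr: u32) -> (u16, i16) {
    let lo = addr as u16 as i16;
    // Compensate for the sign extension that addi will apply to LO.
    let hi = (addr.wrapping_sub(lo as i32 as u32) >> 16) as u16;
    (hi, lo)
}
```

For 0x828F3D08 the low half (0x3D08) is below 0x8000, so no adjustment occurs and `lis_addi(0x828F, 15624)` yields 0x828F3D08.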
**`[this+0] = -1` is decisive: this is a hand-rolled POD job-queue
struct, not a C++ polymorphic class.** No vtable means no RTTI;
"class name" doesn't exist in MSVC mangled form. The probe correctly
rejected 0x828F3D08 as a class candidate.
Field layout (from sub_82181750 disasm):
```
[this+ 0] = -1 ; sentinel (not a vtable)
[this+ 4..12] = 0
[this+20] = 0 (halfword)
[this+36] = 0
[this+40] = 7 ; count or version
[this+44..(44+256)] ; sub-region init by `bl 0x8284DCEC`
[this+72] = thread_handle ; set after thread spawn
[this+76] = event_handle ; = 0x100c, set after silph::Event ctor
[this+88..104] = 0
```
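
A sketch of pulling the known fields out of a big-endian byte copy of the struct. The field names are hypothetical labels for the offsets in the table above (only +0, +40, +72 and +76 are decoded), and the sentinel check mirrors why the probe rejected this address as a class candidate:

```rust
/// Hypothetical snapshot of the known fields of the POD job queue.
#[derive(Debug, PartialEq)]
struct QueueFields {
    sentinel: u32,      // [this+0]: -1 (0xFFFFFFFF) where a vtable would be
    count: u32,         // [this+40]: observed value 7
    thread_handle: u32, // [this+72]: set after thread spawn
    event_handle: u32,  // [this+76]: 0x100c for this dispatcher
}

/// Reads a big-endian u32 at byte offset `off`, or None if out of bounds.
fn be_u32(bytes: &[u8], off: usize) -> Option<u32> {
    let b = bytes.get(off..off + 4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Decodes the known fields from bytes copied starting at `this`.
fn decode_queue(bytes: &[u8]) -> Option<QueueFields> {
    Some(QueueFields {
        sentinel: be_u32(bytes, 0)?,
        count: be_u32(bytes, 40)?,
        thread_handle: be_u32(bytes, 72)?,
        event_handle: be_u32(bytes, 76)?,
    })
}

/// A -1 at +0 cannot be a vtable pointer, so the object is not polymorphic.
fn is_pod_queue(f: &QueueFields) -> bool {
    f.sentinel == u32::MAX
}
```
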
Worker is `sub_82181830`: it receives r3=this, copies r28=this and
r29=&this[44], calls `silph::Thread::SetProcessor(CURRENT, 5)`,
then runs a `lwarx`/`stwcx.` sequence on `&this[80]`. Wait-side
telemetry confirms this: the parked thread's spilled r28-r31 area holds
0x828F3D08 (the base, = r28) and 0x828F3D34 (= base+44 = r29).
**Handle 0x15e0 — dispatcher at `0x828F4070`:**
Confirmed via xref table. Same shape as 0x100c (POD job queue, not
a C++ class). Constructed by `sub_821701C8` + `sub_8216F618`.
**Handle 0x1004 — 8-instance pool, member addresses still TBD.**
The MSVC ctors for the per-instance and bridge functions did not
preserve `this` in r31 across the call into `silph::Event::Ctor`,
so the saved-r31 chain captured at create time shows
stack-relative pointers (frames 1, 2, 5) and the CRT init-fn
iterator pointer 0x82870180 (frames 3, 4) instead of the pool
member's `this`. Recovering the 8 pool addresses requires hooking
`sub_8217C850`'s entry to capture r3 at each of its 8 calls from
the static ctor at `0x8280F810`.
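
The proposed hook can be sketched as a PC-match callback. Under the PowerPC calling convention used on this target, `this` arrives in r3 (as the worker above also shows), so sampling r3 at the function's entry sidesteps the ctors' failure to keep it in r31. `EntryHook` and the `on_step(pc, r3)` callback are hypothetical; the emulator's real per-instruction hook API is not shown here.

```rust
/// Hypothetical hook: records r3 each time execution reaches `entry_pc`.
struct EntryHook {
    entry_pc: u32,
    captured_r3: Vec<u32>,
}

impl EntryHook {
    fn new(entry_pc: u32) -> Self {
        Self { entry_pc, captured_r3: Vec::new() }
    }

    /// Assumed to be called before each guest instruction executes.
    fn on_step(&mut self, pc: u32, r3: u32) {
        if pc == self.entry_pc {
            // At function entry r3 still holds the `this` argument.
            self.captured_r3.push(r3);
        }
    }
}
```

With `EntryHook::new(0x8217C850)`, the static ctor's 8 calls should leave 8 entries in `captured_r3`, one per pool member.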
**Handle 0x42450b5c — separate bug class.** Heap-allocated
(0x4xxxxxxx is the user-heap range); parks via a non-`do_wait_single`
path. AUDIT-003's image-rdata-focused probe doesn't apply. Track
under a separate audit ID.
**Decisive xref audit — producer is unreached:**
```
0x828F3D08 (handle 0x100c) — 4 references in static analysis:
pc=0x82180100 in sub_821800D8 (kind=ref) — bridge ctor
pc=0x8218176c in sub_82181750 (kind=ref) — per-instance ctor
pc=0x82181778 in sub_82181750 (kind=write) — per-instance ctor init
pc=0x8284caa4 in sub_8280C2C0 (kind=ref) — CRT init driver
0x828F4070 (handle 0x15e0) — 5 references:
pc=0x8216f650 in sub_8216F618 (kind=ref) — bridge ctor
pc=0x8216f674 in sub_8216F618 (kind=ref) — bridge ctor
pc=0x821701e4 in sub_821701C8 (kind=ref) — per-instance ctor
pc=0x82170330 in sub_821701C8 (kind=ref) — per-instance ctor
pc=0x8284c9a4 in sub_8280C2C0 (kind=ref) — CRT init driver
```
**Every xref is in a ctor or the CRT.** No producer code references
either dispatcher base. This confirms AUDIT-001/002's `signal_attempts=0`:
the producer is unreached, not broken. Caveat: the static analysis
would miss a producer that receives the dispatcher pointer as a
function argument (no constant load of the address), but the simple
"`load_const dispatcher_addr; call submit(this, work)`" pattern
**is not present** in the binary for 0x828F3D08 / 0x828F4070.
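
The audit reduces to a set-membership filter over the xref table: any reference whose containing function is not a known ctor or the CRT init driver would be a producer candidate. A sketch with hypothetical types (`Xref`, `Kind`, `unexplained`, `write_count`), not the analyzer's actual output format:

```rust
/// Whether a reference takes the address or writes through it.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Kind {
    Ref,
    Write,
}

/// One row of the static-analysis xref table (hypothetical shape).
#[derive(Debug, Clone, Copy)]
struct Xref {
    pc: u32,   // instruction that references the address
    func: u32, // entry of the containing function
    kind: Kind,
}

/// Xrefs whose containing function is not in `known` (ctors + CRT init driver).
/// Empty means no other code loads the dispatcher address as a constant.
fn unexplained(xrefs: &[Xref], known: &[u32]) -> Vec<Xref> {
    xrefs.iter().copied().filter(|x| !known.contains(&x.func)).collect()
}

/// Number of write-kind references (expected: only the ctor's sentinel store).
fn write_count(xrefs: &[Xref]) -> usize {
    xrefs.iter().filter(|x| x.kind == Kind::Write).count()
}
```

Fed the four 0x828F3D08 rows with `known = [sub_821800D8, sub_82181750, sub_8280C2C0]`, `unexplained` returns an empty list.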
**Recommendation for next session (no implementation here):**
1. Investigate the call-chain `main() → sub_82181C20 → sub_82181750`.
sub_82181C20 is a subsystem driver — it constructs the queue and
should ALSO wire it into a feeder. If the feeder is itself a
static-init that's never invoked, the trail leads back to the
CRT init array driver (`sub_824ACB38`, walks
0x82870010..0x828708D4) and whatever scheduling subsystem is
supposed to drive those.
2. Hook `sub_8217C850` entry under `--trace-handles-focus=0x1004` to
capture r3 at each of its 8 calls — those are the pool member
`this` addresses for handle 0x1004's 8-instance pool.
3. Treat 0x42450b5c independently. AUDIT-002's hook missed it because
the parking site (PC=0x824cd4f4) isn't routed through `do_wait_single`.
Open KRNBUG-AUDIT-004 for that wait path.
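
Locating the iterator pointer 0x82870180 (frames 3 and 4 for handle 0x1004) inside the init array from step 1 is simple arithmetic. A sketch, assuming `sub_824ACB38` walks the range half-open with one 4-byte pointer per slot, the way MSVC's `_initterm(first, last)` does: the array then has (0x828708D4 - 0x82870010) / 4 = 561 slots, and 0x82870180 is slot 92, which narrows down which static-init entry was executing. The constant and function names are hypothetical.

```rust
const INIT_START: u32 = 0x8287_0010;
const INIT_END: u32 = 0x8287_08D4; // assumed exclusive, like _initterm's `last`

/// Number of 4-byte function-pointer slots in the init array.
fn init_slot_count() -> u32 {
    (INIT_END - INIT_START) / 4
}

/// Slot index of `ptr` if it is a word-aligned pointer into the init array.
fn init_slot_index(ptr: u32) -> Option<u32> {
    if (INIT_START..INIT_END).contains(&ptr) && (ptr - INIT_START) % 4 == 0 {
        Some((ptr - INIT_START) / 4)
    } else {
        None
    }
}
```
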