feat(kernel): KRNBUG-AUDIT-004 — --ctor-probe PC hook + --dump-addr struct dump

Diagnostic-only, read-only. Lockstep `instructions=100000002`
preserved bit-exact at -n 100M --stable-digest. 586 → 588 tests.

Adds two read-only diagnostics for the parked-waiter producer hunt:

  * `--ctor-probe=0x8217C850,0x...` — at every interpreter step,
    if `ctx.pc` is in the configured set, print one `CTOR-PROBE`
    line capturing live r3 (= `this` in MSVC PPC ctors), lr
    (= return site), sp, plus an 8-frame back-chain with
    saved-r31/r30 per frame. Emits one line per hit, which is exactly
    what the 8-instance-pool probe needed.

  * `--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0,...` — at end of
    run (after the FOCUS report in `dump_thread_diagnostic`), each
    address gets a 128-byte hex + be32 + ASCII dump. Used to
    inspect the static dispatcher / job-queue struct layouts
    AUDIT-003 identified.

Both are gated default-off; with an empty set, the hot-path cost is a
single `is_empty()` check. No guest state is mutated, so the
`sylpheed_n*m.json` lockstep digest is preserved.

KRNBUG-AUDIT-004 findings (corrects KRNBUG-AUDIT-002/003):

1. **The "8-instance pool" hypothesis for handle 0x1004 is FALSE.**
   Probing the inner per-instance ctors `[0x821783D8, 0x82181750,
   0x821701C8]` at -n 50M shows each fires EXACTLY ONCE with
   r3 = `[0x828F3EC0, 0x828F3D08, 0x828F4070]` respectively. All
   three handles are Meyers-style singletons with one dispatcher
   each. The "called 8 times" claim came from miscounting raw
   entries to the OUTER getter sub_8217C850 — but that getter is
   itself a Meyers-singleton-getter; only the FIRST entry cascades
   through to bl 0x821783D8 (gated on `[0x828F48D8] bit 0`).

2. **The producer indirection layer is the singleton-getter
   itself.** Static byte-scan of .rdata / .data shows 0 hits for
   the dispatcher addresses — no static registry table holds them.
   But the xrefs table for the OUTER getters reveals 5–6 callers
   each, MOSTLY non-create-chain, sharing the canonical producer
   pattern: `bl outer_singleton_getter; lwz r3, OFFSET(r3); bl
   0x824AA1D8` (with OFFSET=80 for 0x100c, =36 for 0x15e0). So the
   AUDIT-003 xref audit was necessary but not sufficient — it
   correctly saw "no direct producer references" but missed the
   singleton-getter indirection layer.

3. **Dispatcher struct layouts** (128-byte dumps captured at -n
   50M --halt-on-deadlock):
     - 0x828F3D08 (handle 0x100c): event_handle at +0x4C (0x100c),
       thread_handle at +0x48 (0x1010), self-pointer at +0x74,
        capacity 7 at +0x28, queue empty (+0x00/+0x3C = -1).
     - 0x828F4070 (handle 0x15e0): event_handle at +0x20 (0x15e0),
       sibling-handle 0x15E4 at +0x1C, queue empty (+0x10 = -1).
     - 0x828F3EC0 (handle 0x1004): event_handle at +0x78 (0x1004),
        4 guest-heap sub-buffer pointers at +0x20/+0x3C/+0x44/+0x50 in
        the 0x4xxxxxxx range, a noticeably different layout from the
        other two pure POD job queues.

Files:
  crates/xenia-kernel/src/state.rs   ctor_probe_pcs / dump_addrs +
                                     fire_ctor_probe_if_match + 2 tests
  crates/xenia-app/src/main.rs       Exec --ctor-probe / --dump-addr
                                     CLI parsing, prologue hook,
                                     end-of-run struct dumper
  audit-findings.md                  KRNBUG-AUDIT-004 entry
  audit-runs/audit-004/              50M probe runs (v1 outer-getter
                                     hits, v2 inner-ctor hits proving
                                     the singleton hypothesis)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-04 17:09:47 +02:00
parent 48eed258f0
commit 7108d6d131
5 changed files with 2601 additions and 0 deletions


@@ -4457,3 +4457,168 @@ function arg (no constant-load), but the simple
3. Treat 0x42450b5c independently. AUDIT-002's hook missed it because
the parking site (PC=0x824cd4f4) isn't routed through `do_wait_single`.
Open KRNBUG-AUDIT-004 for that wait path.
---
### KRNBUG-AUDIT-004 — `--ctor-probe` PC hook + `--dump-addr` struct dump; producer-indirection layer identified; "8-instance pool" hypothesis falsified
**Status**: landed on master (no-ff merge of feature branch
`dispatcher-probe-audit/p0-ctor-probe-and-struct-dump`). Diagnostic-
only, read-only, lockstep-preserved (`instructions=100000002` at
`-n 100M --stable-digest`).
**Tests**: 586 → **588**.
**What landed (`crates/xenia-kernel/src/state.rs`):**
- `pub ctor_probe_pcs: HashSet<u32>` field on `KernelState` (default
empty).
- `pub fn fire_ctor_probe_if_match(hw_id, mem)`: fast-rejects when the set
is empty; on each match prints a single-line record `CTOR-PROBE pc=...
tid=... hw=... cycle=... sp=... r3=... lr=...` plus an 8-frame
back-chain with saved-r31/r30 per frame. Pure read.
- `pub dump_addrs: Vec<u32>` field for end-of-run struct dumps.
- 2 unit tests: empty-set no-op, set-membership invariant.
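The following is a minimal sketch of the probe's match-and-walk logic, for orientation only. It is not the `KernelState` code. `ProbeRegs`, `back_chain`, and `probe_line` are hypothetical names, and guest memory is abstracted as a `read_be32` closure. The walk assumes the standard PowerPC convention that word 0 of each stack frame holds the caller's SP (the back chain). The sketch also omits the tid/hw/cycle fields and the saved-r31/r30 extraction, because those slot offsets depend on the compiler's prologue.

```rust
use std::collections::HashSet;

/// Hypothetical snapshot of the guest registers the probe reports
/// (not the real `KernelState` API).
struct ProbeRegs {
    pc: u32,
    sp: u32, // r1
    r3: u32, // `this` at an MSVC PPC ctor entry
    lr: u32, // return site
}

/// Collect up to `max_frames` stack-frame addresses by following the
/// PowerPC back chain (assumed: word 0 of each frame = caller's SP).
/// Stops at a null link, an unreadable address, or a link that does not
/// move toward higher addresses (the stack grows downward).
fn back_chain(mut sp: u32, max_frames: usize, read_be32: &dyn Fn(u32) -> Option<u32>) -> Vec<u32> {
    let mut frames = Vec::new();
    while frames.len() < max_frames && sp != 0 {
        frames.push(sp);
        match read_be32(sp) {
            Some(next) if next > sp => sp = next,
            _ => break,
        }
    }
    frames
}

/// Build one CTOR-PROBE record when `regs.pc` is in the probe set.
/// With an empty set this is a single `is_empty()` check.
fn probe_line(pcs: &HashSet<u32>, regs: &ProbeRegs, read_be32: &dyn Fn(u32) -> Option<u32>) -> Option<String> {
    if pcs.is_empty() || !pcs.contains(&regs.pc) {
        return None;
    }
    let chain: Vec<String> = back_chain(regs.sp, 8, read_be32)
        .into_iter()
        .map(|frame| format!("{frame:#010x}"))
        .collect();
    Some(format!(
        "CTOR-PROBE pc={:#010x} sp={:#010x} r3={:#010x} lr={:#010x} chain=[{}]",
        regs.pc,
        regs.sp,
        regs.r3,
        regs.lr,
        chain.join(",")
    ))
}
```

The sketch returns a `String` instead of printing so that it stays testable. The hook described above prints the record directly.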
**What landed (`crates/xenia-app/src/main.rs`):**
- `--ctor-probe=0x8217C850,0x82181750,...` CLI flag (and
`XENIA_CTOR_PROBE`). Parsed into `kernel.ctor_probe_pcs` at
`cmd_exec_inner` startup.
- `--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0,...` CLI flag (and
`XENIA_DUMP_ADDR`). Each address gets a 128-byte hex+be32+ASCII
dump at end-of-run, after the per-handle FOCUS report.
- `worker_prologue` calls `fire_ctor_probe_if_match` after reading
`pc` and before any thunk-dispatch / step-block branch.
`dump_thread_diagnostic` consumes `kernel.dump_addrs`.
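The comma-separated hex lists taken by `--ctor-probe` / `--dump-addr` could be parsed along these lines. This is a sketch, not the `main.rs` code. `parse_hex_list`, `ctor_probe_set`, and `resolve_spec` are hypothetical helpers. The flag-over-environment precedence in `resolve_spec` is also an assumption: the entry only says each flag has a `XENIA_*` counterpart.

```rust
use std::collections::HashSet;
use std::num::ParseIntError;

/// Parse a comma-separated list of hex addresses such as
/// "0x8217C850,0x82181750". The 0x/0X prefix is optional; whitespace
/// around entries and empty entries are ignored.
fn parse_hex_list(spec: &str) -> Result<Vec<u32>, ParseIntError> {
    spec.split(',')
        .map(str::trim)
        .filter(|entry| !entry.is_empty())
        .map(|entry| {
            let digits = entry
                .strip_prefix("0x")
                .or_else(|| entry.strip_prefix("0X"))
                .unwrap_or(entry);
            u32::from_str_radix(digits, 16)
        })
        .collect()
}

/// Probe PCs only need membership tests, so duplicates collapse into a set.
fn ctor_probe_set(spec: &str) -> Result<HashSet<u32>, ParseIntError> {
    Ok(parse_hex_list(spec)?.into_iter().collect())
}

/// Hypothetical precedence rule: an explicit flag value wins over the
/// environment variable (e.g. XENIA_CTOR_PROBE or XENIA_DUMP_ADDR).
fn resolve_spec(flag: Option<&str>, env_var: &str) -> Option<String> {
    flag.map(str::to_owned)
        .or_else(|| std::env::var(env_var).ok())
}
```

A `HashSet` suits the probe because it is checked for membership on every step. `dump_addrs` can stay a `Vec<u32>` so that dumps come out in command-line order.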
**Decisive findings (corrects KRNBUG-AUDIT-002/003):**
1. **The "8-instance pool" hypothesis for handle 0x1004 is FALSE.**
Probe ran at `-n 50M --halt-on-deadlock` with PCs
`[0x821783D8, 0x82181750, 0x821701C8]` (the per-instance ctors
for handles 0x1004 / 0x100c / 0x15e0 respectively). Each fired
**EXACTLY ONCE**:
```
CTOR-PROBE pc=0x821783d8 tid=1 hw=0 cycle=1401430 r3=0x828f3ec0 ← handle 0x1004
CTOR-PROBE pc=0x82181750 tid=1 hw=0 cycle=5363599 r3=0x828f3d08 ← handle 0x100c
CTOR-PROBE pc=0x821701c8 tid=1 hw=0 cycle=9203618 r3=0x828f4070 ← handle 0x15e0
```
Handle 0x1004 has a SINGLE dispatcher at **0x828F3EC0**, not 8
pool members. The earlier "called 8 times" claim came from
counting raw entries to the OUTER getter `sub_8217C850`, but
`sub_8217C850` is a Meyers-style singleton-getter — its inner
`bl 0x821783D8` (the per-instance ctor) is gated on a one-shot
init flag at `[0x828F48D8] bit 0`. Subsequent `sub_8217C850`
calls just return the existing slot pointer.
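A toy model of a Meyers-style getter shows why counting raw outer-getter entries overcounts instances. `SingletonGetter` is hypothetical and only mimics the control flow described above. A guard bit (bit 0 of a flag word, like `[0x828F48D8]`) gates the single call to the inner ctor, and every later call returns the existing slot.

```rust
/// Hypothetical model of a Meyers-style singleton getter such as
/// sub_8217C850: bit 0 of a guard word gates the one-time inner ctor.
struct SingletonGetter {
    guard: u32,       // stands in for the word at [0x828F48D8]
    slot: u32,        // static instance address returned to callers
    outer_calls: u32, // what a probe on the getter's entry would count
    ctor_calls: u32,  // what a probe on the inner ctor would count
}

impl SingletonGetter {
    fn new(slot: u32) -> Self {
        Self { guard: 0, slot, outer_calls: 0, ctor_calls: 0 }
    }

    fn get(&mut self) -> u32 {
        self.outer_calls += 1;
        if (self.guard & 1) == 0 {
            self.guard |= 1;
            self.ctor_calls += 1; // models `bl 0x821783D8` constructing the slot
        }
        self.slot
    }
}
```

Calling `get()` eight times leaves `outer_calls == 8` but `ctor_calls == 1`. This mirrors the gap between the original "called 8 times" count and the single inner-ctor hit.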
2. **The producer indirection layer IS the singleton-getter
itself.** Static byte-scans of `.rdata` and `.data` show 0 hits
for the dispatcher addresses 0x828F3D08 / 0x828F4070, so no static
registry table holds them. But the `xrefs` table for the OUTER
getters reveals:
```
sub_821800D8 (outer for 0x828F3D08, handle 0x100c): 6 callers
0x821802d8 (sub_82180158+0x180) ← non-create-chain
0x821806e0 (sub_821805C8+0x118) ← non-create-chain
0x82180b28 (sub_82180A10+0x118) ← non-create-chain
0x82180ea0 (sub_82180D90+0x110) ← non-create-chain
0x82181254 (sub_821810E0+0x174) ← non-create-chain
0x82181c54 (sub_82181C28+0x2C) ← create-chain ONLY
sub_8216F618 (outer for 0x828F4070, handle 0x15e0): 5 callers
0x8216f9d4 (sub_8216F818+0x1BC) ← non-create-chain
0x8216fc08 (sub_8216F9F0+0x218) ← non-create-chain
0x821700b8 (sub_8216FF70+0x148) ← non-create-chain
0x821700f4 (sub_821700E0+0x14) ← non-create-chain
0x821707f4 (sub_821707C0+0x34) ← create-chain ONLY
```
The non-create-chain consumers all share the **canonical
producer pattern**:
```
bl outer_singleton_getter ; r3 = dispatcher ptr
lwz r3, OFFSET(r3) ; r3 = an event handle / queue field
bl 0x824AA1D8 ; signal/wake function
```
For 0x100c the offset is 80 (= 0x50); for 0x15e0 the offset is
36 (= 0x24).
So **interpretation (2) of the audit charter is confirmed**:
producers reference the dispatchers via a function-call layer of
indirection, not through direct address materialization. The
xref-table audit in AUDIT-003 (which only catches direct
constant-loads of the dispatcher base) was **necessary but not
sufficient** — it correctly saw "no direct producer references"
but missed the singleton-getter indirection.
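The canonical producer sequence can be modeled in a few lines, which makes the zero-handle hypothesis concrete. In this sketch, `lwz` reads a big-endian word from a byte buffer that stands in for guest memory. `producer` chains a getter, a field load at `OFFSET`, and a wake call. The getter and wake closures are hypothetical stand-ins for the outer getter and 0x824AA1D8, whose real signatures are not documented here.

```rust
/// Big-endian 32-bit load, as PowerPC `lwz` performs, from a byte buffer
/// that stands in for guest memory starting at `base`.
fn lwz(mem: &[u8], base: u32, addr: u32) -> Option<u32> {
    let off = addr.checked_sub(base)? as usize;
    let w = mem.get(off..off.checked_add(4)?)?;
    Some(u32::from_be_bytes([w[0], w[1], w[2], w[3]]))
}

/// Model of the canonical producer sequence:
///   bl outer_singleton_getter ; r3 = dispatcher ptr
///   lwz r3, OFFSET(r3)        ; r3 = field at dispatcher + OFFSET
///   bl 0x824AA1D8             ; signal/wake(r3)
/// `getter` and `wake` are hypothetical stand-ins. Returns the value
/// handed to `wake`, or None if the load falls outside `mem`.
fn producer(
    mem: &[u8],
    base: u32,
    getter: impl Fn() -> u32,
    offset: u32,
    mut wake: impl FnMut(u32),
) -> Option<u32> {
    let dispatcher = getter();
    let r3 = lwz(mem, base, dispatcher.checked_add(offset)?)?;
    wake(r3);
    Some(r3)
}
```

Suppose the model is fed a buffer shaped like the 0x828F3D08 dump (+0x4C = 0x100C, +0x50 = 0). With OFFSET = 0x50 (80), it then passes 0 to the wake stand-in.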
3. **Dispatcher struct layouts** (128-byte dumps at `-n 50M
--halt-on-deadlock`):
```
0x828F3D08 (handle 0x100c, per-instance ctor sub_82181750):
+0x00 = 0xFFFFFFFF ; queue head/tail sentinel
+0x28 = 0x00000007 ; capacity = 7
+0x2C = 0x01000000 ; init flag
+0x3C = 0xFFFFFFFF ; secondary sentinel
+0x48 = 0x00001010 ; thread_handle (worker thread)
+0x4C = 0x0000100C ; event_handle (= self handle 0x100c)
+0x50 = 0x00000000 ; producer reads this — currently 0
+0x70 = 0x00000001 ; refcount?
+0x74 = 0x828F3D08 ; self-pointer
0x828F4070 (handle 0x15e0, per-instance ctor sub_821701C8):
+0x00 = 0x01000000 ; init flag
+0x10 = 0xFFFFFFFF ; queue sentinel
+0x1C = 0x000015E4 ; sibling-handle (NOT in our parked
; set — possibly a thread handle)
+0x20 = 0x000015E0 ; event_handle (= self handle 0x15e0)
+0x24 = 0x00000000 ; producer reads this — currently 0
+0x40 = 0xFFFFFFFF ; secondary sentinel
0x828F3EC0 (handle 0x1004, per-instance ctor sub_821783D8):
+0x00 = 0x01000000 ; init flag
+0x10 = 0xFFFFFFFF ; queue sentinel
+0x20 = 0x40541BC0 ; heap pointer (sub-buffer #1)
+0x30 = 0x00000014 ; size 20
+0x34 = 0x0000002F ; size 47
+0x38 = 0x414F5F60 ; heap-range payload (or two halfwords)
+0x3C = 0x40211CA0 ; heap pointer (sub-buffer #2)
+0x44 = 0x405418C0 ; heap pointer (sub-buffer #3)
+0x50 = 0x40111840 ; heap pointer (sub-buffer #4)
+0x58 = 0xFFFFFFFF ; sentinel
+0x5C = 0xFFFFFFFF ; sentinel
+0x76 = 0x000012AC ; possibly thread id
+0x78 = 0x00001004 ; event_handle (= self handle 0x1004)
```
The 0x1004 dispatcher is **noticeably different**: it holds 4
pointers to guest-heap sub-buffers in the 0x4xxxxxxx range,
suggesting it manages a more complex resource than the other two
(which look like pure POD job queues). Its event_handle sits at
+0x78, versus +0x4C for 0x100c and +0x20 for 0x15e0, so each
subsystem appears to have its own struct layout (no shared base class).
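Anyone reproducing the dumps could use a minimal 128-byte hex + be32 + ASCII formatter like the one below. The row layout (address, 16 hex bytes, four big-endian words, printable ASCII) is an assumption about the tool's output, and `hexdump_be32` is a hypothetical name.

```rust
use std::fmt::Write;

/// Render `bytes` (read from guest address `base`) as 16-byte rows:
/// address, hex bytes, four big-endian u32 words, printable ASCII.
fn hexdump_be32(base: u32, bytes: &[u8]) -> String {
    let mut out = String::new();
    for (row, chunk) in bytes.chunks(16).enumerate() {
        let addr = base.wrapping_add((row * 16) as u32);
        let hex: Vec<String> = chunk.iter().map(|b| format!("{b:02x}")).collect();
        // Words start at offsets 0, 4, 8 and 12 within each row.
        let words: Vec<String> = chunk
            .chunks_exact(4)
            .map(|w| format!("{:08X}", u32::from_be_bytes([w[0], w[1], w[2], w[3]])))
            .collect();
        let ascii: String = chunk
            .iter()
            .map(|&b| if b.is_ascii_graphic() || b == b' ' { b as char } else { '.' })
            .collect();
        // Writing to a String cannot fail, so the Result is ignored.
        let _ = writeln!(out, "{addr:08X}  {:<47}  {:<35}  {ascii}", hex.join(" "), words.join(" "));
    }
    out
}
```

The sketch groups words at offsets that are multiples of 4 from `base`. A value reported at an unaligned offset such as +0x76 would therefore show up here as the low halfword of the +0x74 word.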
**Reproduce:**
```bash
cargo run --release -p xenia-app -- exec 'sylpheed.iso' \
--halt-on-deadlock \
--trace-handles-focus=0x1004,0x100c,0x15e0 \
--ctor-probe=0x821783D8,0x82181750,0x821701C8 \
--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0 \
-n 50000000
```
Trace files saved at:
- `audit-runs/audit-004/run-50m-probe.txt` (outer-getter probes)
- `audit-runs/audit-004/run-50m-probe-v2.txt` (inner-ctor probes — singleton hypothesis confirmed)
**Recommendation for next session (do not implement a fix):**
Hook each non-create-chain consumer call site for handle
0x100c (5 sites: 0x821802d8, 0x821806e0, 0x82180b28, 0x82180ea0,
0x82181254) and for handle 0x15e0 (4 sites: 0x8216f9d4, 0x8216fc08,
0x821700b8, 0x821700f4) using `--ctor-probe=...`. If any fire, the
producer DOES execute and the failure is in the wake/signal chain:
most likely `lwz r3, OFFSET(r3)` reads zero (the dispatcher dumps show
[+0x50] = 0 for 0x100c and [+0x24] = 0 for 0x15e0), so the wake
function 0x824AA1D8 is called with handle=0. If none fire, the
producer chain is gated upstream (likely a feature flag, init phase,
or RPC handler that never fires). Either outcome narrows the bug
surface substantially.
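Once the consumer-site probe run exists, applying the decision rule above is mechanical. The sketch below assumes probe output keeps the `CTOR-PROBE pc=0x...` line shape shown earlier. `fired_pcs`, `classify`, and `Verdict` are hypothetical names, and the site lists are copied from the xref table.

```rust
use std::collections::HashSet;

/// Non-create-chain consumer call sites, copied from the xref table above.
const SITES_100C: [u32; 5] = [0x8218_02D8, 0x8218_06E0, 0x8218_0B28, 0x8218_0EA0, 0x8218_1254];
const SITES_15E0: [u32; 4] = [0x8216_F9D4, 0x8216_FC08, 0x8217_00B8, 0x8217_00F4];

/// Hypothetical outcome labels for the recommended diagnostic.
#[derive(Debug, PartialEq)]
enum Verdict {
    /// A consumer site fired: suspect the wake/signal chain (e.g. handle=0).
    ProducerRuns,
    /// No consumer site fired: the producer chain is gated upstream.
    GatedUpstream,
}

/// Collect the `pc=0x...` value from every CTOR-PROBE line in a run log.
fn fired_pcs(log: &str) -> HashSet<u32> {
    log.lines()
        .filter(|line| line.starts_with("CTOR-PROBE"))
        .filter_map(|line| {
            let hex = line.split_whitespace().find_map(|tok| tok.strip_prefix("pc=0x"))?;
            u32::from_str_radix(hex, 16).ok()
        })
        .collect()
}

/// Map the fired PCs for one handle's site list onto a verdict.
fn classify(fired: &HashSet<u32>, sites: &[u32]) -> Verdict {
    if sites.iter().any(|pc| fired.contains(pc)) {
        Verdict::ProducerRuns
    } else {
        Verdict::GatedUpstream
    }
}
```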