feat(kernel): KRNBUG-AUDIT-002 — multi-frame guest stack capture at handle creation

Adds `walk_guest_back_chain` (PPC EABI back-chain walker) and a `record_create_with_stack` audit hook gated on `--trace-handles-focus`. NtCreateEvent / NtCreateSemaphore / NtCreateTimer / XamTaskSchedule now route through the new helper so focused handles capture up to 6 stack frames at allocation time. Diagnostic-only, read-only memory access: unfocused handles pay one HashSet lookup, focused ones pay six back-chain dereferences. Lockstep determinism preserved. End-to-end finding: handles 0x1004 (8-instance pool via static ctor at 0x8280F810), 0x100c (singleton built inside main()), 0x15e0 (singleton in distinct cluster) are silph-framework dispatcher objects whose producer code is unreached at -n 500M. The producer hunt now has class ownership; vtable/RTTI readout is the next step. Tests: 576 → 581 green. `--stable-digest -n 100M` instructions=100000002 unchanged. Master HEAD prior: 9d45efe. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 20:41:06 +02:00
parent 9d45efe5d5
commit 2a9fd1fc86
6 changed files with 403 additions and 7 deletions
--- a/audit-findings.md
+++ b/audit-findings.md
@@ -4117,3 +4117,194 @@ Per the producer-hunt charter the remaining strong candidates are
 routing**. Timer DPC is the next-strongest because the parked
 handles are explicit `Event/Manual`s with no current waker, and
 Xbox 360 timer-driven DPCs are a common signal source.
+
+### KRNBUG-AUDIT-002 — multi-frame stack capture at handle creation
+
+**Status:** landed (diagnostic only; no behaviour change). Walker
+verified end-to-end against the analysed call graph for every
+captured frame.
+
+**Site:** `crates/xenia-kernel/src/audit.rs` (new
+`record_create_with_stack`, new `created_stack: Vec<(u32,u32)>` on
+`HandleAuditTrail`); `crates/xenia-kernel/src/state.rs` (new
+`audit_create_with_ctx` helper + free function
+`walk_guest_back_chain(sp, lr, mem, max)`); `nt_create_event` /
+`nt_create_semaphore` / `nt_create_timer` / `xam_task_schedule` now
+route through the new helper. Dump in `crates/xenia-app/src/main.rs`
+prints `created stack (N frames)` under the per-handle FOCUS report.
+
+**Why it exists:** KRNBUG-AUDIT-001 told us the producer is missing
+for handles `0x1004` / `0x100c` / `0x15e4` (later corrected to
+`0x15e0` — see below) but couldn't tell us *which subsystem owns
+each handle*. The wrapper at `lr=0x824a9f6c` is the same
+`silph::Event` ctor for 83 unique callers, so the immediate LR is
+useless for subsystem identification. The new walker captures up to
+6 stack frames at create time, gated on the focus set so the cost
+is one `HashSet::contains` on the unfocused hot path.
+
+**Walker correctness:** PPC EABI back-chain (`[r1] = prev_sp`,
+saved-LR-of-prev-frame at `[prev_sp - 8]`). Frame 0 is the live
+`(ctx.gpr[1], ctx.lr)` since the wrapper hasn't spilled its own LR
+yet. Sentinels: 0, 0xFFFFFFFF, self-loop. Read-only via
+`MemoryAccess::read_u32` — guest memory and CPU state are not
+mutated, so lockstep determinism is unaffected.
+
+**Verification:**
+- Workspace tests: 576 → 581 (+5: 2 new in `audit.rs` exercising the
+  `record_create_with_stack` path including the disabled-no-op case;
+  3 new in `state.rs` exercising synthetic 3-level back-chain,
+  self-loop sentinel, zero sentinel).
+- `--stable-digest -n 50M` lockstep oracle (`sylpheed_n50m`):
+  bit-identical to checked-in golden (re-confirmed twice).
+- End-to-end: every captured frame's saved-LR matches a `bl`
+  instruction one address earlier in the named function (cross-checked
+  against `sylpheed.db`'s `instructions` table for all 18 captured
+  PCs across handles `0x1004` / `0x100c` / `0x15e0`).
+
+### Producer-trace finding (KRNBUG-AUDIT-002 deliverable)
+
+Run: `exec sylpheed.iso --halt-on-deadlock --trace-handles-focus=0x1004,0x100c,0x15e0,0x42450b5c -n 500_000_000`.
+
+**0x1004 (tid=10 waiter):** static C++ ctor → 8-instance pool
+
+```
+[0] sub_824A9F18 +0x54   silph::Event ctor wrapper (83 callers)
+[1] sub_821783D8 +0x120  per-instance subsystem-init (RtlInitializeCSAndSpinCount + Event ctor)
+[2] sub_8217C850 +0x58   single per-pool-element bridge ctor
+[3] (no func) +0x14      static ctor at 0x8280F810; calls sub_8217C850 EIGHT times
+[4] sub_824ACB38 +0xb8   the CRT static-init driver (walks 0x82870010..0x828708d4)
+[5] entry_point +0x60    the standard CRT entry stub
+```
+
+The 8-instance call from frame 3 is the smoking gun: `0x8280F810`
+is a single C++ static constructor that builds an 8-element array
+of objects, each of which gets its own Critical Section + Event +
+worker thread. This is a **thread pool**, constructed before
+`main()` runs.
+
+**0x100c (tid=2 waiter):** runtime init in `main()` → singleton
+
+```
+[0] sub_824A9F18 +0x54   silph::Event ctor wrapper
+[1] sub_82181750 +0x70   per-instance subsystem-init (same shape: CS + Event)
+[2] sub_821800D8 +0x3c   single-call bridge ctor
+[3] sub_82181C20 +0x38   subsystem driver
+[4] sub_8216EA68 +0x3c   (top-level main; called from entry_point + 0x194 with r3=r4=r5=0)
+[5] entry_point +0x198   right after `bl 0x8216EA68`
+```
+
+Different code cluster (`0x82181xxx`), single instance, constructed
+**inside `main()` itself** — not from C++ static init. This is a
+runtime-allocated singleton subsystem.
+
+**0x15e0 (tid=16 waiter):** runtime init via a third distinct cluster
+
+```
+[0] sub_824A9F18 +0x54   silph::Event ctor wrapper
+[1] sub_821701C8 +0x48   per-instance subsystem-init (CS + Event, callees mirror 0x100c's path)
+[2] sub_8216F618 +0x44   bridge
+[3] sub_821707C0 +0x38   driver
+[4] (no func) +0x?       0x821C5418 — analyser missed this function entry
+[5] sub_82172BA0 +0x1ec  upper-level subsystem driver
+```
+
+Third distinct C++ class in cluster `0x82170xxx`. Same per-instance
+shape (CS + Event + worker thread); different call site than 0x100c.
+
+**Cross-check on the project memory list:** the prior memory listed
+the third handle as `0x15e4`; the actual handle on this run is
+`0x15e0` (off-by-4 in the prior session's transcription). The
+parked-waiter set as of HEAD `9d45efe` is:
+
+| Handle | Tid | Waits via | Trail status | Note |
+|--------|-----|-----------|--------------|------|
+| 0x1004 | 10 | `do_wait_single` | primary | static-ctor pool (8 entries) |
+| 0x100c | 2  | `do_wait_single` | primary | runtime singleton |
+| 0x15e0 | 16 | `do_wait_single` | primary | runtime singleton, distinct class |
+| 0x12f4 | 13,14 | `do_wait_single` | primary | shared Semaphore — 2 waiters |
+| 0x15f8 | 18 | `do_wait_multiple` | primary | Event/Auto |
+| 0x1038 | 4  | `do_wait_multiple` | primary | Event/Auto, in WaitAny[0x1038, 0x103c] |
+| 0x10b0 | 5  | `do_wait_multiple` | primary | Event/Auto, in WaitAny[0x10b0, 0x10b4] |
+| 0x42450b5c | 6 | (non-`do_wait_single`) | `<UNCREATED>` `<AUDIT_BLIND>` | guest-memory addr (heap range), bypasses Nt-side audit entirely |
+
+**0x42450b5c is qualitatively different.** Address `>= 0x40000000`
+is the guest user heap range, not a kernel handle table value
+(which start at 0x1000 and increment by 4). tid=6 is parking on a
+guest pointer — almost certainly an embedded `KEVENT` reached via
+`KeWaitForSingleObject(*PDISPATCHER_HEADER)` rather than via a
+handle. Our audit didn't see the wait either (`waits=0` while
+`waiter_count=1`), so the wait path itself bypasses
+`do_wait_single`. Treat as a separate bug class.
+
+### Subsystem identification
+
+All three Event/Manual creators (sub_821783D8, sub_82181750,
+sub_821701C8) follow the **identical 4-callee pattern**:
+
+1. `RtlInitializeCriticalSectionAndSpinCount` (init the per-instance CS)
+2. `sub_824A9F18` (the silph::Event ctor wrapper → `NtCreateEvent`)
+3. + 1-2 silph internal helpers (`sub_82172370`, `sub_824AA3E0`,
+   `sub_8217E948`, `sub_82274C70`, etc.) which initialize a queue
+   and spawn a worker thread
+
+Each parked worker does the same prologue: `silph::Thread::SetProcessor(CURRENT, 5)`
+(via `sub_824AA658(r3=-2, r4=5)`), then either an `lwarx`/`stwcx`
+CAS-spinlock + `RtlEnterCriticalSection` + check for queued work.
+
+This is the canonical **work-queue worker pattern**: a producer
+posts a message to an instance's queue (under the CS) and signals
+the Event; the worker wakes, drains, parks again. The producer
+that should call `Nt/KeSetEvent(handle)` is **never executed**
+within 500M instructions for any of the 3 handles.
+
+The PE's RTTI string table lists thread-related class names in the
+`SilpheedSCS` namespace: `WorkHudThread2`, `WorkHudThreadTaskCaller`,
+`COLLISION_THREAD_PARAM`, `UPDATER_THREAD_PARAM`, `CRenderCommandQueue`,
+`CTaskUpdater`. The 8-element static-init pool for 0x1004 most
+plausibly maps to one of the multi-instance worker classes in this
+list (likely `WorkHudThread2` family — the only `Thread`-suffixed
+multi-instance candidate); the singletons 0x100c and 0x15e0 most
+plausibly map to two of `CTaskUpdater` / `CRenderCommandQueue` /
+similar singletons. Without a live debugger pass to read the
+vtable+RTTI block at the `this` pointer of each worker, the exact
+class assignment is heuristic.
+
+### Recommended next session — surgical producer hunt
+
+The producer for each Event is the **call site that owns the
+matching message-push code path**. Steps:
+
+1. **For each Event handle (0x1004, 0x100c, 0x15e0)**, dump the
+   first 12 bytes of the `this` pointer to read the vtable address
+   (the `this` is in `r3` at the worker's ABI entry — captured in
+   the wait diagnostic as the first arg). Then read vtable[-1] to
+   resolve the RTTI Type Descriptor, which gives the class name.
+   This pinpoints exactly which `SilpheedSCS::*` class each
+   subsystem is.
+2. **Then xref the class name** in the binary to find the
+   producer-side method (`Push*`, `Submit*`, `EnqueueMessage*`,
+   `Notify*`). That method's signal call (likely
+   `silph::Event::Set` → `NtSetEvent` thunk) is what should fire.
+3. **Trace that producer's caller chain** to find the upstream
+   gate. Two failure modes are equally likely:
+   - **(A)** The producer DOES run but signals via `KeSetEvent` on
+     an embedded `KEVENT` field (not the handle-table side), and
+     our HLE `KeSetEvent` doesn't route to the handle-table waiter
+     list. Same family as 0x42450b5c. Smoking gun: `kernel.calls`
+     metric for `KeSetEvent` is non-zero but the audit shows zero
+     signals.
+   - **(B)** The producer is gated by an upstream condition that
+     doesn't trigger — e.g. UI-system message that never arrives,
+     timer-DPC that never fires, vsync interrupt with the wrong
+     APC routing. Smoking gun: `kernel.calls{KeSetEvent}` is zero
+     for that handle.
+4. **0x42450b5c** is a separate bug. Add a parallel
+   `audit_create_with_ctx` hook to whichever wait path tid=6 takes
+   (it's NOT `do_wait_single`); the function span at PC=0x824cd4f4
+   isn't even in the analyser's `functions` table. Likely the
+   `KeWaitForSingleObject(*PDISPATCHER_HEADER)` wrapper. Once the
+   wait path is audited, repeat the producer-trace.
+
+The walker is reusable: any handle added to `--trace-handles-focus`
+will get a 6-frame stack at creation time. Add new candidates
+freely — cost on the unfocused hot path is one `HashSet::contains`.