feat(kernel): KRNBUG-AUDIT-004 — --ctor-probe PC hook + --dump-addr struct dump

Diagnostic-only, read-only. Lockstep `instructions=100000002`
preserved bit-exact at -n 100M --stable-digest. 586 → 588 tests.

Adds two read-only diagnostics for the parked-waiter producer hunt:

  * `--ctor-probe=0x8217C850,0x...`: at every interpreter step,
    if `ctx.pc` is in the configured set, print one `CTOR-PROBE`
    header line capturing live r3 (= `this` in MSVC PPC ctors), lr
    (= return site), and sp, followed by one line per frame of an
    8-frame back-chain walk with the saved r31/r30 words. Fires on
    every hit, not just the first, which is what the
    8-instance-pool probe needed.

  * `--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0,...`: at end of
    run (after the FOCUS report in `dump_thread_diagnostic`), each
    address gets a 128-byte dump (hex bytes, big-endian u32 lanes,
    ASCII). Used to inspect the static dispatcher / job-queue struct
    layouts that AUDIT-003 identified.
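The second bullet describes a row format. The sketch below is a minimal version of it, assuming the 128 bytes have already been copied out of guest memory into a slice. `format_dump` and its exact column layout are hypothetical. The real dumper in `main.rs` is not part of the hunks shown here.

```rust
/// Hypothetical formatter: one line per 16-byte row, showing the row
/// address, the raw hex bytes, the four big-endian u32 lanes, and an
/// ASCII column with non-printable bytes shown as '.'.
fn format_dump(base: u32, bytes: &[u8]) -> Vec<String> {
    bytes
        .chunks(16)
        .enumerate()
        .map(|(row_idx, row)| {
            let addr = base.wrapping_add((row_idx * 16) as u32);
            let hex: Vec<String> = row.iter().map(|b| format!("{:02x}", b)).collect();
            // Guest memory is big-endian (PowerPC), so decode lanes as BE.
            let lanes: Vec<String> = row
                .chunks_exact(4)
                .map(|w| format!("{:08x}", u32::from_be_bytes([w[0], w[1], w[2], w[3]])))
                .collect();
            let ascii: String = row
                .iter()
                .map(|&b| if b.is_ascii_graphic() || b == b' ' { b as char } else { '.' })
                .collect();
            format!("{:08x}: {} | {} | {}", addr, hex.join(" "), lanes.join(" "), ascii)
        })
        .collect()
}
```

The lanes must be decoded as big-endian because the guest is PowerPC. A little-endian read would show the bytes `00 00 10 0c` (handle 0x100c) as `0c100000`.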

Both are off by default. With no probe PCs configured, the hot-path
cost is a single `is_empty()` test. The address dump runs only once,
at end of run. No guest state is mutated, so the
`sylpheed_n*m.json` lockstep digest is preserved.
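The sketch below illustrates the gate and the back-chain walk under stated assumptions. It is not the `fire_ctor_probe_if_match` / `walk_guest_back_chain` code:

- `ToyMem`, `parse_hex_list`, `should_probe`, and `walk_back_chain` are hypothetical.
- `LR_SAVE_OFFSET`, the saved-LR slot relative to the caller's frame, is an assumed value. Check it against the real prologue helpers.

```rust
use std::collections::{HashMap, HashSet};

/// ASSUMPTION (illustrative only): offset of the saved-LR word relative
/// to the caller's frame pointer. Verify against the real ABI/prologue
/// helpers before trusting per-frame LRs.
const LR_SAVE_OFFSET: i64 = -8;

/// Toy stand-in for guest memory: aligned 32-bit words keyed by address.
struct ToyMem {
    words: HashMap<u32, u32>,
}

impl ToyMem {
    fn read_u32(&self, addr: u32) -> Option<u32> {
        self.words.get(&addr).copied()
    }
}

/// Parse a `--ctor-probe`-style list such as "0x8217C850,0x82181750".
fn parse_hex_list(s: &str) -> Result<HashSet<u32>, std::num::ParseIntError> {
    s.split(',')
        .map(str::trim)
        .filter(|t| !t.is_empty())
        .map(|t| {
            let digits = t.strip_prefix("0x").or_else(|| t.strip_prefix("0X")).unwrap_or(t);
            u32::from_str_radix(digits, 16)
        })
        .collect()
}

/// Hot-path gate: with no probes configured this is one `is_empty()` check.
fn should_probe(pcs: &HashSet<u32>, pc: u32) -> bool {
    !pcs.is_empty() && pcs.contains(&pc)
}

/// Walk up to `max` frames, starting with the live (sp, lr) pair.
/// Read-only: the walk never writes to memory.
fn walk_back_chain(sp: u32, lr: u32, mem: &ToyMem, max: usize) -> Vec<(u32, u32)> {
    let mut frames = vec![(sp, lr)];
    let mut fp = sp;
    while frames.len() < max {
        // The first word of each frame is the back-chain pointer.
        let Some(caller) = mem.read_u32(fp) else { break };
        // The stack grows down, so a valid caller frame is strictly higher.
        // This also stops on the zero terminator and on pointer loops.
        if caller <= fp {
            break;
        }
        let lr_addr = (caller as i64 + LR_SAVE_OFFSET) as u32;
        let Some(saved_lr) = mem.read_u32(lr_addr) else { break };
        frames.push((caller, saved_lr));
        fp = caller;
    }
    frames
}
```

On an unmapped or empty stack the walk returns just the live `(sp, lr)` pair, so the per-frame printing loop always runs at least once.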

KRNBUG-AUDIT-004 findings (corrects KRNBUG-AUDIT-002/003):

1. **The "8-instance pool" hypothesis for handle 0x1004 is FALSE.**
   Probing the inner per-instance ctors `[0x821783D8, 0x82181750,
   0x821701C8]` at -n 50M shows each fires EXACTLY ONCE, with
   r3 = `[0x828F3EC0, 0x828F3D08, 0x828F4070]` respectively. All
   three handles are Meyers-style singletons with one dispatcher
   each. The "called 8 times" claim came from counting raw entries
   to the OUTER getter sub_8217C850 as if each were a construction.
   That getter is itself a Meyers-singleton getter: only the FIRST
   entry falls through to `bl 0x821783D8` (gated on bit 0 of
   `[0x828F48D8]`).

2. **The producer indirection layer is the singleton getter
   itself.** A static byte-scan of .rdata / .data finds 0 hits for
   the dispatcher addresses, so no static registry table holds them.
   But the xrefs table for the OUTER getters shows 5–6 callers
   each, MOSTLY outside the create chain. They share the canonical
   producer pattern `bl outer_singleton_getter; lwz r3, OFFSET(r3);
   bl 0x824AA1D8`, with OFFSET = 80 (0x50) for handle 0x100c and
   36 (0x24) for handle 0x15e0. So the AUDIT-003 xref audit was
   necessary but not sufficient. It correctly found no direct
   producer references, but it missed the singleton-getter
   indirection layer.

3. **Dispatcher struct layouts** (128-byte dumps captured at -n
   50M --halt-on-deadlock):
     - 0x828F3D08 (handle 0x100c): event_handle at +0x4C (0x100c),
       thread_handle at +0x48 (0x1010), self-pointer at +0x74,
       capacity 7 at +0x28, queue empty (+0x00 and +0x3C both -1).
     - 0x828F4070 (handle 0x15e0): event_handle at +0x20 (0x15e0),
       sibling handle 0x15E4 at +0x1C, queue empty (+0x10 = -1).
     - 0x828F3EC0 (handle 0x1004): event_handle at +0x78 (0x1004),
       plus 4 guest-heap sub-buffer pointers in the 0x4xxxxxxx
       range at +0x20/+0x3C/+0x44/+0x50. This is a noticeably
       different layout from the other two, which are plain-POD
       job queues.
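Finding 1's miscount is easy to reproduce in a toy model. The sketch below is hypothetical Rust, not the guest code. It treats the outer getter as a guarded function-local static whose guard word stands in for `[0x828F48D8]`:

- A probe on the getter's PC sees every entry.
- The inner ctor runs only once.

In this model the guard bit is set just before the ctor runs. The struct and field names are invented for illustration.

```rust
/// Toy model of a guarded function-local static ("Meyers singleton")
/// getter. `guard` stands in for the word at 0x828F48D8; bit 0 records
/// whether the static has already been constructed.
struct SingletonGetter {
    guard: u32,
    instance: u32,
    getter_entries: u32,
    ctor_calls: u32,
}

impl SingletonGetter {
    fn new(instance: u32) -> Self {
        SingletonGetter { guard: 0, instance, getter_entries: 0, ctor_calls: 0 }
    }

    /// Every call counts as a getter entry (what a probe on the OUTER
    /// getter's PC would see); only the first reaches the inner ctor.
    fn get(&mut self) -> u32 {
        self.getter_entries += 1;
        if (self.guard & 1) == 0 {
            self.guard |= 1;
            self.construct();
        }
        self.instance
    }

    /// Stands in for `bl 0x821783D8` with r3 = the static's address.
    fn construct(&mut self) {
        self.ctor_calls += 1;
    }
}
```

Eight calls to `get()` record eight getter entries but a single ctor call, all returning the same address. This is the split between the v1 (outer-getter) and v2 (inner-ctor) probe runs.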
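The producer pattern from finding 2 can also be matched mechanically on decoded big-endian instruction words. This hypothetical scanner is not the xrefs tooling AUDIT-004 used. It recognises only the exact three-instruction shape `bl getter; lwz r3, d(r3); bl 0x824AA1D8`, so any instruction scheduled in between would hide a caller. The encodings follow the PowerPC formats:

- I-form for `bl`: opcode 18, AA = 0, LK = 1.
- D-form for `lwz`: opcode 32.

```rust
/// Decode a PowerPC `bl` (I-form: opcode 18, AA = 0, LK = 1) at `pc`
/// and return its absolute target.
fn bl_target(word: u32, pc: u32) -> Option<u32> {
    if (word >> 26) != 18 || (word & 0b11) != 0b01 {
        return None;
    }
    // LI is a 24-bit word offset; shift left then arithmetic-shift right
    // to sign-extend the 26-bit byte displacement.
    let disp = (((word & 0x03FF_FFFC) << 6) as i32) >> 6;
    Some(pc.wrapping_add(disp as u32))
}

/// Match `lwz r3, d(r3)` (D-form: opcode 32, RT = 3, RA = 3); return d.
fn lwz_r3_disp(word: u32) -> Option<i16> {
    let (op, rt, ra) = (word >> 26, (word >> 21) & 0x1F, (word >> 16) & 0x1F);
    if op == 32 && rt == 3 && ra == 3 {
        Some(word as u16 as i16)
    } else {
        None
    }
}

/// Encode `bl target` at `pc` (used here only to build test input).
fn encode_bl(pc: u32, target: u32) -> u32 {
    (18u32 << 26) | (target.wrapping_sub(pc) & 0x03FF_FFFC) | 1
}

/// Find `bl getter; lwz r3, d(r3); bl sink` triples in a code slice
/// that starts at guest address `base`. Returns (address of the first
/// `bl`, d) for each hit.
fn find_producers(base: u32, code: &[u32], getter: u32, sink: u32) -> Vec<(u32, i16)> {
    code.windows(3)
        .enumerate()
        .filter_map(|(i, w)| {
            let pc = base.wrapping_add(4 * i as u32);
            if bl_target(w[0], pc)? != getter {
                return None;
            }
            let disp = lwz_r3_disp(w[1])?;
            if bl_target(w[2], pc.wrapping_add(8))? != sink {
                return None;
            }
            Some((pc, disp))
        })
        .collect()
}
```

In this encoding, `lwz r3, 80(r3)` (the OFFSET reported for handle 0x100c) is the word `0x80630050`, and `lwz r3, 36(r3)` is `0x80630024`.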
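Given one of the 128-byte dumps, the handle-0x100c dispatcher fields from finding 3 can be decoded with big-endian reads. This is a minimal sketch. `Dispatcher100c` and the `word_00` / `word_3c` placeholder names are hypothetical; only the offsets come from the dump.

```rust
/// Read a big-endian u32 at byte offset `off` of a dump.
fn be32_at(dump: &[u8], off: usize) -> u32 {
    u32::from_be_bytes([dump[off], dump[off + 1], dump[off + 2], dump[off + 3]])
}

/// Hypothetical view of the 0x828F3D08 dispatcher (handle 0x100c),
/// built only from the offsets reported in finding 3. The +0x00 and
/// +0x3C names are placeholders, not recovered symbols.
#[derive(Debug, PartialEq)]
struct Dispatcher100c {
    word_00: u32,       // +0x00, -1 when the queue is empty
    capacity: u32,      // +0x28
    word_3c: u32,       // +0x3C, -1 when the queue is empty
    thread_handle: u32, // +0x48
    event_handle: u32,  // +0x4C
    self_ptr: u32,      // +0x74
}

impl Dispatcher100c {
    fn from_dump(dump: &[u8]) -> Option<Self> {
        // The last field read ends at 0x74 + 4 = 0x78.
        if dump.len() < 0x78 {
            return None;
        }
        Some(Dispatcher100c {
            word_00: be32_at(dump, 0x00),
            capacity: be32_at(dump, 0x28),
            word_3c: be32_at(dump, 0x3C),
            thread_handle: be32_at(dump, 0x48),
            event_handle: be32_at(dump, 0x4C),
            self_ptr: be32_at(dump, 0x74),
        })
    }

    /// "Queue empty" as observed in the dump: both words read as -1.
    fn queue_empty(&self) -> bool {
        self.word_00 == u32::MAX && self.word_3c == u32::MAX
    }
}
```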

Files:
  crates/xenia-kernel/src/state.rs   ctor_probe_pcs / dump_addrs +
                                     fire_ctor_probe_if_match + 2 tests
  crates/xenia-app/src/main.rs       Exec --ctor-probe / --dump-addr
                                     CLI parsing, prologue hook,
                                     end-of-run struct dumper
  audit-findings.md                  KRNBUG-AUDIT-004 entry
  audit-runs/audit-004/              50M probe runs (v1 outer-getter
                                     hits, v2 inner-ctor hits proving
                                     the singleton hypothesis)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-04 17:09:47 +02:00
parent 48eed258f0
commit 7108d6d131
5 changed files with 2601 additions and 0 deletions


@@ -179,6 +179,30 @@ pub struct KernelState {
/// backend (which lives on the worker thread under `--gpu-thread`).
pub ring_base: u32,
pub ring_size_dwords: u32,
/// Diagnostic. PCs at which the worker prologue emits a stack /
/// back-chain dump on every hit, capturing live `r3` (= `this` in
/// MSVC PPC ctors), `lr` (= return site), `sp`, and the cycle/tid
/// that hit the PC. Populated from `--ctor-probe=0x8217C850,0x...` /
/// `XENIA_CTOR_PROBE`. Empty by default, so the check is a single
/// `is_empty()` test with no extra cost on the unprobed hot path.
/// Read-only diagnostic: no guest state is mutated, so the
/// `sylpheed_n*m.json` lockstep digest is preserved.
///
/// **Why a per-PC probe instead of per-handle?** The MSVC ctors
/// reached via `sub_8217C850` (and friends) don't preserve `this` in
/// r31 across the inner `bl` to `silph::Event::Construct`, so the
/// AUDIT-002 multi-frame back-chain at `NtCreateEvent` only
/// recovers stack-relative pointers, never the constructed object's
/// `this`. Hooking the ctor's PRE-prologue PC captures r3 = this
/// before any save/restore can clobber it.
pub ctor_probe_pcs: std::collections::HashSet<u32>,
/// Diagnostic. Guest addresses to dump (128 bytes each: hex bytes,
/// big-endian u32 lanes, and ASCII) at end-of-run. Populated from
/// `--dump-addr=0x828F3D08,0x828F4070`. Used to inspect the static
/// dispatcher / job-queue / pool struct layouts identified by
/// AUDIT-003. Read-only: the dump is performed by
/// `dump_thread_diagnostic`, never during the hot interpreter loop,
/// so lockstep determinism is unaffected.
pub dump_addrs: Vec<u32>,
}
impl KernelState {
@@ -230,6 +254,8 @@ impl KernelState {
ring_base: 0,
ring_size_dwords: 0,
parallel_active: false,
ctor_probe_pcs: std::collections::HashSet::new(),
dump_addrs: Vec::new(),
};
crate::exports::register_exports(&mut state);
crate::xam::register_exports(&mut state);
@@ -522,6 +548,48 @@ impl KernelState {
self.audit.record_wake(handle, entry);
}
/// Diagnostic. If the live PC for HW slot `hw_id` is in
/// `self.ctor_probe_pcs`, emit one `CTOR-PROBE` header line with
/// the current cycle, tid, hw_id, sp, r3, and lr, followed by one
/// line per frame of an up-to-8-frame back-chain walk. Read-only:
/// no guest state is mutated, so guest execution with the probe set
/// is identical to a run without it (the probe only adds stdout
/// output).
///
/// Intended call site: top of `worker_prologue`, after `pc` has
/// been read but before any thunk-dispatch / step-block branch.
/// Fires on every hit, not just the first: if the same PC is
/// reached again (e.g. each raw entry to the outer getter
/// `sub_8217C850`), it fires again, so every entry gets its own
/// record.
pub fn fire_ctor_probe_if_match(&self, hw_id: u8, mem: &GuestMemory) {
if self.ctor_probe_pcs.is_empty() {
return;
}
let ctx = self.scheduler.ctx(hw_id);
let pc = ctx.pc;
if !self.ctor_probe_pcs.contains(&pc) {
return;
}
let tid = self.scheduler.tid(hw_id).unwrap_or(0);
let r3 = ctx.gpr[3] as u32;
let lr = ctx.lr as u32;
let sp = ctx.gpr[1] as u32;
let cycle = ctx.cycle_count;
let frames = walk_guest_back_chain(sp, lr, mem, 8);
println!(
"CTOR-PROBE pc={:#010x} tid={} hw={} cycle={} sp={:#010x} r3={:#010x} lr={:#010x}",
pc, tid, hw_id, cycle, sp, r3, lr,
);
for (i, (fp, frame_lr)) in frames.iter().enumerate() {
let saved_r31 = mem.read_u32(fp.wrapping_sub(12));
let saved_r30 = mem.read_u32(fp.wrapping_sub(16));
println!(
" CTOR-PROBE frame={} fp={:#010x} lr={:#010x} saved-r31={:#010x} saved-r30={:#010x}",
i, fp, frame_lr, saved_r31, saved_r30,
);
}
}
/// Read a TLS slot for the currently running HW thread.
pub fn tls_get(&self, index: u32) -> u64 {
self.scheduler.tls_get(index)
@@ -1231,6 +1299,38 @@ mod tests {
/// `fire_ctor_probe_if_match` only emits when `pc` matches a
/// configured PC. Both tests below cover the miss paths: with the
/// default scheduler context (pc = 0) the membership check never
/// matches, so the back-chain walk on a hit is not exercised here.
#[test]
fn fire_ctor_probe_if_match_no_op_on_empty_set() {
let mem = GuestMemory::new().expect("memory init");
let state = KernelState::new();
// No probes set → must be a no-op whatever PC the scheduler
// ctx holds.
state.fire_ctor_probe_if_match(0, &mem);
assert!(state.ctor_probe_pcs.is_empty());
}
#[test]
fn fire_ctor_probe_if_match_only_fires_on_listed_pc() {
// stdout is hard to capture under cargo test, so this test only
// checks the membership path: a non-empty probe set whose PCs
// don't match the live PC must return without panicking.
let mem = GuestMemory::new().expect("memory init");
let mut state = KernelState::new();
state.ctor_probe_pcs.insert(0x8217_C850);
// The default PpcContext on slot 0 has pc=0 (idle sentinel),
// so the probe set membership test misses → no fire.
state.fire_ctor_probe_if_match(0, &mem);
// Sanity: an unrelated PC isn't claimed.
assert!(!state.ctor_probe_pcs.contains(&0x8200_0000));
assert!(state.ctor_probe_pcs.contains(&0x8217_C850));
}
/// A NUL-terminated ASCII string is read up to `max`; non-printable
/// bytes mark the candidate as bogus (return empty string). The
/// `.?A` prefix gating in `read_class_at_this` then rejects them.
#[test]
fn read_ascii_cstring_handles_termination_and_garbage() {
use xenia_memory::page_table::MemoryProtect;