feat(kernel): KRNBUG-AUDIT-004 — --ctor-probe PC hook + --dump-addr struct dump

Diagnostic-only, read-only. Lockstep `instructions=100000002`
preserved bit-exact at -n 100M --stable-digest. 586 → 588 tests.

Adds two read-only diagnostics for the parked-waiter producer hunt:

  * `--ctor-probe=0x8217C850,0x...` — at every interpreter step,
    if `ctx.pc` is in the configured set, print one `CTOR-PROBE`
    line capturing live r3 (= `this` in MSVC PPC ctors), lr
    (= return site), sp, plus an 8-frame back-chain with
    saved-r31/r30 per frame. Fires once per hit, exactly what the
    8-instance-pool probe needed.

  * `--dump-addr=0x828F3D08,0x828F4070,0x828F3EC0,...` — at end of
    run (after the FOCUS report in `dump_thread_diagnostic`), each
    address gets a 128-byte hex + be32 + ASCII dump. Used to
    inspect the static dispatcher / job-queue struct layouts
    AUDIT-003 identified.
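
The back-chain capture in the first bullet can be sketched as below. This is a minimal illustration, not the code in `state.rs`: `GuestMem`, `read_be32` and `walk_back_chain` are hypothetical names, and the only convention assumed is the standard PowerPC one that the word at `[sp]` holds the caller's stack pointer. The saved-r31/r30 slots are left out because their offsets depend on the prologue in use.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for guest memory: big-endian u32 words keyed by address.
struct GuestMem {
    words: HashMap<u32, u32>,
}

impl GuestMem {
    /// Read-only access; unmapped addresses read as zero in this sketch.
    fn read_be32(&self, addr: u32) -> u32 {
        self.words.get(&addr).copied().unwrap_or(0)
    }
}

/// Follow the back-chain word at `[sp]` for at most `max_frames` frames.
/// Stops early on a null stack pointer or on a link that does not point
/// to a higher address (the stack grows down, so callers sit above).
fn walk_back_chain(mem: &GuestMem, sp: u32, max_frames: usize) -> Vec<u32> {
    let mut frames = Vec::with_capacity(max_frames);
    let mut cur = sp;
    for _ in 0..max_frames {
        if cur == 0 {
            break;
        }
        frames.push(cur);
        let next = mem.read_be32(cur);
        if next <= cur {
            break;
        }
        cur = next;
    }
    frames
}

fn main() {
    // Made-up stack addresses, three frames deep.
    let mut words = HashMap::new();
    words.insert(0x7000_0F00, 0x7000_0F60);
    words.insert(0x7000_0F60, 0x7000_0FE0);
    words.insert(0x7000_0FE0, 0); // outermost frame: null back-chain
    let mem = GuestMem { words };
    for (depth, frame_sp) in walk_back_chain(&mem, 0x7000_0F00, 8).iter().enumerate() {
        println!("frame {depth}: sp={frame_sp:#010x}");
    }
}
```

The walk only reads memory, which is the property the lockstep digest relies on.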

Both gated default-off; empty set is a single `is_empty()` test on
the hot path. No guest state is mutated, so the
`sylpheed_n*m.json` lockstep digest is preserved.
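
A minimal sketch of the default-off gating described above, assuming the probe set is a `HashSet<u32>`. `ProbeSet` and `fire_if_match` are hypothetical names used for illustration; the real helper is `fire_ctor_probe_if_match` on the kernel state and its body is not reproduced here.

```rust
use std::collections::HashSet;

/// Hypothetical probe state; in the commit the set lives on the kernel struct.
struct ProbeSet {
    pcs: HashSet<u32>,
    /// Host-side counter only; guest state is never touched.
    hits: u64,
}

impl ProbeSet {
    /// Returns true when `pc` is a configured probe address.
    /// The `is_empty()` test comes first so the default-off case
    /// never hashes `pc`.
    fn fire_if_match(&mut self, pc: u32) -> bool {
        if self.pcs.is_empty() {
            return false;
        }
        if !self.pcs.contains(&pc) {
            return false;
        }
        self.hits += 1;
        true
    }
}

fn main() {
    // Default-off: empty set, nothing fires.
    let mut off = ProbeSet { pcs: HashSet::new(), hits: 0 };
    assert!(!off.fire_if_match(0x8217_C850));

    // Armed with one PC: fires on that PC only.
    let mut on = ProbeSet { pcs: HashSet::from([0x8217_C850]), hits: 0 };
    assert!(on.fire_if_match(0x8217_C850));
    assert!(!on.fire_if_match(0x8217_C854));
    println!("hits = {}", on.hits);
}
```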

KRNBUG-AUDIT-004 findings (corrects KRNBUG-AUDIT-002/003):

1. **The "8-instance pool" hypothesis for handle 0x1004 is FALSE.**
   Probing the inner per-instance ctors `[0x821783D8, 0x82181750,
   0x821701C8]` at -n 50M shows each fires EXACTLY ONCE with
   r3 = `[0x828F3EC0, 0x828F3D08, 0x828F4070]` respectively. All
   three handles are Meyers-style singletons with one dispatcher
   each. The "called 8 times" claim came from miscounting raw
   entries to the OUTER getter sub_8217C850 — but that getter is
   itself a Meyers-singleton-getter; only the FIRST entry cascades
   through to bl 0x821783D8 (gated on `[0x828F48D8] bit 0`).

2. **The producer indirection layer is the singleton-getter
   itself.** Static byte-scan of .rdata / .data shows 0 hits for
   the dispatcher addresses — no static registry table holds them.
   But the xrefs table for the OUTER getters reveals 5–6 callers
   each, MOSTLY non-create-chain, sharing the canonical producer
   pattern: `bl outer_singleton_getter; lwz r3, OFFSET(r3); bl
   0x824AA1D8` (with OFFSET=80 for 0x100c, =36 for 0x15e0). So the
   AUDIT-003 xref audit was necessary but not sufficient — it
   correctly saw "no direct producer references" but missed the
   singleton-getter indirection layer.

3. **Dispatcher struct layouts** (128-byte dumps captured at -n
   50M --halt-on-deadlock):
     - 0x828F3D08 (handle 0x100c): event_handle at +0x4C (0x100c),
       thread_handle at +0x48 (0x1010), self-pointer at +0x74,
       capacity 7 at +0x28, queue empty (+0x00/+0x3C = -1).
     - 0x828F4070 (handle 0x15e0): event_handle at +0x20 (0x15e0),
       sibling-handle 0x15E4 at +0x1C, queue empty (+0x10 = -1).
     - 0x828F3EC0 (handle 0x1004): event_handle at +0x78 (0x1004),
       4 guest-heap sub-buffers at +0x20/+0x3C/+0x44/+0x50 in
       0x4xxxxxxx range — noticeably different layout from the
       other two pure POD job queues.
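
The first-entry gating in finding 1 can be modelled in a few lines. This is a toy sketch, not a decompilation: `SingletonModel` and its fields are hypothetical, `guard` stands in for the word at `0x828F48D8`, and returning the instance address from the getter is an assumption made for illustration.

```rust
/// Hypothetical model of a guard-bit (Meyers-style) singleton getter.
struct SingletonModel {
    /// Stands in for the guard word; bit 0 means "already constructed".
    guard: u32,
    /// How often the inner ctor ran.
    ctor_calls: u32,
    /// How often the outer getter was entered.
    getter_entries: u32,
}

impl SingletonModel {
    /// Illustrative instance address (assumed to be what the getter returns).
    const INSTANCE: u32 = 0x828F_3EC0;

    /// Every call counts as a getter entry, but only the first one
    /// cascades into the ctor, because the guard bit is set afterwards.
    fn getter(&mut self) -> u32 {
        self.getter_entries += 1;
        if (self.guard & 1) == 0 {
            self.guard |= 1;
            self.ctor_calls += 1;
        }
        Self::INSTANCE
    }
}

fn main() {
    let mut model = SingletonModel { guard: 0, ctor_calls: 0, getter_entries: 0 };
    for _ in 0..8 {
        model.getter();
    }
    // prints "entries=8 ctor_calls=1"
    println!("entries={} ctor_calls={}", model.getter_entries, model.ctor_calls);
}
```

Counting raw getter entries gives 8, while counting inner-ctor hits gives 1, which is the discrepancy the v1 and v2 probe runs separate.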

Files:
  crates/xenia-kernel/src/state.rs   ctor_probe_pcs / dump_addrs +
                                     fire_ctor_probe_if_match + 2 tests
  crates/xenia-app/src/main.rs       Exec --ctor-probe / --dump-addr
                                     CLI parsing, prologue hook,
                                     end-of-run struct dumper
  audit-findings.md                  KRNBUG-AUDIT-004 entry
  audit-runs/audit-004/              50M probe runs (v1 outer-getter
                                     hits, v2 inner-ctor hits proving
                                     the singleton hypothesis)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-04 17:09:47 +02:00
parent 48eed258f0
commit 7108d6d131
5 changed files with 2601 additions and 0 deletions

@@ -176,6 +176,26 @@ enum Commands {
/// `XENIA_XAUDIO_TICK=1`.
#[arg(long)]
xaudio_tick: bool,
/// Diagnostic. Comma-separated list of guest PCs at which the
/// worker prologue prints one `CTOR-PROBE` line per hit, capturing
/// live r3/lr/sp + an 8-frame back-chain. The probe fires on every
/// entry: for the outer getter `sub_8217C850`, entered 8× from the
/// static-init driver, it fires 8×. Use to recover `this`
/// addresses that MSVC ctors fail to preserve in r31. Read-only;
/// lockstep digest unaffected.
/// Examples: `--ctor-probe=0x8217C850`,
/// `--ctor-probe=0x82181750,0x821701C8`.
#[arg(long)]
ctor_probe: Option<String>,
/// Diagnostic. Comma-separated list of guest addresses to dump
/// (128 bytes each: hex, big-endian u32 lanes, ASCII) at
/// end-of-run, after the per-handle FOCUS report. Used to inspect
/// the static dispatcher / job-queue / pool struct layouts
/// identified by AUDIT-003 (e.g.
/// `--dump-addr=0x828F3D08,0x828F4070`). The dump is a snapshot
/// taken at the moment `dump_thread_diagnostic` runs, typically
/// when `--halt-on-deadlock` triggers.
#[arg(long)]
dump_addr: Option<String>,
},
/// Browse XISO disc image contents
Browse {
@@ -331,6 +351,8 @@ fn main() -> Result<()> {
reservations_table,
parallel,
xaudio_tick,
ctor_probe,
dump_addr,
} => cmd_exec(
&path,
max_instructions,
@@ -349,6 +371,8 @@ fn main() -> Result<()> {
reservations_table,
parallel,
xaudio_tick,
ctor_probe.as_deref(),
dump_addr.as_deref(),
),
Commands::Browse { path } => cmd_browse(&path),
Commands::Info { path } => cmd_info(&path),
@@ -550,6 +574,8 @@ fn cmd_exec(
reservations_table: bool,
parallel: bool,
xaudio_tick: bool,
ctor_probe: Option<&str>,
dump_addr: Option<&str>,
) -> Result<()> {
cmd_exec_inner(
path,
@@ -569,6 +595,8 @@ fn cmd_exec(
reservations_table,
parallel,
xaudio_tick,
ctor_probe,
dump_addr,
None,
None,
false,
@@ -607,6 +635,8 @@ fn cmd_check(
reservations_table,
parallel,
xaudio_tick,
None, // ctor_probe — diagnostic, never wanted on goldens
None, // dump_addr — same
out,
expect,
stable_digest,
@@ -631,6 +661,8 @@ fn cmd_exec_inner(
reservations_table: bool,
parallel: bool,
xaudio_tick: bool,
ctor_probe: Option<&str>,
dump_addr: Option<&str>,
digest_out: Option<&str>,
digest_expect: Option<&str>,
stable_digest: bool,
@@ -879,6 +911,89 @@ fn cmd_exec_inner(
}
}
// Diagnostic. Parse `--ctor-probe=0x8217C850,0x...` (or
// `XENIA_CTOR_PROBE=...`) into `kernel.ctor_probe_pcs`. The
// worker prologue checks this set on every step; on a hit it
// prints a single back-chain capture line. Empty set = no
// probes = no-op fast path.
let ctor_probe_combined: Option<String> = match (ctor_probe, std::env::var("XENIA_CTOR_PROBE").ok()) {
(Some(s), _) => Some(s.to_string()),
(None, Some(s)) if !s.is_empty() => Some(s),
_ => None,
};
if let Some(list) = ctor_probe_combined {
for token in list.split(',').map(str::trim).filter(|s| !s.is_empty()) {
let parsed = if let Some(hex) = token.strip_prefix("0x").or_else(|| token.strip_prefix("0X")) {
u32::from_str_radix(hex, 16)
} else {
token.parse::<u32>()
};
match parsed {
Ok(pc) => {
kernel.ctor_probe_pcs.insert(pc);
}
Err(_) => {
return Err(anyhow::anyhow!(
"invalid PC in --ctor-probe: {token:?}"
));
}
}
}
if !quiet && !kernel.ctor_probe_pcs.is_empty() {
let pcs: Vec<String> = kernel
.ctor_probe_pcs
.iter()
.map(|p| format!("{p:#010x}"))
.collect();
tracing::info!(
"ctor probes armed: {} ({})",
kernel.ctor_probe_pcs.len(),
pcs.join(", ")
);
}
}
// Diagnostic. Parse `--dump-addr=0x828F3D08,...` (or
// `XENIA_DUMP_ADDR=...`) into `kernel.dump_addrs`. The contents
// are dumped at end-of-run by `dump_thread_diagnostic`. Pure
// read; never mutates guest state.
let dump_addr_combined: Option<String> = match (dump_addr, std::env::var("XENIA_DUMP_ADDR").ok()) {
(Some(s), _) => Some(s.to_string()),
(None, Some(s)) if !s.is_empty() => Some(s),
_ => None,
};
if let Some(list) = dump_addr_combined {
for token in list.split(',').map(str::trim).filter(|s| !s.is_empty()) {
let parsed = if let Some(hex) = token.strip_prefix("0x").or_else(|| token.strip_prefix("0X")) {
u32::from_str_radix(hex, 16)
} else {
token.parse::<u32>()
};
match parsed {
Ok(addr) => {
kernel.dump_addrs.push(addr);
}
Err(_) => {
return Err(anyhow::anyhow!(
"invalid address in --dump-addr: {token:?}"
));
}
}
}
if !quiet && !kernel.dump_addrs.is_empty() {
let addrs: Vec<String> = kernel
.dump_addrs
.iter()
.map(|a| format!("{a:#010x}"))
.collect();
tracing::info!(
"dump addresses armed: {} ({})",
kernel.dump_addrs.len(),
addrs.join(", ")
);
}
}
// Install the GPU register aperture MMIO region on the guest memory so
// any `0x7FC8xxxx` access routes to our atomic mailbox. Matches canary's
// `graphics_system.cc:141-144`. The callbacks capture Arc clones of the
@@ -1812,6 +1927,13 @@ fn worker_prologue(
let pc = kernel.scheduler.ctx(hw_id).pc;
// 0) Diagnostic ctor-probe: if `pc` is in
// `kernel.ctor_probe_pcs`, capture live r3/lr/sp + back-chain
// and print one record. Read-only; lockstep digest unaffected.
// Empty set is the common case: a single `is_empty()` test inside
// the helper, so the cost on the hot path is negligible.
kernel.fire_ctor_probe_if_match(hw_id, mem);
// 1) Halt-sentinel check (per HW thread).
if pc == LR_HALT {
let injected_here = kernel.interrupts.saved.is_some()
@@ -3405,6 +3527,56 @@ fn dump_thread_diagnostic(
}
}
}
// Diagnostic. `--dump-addr` content dump at end-of-run. Each
// address gets a 128-byte dump in three forms (raw bytes, u32
// big-endian lanes, ASCII guess). The lanes form is the most
// useful for pool-element / dispatcher struct reverse-engineering:
// `[base+0] = -1` is the "empty queue" sentinel from AUDIT-003,
// while an image-range pointer `0x82xxxxxx` at a fixed offset
// hints at an embedded silph::Event* / vtable / next-link.
if !kernel.dump_addrs.is_empty() {
println!("\n=== Dump-addr ===");
const DUMP_BYTES: usize = 128;
for &addr in &kernel.dump_addrs {
println!(" addr={:#010x}", addr);
let mut bytes = [0u8; DUMP_BYTES];
for (i, byte) in bytes.iter_mut().enumerate() {
*byte = mem.read_u8(addr.wrapping_add(i as u32));
}
for row in 0..(DUMP_BYTES / 16) {
let off = row * 16;
let hex: String = (0..16)
.map(|i| format!("{:02x}", bytes[off + i]))
.collect::<Vec<_>>()
.join(" ");
let words: String = (0..4)
.map(|i| {
let b = &bytes[off + i * 4..off + i * 4 + 4];
format!(
"{:08x}",
u32::from_be_bytes([b[0], b[1], b[2], b[3]])
)
})
.collect::<Vec<_>>()
.join(" ");
let ascii: String = (0..16)
.map(|i| {
let b = bytes[off + i];
if (0x20..=0x7E).contains(&b) {
b as char
} else {
'.'
}
})
.collect();
println!(
" +{:#04x}: {} | be32={} | {}",
off, hex, words, ascii,
);
}
}
}
}
#[allow(clippy::too_many_arguments)]