feat(kernel): KRNBUG-AUDIT-005 — --pc-probe extension + canary diff identifies XexCheckExecutablePrivilege stub cascade

Extends `--ctor-probe` machinery into `--pc-probe` (clap alias) with
the optional `PC@DISPATCHER:OFFSET` token form: on a hit, the helper
additionally logs `[disp+off]` — what the producer's
`lwz r3, OFFSET(r3)` is about to read. Reuses `parse_hex_u32`; both
flags share parser + storage.

Read-only diagnostic. Lockstep digest preserved (`run digest matches
golden` at -n 50M `--stable-digest`). 588 tests green.

Decisive findings (full deliverable in `audit-findings.md` /
`audit-runs/audit-005/`):

- Failure mode α confirmed for KRNBUG-AUDIT-004: all 9 producer call
  sites for handles 0x100c (5 sites) and 0x15e0 (4 sites) fire 0x at
  -n 500M. The producer code path is not reached.

- Set-diff of kernel-call sequences (canary.log oracle vs ours.log
  at -n 500M) identifies 11 exports canary calls and we don't:
  XGetAVPack, XeCryptSha, XeKeysConsolePrivateKeySign,
  ObCreateSymbolicLink, NtDeviceIoControlFile (×2),
  XamUserReadProfileSettings (×2), XamTaskSchedule, XamTaskCloseHandle,
  KeReleaseSemaphore (×268), KeResetEvent, ExTerminateThread (×2).

- XGetAVPack has exactly one caller (sub_824AB578 at 0x824AB5A0).
  The 4 instructions immediately preceding it are:
      addi r3, r0, 10            ; privilege bit 10
      bl   XexCheckExecutablePrivilege
      cmpli 0, r3, 0
      bc 12, eq, 0x824AB724      ; if r3==0, skip whole block

- exports.rs:193 registers XexCheckExecutablePrivilege as
  stub_return_zero. Always returning 0 -> guest takes the branch
  and skips the entire AV/crypto/save-data init block.

- The other call site (sub_824A9710 at 0x824A99A0) queries privilege
  11 with opposite polarity (bne) -> gates XamTaskSchedule on the
  privilege-NOT-set arm. With both stubs returning 0, the guest
  walks the wrong arm of every privilege-gated branch.

- This explains why the dispatcher fields read zero
  ([0x828F3D08+0x50]=0, [0x828F4070+0x24]=0 from AUDIT-004 dumps):
  the ctors run, but the producers that would populate those fields
  with a non-zero handle never execute.

Next session: replace XexCheckExecutablePrivilege stub with real
priv-bit lookup from XEX header. See audit-findings.md
KRNBUG-AUDIT-005 for the validation matrix.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
MechaCat02
2026-05-04 18:06:22 +02:00
parent 6a070bedc6
commit 3e2fc1ec88
3 changed files with 245 additions and 18 deletions

View File

@@ -4622,3 +4622,194 @@ function 0x824AA1D8 is then called with handle=0). If none fire,
the producer chain is gated upstream (likely a feature flag, init the producer chain is gated upstream (likely a feature flag, init
phase, or RPC handler that never fires). Either way, the next phase, or RPC handler that never fires). Either way, the next
diagnostic narrows the bug surface dramatically. diagnostic narrows the bug surface dramatically.
---
### KRNBUG-AUDIT-005 — `--pc-probe` extended syntax + canary kernel-call diff; `XexCheckExecutablePrivilege` stub gates init flow
**Status**: landed on master (no-ff merge of feature branch
`canary-diff-and-pc-consumer-probe/p0-priv-stub-cascade`). Diagnostic-
only, read-only, lockstep-preserved (`run digest matches golden` at
`-n 50M --stable-digest`).
**Tests**: 588 → **588** (unchanged; existing ctor-probe tests cover the
shared infrastructure).
**What landed (`crates/xenia-kernel/src/state.rs`):**
- `pub pc_probe_consumers: HashMap<u32, (u32, u32)>` field on
`KernelState` (default empty). Maps a probe PC to a
`(dispatcher_addr, offset)` pair; on hit the helper additionally
logs `[disp+off]` — what the producer's `lwz r3, OFFSET(r3)` is
about to read after `bl outer_getter` returns the dispatcher in r3.
- `fire_ctor_probe_if_match` extended to read+print the consumer
field when present. Pure load — does not mutate guest state.
**What landed (`crates/xenia-app/src/main.rs`):**
- `--pc-probe` clap alias on `--ctor-probe` (semantically clearer
name; both share parser/storage).
- Extended token syntax `PC@DISPATCHER:OFFSET` parsed via existing
`parse_hex_u32`. Plain `PC` form still works (backward compatible).
- `XENIA_PC_PROBE` env var as alias for `XENIA_CTOR_PROBE`.
**What landed (`audit-runs/audit-005/`):** one-shot diagnostic
artifacts — not part of the repo build:
- `canary.log` — copy of `/home/fabi/xenia_canary_windows/xenia.log` from a Lutris launch of Sylpheed; oracle for what should happen
- `ours.log` — our trace at `-n 500M` with the 9-PC probe + `probe_calls=trace` filter (838 MB, 5.6 M lines)
- `diff.py` — kernel-call sequence diff (set-diff + first-divergence window); deletable after the audit
- `probe-test-10m.log` — initial smoke test confirming probe wiring
**Reproduce:**
```bash
cargo run --release -p xenia-app -- \
--log-filter='probe_calls=trace,xenia=warn' \
exec sylpheed.iso \
--halt-on-deadlock \
--trace-handles-focus=0x1004,0x100c,0x15e0 \
--pc-probe=0x821802D8@0x828F3D08:80,0x821806E0@0x828F3D08:80,\
0x82180B28@0x828F3D08:80,0x82180EA0@0x828F3D08:80,\
0x82181254@0x828F3D08:80,0x8216F9D4@0x828F4070:36,\
0x8216FC08@0x828F4070:36,0x821700B8@0x828F4070:36,\
0x821700F4@0x828F4070:36 \
-n 500_000_000 \
2> audit-runs/audit-005/ours.log
python3 audit-runs/audit-005/diff.py --max 100 --window 30
```
**Decisive findings:**
1. **Failure mode (α) for KRNBUG-AUDIT-004 confirmed.** All 9
non-create-chain producer call sites for handles 0x100c
(5 sites at `0x821802D8 / 0x821806E0 / 0x82180B28 / 0x82180EA0 /
0x82181254`) and 0x15e0 (4 sites at `0x8216F9D4 / 0x8216FC08 /
0x821700B8 / 0x821700F4`) **fire 0×** at -n 500M
(`grep -c CTOR-PROBE ours.log == 0`). The producer code path is
not reached. Rules out failure mode (B: `lwz` reads zero) and (3:
wake function called with stale handle). The bug is upstream,
in the control-flow that should lead the guest to those producer
functions.
2. **Upstream control-flow divergence located: `XexCheckExecutablePrivilege`
stub returning 0.** Set-diff of kernel-call sequences across our
500M-instruction run vs canary's full Sylpheed boot
(`canary.log`, ~5.3K lines, post-`swaps=2` boot loop reached)
identifies **11 exports that canary calls and we don't**:
```
ExTerminateThread (×2)
KeReleaseSemaphore (×268) ← we use Nt* equivalents
KeResetEvent (×1)
NtDeviceIoControlFile (×2)
ObCreateSymbolicLink (×1)
XGetAVPack (×1) ← gated by priv-10 check
XamTaskCloseHandle (×1)
XamTaskSchedule (×1) ← AUDIT-002 producer candidate
XamUserReadProfileSettings (×2)
XeCryptSha (×1)
XeKeysConsolePrivateKeySign (×1)
```
`XGetAVPack` has exactly one caller (`xrefs` table): site
`0x824AB5A0` inside `sub_824AB578`. The 4 instructions immediately
preceding it are:
```
824ab58c addi r3, r0, 10 ; privilege bit 10
824ab590 addi r31, r0, 0
824ab594 bl 0x8284DEFC ; XexCheckExecutablePrivilege
824ab598 cmpli 0, r3, 0x0
824ab59c bc 12, eq, 0x824AB724 ; if r3==0, skip whole block
; (XGetAVPack + crypto + Nt writes)
```
Our impl `crates/xenia-kernel/src/exports.rs:193`:
```rust
state.register_export(Xboxkrnl, 0x0194, "XexCheckExecutablePrivilege",
stub_return_zero);
```
`stub_return_zero` returns r3=0 unconditionally → guest takes
the `bc 12, eq, 0x824AB724` branch and skips the entire
AV/crypto/save-data init block.
The OTHER call site (`sub_824A9710`, `0x824A99A0`) queries
privilege bit **11**:
```
824a999c addi r3, r0, 11
824a99a0 bl 0x8284DEFC ; XexCheckExecutablePrivilege(11)
824a99a4 cmpli 0, r3, 0
824a99a8 bc 4, eq, 0x824A9A60 ; bne — skip block if priv set
```
Different polarity (this one gates `XamTaskSchedule` etc. on
the **privilege-NOT-set** path). With both stubs returning 0,
the guest walks the wrong arm of *every* privilege-gated branch.
3. **Cascade reaches the parked-waiter handles.** Trace evidence:
our `probe_calls` log shows `lr=0x824A97E4` (a hit from the
error path inside `sub_824A9710` *after* `sub_824ABA98` returned
negative NTSTATUS). The canary log shows all 11 missing exports
firing in a single contiguous boot phase between `XexCheckExecutablePrivilege`
and the worker-thread spawn — i.e. the init phase that sets up
the dispatcher data structures is exactly the phase we skip.
This explains **why the dispatcher fields read zero** (AUDIT-004
dump: `[0x828F3D08+0x50] = 0`, `[0x828F4070+0x24] = 0`): the
ctors run (we counted those), but the *producers* that would
populate those fields with a non-zero handle never execute,
because the upstream init flow that registers them is gated
by the privilege checks.
4. **Note on the diff: canary's log is filtered.** Canary's config
has `log_high_frequency_kernel_calls = false`, which suppresses
most `Rtl*`, `Mm*`, `Ke*`-internal calls from the log. The
"called in OURS but not canary" set (23 entries, headed by
`NtWaitForSingleObjectEx ×1.5M`) is dominated by this filter
difference — it is **not** a bug surface. The directionally
meaningful side of the diff is "called in CANARY but not OURS"
(above): canary's log includes every low-frequency call, so any
absence on our side is a real divergence.
**Stop conditions check:**
- Canary itself does NOT stall at swaps=2 — it reaches a steady
frame loop with `XamInputGetCapabilities` polling, texture loads,
`KeReleaseSemaphore` ticks. The diff was informative.
- First divergence is dense early-CRT noise (~3 entries in), but
the meaningful divergence anchored to a concrete export
(`XGetAVPack`, deterministically gated by a one-line stub) was
recoverable via set-diff. Did not need to narrow scope further.
**Recommendation for next session (do not implement a fix here — this
is the read-only audit deliverable):**
Replace `stub_return_zero` for `XexCheckExecutablePrivilege` with a
real implementation. The XEX header's privilege bitmask is parsed
during XEX load (see `crates/xenia-xex/`); `KernelState` already
holds the loaded `image_base`. Implementation outline:
- Parse `XEX_HEADER_EXECUTION_INFO` / privilege bits at load time
into `KernelState` (or surface via `Vfs` already-loaded XEX
metadata).
- `xex_check_executable_privilege(priv_id) -> u32`:
return 1 if bit `priv_id` is set in the title's privilege bitmask,
else 0. Match canary's encoding (privilege IDs are 0..7F; canary
reads `PrivilegeFlags[i/8] & (1 << (i%8))` from the XEX execution
info).
Validation after the fix:
1. Re-run `audit-runs/audit-005/diff.py` — `XGetAVPack`,
`XamTaskSchedule`, `XeCryptSha`, etc. should appear in our
sequence and the divergence should advance several hundred
calls past the priv-check.
2. Re-run with the 9-PC probe armed at -n 500M — at minimum, the
ctor-probe firings change, and ideally one or more of the 9
producer sites starts firing.
3. If producer sites fire, dispatcher fields `[0x828F3D08+0x50]` /
`[0x828F4070+0x24]` become non-zero (use `--dump-addr`).
4. Lockstep golden `crates/xenia-app/tests/golden/sylpheed_n50m.json`
will likely change (`imports` count goes up, `swaps` may advance);
regenerate the golden under `--stable-digest` and treat that as
the new lockstep anchor.
If after the fix the producer is reached and dispatcher fields
populate, the parked-waiter deadlock should resolve — or surface
the next layer of bugs (e.g. signaling code reads non-zero handle
but `wake_eligible_waiters` fails).

View File

@@ -184,8 +184,13 @@ enum Commands {
/// pool-element `this` addresses that MSVC ctors fail to /// pool-element `this` addresses that MSVC ctors fail to
/// preserve in r31. Read-only; lockstep digest unaffected. /// preserve in r31. Read-only; lockstep digest unaffected.
/// Examples: `--ctor-probe=0x8217C850`, /// Examples: `--ctor-probe=0x8217C850`,
/// `--ctor-probe=0x82181750,0x821701C8`. /// `--ctor-probe=0x82181750,0x821701C8`. KRNBUG-AUDIT-005:
#[arg(long)] /// each token may be extended to `PC@DISPATCHER:OFFSET` (e.g.
/// `0x821802D8@0x828F3D08:80`); on a hit the helper additionally
/// logs the value of `[DISPATCHER+OFFSET]` — what the producer's
/// `lwz r3, OFFSET(r3)` is about to read. The flag is also
/// reachable as `--pc-probe` (semantically clearer name).
#[arg(long, alias = "pc-probe")]
ctor_probe: Option<String>, ctor_probe: Option<String>,
/// Diagnostic. Comma-separated list of guest addresses to dump /// Diagnostic. Comma-separated list of guest addresses to dump
/// (64 bytes each, hex + u32 lanes) at end-of-run, after the /// (64 bytes each, hex + u32 lanes) at end-of-run, after the
@@ -916,27 +921,39 @@ fn cmd_exec_inner(
// worker prologue checks this set on every step; on a hit it // worker prologue checks this set on every step; on a hit it
// prints a single back-chain capture line. Empty set = no // prints a single back-chain capture line. Empty set = no
// probes = no-op fast path. // probes = no-op fast path.
let ctor_probe_combined: Option<String> = match (ctor_probe, std::env::var("XENIA_CTOR_PROBE").ok()) { let ctor_probe_combined: Option<String> = match (
ctor_probe,
std::env::var("XENIA_CTOR_PROBE").ok().or_else(|| std::env::var("XENIA_PC_PROBE").ok()),
) {
(Some(s), _) => Some(s.to_string()), (Some(s), _) => Some(s.to_string()),
(None, Some(s)) if !s.is_empty() => Some(s), (None, Some(s)) if !s.is_empty() => Some(s),
_ => None, _ => None,
}; };
if let Some(list) = ctor_probe_combined { if let Some(list) = ctor_probe_combined {
for token in list.split(',').map(str::trim).filter(|s| !s.is_empty()) { for token in list.split(',').map(str::trim).filter(|s| !s.is_empty()) {
let parsed = if let Some(hex) = token.strip_prefix("0x").or_else(|| token.strip_prefix("0X")) { let (pc_str, consumer): (&str, Option<(u32, u32)>) = match token.split_once('@') {
u32::from_str_radix(hex, 16) None => (token, None),
} else { Some((pc_part, rest)) => {
token.parse::<u32>() let (disp_str, off_str) = rest.split_once(':').ok_or_else(|| {
anyhow::anyhow!(
"invalid --pc-probe token {token:?}: expected PC@DISPATCHER:OFFSET"
)
})?;
let disp = parse_hex_u32(disp_str.trim()).map_err(|e| {
anyhow::anyhow!("invalid dispatcher in {token:?}: {e}")
})?;
let off = parse_hex_u32(off_str.trim()).map_err(|e| {
anyhow::anyhow!("invalid offset in {token:?}: {e}")
})?;
(pc_part.trim(), Some((disp, off)))
}
}; };
match parsed { let pc = parse_hex_u32(pc_str).map_err(|e| {
Ok(pc) => { anyhow::anyhow!("invalid PC in --pc-probe: {token:?}: {e}")
kernel.ctor_probe_pcs.insert(pc); })?;
} kernel.ctor_probe_pcs.insert(pc);
Err(_) => { if let Some(c) = consumer {
return Err(anyhow::anyhow!( kernel.pc_probe_consumers.insert(pc, c);
"invalid PC in --ctor-probe: {token:?}"
));
}
} }
} }
if !quiet && !kernel.ctor_probe_pcs.is_empty() { if !quiet && !kernel.ctor_probe_pcs.is_empty() {
@@ -946,9 +963,10 @@ fn cmd_exec_inner(
.map(|p| format!("{p:#010x}")) .map(|p| format!("{p:#010x}"))
.collect(); .collect();
tracing::info!( tracing::info!(
"ctor probes armed: {} ({})", "pc probes armed: {} ({}); consumer-snapshots: {}",
kernel.ctor_probe_pcs.len(), kernel.ctor_probe_pcs.len(),
pcs.join(", ") pcs.join(", "),
kernel.pc_probe_consumers.len(),
); );
} }
} }

View File

@@ -196,6 +196,15 @@ pub struct KernelState {
/// `this`. Hooking the ctor's PRE-prologue PC captures r3 = this /// `this`. Hooking the ctor's PRE-prologue PC captures r3 = this
/// before any save/restore can clobber it. /// before any save/restore can clobber it.
pub ctor_probe_pcs: std::collections::HashSet<u32>, pub ctor_probe_pcs: std::collections::HashSet<u32>,
/// Diagnostic. Optional per-PC dispatcher snapshot. Maps a probe PC
/// to a `(dispatcher_addr, offset)` pair; when the PC fires, the
/// helper additionally logs the value of `[dispatcher_addr +
/// offset]` — i.e. exactly what the producer's `lwz r3, OFF(r3)`
/// is about to read after the `bl outer_getter` returns the
/// dispatcher pointer in r3. Populated from the `PC@DISP:OFF`
/// extended syntax of `--pc-probe` / `--ctor-probe`. Read-only
/// load — does not mutate guest state.
pub pc_probe_consumers: HashMap<u32, (u32, u32)>,
/// Diagnostic. Guest addresses to dump (64 bytes each, hex + u32 /// Diagnostic. Guest addresses to dump (64 bytes each, hex + u32
/// lanes) at end-of-run. Populated from `--dump-addr=0x828F3D08, /// lanes) at end-of-run. Populated from `--dump-addr=0x828F3D08,
/// 0x828F4070`. Used to inspect static dispatcher / job-queue / /// 0x828F4070`. Used to inspect static dispatcher / job-queue /
@@ -255,6 +264,7 @@ impl KernelState {
ring_size_dwords: 0, ring_size_dwords: 0,
parallel_active: false, parallel_active: false,
ctor_probe_pcs: std::collections::HashSet::new(), ctor_probe_pcs: std::collections::HashSet::new(),
pc_probe_consumers: HashMap::new(),
dump_addrs: Vec::new(), dump_addrs: Vec::new(),
}; };
crate::exports::register_exports(&mut state); crate::exports::register_exports(&mut state);
@@ -580,6 +590,14 @@ impl KernelState {
"CTOR-PROBE pc={:#010x} tid={} hw={} cycle={} sp={:#010x} r3={:#010x} lr={:#010x}", "CTOR-PROBE pc={:#010x} tid={} hw={} cycle={} sp={:#010x} r3={:#010x} lr={:#010x}",
pc, tid, hw_id, cycle, sp, r3, lr, pc, tid, hw_id, cycle, sp, r3, lr,
); );
if let Some(&(disp, off)) = self.pc_probe_consumers.get(&pc) {
let field_addr = disp.wrapping_add(off);
let field_val = mem.read_u32(field_addr);
println!(
" CTOR-PROBE consumer disp={:#010x} off={} field={:#010x} (= [disp+off])",
disp, off, field_val,
);
}
for (i, (fp, frame_lr)) in frames.iter().enumerate() { for (i, (fp, frame_lr)) in frames.iter().enumerate() {
let saved_r31 = mem.read_u32(fp.wrapping_sub(12)); let saved_r31 = mem.read_u32(fp.wrapping_sub(12));
let saved_r30 = mem.read_u32(fp.wrapping_sub(16)); let saved_r30 = mem.read_u32(fp.wrapping_sub(16));