feat(kernel): KRNBUG-AUDIT-005 — --pc-probe extension + canary diff identifies XexCheckExecutablePrivilege stub cascade
Extends `--ctor-probe` machinery into `--pc-probe` (clap alias) with
the optional `PC@DISPATCHER:OFFSET` token form: on a hit, the helper
additionally logs `[disp+off]` — what the producer's
`lwz r3, OFFSET(r3)` is about to read. Reuses `parse_hex_u32`; both
flags share parser + storage.
Read-only diagnostic. Lockstep digest preserved (`run digest matches
golden` at -n 50M `--stable-digest`). 588 tests green.
Decisive findings (full deliverable in `audit-findings.md` /
`audit-runs/audit-005/`):
- Failure mode α confirmed for KRNBUG-AUDIT-004: all 9 producer call
sites for handles 0x100c (5 sites) and 0x15e0 (4 sites) fire 0x at
-n 500M. The producer code path is not reached.
- Set-diff of kernel-call sequences (canary.log oracle vs ours.log
at -n 500M) identifies 11 exports canary calls and we don't:
XGetAVPack, XeCryptSha, XeKeysConsolePrivateKeySign,
ObCreateSymbolicLink, NtDeviceIoControlFile (×2),
XamUserReadProfileSettings (×2), XamTaskSchedule, XamTaskCloseHandle,
KeReleaseSemaphore (×268), KeResetEvent, ExTerminateThread (×2).
- XGetAVPack has exactly one caller (sub_824AB578 at 0x824AB5A0).
The 4 instructions immediately preceding it are:
addi r3, r0, 10 ; privilege bit 10
bl XexCheckExecutablePrivilege
cmpli 0, r3, 0
bc 12, eq, 0x824AB724 ; if r3==0, skip whole block
- exports.rs:193 registers XexCheckExecutablePrivilege as
stub_return_zero. Always returning 0 -> guest takes the branch
and skips the entire AV/crypto/save-data init block.
- The other call site (sub_824A9710 at 0x824A99A0) queries privilege
11 with opposite polarity (bne) -> gates XamTaskSchedule on the
privilege-NOT-set arm. With both stubs returning 0, the guest
walks the wrong arm of every privilege-gated branch.
- This explains why the dispatcher fields read zero
([0x828F3D08+0x50]=0, [0x828F4070+0x24]=0 from AUDIT-004 dumps):
the ctors run, but the producers that would populate those fields
with a non-zero handle never execute.
Next session: replace XexCheckExecutablePrivilege stub with real
priv-bit lookup from XEX header. See audit-findings.md
KRNBUG-AUDIT-005 for the validation matrix.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -4622,3 +4622,194 @@ function 0x824AA1D8 is then called with handle=0). If none fire,
|
||||
the producer chain is gated upstream (likely a feature flag, init
|
||||
phase, or RPC handler that never fires). Either way, the next
|
||||
diagnostic narrows the bug surface dramatically.
|
||||
|
||||
---
|
||||
|
||||
### KRNBUG-AUDIT-005 — `--pc-probe` extended syntax + canary kernel-call diff; `XexCheckExecutablePrivilege` stub gates init flow
|
||||
|
||||
**Status**: landed on master (no-ff merge of feature branch
|
||||
`canary-diff-and-pc-consumer-probe/p0-priv-stub-cascade`). Diagnostic-
|
||||
only, read-only, lockstep-preserved (`run digest matches golden` at
|
||||
`-n 50M --stable-digest`).
|
||||
|
||||
**Tests**: 588 → **588** (unchanged; existing ctor-probe tests cover the
|
||||
shared infrastructure).
|
||||
|
||||
**What landed (`crates/xenia-kernel/src/state.rs`):**
|
||||
- `pub pc_probe_consumers: HashMap<u32, (u32, u32)>` field on
|
||||
`KernelState` (default empty). Maps a probe PC to a
|
||||
`(dispatcher_addr, offset)` pair; on hit the helper additionally
|
||||
logs `[disp+off]` — what the producer's `lwz r3, OFFSET(r3)` is
|
||||
about to read after `bl outer_getter` returns the dispatcher in r3.
|
||||
- `fire_ctor_probe_if_match` extended to read+print the consumer
|
||||
field when present. Pure load — does not mutate guest state.
|
||||
|
||||
**What landed (`crates/xenia-app/src/main.rs`):**
|
||||
- `--pc-probe` clap alias on `--ctor-probe` (semantically clearer
|
||||
name; both share parser/storage).
|
||||
- Extended token syntax `PC@DISPATCHER:OFFSET` parsed via existing
|
||||
`parse_hex_u32`. Plain `PC` form still works (backward compatible).
|
||||
- `XENIA_PC_PROBE` env var as alias for `XENIA_CTOR_PROBE`.
|
||||
|
||||
**What landed (`audit-runs/audit-005/`):** one-shot diagnostic
|
||||
artifacts — not part of the repo build:
|
||||
- `canary.log` — copy of `/home/fabi/xenia_canary_windows/xenia.log` from a Lutris launch of Sylpheed; oracle for what should happen
|
||||
- `ours.log` — our trace at `-n 500M` with the 9-PC probe + `probe_calls=trace` filter (838 MB, 5.6 M lines)
|
||||
- `diff.py` — kernel-call sequence diff (set-diff + first-divergence window); deletable after the audit
|
||||
- `probe-test-10m.log` — initial smoke test confirming probe wiring
|
||||
|
||||
**Reproduce:**
|
||||
|
||||
```bash
|
||||
cargo run --release -p xenia-app -- \
|
||||
--log-filter='probe_calls=trace,xenia=warn' \
|
||||
exec sylpheed.iso \
|
||||
--halt-on-deadlock \
|
||||
--trace-handles-focus=0x1004,0x100c,0x15e0 \
|
||||
--pc-probe=0x821802D8@0x828F3D08:80,0x821806E0@0x828F3D08:80,\
|
||||
0x82180B28@0x828F3D08:80,0x82180EA0@0x828F3D08:80,\
|
||||
0x82181254@0x828F3D08:80,0x8216F9D4@0x828F4070:36,\
|
||||
0x8216FC08@0x828F4070:36,0x821700B8@0x828F4070:36,\
|
||||
0x821700F4@0x828F4070:36 \
|
||||
-n 500_000_000 \
|
||||
2> audit-runs/audit-005/ours.log
|
||||
|
||||
python3 audit-runs/audit-005/diff.py --max 100 --window 30
|
||||
```
|
||||
|
||||
**Decisive findings:**
|
||||
|
||||
1. **Failure mode (α) for KRNBUG-AUDIT-004 confirmed.** All 9
|
||||
non-create-chain producer call sites for handles 0x100c
|
||||
(5 sites at `0x821802D8 / 0x821806E0 / 0x82180B28 / 0x82180EA0 /
|
||||
0x82181254`) and 0x15e0 (4 sites at `0x8216F9D4 / 0x8216FC08 /
|
||||
0x821700B8 / 0x821700F4`) **fire 0×** at -n 500M
|
||||
(`grep -c CTOR-PROBE ours.log == 0`). The producer code path is
|
||||
not reached. Rules out failure mode (B: `lwz` reads zero) and (3:
|
||||
wake function called with stale handle). The bug is upstream,
|
||||
in the control-flow that should lead the guest to those producer
|
||||
functions.
|
||||
|
||||
2. **Upstream control-flow divergence located: `XexCheckExecutablePrivilege`
|
||||
stub returning 0.** Set-diff of kernel-call sequences across our
|
||||
500M-instruction run vs canary's full Sylpheed boot
|
||||
(`canary.log`, ~5.3K lines, post-`swaps=2` boot loop reached)
|
||||
identifies **11 exports that canary calls and we don't**:
|
||||
|
||||
```
|
||||
ExTerminateThread (×2)
|
||||
KeReleaseSemaphore (×268) ← we use Nt* equivalents
|
||||
KeResetEvent (×1)
|
||||
NtDeviceIoControlFile (×2)
|
||||
ObCreateSymbolicLink (×1)
|
||||
XGetAVPack (×1) ← gated by priv-10 check
|
||||
XamTaskCloseHandle (×1)
|
||||
XamTaskSchedule (×1) ← AUDIT-002 producer candidate
|
||||
XamUserReadProfileSettings (×2)
|
||||
XeCryptSha (×1)
|
||||
XeKeysConsolePrivateKeySign (×1)
|
||||
```
|
||||
|
||||
`XGetAVPack` has exactly one caller (`xrefs` table): site
|
||||
`0x824AB5A0` inside `sub_824AB578`. The 4 instructions immediately
|
||||
preceding it are:
|
||||
|
||||
```
|
||||
824ab58c addi r3, r0, 10 ; privilege bit 10
|
||||
824ab590 addi r31, r0, 0
|
||||
824ab594 bl 0x8284DEFC ; XexCheckExecutablePrivilege
|
||||
824ab598 cmpli 0, r3, 0x0
|
||||
824ab59c bc 12, eq, 0x824AB724 ; if r3==0, skip whole block
|
||||
; (XGetAVPack + crypto + Nt writes)
|
||||
```
|
||||
|
||||
Our impl `crates/xenia-kernel/src/exports.rs:193`:
|
||||
```rust
|
||||
state.register_export(Xboxkrnl, 0x0194, "XexCheckExecutablePrivilege",
|
||||
stub_return_zero);
|
||||
```
|
||||
`stub_return_zero` returns r3=0 unconditionally → guest takes
|
||||
the `bc 12, eq, 0x824AB724` branch and skips the entire
|
||||
AV/crypto/save-data init block.
|
||||
|
||||
The OTHER call site (`sub_824A9710`, `0x824A99A0`) queries
|
||||
privilege bit **11**:
|
||||
```
|
||||
824a999c addi r3, r0, 11
|
||||
824a99a0 bl 0x8284DEFC ; XexCheckExecutablePrivilege(11)
|
||||
824a99a4 cmpli 0, r3, 0
|
||||
824a99a8 bc 4, eq, 0x824A9A60 ; bne — skip block if priv set
|
||||
```
|
||||
Different polarity (this one gates `XamTaskSchedule` etc. on
|
||||
the **privilege-NOT-set** path). With both stubs returning 0,
|
||||
the guest walks the wrong arm of *every* privilege-gated branch.
|
||||
|
||||
3. **Cascade reaches the parked-waiter handles.** Trace evidence:
|
||||
our `probe_calls` log shows `lr=0x824A97E4` (a hit from the
|
||||
error path inside `sub_824A9710` *after* `sub_824ABA98` returned
|
||||
negative NTSTATUS). The canary log shows all 11 missing exports
|
||||
firing in a single contiguous boot phase between `XexCheckExecutablePrivilege`
|
||||
and the worker-thread spawn — i.e. the init phase that sets up
|
||||
the dispatcher data structures is exactly the phase we skip.
|
||||
This explains **why the dispatcher fields read zero** (AUDIT-004
|
||||
dump: `[0x828F3D08+0x50] = 0`, `[0x828F4070+0x24] = 0`): the
|
||||
ctors run (we counted those), but the *producers* that would
|
||||
populate those fields with a non-zero handle never execute,
|
||||
because the upstream init flow that registers them is gated
|
||||
by the privilege checks.
|
||||
|
||||
4. **Note on the diff: canary's log is filtered.** Canary's config
|
||||
has `log_high_frequency_kernel_calls = false`, which suppresses
|
||||
most `Rtl*`, `Mm*`, `Ke*`-internal calls from the log. The
|
||||
"called in OURS but not canary" set (23 entries, headed by
|
||||
`NtWaitForSingleObjectEx ×1.5M`) is dominated by this filter
|
||||
difference — it is **not** a bug surface. The directionally
|
||||
meaningful side of the diff is "called in CANARY but not OURS"
|
||||
(above): canary's log includes every low-frequency call, so any
|
||||
absence on our side is a real divergence.
|
||||
|
||||
**Stop conditions check:**
|
||||
|
||||
- Canary itself does NOT stall at swaps=2 — it reaches a steady
|
||||
frame loop with `XamInputGetCapabilities` polling, texture loads,
|
||||
`KeReleaseSemaphore` ticks. The diff was informative.
|
||||
- First divergence is dense early-CRT noise (~3 entries in), but
|
||||
the meaningful divergence anchored to a concrete export
|
||||
(`XGetAVPack`, deterministically gated by a one-line stub) was
|
||||
recoverable via set-diff. Did not need to narrow scope further.
|
||||
|
||||
**Recommendation for next session (do not implement a fix here — this
|
||||
is the read-only audit deliverable):**
|
||||
|
||||
Replace `stub_return_zero` for `XexCheckExecutablePrivilege` with a
|
||||
real implementation. The XEX header's privilege bitmask is parsed
|
||||
during XEX load (see `crates/xenia-xex/`); `KernelState` already
|
||||
holds the loaded `image_base`. Implementation outline:
|
||||
- Parse `XEX_HEADER_EXECUTION_INFO` / privilege bits at load time
|
||||
into `KernelState` (or surface via `Vfs` already-loaded XEX
|
||||
metadata).
|
||||
- `xex_check_executable_privilege(priv_id) -> u32`:
|
||||
return 1 if bit `priv_id` is set in the title's privilege bitmask,
|
||||
else 0. Match canary's encoding (privilege IDs are 0..7F; canary
|
||||
reads `PrivilegeFlags[i/8] & (1 << (i%8))` from the XEX execution
|
||||
info).
|
||||
|
||||
Validation after the fix:
|
||||
1. Re-run `audit-runs/audit-005/diff.py` — `XGetAVPack`,
|
||||
`XamTaskSchedule`, `XeCryptSha`, etc. should appear in our
|
||||
sequence and the divergence should advance several hundred
|
||||
calls past the priv-check.
|
||||
2. Re-run with the 9-PC probe armed at -n 500M — at minimum, the
|
||||
ctor-probe firings change, and ideally one or more of the 9
|
||||
producer sites starts firing.
|
||||
3. If producer sites fire, dispatcher fields `[0x828F3D08+0x50]` /
|
||||
`[0x828F4070+0x24]` become non-zero (use `--dump-addr`).
|
||||
4. Lockstep golden `crates/xenia-app/tests/golden/sylpheed_n50m.json`
|
||||
will likely change (`imports` count goes up, `swaps` may advance);
|
||||
regenerate the golden under `--stable-digest` and treat that as
|
||||
the new lockstep anchor.
|
||||
|
||||
If after the fix the producer is reached and dispatcher fields
|
||||
populate, the parked-waiter deadlock should resolve — or surface
|
||||
the next layer of bugs (e.g. signaling code reads non-zero handle
|
||||
but `wake_eligible_waiters` fails).
|
||||
|
||||
@@ -184,8 +184,13 @@ enum Commands {
|
||||
/// pool-element `this` addresses that MSVC ctors fail to
|
||||
/// preserve in r31. Read-only; lockstep digest unaffected.
|
||||
/// Examples: `--ctor-probe=0x8217C850`,
|
||||
/// `--ctor-probe=0x82181750,0x821701C8`.
|
||||
#[arg(long)]
|
||||
/// `--ctor-probe=0x82181750,0x821701C8`. KRNBUG-AUDIT-005:
|
||||
/// each token may be extended to `PC@DISPATCHER:OFFSET` (e.g.
|
||||
/// `0x821802D8@0x828F3D08:80`); on a hit the helper additionally
|
||||
/// logs the value of `[DISPATCHER+OFFSET]` — what the producer's
|
||||
/// `lwz r3, OFFSET(r3)` is about to read. The flag is also
|
||||
/// reachable as `--pc-probe` (semantically clearer name).
|
||||
#[arg(long, alias = "pc-probe")]
|
||||
ctor_probe: Option<String>,
|
||||
/// Diagnostic. Comma-separated list of guest addresses to dump
|
||||
/// (64 bytes each, hex + u32 lanes) at end-of-run, after the
|
||||
@@ -916,27 +921,39 @@ fn cmd_exec_inner(
|
||||
// worker prologue checks this set on every step; on a hit it
|
||||
// prints a single back-chain capture line. Empty set = no
|
||||
// probes = no-op fast path.
|
||||
let ctor_probe_combined: Option<String> = match (ctor_probe, std::env::var("XENIA_CTOR_PROBE").ok()) {
|
||||
let ctor_probe_combined: Option<String> = match (
|
||||
ctor_probe,
|
||||
std::env::var("XENIA_CTOR_PROBE").ok().or_else(|| std::env::var("XENIA_PC_PROBE").ok()),
|
||||
) {
|
||||
(Some(s), _) => Some(s.to_string()),
|
||||
(None, Some(s)) if !s.is_empty() => Some(s),
|
||||
_ => None,
|
||||
};
|
||||
if let Some(list) = ctor_probe_combined {
|
||||
for token in list.split(',').map(str::trim).filter(|s| !s.is_empty()) {
|
||||
let parsed = if let Some(hex) = token.strip_prefix("0x").or_else(|| token.strip_prefix("0X")) {
|
||||
u32::from_str_radix(hex, 16)
|
||||
} else {
|
||||
token.parse::<u32>()
|
||||
let (pc_str, consumer): (&str, Option<(u32, u32)>) = match token.split_once('@') {
|
||||
None => (token, None),
|
||||
Some((pc_part, rest)) => {
|
||||
let (disp_str, off_str) = rest.split_once(':').ok_or_else(|| {
|
||||
anyhow::anyhow!(
|
||||
"invalid --pc-probe token {token:?}: expected PC@DISPATCHER:OFFSET"
|
||||
)
|
||||
})?;
|
||||
let disp = parse_hex_u32(disp_str.trim()).map_err(|e| {
|
||||
anyhow::anyhow!("invalid dispatcher in {token:?}: {e}")
|
||||
})?;
|
||||
let off = parse_hex_u32(off_str.trim()).map_err(|e| {
|
||||
anyhow::anyhow!("invalid offset in {token:?}: {e}")
|
||||
})?;
|
||||
(pc_part.trim(), Some((disp, off)))
|
||||
}
|
||||
};
|
||||
match parsed {
|
||||
Ok(pc) => {
|
||||
kernel.ctor_probe_pcs.insert(pc);
|
||||
}
|
||||
Err(_) => {
|
||||
return Err(anyhow::anyhow!(
|
||||
"invalid PC in --ctor-probe: {token:?}"
|
||||
));
|
||||
}
|
||||
let pc = parse_hex_u32(pc_str).map_err(|e| {
|
||||
anyhow::anyhow!("invalid PC in --pc-probe: {token:?}: {e}")
|
||||
})?;
|
||||
kernel.ctor_probe_pcs.insert(pc);
|
||||
if let Some(c) = consumer {
|
||||
kernel.pc_probe_consumers.insert(pc, c);
|
||||
}
|
||||
}
|
||||
if !quiet && !kernel.ctor_probe_pcs.is_empty() {
|
||||
@@ -946,9 +963,10 @@ fn cmd_exec_inner(
|
||||
.map(|p| format!("{p:#010x}"))
|
||||
.collect();
|
||||
tracing::info!(
|
||||
"ctor probes armed: {} ({})",
|
||||
"pc probes armed: {} ({}); consumer-snapshots: {}",
|
||||
kernel.ctor_probe_pcs.len(),
|
||||
pcs.join(", ")
|
||||
pcs.join(", "),
|
||||
kernel.pc_probe_consumers.len(),
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -196,6 +196,15 @@ pub struct KernelState {
|
||||
/// `this`. Hooking the ctor's PRE-prologue PC captures r3 = this
|
||||
/// before any save/restore can clobber it.
|
||||
pub ctor_probe_pcs: std::collections::HashSet<u32>,
|
||||
/// Diagnostic. Optional per-PC dispatcher snapshot. Maps a probe PC
|
||||
/// to a `(dispatcher_addr, offset)` pair; when the PC fires, the
|
||||
/// helper additionally logs the value of `[dispatcher_addr +
|
||||
/// offset]` — i.e. exactly what the producer's `lwz r3, OFF(r3)`
|
||||
/// is about to read after the `bl outer_getter` returns the
|
||||
/// dispatcher pointer in r3. Populated from the `PC@DISP:OFF`
|
||||
/// extended syntax of `--pc-probe` / `--ctor-probe`. Read-only
|
||||
/// load — does not mutate guest state.
|
||||
pub pc_probe_consumers: HashMap<u32, (u32, u32)>,
|
||||
/// Diagnostic. Guest addresses to dump (64 bytes each, hex + u32
|
||||
/// lanes) at end-of-run. Populated from `--dump-addr=0x828F3D08,
|
||||
/// 0x828F4070`. Used to inspect static dispatcher / job-queue /
|
||||
@@ -255,6 +264,7 @@ impl KernelState {
|
||||
ring_size_dwords: 0,
|
||||
parallel_active: false,
|
||||
ctor_probe_pcs: std::collections::HashSet::new(),
|
||||
pc_probe_consumers: HashMap::new(),
|
||||
dump_addrs: Vec::new(),
|
||||
};
|
||||
crate::exports::register_exports(&mut state);
|
||||
@@ -580,6 +590,14 @@ impl KernelState {
|
||||
"CTOR-PROBE pc={:#010x} tid={} hw={} cycle={} sp={:#010x} r3={:#010x} lr={:#010x}",
|
||||
pc, tid, hw_id, cycle, sp, r3, lr,
|
||||
);
|
||||
if let Some(&(disp, off)) = self.pc_probe_consumers.get(&pc) {
|
||||
let field_addr = disp.wrapping_add(off);
|
||||
let field_val = mem.read_u32(field_addr);
|
||||
println!(
|
||||
" CTOR-PROBE consumer disp={:#010x} off={} field={:#010x} (= [disp+off])",
|
||||
disp, off, field_val,
|
||||
);
|
||||
}
|
||||
for (i, (fp, frame_lr)) in frames.iter().enumerate() {
|
||||
let saved_r31 = mem.read_u32(fp.wrapping_sub(12));
|
||||
let saved_r30 = mem.read_u32(fp.wrapping_sub(16));
|
||||
|
||||
Reference in New Issue
Block a user