Merge canary-diff-and-pc-consumer-probe/p0-priv-stub-cascade (KRNBUG-AUDIT-005)

This commit is contained in:
MechaCat02
2026-05-04 18:06:26 +02:00
3 changed files with 245 additions and 18 deletions

View File

@@ -4622,3 +4622,194 @@ function 0x824AA1D8 is then called with handle=0). If none fire,
the producer chain is gated upstream (likely a feature flag, init
phase, or RPC handler that never fires). Either way, the next
diagnostic narrows the bug surface dramatically.
---
### KRNBUG-AUDIT-005 — `--pc-probe` extended syntax + canary kernel-call diff; `XexCheckExecutablePrivilege` stub gates init flow
**Status**: landed on master (no-ff merge of feature branch
`canary-diff-and-pc-consumer-probe/p0-priv-stub-cascade`). Diagnostic-
only, read-only, lockstep-preserved (`run digest matches golden` at
`-n 50M --stable-digest`).
**Tests**: 588 → **588** (unchanged; existing ctor-probe tests cover the
shared infrastructure).
**What landed (`crates/xenia-kernel/src/state.rs`):**
- `pub pc_probe_consumers: HashMap<u32, (u32, u32)>` field on
`KernelState` (default empty). Maps a probe PC to a
`(dispatcher_addr, offset)` pair; on hit the helper additionally
logs `[disp+off]` — what the producer's `lwz r3, OFFSET(r3)` is
about to read after `bl outer_getter` returns the dispatcher in r3.
- `fire_ctor_probe_if_match` extended to read+print the consumer
field when present. Pure load — does not mutate guest state.
**What landed (`crates/xenia-app/src/main.rs`):**
- `--pc-probe` clap alias on `--ctor-probe` (semantically clearer
name; both share parser/storage).
- Extended token syntax `PC@DISPATCHER:OFFSET` parsed via existing
`parse_hex_u32`. Plain `PC` form still works (backward compatible).
- `XENIA_PC_PROBE` env var as alias for `XENIA_CTOR_PROBE`.
**What landed (`audit-runs/audit-005/`):** one-shot diagnostic
artifacts — not part of the repo build:
- `canary.log` — copy of `/home/fabi/xenia_canary_windows/xenia.log` from a Lutris launch of Sylpheed; oracle for what should happen
- `ours.log` — our trace at `-n 500M` with the 9-PC probe + `probe_calls=trace` filter (838 MB, 5.6 M lines)
- `diff.py` — kernel-call sequence diff (set-diff + first-divergence window); deletable after the audit
- `probe-test-10m.log` — initial smoke test confirming probe wiring
**Reproduce:**
```bash
cargo run --release -p xenia-app -- \
--log-filter='probe_calls=trace,xenia=warn' \
exec sylpheed.iso \
--halt-on-deadlock \
--trace-handles-focus=0x1004,0x100c,0x15e0 \
--pc-probe=0x821802D8@0x828F3D08:80,0x821806E0@0x828F3D08:80,\
0x82180B28@0x828F3D08:80,0x82180EA0@0x828F3D08:80,\
0x82181254@0x828F3D08:80,0x8216F9D4@0x828F4070:36,\
0x8216FC08@0x828F4070:36,0x821700B8@0x828F4070:36,\
0x821700F4@0x828F4070:36 \
-n 500_000_000 \
2> audit-runs/audit-005/ours.log
python3 audit-runs/audit-005/diff.py --max 100 --window 30
```
**Decisive findings:**
1. **Failure mode (α) for KRNBUG-AUDIT-004 confirmed.** All 9
non-create-chain producer call sites for handles 0x100c
(5 sites at `0x821802D8 / 0x821806E0 / 0x82180B28 / 0x82180EA0 /
0x82181254`) and 0x15e0 (4 sites at `0x8216F9D4 / 0x8216FC08 /
0x821700B8 / 0x821700F4`) **fire 0×** at -n 500M
(`grep -c CTOR-PROBE ours.log == 0`). The producer code path is
not reached. Rules out failure mode (B: `lwz` reads zero) and (3:
wake function called with stale handle). The bug is upstream,
in the control-flow that should lead the guest to those producer
functions.
2. **Upstream control-flow divergence located: `XexCheckExecutablePrivilege`
stub returning 0.** Set-diff of kernel-call sequences across our
500M-instruction run vs canary's full Sylpheed boot
(`canary.log`, ~5.3K lines, post-`swaps=2` boot loop reached)
identifies **11 exports that canary calls and we don't**:
```
ExTerminateThread (×2)
KeReleaseSemaphore (×268) ← we use Nt* equivalents
KeResetEvent (×1)
NtDeviceIoControlFile (×2)
ObCreateSymbolicLink (×1)
XGetAVPack (×1) ← gated by priv-10 check
XamTaskCloseHandle (×1)
XamTaskSchedule (×1) ← AUDIT-002 producer candidate
XamUserReadProfileSettings (×2)
XeCryptSha (×1)
XeKeysConsolePrivateKeySign (×1)
```
`XGetAVPack` has exactly one caller (`xrefs` table): site
`0x824AB5A0` inside `sub_824AB578`. The 4 instructions immediately
preceding it are:
```
824ab58c addi r3, r0, 10 ; privilege bit 10
824ab590 addi r31, r0, 0
824ab594 bl 0x8284DEFC ; XexCheckExecutablePrivilege
824ab598 cmpli 0, r3, 0x0
824ab59c bc 12, eq, 0x824AB724 ; if r3==0, skip whole block
; (XGetAVPack + crypto + Nt writes)
```
Our impl `crates/xenia-kernel/src/exports.rs:193`:
```rust
state.register_export(Xboxkrnl, 0x0194, "XexCheckExecutablePrivilege",
stub_return_zero);
```
`stub_return_zero` returns r3=0 unconditionally → guest takes
the `bc 12, eq, 0x824AB724` branch and skips the entire
AV/crypto/save-data init block.
The OTHER call site (`sub_824A9710`, `0x824A99A0`) queries
privilege bit **11**:
```
824a999c addi r3, r0, 11
824a99a0 bl 0x8284DEFC ; XexCheckExecutablePrivilege(11)
824a99a4 cmpli 0, r3, 0
824a99a8 bc 4, eq, 0x824A9A60 ; bne — skip block if priv set
```
Different polarity (this one gates `XamTaskSchedule` etc. on
the **privilege-NOT-set** path). With both stubs returning 0,
the guest walks the wrong arm of *every* privilege-gated branch.
3. **Cascade reaches the parked-waiter handles.** Trace evidence:
our `probe_calls` log shows `lr=0x824A97E4` (a hit from the
error path inside `sub_824A9710` *after* `sub_824ABA98` returned
negative NTSTATUS). The canary log shows all 11 missing exports
firing in a single contiguous boot phase between `XexCheckExecutablePrivilege`
and the worker-thread spawn — i.e. the init phase that sets up
the dispatcher data structures is exactly the phase we skip.
This explains **why the dispatcher fields read zero** (AUDIT-004
dump: `[0x828F3D08+0x50] = 0`, `[0x828F4070+0x24] = 0`): the
ctors run (we counted those), but the *producers* that would
populate those fields with a non-zero handle never execute,
because the upstream init flow that registers them is gated
by the privilege checks.
4. **Note on the diff: canary's log is filtered.** Canary's config
has `log_high_frequency_kernel_calls = false`, which suppresses
most `Rtl*`, `Mm*`, `Ke*`-internal calls from the log. The
"called in OURS but not canary" set (23 entries, headed by
`NtWaitForSingleObjectEx ×1.5M`) is dominated by this filter
difference — it is **not** a bug surface. The directionally
meaningful side of the diff is "called in CANARY but not OURS"
(above): canary's log includes every low-frequency call, so any
absence on our side is a real divergence.
**Stop conditions check:**
- Canary itself does NOT stall at swaps=2 — it reaches a steady
frame loop with `XamInputGetCapabilities` polling, texture loads,
`KeReleaseSemaphore` ticks. The diff was informative.
- First divergence is dense early-CRT noise (~3 entries in), but
the meaningful divergence anchored to a concrete export
(`XGetAVPack`, deterministically gated by a one-line stub) was
recoverable via set-diff. Did not need to narrow scope further.
**Recommendation for next session (do not implement a fix here — this
is the read-only audit deliverable):**
Replace `stub_return_zero` for `XexCheckExecutablePrivilege` with a
real implementation. The XEX header's privilege bitmask is parsed
during XEX load (see `crates/xenia-xex/`); `KernelState` already
holds the loaded `image_base`. Implementation outline:
- Parse `XEX_HEADER_EXECUTION_INFO` / privilege bits at load time
into `KernelState` (or surface via `Vfs` already-loaded XEX
metadata).
- `xex_check_executable_privilege(priv_id) -> u32`:
return 1 if bit `priv_id` is set in the title's privilege bitmask,
else 0. Match canary's encoding (privilege IDs are 0..7F; canary
reads `PrivilegeFlags[i/8] & (1 << (i%8))` from the XEX execution
info).
Validation after the fix:
1. Re-run `audit-runs/audit-005/diff.py` — `XGetAVPack`,
`XamTaskSchedule`, `XeCryptSha`, etc. should appear in our
sequence and the divergence should advance several hundred
calls past the priv-check.
2. Re-run with the 9-PC probe armed at -n 500M — at minimum, the
ctor-probe firings change, and ideally one or more of the 9
producer sites starts firing.
3. If producer sites fire, dispatcher fields `[0x828F3D08+0x50]` /
`[0x828F4070+0x24]` become non-zero (use `--dump-addr`).
4. Lockstep golden `crates/xenia-app/tests/golden/sylpheed_n50m.json`
will likely change (`imports` count goes up, `swaps` may advance);
regenerate the golden under `--stable-digest` and treat that as
the new lockstep anchor.
If after the fix the producer is reached and dispatcher fields
populate, the parked-waiter deadlock should resolve — or surface
the next layer of bugs (e.g. signaling code reads non-zero handle
but `wake_eligible_waiters` fails).

View File

@@ -184,8 +184,13 @@ enum Commands {
/// pool-element `this` addresses that MSVC ctors fail to
/// preserve in r31. Read-only; lockstep digest unaffected.
/// Examples: `--ctor-probe=0x8217C850`,
/// `--ctor-probe=0x82181750,0x821701C8`.
#[arg(long)]
/// `--ctor-probe=0x82181750,0x821701C8`. KRNBUG-AUDIT-005:
/// each token may be extended to `PC@DISPATCHER:OFFSET` (e.g.
/// `0x821802D8@0x828F3D08:80`); on a hit the helper additionally
/// logs the value of `[DISPATCHER+OFFSET]` — what the producer's
/// `lwz r3, OFFSET(r3)` is about to read. The flag is also
/// reachable as `--pc-probe` (semantically clearer name).
#[arg(long, alias = "pc-probe")]
ctor_probe: Option<String>,
/// Diagnostic. Comma-separated list of guest addresses to dump
/// (64 bytes each, hex + u32 lanes) at end-of-run, after the
@@ -916,27 +921,39 @@ fn cmd_exec_inner(
// worker prologue checks this set on every step; on a hit it
// prints a single back-chain capture line. Empty set = no
// probes = no-op fast path.
let ctor_probe_combined: Option<String> = match (ctor_probe, std::env::var("XENIA_CTOR_PROBE").ok()) {
let ctor_probe_combined: Option<String> = match (
ctor_probe,
std::env::var("XENIA_CTOR_PROBE").ok().or_else(|| std::env::var("XENIA_PC_PROBE").ok()),
) {
(Some(s), _) => Some(s.to_string()),
(None, Some(s)) if !s.is_empty() => Some(s),
_ => None,
};
if let Some(list) = ctor_probe_combined {
for token in list.split(',').map(str::trim).filter(|s| !s.is_empty()) {
let parsed = if let Some(hex) = token.strip_prefix("0x").or_else(|| token.strip_prefix("0X")) {
u32::from_str_radix(hex, 16)
} else {
token.parse::<u32>()
let (pc_str, consumer): (&str, Option<(u32, u32)>) = match token.split_once('@') {
None => (token, None),
Some((pc_part, rest)) => {
let (disp_str, off_str) = rest.split_once(':').ok_or_else(|| {
anyhow::anyhow!(
"invalid --pc-probe token {token:?}: expected PC@DISPATCHER:OFFSET"
)
})?;
let disp = parse_hex_u32(disp_str.trim()).map_err(|e| {
anyhow::anyhow!("invalid dispatcher in {token:?}: {e}")
})?;
let off = parse_hex_u32(off_str.trim()).map_err(|e| {
anyhow::anyhow!("invalid offset in {token:?}: {e}")
})?;
(pc_part.trim(), Some((disp, off)))
}
};
match parsed {
Ok(pc) => {
let pc = parse_hex_u32(pc_str).map_err(|e| {
anyhow::anyhow!("invalid PC in --pc-probe: {token:?}: {e}")
})?;
kernel.ctor_probe_pcs.insert(pc);
}
Err(_) => {
return Err(anyhow::anyhow!(
"invalid PC in --ctor-probe: {token:?}"
));
}
if let Some(c) = consumer {
kernel.pc_probe_consumers.insert(pc, c);
}
}
if !quiet && !kernel.ctor_probe_pcs.is_empty() {
@@ -946,9 +963,10 @@ fn cmd_exec_inner(
.map(|p| format!("{p:#010x}"))
.collect();
tracing::info!(
"ctor probes armed: {} ({})",
"pc probes armed: {} ({}); consumer-snapshots: {}",
kernel.ctor_probe_pcs.len(),
pcs.join(", ")
pcs.join(", "),
kernel.pc_probe_consumers.len(),
);
}
}

View File

@@ -196,6 +196,15 @@ pub struct KernelState {
/// `this`. Hooking the ctor's PRE-prologue PC captures r3 = this
/// before any save/restore can clobber it.
pub ctor_probe_pcs: std::collections::HashSet<u32>,
/// Diagnostic. Optional per-PC dispatcher snapshot. Maps a probe PC
/// to a `(dispatcher_addr, offset)` pair; when the PC fires, the
/// helper additionally logs the value of `[dispatcher_addr +
/// offset]` — i.e. exactly what the producer's `lwz r3, OFF(r3)`
/// is about to read after the `bl outer_getter` returns the
/// dispatcher pointer in r3. Populated from the `PC@DISP:OFF`
/// extended syntax of `--pc-probe` / `--ctor-probe`. Read-only
/// load — does not mutate guest state.
pub pc_probe_consumers: HashMap<u32, (u32, u32)>,
/// Diagnostic. Guest addresses to dump (64 bytes each, hex + u32
/// lanes) at end-of-run. Populated from `--dump-addr=0x828F3D08,
/// 0x828F4070`. Used to inspect static dispatcher / job-queue /
@@ -255,6 +264,7 @@ impl KernelState {
ring_size_dwords: 0,
parallel_active: false,
ctor_probe_pcs: std::collections::HashSet::new(),
pc_probe_consumers: HashMap::new(),
dump_addrs: Vec::new(),
};
crate::exports::register_exports(&mut state);
@@ -580,6 +590,14 @@ impl KernelState {
"CTOR-PROBE pc={:#010x} tid={} hw={} cycle={} sp={:#010x} r3={:#010x} lr={:#010x}",
pc, tid, hw_id, cycle, sp, r3, lr,
);
if let Some(&(disp, off)) = self.pc_probe_consumers.get(&pc) {
let field_addr = disp.wrapping_add(off);
let field_val = mem.read_u32(field_addr);
println!(
" CTOR-PROBE consumer disp={:#010x} off={} field={:#010x} (= [disp+off])",
disp, off, field_val,
);
}
for (i, (fp, frame_lr)) in frames.iter().enumerate() {
let saved_r31 = mem.read_u32(fp.wrapping_sub(12));
let saved_r30 = mem.read_u32(fp.wrapping_sub(16));