diff --git a/audit-findings.md b/audit-findings.md index 6281bcd..4aef0ac 100644 --- a/audit-findings.md +++ b/audit-findings.md @@ -4622,3 +4622,194 @@ function 0x824AA1D8 is then called with handle=0). If none fire, the producer chain is gated upstream (likely a feature flag, init phase, or RPC handler that never fires). Either way, the next diagnostic narrows the bug surface dramatically. + +--- + +### KRNBUG-AUDIT-005 — `--pc-probe` extended syntax + canary kernel-call diff; `XexCheckExecutablePrivilege` stub gates init flow + +**Status**: landed on master (no-ff merge of feature branch +`canary-diff-and-pc-consumer-probe/p0-priv-stub-cascade`). Diagnostic- +only, read-only, lockstep-preserved (`run digest matches golden` at +`-n 50M --stable-digest`). + +**Tests**: 588 → **588** (unchanged; existing ctor-probe tests cover the +shared infrastructure). + +**What landed (`crates/xenia-kernel/src/state.rs`):** +- `pub pc_probe_consumers: HashMap` field on + `KernelState` (default empty). Maps a probe PC to a + `(dispatcher_addr, offset)` pair; on hit the helper additionally + logs `[disp+off]` — what the producer's `lwz r3, OFFSET(r3)` is + about to read after `bl outer_getter` returns the dispatcher in r3. +- `fire_ctor_probe_if_match` extended to read+print the consumer + field when present. Pure load — does not mutate guest state. + +**What landed (`crates/xenia-app/src/main.rs`):** +- `--pc-probe` clap alias on `--ctor-probe` (semantically clearer + name; both share parser/storage). +- Extended token syntax `PC@DISPATCHER:OFFSET` parsed via existing + `parse_hex_u32`. Plain `PC` form still works (backward compatible). +- `XENIA_PC_PROBE` env var as alias for `XENIA_CTOR_PROBE`. + +**What landed (`audit-runs/audit-005/`):** one-shot diagnostic +artifacts — not part of the repo build: +- `canary.log` — copy of `/home/fabi/xenia_canary_windows/xenia.log` from a Lutris launch of Sylpheed; oracle for what should happen +- `ours.log` — our trace at `-n 500M` with the 9-PC probe + `probe_calls=trace` filter (838 MB, 5.6 M lines) +- `diff.py` — kernel-call sequence diff (set-diff + first-divergence window); deletable after the audit +- `probe-test-10m.log` — initial smoke test confirming probe wiring + +**Reproduce:** + +```bash +cargo run --release -p xenia-app -- \ + --log-filter='probe_calls=trace,xenia=warn' \ + exec sylpheed.iso \ + --halt-on-deadlock \ + --trace-handles-focus=0x1004,0x100c,0x15e0 \ + --pc-probe=0x821802D8@0x828F3D08:80,0x821806E0@0x828F3D08:80,\ +0x82180B28@0x828F3D08:80,0x82180EA0@0x828F3D08:80,\ +0x82181254@0x828F3D08:80,0x8216F9D4@0x828F4070:36,\ +0x8216FC08@0x828F4070:36,0x821700B8@0x828F4070:36,\ +0x821700F4@0x828F4070:36 \ + -n 500_000_000 \ + 2> audit-runs/audit-005/ours.log + +python3 audit-runs/audit-005/diff.py --max 100 --window 30 +``` + +**Decisive findings:** + +1. **Failure mode (α) for KRNBUG-AUDIT-004 confirmed.** All 9 + non-create-chain producer call sites for handles 0x100c + (5 sites at `0x821802D8 / 0x821806E0 / 0x82180B28 / 0x82180EA0 / + 0x82181254`) and 0x15e0 (4 sites at `0x8216F9D4 / 0x8216FC08 / + 0x821700B8 / 0x821700F4`) **fire 0×** at -n 500M + (`grep -c CTOR-PROBE ours.log == 0`). The producer code path is + not reached. Rules out failure mode (B: `lwz` reads zero) and (3: + wake function called with stale handle). The bug is upstream, + in the control-flow that should lead the guest to those producer + functions. + +2. **Upstream control-flow divergence located: `XexCheckExecutablePrivilege` + stub returning 0.** Set-diff of kernel-call sequences across our + 500M-instruction run vs canary's full Sylpheed boot + (`canary.log`, ~5.3K lines, post-`swaps=2` boot loop reached) + identifies **11 exports that canary calls and we don't**: + + ``` + ExTerminateThread (×2) + KeReleaseSemaphore (×268) ← we use Nt* equivalents + KeResetEvent (×1) + NtDeviceIoControlFile (×2) + ObCreateSymbolicLink (×1) + XGetAVPack (×1) ← gated by priv-10 check + XamTaskCloseHandle (×1) + XamTaskSchedule (×1) ← AUDIT-002 producer candidate + XamUserReadProfileSettings (×2) + XeCryptSha (×1) + XeKeysConsolePrivateKeySign (×1) + ``` + + `XGetAVPack` has exactly one caller (`xrefs` table): site + `0x824AB5A0` inside `sub_824AB578`. The 4 instructions immediately + preceding it are: + + ``` + 824ab58c addi r3, r0, 10 ; privilege bit 10 + 824ab590 addi r31, r0, 0 + 824ab594 bl 0x8284DEFC ; XexCheckExecutablePrivilege + 824ab598 cmpli 0, r3, 0x0 + 824ab59c bc 12, eq, 0x824AB724 ; if r3==0, skip whole block + ; (XGetAVPack + crypto + Nt writes) + ``` + + Our impl `crates/xenia-kernel/src/exports.rs:193`: + ```rust + state.register_export(Xboxkrnl, 0x0194, "XexCheckExecutablePrivilege", + stub_return_zero); + ``` + `stub_return_zero` returns r3=0 unconditionally → guest takes + the `bc 12, eq, 0x824AB724` branch and skips the entire + AV/crypto/save-data init block. + + The OTHER call site (`sub_824A9710`, `0x824A99A0`) queries + privilege bit **11**: + ``` + 824a999c addi r3, r0, 11 + 824a99a0 bl 0x8284DEFC ; XexCheckExecutablePrivilege(11) + 824a99a4 cmpli 0, r3, 0 + 824a99a8 bc 4, eq, 0x824A9A60 ; bne — skip block if priv set + ``` + Different polarity (this one gates `XamTaskSchedule` etc. on + the **privilege-NOT-set** path). With both stubs returning 0, + the guest walks the wrong arm of *every* privilege-gated branch. + +3. **Cascade reaches the parked-waiter handles.** Trace evidence: + our `probe_calls` log shows `lr=0x824A97E4` (a hit from the + error path inside `sub_824A9710` *after* `sub_824ABA98` returned + negative NTSTATUS). The canary log shows all 11 missing exports + firing in a single contiguous boot phase between `XexCheckExecutablePrivilege` + and the worker-thread spawn — i.e. the init phase that sets up + the dispatcher data structures is exactly the phase we skip. + This explains **why the dispatcher fields read zero** (AUDIT-004 + dump: `[0x828F3D08+0x50] = 0`, `[0x828F4070+0x24] = 0`): the + ctors run (we counted those), but the *producers* that would + populate those fields with a non-zero handle never execute, + because the upstream init flow that registers them is gated + by the privilege checks. + +4. **Note on the diff: canary's log is filtered.** Canary's config + has `log_high_frequency_kernel_calls = false`, which suppresses + most `Rtl*`, `Mm*`, `Ke*`-internal calls from the log. The + "called in OURS but not canary" set (23 entries, headed by + `NtWaitForSingleObjectEx ×1.5M`) is dominated by this filter + difference — it is **not** a bug surface. The directionally + meaningful side of the diff is "called in CANARY but not OURS" + (above): canary's log includes every low-frequency call, so any + absence on our side is a real divergence. + +**Stop conditions check:** + +- Canary itself does NOT stall at swaps=2 — it reaches a steady + frame loop with `XamInputGetCapabilities` polling, texture loads, + `KeReleaseSemaphore` ticks. The diff was informative. +- First divergence is dense early-CRT noise (~3 entries in), but + the meaningful divergence anchored to a concrete export + (`XGetAVPack`, deterministically gated by a one-line stub) was + recoverable via set-diff. Did not need to narrow scope further. + +**Recommendation for next session (do not implement a fix here — this +is the read-only audit deliverable):** + +Replace `stub_return_zero` for `XexCheckExecutablePrivilege` with a +real implementation. The XEX header's privilege bitmask is parsed +during XEX load (see `crates/xenia-xex/`); `KernelState` already +holds the loaded `image_base`. Implementation outline: +- Parse `XEX_HEADER_EXECUTION_INFO` / privilege bits at load time + into `KernelState` (or surface via `Vfs` already-loaded XEX + metadata). +- `xex_check_executable_privilege(priv_id) -> u32`: + return 1 if bit `priv_id` is set in the title's privilege bitmask, + else 0. Match canary's encoding (privilege IDs are 0..7F; canary + reads `PrivilegeFlags[i/8] & (1 << (i%8))` from the XEX execution + info). + +Validation after the fix: +1. Re-run `audit-runs/audit-005/diff.py` — `XGetAVPack`, + `XamTaskSchedule`, `XeCryptSha`, etc. should appear in our + sequence and the divergence should advance several hundred + calls past the priv-check. +2. Re-run with the 9-PC probe armed at -n 500M — at minimum, the + ctor-probe firings change, and ideally one or more of the 9 + producer sites starts firing. +3. If producer sites fire, dispatcher fields `[0x828F3D08+0x50]` / + `[0x828F4070+0x24]` become non-zero (use `--dump-addr`). +4. Lockstep golden `crates/xenia-app/tests/golden/sylpheed_n50m.json` + will likely change (`imports` count goes up, `swaps` may advance); + regenerate the golden under `--stable-digest` and treat that as + the new lockstep anchor. + +If after the fix the producer is reached and dispatcher fields +populate, the parked-waiter deadlock should resolve — or surface +the next layer of bugs (e.g. signaling code reads non-zero handle +but `wake_eligible_waiters` fails). diff --git a/crates/xenia-app/src/main.rs b/crates/xenia-app/src/main.rs index f7b316a..6faf333 100644 --- a/crates/xenia-app/src/main.rs +++ b/crates/xenia-app/src/main.rs @@ -184,8 +184,13 @@ enum Commands { /// pool-element `this` addresses that MSVC ctors fail to /// preserve in r31. Read-only; lockstep digest unaffected. /// Examples: `--ctor-probe=0x8217C850`, - /// `--ctor-probe=0x82181750,0x821701C8`. - #[arg(long)] + /// `--ctor-probe=0x82181750,0x821701C8`. KRNBUG-AUDIT-005: + /// each token may be extended to `PC@DISPATCHER:OFFSET` (e.g. + /// `0x821802D8@0x828F3D08:80`); on a hit the helper additionally + /// logs the value of `[DISPATCHER+OFFSET]` — what the producer's + /// `lwz r3, OFFSET(r3)` is about to read. The flag is also + /// reachable as `--pc-probe` (semantically clearer name). + #[arg(long, alias = "pc-probe")] ctor_probe: Option, /// Diagnostic. Comma-separated list of guest addresses to dump /// (64 bytes each, hex + u32 lanes) at end-of-run, after the @@ -916,27 +921,39 @@ fn cmd_exec_inner( // worker prologue checks this set on every step; on a hit it // prints a single back-chain capture line. Empty set = no // probes = no-op fast path. - let ctor_probe_combined: Option = match (ctor_probe, std::env::var("XENIA_CTOR_PROBE").ok()) { + let ctor_probe_combined: Option = match ( + ctor_probe, + std::env::var("XENIA_CTOR_PROBE").ok().or_else(|| std::env::var("XENIA_PC_PROBE").ok()), + ) { (Some(s), _) => Some(s.to_string()), (None, Some(s)) if !s.is_empty() => Some(s), _ => None, }; if let Some(list) = ctor_probe_combined { for token in list.split(',').map(str::trim).filter(|s| !s.is_empty()) { - let parsed = if let Some(hex) = token.strip_prefix("0x").or_else(|| token.strip_prefix("0X")) { - u32::from_str_radix(hex, 16) - } else { - token.parse::() + let (pc_str, consumer): (&str, Option<(u32, u32)>) = match token.split_once('@') { + None => (token, None), + Some((pc_part, rest)) => { + let (disp_str, off_str) = rest.split_once(':').ok_or_else(|| { + anyhow::anyhow!( + "invalid --pc-probe token {token:?}: expected PC@DISPATCHER:OFFSET" + ) + })?; + let disp = parse_hex_u32(disp_str.trim()).map_err(|e| { + anyhow::anyhow!("invalid dispatcher in {token:?}: {e}") + })?; + let off = parse_hex_u32(off_str.trim()).map_err(|e| { + anyhow::anyhow!("invalid offset in {token:?}: {e}") + })?; + (pc_part.trim(), Some((disp, off))) + } }; - match parsed { - Ok(pc) => { - kernel.ctor_probe_pcs.insert(pc); - } - Err(_) => { - return Err(anyhow::anyhow!( - "invalid PC in --ctor-probe: {token:?}" - )); - } + let pc = parse_hex_u32(pc_str).map_err(|e| { + anyhow::anyhow!("invalid PC in --pc-probe: {token:?}: {e}") + })?; + kernel.ctor_probe_pcs.insert(pc); + if let Some(c) = consumer { + kernel.pc_probe_consumers.insert(pc, c); } } if !quiet && !kernel.ctor_probe_pcs.is_empty() { @@ -946,9 +963,10 @@ fn cmd_exec_inner( .map(|p| format!("{p:#010x}")) .collect(); tracing::info!( - "ctor probes armed: {} ({})", + "pc probes armed: {} ({}); consumer-snapshots: {}", kernel.ctor_probe_pcs.len(), - pcs.join(", ") + pcs.join(", "), + kernel.pc_probe_consumers.len(), ); } } diff --git a/crates/xenia-kernel/src/state.rs b/crates/xenia-kernel/src/state.rs index 8101d7d..1fe9dec 100644 --- a/crates/xenia-kernel/src/state.rs +++ b/crates/xenia-kernel/src/state.rs @@ -196,6 +196,15 @@ pub struct KernelState { /// `this`. Hooking the ctor's PRE-prologue PC captures r3 = this /// before any save/restore can clobber it. pub ctor_probe_pcs: std::collections::HashSet, + /// Diagnostic. Optional per-PC dispatcher snapshot. Maps a probe PC + /// to a `(dispatcher_addr, offset)` pair; when the PC fires, the + /// helper additionally logs the value of `[dispatcher_addr + + /// offset]` — i.e. exactly what the producer's `lwz r3, OFF(r3)` + /// is about to read after the `bl outer_getter` returns the + /// dispatcher pointer in r3. Populated from the `PC@DISP:OFF` + /// extended syntax of `--pc-probe` / `--ctor-probe`. Read-only + /// load — does not mutate guest state. + pub pc_probe_consumers: HashMap, /// Diagnostic. Guest addresses to dump (64 bytes each, hex + u32 /// lanes) at end-of-run. Populated from `--dump-addr=0x828F3D08, /// 0x828F4070`. Used to inspect static dispatcher / job-queue / @@ -255,6 +264,7 @@ impl KernelState { ring_size_dwords: 0, parallel_active: false, ctor_probe_pcs: std::collections::HashSet::new(), + pc_probe_consumers: HashMap::new(), dump_addrs: Vec::new(), }; crate::exports::register_exports(&mut state); @@ -580,6 +590,14 @@ impl KernelState { "CTOR-PROBE pc={:#010x} tid={} hw={} cycle={} sp={:#010x} r3={:#010x} lr={:#010x}", pc, tid, hw_id, cycle, sp, r3, lr, ); + if let Some(&(disp, off)) = self.pc_probe_consumers.get(&pc) { + let field_addr = disp.wrapping_add(off); + let field_val = mem.read_u32(field_addr); + println!( + " CTOR-PROBE consumer disp={:#010x} off={} field={:#010x} (= [disp+off])", + disp, off, field_val, + ); + } for (i, (fp, frame_lr)) in frames.iter().enumerate() { let saved_r31 = mem.read_u32(fp.wrapping_sub(12)); let saved_r30 = mem.read_u32(fp.wrapping_sub(16));