feat(kernel): KRNBUG-AUDIT-007 — --branch-probe instrumentation; sub_824A9710 exit gate identified
Sister to --pc-probe / --ctor-probe but emits a single compact one-line BRANCH-PROBE record per fire (pc, tid, hw, cycle, r3, lr, cr0/cr6 flags) with no back-chain. Designed for tracing every conditional-branch fire inside a candidate-gate function so the last PC reached before the function epilogue identifies the exit branch. Runtime trace at audit-runs/audit-007/sub_824A9710-trace.log decisively identifies the priv-11 gate: - Exit branch: 0x824a9944 (post bl sub_824ABD88 first call) - Responsible kernel call: NtDeviceIoControlFile, FsCtlCode=0x74004 (registered as stub_success at exports.rs:90) - Mechanical chain: stub returns 0/SUCCESS without writing OUT, game reads [out_buf+8], finds zero, assigns hardcoded 0xC0000034 (STATUS_OBJECT_NAME_NOT_FOUND) at sub_824ABD88:0x824abea8-ac, exits via 0x824a9944's lt branch before priv-11 site at 0x824a99a0. 592→592 tests; lockstep instructions=100000010, swaps=2, draws=0 deterministic across reruns. Read-only diagnostic — no fix this session. Next session: KRNBUG-IO-003 (real NtDeviceIoControlFile per canary NullDevice::IoControl for FsCtlCodes 0x70000 + 0x74004). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -5079,3 +5079,143 @@ the priv-11 gate**.
|
||||
- `audit-runs/post-IO-002/canary_only.txt` — set-difference output (the 7-entry list)
|
||||
- `audit-runs/post-IO-002/canary_exports.txt`, `ours_exports.txt` — sorted unique export names
|
||||
|
||||
|
||||
---
|
||||
|
||||
## KRNBUG-AUDIT-007 — branch-probe instrumentation + sub_824A9710 exit-branch identification (2026-05-04)
|
||||
|
||||
### Outcome
|
||||
|
||||
**`--branch-probe` instrumentation landed (read-only diagnostic). Runtime trace decisively identified the priv-11 gate.**
|
||||
- 592→592 tests; lockstep `instructions=100000010, swaps=2, draws=0` deterministic across reruns
|
||||
(`audit-runs/audit-007/lock_post_branchprobe.json` ≡ `lock_post_branchprobe_run2.json`
|
||||
≡ `audit-runs/post-IO-002/lock_n100m_run1.json`).
|
||||
- Branch: `investigate-sub-824a9710/p0-branch-probe` — kept (instrumentation is reusable).
|
||||
|
||||
### Decisive runtime evidence
|
||||
|
||||
`audit-runs/audit-007/sub_824A9710-trace.log`:
|
||||
```
|
||||
BRANCH-PROBE pc=0x824a9710 tid=1 hw=0 cycle=5363003 r3=0x00000000 lr=0x824a9acc
|
||||
BRANCH-PROBE pc=0x824a97e0 tid=1 hw=0 cycle=5369559 r3=0xc0000034 lr=0x824a9940
|
||||
BRANCH-PROBE pc=0x824a9a98 tid=1 hw=0 cycle=5369562 r3=0x00000002 lr=0x824a97e4
|
||||
```
|
||||
|
||||
The probe at `0x824a97e0` (the failure landing pad) captured `r3=0xC0000034`, `lr=0x824a9940` (= the
|
||||
`cmpi 0,r3,0` PC after `bl sub_824ABD88` at `0x824a993c`). This pinpoints:
|
||||
- **Exit branch**: `0x824a9944` (`bc 12, lt, 0x824A97E0`) — taken because r3 was 0xC0000034 < 0.
|
||||
- **Responsible bl**: `0x824a993c` → `sub_824ABD88` first call.
|
||||
- **Status code**: `0xC0000034` = `STATUS_OBJECT_NAME_NOT_FOUND`.
|
||||
|
||||
### Root-cause chain through sub_824ABD88
|
||||
|
||||
The function-detector's `end_address=0x824abe3c` for sub_824ABD88 was a truncation artifact;
|
||||
the function actually runs to `0x824ac184`. Within that range the `0xC0000034` is **HARDCODED**
|
||||
at `0x824abea8-0x824abeac`:
|
||||
```
|
||||
0x824abe90 bl NtDeviceIoControlFile (FsCtlCode=0x74004, out_buf=r1+160, out_len=16)
|
||||
0x824abe94 cmpi 0, r3, 0
|
||||
0x824abe98 bc 12, lt, 0x824abeb8 # if r3 < 0 → failure cleanup (NOT taken; stub returned 0 = success)
|
||||
0x824abe9c ld r10, 168(r1) # load doubleword from [out_buf+8]
|
||||
0x824abea0 cmpi cr6, 1, r10, 0 # 64-bit cmp r10 == 0
|
||||
0x824abea4 bc 4, 4*cr6+eq, 0x824abeb0 # if NOT eq, skip the assignment
|
||||
0x824abea8 addis r3, r0, 0xC000 # r3 = 0xC0000000
|
||||
0x824abeac ori r3, r3, 0x34 # r3 = 0xC0000034 (STATUS_OBJECT_NAME_NOT_FOUND)
|
||||
0x824abeb0 cmpi cr6, 0, r3, 0
|
||||
0x824abeb4 bc 4, 4*cr6+lt, 0x824abecc # if NOT lt → success path; r3 < 0 → NOT taken
|
||||
0x824abeb8 or r28, r3, r3 # save 0xC0000034
|
||||
0x824abebc lwz r3, 96(r1)
|
||||
0x824abec0 bl NtClose
|
||||
0x824abec4 or r3, r28, r28 # restore failure status
|
||||
0x824abec8 b 0x824abe34 # epilogue → return 0xC0000034
|
||||
```
|
||||
|
||||
The game expects the IOCTL response's upper 8 bytes to be non-zero. Our
|
||||
`NtDeviceIoControlFile` is registered as `stub_success` at
|
||||
`crates/xenia-kernel/src/exports.rs:90` — returns 0 (SUCCESS) but writes nothing
|
||||
into the OUT buffer. The fresh stack frame has zero at `[r1+168]`, so the check
|
||||
at `0x824abea4` falls through to the hardcoded failure assignment.
|
||||
|
||||
### Canary reference
|
||||
|
||||
`audit-runs/post-IO-002/canary.log` lines 1196-1209 show canary calls
|
||||
`NtDeviceIoControlFile(handle, ..., FsCtlCode=0x74004, ..., out_buf, out_len=16)`,
|
||||
gets a populated 16-byte response (whose upper 8 bytes are non-zero), then proceeds
|
||||
through 17× NtWriteFile zero-fill, NtClose, NtCreateFile (Cache0\), NtQueryVolumeInformationFile
|
||||
class=3, NtClose, and finally **`XexCheckExecutablePrivilege(0x0000000B)`** — the
|
||||
priv-11 site that has never fired in our run. Immediately followed by
|
||||
**`XamTaskSchedule(824A93C8, 828A28F0, ...)`** — the canary-only export hunt's
|
||||
gate-pivot call.
|
||||
|
||||
The IOCTL implementation in canary lives in `xenia-canary/src/xenia/vfs/devices/null_device.{h,cc}`
|
||||
(`NullDevice::IoControl`) — the device's `IoControl` writes the structured payload
|
||||
that the game-side check consumes.
|
||||
|
||||
### Next session: KRNBUG-IO-003
|
||||
|
||||
**Where:** `crates/xenia-kernel/src/exports.rs:90` — replace the
|
||||
`stub_success` registration with a real `nt_device_io_control_file`.
|
||||
|
||||
**Minimum viable fix:** for FsCtlCode=0x74004, write any non-zero u64 at
|
||||
`[out_buf+8]`. That alone clears the gate.
|
||||
|
||||
**Canary-faithful fix:** mirror `NullDevice::IoControl` for FsCtlCodes
|
||||
`0x70000` (8-byte response, consumed at `sub_824ABD88:0x824abe3c` for a
|
||||
log2/shift count) and `0x74004` (16-byte response, partition geometry).
|
||||
Fall through to `STATUS_NOT_IMPLEMENTED` for unrecognized codes so future
|
||||
divergences surface.
|
||||
|
||||
**Falsifiable cascade prediction:**
|
||||
- `XexCheckExecutablePrivilege` count: **1 → 2** (priv=0xA + priv=0xB).
|
||||
- `XamTaskSchedule` count: **0 → 1**.
|
||||
- canary-only export count: **7 → ≤ 3**.
|
||||
- Worker thread spawn at `ExCreateThread(entry=0x82181830, ctx=0x828F3D08)` —
|
||||
the parked-handle 0x100c producer fires.
|
||||
- `swaps=2 draws=0` plateau persists (renderer is multi-causal).
|
||||
|
||||
**Failure modes to watch for:**
|
||||
- (α) Re-running `--branch-probe` should show a NEW exit branch in
|
||||
`sub_824A9710` (one of `0x824a996c`, `0x824a9998`, `0x824a9a18`) if a downstream
|
||||
helper has its own unimplemented dependency.
|
||||
- (β) sub_824ABA98's analogous failure path (called at 0x824a9950, 0x824a9990)
|
||||
may surface if its own kernel-call dependencies are stubs.
|
||||
- (γ) `nt_write_file` against the synth empty-file Cache0 path needs to handle
|
||||
the 17× zero-fill loop; if our implementation rejects writes to a zero-byte
|
||||
file, the cascade stalls just past the IOCTL fix.
|
||||
|
||||
### Files added / modified (instrumentation only)
|
||||
|
||||
- `crates/xenia-kernel/src/state.rs` — added `branch_probe_pcs: HashSet<u32>`
|
||||
field + `fire_branch_probe_if_match(hw_id)` method emitting a single compact
|
||||
`BRANCH-PROBE` line per fire (pc, tid, hw, cycle, r3, lr, cr0/cr6). Sister to
|
||||
`fire_ctor_probe_if_match`; no back-chain walk. ~40 LOC.
|
||||
- `crates/xenia-app/src/main.rs` — `--branch-probe` CLI flag (env var
|
||||
`XENIA_BRANCH_PROBE`), parser, and call in `worker_prologue`. ~30 LOC.
|
||||
|
||||
### Probe-machinery limitation
|
||||
|
||||
The probe fires only when the **block head** at the matched PC is dispatched —
|
||||
mid-block PCs in the request set don't trigger because the prologue runs once
|
||||
per block, not once per instruction. In this trace: function entries, failure
|
||||
landing pads (`0x824a97e0`), and external-call return PCs (`0x824a9a98`) all
|
||||
hit. Internal `bc` PCs (`0x824a9944`, `0x824a9958`, ...) were silent. The data
|
||||
captured was sufficient — the failure landing PC + LR pair uniquely identified
|
||||
the upstream branch — but if a future audit needs every-branch coverage, the
|
||||
helper call would need to move from `worker_prologue` into the per-instruction
|
||||
step loop (or a custom block-scan that flags branches matching the request
|
||||
list).
|
||||
|
||||
### Trace artifacts (re-runnable)
|
||||
|
||||
- `audit-runs/audit-007/sub_824A9710-trace.log` — 5 BRANCH-PROBE lines + thread diagnostics.
|
||||
- `audit-runs/audit-007/sub_824A9710-trace.err` — full kernel-call trace + counter dump.
|
||||
- `audit-runs/audit-007/lock_post_branchprobe.json`, `lock_post_branchprobe_run2.json` — lockstep digests.
|
||||
|
||||
Re-run command:
|
||||
```
|
||||
PROBE_LIST="0x824a9aa0,0x824a9128,0x824a9710,0x824a9778,0x824a9788,0x824a9790,0x824a97dc,0x824a97e0,0x824a9824,0x824a9828,0x824a9840,0x824a9850,0x824a985c,0x824a9870,0x824a9880,0x824a9888,0x824a9918,0x824a9944,0x824a9958,0x824a996c,0x824a9998,0x824a999c,0x824a99a0,0x824a99a8,0x824a9a10,0x824a9a18,0x824a9a60,0x824a9a78,0x824a9a98"
|
||||
./target/release/xenia-rs exec sylpheed.iso --halt-on-deadlock \
|
||||
--branch-probe="$PROBE_LIST" -n 500_000_000 \
|
||||
> audit-runs/audit-007/sub_824A9710-trace.log \
|
||||
2> audit-runs/audit-007/sub_824A9710-trace.err
|
||||
```
|
||||
|
||||
Reference in New Issue
Block a user