fix(kernel): KRNBUG-KE-001 — real KeResumeThread per canary mirror

Replace the no-op cookie-returner with a real impl per canary
xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc:216-227
(XObject::GetNativeObject<XThread>()->Resume()). Mirrors
nt_resume_thread plumbing two functions below:
resolve_pseudo_handle -> scheduler.find_by_handle -> resume_ref.

Returns STATUS_SUCCESS if the KTHREAD-pointer-as-handle resolves,
STATUS_INVALID_HANDLE otherwise — matches canary's Resume()/!thread
return semantics.

Cascade-prediction scorecard (audit-018 -> post-fix):
- A PASS: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940)
  leave Suspended -> run prologue -> park on audio buffer-completion
  semaphores 0x828A3254 / 0x828A3230.
- B PARTIAL FAIL: NtSetEvent 667->3334; KeReleaseSemaphore=0;
  XAudioSubmitRenderDriverFrame=0.
- C FAIL (predicted 2->1, actual 2->2): both ExTerminateThread +
  KeReleaseSemaphore still canary-only.
- D FAIL: gamma-cluster blocker unchanged — pc-probe at
  0x82184318/0x82184374 no fires; dump-addr 0x828F4070 no DUMP;
  signal_attempts on 0x1004/0x100c/0x1020/0x15e4 still 0.

Necessary-but-not-sufficient: workers unsuspend but park on a
downstream gate that is likely part of the audit-009/-016/-017 gamma cluster.

Tests 600 -> 601 (+ke_resume_thread_unblocks_suspended_worker).
Lockstep instructions=100000003 imports=987516 deterministic x2.
Goldens re-baselined: sylpheed_n50m.json instructions
50000003->50000011, imports 407255->407247.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
MechaCat02
2026-05-06 20:46:46 +02:00
parent 7ed6192b7b
commit 76dfe7fd7a
4 changed files with 311 additions and 8 deletions


@@ -5819,3 +5819,253 @@ Synth-stub auto-enqueued `(0x0A, 1)` on the first `XNotifyGetNext` after listene
### Trace artifacts
`audit-runs/audit-013-io-004-phase1.5/dispatch.{log,err}` (no-fire baseline at non-block PCs), `dispatch2.log` (block-entry probes — 1 fire on dispatch arm), `dispatch3.log` (full dispatch chain confirmed), `post-cascade.{log,err}` (focus + canary export delta + cascade probes).
## KRNBUG-AUDIT-014 — 0x15e0 wake-eligibility hypothesis FALSIFIED; tid=17 actually parks on 0x15e4 (DIAGNOSTIC 2026-05-06)
**Status**: read-only diagnostic. No fix landed. Master HEAD `d736a1d` unchanged.
### Phase 1 finding (decisive)
Goal was to investigate why handle 0x15e0 records `signal_attempts=1 (primary=1)` post-IO-004 BUT tid=17 (the "0x15e0 worker") still parks. **The premise is wrong.**
Trace at `-n 500M --trace-handles-focus=0x15e0` shows:
1. **Handle 0x15e0 is a Semaphore**, not an Event/Manual. Created from `lr=0x824ab110` (NtCreateSemaphore) on tid=1, with creator-frame chain `lr=0x82456a94 → 0x82456bac → 0x822f1b60 → 0x8216ee14 → 0x824ab8e0`. This is a **different** wrapper than the Event creator chain `lr=0x824a9f6c` shared by 0x1004 / 0x100c / 0x1020 / 0x15e4.
2. **0x15e0 is healthy**: `signal_attempts=1 (primary=1) waits=1 wakes=1`. End-of-run DIAGNOSIS reports "not stuck — signals consumed correctly". Timeline: tid=1 waited at `lr=0x824ac578`, then tid=16 `NtReleaseSemaphore` at `lr=0x824ab168` woke it. Handshake completed.
3. **tid=17 parks on 0x15e4**, NOT 0x15e0. State at end-of-run: `Blocked(WaitAny { handles: [5604] })` where `5604 == 0x15e4`. Worker entry context `r12=0x8217057c` (front of `sub_82170430`) matches the audit-009 / audit-008 / audit-002 stage-3 attribution of tid=17 to the 0x82170430 worker cluster.
4. **0x15e4 is the actual stuck handle**: `kind=Event/Manual waiters=1 signals=0 waits=1 wakes=0 <NO_SIGNALS_DESPITE_WAITS>`, created by tid=1 at `lr=0x824a9f6c` (same wrapper as 0x1004 / 0x100c / 0x1020). This is the same producer-missing class as the other Event/Manual handles tracked across audit-001 → IO-004.
The IO-004 cascade-prediction scorecard's claim "(e) signal_attempts on parked handles: 0x15e0 = 1 (primary=1, ghost=0)" was technically correct (the semaphore did get one signal), but the inference that this represented forward progress for tid=17's wake was a misattribution. The label "0x15e0 worker" used in the audit-009 / audit-002 / audit-008 stage-3 mappings is a long-standing transcription error: the actual handle is 0x15e4 (Event/Manual), and 0x15e0 is an unrelated Semaphore. Reference: `project_xenia_rs_producer_stack_trace_2026_05_03.md` already noted "third handle is **0x15e0**, not 0x15e4 (transcription typo)". That correction was itself the error, and the original audit-002 label 0x15e4 was correct.
### Bug class evaluation (α-ζ from prompt)
- α (PKEVENT vs handle mismatch): N/A — no Set call ever targets 0x15e4; the producer is genuinely missing.
- β (refresh_pkevent_shadow_from_guest miss): N/A — same.
- γ (wake-eligibility filter wrong): N/A — wake_eligible_waiters fires correctly elsewhere (0x10F0 handshake demonstrates healthy manual-reset wake; 0x15e0 demonstrates healthy semaphore wake).
- δ (memory ordering): N/A — no producer side observed.
- ε (race scheduler.resume vs signal): N/A.
- ζ (audit recorded but not propagated): N/A — DIAGNOSIS print-out matches state.objects waiter list.
**Conclusion**: 0x15e4 belongs to the same "producer never reaches the Set call" class as 0x1004 / 0x100c / 0x1020. Renderer cluster work (audit-008 / audit-009) and AUDIT-014's parallel Fork B probing of newly-reached L1 entries (`sub_82173DC8`, `0x822c6870`, `0x824563e0`, `0x823ddb50`) are the correct lines of attack. There is no wake-eligibility bug to fix.
### Discipline gate
- Box 1 (named bug class with concrete evidence): FAIL — premise refuted, no bug class applies.
- Box 2 (narrow fix ~30-80 LOC): N/A.
- Box 3 (sharp 4-dim cascade prediction): N/A.
- Box 4 (no renderer/GPU changes): N/A.
- Box 5 (lockstep determinism preserved): N/A.
Stop conditions met: hand back as Phase 1 only.
### Cascade snapshot (unchanged from IO-004 baseline)
- swaps=2 (`VdSwap` kernel-direct frames 1 + 2)
- draws=0
- 18 → 20 worker threads (consistent with IO-004)
- Canary-only exports: ExTerminateThread, KeReleaseSemaphore, XamUserReadProfileSettings still missing.
### Recommended next session
Track Fork B's branch-probe results for `sub_82173DC8` (the first L1 entry in the renderer cluster reached after IO-004). The producer for handles 0x1004 / 0x100c / 0x1020 / 0x15e4 lives somewhere along the dispatch arm at `0x822f1be8 → 0x82175338 → 0x82173dc8 → ...`. If Fork B identifies a sub-function that gates the Set call (e.g. `sub_82173DC8` returns early on a stub kernel call), that becomes KRNBUG-AUDIT-015 / next IO-NNN candidate.
The misattribution label "0x15e0 worker" should be corrected to "0x15e4 worker" in the index entries for AUDIT-002, AUDIT-008, AUDIT-009 — left for the next session to update if relevant.
### Trace artifacts
`audit-runs/audit-014-0x15e0-wake/probe.log` (focus dump + 19-thread diagnostic), `probe.err` (kernel.calls counters confirming swaps=2 unchanged).
## KRNBUG-AUDIT-015 — L1 propagation probe; next gate is silph::Semaphore on handle 0x1308 (workitem submitter unreached) (DIAGNOSTIC 2026-05-06)
**Status**: read-only diagnostic, Fork B parallel session. No fix landed. Master HEAD `d736a1d` unchanged.
### Probe set (112 PCs)
sub_82173DC8 dispatcher case-arms (25), worker 0x822c6878 body (12), worker sub_824563E0 body (17), worker sub_823DDB50 body (11), L1 callees (26), audit-009 unfired baseline (21).
### Decisive findings
1. **sub_82173DC8 dispatches all 4 IO-004 startup notifications then idles.** Every fire takes the early-exit at `0x82173ed8` because `[r31+44] == 0` (callback-table pointer in the listener struct never populated). The post-merge dispatch helper `0x82174040` (which would call the renderer producers `sub_822C2A80`, `sub_8216F088`, etc.) is never invoked from the dispatcher path.
2. **Worker 0x822c6870 (= 0x822c6878 thunk; tids 14, 15) parks immediately on Semaphore handle 0x1308.** The semaphore is `Semaphore(0/INT_MAX) signals=0 waits=2 wakes=0 <NO_SIGNALS_DESPITE_WAITS>`, created by tid=13 inside `sub_822C66B4` (worker-pool initializer in `sub_822C6630`). Producer chain that releases it: `sub_822AE1F0 / sub_822F55F0 → sub_822C8B50 → sub_822C6808 → bl 0x824AB158 (silph::Semaphore::Release at NtReleaseSemaphore)`. Neither `sub_822AE1F0` nor `sub_822F55F0` was probed; both are statically reachable from main but unexercised at -n 500M — they're the renderer's frame-update / scene-graph-mutate path that never runs.
3. **Worker sub_824563E0 (tid=16) is healthy** — runs an XAM inactivity / timer poll loop (NtSetTimerEx handle 0x15d0, period=2; loops `XamEnableInactivityProcessing ↔ CS+bcctrl dispatch` 865k times). Not the gate.
4. **Worker sub_823DDB50 (tid=19) parks at entry** with body PCs unfired; final state `Blocked(WaitAny { handles: [0x160C, 0x01000000] })`. Handle 0x160C is `Event/Auto signals=0 waits=1 wakes=0 <NO_SIGNALS_DESPITE_WAITS>`. The wait callsite is unprobed (likely an early branch before 0x823ddb68); needs follow-up probe inside `sub_823DD838` (parent).
5. All 21 audit-009 PCs (renderer cluster `0x82287xxx-0x82294xxx` + audit-005 producer-callsites) remain UNFIRED, consistent with audit-009 baseline — they sit downstream of the unreached workitem-submitter chain.
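The per-handle lines quoted in these findings (`signals=0 waits=2 wakes=0 <NO_SIGNALS_DESPITE_WAITS>`) suggest a simple classification rule. The sketch below is a minimal version. It assumes the tag is raised when a handle has been waited on but never signalled. `HandleStats`, `Verdict` and `diagnose` are hypothetical names; they are not the tracer's actual code.

```rust
/// Hypothetical per-handle counters mirroring the fields printed by
/// `--trace-handles-focus` (signals / waits / wakes).
#[derive(Debug, Clone, Copy)]
pub struct HandleStats {
    pub signals: u32,
    pub waits: u32,
    pub wakes: u32,
}

/// Coarse verdict for a handle (assumed rule, not the tracer's implementation).
#[derive(Debug, PartialEq, Eq)]
pub enum Verdict {
    /// Waited on, never signalled: the producer never reached its Set/Release call.
    NoSignalsDespiteWaits,
    /// Signalled at least once, but some waits never produced a wake.
    SignalledButUnconsumed,
    /// Every wait was matched by a wake (or nobody waited).
    Healthy,
}

pub fn diagnose(s: HandleStats) -> Verdict {
    if s.waits > 0 && s.signals == 0 {
        Verdict::NoSignalsDespiteWaits
    } else if s.wakes < s.waits {
        Verdict::SignalledButUnconsumed
    } else {
        Verdict::Healthy
    }
}
```

Here is how the counts quoted in this log classify under that rule:

- 0x1308 (0/2/0) is `NoSignalsDespiteWaits`.
- 0x160C (0/1/0) is `NoSignalsDespiteWaits`.
- AUDIT-014's 0x15e0 (1/1/1) is `Healthy`.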
### Bug class
**δ (pure-guest renderer state-read)**, NOT a kernel-boundary stub. There is no missing `xboxkrnl`/`xam` import at the gate; main fails to advance past a state predicate that gates `sub_822AE1F0` / `sub_822F55F0` invocation.
### Discipline gate
- Box 1 (named import α / narrow internal-sub bug): **NO** — δ-class, no kernel boundary.
- Box 2 (canary impl small): N/A.
- Box 3 (sharp 4-dim cascade prediction): **NO** — needs dump-addr triage of listener struct first.
- Box 4 (no new ABI plumbing): N/A.
- Box 5 (lockstep determinism preserved): N/A.
Boxes 1 + 3 fail. Hand back per stop condition 1.
### Recommended next session
Phase 1: probe `sub_822AE1F0`, `sub_822F55F0`, `sub_822C8B50`, `sub_822C6808` entries + `sub_82174040` post-merge dispatch helper (the 6 fall-through arms inside sub_82173DC8). Add `--dump-addr=0x40ba9a80` to capture the listener-struct fields each dispatcher fire. The struct's `[+44]` field is the gate predicate; once we know what populates it, the actual fix point becomes nameable.
### Trace artifacts
`audit-runs/audit-015-l1-propagation/probe.log` (493 MB; 5.05M BRANCH-PROBE lines), `probe.err` (188 KB), `pc-fire-counts.txt` (28 fired PCs sorted).
## KRNBUG-AUDIT-016 — submitter-caller probe; gate is γ (deeper-indirection / vtable registry not populated) (DIAGNOSTIC 2026-05-06)
**Status**: read-only diagnostic. No fix landed. Master HEAD `d736a1d` unchanged.
### Probe set
Run #1 (30 PCs): workitem-submitter chain entries + bl call-sites (`sub_822AE1F0`, `sub_822F55F0`, `sub_822C8B50`, `sub_822C6808`, `0x822B16E0`, `0x822F5728`), parents (`sub_822ADD70`, `sub_821A9920`, `sub_822ACAB8`, `sub_821A8578`), grandparents (`sub_82299250`, `sub_822A4460`, `sub_821A82A0`), dispatcher post-merge helper + early-exit. Run #2 (18 PCs): refined dispatcher arm coverage + `--dump-addr=0x40ba9a80,0x4024AC00,0x4024B3E0,0x40111890,0x4024A380`.
### Decisive findings
1. **0/16 submitter-chain PCs fire**, including all 4 levels of the caller walk-up. Both static caller chains bottom out in the audit-009 unreached renderer cluster: A-side `sub_822AE1F0 ← sub_822ADD70 ← sub_822ACAB8 ← sub_82299250 / sub_822A4460 ← sub_8229AB50 ← sub_8229A700 ← sub_82294F30 (renderer cluster)`. B-side `sub_822F55F0 ← sub_821A9920 ← sub_821A8578 ← sub_821A82A0 ← (cycle with sub_821A9920) and ← sub_821ABEA8 ← sub_821AC700 ← sub_821A6470 (renderer cluster)`.
2. **Listener struct dump at `0x40ba9a80`**: `[+0x00]` vtable=0x40111890; `[+0x04]` dispatch state bits=**0 (NEVER set)**; `[+0x08]` counter=0; `[+0x0C]`=1000 (set by case 0xA); `[+0x2C]` callback-table A=**0x4024AC00 (POPULATED)**; `[+0x3C]` callback-table B=**0x4024B3E0 (POPULATED)**. **Audit-015's claim that `[r31+44]==0` was wrong**: `[+0x2C]` (= +44) IS populated. The real gate is `[base+0x04]` (dispatch state bits), whose bits 14 and 15 are tested by `sub_821737F0` (the case-9 helper).
3. **Dispatcher arm fires (run #2 confirmed)**: case-9 r5==0 path (`0x82173e6c`, 1 fire) → `sub_821737F0` returned 0 → early-exit; default-high arm (`0x82173f48`, 2 fires) → both early-exit at `0x82174030`. **Case 0xA's write `oris rN, rN, 0x1; stw rN, 4(r31)` should set mask 0x00010000 (bit 15 in the PowerPC MSB-0 numbering used for bits 14/15 above), but the EOR dump shows `[+0x04]=0`**. Either the case-0xA fire and the dispatcher's r3 do not always target `0x40ba9a80`, or another path later overwrites the write back to 0.
4. **0x4024AC00 (callback table A) contains real renderer config** including string `"game:\\dat\\GP_TITLE.pak+eng\\\0"` and pointers `0x401119A0 / 0x40111990` — confirming the listener IS subscribed to the renderer's profile loader, but its dispatch-state bits are never advanced.
5. **Probe-machinery anomaly**: `sub_82174040` entry-PC never fires across both runs, yet `sub_821737F0` fires once at cycle 9183539 with `lr=0x821741f4` — meaning `0x821741F0 (bl sub_821737F0 inside sub_82174040 +0x1B0)` was executed. Either `sub_82174040` was reached via a jump-into-mid-function (highly unusual) or the probe missed an entry fire. **Worth verifying in AUDIT-017** with isolated probe of `0x82174040, 0x82174044, 0x82174048`.
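Finding 2's layout can be captured as a decoder over the raw `--dump-addr=0x40ba9a80` bytes. This sketch makes the following assumptions:

- The dump is a big-endian byte slice starting at the struct base, since the Xbox 360's PowerPC is big-endian.
- Only the offsets listed in finding 2 are decoded.
- Bit indices follow PowerPC MSB-0 numbering. Under that numbering `oris rN, rN, 0x1` sets bit 15 and `oris rN, rN, 0x2` sets bit 14.

The `Listener` type and the `ppc_bit` helper are hypothetical names.

```rust
/// Mask for bit `n` in PowerPC MSB-0 numbering (bit 0 = 0x8000_0000).
pub const fn ppc_bit(n: u32) -> u32 {
    1 << (31 - n)
}

/// Fields of the listener struct at 0x40ba9a80, per AUDIT-016 finding 2.
#[derive(Debug, PartialEq, Eq)]
pub struct Listener {
    pub vtable: u32,           // +0x00
    pub state_bits: u32,       // +0x04, dispatch state bits (bits 14/15 gate sub_821737F0)
    pub counter: u32,          // +0x08
    pub field_0c: u32,         // +0x0C, written by case 0xA
    pub callback_table_a: u32, // +0x2C (= +44)
    pub callback_table_b: u32, // +0x3C
}

/// Read a big-endian u32 at `off`, or None if the dump is too short.
fn be_u32(buf: &[u8], off: usize) -> Option<u32> {
    let b = buf.get(off..off + 4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

impl Listener {
    /// Decode from a dump that starts at the struct base.
    pub fn decode(buf: &[u8]) -> Option<Self> {
        Some(Self {
            vtable: be_u32(buf, 0x00)?,
            state_bits: be_u32(buf, 0x04)?,
            counter: be_u32(buf, 0x08)?,
            field_0c: be_u32(buf, 0x0C)?,
            callback_table_a: be_u32(buf, 0x2C)?,
            callback_table_b: be_u32(buf, 0x3C)?,
        })
    }

    /// True if MSB-0 bit `n` of the dispatch state word is set.
    pub fn has_bit(&self, n: u32) -> bool {
        self.state_bits & ppc_bit(n) != 0
    }
}
```

Decoding the end-of-run dump this way shows `state_bits == 0` alongside non-zero callback tables. That pattern is what moved the suspected gate from `[+0x2C]` to `[+0x04]`.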
### Bug class
**γ (deeper indirection)**, refining audit-015's δ classification. The submitter chain bottoms out in a vtable-dispatched renderer cluster registry that is never populated. This is a chicken-and-egg cycle:

- The listener can't advance state because the workitem-submitter never fires.
- The workitem-submitter never fires because the registry is never populated.
- The registry is populated by something the listener was supposed to drive.

Only an external bootstrap can break the cycle.
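The four-level caller walk-up in finding 1 amounts to a breadth-first search over a reverse call graph (callee → static callers). The search stops at functions already known to be unreached. This is a sketch under two assumptions: the caller map comes from static xrefs, and `walk_up` is a hypothetical helper, not part of the audit tooling. The `seen` set is what keeps cycles like the B-side `sub_821A82A0 ↔ sub_821A9920` from looping forever.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Breadth-first walk from `start` towards its static callers.
/// Returns the chain (start first) to the first function in `stop_set`,
/// or None if no caller chain reaches it.
pub fn walk_up<'a>(
    callers: &HashMap<&'a str, Vec<&'a str>>,
    start: &'a str,
    stop_set: &HashSet<&'a str>,
) -> Option<Vec<&'a str>> {
    let mut parent: HashMap<&'a str, &'a str> = HashMap::new();
    let mut seen: HashSet<&'a str> = HashSet::from([start]);
    let mut queue = VecDeque::from([start]);
    while let Some(f) = queue.pop_front() {
        if stop_set.contains(f) {
            // Rebuild the chain by following parent links back to `start`.
            let mut chain = vec![f];
            let mut cur = f;
            while let Some(&p) = parent.get(cur) {
                chain.push(p);
                cur = p;
            }
            chain.reverse();
            return Some(chain);
        }
        for &c in callers.get(f).map(Vec::as_slice).unwrap_or(&[]) {
            // `seen` guards against caller cycles.
            if seen.insert(c) {
                parent.insert(c, f);
                queue.push_back(c);
            }
        }
    }
    None
}
```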
### Discipline gate
- Box 1 (named α-class import / narrow internal sub): **NO** — γ-class, no kernel boundary; gate is structural.
- Box 2 (canary impl small): N/A.
- Box 3 (sharp 4-dim cascade prediction): **NO** — needs further state-write triage.
- Box 4 (no new ABI plumbing): N/A.
- Box 5: N/A.
Boxes 1 + 3 fail. Hand back per stop condition 1.
### Recommended next session (AUDIT-017)
1. Probe dispatcher caller layer: `0x822f1be8`, `0x822f1c04`, `sub_822F1AA8` (main's frame-poll loop — where main parks per AUDIT-009), `sub_821752C0` (jumps to `sub_82173DC8`).
2. Find writers of `[0x40ba9a80+4]` — byte-scan `.text` for `addi r?, ?, 4; stw r?, 0(r?)` patterns OR probe ALL functions that touch r3+4 with a stw (potentially via offset-write tracking). Identify the function that's supposed to set bit 14 / bit 15 of that field.
3. Probe inside `sub_82181D48` (default-high arm's secondary predicate): the `rlwinm r11, r11, 0, 30, 30` at `0x82181D74` reads `[[r3+0]+60]` bit 30 — find what writes this bit. If we can make `sub_82181D48` return 1, the default-high arm's `bctrl` fires → renderer cascade.
4. Verify probe-machinery anomaly (entry of `sub_82174040`).
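Item 3 depends on what `rlwinm r11, r11, 0, 30, 30` extracts. `rlwinm` rotates left by SH, then ANDs with a mask running from bit MB to bit ME in MSB-0 numbering. The mask wraps when MB > ME. With SH=0 and MB=ME=30 the mask is 0x2, so the predicate tests a single bit of `[[r3+0]+60]`. The sketch below encodes these PowerPC semantics. The function names are illustrative; this is not the emulator's interpreter code, and inputs are assumed to be in 0..=31.

```rust
/// Mask with ones from bit `mb` through bit `me` (PowerPC MSB-0 numbering).
/// When `mb > me` the mask wraps around (ones at both ends).
pub fn ppc_mask(mb: u32, me: u32) -> u32 {
    let from_mb = u32::MAX >> mb;      // ones at MSB-0 bits mb..=31
    let to_me = u32::MAX << (31 - me); // ones at MSB-0 bits 0..=me
    if mb <= me {
        from_mb & to_me
    } else {
        from_mb | to_me
    }
}

/// rlwinm rA, rS, SH, MB, ME: rotate rS left by SH, then AND with MASK(MB, ME).
pub fn rlwinm(rs: u32, sh: u32, mb: u32, me: u32) -> u32 {
    rs.rotate_left(sh) & ppc_mask(mb, me)
}
```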
### Trace artifacts
`audit-runs/audit-016-submitter-callers/probe.log` (run #1, 9 KB), `probe.err` (187 KB), `probe2.log` (run #2, 12 KB; +4 dump-addrs), `probe2.err` (187 KB).
## KRNBUG-AUDIT-017 — bit-14/15 writer triage; gate is β (`[0x828F4070+64]==-1`) with α tail (`XamUserGetSigninState=stub_return_zero`) (DIAGNOSTIC 2026-05-06)
**Status**: read-only diagnostic. No fix landed. Master HEAD `d736a1d` unchanged.
### Probe set
Static scan: `oris rN, rN, 0x1` or `oris rN, rN, 0x2` followed within 8 instructions by `stw rN, 4(rY)`. 5 candidates flagged. Runtime confirmation via `--branch-probe` at -n 500M + `--dump-addr=0x40ba9a80,0x828F48B0,0x828F4070`.
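The static scan can be sketched as a linear pass over big-endian `.text` words that decodes the D-form fields directly:

- `oris` is primary opcode 25, with fields `rS`, `rA`, `UIMM`.
- `stw` is primary opcode 36, with fields `rS`, `rA` and a signed `d`.

The sketch assumes a window of the 8 following instructions and accepts any base register. It does not check whether rN is overwritten between the two instructions. `scan_state_bit_writers` is a hypothetical name, and the audit's actual scanner may differ.

```rust
const OP_ORIS: u32 = 25;
const OP_STW: u32 = 36;

/// D-form fields of a 32-bit PowerPC word: (opcode, bits 6-10, bits 11-15, imm16).
fn fields(w: u32) -> (u32, u32, u32, u16) {
    (w >> 26, (w >> 21) & 0x1F, (w >> 16) & 0x1F, (w & 0xFFFF) as u16)
}

/// Find `oris rN, rN, 0x1|0x2` followed within `window` instructions by
/// `stw rN, 4(rY)`. `base` is the guest address of `words[0]`.
/// Returns (oris address, stw address, oris immediate) triples.
pub fn scan_state_bit_writers(base: u32, words: &[u32], window: usize) -> Vec<(u32, u32, u16)> {
    let mut hits = Vec::new();
    for (i, &w) in words.iter().enumerate() {
        let (op, rs, ra, imm) = fields(w);
        // oris encodes rS in bits 6-10 and rA in bits 11-15; require rS == rA.
        if op != OP_ORIS || rs != ra || (imm != 0x1 && imm != 0x2) {
            continue;
        }
        let end = (i + 1 + window).min(words.len());
        for (j, &w2) in words.iter().enumerate().take(end).skip(i + 1) {
            let (op2, rs2, _base_reg, d) = fields(w2);
            // stw stores rS at d(rA); match the same register and offset 4.
            if op2 == OP_STW && rs2 == rs && d as i16 == 4 {
                hits.push((base + 4 * i as u32, base + 4 * j as u32, imm));
                break;
            }
        }
    }
    hits
}
```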
### Decisive findings
1. **Static writer candidates** (5):
- `0x82173950` (sub_821737F0:bit-14, gated by `[r30+64]!=-1` AND XamUserGetSigninState ret-check)
- `0x82173e04` (sub_82173DC8 case-0xA:bit-15)
- `0x824d3ce8` (sub_824d3c78:bit-15, struct via `[parent+184]`)
- `0x824d3f24` (sub_824d3dc0:bit-14, struct via `[parent+184]`)
- `0x82769b84` (sub_82766db0:bit-15, struct stride 8 — false positive)
2. **Runtime: case-0xA fires once** at cycle 9183060 (PC 0x82173dfc), sets bit-15 of `[0x40ba9a80+4]`. Confirmed by EOR dump `[+0x0C]=0x000003E8` (case-0xA's subfic).
3. **sub_821737F0 work-path entered** at cycle 9183561 (lr=0x821737f8). Bit-15 cleared at 0x82173884. Bit-14 setter at 0x82173950 NEVER fires because at 0x821738E0, `cmpwi r3, -1; beq → 0x82173938` short-circuits (`r3=[r30+64]=0xFFFFFFFF`).
4. **r30 = `[0x828F48B0+0]` = `0x828F4070`** (singleton sub-object). EOR dump confirms `[0x828F4070+64]=0xFFFFFFFF`, initialized to -1 by `sub_821701c8` at 0x82170234. The only non-(-1) writer is `sub_82184318:0x82184374` (`bl 0x82456B58 (kernel handle creator); stw r3, 64(r30)`). Caller chain `sub_82184318 ← sub_82187768:0x821877bc ← sub_82187dd0:0x82187e78 ← sub_82183ca8:0x82183cd8 ← {sub_822919c8, sub_82186760, sub_821c88d0}`. **`sub_822919c8` is one of the audit-009 renderer-cluster L1 entry points that has zero non-call xrefs** — same γ-cluster blocked at audit-009/-016.
5. **bit-28 of `[0x828F4070+60]` IS set** at cycle 9224352 by `sub_821c4988:0x821c5450` — but 35,000 cycles AFTER case-9 fired. Also: bit-28 is a NEGATIVE gate at 0x821738F0 (`bne cr6, 0x82173938`) — bit-28 SET means NO bit-14. The positive gate is `[+64]!=-1`.
6. **Two orthogonal stubs uncovered (α tail)**:
- `XamUserGetSigninState` (xam.rs:48) is `stub_return_zero`. Even if β is fixed, sub_821737F0's bit-14 deep-eval at 0x82173904-0x82173938 takes the no-bit-14 path in 2 of its 3 sub-branches when ret==0. Separately, sub_822C2A80 at 0x822c2aac loops `XamUserGetSigninState(0..3)` searching for any signed-in user, which cannot succeed while the stub returns 0. Canary `xam_user.cc:90-101` returns `SignedInLocally=1` for the default profile.
### Bug class
**β-dominant + α-tail.** Primary β is structural — `[0x828F4070+64]==-1` because the ctor that fills it (`sub_82184318`) is in the same audit-009 renderer cluster that audit-016 also identified. Secondary α is XamUserGetSigninState=stub_return_zero (2 separate guest paths broken).
### Discipline gate
- Box 1: PARTIAL — α component named (XamUserGetSigninState) but not the dominant gate.
- Box 2: YES for α (5 LOC at `xam_user.cc:90-101`).
- Box 3: NO — β dominant, structural.
- Box 4-5: N/A.
Boxes 1+3 fail. Hand back per stop condition 1.
### Recommended next session (AUDIT-018)
- **Option A**: probe `sub_82184318, sub_82187768, sub_82187dd0, sub_82183ca8, sub_82186760, sub_821c88d0, sub_822919c8, sub_82456B58` at -n 500M to confirm the entire chain to `[singleton+64]` ctor is unreached. If all 8 fail to fire, this re-confirms γ-class structural blocker for the THIRD time (audit-009, -016, -017). Time to pivot strategy.
- **Option B**: canary-log diff during boot window 9.0M-9.3M cycles for any kernel call that writes a real handle to `0x828F4070+64`. Re-run `lutris lutris:rungameid/4` with kernel-call logging.
- **Option C** (cheap α): implement `XamUserGetSigninState` per canary (5 LOC). Will not fire cascade alone (β dominant) but is correct and unblocks sub_822C2A80.
- **Sharp 4-dim cascade prediction**: NEEDS FURTHER TRIAGE.
### Trace artifacts
`audit-runs/audit-017-state-bits-writer/probe{1..5}.log` + `.err` (probe.log: 13 lines, probe3.log: 133 lines incl. dumps, probe4.log: 7 lines, probe5.log: 3 lines).
---
### XamUserGetSigninState follow-up (post-AUDIT-017, master 7ed6192)
Landed inline as a small canary-mirror correctness fix. Branch `xam-user-signin-state/p0-canary-mirror`, no-ff merged.
- Impl returns `1` for user_index=0 (SignedInLocally), `0` otherwise. Mirrors canary `xam_user.cc:90-101`.
- Tests 599 → 600. Lockstep `instructions=100000012 → 100000006`, deterministic across 2 runs.
- **Cascade ripple**: `XamUserReadProfileSettings` now fires 2× (was canary-only). This is consistent with AUDIT-017's prediction (α-tail correctness fix; β still dominant).
- Remaining canary-only kernel exports: `ExTerminateThread`, `KeReleaseSemaphore`. Down from 3 to 2.
- Renderer L1 reachability + parked-handle signal_attempts unchanged — β-class blocker `[0x828F4070+64]==-1` unmoved (audit-017's structural finding).
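The landed behaviour is small enough to show in isolation. This sketch assumes the XDK `XUSER_SIGNIN_STATE` values: 0 = not signed in, 1 = signed in locally, 2 = signed in to Live. `xam_user_get_signin_state` and `first_signed_in_user` are hypothetical stand-ins for the export and for the guest loop in `sub_822C2A80`. They are not the crate's actual code.

```rust
/// XUSER_SIGNIN_STATE values (assumed from the Xbox 360 XDK).
pub const NOT_SIGNED_IN: u32 = 0;
pub const SIGNED_IN_LOCALLY: u32 = 1;

/// Mirror of the landed fix: only user slot 0 (the default profile) is signed in.
pub fn xam_user_get_signin_state(user_index: u32) -> u32 {
    if user_index == 0 { SIGNED_IN_LOCALLY } else { NOT_SIGNED_IN }
}

/// Shape of the guest loop in sub_822C2A80: probe slots 0..=3 for any signed-in user.
pub fn first_signed_in_user(get_state: impl Fn(u32) -> u32) -> Option<u32> {
    (0..4).find(|&i| get_state(i) != NOT_SIGNED_IN)
}
```

With the old `stub_return_zero`, the loop behaves like `first_signed_in_user(|_| 0)`, which never finds a user.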
## KRNBUG-AUDIT-018 — canary-log diff identifies α-class stub `KeResumeThread` (DIAGNOSTIC 2026-05-06)
**Status**: read-only diagnostic. No fix landed. Master HEAD `7ed6192` unchanged. Tests 600. Lockstep `instructions=100000006`.
### Method
Set-diff of kernel-call function names: ours (`audit-runs/audit-018-canary-diff/ours.log`, -n 500M) vs canary (`/home/fabi/xenia_canary_windows/xenia.log`, full boot to active rendering with `XamInputGetCapabilities` polling).
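The method reduces to a set difference over call names. The sketch assumes each log has already been reduced to one kernel-call name per entry. Neither emulator's log-line format is reproduced here, so parsing is left out. `canary_only` is a hypothetical helper. `BTreeSet` keeps the output sorted, which makes diffs between audits stable.

```rust
use std::collections::BTreeSet;

/// Names present in the canary trace but absent from ours, sorted.
pub fn canary_only<'a>(
    ours: impl IntoIterator<Item = &'a str>,
    canary: impl IntoIterator<Item = &'a str>,
) -> Vec<&'a str> {
    let ours: BTreeSet<&str> = ours.into_iter().collect();
    let canary: BTreeSet<&str> = canary.into_iter().collect();
    canary.difference(&ours).copied().collect()
}
```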
### Decisive findings
1. Function-name diff: only 2 calls present in canary, absent in ours: `ExTerminateThread`, `KeReleaseSemaphore` — both already on the audit-006 canary-only export queue.
2. **`KeReleaseSemaphore(828A3230, 1, 1, 0)`** is hammered by canary tid `F800006C` repeatedly (audio-render ticker). That thread is created via `ExCreateThread(..., entry=0x824D2878, ctx=0, flags=0x10000001)` and immediately followed by `ObReferenceObjectByHandle / KeSetBasePriorityThread / KeResumeThread / ObDereferenceObject`. Same pattern for entry `0x824D2940`.
3. In our run, both these threads are `Blocked(Suspended)` at end-of-run. Counters `KeResumeThread = 2` and `NtResumeThread = 6` match canary's call pattern.
4. **Root cause**: in `crates/xenia-kernel/src/exports.rs:3658-3664`, `ke_resume_thread` is a no-op cookie-returner that ignores r3 and sets r3=0. Its comment claims the "real `NtResumeThread` below handles the handle-based path properly". However, `KeResumeThread` is a separate export that takes a KTHREAD pointer. Our `ObReferenceObjectByHandle` hands that pointer back as a cookie equal to the handle itself (`exports.rs:3787-3807`). The fix is to mirror `nt_resume_thread`: `find_by_handle(handle)`, then `resume_ref(r)`.
5. Cross-reference: tid=17 (entry=0x82170430, ctx=0x828F4070, the audit-017 listener struct) IS spawned and parks on event handle 0x15E4 — same long-known parked dispatcher waiter. Worker body reads `[r29+56] (=[0x828F40A8])` as its loop predicate (clarification of audit-017's "+64" claim). Until tids 9/10 actually run, the audio-side cascade never starts.
### Bug class
**α (named import stub_success on a load-bearing export)**. `KeResumeThread` is registered (canary `kImplemented`) but our impl is a stub_success no-op that fails to actually unsuspend.
### Discipline gate
- Box 1 (named bug class with concrete evidence): YES.
- Box 2 (narrow fix ~5 LOC): YES.
- Box 3 (sharp 4-dim cascade prediction): YES (see memory file).
- Box 4 (no renderer/GPU changes): YES.
- Box 5 (lockstep determinism preserved): expected — same pattern as XamUserGetSigninState landing.
**All 5 boxes pass — first time since IO-004.**
### Sharp 4-dim cascade prediction
- **A (thread liveness)**: tids 9, 10 leave Suspended; XAudio voice-render workers run.
- **B (kernel counters)**: `KeReleaseSemaphore` non-zero for first time. `NtSetEvent` rises. Likely new `XAudioSubmitRenderDriverFrame`.
- **C (canary-only exports)**: 2→1 (`KeReleaseSemaphore` resolved). Possibly new audio-path exports.
- **D (listener `[+64]`)**: hypothesis-only — IF audit-017's β-class blocker is downstream of audio init, `[0x828F4070+64]` becomes non-(-1) and renderer cascade unblocks. If not, γ-cluster is independent → pivot to memory-watch instrumentation on `[+64]`.
### Recommended next session (KRNBUG-IO-005 or KRNBUG-α-005)
Implement 5-LOC fix on branch `ke-resume-thread/p0-canary-mirror`:
```rust
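fn ke_resume_thread(ctx: &mut PpcContext, _mem: &GuestMemory, state: &mut KernelState) {
    const STATUS_SUCCESS: u64 = 0x0000_0000;
    const STATUS_INVALID_HANDLE: u64 = 0xC000_0008;
    // r3 = KTHREAD pointer (cookied as the handle); return NTSTATUS like canary, not the suspend count.
    let handle = resolve_pseudo_handle(state, ctx.gpr[3] as u32);
    ctx.gpr[3] = match state.scheduler.find_by_handle(handle) {
        Some(r) => { let _ = state.scheduler.resume_ref(r); STATUS_SUCCESS }
        None => STATUS_INVALID_HANDLE,
    };
}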
```
Lockstep ×2. Evaluate cascade. Tests 600→601 (add a `ke_resume_thread` unit test mirroring `nt_resume_thread`).
### Trace artifacts
- `audit-runs/audit-018-canary-diff/ours.log` (full kernel trace + final-state thread diagnostics)
- `audit-runs/audit-018-canary-diff/ours.stdout.log` (counters)
- Canary: `/home/fabi/xenia_canary_windows/xenia.log` (untouched)
## KRNBUG-KE-001 — Real `KeResumeThread` (LANDED 2026-05-06)
`crates/xenia-kernel/src/exports.rs:3658-3669` — replaced the no-op cookie-returner with a canary-mirror real impl per `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc:216-227` (`XObject::GetNativeObject<XThread>(...)->Resume()` → `STATUS_SUCCESS`, else `STATUS_INVALID_HANDLE`). Routes the KTHREAD-pointer-as-handle through `resolve_pseudo_handle` + `scheduler.find_by_handle` + `scheduler.resume_ref`, mirroring `nt_resume_thread`'s plumbing two functions below.
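The status-code contract described above can be exercised against a toy scheduler. The two cases are STATUS_SUCCESS on a resolved thread and STATUS_INVALID_HANDLE otherwise. The sketch makes these assumptions:

- The scheduler tracks a per-thread suspend count keyed by the guest handle.
- Pseudo-handle resolution is omitted.
- `ToyScheduler`, its methods, and `ke_resume_thread_status` are hypothetical stand-ins, not the crate's `scheduler.find_by_handle` / `scheduler.resume_ref`.
- The NTSTATUS constants are the standard Windows values.

```rust
use std::collections::HashMap;

pub const STATUS_SUCCESS: u32 = 0x0000_0000;
pub const STATUS_INVALID_HANDLE: u32 = 0xC000_0008;

/// Toy scheduler: guest handle -> suspend count (0 means runnable).
#[derive(Default)]
pub struct ToyScheduler {
    suspend_counts: HashMap<u32, u32>,
}

impl ToyScheduler {
    pub fn spawn_suspended(&mut self, handle: u32) {
        self.suspend_counts.insert(handle, 1);
    }

    /// Decrement the suspend count (saturating at 0); return the previous count.
    pub fn resume(&mut self, handle: u32) -> Option<u32> {
        let count = self.suspend_counts.get_mut(&handle)?;
        let prev = *count;
        *count = prev.saturating_sub(1);
        Some(prev)
    }

    pub fn is_runnable(&self, handle: u32) -> bool {
        self.suspend_counts.get(&handle) == Some(&0)
    }
}

/// KeResumeThread contract: resolve, resume, report a status (not the previous count).
pub fn ke_resume_thread_status(sched: &mut ToyScheduler, kthread_as_handle: u32) -> u32 {
    match sched.resume(kthread_as_handle) {
        Some(_prev) => STATUS_SUCCESS,
        None => STATUS_INVALID_HANDLE,
    }
}
```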
### Cascade-prediction scorecard (audit-018 → post-fix)
- **A — thread liveness (PASS)**: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940) transition from `Blocked(Suspended)` → ran → now `Blocked(WaitAny)` on audio buffer-completion semaphores `0x828A3254` (handle 2190094932) / `0x828A3230` (handle 2190094896). Pre-fix they were Suspended at end-of-run; post-fix they execute their bodies and park on a downstream consumer wait.
- **B — counters (PARTIAL FAIL)**: `NtSetEvent 667→3334` (rises ~5×, audio frame-complete signaling). `KeResumeThread = 2` (now real). `NtResumeThread = 6`. **`KeReleaseSemaphore` still 0** (not in counters at all). **`XAudioSubmitRenderDriverFrame` still 0**. The workers ran their prologues and parked on a downstream gate before reaching `KeReleaseSemaphore`.
- **C — canary-only delta (FAIL — predicted 2→1, actual 2→2)**: `ExTerminateThread` and `KeReleaseSemaphore` both still canary-only. The audio render-tick semaphore-release loop is gated by something downstream of the audio worker prologue.
- **D — γ-cluster blocker (FAIL)**: `--pc-probe=0x82184318,0x82184374` armed, neither fires. `--dump-addr=0x828F4070` armed, no DUMP lines emitted. Listener struct `[0x828F4070+64]` unchanged. `--trace-handles-focus` shows handles 0x1004/0x100c/0x1020/0x15e4 all still `signal_attempts=0`.
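Several handles in these diagnostics are printed in decimal, such as `handles: [5604]` in AUDIT-014 and `handle 2190094932` in item A above. The prose refers to the same handles in hex. The sketch below normalises a decimal handle list to hex before comparing. It assumes a bracketed, comma-separated list of decimal `u32` values; `hex_handles` is a hypothetical helper.

```rust
/// Parse a bracketed decimal handle list like "[5604, 16777216]" and
/// render each entry as 0x-prefixed uppercase hex. None on malformed input.
pub fn hex_handles(list: &str) -> Option<Vec<String>> {
    let inner = list.trim().strip_prefix('[')?.strip_suffix(']')?;
    inner
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(|s| s.parse::<u32>().ok().map(|h| format!("{:#X}", h)))
        .collect()
}
```

Applied to item A, this maps 2190094932 / 2190094896 back to the guest addresses 0x828A3254 / 0x828A3230 quoted alongside them.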
### Milestone status
- Renderer cluster cascade collapsed? **NO**.
- signal_attempts > 0 on parked handles? **NO**.
- `draws > 0`? **NO** (still 0; `swaps` still 2).
### Verification
- 600 → 601 tests (`cargo test --workspace --release` clean; new `ke_resume_thread_unblocks_suspended_worker` covers Suspended→Ready transition + INVALID_HANDLE branch).
- Lockstep determinism: `instructions=100000003 imports=987516` × 2 reruns identical.
- `swaps=2 draws=0` plateau intact.
- Goldens re-baselined: `sylpheed_n50m.json instructions 50000003→50000011, imports 407255→407247`. n2m unchanged. Oracle test passes.
### Bug class (post-fact)
α (load-bearing stub_success). The fix unsticks two threads but those threads then park on a downstream gate that's part of a separate bug class — the audio voice-render dispatch never reaches `KeReleaseSemaphore`/`XAudioSubmitRenderDriverFrame` because the consumer-side semaphore producer is itself gated by something else (likely the same γ-cluster that audit-009/-016/-017 narrowed: `[0x828F4070+64]==-1`).
### Recommended next session
Audit-019: memory-watch instrumentation on `[0x828F4070+64]` (the pivot named in AUDIT-018's prediction D). With KE-001 landed, the discipline gate cleanly attributes the renderer plateau to this listener-struct field rather than to a stub upstream. That narrows the producer search to whichever code writes the 32-bit field at offset +64 of `0x828F4070`, the slot that `sub_821737F0` reads via r30 in AUDIT-017.
### Trace artifacts
- `audit-runs/post-ke-resume/lockstep_run{1,2}.json` (lockstep determinism)
- `audit-runs/post-ke-resume/run.{log,err}` (full 500M cascade verification)
- `audit-runs/post-ke-resume/probe.{log,err}` (γ-cluster pc-probe + dump-addr)
- `audit-runs/post-ke-resume/handles.{log,err}` (--trace-handles-focus)