Replace the no-op cookie-returner with a real impl per canary xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc:216-227 (XObject::GetNativeObject<XThread>()->Resume()). Mirrors nt_resume_thread plumbing two functions below: resolve_pseudo_handle -> scheduler.find_by_handle -> resume_ref. Returns STATUS_SUCCESS if the KTHREAD-pointer-as-handle resolves, STATUS_INVALID_HANDLE otherwise — matches canary's Resume()/!thread return semantics. Cascade-prediction scorecard (audit-018 -> post-fix): - A PASS: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940) leave Suspended -> run prologue -> park on audio buffer-completion semaphores 0x828A3254 / 0x828A3230. - B PARTIAL FAIL: NtSetEvent 667->3334; KeReleaseSemaphore=0; XAudioSubmitRenderDriverFrame=0. - C FAIL (predicted 2->1, actual 2->2): both ExTerminateThread + KeReleaseSemaphore still canary-only. - D FAIL: gamma-cluster blocker unchanged — pc-probe at 0x82184318/0x82184374 no fires; dump-addr 0x828F4070 no DUMP; signal_attempts on 0x1004/0x100c/0x1020/0x15e4 still 0. Necessary-but-not-sufficient: workers unsuspend but park on a downstream gate that's part of the audit-009/-016/-017 gamma cluster. Tests 600 -> 601 (+ke_resume_thread_unblocks_suspended_worker). Lockstep instructions=100000003 imports=987516 deterministic x2. Goldens re-baselined: sylpheed_n50m.json instructions 50000003->50000011, imports 407255->407247. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
20 KiB
Canary-Only Export Fix Queue (audit-006)
-
Status: POST-KE-001 (2026-05-06): 2 canary-only (XamUserReadProfileSettings DROPPED post-XamUserGetSigninState landing earlier; KE-001 unsuspended audio workers but KeReleaseSemaphore producer is downstream-gated and did NOT fire).
KeResumeThreadis now a real impl per canaryxboxkrnl_threading.cc:216-227(KRNBUG-KE-001, branchke-resume-thread/p0-canary-mirror). Cascade A passed: tids 9 (entry=0x824D2878) and 10 (entry=0x824D2940) leave Suspended → run prologue → park onWaitAnyfor audio buffer-completion semaphores0x828A3254/0x828A3230. Cascade B partial:NtSetEvent 667→3334(5×) butKeReleaseSemaphore=0andXAudioSubmitRenderDriverFrame=0— workers stuck before the producer. Cascade C predicted 2→1, actual 2→2 (ExTerminateThread,KeReleaseSemaphoreboth still canary-only). Cascade D:--pc-probe=0x82184318,0x82184374armed — neither fires;--dump-addr=0x828F4070no DUMP lines; γ-cluster blocker unchanged; signal_attempts on 0x1004/0x100c/0x1020/0x15e4 still 0. swaps=2 draws=0 plateau intact. Lockstepinstructions=100000003 imports=987516deterministic ×2. Goldens re-baselinedsylpheed_n50m.json instructions 50000003→50000011, imports 407255→407247. See KRNBUG-KE-001 inaudit-findings.md. -
Prior status (superseded by KE-001): POST-IO-004 (2026-05-06): 7 → 3 canary-only. Real
XamNotifyCreateListener+XNotifyGetNextlanded (KRNBUG-IO-004). Dispatch arm at0x822f1be8now fires;sub_82173DC8runs in a tight loop on tid=1; renderer-cluster L1 entries0x822c6870,0x824563e0,0x823ddb50are reached for the first time. 4 reclassified RE-FIRES (now reached):KeResetEvent,ObCreateSymbolicLink,XamTaskCloseHandle,XamTaskSchedule. Still canary-only:ExTerminateThread,KeReleaseSemaphore,XamUserReadProfileSettings— all REAL_BUT_UNREACHED at the new boot horizon. Worker count 18 → 20. signal_attempts on 0x15e0 = 1 (was 0). draws=0 still expected at this step. See KRNBUG-IO-004 inaudit-findings.mdandproject_xenia_rs_io_004_xnotify_listener_2026_05_06.md. -
Prior status (superseded by IO-004): AUDIT-009 (2026-05-05): GATE IS HIGHER THAN THE CLUSTER ITSELF. AUDIT-008's β-hypothesis (gate sits among the 5 callers of
sub_821800D8in 0x82287000-0x82292FFF) is falsified: a 21-PC--branch-probe(the 6 parents + 5 shims + dispatcher + 9 audit-005 producer-callsites) shows 0/21 firings at -n 500M (audit-runs/audit-009/probe-500m.err). The whole 0x82287000-0x82294000 cluster is unreached. Static analysis: the cluster's level-1 root functions (sub_82293448,sub_822919C8) have zero non-call xrefs in sylpheed.db — they are reached only via vtable / function-pointer that's never written. Main parks atsub_822F1AA8frame-poll loop forever (1.49M XNotifyGetNext iterations). Three canary-only exports (ExTerminateThread,KeReleaseSemaphore,XamUserReadProfileSettings) remain REAL_BUT_UNREACHED — same as audit-008. DO NOT pull from this queue. Next-session probe set: cluster L1 roots + new thread entry trampolines (0x822c6870 / 0x824563e0 / 0x823dde30 / 0x823ddb50) + main's frame-poll callees + main's post-poll continuation list. See KRNBUG-AUDIT-009 inaudit-findings.mdandproject_xenia_rs_audit_009_renderer_unreached_2026_05_05.md. -
Prior status (superseded by AUDIT-009): AUDIT-008 MODEL RESET (2026-05-05). 0x100c worker IS spawned post-IO-003 as tid=3 (ctx=0x828F3D08), 0x1004 as tid=11, 0x15e0 as tid=17. AUDIT-008 hypothesized the gate among the 5 non-create-chain callers of
sub_821800D8whose parents live in 0x82287000-0x82292FFF. AUDIT-009 falsified that — those parents are themselves never entered, so the gate is one level above. -
Prior status (superseded by AUDIT-008): PARTIAL CASCADE (2026-05-04, post-KRNBUG-IO-003). 7 → 3 canary-only exports.
NtDeviceIoControlFilereal impl landed; the priv-11 query (XexCheckExecutablePrivilege(0xB)) andXamTaskSchedulenow fire. Reclassified (now firing on our side):KeResetEvent,ObCreateSymbolicLink,XamTaskCloseHandle,XamTaskSchedule. Bonus pickups:XeCryptSha,XeKeysConsolePrivateKeySign(both 0→1 — were not on the canary-only list because they were already inours_exportsbut unreached). Still canary-only:ExTerminateThread,KeReleaseSemaphore,XamUserReadProfileSettings.Worker thread spawn count unchanged at 19; handle 0x100c remains UNCREATED.(audit-008: 0x100c worker IS spawned, claim was wrong.) See KRNBUG-IO-003 inaudit-findings.mdandproject_xenia_rs_io_003_ioctl_2026_05_04.md. -
Prior status (now superseded): SUPERSEDED by AUDIT-007 (2026-05-04). Real gate identified:
NtDeviceIoControlFile(FsCtlCode=0x74004) isstub_successatcrates/xenia-kernel/src/exports.rs:90. Game-sidesub_824ABD88:0x824abea8-acreads[out_buf+8]of the IOCTL response, finds zero (stub doesn't write OUT), assigns hardcoded0xC0000034(STATUS_OBJECT_NAME_NOT_FOUND); callersub_824A9710exits at0x824a9944before priv-11. Tier 4 entries remain parked, classification unchanged (still REAL_BUT_UNREACHED), awaiting KRNBUG-IO-003. Seeproject_xenia_rs_audit_007_branch_probe_2026_05_04.mdfor the runtime trace + decisive proof. -
Prior status: PARTIAL — KRNBUG-IO-002 landed, but predicted cascade did NOT fire (7 → 7). Tier 0 marked superseded; Tier 4 entries STILL parked. Re-audit needed to find the real upstream gate.
-
Pre-state: master HEAD
556a8c3, exports diff captured 2026-05-04 -
Post-IO-002 state: branch
xboxkrnl-vol-allocunit/p0-65536-cluster, fresh 500 M trace ataudit-runs/post-IO-002/. Canary-only kernel exports remain identical:{ExTerminateThread, KeReleaseSemaphore, KeResetEvent, ObCreateSymbolicLink, XamTaskCloseHandle, XamTaskSchedule, XamUserReadProfileSettings}. -
Inputs:
canary.log(348720 B, identical to audit-005 oracle, canary build9467c77f0)ours.log(692 MB, 5.6 M trace lines, run at 17:20–17:21 today, post-IO-001)
-
Tooling:
diff.py+ plaincomm -23set-difference on extracted call names
Headline finding
7/7 canary-only entries classify as REAL_BUT_UNREACHED or STUB_BUT_UNREACHED. Per the audit-006 spec stop condition ("if two-thirds of entries are REAL_BUT_UNREACHED, the problem isn't stubs — it's an upstream gate"), the next session should NOT pull a Tier-1 entry from this queue. Instead, it should fix the gate.
The gate is KRNBUG-IO-002: our nt_query_volume_information_file
class-3 (FileFsSizeInformation) returns alloc_unit = 1 × 2048 = 2048,
but Sylpheed's main(1, 0x10000, 0xFF000) expects alloc_unit = 65536
(see project_xenia_rs_io_nullfile_2026_05_04.md).
Sylpheed's verifier sub_824ABA98 rejects 2048, propagates failure to
sub_824A9710, which exits early before its XexCheckExecutablePrivilege(0xB)
call site. Canary fires the priv-11 query and the entire downstream
cluster (XamTaskSchedule → Cache0 callback thread → 0x100c worker spawn
→ display-init pump → profile-settings cascade); we fire none of it.
Direct evidence (telemetry):
- Our
XexCheckExecutablePrivilegecount = 1 (priv=0xA only). Canary count = 2 (priv=0xA + priv=0xB). - All 7 canary-only entries have ours-side count = 0 at -n 500M.
- Our trace ends with main thread (hw=0) parked on
XNotifyGetNext + NtWaitForSingleObjectEx(0x10f4, lr=0x824ac578)and hw=1 parked onNtWaitForMultipleObjectsEx(lr=0x824ab214) + cs=0x828f3e70— classic post-cache-recreate spin. - The 44
NtWriteFilecalls in ours.log (cache zero-fill) are followed by more NtClose / NtCreateFile cycles, butXexCheckExecutablePrivilege(0xB)never fires → priv-11 site insub_824A9710is unreached. - Memory's predicted
0xC000014Fdoes not yet appear in ours.log; first cache-related error is0xC0000034(OBJECT_NAME_NOT_FOUND) fromlr=0x824a97e4. This still fits the gate hypothesis: the recreate path is reached, completes its writes, re-opens, queries volume info, and the game-side verifier rejects our reply silently (no kernel error).
Tier 0 — upstream gate (SUPERSEDED 2026-05-04 — fix landed but cascade did NOT fire)
KRNBUG-IO-002 — nt_query_volume_information_file block size — LANDED, gate hypothesis FALSIFIED
Outcome: the block-size literals at exports.rs:1255-1256 were corrected
to canary's NullDevice values (sectors_per_unit=0x80, bytes_per_sector=0x200,
product 0x10000). 591 → 592 tests, lockstep instructions=100000010, swaps=2, draws=0 deterministic across two reruns (audit-runs/post-IO-002/lock_n100m_run{1,2}.json).
sylpheed_n50m oracle still matches its existing golden (no observable change at -n 50M).
However, the predicted cascade DID NOT fire. Set-difference on a fresh
500 M trace (audit-runs/post-IO-002/ours.log) produces the identical
seven-entry canary-only set audit-006 captured pre-fix:
ExTerminateThread, KeReleaseSemaphore, KeResetEvent,
ObCreateSymbolicLink, XamTaskCloseHandle, XamTaskSchedule,
XamUserReadProfileSettings
XexCheckExecutablePrivilege count remains 1 (priv=0xA only, priv=0xB
unreached). XamTaskSchedule count remains 0. Worker thread spawns
fell from 19 → 18 (within noise — single thread variance per call-site
breakdown: lr=0x824ac5f0×15 + 0x824cd984×1 + 0x824d2e68×2). The 16
NtQueryVolumeInformationFile call sites in ours.log all originate from
a single LR 0x82611f38 — meaning the audit-006 premise that
sub_824ABA98/sub_824A9710 consume the volume-info reply at the
priv-11 gate may be incorrect, or the gate consumes a different
information class entirely.
Stop-condition triggered. Per the IO-002 task brief, this session does not pivot to a second fix. The fix is correct (it makes our reply byte-identical to canary's NullDevice and survives every test we have); it is just not load-bearing for the priv-11 gate. The branch landed as a strict no-op at our current boot horizon — kept because it's correct and unblocks no regression.
Next-session next gate hypothesis (untested):
- The audit-005 disasm of
sub_824ABA98may have mis-attributed the consumer of bytes_per_sector. The IO-001 trace decisively located the failure at theNtReadFileinsidesub_824A9710, not at any volume-info site. Re-read thesub_824A9710disasm with that in mind. - Volume-info LR
0x82611f38is far downstream of the priv-10/priv-11 cluster (the calls complete successfully — they don't gate anything visible). The actual gate may bent_query_information_file,nt_query_full_attributes_file, an FsCtl IOCTL, or a different alloc-unit query path. - Per AUDIT-005 instrumentation, the priv-11 site at
sub_824A9710PC cluster has never fired in any session. Probesub_824A9710entry with--pc-probeand trace which conditional exits the function before the priv-11 query — that's the real gate.
KRNBUG-IO-002 — nt_query_volume_information_file block size (original spec, kept for archaeology)
- Where in our code:
crates/xenia-kernel/src/exports.rs:1241-1269(functionnt_query_volume_information_file). - Classification:
REAL_BUT_BUGGY. Registered at exports.rs:100, called 16× in ours.log (16× in canary.log too — call counts match), returnsSTATUS_SUCCESS, but the FileFsSizeInformation payload is wrong. - Bug: class=3 branch writes
(total=0x100000, free=0, sectors_per_unit=1, bytes_per_sector=2048). Product = 2048 bytes per cluster. - Canary reference:
- Entry function
NtQueryVolumeInformationFile_entryatxenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io_info.cc:323(caseXFileFsSizeInformationat lines 355–365). Canary delegates to per-device methods onfile->device(). NullDevice(the device backing\Device\Harddisk0\Cache0) returnssectors_per_allocation_unit() = 0x80andbytes_per_sector() = 0x200atxenia-canary/src/xenia/vfs/devices/null_device.h:38-46. Product = 65536, matching Sylpheed's expectation.- Other device backings (HostPath, DiscImage, DiscZArchive) all return
(1, 0x200)= 512. Sylpheed's first volume query at this site is against Cache0 (NullDevice), so the relevant value is 65536.
- Entry function
- Fix sketch (minimum): in the class=3 branch, change the two writes to
mem.write_u32(info+16, 128); mem.write_u32(info+20, 512);(and reduce TotalAllocationUnits accordingly so the disk size remains plausible — e.g. 0x10 units × 128 sectors × 512 bytes ≈ 1 MB, matching NullDevice). Total diff ≤ 4 lines. - Fix sketch (proper, deferred until the cluster fires reliably): introduce a per-handle device-info lookup so HostPath / DiscImage paths return their canary-correct values too. Skipped for now because Sylpheed only queries Cache0 at this gate.
- Expected observable post-fix:
XexCheckExecutablePrivilegecount: 1 → 2.XamTaskSchedulecount: 0 → 1 (callback=0x824A93C8, message=0x828A28F0).kernel.calls{XamTaskSchedule}finally non-zero — closes the APUBUG-PRODUCER-001 / XAMBUG-PRODUCER-001 producer hunt that falsified XAudio + XamTaskSchedule producer hypotheses.- Spawn of the Cache0 callback thread (XThread::Execute thid 7 in canary, our equivalent to come).
- Inside that thread:
StfsCreateDevice(still undefined extern in canary too — does not block) +ObCreateSymbolicLink+ExRegisterTitleTerminateNotification. - Back on main:
KeResetEvent(0x8287094C),NtCreateEvent,ExCreateThread(entry=0x82181830, ctx=0x828F3D08)— and0x82181830is the worker entry for dispatcher 0x100c, one of the four parked-handle producers (perproject_xenia_rs_producer_stack_trace_2026_05_03.md). Spawning that worker should advance handle 0x100c'ssignal_attemptscounter off zero. - Eventually (further into the boot):
XamUserGetXUID,XamUserReadProfileSettings,XamContentCreateEnumerator,KeReleaseSemaphoredisplay-pump (268+ calls in canary at this horizon).
- Risk: low. Two-line value change. NullDevice is the only device Sylpheed asks about at this gate; other devices are not yet hit.
- Effort: trivial.
- Dependencies: none. Land directly.
- Verification chain:
cargo test -p xenia-kernel, thencargo run --release -p xenia-app -- exec sylpheed.iso -n 500_000_000with kernel-call tracing on, then re-run audit-006's set-difference; expect canary-only count to drop from 7 toward 0 as the cluster fires.
Tier 4 — REAL_BUT_UNREACHED / STUB_BUT_UNREACHED — do not fix yet
These are downstream of Tier 0. Reachability is blocked on KRNBUG-IO-002 landing. After IO-002 lands, re-derive this list — most entries should have moved off, and any survivors will be classifiable on real evidence.
| # | Export | Ordinal | Library | Our state | Canary impl | Canary calls (at horizon) | Cascade rank |
|---|---|---|---|---|---|---|---|
| 1 | XamTaskSchedule |
0x01AF | xam | REAL_BUT_UNREACHED (xam_task_schedule, xam.rs:213) |
xam_task.cc:43-80 |
1 (gate-pivot call) | upstream-of-cluster — fires the entire post-IO-002 cascade |
| 2 | XamTaskCloseHandle |
0x01B1 | xam | STUB_BUT_UNREACHED (stub_success, xam.rs:33) |
xam_task.cc:83-93 (one-liner: NtClose + last-error) |
1 | low (cleanup after #1) |
| 3 | KeResetEvent |
0x8F | xboxkrnl | REAL_BUT_UNREACHED (ke_reset_event, exports.rs:3172) |
xboxkrnl_threading.cc:566 |
1 | medium — clears 0x8287094C right before ExCreateThread(0x82181830) on main |
| 4 | ObCreateSymbolicLink |
0x0103 | xboxkrnl | STUB_BUT_UNREACHED (stub_success, exports.rs:121) |
xboxkrnl_ob.cc:351 |
1 | low — Cache0-symlink registration; cosmetic for Sylpheed boot |
| 5 | KeReleaseSemaphore |
0x88 | xboxkrnl | REAL_BUT_UNREACHED (ke_release_semaphore, exports.rs:3280) |
xboxkrnl_threading.cc:724 |
268 | high (in volume) — display-init pump on the post-cluster main loop |
| 6 | ExTerminateThread |
0x19 | xboxkrnl | REAL_BUT_UNREACHED (ex_terminate_thread, exports.rs:312) |
xboxkrnl_threading.cc:173 |
2 | low — thread cleanup on Cache0 / profile threads |
| 7 | XamUserReadProfileSettings |
0x0219 | xam | REAL_BUT_UNREACHED (xam_user_read_profile_settings, xam.rs:327) |
xam_user.cc:329 |
2 | medium — gates the XamUserGetXUID → profile load flow far downstream |
Why every entry above is Tier 4 (not Tier 1):
- Each entry's first call in
canary.logfalls after line 1210 (XamTaskSchedule(824A93C8, ...)), which is the gate-pivot call. - Our trace contains zero of any of the seven, despite running 500 M instructions and reaching the post-cache-recreate horizon.
- Six of the seven are already real implementations. The two stubs
(
XamTaskCloseHandle,ObCreateSymbolicLink) are minor cleanups; even upgrading them would not move boot progress until #1 (XamTaskSchedule) fires. - Therefore: fixing any of these in isolation is wasted effort. They should be re-classified after KRNBUG-IO-002 lands and the priv-11 / Cache0 callback chain runs.
Tier 1 / 2 / 3 — empty for this audit
No entry qualifies as Tier 1 or Tier 2 in the current state. The single
high-cascade fix worth pulling next is the Tier-0 gate (KRNBUG-IO-002),
which is not itself a canary-only export — it's a wrong-value bug in
an export both sides call, so the diff.py based set-difference doesn't
surface it. That is exactly why audit-006 was scoped this way: to confirm
the gate hypothesis from project_xenia_rs_io_nullfile_2026_05_04.md
before another implementation session is started.
Cross-check vs IO-001 snapshot
IO-001 memory recorded these 7 still-canary-only exports:
ExTerminateThread, KeReleaseSemaphore, KeResetEvent, ObCreateSymbolicLink, XamTaskCloseHandle, XamTaskSchedule, XamUserReadProfileSettings.
Audit-006 set-difference produces the identical 7, in 1:1 correspondence. No new canary-only export has appeared since IO-001 landed; no entry has moved off. Cascade is still parked at the same gate.
The XeCryptSha, XeKeysConsolePrivateKeySign, and NtDeviceIoControlFile
entries that IO-001 was credited with unblocking are confirmed: ours
calls them 1, 1, 2 times respectively (canary calls them 1, 1, 2 — exact
match). They are correctly off the canary-only list.
Methodology notes
- "Cascade rank" definition: estimated by where the export's first canary call falls in the boot sequence and how many downstream code paths depend on it. "high" = upstream-of-cluster (XamTaskSchedule). "medium" = intermediate (KeResetEvent, profile cascade). "low" = leaf cleanup or cosmetic (XamTaskCloseHandle, ObCreateSymbolicLink). Rank only matters once Tier 0 is landed; until then everything is parked.
- Reachability oracle: binary
grep -c "call=NAME"against ours.log at -n 500M. Zero counts are conclusive for "unreached" because tracing is unconditional. - Canary log freshness: the log is from 17:34 (3 h before this
audit) but is byte-identical to audit-005's input — canary's behavior
is deterministic given the same ROM and the canary build header
(
canary_experimental@9467c77f0 on May 2 2026) hasn't changed. Re-running through Lutris is unnecessary. - Gate confirmation: memory predicted block-size mismatch as the
IO-002 blocker; this audit confirmed it by eliminating the alternative
(no Tier-1-eligible canary-only export exists in the current 7-entry
list). The 0xC000014F status memory predicted is not yet visible in
ours.log because the recreate path completes the writes — the
verifier inside
sub_824ABA98rejects the volume-info reply at the game level (no kernel error logged). - What this queue is not: a list of fixes to land. The audit-006 discipline was scoping; the discipline of subsequent sessions is to re-run audit-006's diff after IO-002, then either close audit-006 (if the cluster fires through and all 7 entries drop) or open audit-007 on whatever new canary-only set surfaces.
Recommended next session
KRNBUG-IO-002 (block-size fix), one-shot. Two-line edit at
crates/xenia-kernel/src/exports.rs:1255-1256. Verify the cluster fires
by re-running audit-006's set-difference; expect 7 → 0 (or close to 0)
canary-only entries. If new entries surface in either direction, that's
audit-007's input.
Do not open this queue's Tier 4 entries before IO-002 closes. Their classification is pending; their fix sketches will look very different once they're observably called and their actual return values can be compared to canary.