audit-findings.md: full IO-004 entry with cascade-prediction scorecard. audit-runs/audit-006/canary_export_queue.md: post-IO-004 status note (7 -> 3 canary-only; 4 reclassified RE-FIRES). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
280 lines
19 KiB
Markdown
280 lines
19 KiB
Markdown
# Canary-Only Export Fix Queue (audit-006)
|
||
|
||
- Status: **POST-IO-004 (2026-05-06): 7 → 3 canary-only.** Real `XamNotifyCreateListener` + `XNotifyGetNext` landed (KRNBUG-IO-004). Dispatch arm at `0x822f1be8` now fires; `sub_82173DC8` runs in a tight loop on tid=1; renderer-cluster L1 entries `0x822c6870`, `0x824563e0`, `0x823ddb50` are reached for the first time. 4 reclassified RE-FIRES (now reached): `KeResetEvent`, `ObCreateSymbolicLink`, `XamTaskCloseHandle`, `XamTaskSchedule`. Still canary-only: `ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings` — all REAL_BUT_UNREACHED at the new boot horizon. Worker count 18 → 20. signal_attempts on 0x15e0 = 1 (was 0). draws=0 still expected at this step. See KRNBUG-IO-004 in `audit-findings.md` and `project_xenia_rs_io_004_xnotify_listener_2026_05_06.md`.
|
||
|
||
- Prior status (superseded by IO-004): **AUDIT-009 (2026-05-05): GATE IS HIGHER THAN THE CLUSTER ITSELF.** AUDIT-008's β-hypothesis (gate sits among the 5 callers of `sub_821800D8` in 0x82287000-0x82292FFF) is **falsified**: a 21-PC `--branch-probe` (the 6 parents + 5 shims + dispatcher + 9 audit-005 producer-callsites) shows **0/21 firings** at -n 500M (`audit-runs/audit-009/probe-500m.err`). The whole 0x82287000-0x82294000 cluster is unreached. Static analysis: the cluster's level-1 root functions (`sub_82293448`, `sub_822919C8`) have **zero non-call xrefs in sylpheed.db** — they are reached only via vtable / function-pointer that's never written. Main parks at `sub_822F1AA8` frame-poll loop forever (1.49M XNotifyGetNext iterations). Three canary-only exports (`ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings`) remain REAL_BUT_UNREACHED — same as audit-008. **DO NOT pull from this queue.** Next-session probe set: cluster L1 roots + new thread entry trampolines (0x822c6870 / 0x824563e0 / 0x823dde30 / 0x823ddb50) + main's frame-poll callees + main's post-poll continuation list. See KRNBUG-AUDIT-009 in `audit-findings.md` and `project_xenia_rs_audit_009_renderer_unreached_2026_05_05.md`.
|
||
|
||
- Prior status (superseded by AUDIT-009): **AUDIT-008 MODEL RESET (2026-05-05).** 0x100c worker IS spawned post-IO-003 as tid=3 (ctx=0x828F3D08), 0x1004 as tid=11, 0x15e0 as tid=17. AUDIT-008 hypothesized the gate among the 5 non-create-chain callers of `sub_821800D8` whose parents live in 0x82287000-0x82292FFF. AUDIT-009 falsified that — those parents are themselves never entered, so the gate is one level above.
|
||
|
||
- Prior status (superseded by AUDIT-008): **PARTIAL CASCADE (2026-05-04, post-KRNBUG-IO-003). 7 → 3 canary-only exports.** `NtDeviceIoControlFile` real impl landed; the priv-11 query (`XexCheckExecutablePrivilege(0xB)`) and `XamTaskSchedule` now fire. **Reclassified (now firing on our side):** `KeResetEvent`, `ObCreateSymbolicLink`, `XamTaskCloseHandle`, `XamTaskSchedule`. **Bonus pickups:** `XeCryptSha`, `XeKeysConsolePrivateKeySign` (both 0→1 — were not on the canary-only list because they were already in `ours_exports` but unreached). **Still canary-only:** `ExTerminateThread`, `KeReleaseSemaphore`, `XamUserReadProfileSettings`. ~~Worker thread spawn count unchanged at 19; handle 0x100c remains UNCREATED.~~ (audit-008: 0x100c worker IS spawned, claim was wrong.) See KRNBUG-IO-003 in `audit-findings.md` and `project_xenia_rs_io_003_ioctl_2026_05_04.md`.
|
||
|
||
- Prior status (now superseded): **SUPERSEDED by AUDIT-007 (2026-05-04). Real gate identified: `NtDeviceIoControlFile` (FsCtlCode=0x74004) is `stub_success` at `crates/xenia-kernel/src/exports.rs:90`. Game-side `sub_824ABD88:0x824abea8-ac` reads `[out_buf+8]` of the IOCTL response, finds zero (stub doesn't write OUT), assigns hardcoded `0xC0000034` (STATUS_OBJECT_NAME_NOT_FOUND); caller `sub_824A9710` exits at `0x824a9944` before priv-11. Tier 4 entries remain parked, classification unchanged (still REAL_BUT_UNREACHED), awaiting KRNBUG-IO-003. See `project_xenia_rs_audit_007_branch_probe_2026_05_04.md` for the runtime trace + decisive proof.**
|
||
|
||
- Prior status: **PARTIAL — KRNBUG-IO-002 landed, but predicted cascade did NOT fire (7 → 7). Tier 0 marked superseded; Tier 4 entries STILL parked. Re-audit needed to find the real upstream gate.**
|
||
- Pre-state: master HEAD `556a8c3`, exports diff captured 2026-05-04
|
||
- Post-IO-002 state: branch `xboxkrnl-vol-allocunit/p0-65536-cluster`, fresh 500 M trace at `audit-runs/post-IO-002/`. Canary-only kernel exports remain identical: `{ExTerminateThread, KeReleaseSemaphore, KeResetEvent, ObCreateSymbolicLink, XamTaskCloseHandle, XamTaskSchedule, XamUserReadProfileSettings}`.
|
||
- Inputs:
|
||
- `canary.log` (348720 B, identical to audit-005 oracle, canary build `9467c77f0`)
|
||
- `ours.log` (692 MB, 5.6 M trace lines, run at 17:20–17:21 today, post-IO-001)
|
||
- Tooling: `diff.py` + plain `comm -23` set-difference on extracted call names
|
||
|
||
## Headline finding
|
||
|
||
**7/7 canary-only entries classify as REAL_BUT_UNREACHED or STUB_BUT_UNREACHED.**
|
||
Per the audit-006 spec stop condition ("if two-thirds of entries are
|
||
REAL_BUT_UNREACHED, the problem isn't stubs — it's an upstream gate"),
|
||
the next session should **NOT** pull a Tier-1 entry from this queue.
|
||
Instead, it should fix the gate.
|
||
|
||
The gate is **KRNBUG-IO-002**: our `nt_query_volume_information_file`
|
||
class-3 (FileFsSizeInformation) returns alloc_unit = 1 × 2048 = 2048,
|
||
but Sylpheed's `main(1, 0x10000, 0xFF000)` expects alloc_unit = 65536
|
||
(see `project_xenia_rs_io_nullfile_2026_05_04.md`).
|
||
Sylpheed's verifier `sub_824ABA98` rejects 2048, propagates failure to
|
||
`sub_824A9710`, which exits early before its `XexCheckExecutablePrivilege(0xB)`
|
||
call site. Canary fires the priv-11 query *and* the entire downstream
|
||
cluster (`XamTaskSchedule` → Cache0 callback thread → 0x100c worker spawn
|
||
→ display-init pump → profile-settings cascade); we fire none of it.
|
||
|
||
Direct evidence (telemetry):
|
||
- Our `XexCheckExecutablePrivilege` count = **1** (priv=0xA only).
|
||
Canary count = **2** (priv=0xA + priv=0xB).
|
||
- All 7 canary-only entries have ours-side count = **0** at -n 500M.
|
||
- Our trace ends with main thread (hw=0) parked on `XNotifyGetNext +
|
||
NtWaitForSingleObjectEx(0x10f4, lr=0x824ac578)` and hw=1 parked on
|
||
`NtWaitForMultipleObjectsEx(lr=0x824ab214) + cs=0x828f3e70` —
|
||
classic post-cache-recreate spin.
|
||
- The 44 `NtWriteFile` calls in ours.log (cache zero-fill) are followed by
|
||
more NtClose / NtCreateFile cycles, but `XexCheckExecutablePrivilege(0xB)`
|
||
never fires → priv-11 site in `sub_824A9710` is unreached.
|
||
- Memory's predicted `0xC000014F` does not yet appear in ours.log; first
|
||
cache-related error is `0xC0000034` (OBJECT_NAME_NOT_FOUND) from
|
||
`lr=0x824a97e4`. This still fits the gate hypothesis: the recreate path
|
||
is reached, completes its writes, re-opens, queries volume info, and
|
||
the *game-side* verifier rejects our reply silently (no kernel error).
|
||
|
||
---
|
||
|
||
## Tier 0 — upstream gate (SUPERSEDED 2026-05-04 — fix landed but cascade did NOT fire)
|
||
|
||
### KRNBUG-IO-002 — `nt_query_volume_information_file` block size — **LANDED, gate hypothesis FALSIFIED**
|
||
|
||
**Outcome:** the block-size literals at `exports.rs:1255-1256` were corrected
|
||
to canary's NullDevice values (`sectors_per_unit=0x80, bytes_per_sector=0x200`,
|
||
product `0x10000`). 591 → 592 tests, lockstep `instructions=100000010, swaps=2,
|
||
draws=0` deterministic across two reruns (`audit-runs/post-IO-002/lock_n100m_run{1,2}.json`).
|
||
sylpheed_n50m oracle still matches its existing golden (no observable change at -n 50M).
|
||
|
||
**However, the predicted cascade DID NOT fire.** Set-difference on a fresh
|
||
500 M trace (`audit-runs/post-IO-002/ours.log`) produces the **identical**
|
||
seven-entry canary-only set audit-006 captured pre-fix:
|
||
|
||
```
|
||
ExTerminateThread, KeReleaseSemaphore, KeResetEvent,
|
||
ObCreateSymbolicLink, XamTaskCloseHandle, XamTaskSchedule,
|
||
XamUserReadProfileSettings
|
||
```
|
||
|
||
`XexCheckExecutablePrivilege` count remains **1** (priv=0xA only, priv=0xB
|
||
unreached). `XamTaskSchedule` count remains **0**. Worker thread spawns
|
||
fell from 19 → 18 (within noise — single thread variance per call-site
|
||
breakdown: `lr=0x824ac5f0×15 + 0x824cd984×1 + 0x824d2e68×2`). The 16
|
||
NtQueryVolumeInformationFile call sites in `ours.log` all originate from
|
||
a single LR `0x82611f38` — meaning the `audit-006` premise that
|
||
`sub_824ABA98`/`sub_824A9710` consume the volume-info reply at the
|
||
priv-11 gate may be **incorrect**, or the gate consumes a *different*
|
||
information class entirely.
|
||
|
||
**Stop-condition triggered.** Per the IO-002 task brief, this session does
|
||
not pivot to a second fix. The fix is correct (it makes our reply
|
||
byte-identical to canary's NullDevice and survives every test we have);
|
||
it is just not load-bearing for the priv-11 gate. The branch landed as a
|
||
strict no-op at our current boot horizon — kept because it's correct and
|
||
unblocks no regression.
|
||
|
||
**Next-session next gate hypothesis (untested):**
|
||
- The audit-005 disasm of `sub_824ABA98` may have mis-attributed the consumer
|
||
of bytes_per_sector. The IO-001 trace decisively located the failure at
|
||
the `NtReadFile` inside `sub_824A9710`, not at any volume-info site.
|
||
Re-read the `sub_824A9710` disasm with that in mind.
|
||
- Volume-info LR `0x82611f38` is far downstream of the priv-10/priv-11
|
||
cluster (the calls *complete* successfully — they don't gate anything
|
||
visible). The actual gate may be `nt_query_information_file`,
|
||
`nt_query_full_attributes_file`, an FsCtl IOCTL, or a different
|
||
alloc-unit query path.
|
||
- Per AUDIT-005 instrumentation, the priv-11 site at `sub_824A9710` PC
|
||
cluster has **never fired** in any session. Probe `sub_824A9710` entry
|
||
with `--pc-probe` and trace which conditional exits the function before
|
||
the priv-11 query — that's the real gate.
|
||
|
||
---
|
||
|
||
### KRNBUG-IO-002 — `nt_query_volume_information_file` block size (original spec, kept for archaeology)
|
||
|
||
- **Where in our code:** `crates/xenia-kernel/src/exports.rs:1241-1269` (function
|
||
`nt_query_volume_information_file`).
|
||
- **Classification:** `REAL_BUT_BUGGY`. Registered at exports.rs:100, called
|
||
16× in ours.log (16× in canary.log too — call counts match), returns
|
||
`STATUS_SUCCESS`, but the FileFsSizeInformation payload is wrong.
|
||
- **Bug:** class=3 branch writes `(total=0x100000, free=0,
|
||
sectors_per_unit=1, bytes_per_sector=2048)`. Product = 2048 bytes per
|
||
cluster.
|
||
- **Canary reference:**
|
||
- Entry function `NtQueryVolumeInformationFile_entry` at
|
||
`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io_info.cc:323` (case
|
||
`XFileFsSizeInformation` at lines 355–365). Canary delegates to per-device
|
||
methods on `file->device()`.
|
||
- `NullDevice` (the device backing `\Device\Harddisk0\Cache0`) returns
|
||
`sectors_per_allocation_unit() = 0x80` and `bytes_per_sector() = 0x200`
|
||
at `xenia-canary/src/xenia/vfs/devices/null_device.h:38-46`. Product =
|
||
65536, matching Sylpheed's expectation.
|
||
- Other device backings (HostPath, DiscImage, DiscZArchive) all return
|
||
`(1, 0x200)` = 512. Sylpheed's first volume query at this site is
|
||
against Cache0 (NullDevice), so the relevant value is 65536.
|
||
- **Fix sketch (minimum):** in the class=3 branch, change the two writes to
|
||
`mem.write_u32(info+16, 128); mem.write_u32(info+20, 512);` (and reduce
|
||
TotalAllocationUnits accordingly so the disk size remains plausible —
|
||
e.g. 0x10 units × 128 sectors × 512 bytes ≈ 1 MB, matching NullDevice).
|
||
Total diff ≤ 4 lines.
|
||
- **Fix sketch (proper, deferred until the cluster fires reliably):**
|
||
introduce a per-handle device-info lookup so HostPath / DiscImage paths
|
||
return their canary-correct values too. Skipped for now because Sylpheed
|
||
only queries Cache0 at this gate.
|
||
- **Expected observable post-fix:**
|
||
- `XexCheckExecutablePrivilege` count: 1 → 2.
|
||
- `XamTaskSchedule` count: 0 → 1 (callback=0x824A93C8, message=0x828A28F0).
|
||
- `kernel.calls{XamTaskSchedule}` finally non-zero — closes the
|
||
APUBUG-PRODUCER-001 / XAMBUG-PRODUCER-001 producer hunt that
|
||
falsified XAudio + XamTaskSchedule producer hypotheses.
|
||
- Spawn of the Cache0 callback thread (XThread::Execute thid 7 in
|
||
canary, our equivalent to come).
|
||
- Inside that thread: `StfsCreateDevice` (still undefined extern in
|
||
canary too — does not block) + `ObCreateSymbolicLink` +
|
||
`ExRegisterTitleTerminateNotification`.
|
||
- Back on main: `KeResetEvent(0x8287094C)`, `NtCreateEvent`,
|
||
`ExCreateThread(entry=0x82181830, ctx=0x828F3D08)` — and `0x82181830`
|
||
is the worker entry for **dispatcher 0x100c**, one of the four
|
||
parked-handle producers (per
|
||
`project_xenia_rs_producer_stack_trace_2026_05_03.md`). Spawning
|
||
that worker should advance handle 0x100c's `signal_attempts`
|
||
counter off zero.
|
||
- Eventually (further into the boot): `XamUserGetXUID`,
|
||
`XamUserReadProfileSettings`, `XamContentCreateEnumerator`,
|
||
`KeReleaseSemaphore` display-pump (268+ calls in canary at this
|
||
horizon).
|
||
- **Risk:** low. Two-line value change. NullDevice is the only device
|
||
Sylpheed asks about at this gate; other devices are not yet hit.
|
||
- **Effort:** trivial.
|
||
- **Dependencies:** none. Land directly.
|
||
- **Verification chain:** `cargo test -p xenia-kernel`,
|
||
then `cargo run --release -p xenia-app -- exec sylpheed.iso -n 500_000_000`
|
||
with kernel-call tracing on, then re-run audit-006's set-difference;
|
||
expect canary-only count to drop from 7 toward 0 as the cluster fires.
|
||
|
||
---
|
||
|
||
## Tier 4 — REAL_BUT_UNREACHED / STUB_BUT_UNREACHED — do not fix yet
|
||
|
||
These are downstream of Tier 0. Reachability is blocked on KRNBUG-IO-002
|
||
landing. After IO-002 lands, re-derive this list — most entries should
|
||
have moved off, and any survivors will be classifiable on real evidence.
|
||
|
||
| # | Export | Ordinal | Library | Our state | Canary impl | Canary calls (at horizon) | Cascade rank |
|
||
|---|--------|---------|---------|-----------|-------------|---------------------------|--------------|
|
||
| 1 | `XamTaskSchedule` | 0x01AF | xam | REAL_BUT_UNREACHED (`xam_task_schedule`, xam.rs:213) | `xam_task.cc:43-80` | 1 (gate-pivot call) | upstream-of-cluster — fires the entire post-IO-002 cascade |
|
||
| 2 | `XamTaskCloseHandle` | 0x01B1 | xam | STUB_BUT_UNREACHED (`stub_success`, xam.rs:33) | `xam_task.cc:83-93` (one-liner: `NtClose` + last-error) | 1 | low (cleanup after #1) |
|
||
| 3 | `KeResetEvent` | 0x8F | xboxkrnl | REAL_BUT_UNREACHED (`ke_reset_event`, exports.rs:3172) | `xboxkrnl_threading.cc:566` | 1 | medium — clears 0x8287094C right before ExCreateThread(0x82181830) on main |
|
||
| 4 | `ObCreateSymbolicLink` | 0x0103 | xboxkrnl | STUB_BUT_UNREACHED (`stub_success`, exports.rs:121) | `xboxkrnl_ob.cc:351` | 1 | low — Cache0-symlink registration; cosmetic for Sylpheed boot |
|
||
| 5 | `KeReleaseSemaphore` | 0x88 | xboxkrnl | REAL_BUT_UNREACHED (`ke_release_semaphore`, exports.rs:3280) | `xboxkrnl_threading.cc:724` | 268 | high (in volume) — display-init pump on the post-cluster main loop |
|
||
| 6 | `ExTerminateThread` | 0x19 | xboxkrnl | REAL_BUT_UNREACHED (`ex_terminate_thread`, exports.rs:312) | `xboxkrnl_threading.cc:173` | 2 | low — thread cleanup on Cache0 / profile threads |
|
||
| 7 | `XamUserReadProfileSettings` | 0x0219 | xam | REAL_BUT_UNREACHED (`xam_user_read_profile_settings`, xam.rs:327) | `xam_user.cc:329` | 2 | medium — gates the `XamUserGetXUID → profile load` flow far downstream |
|
||
|
||
**Why every entry above is Tier 4 (not Tier 1):**
|
||
|
||
- Each entry's first call in `canary.log` falls **after** line 1210
|
||
(`XamTaskSchedule(824A93C8, ...)`), which is the gate-pivot call.
|
||
- Our trace contains zero of any of the seven, despite running 500 M
|
||
instructions and reaching the post-cache-recreate horizon.
|
||
- Six of the seven are already real implementations. The two stubs
|
||
(`XamTaskCloseHandle`, `ObCreateSymbolicLink`) are minor cleanups; even
|
||
upgrading them would not move boot progress until #1 (`XamTaskSchedule`)
|
||
fires.
|
||
- Therefore: fixing any of these in isolation is wasted effort. They
|
||
should be re-classified after KRNBUG-IO-002 lands and the priv-11 /
|
||
Cache0 callback chain runs.
|
||
|
||
---
|
||
|
||
## Tier 1 / 2 / 3 — empty for this audit
|
||
|
||
No entry qualifies as Tier 1 or Tier 2 in the current state. The single
|
||
high-cascade fix worth pulling next is the Tier-0 gate (KRNBUG-IO-002),
|
||
which is **not itself a canary-only export** — it's a wrong-value bug in
|
||
an export both sides call, so the diff.py based set-difference doesn't
|
||
surface it. That is exactly why audit-006 was scoped this way: to confirm
|
||
the gate hypothesis from `project_xenia_rs_io_nullfile_2026_05_04.md`
|
||
before another implementation session is started.
|
||
|
||
---
|
||
|
||
## Cross-check vs IO-001 snapshot
|
||
|
||
IO-001 memory recorded these 7 still-canary-only exports:
|
||
> ExTerminateThread, KeReleaseSemaphore, KeResetEvent, ObCreateSymbolicLink,
|
||
> XamTaskCloseHandle, XamTaskSchedule, XamUserReadProfileSettings.
|
||
|
||
Audit-006 set-difference produces the **identical** 7, in 1:1
|
||
correspondence. No new canary-only export has appeared since IO-001
|
||
landed; no entry has moved off. Cascade is still parked at the same gate.
|
||
|
||
The `XeCryptSha`, `XeKeysConsolePrivateKeySign`, and `NtDeviceIoControlFile`
|
||
entries that IO-001 was credited with unblocking are confirmed: ours
|
||
calls them 1, 1, 2 times respectively (canary calls them 1, 1, 2 — exact
|
||
match). They are correctly off the canary-only list.
|
||
|
||
---
|
||
|
||
## Methodology notes
|
||
|
||
1. **"Cascade rank" definition:** estimated by where the export's first
|
||
canary call falls in the boot sequence and how many downstream code
|
||
paths depend on it. "high" = upstream-of-cluster (XamTaskSchedule).
|
||
"medium" = intermediate (KeResetEvent, profile cascade).
|
||
"low" = leaf cleanup or cosmetic (XamTaskCloseHandle, ObCreateSymbolicLink).
|
||
Rank only matters once Tier 0 is landed; until then everything is parked.
|
||
2. **Reachability oracle:** binary `grep -c "call=NAME"` against ours.log
|
||
at -n 500M. Zero counts are conclusive for "unreached" because tracing
|
||
is unconditional.
|
||
3. **Canary log freshness:** the log is from 17:34 (3 h before this
|
||
audit) but is byte-identical to audit-005's input — canary's behavior
|
||
is deterministic given the same ROM and the canary build header
|
||
(`canary_experimental@9467c77f0 on May 2 2026`) hasn't changed.
|
||
Re-running through Lutris is unnecessary.
|
||
4. **Gate confirmation:** memory predicted block-size mismatch as the
|
||
IO-002 blocker; this audit confirmed it by eliminating the alternative
|
||
(no Tier-1-eligible canary-only export exists in the current 7-entry
|
||
list). The 0xC000014F status memory predicted is not yet visible in
|
||
ours.log because the recreate path completes the writes — the
|
||
verifier inside `sub_824ABA98` rejects the volume-info reply at the
|
||
game level (no kernel error logged).
|
||
5. **What this queue is *not*:** a list of fixes to land. The audit-006
|
||
discipline was scoping; the discipline of subsequent sessions is to
|
||
re-run audit-006's diff after IO-002, then either close audit-006 (if
|
||
the cluster fires through and all 7 entries drop) or open audit-007
|
||
on whatever new canary-only set surfaces.
|
||
|
||
---
|
||
|
||
## Recommended next session
|
||
|
||
**KRNBUG-IO-002 (block-size fix), one-shot.** Two-line edit at
|
||
`crates/xenia-kernel/src/exports.rs:1255-1256`. Verify the cluster fires
|
||
by re-running audit-006's set-difference; expect 7 → 0 (or close to 0)
|
||
canary-only entries. If new entries surface in either direction, that's
|
||
audit-007's input.
|
||
|
||
**Do not** open this queue's Tier 4 entries before IO-002 closes. Their
|
||
classification is pending; their fix sketches will look very different
|
||
once they're observably called and their actual return values can be
|
||
compared to canary.
|