xenia-rs/docs/functions/sub_821CB030.md
MechaCat02 ad45873a1b ITERATE-2.V: scheduler priority aging closes 18-day AUDIT-049 wedge
Priority aging in xenia-cpu/scheduler.rs:pick_runnable
(effective_priority = base + age_bonus(now_round - last_run_round),
capped at +31, AGING_ROUNDS_PER_BONUS=1). Strict-priority was parking
priority=0 threads behind CPU-bound priority=15 audio mixer
(sub_824D1328 guest spinwait at PC=0x824d1404 on CPU5). Aging
eventually picks the starved thread, breaking the producer-consumer
cycle that caused 5-tid wedge at PC=0x824ac578 since AUDIT-049 (10 May).
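
A minimal sketch of the aging rule described above, not the actual code in
xenia-cpu/scheduler.rs. It assumes a larger number means higher priority, that
the +31 cap applies to the bonus rather than to the sum, and that
`last_run_round` is updated on every pick. `ThreadSlot`, its fields, and
`MAX_AGE_BONUS` are hypothetical names.

```rust
/// Values taken from the commit message; the names are assumptions.
pub const AGING_ROUNDS_PER_BONUS: u64 = 1;
pub const MAX_AGE_BONUS: u64 = 31;

/// Hypothetical per-thread scheduler record.
#[derive(Debug, Clone)]
pub struct ThreadSlot {
    pub tid: u32,
    pub base_priority: u32,
    pub last_run_round: u64,
    pub runnable: bool,
}

/// One bonus point per AGING_ROUNDS_PER_BONUS rounds spent waiting, capped.
pub fn age_bonus(rounds_waiting: u64) -> u32 {
    (rounds_waiting / AGING_ROUNDS_PER_BONUS).min(MAX_AGE_BONUS) as u32
}

/// effective_priority = base + age_bonus(now_round - last_run_round)
pub fn effective_priority(t: &ThreadSlot, now_round: u64) -> u32 {
    t.base_priority + age_bonus(now_round.saturating_sub(t.last_run_round))
}

/// Pick the runnable thread with the highest effective priority and
/// record that it ran in this round, which resets its age.
pub fn pick_runnable(threads: &mut [ThreadSlot], now_round: u64) -> Option<u32> {
    let idx = threads
        .iter()
        .enumerate()
        .filter(|(_, t)| t.runnable)
        .max_by_key(|(_, t)| effective_priority(t, now_round))
        .map(|(i, _)| i)?;
    threads[idx].last_run_round = now_round;
    Some(threads[idx].tid)
}
```

With a priority=15 thread that runs every round and a priority=0 thread that
never runs, the waiting thread's bonus keeps growing while the busy thread's
bonus resets on each pick, so the waiting thread is eventually selected.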

Cascade observed: tid=13 clean exit; events 121K -> 13M (107x); last
host_ns 767ms -> 51,011ms (66x); 8 new threads spawn; VdSwap 1 -> 2.

Complete two-day iterate sequence (2026-05-27 -> 2026-05-28):
- 2.F: VdSwap drain timeout 900ms -> 1ms (xenia-gpu/handle.rs); 876x
       perf win on VdSwap kernel callback
- 2.H: vA0000000 physical heap bucket added (state.rs, exports.rs);
       ctx_ptrs now in 0xA0000000-0xBFFFFFFF range matching canary
- 2.L: Phase-A diff harness categorized [return_value mismatch],
       [status mismatch], [args_resolved.path mismatch] tags
       (tools/diff-events/diff_events.py); closes reading-error #41
       (silent test-harness state leak invalidating trace diffs)
- 2.M: always-on exit-thread-state.json sibling to Phase-A JSONL
       (event_log.rs + xenia-app/main.rs); closes reading-error #42
       (Phase-A blind to blocked-forever waits)
- 2.Q: signal.match kernel instrumentation in NtSetEvent /
       NtReleaseSemaphore / KeSetEvent / KeReleaseSemaphore
       (exports.rs); emits target_handle + waiter_count + waiter_tids
- 2.T: wake.requested kernel instrumentation in wake_eligible_waiters
       (exports.rs); emits target_tid + transition + new_state
- 2.V: scheduler priority aging (xenia-cpu/scheduler.rs) [keystone]

Plus accumulated WIP from earlier May (contention_manifest,
phase_b_snapshot, xam/xaudio enhancements, analysis db, xex loader,
xenia-app main loop, etc.). Audit-runs/ artifacts remain untracked
per project convention.

Tests: 300 xenia-cpu / 227 xenia-kernel / 5 xenia-app / 19 xenia-path
/ 30+ smaller suites -- all PASS, 0 regressions. Determinism preserved
(2x cold runs bit-identical at 13,003,881 events post-2.V).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 07:27:26 +02:00

---
address: 0x821CB030
classification: normal_callee
confidence: high
last_audit: 066
aliases:
- "wedge primary site"
- "file-IO completion event creator+waiter"
---
# sub_821CB030 — wedge primary site (creates + submits + waits file-IO completion XEvent)
## Synopsis
The function whose body creates, submits work for, and waits on the canonical AUDIT-049/058/059 γ-wedge XEvent. Used by `silph::GamePart_Title::UImpl` to load `cache:\aab216c3\5\…` files synchronously. The offsets quoted here are post-call PCs (the `bl` itself sits 4 bytes earlier): NtCreateEvent at `+0x128`, work submit at `+0x19C` (calls `sub_82452DC0`), wait INFINITE at `+0x1AC`. This wait is what blocks the entire post-intro phase in ours.
## Evidence
- AUDIT-049: tid=13 chain ends at this fn with wait at `0x824ac578` (KeWaitForSingleObject in the wait wrapper called from `+0x1AC`).
- AUDIT-058: canary captures `sub_821CB030+0x12c` (=PC after the NtCreateEvent bl) in stacks.
- AUDIT-059 Probe O ours: handle `0x12AC` (Event/Auto) created at `0x821cb158` (=`+0x128`), waited at `0x821cb1dc` (=`+0x1AC`). Wedge has `signal_attempts=0` — never signaled by the worker side.
- AUDIT-059 Probe C canary: same PCs fire; `0xF8000098` created, then `NtDuplicateObject`'d to `0xF80000A0`, original closed fast, dup signaled by worker via `sub_82458B90`/`sub_8245EC10`.
- File-IO context: precedes synchronous file load of `cache:\aab216c3\5\…` (post-VFS work in AUDIT-054).
## Activation
Direct `bl` from `sub_821CBA08+0xd8` (AUDIT-059 create-stack frame 1). One static caller. Higher in the chain: `sub_821CC3F8 (GamePart_Title) → sub_821CBA08 → sub_821CB030`.
## Static graph
- Callers:
  - `sub_821CBA08+0xd8` (only static caller)
- Callees of interest:
  - `sub_824A9F18` — NtCreateEvent wrapper, called at `+0x124 bl` (post-call PC = `+0x128 = 0x821CB158`).
  - `sub_82452DC0` — work-submitter, called at `+0x198 bl` (post-call PC = `+0x19C`).
  - `sub_824AC540` — wait wrapper, called at `+0x1A8 bl` (post-call PC = `+0x1AC = 0x821CB1DC`).
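
The offset arithmetic in the list above can be checked mechanically. This is a small sketch, assuming only that guest PowerPC instructions are 4 bytes wide, so the PC after a `bl` is the `bl` address plus 4. The function names are illustrative and not part of the codebase.

```rust
/// Entry address of sub_821CB030 (from the front matter).
pub const FN_BASE: u32 = 0x821C_B030;

/// PowerPC instructions are fixed-width (4 bytes), so the return
/// address of a `bl` at `bl_offset` is `bl_offset + 4`.
pub fn post_call_offset(bl_offset: u32) -> u32 {
    bl_offset + 4
}

/// Convert a function-relative offset into an absolute guest PC.
pub fn absolute_pc(base: u32, offset: u32) -> u32 {
    base + offset
}
```

For example, `absolute_pc(FN_BASE, post_call_offset(0x124))` gives `0x821CB158`, the create-site PC reported by AUDIT-059 Probe O.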
## Audit log
- **AUDIT-066 (2026-05-12)** — **source-review only (READ-ONLY)**. Re-read canary's `xenia/kernel/xboxkrnl/xboxkrnl_io.cc:39-389` + `xfile.cc:19-198` + `kernel_state.cc:519-551` and ours's `xenia-kernel/src/exports.rs:1103-1518, 3747-3764`. AUDIT-065's "host-side IO completion thread `F8000048` signals each per-load event" framing is **falsified**: (i) canary's `NtReadFile`/`NtReadFileScatter`/`NtWriteFile` are synchronous and signal the supplied event handle **inline** via `ev->Set(0, false)` (lines 210-212, 296-298, 383-385); no host async-IO thread exists; the only host thread "Kernel Dispatch" (`kernel_state.cc:524-549`) services `CompleteOverlappedDeferred` for XAM overlapped UI/content, not file IO; (ii) `F8000048` in AUDIT-065 stdout is a **guest XThread** thid=10 (entry `0x82450A28`, ctx `0x828F3B68`), spawned by main at `canary-run.stdout:1331` via `ExCreateThread(...,824AFF88, 82450A28, 828F3B68, 0)` — the `F8` prefix is a guest kernel-object handle region marker, NOT a host-thread marker; (iii) cache loads at `canary-run.stdout:2127-2154` (sequence `NtCreateEvent → NtCreateFile → NtDuplicateObject → NtQueryInformationFile → NtClose`) emit **zero** `NtReadFile`/`NtSetEvent` lines — `NtQueryInformationFile` has no event-handle parameter in either engine; (iv) thid=17 (`F8000094`) terminates via `ExTerminateThread(0)` WITHOUT ever calling Wait inside its cache loop — so the canary path doesn't even hit this fn's wait sites for the cache files visible in AUDIT-065's stdout. Ours's `signal_io_completion_event` (`exports.rs:1156-1169`) called from 16 sites in `nt_read_file`/`nt_write_file`/`nt_device_io_control_file` already implements canary's `ev->Set(0, false)` semantics — **there is no missing analog**. The wait at this fn's `+0x1AC` is a wait on the `sub_82452DC0` work-queue dup'd XEvent, signaled by guest worker-cluster code (γ-signalers A/B/C/D per AUDIT-059/060) — not IO completion. Bug class confirmed = AUDIT-063 structural / bootstrap-ordering. 
**AUDIT-066 fix locus (`xenia-kernel/src/exports.rs` IO handlers) is the WRONG target**; the bug is upstream in worker-cluster bootstrap (`sub_825070F0` activation gate). [confirmed: NO IO-completion gap]
- **AUDIT-065 (2026-05-12)** — wedge mechanism precisely framed via [sub_82173990](sub_82173990.md). Canary's tid=17 worker (= analog of ours's tid=13) reaches `ExTerminateThread(0)` after sequentially loading `cache:\aab216c3\5\ee70e0a`, `cache:\87719002\c\dba806e/ec0a96e`, `cache:\87719002\a\60fcb85`, `cache:\87719002\2\85d8849`, `cache:\87719002\0\1a2db9c` etc — 16+ cache file loads — AND spawning child workers via `ExCreateThread(..., 824AFF88, 821C4AD0/822C6870, ...)`. Worker's own `sub_821CB030` calls (file-IO completion event waits) complete in canary. **In ours, the very first sub_821CB030 call (on handle `0x12AC`) hangs (`NO_SIGNALS_DESPITE_WAITS`)** — tid=13 never reaches `ExTerminateThread`, tid=1's join wait on `0x12A4` never completes. Cache file opens succeed in ours (paths `cache:/aab216c3/5`, `cache:/aab216c3` etc seen in log just before the stall) — so the bug is post-VFS, in the producer→worker async-IO completion signaling, exactly as AUDIT-062 found. [confirmed]
- **AUDIT-063 (2026-05-12)** — AUDIT-062's candidate trio (`0x822F2304`/`0x822F1D84`/`0x821743D8`) confirmed as RED HERRINGS: containing fns `sub_822F2248`/`sub_822F1AA8`/`sub_821741C8` resolved, but **none are reachable from `sub_82452DC0` in 12 hops**. Track-A probe (180s canary / 500M-instr ours): canary fires 11.7k× / ours 0× on `0x822F1D84` and `0x821743D8` — but they're downstream of an unblocked main event loop (canary tid=6 = guest main). Ours's main (tid=1) is `Blocked` on `0x12A4` (tid=13 thread-join handle, AUDIT-049), which transitively blocks on this fn's wedge `0x12AC`. Real producer is the worker cluster `sub_82458B90`/`sub_8245EC10`/`sub_8245FEB8`/`sub_8245D9D8`/`sub_8245DA78` running on the 4 workers spawned by `sub_825070F0`; **0 of those 8 workers spawn in ours** vs 8 in canary. The bug is the AUDIT-057 thread-gap closing in on itself: the cluster cannot bootstrap because the wedge isn't signaled, and the wedge isn't signaled because the cluster cannot bootstrap. NO new producer fn was missed by prior audits. [confirmed: trio is symptom not cause]
- **AUDIT-062 (2026-05-12)** — wedge KEVENT data-flow traced. Outcome **(b)**: NtDuplicateObject thunk = `0x8284DF7C`; sub_821CB030 has NO direct bl-NtDup (dup is performed by descendant via wrapper `sub_824AA398`). Phase 2 ours `--lr-trace=0x8284DF7C`: wedge handle `0x12AC` IS duped by tid=13 cycle 26711 (alongside `0x12B0` cycle 23833). Out_ptr `0x40541E80` populated with dup_handle = source_handle = `0x12AC` (ours aliases per `exports.rs:4263`). sub_82452DC0 fires 8× in ours; line 8 = wedge submit on tid=13 cycle 8127 lr=0x821CB1D0, with r6=0x40541E80 (job struct carries the dup pointer). So **work IS submitted with the right handle**. Phase 4 ours `--lr-trace=0x8284DF5C,0x824AA2F0`: 68 NtSet fires, **0 on `0x12AC`** (neighbors 0x129C / 0x12B0 ARE signaled — infrastructure capable). γ-signalers A/B/C/D all fire (3/2/3/6+2 fires resp.) — but on non-wedge handles. **The break is upstream of γ-signaler**: ours's worker tid=5 is parked on its OWN idle event `0x12B8` (created by tid=5 via NtCreateEvent), and **no NtSetEvent in ours signals `0x12B8`** (also NO_SIGNALS_DESPITE_WAITS). Producer-side worker-wake signal is missing. Cascade A=NtDup fires correctly on wedge YES (cycle 26711); B=wedge dup NOT signaled CONFIRMED; C=outcome (b) localized to producer→worker wake gap (`0x12B8`); D=draws>0 deferred to AUDIT-063 fix. New finding: **γ-signaler D = `sub_8245D9D8` / `sub_8245DA78`** (LR `0x8245DA44` / `0x8245DB08`) — NtSet wrapper hot from worker-side, missed by AUDIT-059/060 dossier list. Canary spreads NtDup across 6 tids (6/10/16/17/18/26 → 33 fires/180s); ours across 3 (1/5/13 → 14 fires) — confirms AUDIT-057 thread-gap as enabling condition. Trace `audit-runs/audit-062-wedge-kevent-flow/`. [confirmed outcome b]
- **AUDIT-060 (2026-05-12)** — confirmed wedge structural identification: `NtCreateEvent → NtDuplicateObject → enqueue → worker → NtSetEvent on dup` (canary path); ours stalls at the wait because workers don't signal. [confirmed]
- **AUDIT-059 (2026-05-11)** — established as keystone γ-wedge site. Handle 0x12AC create-site is here at `+0x128`. [confirmed]
- **AUDIT-058 (2026-05-10)** — sister mention in tid=13 chain (frames via sub_821CB1D0 ← sub_821CBAE0). [confirmed]
- **AUDIT-049 (2026-05-10)** — original discovery that tid=13 waits INFINITE on event created here; main thread (tid=1) is downstream via thread-join handle. [confirmed]
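
The structure named in AUDIT-060 (`NtCreateEvent → NtDuplicateObject → enqueue → worker → NtSetEvent on dup`) can be modelled with a toy handle table. This is a sketch for reasoning about the wedge, not the kernel code in `exports.rs`: `HandleTable`, `Event`, and every method name are hypothetical, and the duplicate gets a fresh handle id as in canary (AUDIT-062 notes that ours instead reuses the source id).

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a kernel event object.
#[derive(Debug, Default)]
pub struct Event {
    pub signaled: bool,
    pub signal_attempts: u32,
}

/// Toy handle table: several handle ids may refer to one event object.
#[derive(Debug, Default)]
pub struct HandleTable {
    events: Vec<Event>,
    handles: HashMap<u32, usize>,
    next_handle: u32,
}

impl HandleTable {
    pub fn new(first_handle: u32) -> Self {
        HandleTable { next_handle: first_handle, ..Default::default() }
    }

    fn bind(&mut self, event_index: usize) -> u32 {
        let h = self.next_handle;
        self.next_handle += 4; // illustrative stride only
        self.handles.insert(h, event_index);
        h
    }

    /// Models NtCreateEvent: a new, unsignaled object and its first handle.
    pub fn create_event(&mut self) -> u32 {
        self.events.push(Event::default());
        let idx = self.events.len() - 1;
        self.bind(idx)
    }

    /// Models NtDuplicateObject: a second handle bound to the same object.
    pub fn duplicate(&mut self, source: u32) -> Option<u32> {
        let idx = *self.handles.get(&source)?;
        Some(self.bind(idx))
    }

    /// Models NtSetEvent: returns false if the handle is unknown.
    pub fn set_event(&mut self, handle: u32) -> bool {
        match self.handles.get(&handle) {
            Some(&idx) => {
                self.events[idx].signaled = true;
                self.events[idx].signal_attempts += 1;
                true
            }
            None => false,
        }
    }

    /// Would a wait on this handle be satisfied right now?
    pub fn would_wake(&self, handle: u32) -> bool {
        self.handles.get(&handle).map_or(false, |&i| self.events[i].signaled)
    }

    pub fn signal_attempts(&self, handle: u32) -> u32 {
        self.handles.get(&handle).map_or(0, |&i| self.events[i].signal_attempts)
    }
}
```

In this model the wedge is the state where the waiter holds the original handle, the job carries the duplicate, and `set_event` is never called on either: `signal_attempts` stays at 0 and `would_wake` stays false, which matches the `signal_attempts=0` observation from AUDIT-059.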
## Open questions
- Is the `+0x128` create the ONLY NtCreateEvent in this fn, or are there multiple? **AUDIT-062 db query: exactly 1 `bl 0x824A9F18` (NtCreateEvent wrapper) at `+0x128`.** Two `bl 0x82452DC0` (`+0x19C`, `+0x2EC`) and two `bl 0x824AA330` wait-wrappers (`+0x1AC`, `+0x318`) — same KEVENT submitted+waited twice (sequential file-IO loads), or alternative-branch fork. Canary's 2 fires at `0x821CB158` therefore mean sub_821CB030 is *invoked twice* by its caller, each creating a fresh KEVENT.
- What does `+0x19C..+0x1A8` do between work-submit and wait? (Likely sets up the wait params.) Disassemble to confirm.
- ~~Does ours's NtDuplicateObject correctly create a signal-aliased handle?~~ AUDIT-062 confirmed: YES — ours aliases (dup_id = source_id), out_ptr populated, refcount bumped. Bug is NOT here.
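- **Open after AUDIT-062**: which producer-side call (descendant of `sub_82452DC0`) calls `NtSetEvent` on the worker idle event (`0x12B8`-class) in canary, and why does ours skip it? Probe canary's hot NtSet wrapper LRs `0x822F2304, 0x822F1D84, 0x821743D8` (9k+ fires each) — one of these is likely the worker-wake. **AUDIT-063 update**: all three LRs were probed and are red herrings (their containing fns are not reachable from `sub_82452DC0` in 12 hops); the remaining lead is the worker-cluster bootstrap gated by `sub_825070F0`.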
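
Questions of the form "is X a descendant of `sub_82452DC0`", including AUDIT-063's 12-hop check, reduce to a depth-bounded breadth-first search over the static call graph. The sketch below assumes the graph is available as a caller-to-callees map. `CallGraph` and `hops_to` are hypothetical names, not the analysis db's actual API.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Static call graph: caller address -> callee addresses (assumed shape).
pub type CallGraph = HashMap<u32, Vec<u32>>;

/// Length of the shortest call chain from `root` to `target`, or None
/// if `target` cannot be reached within `max_hops` calls.
pub fn hops_to(graph: &CallGraph, root: u32, target: u32, max_hops: usize) -> Option<usize> {
    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();
    seen.insert(root);
    queue.push_back((root, 0usize));

    while let Some((node, depth)) = queue.pop_front() {
        if node == target {
            return Some(depth);
        }
        if depth == max_hops {
            continue; // do not expand past the hop budget
        }
        if let Some(callees) = graph.get(&node) {
            for &callee in callees {
                // `insert` returns false for already-visited nodes,
                // which also keeps recursion cycles from looping.
                if seen.insert(callee) {
                    queue.push_back((callee, depth + 1));
                }
            }
        }
    }
    None
}
```

A `None` result with `max_hops = 12` only rules out static reachability within that budget. Indirect calls through function pointers or vtables would not appear in a `bl`-derived graph.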
## Cross-references
- Wedge handle in ours: drifts per run (0x1288/0x12A4/0x12AC across audits — see [reference_function_dossiers](docs/functions/README.md) caveat).
- Callers: [sub_821CBA08](#) (not yet dossiered)
- Callees: [sub_82452DC0](sub_82452DC0.md), sub_824A9F18 (NtCreateEvent wrapper)
- Audits: 049, 058, 059, 060, 062, 063, 065, 066
- Artifacts: `audit-runs/audit-049-tid1-stall-0x1280/`, `audit-runs/audit-059-gamma-wedge/`, `audit-runs/audit-062-wedge-kevent-flow/`