xenia-rs/docs/functions/sub_821CB030.md
MechaCat02 ad45873a1b ITERATE-2.V: scheduler priority aging closes 18-day AUDIT-049 wedge
Priority aging in xenia-cpu/scheduler.rs:pick_runnable
(effective_priority = base + age_bonus(now_round - last_run_round),
capped at +31, AGING_ROUNDS_PER_BONUS=1). Strict-priority was parking
priority=0 threads behind CPU-bound priority=15 audio mixer
(sub_824D1328 guest spinwait at PC=0x824d1404 on CPU5). Aging
eventually picks the starved thread, breaking the producer-consumer
cycle that caused 5-tid wedge at PC=0x824ac578 since AUDIT-049 (10 May).
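
The aging rule described above can be sketched as follows. This is a minimal illustration, not the code in xenia-cpu/scheduler.rs: the `Thread` struct, its field names, the `MAX_AGE_BONUS` constant name, and the tie-breaking behaviour are assumptions. Only the formula, the +31 cap, and AGING_ROUNDS_PER_BONUS=1 come from the commit message.

```rust
/// Rounds a thread must wait to earn one bonus level (value from the commit message).
const AGING_ROUNDS_PER_BONUS: u64 = 1;
/// Upper bound on the bonus ("capped at +31"); the constant name is hypothetical.
const MAX_AGE_BONUS: u64 = 31;

/// Hypothetical minimal thread record; the real scheduler state is richer.
struct Thread {
    tid: u32,
    base_priority: u32,
    last_run_round: u64,
    runnable: bool,
}

/// Bonus grows with the number of rounds since the thread last ran, up to the cap.
fn age_bonus(rounds_waited: u64) -> u32 {
    (rounds_waited / AGING_ROUNDS_PER_BONUS).min(MAX_AGE_BONUS) as u32
}

/// effective_priority = base + age_bonus(now_round - last_run_round)
fn effective_priority(t: &Thread, now_round: u64) -> u32 {
    t.base_priority + age_bonus(now_round.saturating_sub(t.last_run_round))
}

/// Pick the runnable thread with the highest effective priority and
/// record that it ran in this round, which resets its age.
fn pick_runnable(threads: &mut [Thread], now_round: u64) -> Option<u32> {
    let idx = threads
        .iter()
        .enumerate()
        .filter(|(_, t)| t.runnable)
        .max_by_key(|(_, t)| effective_priority(t, now_round))
        .map(|(i, _)| i)?;
    threads[idx].last_run_round = now_round;
    Some(threads[idx].tid)
}
```

Under this model, a priority-0 thread competing with an always-runnable priority-15 thread gains one level per round while the busy thread's age keeps resetting, so the starved thread is picked after a bounded number of rounds instead of never.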

Cascade observed: tid=13 clean exit; events 121K -> 13M (107x); last
host_ns 767ms -> 51,011ms (66x); 8 new threads spawn; VdSwap 1 -> 2.

Complete two-day iterate sequence (2026-05-27 -> 2026-05-28):
- 2.F: VdSwap drain timeout 900ms -> 1ms (xenia-gpu/handle.rs); 876x
       perf win on VdSwap kernel callback
- 2.H: vA0000000 physical heap bucket added (state.rs, exports.rs);
       ctx_ptrs now in 0xA0000000-0xBFFFFFFF range matching canary
- 2.L: Phase-A diff harness categorized [return_value mismatch],
       [status mismatch], [args_resolved.path mismatch] tags
       (tools/diff-events/diff_events.py); closes reading-error #41
       (silent test-harness state leak invalidating trace diffs)
- 2.M: always-on exit-thread-state.json sibling to Phase-A JSONL
       (event_log.rs + xenia-app/main.rs); closes reading-error #42
       (Phase-A blind to blocked-forever waits)
- 2.Q: signal.match kernel instrumentation in NtSetEvent /
       NtReleaseSemaphore / KeSetEvent / KeReleaseSemaphore
       (exports.rs); emits target_handle + waiter_count + waiter_tids
- 2.T: wake.requested kernel instrumentation in wake_eligible_waiters
       (exports.rs); emits target_tid + transition + new_state
- 2.V: scheduler priority aging (xenia-cpu/scheduler.rs) [keystone]

Plus accumulated WIP from earlier May (contention_manifest,
phase_b_snapshot, xam/xaudio enhancements, analysis db, xex loader,
xenia-app main loop, etc.). Audit-runs/ artifacts remain untracked
per project convention.

Tests: 300 xenia-cpu / 227 xenia-kernel / 5 xenia-app / 19 xenia-path
/ 30+ smaller suites -- all PASS, 0 regressions. Determinism preserved
(2x cold runs bit-identical at 13,003,881 events post-2.V).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 07:27:26 +02:00


address: 0x821CB030
classification: normal_callee
confidence: high
last_audit: 066
aliases: wedge primary site; file-IO completion event creator+waiter

sub_821CB030 — wedge primary site (creates + submits + waits file-IO completion XEvent)

Synopsis

This function creates, submits work for, and waits on the canonical AUDIT-049/058/059 γ-wedge XEvent. silph::GamePart_Title::UImpl uses it to load cache:\aab216c3\5\… files synchronously: NtCreateEvent at +0x128, work submit at +0x19C (calls sub_82452DC0), wait INFINITE at +0x1AC. That wait is what blocks the entire post-intro phase in ours.
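
The create / submit / wait shape described above can be modelled on the host with a small sketch. Everything here is hypothetical scaffolding (`AutoResetEvent`, `load_sync`, the `worker_signals` switch); it is not the emulator's XEvent code. The guest waits INFINITE, whereas the sketch uses a finite timeout so that the unsignaled case is observable rather than a hang.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Host-side stand-in for an auto-reset event (hypothetical type).
struct AutoResetEvent {
    signaled: Mutex<bool>,
    cv: Condvar,
}

impl AutoResetEvent {
    fn new() -> Self {
        AutoResetEvent { signaled: Mutex::new(false), cv: Condvar::new() }
    }

    /// Signal the event and wake one waiter.
    fn set(&self) {
        *self.signaled.lock().unwrap() = true;
        self.cv.notify_one();
    }

    /// Wait until signaled or until `timeout` elapses.
    /// Returns true if the event was signaled; the signal is consumed (auto-reset).
    fn wait(&self, timeout: Duration) -> bool {
        let guard = self.signaled.lock().unwrap();
        let (mut guard, _) = self
            .cv
            .wait_timeout_while(guard, timeout, |signaled| !*signaled)
            .unwrap();
        let was_signaled = *guard;
        *guard = false;
        was_signaled
    }
}

/// Create the event, hand it to a worker, then wait on it.
/// `worker_signals == false` models the wedge: work is submitted but the
/// completion event is never set.
fn load_sync(worker_signals: bool, timeout: Duration) -> bool {
    let event = Arc::new(AutoResetEvent::new()); // create
    let for_worker = Arc::clone(&event);
    let worker = thread::spawn(move || {
        // submit: the worker owns a reference to the same event
        if worker_signals {
            for_worker.set();
        }
    });
    let completed = event.wait(timeout); // wait
    worker.join().unwrap();
    completed
}
```

With `worker_signals` set to true the wait returns promptly; with it set to false the wait only ends because of the timeout, which is the host-side analogue of the blocked-forever wait at +0x1AC.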

Evidence

  • AUDIT-049: tid=13 chain ends at this fn with wait at 0x824ac578 (KeWaitForSingleObject in the wait wrapper called from +0x1AC).
  • AUDIT-058: canary captures sub_821CB030+0x12c (=PC after the NtCreateEvent bl) in stacks.
  • AUDIT-059 Probe O ours: handle 0x12AC (Event/Auto) created at 0x821cb158 (=+0x128), waited at 0x821cb1dc (=+0x1AC). Wedge has signal_attempts=0 — never signaled by the worker side.
  • AUDIT-059 Probe C canary: same PCs fire; 0xF8000098 created, then NtDuplicateObject'd to 0xF80000A0, original closed fast, dup signaled by worker via sub_82458B90/sub_8245EC10.
  • File-IO context: precedes synchronous file load of cache:\aab216c3\5\… (post-VFS work in AUDIT-054).

Activation

Direct bl from sub_821CBA08+0xd8 (AUDIT-059 create-stack frame 1). One static caller. Higher in the chain: sub_821CC3F8 (GamePart_Title) → sub_821CBA08 → sub_821CB030.

Static graph

  • Callers:
    • sub_821CBA08+0xd8 (only static caller)
  • Callees of interest:
    • sub_824A9F18 — NtCreateEvent wrapper, called at +0x124 bl (post-call PC = +0x128 = 0x821CB158).
    • sub_82452DC0 — work-submitter, called at +0x198 bl (post-call PC = +0x19C).
    • sub_824AC540 — wait wrapper, called at +0x1A8 bl (post-call PC = +0x1AC = 0x821CB1DC).
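
The post-call PCs listed above follow from fixed-width PowerPC encoding: a `bl` at offset N stores N+4 (plus the function base) in LR. A small sketch of that arithmetic, using the entry address and offsets from this dossier; the constant and function names are illustrative only.

```rust
/// Entry address of sub_821CB030 (from this dossier).
const FN_BASE: u32 = 0x821C_B030;
/// PowerPC instructions are 4 bytes wide.
const INSN_BYTES: u32 = 4;

/// Absolute guest address of `FN_BASE + offset`.
fn abs_pc(offset: u32) -> u32 {
    FN_BASE + offset
}

/// Address that a `bl` located at `bl_offset` places in LR,
/// i.e. the instruction immediately after the branch.
fn post_call_pc(bl_offset: u32) -> u32 {
    abs_pc(bl_offset) + INSN_BYTES
}
```

For example, `post_call_pc(0x124)` evaluates to 0x821CB158 and `post_call_pc(0x1A8)` to 0x821CB1DC, matching the create and wait PCs reported by AUDIT-059 Probe O.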

Audit log

  • AUDIT-066 (2026-05-12) — source-review only (READ-ONLY). Re-read canary's xenia/kernel/xboxkrnl/xboxkrnl_io.cc:39-389 + xfile.cc:19-198 + kernel_state.cc:519-551 and ours's xenia-kernel/src/exports.rs:1103-1518, 3747-3764. AUDIT-065's "host-side IO completion thread F8000048 signals each per-load event" framing is falsified:
    • (i) canary's NtReadFile/NtReadFileScatter/NtWriteFile are synchronous and signal the supplied event handle inline via ev->Set(0, false) (lines 210-212, 296-298, 383-385); no host async-IO thread exists; the only host thread, "Kernel Dispatch" (kernel_state.cc:524-549), services CompleteOverlappedDeferred for XAM overlapped UI/content, not file IO.
    • (ii) F8000048 in AUDIT-065 stdout is a guest XThread thid=10 (entry 0x82450A28, ctx 0x828F3B68), spawned by main at canary-run.stdout:1331 via ExCreateThread(...,824AFF88, 82450A28, 828F3B68, 0); the F8 prefix is a guest kernel-object handle region marker, NOT a host-thread marker.
    • (iii) cache loads at canary-run.stdout:2127-2154 (sequence NtCreateEvent → NtCreateFile → NtDuplicateObject → NtQueryInformationFile → NtClose) emit zero NtReadFile/NtSetEvent lines; NtQueryInformationFile has no event-handle parameter in either engine.
    • (iv) thid=17 (F8000094) terminates via ExTerminateThread(0) WITHOUT ever calling Wait inside its cache loop, so the canary path doesn't even hit this fn's wait sites for the cache files visible in AUDIT-065's stdout.
    Ours's signal_io_completion_event (exports.rs:1156-1169), called from 16 sites in nt_read_file/nt_write_file/nt_device_io_control_file, already implements canary's ev->Set(0, false) semantics; there is no missing analog. The wait at this fn's +0x1AC is a wait on the sub_82452DC0 work-queue dup'd XEvent, signaled by guest worker-cluster code (γ-signalers A/B/C/D per AUDIT-059/060), not IO completion. Bug class confirmed = AUDIT-063 structural / bootstrap-ordering. AUDIT-066 fix locus (xenia-kernel/src/exports.rs IO handlers) is the WRONG target; the bug is upstream in worker-cluster bootstrap (sub_825070F0 activation gate). [confirmed: NO IO-completion gap]
  • AUDIT-065 (2026-05-12) — wedge mechanism precisely framed via sub_82173990. Canary's tid=17 worker (= analog of ours's tid=13) reaches ExTerminateThread(0) after sequentially loading cache:\aab216c3\5\ee70e0a, cache:\87719002\c\dba806e/ec0a96e, cache:\87719002\a\60fcb85, cache:\87719002\2\85d8849, cache:\87719002\0\1a2db9c etc — 16+ cache file loads — AND spawning child workers via ExCreateThread(..., 824AFF88, 821C4AD0/822C6870, ...). Worker's own sub_821CB030 calls (file-IO completion event waits) complete in canary. In ours, the very first sub_821CB030 call (on handle 0x12AC) hangs (NO_SIGNALS_DESPITE_WAITS) — tid=13 never reaches ExTerminateThread, tid=1's join wait on 0x12A4 never completes. Cache file opens succeed in ours (paths cache:/aab216c3/5, cache:/aab216c3 etc seen in log just before the stall) — so the bug is post-VFS, in the producer→worker async-IO completion signaling, exactly as AUDIT-062 found. [confirmed]
  • AUDIT-063 (2026-05-12) — AUDIT-062's candidate trio (0x822F2304/0x822F1D84/0x821743D8) confirmed as RED HERRINGS: containing fns sub_822F2248/sub_822F1AA8/sub_821741C8 resolved, but none are reachable from sub_82452DC0 in 12 hops. Track-A probe (180s canary / 500M-instr ours): canary fires 11.7k× / ours 0× on 0x822F1D84 and 0x821743D8 — but they're downstream of an unblocked main event loop (canary tid=6 = guest main). Ours's main (tid=1) is Blocked on 0x12A4 (tid=13 thread-join handle, AUDIT-049), which transitively blocks on this fn's wedge 0x12AC. Real producer is the worker cluster sub_82458B90/sub_8245EC10/sub_8245FEB8/sub_8245D9D8/sub_8245DA78 running on the 4 workers spawned by sub_825070F0; 0 of those 8 workers spawn in ours vs 8 in canary. The bug is the AUDIT-057 thread-gap closing in on itself: the cluster cannot bootstrap because the wedge isn't signaled, and the wedge isn't signaled because the cluster cannot bootstrap. NO new producer fn was missed by prior audits. [confirmed: trio is symptom not cause]
  • AUDIT-062 (2026-05-12) — wedge KEVENT data-flow traced. Outcome (b): NtDuplicateObject thunk = 0x8284DF7C; sub_821CB030 has NO direct bl-NtDup (dup is performed by descendant via wrapper sub_824AA398). Phase 2 ours --lr-trace=0x8284DF7C: wedge handle 0x12AC IS duped by tid=13 cycle 26711 (alongside 0x12B0 cycle 23833). Out_ptr 0x40541E80 populated with dup_handle = source_handle = 0x12AC (ours aliases per exports.rs:4263). sub_82452DC0 fires 8× in ours; line 8 = wedge submit on tid=13 cycle 8127 lr=0x821CB1D0, with r6=0x40541E80 (job struct carries the dup pointer). So work IS submitted with the right handle. Phase 4 ours --lr-trace=0x8284DF5C,0x824AA2F0: 68 NtSet fires, 0 on 0x12AC (neighbors 0x129C / 0x12B0 ARE signaled — infrastructure capable). γ-signalers A/B/C/D all fire (3/2/3/6+2 fires resp.) — but on non-wedge handles. The break is upstream of γ-signaler: ours's worker tid=5 is parked on its OWN idle event 0x12B8 (created by tid=5 via NtCreateEvent), and no NtSetEvent in ours signals 0x12B8 (also NO_SIGNALS_DESPITE_WAITS). Producer-side worker-wake signal is missing. Cascade A=NtDup fires correctly on wedge YES (cycle 26711); B=wedge dup NOT signaled CONFIRMED; C=outcome (b) localized to producer→worker wake gap (0x12B8); D=draws>0 deferred to AUDIT-063 fix. New finding: γ-signaler D = sub_8245D9D8 / sub_8245DA78 (LR 0x8245DA44 / 0x8245DB08) — NtSet wrapper hot from worker-side, missed by AUDIT-059/060 dossier list. Canary spreads NtDup across 6 tids (6/10/16/17/18/26 → 33 fires/180s); ours across 3 (1/5/13 → 14 fires) — confirms AUDIT-057 thread-gap as enabling condition. Trace audit-runs/audit-062-wedge-kevent-flow/. [confirmed outcome b]
  • AUDIT-060 (2026-05-12) — confirmed wedge structural identification: NtCreateEvent → NtDuplicateObject → enqueue → worker → NtSetEvent on dup (canary path); ours stalls at the wait because workers don't signal. [confirmed]
  • AUDIT-059 (2026-05-11) — established as keystone γ-wedge site. Handle 0x12AC create-site is here at +0x128. [confirmed]
  • AUDIT-058 (2026-05-10) — sister mention in tid=13 chain (frames via sub_821CB1D0 ← sub_821CBAE0). [confirmed]
  • AUDIT-049 (2026-05-10) — original discovery that tid=13 waits INFINITE on event created here; main thread (tid=1) is downstream via thread-join handle. [confirmed]
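
The NO_SIGNALS_DESPITE_WAITS classification used in the entries above amounts to a set difference over a trace: handles that appear in a wait but never in a signal. A minimal sketch, assuming a simplified two-variant trace record; the `TraceEvent` type and function name are hypothetical, and the real Phase-A event schema is not shown in this dossier.

```rust
use std::collections::BTreeSet;

/// Hypothetical, simplified trace record.
#[derive(Clone, Copy)]
enum TraceEvent {
    Wait { handle: u32 },
    Signal { handle: u32 },
}

/// Return, in ascending order, every handle that was waited on
/// but never signaled anywhere in the trace.
fn no_signals_despite_waits(trace: &[TraceEvent]) -> Vec<u32> {
    let mut waited = BTreeSet::new();
    let mut signaled = BTreeSet::new();
    for ev in trace {
        match *ev {
            TraceEvent::Wait { handle } => {
                waited.insert(handle);
            }
            TraceEvent::Signal { handle } => {
                signaled.insert(handle);
            }
        }
    }
    waited.difference(&signaled).copied().collect()
}
```

Fed a trace in which 0x12AC and 0x12B8 are waited on while only neighbouring handles are signaled, the function reports exactly those two handles, which is the shape of the AUDIT-062 finding.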

Open questions

  • Is the +0x128 create the ONLY NtCreateEvent in this fn, or are there multiple? AUDIT-062 db query: exactly 1 bl 0x824A9F18 (NtCreateEvent wrapper) at +0x128. Two bl 0x82452DC0 (+0x19C, +0x2EC) and two bl 0x824AA330 wait-wrappers (+0x1AC, +0x318) — same KEVENT submitted+waited twice (sequential file-IO loads), or alternative-branch fork. Canary's 2 fires at 0x821CB158 therefore mean sub_821CB030 is invoked twice by its caller, each creating a fresh KEVENT.
  • What does +0x19C..+0x1A8 do between work-submit and wait? (Likely sets up the wait params.) Disassemble to confirm.
  • Does ours's NtDuplicateObject correctly create a signal-aliased handle? AUDIT-062 confirmed: YES — ours aliases (dup_id = source_id), out_ptr populated, refcount bumped. Bug is NOT here.
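  • Open after AUDIT-062: which producer-side call (descendant of sub_82452DC0) calls NtSetEvent on the worker idle event (0x12B8-class) in canary, and why does ours skip it? Probe canary's hot NtSet wrapper LRs 0x822F2304, 0x822F1D84, 0x821743D8 (9k+ fires each) — one of these is likely the worker-wake. AUDIT-063 answered: the trio are red herrings (their containing fns are not reachable from sub_82452DC0 in 12 hops); the real producer is the worker cluster, which cannot bootstrap while the wedge stays unsignaled.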
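
The duplicate-handle question above comes down to whether a signal sent through the duplicate reaches the object the creator holds. A minimal sketch of the two policies mentioned in the audits: aliasing the source id with a bumped refcount (ours, per AUDIT-062) and allocating a fresh id for the same object (canary, per AUDIT-059 Probe C). The `HandleTable` type, its methods, and the id step are assumptions made for illustration, not the emulator's API.

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical event object shared by every handle that refers to it.
struct EventObject {
    signaled: AtomicBool,
}

struct Entry {
    object: Arc<EventObject>,
    refs: u32,
}

enum DupPolicy {
    /// dup_id == source_id, refcount bumped.
    AliasSourceId,
    /// New id pointing at the same object.
    FreshId,
}

/// Spacing between allocated ids; an arbitrary choice for this sketch.
const ID_STEP: u32 = 4;

struct HandleTable {
    entries: HashMap<u32, Entry>,
    next_id: u32,
}

impl HandleTable {
    fn new(first_id: u32) -> Self {
        HandleTable { entries: HashMap::new(), next_id: first_id }
    }

    fn create_event(&mut self) -> u32 {
        let id = self.next_id;
        self.next_id += ID_STEP;
        let object = Arc::new(EventObject { signaled: AtomicBool::new(false) });
        self.entries.insert(id, Entry { object, refs: 1 });
        id
    }

    fn duplicate(&mut self, source: u32, policy: DupPolicy) -> Option<u32> {
        match policy {
            DupPolicy::AliasSourceId => {
                self.entries.get_mut(&source)?.refs += 1;
                Some(source)
            }
            DupPolicy::FreshId => {
                let object = Arc::clone(&self.entries.get(&source)?.object);
                let id = self.next_id;
                self.next_id += ID_STEP;
                self.entries.insert(id, Entry { object, refs: 1 });
                Some(id)
            }
        }
    }

    /// Drop one reference; the id disappears when its count reaches zero.
    fn close(&mut self, handle: u32) -> bool {
        let remaining = match self.entries.get_mut(&handle) {
            Some(entry) => {
                entry.refs -= 1;
                entry.refs
            }
            None => return false,
        };
        if remaining == 0 {
            self.entries.remove(&handle);
        }
        true
    }

    fn set_event(&self, handle: u32) -> bool {
        match self.entries.get(&handle) {
            Some(entry) => {
                entry.object.signaled.store(true, Ordering::SeqCst);
                true
            }
            None => false,
        }
    }

    fn lookup(&self, handle: u32) -> Option<Arc<EventObject>> {
        self.entries.get(&handle).map(|e| Arc::clone(&e.object))
    }
}
```

In both policies a `set_event` on the duplicate is visible through a reference taken from the original handle, even after the original is closed. That is consistent with the AUDIT-062 conclusion that aliasing is not where the wedge comes from.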

Cross-references

  • Wedge handle in ours: drifts per run (0x1288/0x12A4/0x12AC across audits — see reference_function_dossiers caveat).
  • Callers: sub_821CBA08 (not yet dossiered)
  • Callees: sub_82452DC0, sub_824A9F18 (NtCreateEvent wrapper)
  • Audits: 049, 058, 059, 060, 062, 063, 065, 066
  • Artifacts: audit-runs/audit-049-tid1-stall-0x1280/, audit-runs/audit-059-gamma-wedge/, audit-runs/audit-062-wedge-kevent-flow/