Priority aging in xenia-cpu/scheduler.rs:pick_runnable
(effective_priority = base + age_bonus(now_round - last_run_round),
capped at +31, AGING_ROUNDS_PER_BONUS=1). Strict-priority scheduling was
parking priority=0 threads behind the CPU-bound priority=15 audio mixer
(sub_824D1328 guest spinwait at PC=0x824d1404 on CPU5). Aging
eventually picks the starved thread, breaking the producer-consumer
cycle that has caused the 5-tid wedge at PC=0x824ac578 since AUDIT-049
(10 May).
Cascade observed: tid=13 clean exit; events 121K -> 13M (107x); last
host_ns 767ms -> 51,011ms (66x); 8 new threads spawn; VdSwap 1 -> 2.
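
The aging rule described above can be sketched as follows. This is a minimal illustration, not the code in xenia-cpu/scheduler.rs: the Thread struct, the MAX_AGE_BONUS name, the reading of "capped at +31" as a cap on the bonus, and the lowest-tid tie-break are all assumptions.

```rust
/// Rounds a thread must wait per +1 bonus (value taken from the commit message).
const AGING_ROUNDS_PER_BONUS: u64 = 1;
/// Assumed name for the "+31" cap; here it caps the bonus, not the sum.
const MAX_AGE_BONUS: u32 = 31;

/// Hypothetical minimal thread record for illustration.
#[derive(Debug, Clone)]
struct Thread {
    tid: u32,
    base_priority: u32,
    last_run_round: u64,
    runnable: bool,
}

/// Bonus grows with the number of rounds waited and saturates at the cap.
fn age_bonus(rounds_waited: u64) -> u32 {
    (rounds_waited / AGING_ROUNDS_PER_BONUS).min(MAX_AGE_BONUS as u64) as u32
}

fn effective_priority(t: &Thread, now_round: u64) -> u32 {
    t.base_priority + age_bonus(now_round.saturating_sub(t.last_run_round))
}

/// Picks the runnable thread with the highest effective priority
/// (ties go to the lowest tid, an assumption) and marks it as run this round.
fn pick_runnable(threads: &mut [Thread], now_round: u64) -> Option<u32> {
    let idx = threads
        .iter()
        .enumerate()
        .filter(|(_, t)| t.runnable)
        .max_by_key(|(_, t)| (effective_priority(t, now_round), std::cmp::Reverse(t.tid)))
        .map(|(i, _)| i)?;
    threads[idx].last_run_round = now_round;
    Some(threads[idx].tid)
}
```

Under these assumptions, a priority=0 thread competing with an always-runnable priority=15 thread is first picked once its accumulated bonus exceeds the gap, instead of being parked indefinitely.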
Complete two-day iteration sequence (2026-05-27 -> 2026-05-28):
- 2.F: VdSwap drain timeout 900ms -> 1ms (xenia-gpu/handle.rs); 876x
perf win on VdSwap kernel callback
- 2.H: vA0000000 physical heap bucket added (state.rs, exports.rs);
ctx_ptrs now in 0xA0000000-0xBFFFFFFF range matching canary
- 2.L: Phase-A diff harness categorized [return_value mismatch],
[status mismatch], [args_resolved.path mismatch] tags
(tools/diff-events/diff_events.py); closes reading-error #41
(silent test-harness state leak invalidating trace diffs)
- 2.M: always-on exit-thread-state.json sibling to Phase-A JSONL
(event_log.rs + xenia-app/main.rs); closes reading-error #42
(Phase-A blind to blocked-forever waits)
- 2.Q: signal.match kernel instrumentation in NtSetEvent /
NtReleaseSemaphore / KeSetEvent / KeReleaseSemaphore
(exports.rs); emits target_handle + waiter_count + waiter_tids
- 2.T: wake.requested kernel instrumentation in wake_eligible_waiters
(exports.rs); emits target_tid + transition + new_state
- 2.V: scheduler priority aging (xenia-cpu/scheduler.rs) [keystone]
Plus accumulated WIP from earlier May (contention_manifest,
phase_b_snapshot, xam/xaudio enhancements, analysis db, xex loader,
xenia-app main loop, etc.). audit-runs/ artifacts remain untracked
per project convention.
Tests: 300 xenia-cpu / 227 xenia-kernel / 5 xenia-app / 19 xenia-path
/ 30+ smaller suites -- all PASS, 0 regressions. Determinism preserved
(2x cold runs bit-identical at 13,003,881 events post-2.V).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
| address | classification | confidence | last_audit | aliases |
|---|---|---|---|---|
| 0x821CB030 | normal_callee | high | 066 | |
sub_821CB030 — wedge primary site (creates + submits + waits file-IO completion XEvent)
Synopsis
The function whose body creates, submits work for, and waits on the canonical AUDIT-049/058/059 γ-wedge XEvent. Used by silph::GamePart_Title::UImpl to load cache:\aab216c3\5\… files synchronously: NtCreateEvent at +0x128, work submit at +0x19C (calls sub_82452DC0), wait INFINITE at +0x1AC. The wait is what blocks the entire post-intro phase in ours.
Evidence
- AUDIT-049: tid=13 chain ends at this fn with wait at 0x824ac578 (KeWaitForSingleObject in the wait wrapper called from +0x1AC).
- AUDIT-058: canary captures sub_821CB030+0x12c (= PC after the NtCreateEvent bl) in stacks.
- AUDIT-059 Probe O ours: handle 0x12AC (Event/Auto) created at 0x821cb158 (= +0x128), waited at 0x821cb1dc (= +0x1AC). Wedge has signal_attempts=0: never signaled by the worker side.
- AUDIT-059 Probe C canary: same PCs fire; 0xF8000098 created, then NtDuplicateObject'd to 0xF80000A0, original closed fast, dup signaled by worker via sub_82458B90/sub_8245EC10.
- File-IO context: precedes synchronous file load of cache:\aab216c3\5\… (post-VFS work in AUDIT-054).
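
The signal_attempts=0 observation above amounts to a per-handle classification. A minimal sketch, assuming per-handle counters of waits and signal attempts are available; HandleStats, classify, and the Signaled and Unused variants are hypothetical names, and only the NO_SIGNALS_DESPITE_WAITS tag comes from the audits.

```rust
/// Verdict for one kernel handle. Only NoSignalsDespiteWaits mirrors a tag
/// used in the audits (NO_SIGNALS_DESPITE_WAITS); the others are illustrative.
#[derive(Debug, PartialEq, Eq)]
enum HandleVerdict {
    NoSignalsDespiteWaits,
    Signaled,
    Unused,
}

/// Hypothetical per-handle counters gathered from a trace.
struct HandleStats {
    waits: u32,
    signal_attempts: u32,
}

/// A handle that is waited on but never signaled is a wedge candidate.
fn classify(s: &HandleStats) -> HandleVerdict {
    match (s.waits, s.signal_attempts) {
        (0, 0) => HandleVerdict::Unused,
        (_, 0) => HandleVerdict::NoSignalsDespiteWaits,
        _ => HandleVerdict::Signaled,
    }
}
```

With one wait and zero signal attempts, as reported for 0x12AC, the sketch returns NoSignalsDespiteWaits.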
Activation
Direct bl from sub_821CBA08+0xd8 (AUDIT-059 create-stack frame 1). One static caller. Higher in the chain: sub_821CC3F8 (GamePart_Title) → sub_821CBA08 → sub_821CB030.
Static graph
- Callers:
  - sub_821CBA08+0xd8 (only static caller)
- Callees of interest:
  - sub_824A9F18: NtCreateEvent wrapper, called at +0x124 bl (post-call PC = +0x128 = 0x821CB158).
  - sub_82452DC0: work-submitter, called at +0x198 bl (post-call PC = +0x19C).
  - sub_824AC540: wait wrapper, called at +0x1A8 bl (post-call PC = +0x1AC = 0x821CB1DC).
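
The offset arithmetic in the callee list can be checked mechanically. A small sketch, assuming fixed 4-byte PowerPC instructions so that the post-call PC is the bl address plus 4; the constant and function names are illustrative.

```rust
/// Entry address of sub_821CB030 (from the dossier header).
const FN_BASE: u32 = 0x821C_B030;
/// PowerPC instructions are a fixed 4 bytes wide.
const PPC_INSN_BYTES: u32 = 4;

/// Absolute return address (LR) for a `bl` located at `fn_base + bl_offset`.
fn post_call_pc(fn_base: u32, bl_offset: u32) -> u32 {
    fn_base + bl_offset + PPC_INSN_BYTES
}

/// Converts an absolute PC back to an offset within the function,
/// or None if the PC lies below the function entry.
fn offset_in_fn(fn_base: u32, pc: u32) -> Option<u32> {
    pc.checked_sub(fn_base)
}
```

For example, post_call_pc(FN_BASE, 0x124) evaluates to 0x821CB158 and post_call_pc(FN_BASE, 0x1A8) to 0x821CB1DC, matching the create and wait PCs listed above.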
Audit log
- AUDIT-066 (2026-05-12): source-review only (READ-ONLY). Re-read canary's xenia/kernel/xboxkrnl/xboxkrnl_io.cc:39-389 + xfile.cc:19-198 + kernel_state.cc:519-551 and ours's xenia-kernel/src/exports.rs:1103-1518, 3747-3764. AUDIT-065's "host-side IO completion thread F8000048 signals each per-load event" framing is falsified: (i) canary's NtReadFile/NtReadFileScatter/NtWriteFile are synchronous and signal the supplied event handle inline via ev->Set(0, false) (lines 210-212, 296-298, 383-385); no host async-IO thread exists; the only host thread "Kernel Dispatch" (kernel_state.cc:524-549) services CompleteOverlappedDeferred for XAM overlapped UI/content, not file IO; (ii) F8000048 in AUDIT-065 stdout is a guest XThread thid=10 (entry 0x82450A28, ctx 0x828F3B68), spawned by main at canary-run.stdout:1331 via ExCreateThread(...,824AFF88, 82450A28, 828F3B68, 0); the F8 prefix is a guest kernel-object handle region marker, NOT a host-thread marker; (iii) cache loads at canary-run.stdout:2127-2154 (sequence NtCreateEvent → NtCreateFile → NtDuplicateObject → NtQueryInformationFile → NtClose) emit zero NtReadFile/NtSetEvent lines; NtQueryInformationFile has no event-handle parameter in either engine; (iv) thid=17 (F8000094) terminates via ExTerminateThread(0) WITHOUT ever calling Wait inside its cache loop, so the canary path doesn't even hit this fn's wait sites for the cache files visible in AUDIT-065's stdout. Ours's signal_io_completion_event (exports.rs:1156-1169), called from 16 sites in nt_read_file/nt_write_file/nt_device_io_control_file, already implements canary's ev->Set(0, false) semantics; there is no missing analog. The wait at this fn's +0x1AC is a wait on the sub_82452DC0 work-queue dup'd XEvent, signaled by guest worker-cluster code (γ-signalers A/B/C/D per AUDIT-059/060), not IO completion. Bug class confirmed = AUDIT-063 structural / bootstrap-ordering. AUDIT-066 fix locus (xenia-kernel/src/exports.rs IO handlers) is the WRONG target; the bug is upstream in worker-cluster bootstrap (sub_825070F0 activation gate). [confirmed: NO IO-completion gap]
- AUDIT-065 (2026-05-12): wedge mechanism precisely framed via sub_82173990. Canary's tid=17 worker (= analog of ours's tid=13) reaches ExTerminateThread(0) after sequentially loading cache:\aab216c3\5\ee70e0a, cache:\87719002\c\dba806e/ec0a96e, cache:\87719002\a\60fcb85, cache:\87719002\2\85d8849, cache:\87719002\0\1a2db9c etc (16+ cache file loads) AND spawning child workers via ExCreateThread(..., 824AFF88, 821C4AD0/822C6870, ...). Worker's own sub_821CB030 calls (file-IO completion event waits) complete in canary. In ours, the very first sub_821CB030 call (on handle 0x12AC) hangs (NO_SIGNALS_DESPITE_WAITS): tid=13 never reaches ExTerminateThread, and tid=1's join wait on 0x12A4 never completes. Cache file opens succeed in ours (paths cache:/aab216c3/5, cache:/aab216c3 etc seen in log just before the stall), so the bug is post-VFS, in the producer→worker async-IO completion signaling, exactly as AUDIT-062 found. [confirmed]
- AUDIT-063 (2026-05-12): AUDIT-062's candidate trio (0x822F2304 / 0x822F1D84 / 0x821743D8) confirmed as RED HERRINGS: containing fns sub_822F2248 / sub_822F1AA8 / sub_821741C8 resolved, but none are reachable from sub_82452DC0 in 12 hops. Track-A probe (180s canary / 500M-instr ours): canary fires 11.7k× / ours 0× on 0x822F1D84 and 0x821743D8, but they're downstream of an unblocked main event loop (canary tid=6 = guest main). Ours's main (tid=1) is Blocked on 0x12A4 (tid=13 thread-join handle, AUDIT-049), which transitively blocks on this fn's wedge 0x12AC. Real producer is the worker cluster sub_82458B90 / sub_8245EC10 / sub_8245FEB8 / sub_8245D9D8 / sub_8245DA78 running on the 4 workers spawned by sub_825070F0; 0 of those 8 workers spawn in ours vs 8 in canary. The bug is the AUDIT-057 thread-gap closing in on itself: the cluster cannot bootstrap because the wedge isn't signaled, and the wedge isn't signaled because the cluster cannot bootstrap. NO new producer fn was missed by prior audits. [confirmed: trio is symptom not cause]
- AUDIT-062 (2026-05-12): wedge KEVENT data-flow traced. Outcome (b): NtDuplicateObject thunk = 0x8284DF7C; sub_821CB030 has NO direct bl-NtDup (dup is performed by descendant via wrapper sub_824AA398). Phase 2 ours --lr-trace=0x8284DF7C: wedge handle 0x12AC IS duped by tid=13 cycle 26711 (alongside 0x12B0 cycle 23833). Out_ptr 0x40541E80 populated with dup_handle = source_handle = 0x12AC (ours aliases per exports.rs:4263). sub_82452DC0 fires 8× in ours; line 8 = wedge submit on tid=13 cycle 8127 lr=0x821CB1D0, with r6=0x40541E80 (job struct carries the dup pointer). So work IS submitted with the right handle. Phase 4 ours --lr-trace=0x8284DF5C,0x824AA2F0: 68 NtSet fires, 0 on 0x12AC (neighbors 0x129C / 0x12B0 ARE signaled, so the infrastructure is capable). γ-signalers A/B/C/D all fire (3/2/3/6+2 fires resp.), but on non-wedge handles. The break is upstream of γ-signaler: ours's worker tid=5 is parked on its OWN idle event 0x12B8 (created by tid=5 via NtCreateEvent), and no NtSetEvent in ours signals 0x12B8 (also NO_SIGNALS_DESPITE_WAITS). Producer-side worker-wake signal is missing. Cascade: A = NtDup fires correctly on wedge, YES (cycle 26711); B = wedge dup NOT signaled, CONFIRMED; C = outcome (b) localized to producer→worker wake gap (0x12B8); D = draws>0 deferred to AUDIT-063 fix. New finding: γ-signaler D = sub_8245D9D8 / sub_8245DA78 (LR 0x8245DA44 / 0x8245DB08), an NtSet wrapper hot from worker-side, missed by AUDIT-059/060 dossier list. Canary spreads NtDup across 6 tids (6/10/16/17/18/26 → 33 fires/180s); ours across 3 (1/5/13 → 14 fires), which confirms AUDIT-057 thread-gap as enabling condition. Trace: audit-runs/audit-062-wedge-kevent-flow/. [confirmed outcome b]
- AUDIT-060 (2026-05-12): confirmed wedge structural identification: NtCreateEvent → NtDuplicateObject → enqueue → worker → NtSetEvent on dup (canary path); ours stalls at the wait because workers don't signal. [confirmed]
- AUDIT-059 (2026-05-11): established as keystone γ-wedge site. Handle 0x12AC create-site is here at +0x128. [confirmed]
- AUDIT-058 (2026-05-10): sister mention in tid=13 chain (frames via sub_821CB1D0 ← sub_821CBAE0). [confirmed]
- AUDIT-049 (2026-05-10): original discovery that tid=13 waits INFINITE on event created here; main thread (tid=1) is downstream via thread-join handle. [confirmed]
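
The transitive blocking recorded in the log (tid=1 waits on join handle 0x12A4 for tid=13, which waits on event 0x12AC) can be followed with a simple wait-chain walk. This is a hedged sketch: the two maps, the Root enum, and trace_wait_chain are hypothetical names, and the walk assumes each blocked thread waits on exactly one handle.

```rust
use std::collections::{HashMap, HashSet};

/// Where a wait chain ends.
#[derive(Debug, PartialEq, Eq)]
enum Root {
    /// Chain ends on a non-join handle (e.g. an event) that some producer must signal.
    BlockedOnEvent(u32),
    /// Chain ends on a thread that is not waiting on anything.
    Runnable(u32),
    /// Threads wait on each other in a loop.
    Cycle,
}

/// Follows tid -> handle -> (join target tid) -> handle ... until the chain ends.
/// `waits_on` maps a blocked tid to the handle it waits on;
/// `join_target` maps a thread-join handle to the tid it joins.
fn trace_wait_chain(
    start_tid: u32,
    waits_on: &HashMap<u32, u32>,
    join_target: &HashMap<u32, u32>,
) -> (Vec<u32>, Root) {
    let mut chain = Vec::new();
    let mut seen = HashSet::new();
    let mut tid = start_tid;
    loop {
        if !seen.insert(tid) {
            return (chain, Root::Cycle);
        }
        let handle = match waits_on.get(&tid) {
            Some(&h) => h,
            None => return (chain, Root::Runnable(tid)),
        };
        chain.push(handle);
        match join_target.get(&handle) {
            Some(&next) => tid = next,
            None => return (chain, Root::BlockedOnEvent(handle)),
        }
    }
}
```

Fed with the handles from the audits, the walk starting at tid=1 visits 0x12A4 and then 0x12AC and reports the event 0x12AC as the root of the stall.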
Open questions
- Is the +0x128 create the ONLY NtCreateEvent in this fn, or are there multiple? AUDIT-062 db query: exactly 1 bl 0x824A9F18 (NtCreateEvent wrapper) at +0x128. Two bl 0x82452DC0 (+0x19C, +0x2EC) and two bl 0x824AA330 wait-wrappers (+0x1AC, +0x318): same KEVENT submitted+waited twice (sequential file-IO loads), or alternative-branch fork. Canary's 2 fires at 0x821CB158 therefore mean sub_821CB030 is invoked twice by its caller, each creating a fresh KEVENT.
- What does +0x19C..+0x1A8 do between work-submit and wait? (Likely sets up the wait params.) Disassemble to confirm.
- Does ours's NtDuplicateObject correctly create a signal-aliased handle? AUDIT-062 confirmed: YES. Ours aliases (dup_id = source_id), out_ptr populated, refcount bumped. Bug is NOT here.
- Open after AUDIT-062: which producer-side call (descendant of sub_82452DC0) calls NtSetEvent on the worker idle event (0x12B8-class) in canary, and why does ours skip it? Probe canary's hot NtSet wrapper LRs 0x822F2304, 0x822F1D84, 0x821743D8 (9k+ fires each); one of these is likely the worker-wake.
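
The aliasing behaviour confirmed by AUDIT-062 (dup_id = source_id, refcount bumped) can be modelled in a few lines. This is only an illustrative sketch; HandleTable, Event, and all method names are hypothetical and do not reflect the layout in exports.rs.

```rust
use std::collections::HashMap;

/// Hypothetical event object: a signaled flag plus a handle refcount.
struct Event {
    signaled: bool,
    refcount: u32,
}

/// Hypothetical handle table in which a duplicate aliases its source id.
#[derive(Default)]
struct HandleTable {
    objects: HashMap<u32, Event>,
}

impl HandleTable {
    fn create_event(&mut self, handle: u32) {
        self.objects.insert(handle, Event { signaled: false, refcount: 1 });
    }

    /// Aliasing duplicate: returns the same id and bumps the refcount.
    fn duplicate(&mut self, source: u32) -> Option<u32> {
        let ev = self.objects.get_mut(&source)?;
        ev.refcount += 1;
        Some(source)
    }

    /// Signals the event; returns false if the handle is unknown.
    fn set_event(&mut self, handle: u32) -> bool {
        match self.objects.get_mut(&handle) {
            Some(ev) => {
                ev.signaled = true;
                true
            }
            None => false,
        }
    }

    /// Drops one reference; the object disappears when the last one is closed.
    fn close(&mut self, handle: u32) {
        let remove = match self.objects.get_mut(&handle) {
            Some(ev) => {
                ev.refcount -= 1;
                ev.refcount == 0
            }
            None => false,
        };
        if remove {
            self.objects.remove(&handle);
        }
    }

    fn is_signaled(&self, handle: u32) -> bool {
        self.objects.get(&handle).map_or(false, |ev| ev.signaled)
    }
}
```

In this model, closing the original handle after the duplicate leaves the event alive, and a later set_event through the duplicate is visible to a waiter holding the original id, which is why the aliasing itself is not the defect.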
Cross-references
- Wedge handle in ours: drifts per run (0x1288/0x12A4/0x12AC across audits — see reference_function_dossiers caveat).
- Callers: sub_821CBA08 (not yet dossierd)
- Callees: sub_82452DC0, sub_824A9F18 (NtCreateEvent wrapper)
- Audits: 049, 058, 059, 060, 062, 063, 065, 066
- Artifacts: audit-runs/audit-049-tid1-stall-0x1280/, audit-runs/audit-059-gamma-wedge/, audit-runs/audit-062-wedge-kevent-flow/