Priority aging in xenia-cpu/scheduler.rs:pick_runnable
(effective_priority = base + age_bonus(now_round - last_run_round),
capped at +31, AGING_ROUNDS_PER_BONUS=1). Strict-priority was parking
priority=0 threads behind CPU-bound priority=15 audio mixer
(sub_824D1328 guest spinwait at PC=0x824d1404 on CPU5). Aging
eventually picks the starved thread, breaking the producer-consumer
cycle that caused 5-tid wedge at PC=0x824ac578 since AUDIT-049 (10 May).
Cascade observed: tid=13 clean exit; events 121K -> 13M (107x); last
host_ns 767ms -> 51,011ms (66x); 8 new threads spawn; VdSwap 1 -> 2.
Complete two-day iterate sequence (2026-05-27 -> 2026-05-28):
- 2.F: VdSwap drain timeout 900ms -> 1ms (xenia-gpu/handle.rs); 876x
perf win on VdSwap kernel callback
- 2.H: vA0000000 physical heap bucket added (state.rs, exports.rs);
ctx_ptrs now in 0xA0000000-0xBFFFFFFF range matching canary
- 2.L: Phase-A diff harness categorized [return_value mismatch],
[status mismatch], [args_resolved.path mismatch] tags
(tools/diff-events/diff_events.py); closes reading-error #41
(silent test-harness state leak invalidating trace diffs)
- 2.M: always-on exit-thread-state.json sibling to Phase-A JSONL
(event_log.rs + xenia-app/main.rs); closes reading-error #42
(Phase-A blind to blocked-forever waits)
- 2.Q: signal.match kernel instrumentation in NtSetEvent /
NtReleaseSemaphore / KeSetEvent / KeReleaseSemaphore
(exports.rs); emits target_handle + waiter_count + waiter_tids
- 2.T: wake.requested kernel instrumentation in wake_eligible_waiters
(exports.rs); emits target_tid + transition + new_state
- 2.V: scheduler priority aging (xenia-cpu/scheduler.rs) [keystone]
Plus accumulated WIP from earlier May (contention_manifest,
phase_b_snapshot, xam/xaudio enhancements, analysis db, xex loader,
xenia-app main loop, etc.). Audit-runs/ artifacts remain untracked
per project convention.
Tests: 300 xenia-cpu / 227 xenia-kernel / 5 xenia-app / 19 xenia-path
/ 30+ smaller suites -- all PASS, 0 regressions. Determinism preserved
(2x cold runs bit-identical at 13,003,881 events post-2.V).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
14 KiB
address, classification, confidence, last_audit, aliases
| address | classification | confidence | last_audit | aliases | ||
|---|---|---|---|---|---|---|
| 0x82173990 | normal_callee | high | 066 |
|
sub_82173990 — synchronous task-spawn-and-join helper
Synopsis
Tid=1 (main) one-shot helper that builds a stack-resident task descriptor, calls
sub_821746B0 to allocate+initialize a 24-byte task record (which encapsulates
a sync object created by sub_82172370), and waits INFINITE on that sync
object. The wait at PC 0x82173C60 bl 0x824AA330 (= sub_82173990+0x2D0)
is the AUDIT-049/AUDIT-064 wedge site — both canary and ours enter the wait,
but only canary's wait completes. The wait is on the thread handle of the
worker spawned by sub_821746B0 (XThread or KE_THREAD), released when that
worker calls ExTerminateThread. Function is called exactly 1× per boot in
both engines (entry probe fires once, all body PCs fire once each).
Evidence
Disasm-anatomy (size 768B / 192 insns @ 0x82173990..0x82173C8C):
0x82173990 mflr/prologue (256-byte frame)
0x821739B0 bl 0x8216E7E8 ; first string-init helper (r4=r11+6244)
0x821739CC bl 0x82448AA0 ; cr0=.G. → arg!=0 path
0x821739F0 bl 0x82448BC8 ; returns string-table entry → r28
0x82173A38 bl 0x8216F218 ; internal copy
0x82173A68 bl 0x821835E0 ; → r25 (ID/result); cr6-tests below
0x82173A78 bne cr6, 0x82173A84 ; skip if r25==28
0x82173A88 beq cr6, 0x82173BC0 ; skip if r25==0
…
0x82173B98 bl 0x82453910 ; signaler candidate (AUDIT-049 column)
0x82173BC0 convergence label (string-table clean-up + dispatch)
0x82173BE4 bl 0x824B2188 ; tid=1's outer-channel pump
0x82173C34 bl 0x821746B0 ; allocates 24-byte task record, sub_82172370 fills [r29+4]
0x82173C38 mr r30, r3 ; r30 = task struct
0x82173C48 bl 0x824AA5C8 ; status query → r3 (canary r3=1 → [r31+80]=0x103 STATUS_PENDING)
0x82173C54 bne cr6, 0x82173C64 ; guard: only wait if r11==0x103 (STATUS_PENDING)
0x82173C5C lwz r3, 4(r30) ; r3 = task->sync_handle = [struct+4]
0x82173C60 bl 0x824AA330 ; KeWaitForSingleObject INFINITE ← THE WEDGE
0x82173C70 bl 0x82174AF8 ; post-wait task completion (sub_82174AF8 runs post-state transition)
0x82173C88 epilogue (b 0x825F0FC4)
Canary run (AUDIT-065, 180s wallclock, --audit_61_branch_probe_pcs)
All 17 probed PCs fire exactly 1× each on tid=F8000008 (= canary main / mapped from tid=6):
| PC | lr | r3 | r4 | tid | meaning |
|---|---|---|---|---|---|
0x82173990 |
0x822F1B50 |
BCCC4A80 |
701CF8C0 |
6 | entry; lr=post-bctrl of sub_822F1AA8 |
0x821739CC |
— | 0x00000001 |
0x820A17A8 |
6 | cr0=.G. — cmplwi r28,0 post-strcmp != 0 path |
0x821739F0 |
— | BCCC4A64 |
BCCC49FC |
6 | r28 populated; cr6=..E (==) |
0x82173A38 |
— | 701CF860 |
701CF840 |
6 | inner copy call entry |
0x82173A68 |
— | BDE996FF |
BDE98F14 |
6 | r25=0xBDE996FF (returned ID), cr6=..E |
0x82173A78 |
— | 0x0000001C |
BDE98F14 |
6 | bne 0x82173A84 — r25 != 28 taken |
0x82173BC0 |
0x82173A6C |
0x1C |
… | 6 | convergence (post-bne over alloc); beq 0x82173B14-skipped |
0x82173BE4 |
0x82173BD4 |
BE568F00 |
0x00000005 |
6 | bl 0x824B2188 entry |
0x82173C34 |
0x82173C1C |
BCCC4A80 |
0x00000000 |
6 | bl 0x821746B0 entry — calls task-alloc |
0x82173C38 |
0x82173C38 |
BC365700 |
701CF6E0 |
6 | r3=task_struct, cr6=.G. |
0x82173C48 |
0x82173C38 |
F8000094 |
701CF800 |
6 | post bl 0x824AA5C8; r3=F8000094 ?? actually this is r3 at BB entry post-bl |
0x82173C54 |
0x82173C4C |
0x00000001 |
0x30000000 |
6 | cmplwi r11, 0x103 — value 0x103 sets cr6=..E (eq) per actual r11; BUT cr6=..E means !lt!gt eq — wait was entered |
0x82173C60 |
0x82173C4C |
F8000094 |
FFFFFFFF |
6 | wait entry — r3 = thread handle |
0x82173C64 |
0x82173C64 |
0 |
0x1 |
6 | post-wait — wait completed! |
0x82173C70 |
0x82173C64 |
BCCC4A80 |
BC365700 |
6 | bl 0x82174AF8 (cleanup) |
0x82173C88 |
0x82173C88 |
701CF840 |
… | 6 | epilogue |
Wait duration: ~445 log lines between PC 0x82173C60 (entry) and 0x82173C64 (post-wait).
Inside the wait window: K> F8000094 XThread::Execute thid 17 (handle=F8000094, 'XThread01F4 (F8000094)', native=000001F4) — i.e. F8000094 IS a thread handle. The thread loads cache files (cache:\aab216c3\5\ee70e0a, cache:\87719002\c\dba806e, cache:\87719002\c\ec0a96e, cache:\87719002\a\60fcb85, etc) and spawns child workers (ExCreateThread(..., 824AFF88, 821C4AD0, BCA44C00, ...) and others). The wait completes immediately after d> F8000094 ExTerminateThread(00000000).
Ours run (AUDIT-065, -n 500M instructions)
Only 6 of 24 BB-entry probed PCs fire (ours's branch probe fires only at BB-entry):
| PC | lr | r3 | cycle | meaning |
|---|---|---|---|---|
0x82173990 |
0x822F1B50 |
0x40ba9a80 |
6,172,194 | entry; lr= bctrl in sub_822F1AA8 |
0x821739CC |
— | 0x00000001 |
6,172,686 | non-zero arg path |
0x821739F0 |
— | 0x40ba9a64 |
6,173,074 | str-init complete |
0x82173A68 |
— | 0x41d7e6ff |
9,174,034 | r3=str-table-entry (the AUDIT-049 wait inside sub_82452DC0 already happened HERE — note the cycle gap from 6.17M to 9.17M means tid=1 has been blocked in sub_82172370/etc; actually this is the bl 0x821835E0 post-return) |
0x82173BC0 |
0x82173A6C |
0x1C |
9,175,368 | convergence (r25=0x41d7e6ff != 28 not equal to 28; checks ≠0 also nonzero, fall thru to skip block @ 0x82173B14) |
0x82173C38 |
0x82173C38 |
0x4024a640 |
9,178,243 | post bl 0x821746B0 — r3=0x4024a640 (task struct, ALSO start_ctx of tid=13) |
No probe fires beyond 0x82173C38. The next BB-entry probe was 0x82173C64 (post-wait). Mid-block PCs 0x82173C48/C54/C60/C70 don't fire in ours's branch-probe (per AUDIT-046 reading-error #13). The fact that 0x82173C64 does NOT fire confirms: ours's tid=1 wedges between 0x82173C38 and 0x82173C64 — at the wait at 0x82173C60.
End-of-run thread state (ours --trace-handles dump)
handle=0x000012a4 Thread(id=13, exit=None) waiters(tid)=[1]
handle=0x000012ac kind=Event/Auto waiters=1 signals=0 waits=1 wakes=0 <NO_SIGNALS_DESPITE_WAITS>
[ 0] cycle=0 tid=13 lr=0x824ac578 src=do_wait_single
handle=0x000012b8 kind=Event/Auto waiters=1 signals=0 waits=1 wakes=0 <NO_SIGNALS_DESPITE_WAITS>
[ 0] cycle=0 tid=5 lr=0x824ac578 src=do_wait_single
tid=13 (handle 0x12A4, exit=None) is alive but stuck on event 0x12AC inside sub_821CB030 (cache file IO completion event). tid=5 is one of the workers parked on its own idle event 0x12B8. tid=1 join-waits tid=13 → tid=13 waits 0x12AC → 0x12AC needs workers → workers parked.
Activation
Direct bl from sub_82175330+0x4 via tail-jump (post-bctrl of vtable[0] dispatched at sub_822F1AA8+0xA4). One static caller sub_82175330 per sylpheed.db.
Called exactly 1× per boot on tid=1 in both engines.
Static graph
- Direct callers (sylpheed.db
xrefs.source_func):sub_82175330+0x4viab 0x82173990(tail-jump from the vtable thunk).
- Direct callees of interest:
bl 0x8216E7E8at+0x20— string-table helper (used twice).bl 0x82448AA0at+0x38,+0x48,+0x88,+0xC4,+0x168,+0x260— string-table lookup.bl 0x824AA7A0at+0x4C— string-helper.bl 0x82448BC8at+0x5C— internal lookup.bl 0x82448C50at+0x78,+0x98,+0x178,+0x308— string convert.bl 0x8216F218at+0xA8,+0x188— copy / string ops.bl 0x8217C850at+0xAC— query.bl 0x82178E50at+0xB8— query.bl 0x821835E0at+0xD8— returns ID intor25(key gate).bl 0x824AA830at+0xFC,+0x148— kernel helper.bl 0x822C69C8at+0x104,+0x134— task-helper.bl 0x822DE650at+0x118— helper.bl 0x822F2328at+0x124,+0x240— helper (calls inside outer sub_822F1AA8 too).bl 0x822DE858at+0x13C— helper.bl 0x822F28C0at+0x144,+0x25C— helper.bl 0x82674028at+0x15C— kernel-debug printf? format.bl 0x82150EF8at+0x1A4— heap alloc 28-byte struct.bl 0x824523E8at+0x1FC— task-helper.bl 0x82453910at+0x208— signaler/notify (AUDIT-049 column).bl 0x821506B8at+0x224— heap free.bl 0x8216E790at+0x22C,+0x2EC,+0x2F4— string-cleanup.bl 0x824B2188at+0x254— tid=1's outer-channel pump.bl 0x824482D0at+0x288— format.bl 0x821746B0at+0x2A4— task allocator + worker spawn (the key call).bl 0x824AA5C8at+0x2B8— status query (returnsr3→ checked vs0x103 STATUS_PENDING).bl 0x824AA330at+0x2D0—KeWaitForSingleObjectINFINITE — THE WEDGE PC.bl 0x82174AF8at+0x2E0— post-wait task-cleanup/state-transition.b 0x825F0FC4at+0x2FC— epilogue tail-jump.
Audit log
- AUDIT-066 (2026-05-12) — source-review only (READ-ONLY). AUDIT-065's "host-side
F8000048IO completion thread" inference falsified by canary source review.F8000048is a guest XThread thid=10, spawned by main at canary-run.stdout:1331 viaExCreateThread(...,824AFF88, 82450A28, 828F3B68, 0)— entry0x82450A28is a Sylpheed-internal worker thread, not host infrastructure. Canary's only host helper thread is "Kernel Dispatch" (xenia-canary/src/xenia/kernel/kernel_state.cc:524-549) which servicesCompleteOverlappedDeferredfor XAM UI/content, not file IO. Canary'sNtReadFile/NtReadFileScatter/NtWriteFile(xboxkrnl_io.cc:125-389) are synchronous and signal the supplied event handle inline viaev->Set(0, false)after the sync read. Ours'ssignal_io_completion_event(exports.rs:1156-1169) is the bit-equivalent. No "host-side IO completion signal" gap exists in ours's IO handlers. The wait at this fn's+0x2D0(PC0x82173C60) is on the thread handle for the worker spawned viabl 0x821746B0(= tid=13 in ours / thid=17 in canary), released byExTerminateThreadper AUDIT-065 — confirmed correct framing. AUDIT-066 conclusion: brief's proposed fix locus (xenia-kernel/src/exports.rsIO handlers) is wrong; the bug is upstream worker-cluster bootstrap (AUDIT-057/063/064 chain). No code change of any size inexports.rswould unwedge tid=13. [confirmed: brief premise falsified] - AUDIT-065 (2026-05-12) — full disasm + 17-PC probe in BOTH engines. Canary's tid=1 (= F8000008, internally tid=6) reaches PC
0x82173C60exactly once, waits on r3=0xF8000094(= XThread thid 17's thread handle), and the wait completes when that worker reachesExTerminateThread(0). Worker runs synchronous cache file IO (cache:\aab216c3\5\ee70e0a,cache:\87719002\c\...etc) and spawns child workers viaExCreateThread(... 824AFF88, 821C4AD0, BCA44C00 ...)before terminating. Ours's tid=1 reaches PC0x82173C38(postbl 0x821746B0, r3=0x4024a640= ours's task struct = ours's tid=13's start_ctx) and stalls before0x82173C64— i.e. inside the wait at0x82173C60. Ours's tid=13 (created bybl 0x821746B0's subroutine, entry0x821748F0) DOES open the same cache files (cache:/aab216c3/5,cache:/aab216c3) but BLOCKS insidesub_821CB030+0x1ACon event0x12AC(NO_SIGNALS_DESPITE_WAITS). SoExTerminateThreadis never reached on ours's tid=13 → tid=1's wait on0x12A4never completes. Refines the wedge from "thread-join on tid=13" to a precise mechanism: the wait at+0x2D0is structurally a synchronous task-join (canary worker exits in ~1ms; ours's worker is permanently stuck downstream).sub_82173990body itself is clean — every probed PC except the wait completion matches canary's behavior; the divergence is entirely in what happens insidesub_821746B0's spawned worker (tid=13's body insub_821748F0→sub_821C4EB0→sub_821CC3F8→sub_821CBA08→sub_821CB030). Same AUDIT-049 island, now framed as: how does canary's worker get its0x12AC-equivalent event signaled fast enough that the worker can callExTerminateThread? [confirmed]
Open questions
- What signals canary's tid=17 cache-IO completion event (the
0x12AC-equivalent inside itssub_821CB030)? Probe canary'sNtSetEvent/KeSetEventthunks (0x8284DF5C/0x82490018) filtered on tid=17'sr3-handle in the lr-window around the corresponding tid=17 wait. Compare against ours's empty0x12AC.signalscount. - The cache file open flow looks identical in both engines for
cache:\aab216c3\…paths — confirming AUDIT-054's VFS layout fix landed correctly. The divergence is purely in the producer-side signaling. - Both engines pass through PC
0x82173B98(thesub_82453910"signaler" candidate) on ther25!=0 && r25!=28path — but only if the AUDIT-046 "5/5 iter" path lands inside the block[0x82173B14, 0x82173BC0]. Ours's BB-entry probe shows0x82173BC0fires but NOT0x82173B14/B40/B84/B98/BA0/BA8— meaning ours's beq at0x82173B14is taken (r5==0) and we skip the entire block. Canary's BR list also shows direct jump from0x82173A78→0x82173BC0(line 1998→1999) — so canary also skips the block at 0x82173B14. Block is dead in both engines at this horizon.sub_82453910is NOT the relevant signaler at this call site. r3at0x82173C48showsF8000094in canary at probe time, but that's the post-bl 0x824AA5C8PC (mid-block; the captured value is whatever r3 carried at that instant — likely the handle being queried, not the return). Worth a follow-up probe to confirm the status-query target.
Cross-references
- Caller: sub_822F1AA8 (via thunk sub_82175330, vtable[0] of
*(0x828E1F08)). - Callees of interest:
sub_821746B0— task allocator + worker spawn (no dossier yet — recommend creating one).sub_82172370— sync object creator (no dossier yet).sub_82174AF8— post-wait cleanup (no dossier yet).
- Worker-side wedge: sub_821CB030 — fires inside the worker spawned via sub_821746B0 from sub_82173990's
blat+0x2A4. - Audits: 049 (original tid=1 stall localization), 064 (full activation chain to sub_825070F0), 065 (this).
- Artifacts:
audit-runs/audit-065-sub82173990-wait-site/{sub_82173990.disasm,canary.log,canary-run.stdout,ours.log,ours-stdout.log,summary.md}.