ITERATE-2.V: scheduler priority aging closes 18-day AUDIT-049 wedge
Priority aging in xenia-cpu/scheduler.rs:pick_runnable
(effective_priority = base + age_bonus(now_round - last_run_round),
capped at +31, AGING_ROUNDS_PER_BONUS=1). Strict-priority was parking
priority=0 threads behind CPU-bound priority=15 audio mixer
(sub_824D1328 guest spinwait at PC=0x824d1404 on CPU5). Aging
eventually picks the starved thread, breaking the producer-consumer
cycle that caused 5-tid wedge at PC=0x824ac578 since AUDIT-049 (10 May).
Cascade observed: tid=13 clean exit; events 121K -> 13M (107x); last
host_ns 767ms -> 51,011ms (66x); 8 new threads spawn; VdSwap 1 -> 2.
Complete two-day iterate sequence (2026-05-27 -> 2026-05-28):
- 2.F: VdSwap drain timeout 900ms -> 1ms (xenia-gpu/handle.rs); 876x
perf win on VdSwap kernel callback
- 2.H: vA0000000 physical heap bucket added (state.rs, exports.rs);
ctx_ptrs now in 0xA0000000-0xBFFFFFFF range matching canary
- 2.L: Phase-A diff harness categorized [return_value mismatch],
[status mismatch], [args_resolved.path mismatch] tags
(tools/diff-events/diff_events.py); closes reading-error #41
(silent test-harness state leak invalidating trace diffs)
- 2.M: always-on exit-thread-state.json sibling to Phase-A JSONL
(event_log.rs + xenia-app/main.rs); closes reading-error #42
(Phase-A blind to blocked-forever waits)
- 2.Q: signal.match kernel instrumentation in NtSetEvent /
NtReleaseSemaphore / KeSetEvent / KeReleaseSemaphore
(exports.rs); emits target_handle + waiter_count + waiter_tids
- 2.T: wake.requested kernel instrumentation in wake_eligible_waiters
(exports.rs); emits target_tid + transition + new_state
- 2.V: scheduler priority aging (xenia-cpu/scheduler.rs) [keystone]
Plus accumulated WIP from earlier May (contention_manifest,
phase_b_snapshot, xam/xaudio enhancements, analysis db, xex loader,
xenia-app main loop, etc.). Audit-runs/ artifacts remain untracked
per project convention.
Tests: 300 xenia-cpu / 227 xenia-kernel / 5 xenia-app / 19 xenia-path
/ 30+ smaller suites -- all PASS, 0 regressions. Determinism preserved
(2x cold runs bit-identical at 13,003,881 events post-2.V).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
This commit is contained in:
55
docs/functions/sub_822F1AA8.md
Normal file
55
docs/functions/sub_822F1AA8.md
Normal file
@@ -0,0 +1,55 @@
|
||||
---
|
||||
address: 0x822F1AA8
|
||||
classification: normal_callee
|
||||
confidence: high
|
||||
last_audit: 065
|
||||
aliases:
|
||||
- "tid=1 post-init dispatch loop (calls sub_82173990 via vtable[0])"
|
||||
---
|
||||
|
||||
# sub_822F1AA8 — tid=1 post-init game-loop dispatcher
|
||||
|
||||
## Synopsis
|
||||
|
||||
Normal-callee invoked by tid=1's `entry_point → sub_8216EA68 → sub_822F1AA8`. Contains the per-frame game-loop pump for the post-init subsystem-dispatch tree (audit-064 chain to [sub_825070F0](sub_825070F0.md)). Runs an outer loop: (a) `KeWaitForSingleObject` infinite at PC `0x822F1DFC`, (b) dispatches vtable[0] of object at `*(0x828E1F08)` at PC `0x822F1B4C bctrl` — which is `sub_82175330` → tail-jump → `sub_82173990`. Canary executes the body 4040× in 60s (per-frame). Ours executes the function entry 1× then **blocks immediately inside sub_82173990 (the vtable[0] callee) at sub_82173990+0x2D0** — KeWaitForSingleObject INFINITE on handle `0x12A4` (= tid=13's thread handle = AUDIT-049 wedge).
|
||||
|
||||
## Evidence
|
||||
|
||||
- Disasm prolog: `mflr r12; bl 0x825F0F60; stfd f30, -136(r1); stfd f31, -128(r1); subi r31, r1, 256; stwu r1, -256(r1); mr r30, r3; ...` — normal-callee.
|
||||
- Function size: 996 bytes / 249 insns. `has_eh=True`, `frame_size=0` per DB (dynamic 256-byte frame).
|
||||
- Static caller: `sub_8216EA68+0x3AC` via `bl` (the post-init dispatcher).
|
||||
- AUDIT-064 ours `--ctor-probe=0x822F1AA8` -n 500M fires 1× at tid=1, cycle=6,171,801, lr=0x8216ee14, r3=0x40d09a40. Back-chain: tid=1 thread_proc → entry_point → sub_8216EA68+0x3AC → sub_822F1AA8.
|
||||
- AUDIT-064 ours fine probe at BB-entries `0x822F1ACC/0x822F1AEC/0x822F1B20/0x822F1B30/0x822F1B38` all fire 1× — execution does pass through the function body to PC `0x822F1B38`.
|
||||
- AUDIT-064 ours `--ctor-probe=0x822F1B50` fires **0×**. The bctrl at PC `0x822F1B4C` DOES execute (sub_82175330 fires 1× per separate probe), but never returns — because sub_82175330 tail-jumps to sub_82173990 which blocks at +0x2D0.
|
||||
- AUDIT-064 canary 60s probe (`--audit_61_branch_probe_pcs`): all probes in body fire — `0x822F1B5C/0x822F1B78/0x822F1BB8` fire 1×, `0x822F1BCC` (outer-loop body) fires 4040×, `0x822F1D58` (the inner bctrl → sub_821741C8) fires 4030×, `0x822F1DFC` (outer KeWait) fires 4040×.
|
||||
- Global `0x828E1F08` is the slot holding the object pointer that the vtable[0] bctrl dispatches off. Its writers are `sub_822F14D8+0xF0` (PC `0x822F15A4`, observed in ours) and `sub_822F1638+0x84` (PC `0x822F16BC`). At cycle ~6,171,800 in ours, `[0x828E1F08]` is set to `0x40111890`; `[0x40111890+0]` evolves through multiple vtable values (`0x820AD894`, `0x820A183C`, ...) before the bctrl fires; final value at bctrl is `0x820A183C` (slot 0 = `sub_82175330`).
|
||||
|
||||
## Activation
|
||||
|
||||
Direct `bl` from `sub_8216EA68+0x3AC` on tid=1. One-shot at boot per game session — but the function itself contains an outer loop that iterates `KeWaitForSingleObject` waits until termination.
|
||||
|
||||
## Static graph
|
||||
|
||||
- Static callers: `sub_8216EA68+0x3AC` via `bl` (sole).
|
||||
- Direct callees: `sub_822F13B0`, `sub_824AA2F0` (NtSetEvent wrapper), `sub_82172370`, `sub_824AA3E0`, `sub_824C1910` (leaf), `sub_824AA8B0`, `sub_82456B58`, `sub_824AA330` (KeWaitForSingleObject wrapper), `sub_824574C0`, `sub_82457038`, `sub_8284E45C` (kernel import thunk).
|
||||
- Indirect: `bctrl` at PC `0x822F1B4C` (vtable[0] of `*(0x828E1F08)`) and `bctrl` at PC `0x822F1D58` (vtable[2] of same).
|
||||
- Reads `0x828E1F08` at PCs `0x822F1B3C, 0x822F1BE8, 0x822F1D40, 0x822F1E44, ...` (11 reads).
|
||||
|
||||
## Audit log
|
||||
|
||||
- **AUDIT-065 (2026-05-12)** — refined the dispatch-target understanding. The vtable[0] callee at PC `0x822F1B4C bctrl` is `sub_82175330` (2-insn tail-jump to `sub_82173990`). `sub_82173990` is a **synchronous task-spawn-and-join helper** — not an outer game loop. Canary fires this function **exactly 1× per boot** (not 4040× as the synopsis previously suggested) — the 4040× metric in audit-064 referred to PCs *downstream of* sub_82173990's return into sub_822F1AA8's outer loop (PCs `0x822F1BCC`/`0x822F1D58`/`0x822F1DFC`). Per AUDIT-065 sub_82173990 dossier, the wait at sub_82173990+0x2D0 IS a thread-join, and the body of sub_82173990 itself is clean — only the worker spawned via `sub_821746B0` (which becomes ours's tid=13) is wedged inside `sub_821CB030`. [confirmed]
|
||||
- **AUDIT-064 (2026-05-12)** — identified as the immediate dispatch chain origin for the 4-fn ladder leading to sub_825070F0. Disasm + ours fine-grained BB probes localize the wedge: tid=1 enters function (1×), passes through PCs 0x822F1ACC/0x822F1AEC/0x822F1B20/0x822F1B30/0x822F1B38, executes bctrl at PC `0x822F1B4C` → sub_82175330 → sub_82173990 → KeWaitForSingleObject(handle=0x12A4 = tid=13 thread handle) → STALL. Canary instead returns from that wait and enters the outer game-loop body (`0x822F1BCC` fires 4040× in 60s). **First divergence between canary and ours is at sub_82173990's wait inside vtable[0] of `*(0x828E1F08)` — same AUDIT-049 wedge.** [confirmed]
|
||||
|
||||
## Open questions
|
||||
|
||||
- The vptr at `[0x40111890+0]` mutates multiple times before the bctrl fires (writes from sub_82152XXX, sub_8244e850, sub_8244e8bc, sub_82155b4c, sub_82460c40, sub_822F2758, sub_8216F110, ...). Is the final value `0x820A183C` (which has slot 0 = sub_82175330) the same as canary's final value? Run the same `--mem-watch` against canary to verify.
|
||||
- Why does canary's tid=13 finish (allowing tid=1's join wait on handle 0x12A4 to complete) while ours's tid=13 stalls? That's the AUDIT-049 root question — separately tracked.
|
||||
|
||||
## Cross-references
|
||||
|
||||
- Direct callers: `sub_8216EA68+0x3AC`.
|
||||
- Callees of interest: `sub_82173990` (via vtable[0] thunk `sub_82175330`) — where tid=1's stall occurs.
|
||||
- Downstream (when activated): `sub_82173990` → `sub_821741C8` → `sub_82172BA0` → `sub_821B55D8` → `sub_824F8398` → `sub_824F7CD0` → `sub_824F7800` → `sub_825070F0`.
|
||||
- Object dispatch: `*(0x828E1F08) = 0x40111890`, vptr `[0x40111890+0] = 0x820A183C` (vtable), slot 0 = `sub_82175330`.
|
||||
- Audits: 049 (the underlying wedge), 064.
|
||||
- Artifacts: `audit-runs/audit-064-activation-ladder/ours-fine-822F1AA8.stdout`, `ours-bb-822F1AA8.stdout`, `ours-vtable820a183c.stdout`, `ours-vptr-time.log`, `canary-inside-822F1AA8.log`.
|
||||
Reference in New Issue
Block a user