ITERATE-2.V: scheduler priority aging closes 18-day AUDIT-049 wedge

Add priority aging to xenia-cpu/scheduler.rs:pick_runnable
(effective_priority = base + age_bonus(now_round - last_run_round),
capped at +31, AGING_ROUNDS_PER_BONUS=1). Strict-priority scheduling
was parking priority=0 threads behind the CPU-bound priority=15 audio
mixer (sub_824D1328 guest spinwait at PC=0x824d1404 on CPU5). Aging
eventually picks the starved thread, breaking the producer-consumer
cycle that has caused the 5-tid wedge at PC=0x824ac578 since AUDIT-049
(10 May).
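
A minimal sketch of the aging rule described above, not the actual
scheduler.rs code. The Thread struct, its field names, the tie-break
(longest waiter wins) and the simulate helper are assumptions made for
illustration; only the formula, the +31 cap and
AGING_ROUNDS_PER_BONUS=1 come from this message.

```rust
/// Rounds a thread must wait to earn one bonus point (value from the commit message).
const AGING_ROUNDS_PER_BONUS: u64 = 1;
/// Upper bound on the aging bonus (the "+31" cap; assumed to apply to the bonus).
const MAX_AGE_BONUS: u64 = 31;

/// Hypothetical thread record; field names are illustrative only.
#[derive(Debug, Clone)]
pub struct Thread {
    pub tid: u32,
    pub base_priority: u64,
    pub last_run_round: u64,
    pub runnable: bool,
}

/// Bonus grows with the number of rounds since the thread last ran, up to the cap.
pub fn age_bonus(rounds_waiting: u64) -> u64 {
    (rounds_waiting / AGING_ROUNDS_PER_BONUS).min(MAX_AGE_BONUS)
}

/// effective_priority = base + age_bonus(now_round - last_run_round)
pub fn effective_priority(t: &Thread, now_round: u64) -> u64 {
    t.base_priority + age_bonus(now_round.saturating_sub(t.last_run_round))
}

/// Pick the runnable thread with the highest effective priority.
/// Ties go to the thread that has waited longest (assumed tie-break).
pub fn pick_runnable(threads: &[Thread], now_round: u64) -> Option<usize> {
    threads
        .iter()
        .enumerate()
        .filter(|(_, t)| t.runnable)
        .max_by_key(|(_, t)| {
            let waited = now_round.saturating_sub(t.last_run_round);
            (effective_priority(t, now_round), waited)
        })
        .map(|(i, _)| i)
}

/// Run `rounds` scheduling rounds and return the tid chosen in each round.
pub fn simulate(threads: &mut [Thread], rounds: u64) -> Vec<u32> {
    let mut picks = Vec::new();
    for now in 1..=rounds {
        if let Some(i) = pick_runnable(threads, now) {
            threads[i].last_run_round = now;
            picks.push(threads[i].tid);
        }
    }
    picks
}
```

Under these assumptions, a priority=0 thread competing with an
always-runnable priority=15 thread is first picked in round 16, once
its bonus reaches the other thread's effective priority. With strict
priority (no bonus) it would never be picked.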

Cascade observed: tid=13 clean exit; events 121K -> 13M (107x); last
host_ns 767ms -> 51,011ms (66x); 8 new threads spawn; VdSwap 1 -> 2.

Complete two-day iterate sequence (2026-05-27 -> 2026-05-28):
- 2.F: VdSwap drain timeout 900ms -> 1ms (xenia-gpu/handle.rs); 876x
       perf win on VdSwap kernel callback
- 2.H: vA0000000 physical heap bucket added (state.rs, exports.rs);
       ctx_ptrs now in 0xA0000000-0xBFFFFFFF range matching canary
- 2.L: Phase-A diff harness categorized [return_value mismatch],
       [status mismatch], [args_resolved.path mismatch] tags
       (tools/diff-events/diff_events.py); closes reading-error #41
       (silent test-harness state leak invalidating trace diffs)
- 2.M: always-on exit-thread-state.json sibling to Phase-A JSONL
       (event_log.rs + xenia-app/main.rs); closes reading-error #42
       (Phase-A blind to blocked-forever waits)
- 2.Q: signal.match kernel instrumentation in NtSetEvent /
       NtReleaseSemaphore / KeSetEvent / KeReleaseSemaphore
       (exports.rs); emits target_handle + waiter_count + waiter_tids
- 2.T: wake.requested kernel instrumentation in wake_eligible_waiters
       (exports.rs); emits target_tid + transition + new_state
- 2.V: scheduler priority aging (xenia-cpu/scheduler.rs) [keystone]

Also includes accumulated WIP from earlier in May (contention_manifest,
phase_b_snapshot, xam/xaudio enhancements, analysis db, xex loader,
xenia-app main loop, etc.). Artifacts under audit-runs/ remain
untracked per project convention.

Tests: 300 xenia-cpu / 227 xenia-kernel / 5 xenia-app / 19 xenia-path
/ 30+ smaller suites -- all PASS, 0 regressions. Determinism preserved
(2x cold runs bit-identical at 13,003,881 events post-2.V).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
MechaCat02
2026-05-29 07:27:26 +02:00
parent e6d43a23ac
commit ad45873a1b
50 changed files with 14389 additions and 506 deletions

@@ -0,0 +1,56 @@
---
address: 0x82172BA0
classification: normal_callee
confidence: high
last_audit: 064
aliases:
- "Vtable-slot-6 array-walker / AUDIT-033 T6-gateway descendant"
---
# sub_82172BA0 — array-walk dispatcher (vtable slot 6)
## Synopsis
Normal-callee dispatcher. Walks an array of object pointers (header at `r29+56`: `r29+56[8]` = element count `>>2`; `r29+56[4]` = data ptr) and invokes vtable slot 6 (`lwz r11, 24(r11)`) on each. The `bctrl` at PC `0x82172D88` is the slot-6 dispatch site, observed in canary firing into [sub_821B55D8](sub_821B55D8.md). A critical-section prologue (`lwarx`/`stwcx.` at PC `0x82172C08..0x82172C14`) protects the array snapshot. The function only fires when caller `sub_821741C8` finds the 3-bit masked field of `[r30+4]` equal to 4. AUDIT-064 verified that canary fires 2× at 180s wallclock; ours fires 0× because tid=1's wait at `sub_82173990+0x2D0` (handle 0x12A4 = tid=13's thread handle) never completes.
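
The walk-and-dispatch shape described above can be sketched in host-side Rust. This is a simplified model, not the guest code: `Object`, `Method`, `make_object` and the stub methods are hypothetical, and guest memory, the array header layout and the critical section are not modeled. Only the slot arithmetic (byte offset 24, 4-byte slots) comes from the disassembly.

```rust
/// Byte offset used by `lwz r11, 24(r11)`; with 4-byte slots this is slot 6.
const SLOT_BYTE_OFFSET: usize = 24;
const SLOT_SIZE: usize = 4;
pub const DISPATCH_SLOT: usize = SLOT_BYTE_OFFSET / SLOT_SIZE;

/// Hypothetical stand-in for a guest method: takes the object's id, returns a value.
pub type Method = fn(u32) -> u32;

/// Hypothetical stand-in for a guest object that carries a vtable.
pub struct Object {
    pub id: u32,
    pub vtable: Vec<Method>,
}

fn unused_slot(_id: u32) -> u32 {
    0
}

fn slot6_echo(id: u32) -> u32 {
    id
}

/// Build a demo object whose slots 0..=5 are stubs and whose slot 6 echoes the id.
pub fn make_object(id: u32) -> Object {
    let mut vtable: Vec<Method> = vec![unused_slot as Method; DISPATCH_SLOT];
    vtable.push(slot6_echo);
    Object { id, vtable }
}

/// Walk every object and call the method in slot 6, collecting the results.
/// Objects with a too-short vtable are skipped; that check is a safety choice
/// of this sketch, not something observed in the guest code.
pub fn walk_and_dispatch(objects: &[Object]) -> Vec<u32> {
    objects
        .iter()
        .filter_map(|o| o.vtable.get(DISPATCH_SLOT).map(|m| m(o.id)))
        .collect()
}
```

Calling `walk_and_dispatch` on two demo objects built with `make_object(7)` and `make_object(9)` invokes slot 6 once per element, in array order.
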
## Evidence
- Disasm prolog at `0x82172BA0`: `mflr r12; bl 0x825F0F78 (frame helper); subi r31, r1, 176; stwu r1, -176(r1); mr r29, r3; ...` — normal-callee prolog, frame ptr `r31 = r1-176`. NOT MSVC EH handler.
- Function size: 604 bytes / 151 insns. `has_eh=True`, `frame_size=0` per DB (dynamic).
- Static caller xref (sole): PC `0x821744C8` inside `sub_821741C8` via `bl`. Gating disasm at `sub_821741C8+0x2C8..2C8` compares the masked bits of `[r30+4]` against 4 before taking this call.
- The bctrl at PC `0x82172D88` operates on slot 6 (`lwz r11, 24(r11)` = byte-offset 24 in 4-byte slots = slot index 6).
- AUDIT-064 canary 60s+180s probes: fires 1-2× with `lr=0x821744CC r3=BCCC4A80 r4=BC369160 r5=BC369160 r6=03A72328` on tid=6. PC `0x82172D88` (the bctrl) fires 2× at 60s in upstream probe.
- AUDIT-064 ours `--ctor-probe=0x82172BA0` -n 500M: **0 fires**.
- Critical-section pattern at `0x82172C08..0x82172C14`: `mfmsr r8; mtmsrd r13; lwarx r9, r0, r10; stwcx. r11, r0, r10; mtmsrd r8; bne 0x82172C00` — disable interrupts → atomic swap → restore.
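
The `lwarx`/`stwcx.`/`bne` retry loop in the last bullet has a rough host-side analogue in a compare-exchange loop. The sketch below is an illustration under assumptions: the function name is hypothetical, the memory orderings are a conservative choice, and the interrupt masking done by `mfmsr`/`mtmsrd` is not modeled.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Store `new` into `cell` and return the value that was there before.
/// `compare_exchange_weak` may fail spuriously, much like `stwcx.` can lose
/// its reservation, so the loop retries until the store succeeds.
pub fn swap_with_retry(cell: &AtomicU32, new: u32) -> u32 {
    let mut seen = cell.load(Ordering::Relaxed);
    loop {
        match cell.compare_exchange_weak(seen, new, Ordering::AcqRel, Ordering::Relaxed) {
            // Store succeeded: `prev` is the old value (the role of the lwarx result).
            Ok(prev) => return prev,
            // Store failed: reload and try again (the role of the backward `bne`).
            Err(actual) => seen = actual,
        }
    }
}
```

In real host code `AtomicU32::swap` does the same job in one call; the explicit loop is shown only because it mirrors the guest instruction sequence.
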
## Activation
Direct `bl` from `sub_821741C8+0x300` (PC `0x821744C8`). Conditional: `sub_821741C8` masks `[r30+4]` via `rlwinm r11, r11, 0, 27, 29` and switches on the 3-bit field — value `4` selects this fn, value `8` selects `sub_82172E58`, else no-op.
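
To make the gating concrete: `rlwinm r11, r11, 0, 27, 29` rotates by 0 and keeps PowerPC bits 27..29 (bit 0 is the MSB), which is the mask `0x1C`, so the unshifted values 4 and 8 are both reachable. A small sketch of that decode follows; the `Selected` enum and function names are hypothetical, and only the non-wrapping mask case is handled.

```rust
/// Build the 32-bit rlwinm mask for bits MB..=ME in PowerPC numbering (bit 0 = MSB).
/// Only the non-wrapping case (mb <= me) is handled in this sketch.
pub fn rlwinm_mask(mb: u32, me: u32) -> u32 {
    assert!(mb <= me && me < 32);
    let from_mb = u32::MAX >> mb; // ones from bit mb through bit 31
    let through_me = u32::MAX << (31 - me); // ones from bit 0 through bit me
    from_mb & through_me
}

/// Rotate left by `sh`, then AND with the MB..=ME mask.
pub fn rlwinm(value: u32, sh: u32, mb: u32, me: u32) -> u32 {
    value.rotate_left(sh) & rlwinm_mask(mb, me)
}

/// Which path the caller takes, per the Activation text.
#[derive(Debug, PartialEq, Eq)]
pub enum Selected {
    ArrayWalk,   // value 4: sub_82172BA0
    OtherWalker, // value 8: sub_82172E58
    NoOp,        // anything else
}

/// Decode the word loaded from `[r30+4]` into the selected path.
pub fn select(word: u32) -> Selected {
    match rlwinm(word, 0, 27, 29) {
        4 => Selected::ArrayWalk,
        8 => Selected::OtherWalker,
        _ => Selected::NoOp,
    }
}
```

Bits outside the mask do not affect the result, so a word such as `0xFFFF_FFE4` still selects the array-walk path.
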
## Static graph
- Static callers (DB):
  - `sub_821741C8+0x300` via `bl`.
- Callees:
  - `sub_822F2328` (PC `0x82172BC4`).
  - `sub_8284DCFC` (PC `0x82172BD4`), likely a kernel sync primitive.
  - `sub_8228E138` (PC `0x82172BF4`).
  - Indirect via `bctrl` at PC `0x82172D88` (slot 6) and other vtable slots inside the body.
- DB lists many `ind_call` targets recorded for PC `0x82172D88` (sub_82680370, sub_823A2258, sub_82455300, sub_827E8D60, sub_8237B020, sub_82398CC0, sub_82391BA8, sub_827ED308, sub_826B24E8, sub_822C7418, sub_821F8340, sub_823800A8, sub_824A6C00, sub_823762E8, sub_825ED990, sub_827EFED0, sub_822B06A0, sub_82455658, sub_82388FF8, sub_827FA850, sub_8232C4C0, sub_8238EC10, sub_82674028, sub_823929D0, ...). **Critical caveat**: this list is missing `sub_821B55D8` despite that being the runtime target observed in canary — the dynamic-target inference has gaps.
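
One way to surface gaps like the missing `sub_821B55D8` is to diff runtime-observed `bctrl` targets against the DB's `ind_call` list. The helper below is a hypothetical sketch: it assumes both lists are already available as plain guest addresses, and it does not reflect the analysis DB's real schema or any existing tool.

```rust
use std::collections::BTreeSet;

/// Return the runtime-observed targets that the static DB list does not contain,
/// sorted and de-duplicated.
pub fn missing_targets(db_targets: &[u32], observed: &[u32]) -> Vec<u32> {
    let known: BTreeSet<u32> = db_targets.iter().copied().collect();
    let mut missing: Vec<u32> = observed
        .iter()
        .copied()
        .filter(|target| !known.contains(target))
        .collect();
    missing.sort_unstable();
    missing.dedup();
    missing
}
```

Fed a DB list that includes `0x82680370` and an observed list containing `0x821B55D8` and `0x82680370`, it reports only `0x821B55D8`.
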
## Audit log
- **AUDIT-064 (2026-05-12)** — disasm confirms array-walk dispatcher pattern; canary fires 1-2× / ours 0×. The runtime activation chain for sub_825070F0 starts here. **Convergence finding**: ours never reaches sub_82172BA0 because tid=1 is stalled at `sub_82173990+0x2D0` (handle 0x12A4 = tid=13's thread handle — AUDIT-049 wedge). The whole 5-level ladder downstream is gated by this wait. [confirmed]
## Open questions
- What is the array at `[r29+56]`? Likely a list of subsystem objects (graphics, audio, input, etc.) the game-loop dispatcher iterates each frame. Canary `r3=0xBCCC4A80` is the dispatcher object.
- The `bctrl`'s xref-table is incomplete (missing `sub_821B55D8`). Investigate the dynamic-target inference's gap.
## Cross-references
- Callers: `sub_821741C8+0x300`.
- Callees (via bctrl): `sub_821B55D8` (observed in canary), plus 50+ others recorded in DB.
- Upstream: `sub_822F1AA8` → vtable[0]=`sub_82173990` → calls `sub_821741C8`.
- Audits: 033 (T6 gateway analysis), 058, 064.
- Artifacts: `audit-runs/audit-064-activation-ladder/canary-{60,120,180}s.log`, `canary-upstream-60s.log`, `canary-inside-822F1AA8.log`.