MechaCat02 ad45873a1b ITERATE-2.V: scheduler priority aging closes 18-day AUDIT-049 wedge
Priority aging in xenia-cpu/scheduler.rs:pick_runnable
(effective_priority = base + age_bonus(now_round - last_run_round),
capped at +31, AGING_ROUNDS_PER_BONUS=1). Strict-priority was parking
priority=0 threads behind CPU-bound priority=15 audio mixer
(sub_824D1328 guest spinwait at PC=0x824d1404 on CPU5). Aging
eventually picks the starved thread, breaking the producer-consumer
cycle that caused 5-tid wedge at PC=0x824ac578 since AUDIT-049 (10 May).

Cascade observed: tid=13 clean exit; events 121K -> 13M (107x); last
host_ns 767ms -> 51,011ms (66x); 8 new threads spawn; VdSwap 1 -> 2.

Complete two-day iterate sequence (2026-05-27 -> 2026-05-28):
- 2.F: VdSwap drain timeout 900ms -> 1ms (xenia-gpu/handle.rs); 876x
       perf win on VdSwap kernel callback
- 2.H: vA0000000 physical heap bucket added (state.rs, exports.rs);
       ctx_ptrs now in 0xA0000000-0xBFFFFFFF range matching canary
- 2.L: Phase-A diff harness categorized [return_value mismatch],
       [status mismatch], [args_resolved.path mismatch] tags
       (tools/diff-events/diff_events.py); closes reading-error #41
       (silent test-harness state leak invalidating trace diffs)
- 2.M: always-on exit-thread-state.json sibling to Phase-A JSONL
       (event_log.rs + xenia-app/main.rs); closes reading-error #42
       (Phase-A blind to blocked-forever waits)
- 2.Q: signal.match kernel instrumentation in NtSetEvent /
       NtReleaseSemaphore / KeSetEvent / KeReleaseSemaphore
       (exports.rs); emits target_handle + waiter_count + waiter_tids
- 2.T: wake.requested kernel instrumentation in wake_eligible_waiters
       (exports.rs); emits target_tid + transition + new_state
- 2.V: scheduler priority aging (xenia-cpu/scheduler.rs) [keystone]

Plus accumulated WIP from earlier May (contention_manifest,
phase_b_snapshot, xam/xaudio enhancements, analysis db, xex loader,
xenia-app main loop, etc.). Audit-runs/ artifacts remain untracked
per project convention.

Tests: 300 xenia-cpu / 227 xenia-kernel / 5 xenia-app / 19 xenia-path
/ 30+ smaller suites -- all PASS, 0 regressions. Determinism preserved
(2x cold runs bit-identical at 13,003,881 events post-2.V).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-05-29 07:27:26 +02:00

Function dossiers — persistent RE notes for Project Sylpheed (Sylpheed.xex)

What this is

One markdown file per guest function we've investigated during a kernel-bug audit. The dossier is a living, append-only record of what we know (and what we got wrong) about each function. The goal is two-fold:

  1. Don't re-derive understanding. When an audit touches sub_821C4EB0, the next agent shouldn't have to re-walk the disasm — read sub_821C4EB0.md first.
  2. Don't repeat misinterpretations. AUDIT-060 invalidated two audits' worth of work because we had read MSVC EH FuncInfo metadata as if it were static call edges. The dossier captures both the corrected reading and the falsified one, so future agents can see that the trap has already been sprung once.

This system is agent-writable. Audit agents are expected to consult dossiers before probing, and to append (not rewrite) when a new audit produces evidence about a known function. Agents should create new dossiers for any function they perform non-trivial work on.

Layout

docs/functions/
  README.md           — this file
  INDEX.md            — one-line lookup table, sorted by address
  sub_XXXXXXXX.md     — per-function dossier (one per function, address in UPPERCASE hex)

Filename convention: sub_ + 8-hex-uppercase + .md. Match the name used in sylpheed.db.functions.name. If the function has a symbol (e.g. GamePart_Title::UImpl::ctor), still use the address-based filename; record the symbol inside.
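The filename convention is mechanical enough to check in a script. A minimal sketch in Python, assuming 32-bit guest addresses; the helper names `dossier_filename` and `is_valid_dossier_filename` are hypothetical and not part of the repo:

```python
import re

# sub_ + exactly 8 uppercase hex digits + .md, per the convention above.
DOSSIER_NAME = re.compile(r"sub_[0-9A-F]{8}\.md")


def dossier_filename(address: int) -> str:
    """Return the dossier filename for a 32-bit guest address."""
    if not 0 <= address <= 0xFFFFFFFF:
        raise ValueError(f"address out of 32-bit range: {address:#x}")
    return f"sub_{address:08X}.md"


def is_valid_dossier_filename(name: str) -> bool:
    """True only when the whole name matches the convention."""
    return DOSSIER_NAME.fullmatch(name) is not None
```

Lowercase hex or a symbol-based name fails the check, which matches the rule that the symbol goes inside the dossier rather than in the filename.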

Schema

Each dossier follows this shape:

---
address: 0xXXXXXXXX
classification: <one-word category — see below>
confidence: <high | medium | low | refuted>
last_audit: NNN
aliases:
  - "human-readable name or prior misnomer (status)"
---

# sub_XXXXXXXX

## Synopsis

One short paragraph: the current best understanding. ONLY the latest consensus —
old interpretations live in the audit log.

## Evidence

Hard facts only. Disasm patterns, .rdata/.pdata references, runtime fires from
instrumentation, byte-level dumps. No inference here; that goes in Activation
or Notes.

## Activation

When/how this function runs:
- direct bl from caller X at PC Y
- indirect via fnptr-array slot N at 0x...
- vtable dispatch from class C, slot K (vtable at 0x...)
- C++ EH catch-handler dispatch (FuncInfo @ 0x...)
- thread_proc entry point (registered via ExCreateThread call site PC Z)

## Static graph

- Callers (from sylpheed.db `xrefs` table, source_func column — never source per AUDIT-045):
  - PC `0xCCCCCCCC` inside `sub_DDDDDDDD`
- Callees:
  - bl `sub_EEEEEEEE` at PC `0x...`
  - bctrl (computed) at PC `0x...` — candidates: ...

## Audit log

Append-only. Most recent FIRST. Each entry pairs (audit-NNN, date, observation,
status). Status options: confirmed | falsified | superseded-by-NNN.

- **AUDIT-NNN (YYYY-MM-DD)** — observation + relevant data point [STATUS]
- **AUDIT-MMM (YYYY-MM-DD)** — earlier observation [STATUS: falsified by NNN — reason]

## Open questions

Future-work bullets:
- Specific PC to probe
- Hypothesis to test
- Cross-reference to verify

## Cross-references

- Related dossiers: [sub_XXXXX](sub_XXXXX.md) (relationship)
- Audit memory entries: `project_xenia_rs_audit_NNN_*.md`
- Trace artifacts: `audit-runs/audit-NNN-*/...`
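A dossier's frontmatter can be sanity-checked before it is committed. The sketch below uses only the standard library and reads just the top-level `key: value` scalars; treating `address`, `classification`, `confidence`, and `last_audit` as required is our assumption, and the function names are hypothetical:

```python
ALLOWED_CONFIDENCE = {"high", "medium", "low", "refuted"}
# Assumption: these four scalar keys must be present and non-empty.
REQUIRED_KEYS = ("address", "classification", "confidence", "last_audit")


def read_frontmatter(text: str) -> dict:
    """Collect top-level `key: value` scalars between the two `---` lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("dossier must start with a '---' frontmatter line")
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return fields
        # Indented lines are list items (e.g. aliases entries); skip them.
        if line.startswith((" ", "\t")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    raise ValueError("frontmatter is missing its closing '---' line")


def frontmatter_problems(text: str) -> list:
    """Return human-readable problems; an empty list means the check passed."""
    fields = read_frontmatter(text)
    problems = [f"missing key: {k}" for k in REQUIRED_KEYS if not fields.get(k)]
    confidence = fields.get("confidence")
    if confidence and confidence not in ALLOWED_CONFIDENCE:
        problems.append(f"unknown confidence: {confidence}")
    return problems
```

This is not a YAML parser. It only catches missing scalars and confidence values outside the four documented levels.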

Classification vocabulary

Pick the most specific that fits. Add new ones if needed but don't bloat the list.

| Class | Meaning |
| --- | --- |
| normal_callee | Plain function reached by direct bl. The default. |
| vtable_method | Virtual method dispatched via bctrl from a class vtable. |
| thread_proc | Entry point registered via ExCreateThread / KeInitializeThread. 0 static callers is correct; check for lr=0xbcbcbcbc thread-entry sentinel at first fire. |
| msvc_eh_catch_handler | MSVC C++ catch handler. Prolog subi r31, r12, N; mflr r12; .... Referenced from .rdata FuncInfo (magic 0x19930520..22). 0 static callers; dispatched by EH runtime only. Do not treat its .rdata references as call edges. |
| msvc_eh_state_handler | MSVC EH state/unwind handler. Similar to above but no subi r31, r12 prolog. |
| import_thunk | Wraps an xboxkrnl import (e.g. NtCreateEvent at thunk 0x8284DF1C). Behavior is host-side. |
| wrapper | Thin wrapper around a kernel import or library call. |
| crt_init_driver | CRT-style iterator that walks an array of fn pointers / vtables (e.g. sub_824ACB38). |
| fnptr_array_entry | Function reached only via enumeration by a crt_init_driver. |
| dispatch_table_method | Function installed into a runtime dispatch table by a ctor; reached via indirect call only. |
| synchronization_primitive | Function that wraps Nt/Ke wait/set/release calls. |
| unknown | Not yet investigated. Synopsis describes what little we know. |
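Before classifying a .rdata reference as a call edge, it helps to check whether a FuncInfo magic sits nearby. A minimal sketch that scans a raw .rdata dump for the magic range 0x19930520..22, assuming the words are stored big-endian (the guest is big-endian PowerPC) and 4-byte aligned; the alignment assumption and the function name are ours:

```python
import struct

FUNCINFO_MAGIC_MIN = 0x19930520
FUNCINFO_MAGIC_MAX = 0x19930522


def find_funcinfo_magics(rdata: bytes, base_address: int = 0) -> list:
    """Return guest addresses of aligned big-endian words in the magic range."""
    hits = []
    # Step through whole 4-byte words only; trailing partial bytes are ignored.
    for offset in range(0, len(rdata) - 3, 4):
        (word,) = struct.unpack_from(">I", rdata, offset)
        if FUNCINFO_MAGIC_MIN <= word <= FUNCINFO_MAGIC_MAX:
            hits.append(base_address + offset)
    return hits
```

A hit only says that a FuncInfo struct probably starts there. Whether a given function pointer belongs to that struct still has to be confirmed from the disasm.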

Confidence levels

| Confidence | Meaning |
| --- | --- |
| high | Multiple converging evidence sources (disasm + runtime instrumentation + cross-engine probe). |
| medium | One strong source (e.g. disasm alone or one canary trace). Plausible but not cross-checked. |
| low | Inference from static call graph or one observation; should be probed if it becomes load-bearing. |
| refuted | An earlier claim was falsified. Keep the dossier; document what the function actually is in synopsis + put the refuted claim in audit log with status falsified. |

Golden rules — for agents and humans

  1. Append, don't overwrite. New audits add entries to "Audit log". Old entries stay with their original wording so future readers can see the evolution.
  2. Falsify, don't delete. If a later audit disproves an earlier claim, mark the old audit-log entry [STATUS: falsified by AUDIT-NNN — reason]. The earlier interpretation taught us something (often that a class of disasm pattern is ambiguous) — preserve it.
  3. Cite the source. Every claim ties to either (a) an audit number + trace artifact path, or (b) a static-DB query you can reproduce. "X is a thread_proc" without a basis is unacceptable.
  4. Distinguish fact from inference. "Fires 5× at -n 500M with lr=0x8246020C all five times" is a fact. "Therefore it's a vptr installer for slot 1 of dispatch_table 0x820B5830" is an inference. Put facts in Evidence; inferences in Synopsis/Activation/Notes — and label inferences as such.
  5. Update INDEX.md. When you create a new dossier or change a classification, add/update the corresponding row in INDEX.md.
  6. Update the last_audit frontmatter. Reflects the most recent audit that touched the dossier.
  7. One function per file. If you find a fn is structurally a wrapper for another, write two dossiers and link them.
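Rules 1 and 2 can be checked mechanically when reviewing an edit to a dossier. The sketch below compares the audit-log entries before and after an edit and accepts the edit only if every old entry survives, in order, with its wording intact apart from the trailing bracketed status tag. That the status is always the last bracketed group on the line is our assumption, and the helper names are hypothetical:

```python
import re

# Assumption: the status tag is the final [...] group on an entry line.
_TRAILING_STATUS = re.compile(r"\s*\[[^\]]*\]\s*$")


def entry_body(entry: str) -> str:
    """Entry text with its trailing status tag removed."""
    return _TRAILING_STATUS.sub("", entry)


def is_append_only(old_entries: list, new_entries: list) -> bool:
    """True if the old entry bodies appear, in order, within the new log."""
    remaining = iter(entry_body(e) for e in new_entries)
    # `in` on an iterator consumes it up to the match, so order is enforced.
    return all(entry_body(old) in remaining for old in old_entries)
```

Adding a newer entry on top and changing an old entry's status to falsified both pass. Deleting or rewording an old entry fails.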

Anti-patterns to avoid

  • Reading EH metadata as call edges. .rdata references to a fn inside an MSVC FuncInfo struct (magic 0x19930520..22 nearby) are unwind-handler bindings, NOT bl call sites. Pattern: catch-handler prolog subi r31, r12, N; mflr r12; stwu r1, .... See sub_821B6DF4.md for the canonical falsified example.
  • "0 static callers" = "dead in ours". Three legitimate reasons a fn has 0 static callers and still runs: thread_proc (ExCreateThread), fnptr_array_entry (enumerated by crt_init_driver), msvc_eh_*_handler (dispatched by EH runtime). Always check.
  • Comparing fire counts at fixed instruction horizons across engines. Canary @ 60s wallclock and ours @ -n 500M are different time bases. State (i) and state (ii) data points must be normalized — either both at the same wallclock or both at the same boot milestone.
  • Trusting handle IDs across runs. KernelState::alloc_handle is monotonic; handles drift run-to-run. Function-context names (e.g. "sub_821CB030+0x128 creator") are stable; handle IDs are not.
  • Quoting xrefs.source instead of xrefs.source_func. See AUDIT-045 reading-error #12. Use source_func for caller-set queries.
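The last point can be made concrete with a caller-set query. This sketch uses Python's sqlite3 module; the `xrefs` table and its `source` and `source_func` columns come from the notes above, while the `target_func` column name and the `callers_of` helper are assumptions made for illustration, so adjust them to the real sylpheed.db schema:

```python
import sqlite3


def callers_of(conn: sqlite3.Connection, target_name: str) -> list:
    """Return the distinct calling functions for a target, sorted by name.

    Groups by source_func (the containing function) rather than source,
    so several call sites inside one caller collapse to a single row.
    """
    rows = conn.execute(
        # target_func is a hypothetical column name; check the real schema.
        "SELECT DISTINCT source_func FROM xrefs "
        "WHERE target_func = ? ORDER BY source_func",
        (target_name,),
    )
    return [row[0] for row in rows]
```

An empty result is not proof that the function is dead; cross-check against the three 0-static-caller cases listed above.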

Backfill status

Initial set (created in AUDIT-060 retrospective backfill, 2026-05-12):

  • The 10 most-cited fns from AUDIT-049 through AUDIT-060.

Future audits should extend coverage as they touch new fns. Backfilling fns from earlier audits (AUDIT-030 through AUDIT-048) is a nice-to-have but not blocking.