# Function dossiers — persistent RE notes for Project Sylpheed (Sylpheed.xex)

## What this is

One markdown file per guest function we've investigated during a kernel-bug audit. The dossier is a **living, append-only record** of what we know (and what we got wrong) about each function.

The goal is two-fold:

1. **Don't re-derive understanding.** When an audit touches `sub_821C4EB0`, the next agent shouldn't have to re-walk the disasm — read [sub_821C4EB0.md](sub_821C4EB0.md) first.
2. **Don't repeat misinterpretations.** AUDIT-060 falsified two audits of work because we'd read MSVC EH FuncInfo metadata as if it were static call edges. The dossier captures both the corrected reading AND the falsified one — so future agents see the trap was already sprung once.

This system is **agent-writable**. Audit agents are expected to consult dossiers before probing, and to *append* (not rewrite) when a new audit produces evidence about a known function. Agents should create new dossiers for any function they perform non-trivial work on.

## Layout

```
docs/functions/
  README.md          — this file
  INDEX.md           — one-line lookup table, sorted by address
  sub_XXXXXXXX.md    — per-function dossier (one per function, address in UPPERCASE hex)
```

Filename convention: `sub_` + 8-hex-uppercase + `.md`. Match the name used in `sylpheed.db.functions.name`. If the function has a symbol (e.g. `GamePart_Title::UImpl::ctor`), still use the address-based filename; record the symbol inside.

## Schema

Each dossier follows this shape:

```markdown
---
address: 0xXXXXXXXX
classification:
confidence:
last_audit: NNN
aliases:
  - "human-readable name or prior misnomer (status)"
---

# sub_XXXXXXXX

## Synopsis
One short paragraph: the current best understanding. ONLY the latest
consensus — old interpretations live in the audit log.

## Evidence
Hard facts only. Disasm patterns, .rdata/.pdata references, runtime fires
from instrumentation, byte-level dumps. No inference here; that goes in
Activation or Notes.

## Activation
When/how this function runs:
- direct bl from caller X at PC Y
- indirect via fnptr-array slot N at 0x...
- vtable dispatch from class C, slot K (vtable at 0x...)
- C++ EH catch-handler dispatch (FuncInfo @ 0x...)
- thread_proc entry point (registered via ExCreateThread call site PC Z)

## Static graph
- Callers (from sylpheed.db `xrefs` table, source_func column — never source per AUDIT-045):
  - PC `0xCCCCCCCC` inside `sub_DDDDDDDD`
- Callees:
  - bl `sub_EEEEEEEE` at PC `0x...`
  - bctrl (computed) at PC `0x...` — candidates: ...

## Audit log
Append-only. Most recent FIRST. Each entry pairs (audit-NNN, date,
observation, status). Status options: confirmed | falsified | superseded-by-NNN.

- **AUDIT-NNN (YYYY-MM-DD)** — observation + relevant data point [STATUS]
- **AUDIT-MMM (YYYY-MM-DD)** — earlier observation [STATUS: falsified by NNN — reason]

## Open questions
Future-work bullets:
- Specific PC to probe
- Hypothesis to test
- Cross-reference to verify

## Cross-references
- Related dossiers: [sub_XXXXX](sub_XXXXX.md) (relationship)
- Audit memory entries: `project_xenia_rs_audit_NNN_*.md`
- Trace artifacts: `audit-runs/audit-NNN-*/...`
```

## Classification vocabulary

Pick the **most specific** that fits. Add new ones if needed but don't bloat the list.

| Class | Meaning |
|-------|---------|
| `normal_callee` | Plain function reached by direct `bl`. The default. |
| `vtable_method` | Virtual method dispatched via `bctrl` from a class vtable. |
| `thread_proc` | Entry point registered via `ExCreateThread` / `KeInitializeThread`. 0 static callers is correct; check for `lr=0xbcbcbcbc` thread-entry sentinel at first fire. |
| `msvc_eh_catch_handler` | MSVC C++ catch handler. Prolog `subi r31, r12, N; mflr r12; ...`. Referenced from `.rdata` FuncInfo (magic `0x19930520..22`). 0 static callers; dispatched by EH runtime only. **Do not treat its `.rdata` references as call edges.** |
| `msvc_eh_state_handler` | MSVC EH state/unwind handler. Similar to above but no `subi r31, r12` prolog. |
| `import_thunk` | Wraps an xboxkrnl import (e.g. NtCreateEvent at thunk 0x8284DF1C). Behavior is host-side. |
| `wrapper` | Thin wrapper around a kernel import or library call. |
| `crt_init_driver` | CRT-style iterator that walks an array of fn pointers / vtables (e.g. `sub_824ACB38`). |
| `fnptr_array_entry` | Function reached only via enumeration by a `crt_init_driver`. |
| `dispatch_table_method` | Function installed into a runtime dispatch table by a ctor; reached via indirect call only. |
| `synchronization_primitive` | Function that wraps Nt/Ke wait/set/release calls. |
| `unknown` | Not yet investigated. Synopsis describes what little we know. |

## Confidence levels

| Confidence | Meaning |
|------------|---------|
| `high` | Multiple converging evidence sources (disasm + runtime instrumentation + cross-engine probe). |
| `medium` | One strong source (e.g. disasm alone or one canary trace). Plausible but not cross-checked. |
| `low` | Inference from static call graph or one observation; should be probed if it becomes load-bearing. |
| `refuted` | An earlier claim was falsified. Keep the dossier; document what the function actually is in synopsis + put the refuted claim in audit log with status `falsified`. |

## Golden rules — for agents and humans

1. **Append, don't overwrite.** New audits add entries to "Audit log". Old entries stay with their original wording so future readers can see the evolution.
2. **Falsify, don't delete.** If a later audit disproves an earlier claim, mark the old audit-log entry `[STATUS: falsified by AUDIT-NNN — reason]`. The earlier interpretation taught us *something* (often that a class of disasm pattern is ambiguous) — preserve it.
3. **Cite the source.** Every claim ties to either (a) an audit number + trace artifact path, or (b) a static-DB query you can reproduce. "X is a thread_proc" without a basis is unacceptable.
4. **Distinguish fact from inference.** "Fires 5× at -n 500M with lr=0x8246020C all five times" is a fact. "Therefore it's a vptr installer for slot 1 of dispatch_table 0x820B5830" is an inference. Put facts in Evidence; inferences in Synopsis/Activation/Notes — and label inferences as such.
5. **Update INDEX.md.** When you create a new dossier or change a classification, add/update the corresponding row in `INDEX.md`.
6. **Update the `last_audit` frontmatter.** Reflects the most recent audit that touched the dossier.
7. **One function per file.** If you find a fn is structurally a wrapper for another, write two dossiers and link them.

## Anti-patterns to avoid

- **Reading EH metadata as call edges.** `.rdata` references to a fn inside an MSVC FuncInfo struct (magic `0x19930520..22` nearby) are unwind-handler bindings, NOT bl call sites. Pattern: catch-handler prolog `subi r31, r12, N; mflr r12; stwu r1, ...`. See [sub_821B6DF4.md](sub_821B6DF4.md) for the canonical falsified example.
- **"0 static callers" = "dead in ours".** Three legitimate reasons a fn has 0 static callers and still runs: thread_proc (ExCreateThread), fnptr_array_entry (enumerated by crt_init_driver), msvc_eh_*_handler (dispatched by EH runtime). Always check.
- **Comparing fire counts at fixed instruction horizons across engines.** Canary @ 60s wallclock and ours @ -n 500M are different time bases. State (i) and state (ii) data points must be normalized — either both at the same wallclock or both at the same boot milestone.
- **Trusting handle IDs across runs.** `KernelState::alloc_handle` is monotonic; handles drift run-to-run. Function-context names (e.g. "sub_821CB030+0x128 creator") are stable; handle IDs are not.
- **Quoting xrefs.source instead of xrefs.source_func.** See AUDIT-045 reading-error #12. Use `source_func` for caller-set queries.

## Backfill status

Initial set (created in AUDIT-060 retrospective backfill, 2026-05-12):

- The 10 most-cited fns from AUDIT-049–060.

Future audits should extend coverage as they touch new fns. Backfilling earlier audit fns (AUDIT-030–048) is a nice-to-have but not blocking.
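
When backfilling or creating dossiers, the filename convention and the classification and confidence vocabularies described in this README can be checked mechanically. The following is a minimal Python sketch, not an existing project tool: `parse_frontmatter` and `lint_dossier` are hypothetical names, the sample address `0x82000000` is made up for illustration, and the check that the `address` field matches the hex digits in the filename is an assumption about how the two relate. Because the vocabulary may legitimately grow, an unknown classification is reported as something to review rather than a hard failure.

```python
import re

# Vocabulary copied from the tables in this README.
CLASSIFICATIONS = {
    "normal_callee", "vtable_method", "thread_proc", "msvc_eh_catch_handler",
    "msvc_eh_state_handler", "import_thunk", "wrapper", "crt_init_driver",
    "fnptr_array_entry", "dispatch_table_method", "synchronization_primitive",
    "unknown",
}
CONFIDENCES = {"high", "medium", "low", "refuted"}

# `sub_` + 8 uppercase hex digits + `.md`
FILENAME_RE = re.compile(r"sub_([0-9A-F]{8})\.md")


def parse_frontmatter(text):
    """Return top-level `key: value` pairs from the leading --- block.

    Deliberately tiny, not a YAML parser: indented lines and list items
    (e.g. the aliases entries) are skipped.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        if line[:1] in (" ", "\t", "-") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    else:
        return {}  # no closing --- delimiter found
    return fields


def lint_dossier(filename, text):
    """Return a list of human-readable problems; an empty list means no findings."""
    problems = []
    match = FILENAME_RE.fullmatch(filename)
    if match is None:
        problems.append(f"{filename}: expected sub_ + 8 uppercase hex digits + .md")
    fields = parse_frontmatter(text)
    if not fields:
        problems.append(f"{filename}: missing frontmatter")
        return problems
    # Assumption: the address field carries the same hex digits as the filename.
    address = fields.get("address", "")
    if match is not None and address.upper() != "0X" + match.group(1):
        problems.append(f"{filename}: address {address!r} does not match filename")
    classification = fields.get("classification", "")
    if classification not in CLASSIFICATIONS:
        problems.append(
            f"{filename}: classification {classification!r} is not in the README "
            "vocabulary (add it there if it is intentionally new)"
        )
    confidence = fields.get("confidence", "")
    if confidence not in CONFIDENCES:
        problems.append(f"{filename}: confidence {confidence!r} is not a known level")
    if not fields.get("last_audit"):
        problems.append(f"{filename}: last_audit is empty")
    return problems


# Hypothetical dossier used only to exercise the checks.
sample = """---
address: 0x82000000
classification: unknown
confidence: low
last_audit: 060
aliases:
  - "example alias (hypothetical)"
---

# sub_82000000
"""

print(lint_dossier("sub_82000000.md", sample))  # prints []
```

A check like this only covers the mechanical rules. Whether a claim cites its source, or whether an inference is labeled as one, still needs a reader.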