# AUDIT-069 Session 1: wait-signal producer identification

Date: 2026-05-20

Status: **LANDED: signaler tid + caller fns identified; AUDIT-066 circular framing FALSIFIED**

## Headline

The wait at `sub_821CB030+0x1AC` (PC `0x821CB1DC`), the canonical AUDIT-049/065 wedge wait, fires in canary on two tids (worker tid=17 and cache-loader tid=26). Both wedges are signaled by **tid=10**, a worker thread spawned EARLY (via `sub_8244FF50` → `ExCreateThread(entry=sub_82450A28)`), NOT by any of the four workers spawned by `sub_825070F0`.

This refutes AUDIT-066's circular framing ("γ-signaler running inside the 4 workers spawned by sub_825070F0"): the actual signaler reaches the production phase WITHOUT depending on sub_825070F0 firing.

## Step 1: wait site capture (canary)

Probe: `--audit_61_branch_probe_pcs=0x821CB1DC --mute=true`, 180s cold.

| tid | r3 (handle) | r4 (timeout) | r5 (wait_mode) | r6 (ctx) | r31 (stack) | lr |
|----:|------------:|-------------:|---------------:|---------:|------------:|---:|
| 17 | `F80000A4` | `FFFFFFFF` | `0` (auto) | `BC65CEC0` | `7064FA70` | `0x821CB1D0` |
| 26 | `F8000110` | `FFFFFFFF` | `0` (auto) | `BC667F80` | `708FF990` | `0x821CB1D0` |

Two distinct fires (one per logical caller). Both have r4=INFINITE timeout, matching the dossier. The lr=`0x821CB1D0` is `sub_821CB030+0x1A0`, the instruction AFTER the bl-wait, which is consistent with the branch probe firing at the basic-block entry following the wait-call's return.

Handle drift across cold runs is real: Step 1 vs Step 3 vs Step 4 trajectories produced wait handles `{F80000A0,F8000108}` / `{F80000A0,F8000108}` / `{F80000A4,F8000110}`. Per-run handles are still deterministic; the absolute ID is not.

**Important framing correction**: The brief expected "~16 fires" per AUDIT-065. This was already partly retracted by AUDIT-066 (which observed that thid=17 "terminates via `ExTerminateThread(0)` WITHOUT ever calling Wait inside its cache loop").
Step 1 confirms AUDIT-066's correction: the wait at `+0x1AC` fires ~2× per boot (one for the work-queue load that ANON_Class_713383D7 work goes through; one for the cache-loader sister-flow). Not 16. The wait is the WORK-QUEUE wait, not a per-cache-file IO wait.

Confidence: HIGH (probe fired, r3/r4/r5 match expected wait-call ABI, two distinct logical fires reproducible across cold runs).

## Step 2: instrumentation (canary, ~280 LOC additive)

New `audit_69_*` cvars + slowpath module:

- **cpu_flags.{h,cc}** (+23/+48 LOC, of which ~30 LOC are mine vs cumulative):
  - `--audit_69_event_signal_watch` (CSV of guest handle IDs, max 4)
  - `--audit_69_event_signal_native_ptr` (CSV of guest VAs, max 4)
  - `--audit_69_log_all_sets` (bool; logs EVERY XEvent::Set/Pulse fire)
- **xenia-kernel/audit_69_event_signal_watch.h** (51 LOC): fwd decls, hot-path inline wrapper (single relaxed atomic load + branch).
- **xenia-kernel/audit_69_event_signal_watch.cc** (193 LOC): lazy parse + UINT32_MAX sentinel + `XThread::TryGetCurrentThread()` for lr/tid capture. Mirrors AUDIT-068's static-init gate pattern.
- **xenia-kernel/xevent.cc** (+9 LOC): hook at `XEvent::Set` and `XEvent::Pulse` (the deepest convergence of Ke/Nt set + pulse paths).

Reading-error registration: `XThread::GetCurrentThread()` asserts on host threads; the first iteration used it and crashed. Fixed by switching to `TryGetCurrentThread()`. (Same lesson as AUDIT-067's bool-vs-pointer asymmetry, but in a different fn.)

Cumulative cross-run canary additions retained in tree (AUDIT-061/067/068/069).

## Step 3: correlated capture

Run: cold, 180s, `--mute=true --audit_61_branch_probe_pcs=0x821CB1DC,0x824AA2F0,0x824AAF50 --audit_69_log_all_sets=true`.

Volume: 122,165 log lines (Step 3) / 155,627 lines (Step 4 with wrapper probes).

Wait fires (Step 4): 2 (tid=17, tid=26, as in Step 1 but with handle drift to F80000A4/F8000110).
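
The correlation between waited-on handles and signal fires can be sketched in a few lines of Python. This is a minimal illustration, not the tooling used in this session: the record layout (tuples of tid, handle, lr) and the function name `correlate` are assumptions, the two wait records and their signal lr values are taken from this report, and the `0xF8000200` entry is a hypothetical non-wedge handle added only to show filtering.

```python
# Wait records as (wait_tid, handle), from the Step 4 wait fires.
waits = [(17, 0xF80000A4), (26, 0xF8000110)]

# Signal records as (handle, signal_tid, signal_lr).
# Illustrative subset of a log_all_sets capture, not the real log.
signals = [
    (0xF80000A4, 10, 0x824AA304),
    (0xF8000110, 10, 0x824AAFC8),
    (0xF8000110, 10, 0x824AAFC8),
    (0xF8000200, 6, 0x824AA304),  # hypothetical handle nobody waits on
]


def correlate(waits, signals):
    """Group signal fires by the wedge handle they target."""
    out = {
        handle: {"wait_tid": tid, "fires": 0, "signalers": set()}
        for tid, handle in waits
    }
    for handle, tid, lr in signals:
        if handle in out:  # ignore signals on handles outside the wait set
            out[handle]["fires"] += 1
            out[handle]["signalers"].add((tid, lr))
    return out


result = correlate(waits, signals)
for handle, info in result.items():
    print(f"{handle:08X} waited by tid={info['wait_tid']}: "
          f"{info['fires']} fire(s) from {sorted(info['signalers'])}")
```

Because handle IDs drift between cold runs, the wait set and the signal set have to come from the same run for this join to be meaningful.
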

Signals on wedge handles (Step 4):

| wedge handle (waited on) | wait tid | signal fires | signal lr | signaling fn | signal tid |
|---|---|---|---|---|---|
| `0xF80000A4` | 17 | **1** | `0x824AA304` | `sub_824AA2F0` (NtSetEvent wrapper) | **10** |
| `0xF8000110` | 26 | **100** | `0x824AAFC8` | `sub_824AAF50` (a generic event-set-with-arg wrapper) | **10** |

The 100 fires on F8000110 are repeats: auto-reset events fire on first signal; the rest are no-ops. Volume reflects how often the work-queue processes items targeting this synchronizer.

## Step 4: signaler-fn resolution (sylpheed.db cross-check)

Wrapper-entry probe data for these two NtSet wrappers, filtered to tid=10:

| wrapper | lr-of-caller | caller fn | tid=10 fire count |
|---|---|---|---|
| `sub_824AA2F0` (NtSetEvent wrapper) | `0x8245DA44` | **`sub_8245D9D8`** (γ-signaler D-A per AUDIT-062) | 23 |
| `sub_824AA2F0` (NtSetEvent wrapper) | `0x8245DB08` | **`sub_8245DA78`** (γ-signaler D-B per AUDIT-062) | 8 |
| `sub_824AAF50` (Ke-style wrapper) | `0x8245DC5C` | **`sub_8245DB40`** (NEW, not previously named) | 461 |

`sub_824AAF50` disasm needs follow-up, but the lr=`0x824AAFC8` = `sub_824AAF50+0x78` position is consistent with a `bl xeKeSetEvent` followed by a status check in an N-arg helper. The wrapper takes `(handle, ptr, size)` and the internally-signaled event has a different handle from the input.

Containing-fn cross-check (`sylpheed.db`):

- `sub_8245D9D8` and `sub_8245DA78` are in the worker cluster (0x82450000-0x8245C000). Per AUDIT-062: both are γ-signaler-D family, hot from worker-side, missed by AUDIT-059/060 enumeration.
- `sub_8245DB40` is in the same cluster; callers are `sub_824528A8+0x54` and `sub_8245EE50+0x20` (both worker-cluster internal).
- All three are reached from tid=10's body fn `sub_82450A68`, the trampoline body for the entry `sub_82450A28` (which `ExCreateThread` registers via `sub_8244FF50`).
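
The lr-to-containing-fn resolution used in the tables above amounts to a nearest-preceding-start lookup. The sketch below illustrates that arithmetic under stated assumptions: `FUNCTION_STARTS` holds only the five function starts named in this report (a real lookup would use the full function table in `sylpheed.db`, whose schema is not shown here), and `resolve` is a hypothetical helper name.

```python
import bisect

# Function start addresses named in this report only. With a partial table,
# an lr inside an unlisted function would be misattributed to its nearest
# listed predecessor, so a full table is needed for real use.
FUNCTION_STARTS = sorted([
    0x8245D9D8,
    0x8245DA78,
    0x8245DB40,
    0x824AA2F0,
    0x824AAF50,
])


def resolve(lr):
    """Return 'sub_XXXXXXXX+0xNN' for the nearest start at or below lr."""
    i = bisect.bisect_right(FUNCTION_STARTS, lr) - 1
    if i < 0:
        return None  # lr is below every known function start
    start = FUNCTION_STARTS[i]
    return f"sub_{start:08X}+0x{lr - start:X}"


for lr in (0x8245DA44, 0x8245DB08, 0x8245DC5C, 0x824AA304, 0x824AAFC8):
    print(f"lr=0x{lr:08X} -> {resolve(lr)}")
```

Run against the lr values in this report, the lookup reproduces the offsets quoted in the text, for example `sub_8245D9D8+0x6C` for lr=`0x8245DA44` and `sub_824AAF50+0x78` for lr=`0x824AAFC8`.
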

**tid=10 caller chain (canary)**:

```
sub_8244FEA8 (caller of sub_8244FF50; itself called from 11 sites)
  → sub_8244FF50 (spawner: calls ExCreateThread w/ entry=sub_82450A28)
    → sub_82450A28 (thread-entry trampoline: KeSetThreadPriority(-2, 3); bl sub_82450A68)
      → sub_82450A68 (worker dispatch loop)
        → ... γ-signalers D / DA78 / DB40
```

`sub_82450A28` is referenced as a data pointer at `0x8244FFF8` (inside `sub_8244FF50`). There are no call edges to it; it is purely a thread-entry data constant passed to ExCreateThread.

## Step 5: ours cross-reference

All identified signaler fns (`sub_8245D9D8`, `sub_8245DA78`, `sub_8245DB40`, `sub_824AA2F0`, `sub_824AAF50`, `sub_82450A28`, `sub_8244FF50`) are GAME (XEX) code, not kernel-imports. In ours these execute under the JIT, with no host-side analog to compare. The relevant question is whether the trajectory in ours REACHES these PCs. Direct evidence from prior runs:

**AUDIT-062 ours `--lr-trace=0x824aa2f0`** trace (`ours-ntset.jsonl`, 136 fires across cold boot up to deadlock):

- tid=6: 82 NtSet fires
- tid=1: 28 fires
- tid=5: 22 fires
- tid=8: 2 fires
- tid=13: 2 fires
- **tid=10: 0 fires**

ours NEVER spawns the canary-equivalent of tid=10 (the `sub_8244FF50/sub_82450A28/sub_82450A68` worker). This is consistent with AUDIT-057's "thread-gap" finding: ours has fewer threads than canary.

Within ours, the γ-signalers DO fire, but on tid=5 (calling sub_824AA2F0 from lr=`0x8245DA44` = `sub_8245D9D8+0x6C`) per AUDIT-062's `ours-ntset.jsonl:line 1`. AUDIT-062 already established that these signal WRONG handles in ours (neighbors of `0x12AC` are signaled; the wedge handle itself is not).

**Conclusion**: ours's signaler PCs exist and run, but on the wrong tids (no tid=10 equivalent), and target the wrong handles. The PRODUCER → SIGNALER chain in ours is structurally broken at the **thread-spawn** layer, not the kernel-import layer.
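
A per-tid tally like the one above can be reproduced from a JSONL trace with a short script. This is a sketch under assumptions: the field name `tid` is hypothetical (the actual schema of `ours-ntset.jsonl` is not shown in this report), and the input lines here are synthetic, rebuilt from the per-tid counts reported above rather than read from the real file.

```python
import json
from collections import Counter


def tally_by_tid(lines):
    """Count trace records per thread id, skipping blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        counts[json.loads(line)["tid"]] += 1  # "tid" is an assumed field name
    return counts


# Synthetic stand-in for ours-ntset.jsonl, built from the reported counts.
reported = {6: 82, 1: 28, 5: 22, 8: 2, 13: 2}
lines = [json.dumps({"tid": tid}) for tid, n in reported.items() for _ in range(n)]

counts = tally_by_tid(lines)
total = sum(counts.values())
print(f"total fires: {total}")
print(f"tid=10 fires: {counts[10]}")
```

`Counter` returns 0 for a tid that never appears, which is what makes the tid=10 absence directly checkable.
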

Confidence (Step 5): MEDIUM-HIGH for the chain identification (data is internally consistent and matches AUDIT-062's prior independent capture). LOW on the ours-side resolution mechanism (this audit did not re-run ours; cross-ref is read-only against prior dumps which may be stale relative to current ours HEAD `e6d43a23…`).

## AUDIT-066 framing refutation

AUDIT-066 stated:

> the producer-side signal for THAT event comes from a γ-signaler running
> inside the 4 workers spawned by sub_825070F0 - per AUDIT-063's
> static-reachability survey of NtSet wrapper callers.

This is **falsified** by AUDIT-069 Step 3+4 evidence:

1. The signaler runs on **tid=10**, spawned by `sub_8244FF50` via `ExCreateThread(entry=sub_82450A28)`. This is NOT one of sub_825070F0's 4 workers.
2. sub_8244FF50's caller chain does NOT require ANON_Class_713383D7's vtable to be installed; it does NOT require sub_825070F0 to fire.
3. The circular-bootstrap concern AUDIT-066 raised ("workers can't signal until they spawn; they can't spawn until the wedge clears") was structurally correct framing IF the signaler were inside the sub_825070F0 4-worker family. Since the actual signaler is tid=10 (independently spawned), the circle is **broken**: the signaler IS reachable without the wedge clearing.

Reading-error class **#37**: static-reachability surveys (AUDIT-063 walked 12 hops from sub_82452DC0 to NtSet wrapper callers) are scoped to a particular caller chain; they miss alternative producer paths reached via unrelated thread-spawn sites. Always probe at the runtime SIGNAL site to confirm which exact caller fired, not just which static path could fire.

## Cascade outcome

- **A** (capture wait site PC + r3=handle in canary): **PASS**. PC `0x821CB1DC`, r3 captures the handle on first fire reproducibly.
- **B** (capture signal fires on the wait targets): **PASS**. 1 fire on F80000A4 (wedge handle 1), 100 fires on F8000110 (wedge handle 2).
- **C** (resolve signaling fn + immediate caller fn): **PASS**.
  `sub_824AA2F0` ← `sub_8245D9D8` / `sub_8245DA78` (γ-signaler D family); `sub_824AAF50` ← `sub_8245DB40` (new). All on tid=10.
- **D** (ours-side cross-ref): **PARTIAL**. tid=10 IS missing in ours per existing AUDIT-062 data; γ-signalers DO fire but on wrong tids. Did not re-run ours in this session (per task discipline; cross-ref read-only against prior dumps).

Net 3/4 PASS, 1/4 PARTIAL.

## Discipline

- xenia-rs HEAD `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` UNCHANGED. `git diff HEAD | sha256sum` at session start = `ed30fd526643918f67311caff0a10d1346d73fd0c0323e02477883cf5ff20357` and at session end IDENTICAL.
- Canary patch is purely additive, cvar-gated default-off, UINT32_MAX sentinel + std::once parse pattern (per AUDIT-068 discipline).
- Every canary run used `--mute=true`.
- Cache wiped before each cold run (5 cold runs total: Step 1 90s, Step 1 180s rerun, Step 3 with handle watch, Step 3 with log_all_sets, Step 4 with wrapper probes). Each cache moved to `/tmp/_audit_069_step*` before the next cold run.
- Cache restoration from `/tmp/canary-cache-bak-audit-068` deferred to session end (done after this report).

## Artifacts

```
xenia-rs/audit-runs/audit-069-wait-signal-producer/
  step1-wait-probe.log            (90s baseline; 2 wait fires)
  step1-wait-probe.stdout
  step1-wait-probe-180s.log       (180s rerun; 2 wait fires)
  step1-wait-probe-180s.stdout
  step3-signal-probe.log          (180s; first signal-watch test; handles drifted, partial correlation)
  step3-signal-probe.stdout
  step3-correlated.log            (180s; log_all_sets; 120k signal fires)
  step3-correlated.stdout
  step4-wrapper-callers.log       (180s; log_all_sets + wrapper entries; 155k events; correlated lr-to-caller)
  step4-wrapper-callers.stdout
  fix-canary.diff                 (cumulative canary diff vs 6de80dffe)
  writer-report.md                (this file)
```

## Session 2 recommendation

Two paths, both <100 LOC ours-side:

**Path 1 (ours read-only probe + targeted root-cause)**: re-run ours with `--ctor-probe=0x82450A28` (the canary-tid=10 entry) to confirm it never fires.
Then run `--ctor-probe=0x8244FF50` (the spawner). If sub_8244FF50 also never fires, walk up its 11 callers in sylpheed.db; one of them likely gates on a flag/event that is not set in ours's early-boot trajectory.

**Path 2 (canary additional capture)**: probe canary's tid=10 spawn sequence in detail. Add an `audit_69_thread_spawn_watch` cvar that logs every ExCreateThread call with (entry_pc, ctx, suspend_flag, caller_lr). ~40 LOC. Compare to ours's spawn list and find which call goes unfired in ours.

Both paths are cheaper than continuing on the wedge directly. Path 1 is preferred: it stays on the ours side, which is the failing engine.

Predicted Session 2 cascade:

- A (find sub_82450A28's first-non-fire ancestor in ours): 75-85%
- B (identify the missing precondition for that ancestor): 50-60%
- C (fix LOC in ours ≤ 50): 30-40%
- D (draws>0): 15-25% (single wedge unlock)
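
The spawn-list comparison proposed in Path 2 could be sketched as a set difference keyed on entry PC. Everything here is illustrative: the record layout mirrors the proposed (entry_pc, caller_lr) log fields, `missing_spawns` is a hypothetical helper, and apart from `0x82450A28` the addresses are placeholders, not real guest PCs. The sketch also assumes guest code addresses are identical in canary and ours, which is how PCs are compared elsewhere in this report.

```python
def missing_spawns(canary, ours):
    """Return canary spawn records whose entry PC never appears in ours."""
    ours_entries = {rec["entry_pc"] for rec in ours}
    seen = set()
    missing = []
    for rec in canary:  # preserve canary spawn order
        pc = rec["entry_pc"]
        if pc not in ours_entries and pc not in seen:
            seen.add(pc)
            missing.append(rec)
    return missing


# 0x82450A28 is the tid=10 entry from this report; the 0x9000xxxx values
# and all caller_lr values are placeholders for illustration only.
canary_spawns = [
    {"entry_pc": 0x90000000, "caller_lr": 0x0},
    {"entry_pc": 0x82450A28, "caller_lr": 0x0},
    {"entry_pc": 0x90000010, "caller_lr": 0x0},
]
ours_spawns = [
    {"entry_pc": 0x90000000, "caller_lr": 0x0},
    {"entry_pc": 0x90000010, "caller_lr": 0x0},
]

gap = missing_spawns(canary_spawns, ours_spawns)
for rec in gap:
    print(f"unfired in ours: entry=0x{rec['entry_pc']:08X}")
```
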