Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
AUDIT-069 Session 1 — wait-signal producer identification
Date: 2026-05-20
Status: LANDED (signaler tid + caller fns identified; AUDIT-066 circular framing FALSIFIED)
Headline
The wait at sub_821CB030+0x1AC (PC 0x821CB1DC) — the canonical
AUDIT-049/065 wedge wait — fires in canary on two tids (worker tid=17 and
cache-loader tid=26). Both wedges are signaled by tid=10, a worker
thread spawned EARLY (via sub_8244FF50 → ExCreateThread(entry=sub_82450A28)),
NOT by any of the four workers spawned by sub_825070F0. This refutes
AUDIT-066's circular framing ("γ-signaler running inside the 4 workers
spawned by sub_825070F0"): the actual signaler reaches the production
phase WITHOUT depending on sub_825070F0 firing.
Step 1 — wait site capture (canary)
Probe: --audit_61_branch_probe_pcs=0x821CB1DC --mute=true, 180s cold.
| tid | r3 (handle) | r4 (timeout) | r5 (wait_mode) | r6 (ctx) | r31 (stack) | lr |
|---|---|---|---|---|---|---|
| 17 | F80000A4 | FFFFFFFF | 0 (auto) | BC65CEC0 | 7064FA70 | 0x821CB1D0 |
| 26 | F8000110 | FFFFFFFF | 0 (auto) | BC667F80 | 708FF990 | 0x821CB1D0 |
Two distinct fires (one per logical caller). Both have r4=INFINITE timeout
matching dossier. The lr=0x821CB1D0 is sub_821CB030+0x1A0 = the
instruction AFTER the bl-wait — consistent with branch-probe firing at the
basic-block-entry following the wait-call's return.
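The fn+offset arithmetic used throughout this report (e.g. lr=0x821CB1D0 = sub_821CB030+0x1A0) can be checked mechanically. A minimal sketch, assuming the only candidate functions are the entry points named in this report; `FUNCTION_STARTS` and `resolve` are illustrative names, not part of canary or ours, and a full symbol table from sylpheed.db would replace the hand-written list.

```python
from bisect import bisect_right

# Function entry points named in this report, sorted ascending.
# Assumption: no other function starts between an entry and the address
# being resolved; a full symbol table would remove that caveat.
FUNCTION_STARTS = [
    0x821CB030,
    0x8245D9D8,
    0x8245DA78,
    0x8245DB40,
    0x824AA2F0,
    0x824AAF50,
]


def resolve(addr: int) -> str:
    """Return 'sub_XXXXXXXX+0xNN' for the nearest known entry at or below addr."""
    idx = bisect_right(FUNCTION_STARTS, addr) - 1
    if idx < 0:
        raise ValueError(f"0x{addr:08X} is below every known function start")
    start = FUNCTION_STARTS[idx]
    return f"sub_{start:08X}+0x{addr - start:X}"


# The probe PC and the captured lr from the table above.
print(resolve(0x821CB1DC))  # wait-site probe PC
print(resolve(0x821CB1D0))  # lr captured at the probe
```

The same helper applies to the signal-side lr values later in the report (0x824AAFC8, 0x8245DA44).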
Handle drift across cold runs is real: Step 1 vs Step 3 vs Step 4 trajectories
produced wait handles {F80000A0,F8000108} / {F80000A0,F8000108} /
{F80000A4,F8000110}. Per-run handles are still deterministic; the absolute
ID is not.
Important framing correction: The brief expected "~16 fires" per
AUDIT-065. This was already partly retracted by AUDIT-066 (which observed
that thid=17 "terminates via ExTerminateThread(0) WITHOUT ever calling
Wait inside its cache loop"). Step 1 confirms AUDIT-066's correction:
the wait at +0x1AC fires ~2× per boot (one for the work-queue load
that ANON_Class_713383D7 work goes through; one for the cache-loader
sister-flow). Not 16. The wait is the WORK-QUEUE wait, not a per-cache-file
IO wait.
Confidence: HIGH (probe fired, r3/r4/r5 match expected wait-call ABI, two distinct logical fires reproducible across cold runs).
Step 2 — instrumentation (canary, ~280 LOC additive)
New audit_69_* cvars + slowpath module:
- cpu_flags.{h,cc} (+23/+48 LOC, of which ~30 LOC are mine vs cumulative):
  - --audit_69_event_signal_watch (CSV of guest handle IDs, max 4)
  - --audit_69_event_signal_native_ptr (CSV of guest VAs, max 4)
  - --audit_69_log_all_sets (bool: log EVERY XEvent::Set/Pulse fire)
- xenia-kernel/audit_69_event_signal_watch.h (51 LOC): fwd decls, hot-path inline wrapper (single relaxed atomic load + branch).
- xenia-kernel/audit_69_event_signal_watch.cc (193 LOC): lazy parse + UINT32_MAX sentinel + XThread::TryGetCurrentThread() for lr/tid capture. Mirrors AUDIT-068's static-init gate pattern.
- xenia-kernel/xevent.cc (+9 LOC): hook at XEvent::Set and XEvent::Pulse (the deepest convergence of Ke/Nt set + pulse paths).
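To make the watch-list behaviour concrete, here is a small Python model of the idea (the real code is C++ in canary and is not reproduced here). It assumes the UINT32_MAX sentinel marks "not parsed yet" and that an empty cvar means the watch is off; both are guesses from the one-line summary above, and `HandleWatch` is a hypothetical name.

```python
UINT32_MAX = 0xFFFFFFFF  # assumption: sentinel meaning "watch list not parsed yet"
MAX_WATCHED = 4          # cap stated for the audit_69 CSV cvars


class HandleWatch:
    """Model of a lazily parsed, default-off watch list of guest handle IDs."""

    def __init__(self, csv_value: str = ""):
        self._csv = csv_value
        self._count = UINT32_MAX  # stays at the sentinel until first lookup
        self._handles = ()

    def _parse(self) -> None:
        tokens = [tok.strip() for tok in self._csv.split(",") if tok.strip()]
        if len(tokens) > MAX_WATCHED:
            raise ValueError(f"at most {MAX_WATCHED} handles may be watched")
        # int(..., 16) accepts both "F80000A4" and "0xF80000A4".
        self._handles = tuple(int(tok, 16) for tok in tokens)
        self._count = len(self._handles)

    def matches(self, handle: int) -> bool:
        if self._count == UINT32_MAX:  # lazy parse on first use
            self._parse()
        return self._count > 0 and handle in self._handles


watch = HandleWatch("0xF80000A4,0xF8000110")
print(watch.matches(0xF80000A4), watch.matches(0xF80000A0))
```

With an empty cvar the first lookup parses zero handles and every later check is a cheap negative, which is the default-off property the patch relies on.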
Reading-error registration: XThread::GetCurrentThread() asserts on host
threads; first iteration used it and crashed. Fixed by switching to
TryGetCurrentThread(). (Same lesson as AUDIT-067's bool-vs-pointer
asymmetry but in a different fn.)
Cumulative cross-run canary additions retained in tree (AUDIT-061/067/068/069).
Step 3 — correlated capture
Run: cold, 180s, --mute=true --audit_61_branch_probe_pcs=0x821CB1DC,0x824AA2F0,0x824AAF50 --audit_69_log_all_sets=true.
Volume: 122,165 log lines (Step 3) / 155,627 lines (Step 4 with wrapper probes).
Wait fires (Step 4): 2 (tid=17, tid=26, as in Step 1 but with handle drift to F80000A4/F8000110).
Signals on wedge handles (Step 4):
| wedge handle (waited on) | wait tid | signal fires | signal lr | signaling fn | signal tid |
|---|---|---|---|---|---|
| 0xF80000A4 | 17 | 1 | 0x824AA304 | sub_824AA2F0 (NtSetEvent wrapper) | 10 |
| 0xF8000110 | 26 | 100 | 0x824AAFC8 | sub_824AAF50 (a generic event-set-with-arg wrapper) | 10 |
The 100 fires on F8000110 are repeats: an auto-reset event releases its waiter on the first signal, and the remaining sets do nothing further for that waiter. The volume reflects how often the work-queue processes items targeting this synchronizer.
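Why 100 sets wake the waiter only once can be seen in a single-threaded model of auto-reset semantics. This is a sketch, not the XEvent implementation; it assumes the waiter (tid=26) is already parked before the first set and never waits again, neither of which this report states.

```python
from collections import Counter


class AutoResetEvent:
    """Single-threaded model of an auto-reset event (not a real primitive)."""

    def __init__(self):
        self.signaled = False
        self.waiters = []  # tids parked in wait()
        self.woken = []    # tids released, in order

    def wait(self, tid: int) -> None:
        if self.signaled:
            self.signaled = False  # consume the latched signal
            self.woken.append(tid)
        else:
            self.waiters.append(tid)

    def set(self) -> str:
        if self.waiters:
            # Release exactly one waiter; the event stays non-signaled.
            self.woken.append(self.waiters.pop(0))
            return "released"
        if self.signaled:
            return "no-op"  # already signaled, nothing changes
        self.signaled = True
        return "latched"


event = AutoResetEvent()
event.wait(26)
outcomes = Counter(event.set() for _ in range(100))
print(outcomes, event.woken)
```

Under these assumptions one set releases the waiter, the next leaves the event latched, and the other 98 change nothing.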
Step 4 — signaler-fn resolution (sylpheed.db cross-check)
Wrapper-entry probe data for these two NtSet wrappers, filtered to tid=10:
| wrapper | lr-of-caller | caller fn | tid=10 fire count |
|---|---|---|---|
| sub_824AA2F0 (NtSetEvent wrapper) | 0x8245DA44 | sub_8245D9D8 (γ-signaler D-A per AUDIT-062) | 23 |
| sub_824AA2F0 (NtSetEvent wrapper) | 0x8245DB08 | sub_8245DA78 (γ-signaler D-B per AUDIT-062) | 8 |
| sub_824AAF50 (Ke-style wrapper) | 0x8245DC5C | sub_8245DB40 (NEW, not previously named) | 461 |
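A table like the one above can be derived by tallying wrapper-entry probe records per (wrapper, caller lr) for a single thread. A minimal sketch, assuming each record is already parsed into a dict with `tid`, `wrapper`, and `lr` keys; the real probe log format is not shown in this report, so those field names are hypothetical and the sample rows are illustrative only.

```python
from collections import Counter


def tally_callers(events, tid):
    """Count wrapper-entry fires per (wrapper, caller_lr) for one thread."""
    return Counter((e["wrapper"], e["lr"]) for e in events if e["tid"] == tid)


# Illustrative records only; real runs carry thousands of these.
sample = [
    {"tid": 10, "wrapper": 0x824AA2F0, "lr": 0x8245DA44},
    {"tid": 10, "wrapper": 0x824AA2F0, "lr": 0x8245DB08},
    {"tid": 10, "wrapper": 0x824AAF50, "lr": 0x8245DC5C},
    {"tid": 10, "wrapper": 0x824AAF50, "lr": 0x8245DC5C},
    {"tid": 17, "wrapper": 0x824AA2F0, "lr": 0x8245DA44},  # other tid, filtered out
]

for (wrapper, lr), count in sorted(tally_callers(sample, tid=10).items()):
    print(f"sub_{wrapper:08X} <- lr=0x{lr:08X}: {count}")
```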
The sub_824AAF50 disassembly needs follow-up, but lr=0x824AAFC8 (= sub_824AAF50+0x78)
is consistent with a bl xeKeSetEvent followed by a status check
in an N-arg helper. The wrapper takes (handle, ptr, size), and the
event it signals internally has a different handle from the input.
Containing-fn cross-check (sylpheed.db):
- sub_8245D9D8 and sub_8245DA78 are in the worker cluster (0x82450000-0x8245C000). Per AUDIT-062: both are γ-signaler-D family, hot from worker-side, missed by AUDIT-059/060 enumeration.
- sub_8245DB40 is in the same cluster; callers are sub_824528A8+0x54 and sub_8245EE50+0x20 (both worker-cluster internal).
- All three are reached from tid=10's body fn sub_82450A68, the trampoline body for the entry sub_82450A28 (which ExCreateThread registers via sub_8244FF50).
tid=10 caller chain (canary):
sub_8244FEA8 (caller of sub_8244FF50; itself called from 11 sites)
→ sub_8244FF50 (spawner — calls ExCreateThread w/ entry=sub_82450A28)
→ sub_82450A28 (thread-entry trampoline: KeSetThreadPriority(-2, 3); bl sub_82450A68)
→ sub_82450A68 (worker dispatch loop)
→ ... γ-signalers D / DA78 / DB40
sub_82450A28 is referenced as a data pointer at 0x8244FFF8 (inside
sub_8244FF50). No call edges to it — it's purely a thread-entry data
constant passed to ExCreateThread.
Step 5 — ours cross-reference
All identified signaler fns (sub_8245D9D8, sub_8245DA78, sub_8245DB40,
sub_824AA2F0, sub_824AAF50, sub_82450A28, sub_8244FF50) are GAME
(XEX) code — not kernel-imports. In ours these execute under the JIT, with
no host-side analog to compare. The relevant question is whether the
trajectory in ours REACHES these PCs.
Direct evidence from prior runs:
AUDIT-062 ours --lr-trace=0x824aa2f0 trace (ours-ntset.jsonl, 136
fires across cold boot up to deadlock):
- tid=6: 82 NtSet fires
- tid=1: 28 fires
- tid=5: 22 fires
- tid=8: 2 fires
- tid=13: 2 fires
- tid=10: 0 fires
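The per-tid breakdown above is a plain group-by over the trace. A minimal sketch, assuming ours-ntset.jsonl holds one JSON object per line with a `tid` field (the real schema is not shown here, so that field name is an assumption); the second half only re-checks that the counts quoted above add up to the stated 136 fires.

```python
import json
from collections import Counter


def fires_per_tid(jsonl_text: str) -> Counter:
    """Count trace records per thread id in JSONL text (one object per line)."""
    counts = Counter()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if line:
            counts[json.loads(line)["tid"]] += 1
    return counts


# Counts as quoted in this report, keyed by tid.
REPORTED = {6: 82, 1: 28, 5: 22, 8: 2, 13: 2, 10: 0}
print(sum(REPORTED.values()))  # total fires across all tids

# Tiny illustrative input, not real trace data.
print(fires_per_tid('{"tid": 6}\n{"tid": 6}\n{"tid": 5}\n'))
```

A Counter returns 0 for a tid that never appears, which is how the tid=10 row would show up.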
ours NEVER spawns the canary-equivalent of tid=10 (the
sub_8244FF50/sub_82450A28/sub_82450A68 worker). This is consistent with
AUDIT-057's "thread-gap" finding: ours has fewer threads than canary.
Within ours, the γ-signalers DO fire — but on tid=5 (calling sub_824AA2F0
from lr=0x8245DA44 = sub_8245D9D8+0x6C) per AUDIT-062's
ours-ntset.jsonl:line 1. AUDIT-062 already established these signal
WRONG handles in ours (neighbors of 0x12AC are signaled; the wedge
handle itself is not).
Conclusion: ours's signaler PCs exist and run, but on the wrong tids (no tid=10 equivalent), and target the wrong handles. The PRODUCER → SIGNALER chain in ours is structurally broken at the thread-spawn layer, not the kernel-import layer.
Confidence (Step 5): MEDIUM-HIGH for the chain identification (data is
internally consistent and matches AUDIT-062's prior independent capture).
LOW on the ours-side resolution mechanism (this audit did not re-run
ours; cross-ref is read-only against prior dumps which may be stale
relative to current ours HEAD e6d43a23…).
AUDIT-066 framing refutation
AUDIT-066 stated:
the producer-side signal for THAT event comes from a γ-signaler running inside the 4 workers spawned by sub_825070F0 — per AUDIT-063's static-reachability survey of NtSet wrapper callers.
This is falsified by AUDIT-069 Step 3+4 evidence:
- The signaler runs on tid=10, spawned by sub_8244FF50 via ExCreateThread(entry=sub_82450A28). This is NOT one of sub_825070F0's 4 workers.
- sub_8244FF50's caller chain does NOT require ANON_Class_713383D7's vtable to be installed; it does NOT require sub_825070F0 to fire.
- The circular-bootstrap concern AUDIT-066 raised ("workers can't signal until they spawn; they can't spawn until the wedge clears") was structurally correct framing IF the signaler were inside the sub_825070F0 4-worker family. Since the actual signaler is tid=10 (independently spawned), the circle is broken: the signaler IS reachable without the wedge clearing.
Reading-error class #37: static-reachability surveys (AUDIT-063 walked 12 hops from sub_82452DC0 to NtSet wrapper callers) are scoped to a particular caller chain; they miss alternative producer paths reached via unrelated thread-spawn sites. Always probe at the runtime SIGNAL site to confirm which exact caller fired, not just which static path could fire.
Cascade outcome
- A (capture wait site PC + r3=handle in canary): PASS. PC 0x821CB1DC, r3 captures the handle on first fire reproducibly.
- B (capture signal fires on the wait targets): PASS. 1 fire on F80000A4 (wedge handle 1), 100 fires on F8000110 (wedge handle 2).
- C (resolve signaling fn + immediate caller fn): PASS. sub_824AA2F0 ← sub_8245D9D8 / sub_8245DA78 (γ-signaler D family); sub_824AAF50 ← sub_8245DB40 (new). All on tid=10.
- D (ours-side cross-ref): PARTIAL. tid=10 IS missing in ours per existing AUDIT-062 data; γ-signalers DO fire but on wrong tids. Did not re-run ours in this session (per task discipline; cross-ref read-only against prior dumps).
Net 3/4 PASS, 1/4 PARTIAL.
Discipline
- xenia-rs HEAD e6d43a23ac393004d2e5adf2f0395fd0b5e6448b UNCHANGED. git diff HEAD | sha256sum at session start = ed30fd526643918f67311caff0a10d1346d73fd0c0323e02477883cf5ff20357 and at session end IDENTICAL.
- Canary patch is purely additive, cvar-gated default-off, UINT32_MAX sentinel + std::once parse pattern (per AUDIT-068 discipline).
- Every canary run used --mute=true.
- Cache wiped before each cold run (4 cold runs total: Step 1 90s, Step 1 180s rerun, Step 3 with handle watch, Step 3 with log_all_sets, Step 4 with wrapper probes). Each cache moved to /tmp/_audit_069_step* before next cold run.
- Cache restoration from /tmp/canary-cache-bak-audit-068 deferred to session end (done after this report).
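The "working tree unchanged" check in the first bullet can be scripted so the start and end digests are compared by code rather than by eye. A minimal sketch, assuming git is on PATH and the caller passes the xenia-rs checkout directory; `sha256_hex` and `diff_digest` are illustrative helper names, not existing tooling.

```python
import hashlib
import subprocess


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 of raw bytes (the hash column that sha256sum prints)."""
    return hashlib.sha256(data).hexdigest()


def diff_digest(repo_dir: str) -> str:
    """Digest of `git diff HEAD` output, equivalent to `git diff HEAD | sha256sum`."""
    proc = subprocess.run(
        ["git", "diff", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
    )
    return sha256_hex(proc.stdout)
```

Typical use is to record `diff_digest(repo)` at session start and assert that the value at session end is equal before writing the report.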
Artifacts
xenia-rs/audit-runs/audit-069-wait-signal-producer/
step1-wait-probe.log (90s baseline; 2 wait fires)
step1-wait-probe.stdout
step1-wait-probe-180s.log (180s rerun; 2 wait fires)
step1-wait-probe-180s.stdout
step3-signal-probe.log (180s; first signal-watch test; handles drifted, partial correlation)
step3-signal-probe.stdout
step3-correlated.log (180s; log_all_sets; 120k signal fires)
step3-correlated.stdout
step4-wrapper-callers.log (180s; log_all_sets + wrapper entries; 155k events; correlated lr-to-caller)
step4-wrapper-callers.stdout
fix-canary.diff (cumulative canary diff vs 6de80dffe)
writer-report.md (this file)
Session 2 recommendation
Two paths, both <100 LOC ours-side:
Path 1 (ours read-only probe + targeted root-cause): re-run ours with
--ctor-probe=0x82450A28 (the canary-tid=10 entry) — confirm it never
fires. Then --ctor-probe=0x8244FF50 (the spawner). If sub_8244FF50 also
never fires, walk up its 11 callers in sylpheed.db — likely one of them
gates on a flag/event that's not set in ours's early-boot trajectory.
Path 2 (canary additional capture): probe canary's tid=10 spawn
sequence in detail. Add audit_69_thread_spawn_watch cvar that logs
every ExCreateThread call with (entry_pc, ctx, suspend_flag, caller_lr).
~40 LOC. Compare to ours's spawn list — find which call goes
unfired in ours.
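The comparison step in Path 2 reduces to a set difference keyed on the thread entry PC. A minimal sketch, assuming both engines' spawn logs have been parsed into dicts carrying at least `entry_pc` (the proposed cvar does not exist yet, so the record shape is hypothetical); in the sample data 0x82001000 is a placeholder entry, not a real address from either trace.

```python
def unfired_entries(canary_spawns, ours_spawns):
    """Entry PCs spawned in canary but never in ours, in canary spawn order."""
    ours_entries = {rec["entry_pc"] for rec in ours_spawns}
    missing = []
    for rec in canary_spawns:
        pc = rec["entry_pc"]
        if pc not in ours_entries and pc not in missing:
            missing.append(pc)
    return missing


# Illustrative spawn lists: 0x82450A28 is the tid=10 entry named in this
# report; 0x82001000 is a made-up entry that both engines spawn.
canary = [{"entry_pc": 0x82001000}, {"entry_pc": 0x82450A28}]
ours = [{"entry_pc": 0x82001000}]

for pc in unfired_entries(canary, ours):
    print(f"missing in ours: sub_{pc:08X}")
```

Keying on entry_pc rather than tid sidesteps the fact that thread IDs are not expected to line up between the two engines.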
Both paths are cheaper than continuing on the wedge directly. Path 1 is preferred: it stays on the ours side which is the failing engine.
Predicted Session 2 cascade:
- A (find sub_82450A28's first-non-fire ancestor in ours): 75-85%
- B (identify the missing precondition for that ancestor): 50-60%
- C (fix LOC in ours ≤ 50): 30-40%
- D (draws>0): 15-25% (single wedge unlock)