xenia-rs/audit-runs/audit-069-wait-signal-producer/writer-report.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# AUDIT-069 Session 1 — wait-signal producer identification
Date: 2026-05-20
Status: **LANDED — signaler tid + caller fns identified; AUDIT-066 circular framing FALSIFIED**
## Headline
The wait at `sub_821CB030+0x1AC` (PC `0x821CB1DC`) — the canonical
AUDIT-049/065 wedge wait — fires in canary on two tids (worker tid=17 and
cache-loader tid=26). Both wedges are signaled by **tid=10**, a worker
thread spawned EARLY (via `sub_8244FF50` → `ExCreateThread(entry=sub_82450A28)`),
NOT by any of the four workers spawned by `sub_825070F0`. This refutes
AUDIT-066's circular framing ("γ-signaler running inside the 4 workers
spawned by sub_825070F0"): the actual signaler reaches the production
phase WITHOUT depending on sub_825070F0 firing.
## Step 1 — wait site capture (canary)
Probe: `--audit_61_branch_probe_pcs=0x821CB1DC --mute=true`, 180s cold.
| tid | r3 (handle) | r4 (timeout) | r5 (wait_mode) | r6 (ctx) | r31 (stack) | lr |
|----:|------------:|-------------:|---------------:|---------:|------------:|---:|
| 17 | `F80000A4` | `FFFFFFFF` | `0` (auto) | `BC65CEC0` | `7064FA70` | `0x821CB1D0` |
| 26 | `F8000110` | `FFFFFFFF` | `0` (auto) | `BC667F80` | `708FF990` | `0x821CB1D0` |

Two distinct fires (one per logical caller). Both have r4=INFINITE timeout,
matching the dossier. The lr=`0x821CB1D0` is `sub_821CB030+0x1A0`, i.e. the
instruction AFTER the bl-wait — consistent with branch-probe firing at the
basic-block-entry following the wait-call's return.
Handle drift across cold runs is real: Step 1 vs Step 3 vs Step 4 trajectories
produced wait handles `{F80000A0,F8000108}` / `{F80000A0,F8000108}` /
`{F80000A4,F8000110}`. Per-run handles are still deterministic; the absolute
ID is not.
**Important framing correction**: The brief expected "~16 fires" per
AUDIT-065. This was already partly retracted by AUDIT-066 (which observed
that tid=17 "terminates via `ExTerminateThread(0)` WITHOUT ever calling
Wait inside its cache loop"). Step 1 confirms AUDIT-066's correction:
the wait at `+0x1AC` fires ~2× per boot (one for the work-queue load
that ANON_Class_713383D7 work goes through; one for the cache-loader
sister-flow). Not 16. The wait is the WORK-QUEUE wait, not a per-cache-file
IO wait.
Confidence: HIGH (probe fired, r3/r4/r5 match expected wait-call ABI,
two distinct logical fires reproducible across cold runs).
## Step 2 — instrumentation (canary, ~280 LOC additive)
New `audit_69_*` cvars + slowpath module:
- **cpu_flags.{h,cc}** (+23/+48 LOC, of which ~30 LOC are mine vs cumulative):
  - `--audit_69_event_signal_watch` (CSV of guest handle IDs, max 4)
  - `--audit_69_event_signal_native_ptr` (CSV of guest VAs, max 4)
  - `--audit_69_log_all_sets` (bool: log EVERY XEvent::Set/Pulse fire)
- **xenia-kernel/audit_69_event_signal_watch.h** (51 LOC): fwd decls,
  hot-path inline wrapper (single relaxed atomic load + branch).
- **xenia-kernel/audit_69_event_signal_watch.cc** (193 LOC): lazy parse +
  UINT32_MAX sentinel + `XThread::TryGetCurrentThread()` for lr/tid capture.
  Mirrors AUDIT-068's static-init gate pattern.
- **xenia-kernel/xevent.cc** (+9 LOC): hook at `XEvent::Set` and
  `XEvent::Pulse` (the deepest convergence of Ke/Nt set + pulse paths).
Reading-error registration: `XThread::GetCurrentThread()` asserts on host
threads; first iteration used it and crashed. Fixed by switching to
`TryGetCurrentThread()`. (Same lesson as AUDIT-067's bool-vs-pointer
asymmetry but in a different fn.)
Cumulative cross-run canary additions retained in tree (AUDIT-061/067/068/069).
## Step 3 — correlated capture
Run: cold, 180s, `--mute=true --audit_61_branch_probe_pcs=0x821CB1DC,0x824AA2F0,0x824AAF50 --audit_69_log_all_sets=true`.
Volume: 122,165 log lines (Step 3) / 155,627 lines (Step 4 with wrapper probes).
Wait fires (Step 4): 2 (tid=17, tid=26, as in Step 1 but with handle drift to F80000A4/F8000110).
Signals on wedge handles (Step 4):
| wedge handle (waited on) | wait tid | signal fires | signal lr | signaling fn | signal tid |
|---|---|---|---|---|---|
| `0xF80000A4` | 17 | **1** | `0x824AA304` | `sub_824AA2F0` (NtSetEvent wrapper) | **10** |
| `0xF8000110` | 26 | **100** | `0x824AAFC8` | `sub_824AAF50` (a generic event-set-with-arg wrapper) | **10** |

The 100 fires on F8000110 are repeats: an auto-reset event releases its
waiter on the first signal, and the remaining sets are no-ops for the wait.
The volume reflects how often the work-queue processes items targeting this
synchronizer.
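
The correlation behind the table above amounts to a join of wait fires against set fires on the handle value. The following is a minimal sketch of that join, assuming the log has already been parsed into records. The record types `WaitFire` and `SetFire`, the helper `correlate`, and the unrelated handle `0xF8000200` are hypothetical; the sample is truncated and does not reproduce the real fire counts.

```python
from collections import Counter, defaultdict
from typing import NamedTuple


class WaitFire(NamedTuple):
    tid: int
    handle: int


class SetFire(NamedTuple):
    tid: int
    handle: int
    lr: int


def correlate(waits, sets):
    """Group set fires by waited-on handle; sets on other handles are ignored."""
    waited = {w.handle: w.tid for w in waits}
    per_handle = defaultdict(Counter)
    for s in sets:
        if s.handle in waited:
            per_handle[s.handle][(s.tid, s.lr)] += 1
    return {h: (waited[h], dict(per_handle[h])) for h in waited}


# Illustrative sample only: handles and lrs mirror the table, counts do not.
waits = [WaitFire(17, 0xF80000A4), WaitFire(26, 0xF8000110)]
sets = [
    SetFire(10, 0xF80000A4, 0x824AA304),
    SetFire(10, 0xF8000110, 0x824AAFC8),
    SetFire(10, 0xF8000110, 0x824AAFC8),
    SetFire(3, 0xF8000200, 0x82000000),  # hypothetical unrelated handle
]

for handle, (wait_tid, signalers) in correlate(waits, sets).items():
    for (sig_tid, lr), n in signalers.items():
        print(f"{handle:#010x} wait_tid={wait_tid} "
              f"signal_tid={sig_tid} lr={lr:#010x} fires={n}")
```

Keying on the handle value only works within a single run, since the absolute handle IDs drift between cold runs.
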
## Step 4 — signaler-fn resolution (sylpheed.db cross-check)
Wrapper-entry probe data for these two NtSet wrappers, filtered to tid=10:
| wrapper | lr-of-caller | caller fn | tid=10 fire count |
|---|---|---|---|
| `sub_824AA2F0` (NtSetEvent wrapper) | `0x8245DA44` | **`sub_8245D9D8`** (γ-signaler D-A per AUDIT-062) | 23 |
| `sub_824AA2F0` (NtSetEvent wrapper) | `0x8245DB08` | **`sub_8245DA78`** (γ-signaler D-B per AUDIT-062) | 8 |
| `sub_824AAF50` (Ke-style wrapper) | `0x8245DC5C` | **`sub_8245DB40`** (NEW — not previously named) | 461 |

The `sub_824AAF50` disassembly needs follow-up, but lr=`0x824AAFC8`
(`sub_824AAF50+0x78`) is consistent with a `bl xeKeSetEvent` followed by a
status check in an N-arg helper. The wrapper takes `(handle, ptr, size)`, and
the event it signals internally has a different handle from the input.
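
The `fn+offset` forms used in this report come from resolving a link-register value to the nearest function start at or below it. The following is a minimal sketch of that lookup using only entry points named in this report. The helper `resolve` is hypothetical, and the tiny table assumes no unlisted function starts between an entry and the queried lr. A real lookup would load every function start from `sylpheed.db`.

```python
import bisect

# Function entry points named in this report (deliberately incomplete).
FUNCTION_STARTS = sorted([
    0x821CB030,
    0x8245D9D8,
    0x8245DA78,
    0x8245DB40,
    0x824AA2F0,
    0x824AAF50,
])


def resolve(addr):
    """Return (function_start, offset) for the nearest entry at or below addr."""
    i = bisect.bisect_right(FUNCTION_STARTS, addr) - 1
    if i < 0:
        raise ValueError(f"{addr:#x} is below every known function start")
    start = FUNCTION_STARTS[i]
    return start, addr - start


for lr in (0x821CB1D0, 0x824AAFC8, 0x8245DA44):
    start, off = resolve(lr)
    print(f"lr={lr:#010x} -> sub_{start:08X}+{off:#x}")
```
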
Containing-fn cross-check (`sylpheed.db`):
- `sub_8245D9D8` and `sub_8245DA78` are in the worker cluster
(0x82450000-0x8245C000). Per AUDIT-062: both are γ-signaler-D family,
hot from worker-side, missed by AUDIT-059/060 enumeration.
- `sub_8245DB40` is in the same cluster; callers are `sub_824528A8+0x54`
and `sub_8245EE50+0x20` (both worker-cluster internal).
- All three are reached from tid=10's body fn `sub_82450A68`, the
trampoline body for the entry `sub_82450A28` (which `ExCreateThread`
registers via `sub_8244FF50`).
**tid=10 caller chain (canary)**:
```
sub_8244FEA8 (caller of sub_8244FF50; itself called from 11 sites)
  → sub_8244FF50 (spawner: calls ExCreateThread w/ entry=sub_82450A28)
    → sub_82450A28 (thread-entry trampoline:
                     KeSetThreadPriority(-2, 3); bl sub_82450A68)
      → sub_82450A68 (worker dispatch loop)
        → ... γ-signalers sub_8245D9D8 / sub_8245DA78 / sub_8245DB40
```
`sub_82450A28` is referenced as a data pointer at `0x8244FFF8` (inside
`sub_8244FF50`). No call edges to it — it's purely a thread-entry data
constant passed to ExCreateThread.
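
Because `sub_82450A28` is only referenced as data, a reachability walk that follows call edges alone stops at the spawner. The following is a minimal sketch of that effect, with thread-entry pointers modeled as a separate edge kind. The edge-map names and the `reachable` helper are hypothetical, and the edge from `sub_82450A68` to the signalers collapses the frames the chain above elides with `...`.

```python
from collections import deque

# Direct call edges taken from the chain above.
CALL_EDGES = {
    "sub_8244FEA8": ["sub_8244FF50"],
    "sub_8244FF50": [],  # reaches sub_82450A28 only through ExCreateThread
    "sub_82450A28": ["sub_82450A68"],
    # Intermediate frames are collapsed into one edge for illustration.
    "sub_82450A68": ["sub_8245D9D8", "sub_8245DA78", "sub_8245DB40"],
}

# Thread-entry pointers handed to ExCreateThread (data references, not calls).
SPAWN_EDGES = {"sub_8244FF50": ["sub_82450A28"]}


def reachable(root, *edge_maps):
    """Breadth-first walk from root over the union of the given edge maps."""
    seen = {root}
    queue = deque([root])
    while queue:
        fn = queue.popleft()
        for edges in edge_maps:
            for callee in edges.get(fn, ()):
                if callee not in seen:
                    seen.add(callee)
                    queue.append(callee)
    return seen


calls_only = reachable("sub_8244FEA8", CALL_EDGES)
with_spawns = reachable("sub_8244FEA8", CALL_EDGES, SPAWN_EDGES)
print("calls only reaches sub_8245DB40:", "sub_8245DB40" in calls_only)
print("with spawn edges reaches sub_8245DB40:", "sub_8245DB40" in with_spawns)
```
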
## Step 5 — ours cross-reference
All identified signaler fns (`sub_8245D9D8`, `sub_8245DA78`, `sub_8245DB40`,
`sub_824AA2F0`, `sub_824AAF50`, `sub_82450A28`, `sub_8244FF50`) are GAME
(XEX) code — not kernel-imports. In ours these execute under the JIT, with
no host-side analog to compare. The relevant question is whether the
trajectory in ours REACHES these PCs.
Direct evidence from prior runs:
**AUDIT-062 ours `--lr-trace=0x824aa2f0`** trace (`ours-ntset.jsonl`, 136
fires across cold boot up to deadlock):
- tid=6: 82 NtSet fires
- tid=1: 28 fires
- tid=5: 22 fires
- tid=8: 2 fires
- tid=13: 2 fires
- **tid=10: 0 fires**
ours NEVER spawns the canary-equivalent of tid=10 (the
`sub_8244FF50/sub_82450A28/sub_82450A68` worker). This is consistent with
AUDIT-057's "thread-gap" finding: ours has fewer threads than canary.
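
The per-tid breakdown above is a histogram over the trace records. The following is a minimal sketch of that tally, assuming each line of the `.jsonl` trace is a JSON object with a `tid` field. That schema is an assumption, since this report does not show the file layout. The input here is a synthetic stand-in built from the totals listed above, not the real trace.

```python
import io
import json
from collections import Counter


def tid_histogram(lines):
    """Count records per thread id; blank lines are skipped."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if line:
            counts[json.loads(line)["tid"]] += 1
    return counts


# Synthetic stand-in for the trace, built from the per-tid totals above.
reported = {6: 82, 1: 28, 5: 22, 8: 2, 13: 2}
sample = io.StringIO(
    "".join(
        json.dumps({"tid": tid}) + "\n"
        for tid, n in reported.items()
        for _ in range(n)
    )
)

hist = tid_histogram(sample)
print("total fires:", sum(hist.values()))
print("tid=10 fires:", hist[10])
```

For a real trace, pass an open file handle in place of the `StringIO` object.
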
Within ours, the γ-signalers DO fire — but on tid=5 (calling sub_824AA2F0
from lr=`0x8245DA44` = `sub_8245D9D8+0x6C`) per AUDIT-062's
`ours-ntset.jsonl:line 1`. AUDIT-062 already established these signal
WRONG handles in ours (neighbors of `0x12AC` are signaled; the wedge
handle itself is not).
**Conclusion**: ours's signaler PCs exist and run, but on the wrong tids
(no tid=10 equivalent), and target the wrong handles. The PRODUCER →
SIGNALER chain in ours is structurally broken at the **thread-spawn**
layer, not the kernel-import layer.
Confidence (Step 5): MEDIUM-HIGH for the chain identification (data is
internally consistent and matches AUDIT-062's prior independent capture).
LOW on the ours-side resolution mechanism (this audit did not re-run
ours; cross-ref is read-only against prior dumps which may be stale
relative to current ours HEAD `e6d43a23…`).
## AUDIT-066 framing refutation
AUDIT-066 stated:
> the producer-side signal for THAT event comes from a γ-signaler running
> inside the 4 workers spawned by sub_825070F0 — per AUDIT-063's
> static-reachability survey of NtSet wrapper callers.
This is **falsified** by AUDIT-069 Step 3+4 evidence:
1. The signaler runs on **tid=10**, spawned by `sub_8244FF50` via
`ExCreateThread(entry=sub_82450A28)`. This is NOT one of sub_825070F0's
4 workers.
2. sub_8244FF50's caller chain does NOT require ANON_Class_713383D7's
vtable to be installed; it does NOT require sub_825070F0 to fire.
3. The circular-bootstrap concern AUDIT-066 raised ("workers can't signal
until they spawn; they can't spawn until the wedge clears") was
structurally correct framing IF the signaler were inside the
sub_825070F0 4-worker family. Since the actual signaler is tid=10
(independently spawned), the circle is **broken** — the signaler IS
reachable without the wedge clearing.
Reading-error class **#37**: static-reachability surveys (AUDIT-063 walked
12 hops from sub_82452DC0 to NtSet wrapper callers) are scoped to a
particular caller chain; they miss alternative producer paths reached via
unrelated thread-spawn sites. Always probe at the runtime SIGNAL site to
confirm which exact caller fired, not just which static path could fire.
## Cascade outcome
- **A** (capture wait site PC + r3=handle in canary): **PASS**. PC
`0x821CB1DC`, r3 captures the handle on first fire reproducibly.
- **B** (capture signal fires on the wait targets): **PASS**. 1 fire on
F80000A4 (wedge handle 1), 100 fires on F8000110 (wedge handle 2).
- **C** (resolve signaling fn + immediate caller fn): **PASS**.
`sub_824AA2F0` called from `sub_8245D9D8` / `sub_8245DA78` (γ-signaler D family);
`sub_824AAF50` called from `sub_8245DB40` (new). All on tid=10.
- **D** (ours-side cross-ref): **PARTIAL**. tid=10 IS missing in ours
per existing AUDIT-062 data; γ-signalers DO fire but on wrong tids.
Did not re-run ours in this session (per task discipline; cross-ref
read-only against prior dumps).
Net 3/4 PASS, 1/4 PARTIAL.
## Discipline
- xenia-rs HEAD `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` UNCHANGED.
`git diff HEAD | sha256sum` at session start =
`ed30fd526643918f67311caff0a10d1346d73fd0c0323e02477883cf5ff20357`
and at session end IDENTICAL.
- Canary patch is purely additive, cvar-gated default-off, UINT32_MAX
sentinel + std::once parse pattern (per AUDIT-068 discipline).
- Every canary run used `--mute=true`.
- Cache wiped before each cold run (5 cold runs total: Step 1 90s,
Step 1 180s rerun, Step 3 with handle watch, Step 3 with log_all_sets,
Step 4 with wrapper probes). Each cache moved to `/tmp/_audit_069_step*`
before next cold run.
- Cache restoration from `/tmp/canary-cache-bak-audit-068` deferred to
session end (done after this report).
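
The unchanged-tree check in the first bullet can be scripted so the start and end digests are compared mechanically. The following is a minimal sketch, assuming `git` is on the PATH. The function names are hypothetical and this is not the tooling used in the session.

```python
import hashlib
import subprocess


def digest(data: bytes) -> str:
    """Hex SHA-256 of raw bytes (the hash column of sha256sum output)."""
    return hashlib.sha256(data).hexdigest()


def worktree_digest(repo_dir: str) -> str:
    """Hash the output of `git diff HEAD`, like `git diff HEAD | sha256sum`."""
    out = subprocess.run(
        ["git", "diff", "HEAD"],
        cwd=repo_dir,
        check=True,
        capture_output=True,
    ).stdout
    return digest(out)


def assert_unchanged(before: str, after: str) -> None:
    """Raise if the working-tree digest moved between two checkpoints."""
    if before != after:
        raise RuntimeError(f"working tree changed: {before} -> {after}")
```

A session would call `worktree_digest` on the repository path once at the start and once at the end, then pass both values to `assert_unchanged`.
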
## Artifacts
```
xenia-rs/audit-runs/audit-069-wait-signal-producer/
step1-wait-probe.log (90s baseline; 2 wait fires)
step1-wait-probe.stdout
step1-wait-probe-180s.log (180s rerun; 2 wait fires)
step1-wait-probe-180s.stdout
step3-signal-probe.log (180s; first signal-watch test;
handles drifted, partial correlation)
step3-signal-probe.stdout
step3-correlated.log (180s; log_all_sets; 120k signal fires)
step3-correlated.stdout
step4-wrapper-callers.log (180s; log_all_sets + wrapper entries;
155k events; correlated lr-to-caller)
step4-wrapper-callers.stdout
fix-canary.diff (cumulative canary diff vs 6de80dffe)
writer-report.md (this file)
```
## Session 2 recommendation
Two paths, both <100 LOC ours-side:
**Path 1 (ours read-only probe + targeted root-cause)**: re-run ours with
`--ctor-probe=0x82450A28` (the canary-tid=10 entry) — confirm it never
fires. Then `--ctor-probe=0x8244FF50` (the spawner). If sub_8244FF50 also
never fires, walk up its 11 callers in sylpheed.db — likely one of them
gates on a flag/event that's not set in ours's early-boot trajectory.
**Path 2 (canary additional capture)**: probe canary's tid=10 spawn
sequence in detail. Add `audit_69_thread_spawn_watch` cvar that logs
every ExCreateThread call with (entry_pc, ctx, suspend_flag, caller_lr).
~40 LOC. Compare to ours's spawn list — find which call goes
unfired in ours.
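
The Path 2 comparison reduces to a set difference over thread-entry PCs. The following is a minimal sketch, assuming both engines can emit a list of spawn records. The `Spawn` type, the `missing_entries` helper, and every value except entry `0x82450A28` are hypothetical placeholders, not captured data.

```python
from typing import NamedTuple


class Spawn(NamedTuple):
    entry_pc: int
    caller_lr: int


def missing_entries(reference, candidate):
    """Entry PCs spawned in reference but never in candidate, first-seen order."""
    seen = {s.entry_pc for s in candidate}
    out = []
    for s in reference:
        if s.entry_pc not in seen and s.entry_pc not in out:
            out.append(s.entry_pc)
    return out


# Placeholder spawn lists; only entry 0x82450A28 comes from this report.
canary = [
    Spawn(0x82001000, 0x82002000),  # hypothetical
    Spawn(0x82450A28, 0x82003000),  # entry from the report, lr hypothetical
    Spawn(0x82450A28, 0x82003000),  # repeated spawn of the same entry
]
ours = [
    Spawn(0x82001000, 0x82002000),  # hypothetical
]

for pc in missing_entries(canary, ours):
    print(f"entry {pc:#010x} spawned in canary but not in ours")
```
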
Both paths are cheaper than continuing on the wedge directly. Path 1 is
preferred: it stays on the ours side, which is the failing engine.
Predicted Session 2 cascade:
- A (find sub_82450A28's first-non-fire ancestor in ours): 75-85%
- B (identify the missing precondition for that ancestor): 50-60%
- C (fix LOC in ours ≤ 50): 30-40%
- D (draws>0): 15-25% (single wedge unlock)