handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,271 @@
# AUDIT-069 Session 1 — wait-signal producer identification
Date: 2026-05-20
Status: **LANDED — signaler tid + caller fns identified; AUDIT-066 circular framing FALSIFIED**
## Headline
The wait at `sub_821CB030+0x1AC` (PC `0x821CB1DC`) — the canonical
AUDIT-049/065 wedge wait — fires in canary on two tids (worker tid=17 and
cache-loader tid=26). Both wedges are signaled by **tid=10**, a worker
thread spawned EARLY (via `sub_8244FF50` → `ExCreateThread(entry=sub_82450A28)`),
NOT by any of the four workers spawned by `sub_825070F0`. This refutes
AUDIT-066's circular framing ("γ-signaler running inside the 4 workers
spawned by sub_825070F0"): the actual signaler reaches the production
phase WITHOUT depending on sub_825070F0 firing.
## Step 1 — wait site capture (canary)
Probe: `--audit_61_branch_probe_pcs=0x821CB1DC --mute=true`, 180s cold.
| tid | r3 (handle) | r4 (timeout) | r5 (wait_mode) | r6 (ctx) | r31 (stack) | lr |
|----:|------------:|-------------:|---------------:|---------:|------------:|---:|
| 17 | `F80000A4` | `FFFFFFFF` | `0` (auto) | `BC65CEC0` | `7064FA70` | `0x821CB1D0` |
| 26 | `F8000110` | `FFFFFFFF` | `0` (auto) | `BC667F80` | `708FF990` | `0x821CB1D0` |
Two distinct fires (one per logical caller). Both have r4=INFINITE timeout
matching dossier. The lr=`0x821CB1D0` is `sub_821CB030+0x1A0` = the
instruction AFTER the bl-wait — consistent with branch-probe firing at the
basic-block-entry following the wait-call's return.
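
The offset arithmetic above can be re-checked with a few lines. This is a minimal sketch using only the addresses quoted in this report; the helper name `fn_offset` is made up for illustration, and the last line assumes the guest's fixed 4-byte PowerPC instruction width.

```python
FN_START = 0x821CB030  # sub_821CB030 entry, as quoted in the report
PROBE_PC = 0x821CB1DC  # branch-probe PC
LR = 0x821CB1D0        # lr captured when the probe fired

def fn_offset(addr, fn_start=FN_START):
    """Format addr as a '+0xNN' offset from the function start."""
    return f"+0x{addr - fn_start:X}"

print(fn_offset(PROBE_PC))  # +0x1AC
print(fn_offset(LR))        # +0x1A0
# lr holds the return address, so the bl itself sits one instruction earlier.
print(fn_offset(LR - 4))    # +0x19C
```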
Handle drift across cold runs is real: Step 1 vs Step 3 vs Step 4 trajectories
produced wait handles `{F80000A0,F8000108}` / `{F80000A0,F8000108}` /
`{F80000A4,F8000110}`. Per-run handles are still deterministic; the absolute
ID is not.
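
Because absolute handle IDs drift, cross-run comparison is easier on a run-relative key. The sketch below ranks the wait handles within each run. It assumes (not verified here for every run) that the lower handle always belongs to the tid=17 wait, as it does for the `F80000A4`/`F8000110` pair; `by_rank` is a hypothetical helper.

```python
def by_rank(handles):
    """Map each handle of one run to its rank in ascending order."""
    return {h: i for i, h in enumerate(sorted(handles))}

# Wait-handle pairs per run, as listed in the paragraph above.
runs = {
    "step1": [0xF80000A0, 0xF8000108],
    "step3": [0xF80000A0, 0xF8000108],
    "step4": [0xF80000A4, 0xF8000110],
}
roles = {name: by_rank(handles) for name, handles in runs.items()}

# Rank 0 is assumed to be the worker wait, rank 1 the cache-loader wait.
for name, mapping in roles.items():
    print(name, {f"{h:08X}": r for h, r in mapping.items()})
```

Ranking only works while the set of wait sites is identical across runs; an extra or missing wait would shift the ranks.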
**Important framing correction**: The brief expected "~16 fires" per
AUDIT-065. This was already partly retracted by AUDIT-066 (which observed
that thid=17 "terminates via `ExTerminateThread(0)` WITHOUT ever calling
Wait inside its cache loop"). Step 1 confirms AUDIT-066's correction:
the wait at `+0x1AC` fires ~2× per boot (one for the work-queue load
that ANON_Class_713383D7 work goes through; one for the cache-loader
sister-flow). Not 16. The wait is the WORK-QUEUE wait, not a per-cache-file
IO wait.
Confidence: HIGH (probe fired, r3/r4/r5 match expected wait-call ABI,
two distinct logical fires reproducible across cold runs).
## Step 2 — instrumentation (canary, ~280 LOC additive)
New `audit_69_*` cvars + slowpath module:
- **cpu_flags.{h,cc}** (+23/+48 LOC, of which ~30 LOC are mine vs cumulative):
  - `--audit_69_event_signal_watch` (CSV of guest handle IDs, max 4)
  - `--audit_69_event_signal_native_ptr` (CSV of guest VAs, max 4)
  - `--audit_69_log_all_sets` (bool: log EVERY XEvent::Set/Pulse fire)
- **xenia-kernel/audit_69_event_signal_watch.h** (51 LOC): fwd decls,
  hot-path inline wrapper (single relaxed atomic load + branch).
- **xenia-kernel/audit_69_event_signal_watch.cc** (193 LOC): lazy parse +
  UINT32_MAX sentinel + `XThread::TryGetCurrentThread()` for lr/tid capture.
  Mirrors AUDIT-068's static-init gate pattern.
- **xenia-kernel/xevent.cc** (+9 LOC): hook at `XEvent::Set` and
  `XEvent::Pulse` (the deepest convergence of Ke/Nt set + pulse paths).
Reading-error registration: `XThread::GetCurrentThread()` asserts on host
threads; first iteration used it and crashed. Fixed by switching to
`TryGetCurrentThread()`. (Same lesson as AUDIT-067's bool-vs-pointer
asymmetry but in a different fn.)
Cumulative cross-run canary additions retained in tree (AUDIT-061/067/068/069).
## Step 3 — correlated capture
Run: cold, 180s, `--mute=true --audit_61_branch_probe_pcs=0x821CB1DC,0x824AA2F0,0x824AAF50 --audit_69_log_all_sets=true`.
Volume: 122,165 log lines (Step 3) / 155,627 lines (Step 4 with wrapper probes).
Wait fires (Step 4): 2 (tid=17, tid=26, as in Step 1 but with handle drift to F80000A4/F8000110).
Signals on wedge handles (Step 4):
| wedge handle (waited on) | wait tid | signal fires | signal lr | signaling fn | signal tid |
|---|---|---|---|---|---|
| `0xF80000A4` | 17 | **1** | `0x824AA304` | `sub_824AA2F0` (NtSetEvent wrapper) | **10** |
| `0xF8000110` | 26 | **100** | `0x824AAFC8` | `sub_824AAF50` (a generic event-set-with-arg wrapper) | **10** |
The 100 fires on F8000110 are repeats — auto-reset events fire on first
signal; the rest are no-ops. Volume reflects how often the work-queue
processes items targeting this synchronizer.
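
The per-handle tally in the table can be sketched as a small grouping pass. The record shape `(handle, lr, tid)` is an assumption about what a log parser would yield, not the actual log format, and the third sample record is a made-up non-wedge signal added only to show the filter.

```python
from collections import Counter

def signals_per_handle(records, wedge_handles):
    """Count Set/Pulse records on wedge handles, keyed by (handle, lr, tid)."""
    counts = Counter()
    for handle, lr, tid in records:
        if handle in wedge_handles:
            counts[(handle, lr, tid)] += 1
    return counts

records = (
    [(0xF80000A4, 0x824AA304, 10)]            # 1 fire, per the table
    + [(0xF8000110, 0x824AAFC8, 10)] * 100    # 100 fires, per the table
    + [(0xF8000200, 0x824AA304, 6)]           # hypothetical unrelated signal
)
counts = signals_per_handle(records, {0xF80000A4, 0xF8000110})
for (handle, lr, tid), n in sorted(counts.items()):
    print(f"{handle:08X} lr={lr:08X} tid={tid} fires={n}")
```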
## Step 4 — signaler-fn resolution (sylpheed.db cross-check)
Wrapper-entry probe data for these two NtSet wrappers, filtered to tid=10:
| wrapper | lr-of-caller | caller fn | tid=10 fire count |
|---|---|---|---|
| `sub_824AA2F0` (NtSetEvent wrapper) | `0x8245DA44` | **`sub_8245D9D8`** (γ-signaler D-A per AUDIT-062) | 23 |
| `sub_824AA2F0` (NtSetEvent wrapper) | `0x8245DB08` | **`sub_8245DA78`** (γ-signaler D-B per AUDIT-062) | 8 |
| `sub_824AAF50` (Ke-style wrapper) | `0x8245DC5C` | **`sub_8245DB40`** (NEW — not previously named) | 461 |
`sub_824AAF50` disasm needs follow-up but lr=0x824AAFC8 = `sub_824AAF50+0x78`
position is consistent with a `bl xeKeSetEvent` followed by status check
in an N-arg helper. The wrapper takes `(handle, ptr, size)` and the
internally-signaled event has a different handle from the input.
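
Mapping a caller lr to its containing function is a sorted-start lookup. The sketch below uses only the three function starts named in the table. A real resolver would consult the full symbol table in `sylpheed.db`, whose schema is not shown here, and would also need function end addresses, which this three-entry table lacks.

```python
from bisect import bisect_right

# Function start addresses taken from the table above.
FN_STARTS = sorted([0x8245D9D8, 0x8245DA78, 0x8245DB40])

def containing_fn(lr):
    """Return (name, offset) of the last function starting at or before lr."""
    i = bisect_right(FN_STARTS, lr) - 1
    if i < 0:
        return None  # lr lies below every known function start
    start = FN_STARTS[i]
    return f"sub_{start:08X}", lr - start

for lr in (0x8245DA44, 0x8245DB08, 0x8245DC5C):
    name, off = containing_fn(lr)
    print(f"lr={lr:08X} -> {name}+0x{off:X}")
```

With these inputs the lookup yields `sub_8245D9D8+0x6C`, `sub_8245DA78+0x90`, and `sub_8245DB40+0x11C`, matching the caller column.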
Containing-fn cross-check (`sylpheed.db`):
- `sub_8245D9D8` and `sub_8245DA78` are in the worker cluster
(0x82450000-0x8245C000). Per AUDIT-062: both are γ-signaler-D family,
hot from worker-side, missed by AUDIT-059/060 enumeration.
- `sub_8245DB40` is in the same cluster; callers are `sub_824528A8+0x54`
and `sub_8245EE50+0x20` (both worker-cluster internal).
- All three are reached from tid=10's body fn `sub_82450A68`, the
trampoline body for the entry `sub_82450A28` (which `sub_8244FF50`
passes to `ExCreateThread`).
**tid=10 caller chain (canary)**:
```
sub_8244FEA8 (caller of sub_8244FF50; itself called from 11 sites)
→ sub_8244FF50 (spawner — calls ExCreateThread w/ entry=sub_82450A28)
→ sub_82450A28 (thread-entry trampoline:
KeSetThreadPriority(-2, 3); bl sub_82450A68)
→ sub_82450A68 (worker dispatch loop)
→ ... γ-signalers D-A (sub_8245D9D8) / D-B (sub_8245DA78) / sub_8245DB40
```
`sub_82450A28` is referenced as a data pointer at `0x8244FFF8` (inside
`sub_8244FF50`). No call edges to it — it's purely a thread-entry data
constant passed to ExCreateThread.
## Step 5 — ours cross-reference
All identified signaler fns (`sub_8245D9D8`, `sub_8245DA78`, `sub_8245DB40`,
`sub_824AA2F0`, `sub_824AAF50`, `sub_82450A28`, `sub_8244FF50`) are GAME
(XEX) code — not kernel-imports. In ours these execute under the JIT, with
no host-side analog to compare. The relevant question is whether the
trajectory in ours REACHES these PCs.
Direct evidence from prior runs:
**AUDIT-062 ours `--lr-trace=0x824aa2f0`** trace (`ours-ntset.jsonl`, 136
fires across cold boot up to deadlock):
- tid=6: 82 NtSet fires
- tid=1: 28 fires
- tid=5: 22 fires
- tid=8: 2 fires
- tid=13: 2 fires
- **tid=10: 0 fires**
ours NEVER spawns the canary-equivalent of tid=10 (the
`sub_8244FF50/sub_82450A28/sub_82450A68` worker). This is consistent with
AUDIT-057's "thread-gap" finding: ours has fewer threads than canary.
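
The per-tid tally can be reproduced from a JSONL trace with a counter. This is a sketch only: the field name `tid` is an assumption about the `ours-ntset.jsonl` record layout, and the sample lines are synthesized from the counts listed above rather than read from the file.

```python
import json
from collections import Counter

def fires_per_tid(lines):
    """Count trace records per thread id, skipping blank lines."""
    return Counter(json.loads(line)["tid"] for line in lines if line.strip())

# Synthetic stand-in for the trace, built from the reported counts.
reported = {6: 82, 1: 28, 5: 22, 8: 2, 13: 2}
sample = [json.dumps({"tid": t}) for t, n in reported.items() for _ in range(n)]

counts = fires_per_tid(sample)
total = sum(counts.values())
print(total)       # 136
print(counts[10])  # 0 (Counter returns 0 for a tid that never fired)
```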
Within ours, the γ-signalers DO fire — but on tid=5 (calling sub_824AA2F0
from lr=`0x8245DA44` = `sub_8245D9D8+0x6C`) per AUDIT-062's
`ours-ntset.jsonl:line 1`. AUDIT-062 already established these signal
WRONG handles in ours (neighbors of `0x12AC` are signaled; the wedge
handle itself is not).
**Conclusion**: ours's signaler PCs exist and run, but on the wrong tids
(no tid=10 equivalent), and target the wrong handles. The PRODUCER →
SIGNALER chain in ours is structurally broken at the **thread-spawn**
layer, not the kernel-import layer.
Confidence (Step 5): MEDIUM-HIGH for the chain identification (data is
internally consistent and matches AUDIT-062's prior independent capture).
LOW on the ours-side resolution mechanism (this audit did not re-run
ours; cross-ref is read-only against prior dumps which may be stale
relative to current ours HEAD `e6d43a23…`).
## AUDIT-066 framing refutation
AUDIT-066 stated:
> the producer-side signal for THAT event comes from a γ-signaler running
> inside the 4 workers spawned by sub_825070F0 — per AUDIT-063's
> static-reachability survey of NtSet wrapper callers.
This is **falsified** by AUDIT-069 Step 3+4 evidence:
1. The signaler runs on **tid=10**, spawned by `sub_8244FF50` via
`ExCreateThread(entry=sub_82450A28)`. This is NOT one of sub_825070F0's
4 workers.
2. sub_8244FF50's caller chain does NOT require ANON_Class_713383D7's
vtable to be installed; it does NOT require sub_825070F0 to fire.
3. The circular-bootstrap concern AUDIT-066 raised ("workers can't signal
until they spawn; they can't spawn until the wedge clears") was
structurally correct framing IF the signaler were inside the
sub_825070F0 4-worker family. Since the actual signaler is tid=10
(independently spawned), the circle is **broken** — the signaler IS
reachable without the wedge clearing.
Reading-error class **#37**: static-reachability surveys (AUDIT-063 walked
12 hops from sub_82452DC0 to NtSet wrapper callers) are scoped to a
particular caller chain; they miss alternative producer paths reached via
unrelated thread-spawn sites. Always probe at the runtime SIGNAL site to
confirm which exact caller fired, not just which static path could fire.
## Cascade outcome
- **A** (capture wait site PC + r3=handle in canary): **PASS**. PC
`0x821CB1DC`, r3 captures the handle on first fire reproducibly.
- **B** (capture signal fires on the wait targets): **PASS**. 1 fire on
F80000A4 (wedge handle 1), 100 fires on F8000110 (wedge handle 2).
- **C** (resolve signaling fn + immediate caller fn): **PASS**.
`sub_824AA2F0` is called from `sub_8245D9D8` / `sub_8245DA78` (γ-signaler D family);
`sub_824AAF50` is called from `sub_8245DB40` (new). All on tid=10.
- **D** (ours-side cross-ref): **PARTIAL**. tid=10 IS missing in ours
per existing AUDIT-062 data; γ-signalers DO fire but on wrong tids.
Did not re-run ours in this session (per task discipline; cross-ref
read-only against prior dumps).
Net 3/4 PASS, 1/4 PARTIAL.
## Discipline
- xenia-rs HEAD `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` UNCHANGED.
`git diff HEAD | sha256sum` at session start =
`ed30fd526643918f67311caff0a10d1346d73fd0c0323e02477883cf5ff20357`
and at session end IDENTICAL.
- Canary patch is purely additive, cvar-gated default-off, UINT32_MAX
sentinel + std::once parse pattern (per AUDIT-068 discipline).
- Every canary run used `--mute=true`.
- Cache wiped before each cold run (5 cold runs total: Step 1 90s,
Step 1 180s rerun, Step 3 with handle watch, Step 3 with log_all_sets,
Step 4 with wrapper probes). Each cache moved to `/tmp/_audit_069_step*`
before next cold run.
- Cache restoration from `/tmp/canary-cache-bak-audit-068` deferred to
session end (done after this report).
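
The unchanged-tree check in the first bullet can be scripted. The sketch below hashes the raw bytes of `git diff HEAD`, which is what piping into `sha256sum` does; the function names and the idea of calling it at session start and end are illustrative, not part of the existing tooling.

```python
import hashlib
import subprocess

def digest(data):
    """SHA-256 hex digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()

def working_tree_digest(repo):
    """Hash the output of `git diff HEAD` for the repository at `repo`."""
    diff = subprocess.run(
        ["git", "-C", repo, "diff", "HEAD"],
        check=True,
        capture_output=True,
    ).stdout
    return digest(diff)

# Usage (needs a git checkout, so it is not executed here):
#   before = working_tree_digest("xenia-rs")
#   ... run the session ...
#   assert working_tree_digest("xenia-rs") == before
```

A clean tree produces an empty diff, so its digest is the SHA-256 of zero bytes.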
## Artifacts
```
xenia-rs/audit-runs/audit-069-wait-signal-producer/
step1-wait-probe.log (90s baseline; 2 wait fires)
step1-wait-probe.stdout
step1-wait-probe-180s.log (180s rerun; 2 wait fires)
step1-wait-probe-180s.stdout
step3-signal-probe.log (180s; first signal-watch test;
handles drifted, partial correlation)
step3-signal-probe.stdout
step3-correlated.log (180s; log_all_sets; 120k signal fires)
step3-correlated.stdout
step4-wrapper-callers.log (180s; log_all_sets + wrapper entries;
155k events; correlated lr-to-caller)
step4-wrapper-callers.stdout
fix-canary.diff (cumulative canary diff vs 6de80dffe)
writer-report.md (this file)
```
## Session 2 recommendation
Two paths, both <100 LOC ours-side:
**Path 1 (ours read-only probe + targeted root-cause)**: re-run ours with
`--ctor-probe=0x82450A28` (the canary-tid=10 entry) — confirm it never
fires. Then `--ctor-probe=0x8244FF50` (the spawner). If sub_8244FF50 also
never fires, walk up its 11 callers in sylpheed.db — likely one of them
gates on a flag/event that's not set in ours's early-boot trajectory.
**Path 2 (canary additional capture)**: probe canary's tid=10 spawn
sequence in detail. Add `audit_69_thread_spawn_watch` cvar that logs
every ExCreateThread call with (entry_pc, ctx, suspend_flag, caller_lr).
~40 LOC. Compare to ours's spawn list — find which call goes
unfired in ours.
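
The comparison step in Path 2 amounts to an ordered set difference over thread-entry PCs. In this sketch only `0x82450A28` comes from this report; the other entry PCs are placeholders, and the flat list-of-PCs input stands in for whatever the proposed `audit_69_thread_spawn_watch` log would actually contain.

```python
def unfired_entries(canary_spawns, ours_spawns):
    """Entry PCs spawned in canary but never in ours, in canary first-seen order."""
    seen_ours = set(ours_spawns)
    missing = []
    for pc in canary_spawns:
        if pc not in seen_ours and pc not in missing:
            missing.append(pc)
    return missing

canary = [0x82001000, 0x82450A28, 0x82002000, 0x82450A28]  # placeholders + tid=10 entry
ours = [0x82001000, 0x82002000]                            # placeholders
print([f"{pc:08X}" for pc in unfired_entries(canary, ours)])
```

Keeping canary's first-seen order matters because the earliest missing spawn is the most likely root of the divergence.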
Both paths are cheaper than continuing on the wedge directly. Path 1 is
preferred: it stays on the ours side, which is the failing engine.
Predicted Session 2 cascade:
- A (find sub_82450A28's first-non-fire ancestor in ours): 75-85%
- B (identify the missing precondition for that ancestor): 50-60%
- C (fix LOC in ours ≤ 50): 30-40%
- D (draws>0): 15-25% (single wedge unlock)