handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions
--- a/audit-runs/review-a-boot-state/canary-boot-trajectory.md
+++ b/audit-runs/review-a-boot-state/canary-boot-trajectory.md
@@ -0,0 +1,121 @@
+# Canary boot-to-first-draw trajectory
+
+**Source data:** `xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-1.jsonl`
+(4.4 GB, 18.7M events, 90s wallclock, cold run). Profile builder at
+`xenia-rs/audit-runs/phase-nonmatch-investigation/build_profiles.py`.
+
+## TL;DR
+
+- **First boot-time `VdSwap` fires on canary's tid=6 (guest main) at
+  ~9.5 s wallclock**, immediately after the rendering subsystem is
+  initialized.  This is the *empty / system-command-buffer* swap that
+  ours also reaches (ours's metric `swaps=1` is this swap).
+- **First gameplay `VdSwap` (intro-movie frame) fires on canary's
+  tid=13 (renderer) starting at ~10.7 s wallclock**, after the
+  `sub_825070F0` worker fan-out at host_ns ≈ 10.382-10.384 s.  Canary
+  tid=13 emits **12,092** `VdSwap` calls (each paired with a
+  `VdGetSystemCommandBuffer`) in the 90-s window, i.e. ~150 fps
+  sustained over the ~79 s from the first gameplay swap to end of capture.
+- The gating event between "boot swap" and "first gameplay swap" is
+  the 4-worker fan-out spawned by `sub_825070F0` at PCs `0x82506528 /
+  0x82506558 / 0x82506588 / 0x825065B8` with ctx `0xBCE251C0`.  Three
+  of the four workers begin emitting events at host_ns ≈ 10.705 s
+  (tids 27/28/29; see `canary-tid-profiles.md` rows 33-35).
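The ~150 fps figure only works out if the swap count is divided by the rendering window rather than the whole capture.  A minimal sketch of that arithmetic, assuming the 12,092 swaps are spread evenly between the first gameplay swap (~10.7 s) and the end of the 90-s run:

```python
# Sketch: sustained swap rate implied by the TL;DR numbers.
# Assumption: the VdSwap calls are spread evenly over the rendering window.
FIRST_GAMEPLAY_SWAP_S = 10.7   # approximate, from the TL;DR
CAPTURE_END_S = 90.0           # cold-run wallclock window
RENDERER_SWAPS = 12_092        # VdSwap calls on canary tid=13


def sustained_fps(swaps: int, start_s: float, end_s: float) -> float:
    """Average swaps per second over the window [start_s, end_s]."""
    if end_s <= start_s:
        raise ValueError("window must have positive length")
    return swaps / (end_s - start_s)


fps = sustained_fps(RENDERER_SWAPS, FIRST_GAMEPLAY_SWAP_S, CAPTURE_END_S)
naive = RENDERER_SWAPS / CAPTURE_END_S  # if divided by the full capture
print(f"{fps:.1f} fps over the rendering window, {naive:.1f} fps over 90 s")
```

Dividing by the full 90 s instead gives ~134 fps, so the quoted rate implicitly excludes the ~10.7 s of boot.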
+
+## Phase-by-phase trajectory
+
+| t (host_ns) | Phase | What | Citation |
+|------:|-------|------|----------|
+| 0–660 ms | XEX load / startup | `XexLoadImage`, ELF→guest init, kernel-state ctor.  Spawn tid=6 ("guest main") at host_ns=660 ms. | `phase-nonmatch-investigation/canary-tid-profiles.md:14` |
+| 660 ms–1.42 s | **Pre-spawn init** | tid=6 sets up TLS, runs CRT init.  Establishes vtables / globals.  *Sylpheed-specific*: writes `0x8200A1E8` (vtable for `ANON_Class_713383D7`) at the install-epoch host_ns ≈ 9.4–9.6 s via a 12-byte POD struct copy `{vptr, self, self}` (see `project_audit_068_session3`).  **Critical**: this is the vtable whose slot 1 = `sub_825070F0`. | `project_audit_068_session3_2026_05_20.md` |
+| 1.42–1.94 s | **Main init burst** | 10 thread spawns (tids 8–17) by tid=6.  Ours matches this 1:1. Entries include `0x82181830`, `0x8245A5D0`, `0x82450A28`, `0x82457EF0`, `0x824CD458`, **`0x822F1EE0` (renderer, susp=T)**, `0x824D2878/0x824D2940` (XAudio, susp=T), `0x82178950` (XMA), `0x821748F0` (file IO spawner, susp=T). | `canary-tid-profiles.md:42-55` |
+| 1.671 s | **Renderer spawn** | tid=6 calls `ExCreateThread` with entry `0x822F1EE0`, ctx `0xBCE24A40`, suspended=True. Becomes canary tid=13. | `canary-tid-profiles.md:21,49` |
+| 1.726–1.728 s | **XAudio spawn** | tids 14/15 (XAudio voice-mask poll + sister) spawned suspended. Will dominate event volume (~11M events combined). | `canary-tid-profiles.md:50-51` |
+| 1.94–2.15 s | **Secondary init burst** | 8 more spawns (tids 18–25), file-IO + XAM helpers.  **Ours emits 0** here — already wedged. | `result.md:48` |
+| 9.4–9.6 s | **vtable install epoch** | POD struct copy (writer at GUEST PC `sub_824FD240+0x24` per AUDIT-068 session 4, not a host-side kernel import) installs `0x8200A1E8` at a run-specific arena address (`0xBCE25340` or `0xBCE251C0` per arena drift).  The object at that address is the ANON_Class_713383D7 instance whose vtable slot 1 = `sub_825070F0`. | `project_audit_068_session3_2026_05_20.md` |
+| ~9.5 s | **Boot-init `VdSwap` (on tid=6)** | After `VdInitializeEngines + VdShutdownEngines + VdInitializeEngines + VdSetGraphicsInterruptCallback + VdSetSystemCommandBufferGpuIdentifierAddress + VdInitializeRingBuffer + VdEnableRingBufferRPtrWriteBack + VdGetSystemCommandBuffer`, tid=6 emits **one** `VdSwap` to publish the boot framebuffer.  draws=0 still (no PM4 draw packets). | Mirror of `ours-postfix.jsonl` idx 105044-105285; canary same shape. |
+| 10.080 s | tid=26 second-call helper | `0x821748F0` second invocation. | `canary-tid-profiles.md:32` |
+| **10.383 s** | **sub_825070F0 worker fan-out** | **Four `ExCreateThread` calls in 1 ms** spawn entries `0x82506528 / 0x82506558 / 0x82506588 / 0x825065B8` all sharing ctx `0xBCE251C0` (the ANON_Class instance).  These are the workers that consume cache-file IO and signal the wedge event(s) that AUDIT-049 found dangling in ours. | `canary-tid-profiles.md:63-66`, `sub_825070F0.md` |
+| 10.7 s | **Worker resume / first events** | tids 27, 28, 29 emit their first events.  tid=28 dominates (3.26M events) doing file IO (`530× NtReadFile` of `cache:\…`), heavy CS contention (1.07M RtlEnterCS), and signaling the wedge events. | `canary-tid-profiles.md:33-35`, `sub_82452DC0.md` |
+| ~10.7+ s | **Renderer wakes** | Once `sub_825070F0` workers begin, the events that canary's tid=13 was waiting on get signaled.  tid=13 transitions Blocked→Running, starts producing `VdGetSystemCommandBuffer`/`VdSwap` pairs at ~150 fps. | `canary-tid-profiles.md:21`, `result.md:30-39` |
+| ~10.7–90 s | **Sustained rendering** | tid=13 emits 12,092 `VdSwap` calls.  Intro movie ⇒ title screen ⇒ gameplay (depends on user input).  In an unattended cold run, canary likely plateaus on the title screen but is genuinely rendering. | `canary-tid-profiles.md:21` |
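The per-tid first-event times in the table come from the profile builder (`build_profiles.py`).  The sketch below only illustrates that kind of reduction and is not that script.  It assumes a hypothetical one-JSON-object-per-line schema with integer `tid`, integer `host_ns` and string `name` fields; the real canary jsonl layout may differ, and the sample lines are made up.

```python
import json
from collections import defaultdict
from typing import Iterable


def tid_profiles(lines: Iterable[str]) -> dict[int, dict]:
    """Per-tid event count, first-event time and first VdSwap time.

    Streams line by line, so a multi-GB file can be passed as an open
    file handle.  Field names (tid, host_ns, name) are assumptions.
    """
    prof: dict[int, dict] = defaultdict(
        lambda: {"events": 0, "first_ns": None, "first_vdswap_ns": None})
    for line in lines:
        if not line.strip():
            continue  # tolerate blank lines
        ev = json.loads(line)
        p = prof[ev["tid"]]
        p["events"] += 1
        ns = ev["host_ns"]
        if p["first_ns"] is None or ns < p["first_ns"]:
            p["first_ns"] = ns
        if ev["name"] == "VdSwap" and (
                p["first_vdswap_ns"] is None or ns < p["first_vdswap_ns"]):
            p["first_vdswap_ns"] = ns
    return dict(prof)


# Made-up sample lines in the assumed schema.
sample = [
    '{"tid": 6, "host_ns": 9500000000, "name": "VdSwap"}',
    '{"tid": 13, "host_ns": 10700000000, "name": "VdGetSystemCommandBuffer"}',
    '{"tid": 13, "host_ns": 10700000100, "name": "VdSwap"}',
]
profiles = tid_profiles(sample)
```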
+
+## Canary call-chain from entry_point to first gameplay draw
+
+```
+canary tid=6 (guest main)
+  entry_point
+   → sub_8216EA68 (post-init dispatcher)
+      → sub_822F1AA8 (game-loop dispatcher)            (sub_822F1AA8.md)
+         → bctrl vtable[0]({sub_82175330 → tail → sub_82173990})
+            → sub_82173990 (sync task-spawn-and-join)  (sub_82173990.md)
+               → bl sub_821746B0 (alloc task + spawn worker thid=17, F8000094)
+                 [worker thid=17 runs body sub_821748F0
+                   → sub_821C4EB0 → sub_821CC3F8 → sub_821CBA08
+                   → sub_821CB030 (creates Event, submits work via sub_82452DC0)
+                   → … cache file loads (cache:\aab216c3\..., cache:\87719002\..., etc.)
+                   → spawns child workers via ExCreateThread(...,821C4AD0,...)
+                   → eventually ExTerminateThread(0)]
+               → KeWaitForSingleObject(thid=17.handle) INFINITE
+                 [blocks ~445 log lines wallclock; completes when thid=17 terminates]
+               ← returns
+         ← returns to sub_822F1AA8 outer loop
+         → iterates sub_821741C8 → sub_82172BA0 → bctrl vtable[6]
+            → sub_821B55D8 → sub_824F8398 → sub_824F7CD0 → sub_824F7800
+               → bctrl vtable[1] = sub_825070F0    (sub_825070F0.md)
+                  → 4× ExCreateThread(...,0x82506528/58/88/B8, ctx=0xBCE25xxx, susp=T)
+                  → 4× NtResumeThread / scheduler enables the workers
+            [workers tids 27/28/29/+1 begin executing]
+         → outer loop continues
+            → KeWaitForSingleObject (4040×/60 s = ~67 fps frame-pacing wait)
+            → bctrl vtable[2] → various per-frame work
+            → tid=6's main loop produces no VdSwap directly past the init swap
+canary tid=13 (renderer; spawned by tid=6 at 0x822F1EE0)
+  [stays suspended OR Blocked-on-event until worker fan-out at 10.38 s]
+  → after wake, enters render loop:
+     while (running) {
+       VdGetSystemCommandBuffer(...)        ; 12,092× / 90 s
+       … build per-frame command buffer …
+       VdSwap(buffer_ptr, fetch_ptr, …)      ; 12,092× / 90 s
+     }
+```
+
+## Pre-conditions canary establishes before first gameplay draw
+
+In time order, all must hold:
+
+1. **GPU subsystem initialized**: `VdInitializeEngines → VdInitializeRingBuffer → VdEnableRingBufferRPtrWriteBack → VdSetGraphicsInterruptCallback`.  Ours: ✓ (idx 105044-105117).
+2. **Renderer thread alive**: tid=13 created suspended via `ExCreateThread(entry=0x822F1EE0, susp=T)`. Ours: ✓ (idx 105348).
+3. **Worker-cluster activation**: 4 workers spawned by `sub_825070F0` consuming `sub_82452DC0` work.  Ours: **✗ 0 fires**.
+4. **`sub_821CB030`'s Event signaled**: the per-load completion event created at `sub_821CB030+0x128` and waited at `+0x1AC` must be signaled by a `sub_825070F0` worker.  Ours: **✗ `NO_SIGNALS_DESPITE_WAITS` on handle 0x12d0**.
+5. **`sub_82173990`'s join-wait completes**: tid=6's wait at `sub_82173990+0x2D0` on the thid=17 thread handle.  Ours: **✗ tid=1 stuck on handle 0x12c8 (= tid=13's thread handle)**.
+6. **Renderer wakes**: per AUDIT-049, the worker-cluster must signal whatever guards tid=13's body.  Canary: ✓.  Ours: **✗ tid=13 itself wedges in sub_821CB030**.
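Because the preconditions are listed in time order and all must hold, the earliest one that fails in ours is the first point worth attacking.  A small sketch that encodes the checklist with ours's status copied from the list (the data layout is illustrative only):

```python
from typing import Optional

# The six preconditions above, in time order; True means "holds in ours".
PRECONDITIONS = [
    ("GPU subsystem initialized", True),
    ("Renderer thread alive", True),
    ("Worker-cluster activation (sub_825070F0 workers)", False),
    ("sub_821CB030 Event signaled", False),
    ("sub_82173990 join-wait completes", False),
    ("Renderer wakes", False),
]


def first_unmet(conds: list[tuple[str, bool]]) -> Optional[str]:
    """Return the earliest precondition that does not hold, or None."""
    for name, holds in conds:
        if not holds:
            return name
    return None
```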
+
+## Numerical signature of canary over the 90-s cold run (for reference)
+
+- 18.7 M events / 28 tids.
+- Renderer tid=13: 594 k events, including 12,092 VdSwap.
+- Worker tid=28 (sub_825070F0 worker 0): 3.26 M events.
+- XAudio tid=14/15: 6.15 M / 4.78 M events.
+- ours at 50 M-instr / ~3 s wallclock: 121 k events / 13 tids.  Renderer
+  tid=13 in ours: ~80 events (wedged).
+- Total event counts differ by ~150× (18.7 M / 121 k ≈ 154), because ours
+  wedges ~7 s before canary's `sub_825070F0` fan-out fires.
+
+## Uncertainty / open questions
+
+- **What is the precise host-side install of the `ANON_Class_713383D7`
+  vtable `0x8200A1E8`?** AUDIT-068 sessions 1–4 localized this to a
+  POD struct copy in the install epoch [9.4 s, 9.6 s], with the writer
+  identified at GUEST PPC `sub_824FD240+0x24` (NOT a host-side kernel
+  import as initially feared).  But in ours, `sub_824FD240` and its
+  callers `sub_824F7800/CD0/8398` fire 0× because that chain is
+  downstream of the tid=13 wedge.  See `project_audit_068_session4`.
+- **First "gameplay draw" precisely**: the first VdSwap that emits PM4
+  draw packets (e.g. `PM4_TYPE3 DRAW_INDX`) into the ringbuffer.  Need
+  to inspect canary's PM4 ring at host_ns ≈ 10.7 s to confirm.  AUDIT
+  history hasn't disambiguated boot/empty-swap from gameplay-swap at
+  the PM4-packet level.  This is a methodology gap.
+- **What unwedges canary's worker-cluster activation chain?**  AUDIT-068
+  pinned the install epoch but not the **trigger** — what guest call
+  causes `sub_824FD240+0x24`'s POD-copy to fire?  Identifying the
+  trigger and replaying it in ours is the unanswered Path β attack.
--- a/audit-runs/review-a-boot-state/methodology-assessment.md
+++ b/audit-runs/review-a-boot-state/methodology-assessment.md
@@ -0,0 +1,193 @@
+# Methodology assessment
+
+## The matched-prefix metric: load-bearing or load-shedding?
+
+Across 25+ iterates (audits 049 through 069; Phase C+1 through C+25;
+Phase D Stages 0-4 plus D-extension; Phase W; Phase host-audio-*),
+matched-prefix on the main thread (canary tid=6 ⇄ ours tid=1)
+advanced:
+
+| Phase | Matched-prefix | Δ |
+|---|---:|---:|
+| Phase B baseline (pre-C+1) | ~102,168 | — |
+| Phase D D-extension landing | 104,607 → 105,046 | +439 |
+| Phase W (VdInitializeEngines fix) | 105,046 → 105,112 | +66 |
+| Phase C+25 (MmGetPhysicalAddress canon) | 105,112 → 105,128 | +16 |
+
+| Phase | `swaps` | `draws` | `unique_render_targets` |
+|---|---:|---:|---:|
+| Phase B baseline | 1 | 0 | 0 |
+| Phase W | 1 | 0 | 0 |
+| Phase C+25 | 1 | 0 | 0 |
+
+**The two metrics are decoupled.**  Matched-prefix is moving along
+ENGINE-internal divergences (kernel-call return values, thread IDs,
+heap arena base addresses).  The progression metric is gated by
+boot-state activation, which lives one or more layers above the diff
+points.
+
+## Why the decoupling happened
+
+Three reading-errors compound:
+
+1. **#23 (cooperative-vs-preemptive scheduling jitter)**: canary's
+   default-scheduling produces different *intra-thread* event ordering
+   than ours's coroutine scheduler.  Diff-tool absorbers (C+18, C+21,
+   D-extension) correctly hide this jitter — but they hide *real
+   bootstrap-time divergences too*.  Phase W explicitly noted: "If
+   ours's worker fails to enqueue something canary's worker awaits,
+   we'd never see the gap because the matched-prefix isn't on the
+   worker tid in the first place."
+2. **#30 (per-tid PC SID drift)**: shared-global SIDs work for
+   process-global dispatchers (e.g., the work-queue semaphore at
+   handle `0xF800003C` in canary).  But the wedge handle `0x12d0`
+   uses a per-tid create-site SID that does NOT match across engines.
+   So even when the same logical event exists in both engines, the
+   diff harness reports SID mismatch and absorbs OR diverges
+   incorrectly.
+3. **#38 (cross-spawn producer paths)**: static reachability (the
+   sylpheed.db `xrefs` table) misses producer paths that cross
+   thread-spawn boundaries.  The result.md from Phase Non-match shows
+   canary's tid=14 (XAudio voice-mask poll) communicates with
+   downstream code via a path that has no static `bl` edge — it
+   crosses via guest kernel APIs.
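A toy illustration of reading-error #38.  It assumes the static `xrefs` table records only direct `bl` call edges, and it treats a thread entry passed to `ExCreateThread` as a separate "spawn" edge.  The edges reproduce part of the call chain in `canary-boot-trajectory.md`, but the graph itself is hand-built, not a `sylpheed.db` query.

```python
from collections import deque

# Edge kinds: "bl" = direct call visible to static xrefs;
# "spawn" = thread entry handed to ExCreateThread (no bl edge).
EDGES = {
    "sub_82173990": [("sub_821746B0", "bl")],
    "sub_821746B0": [("sub_821748F0", "spawn")],  # worker body, thid=17
    "sub_821748F0": [("sub_821C4EB0", "bl")],
    "sub_821C4EB0": [("sub_821CC3F8", "bl")],
    "sub_821CC3F8": [("sub_821CBA08", "bl")],
    "sub_821CBA08": [("sub_821CB030", "bl")],
}


def reachable(root: str, follow_spawn: bool) -> set[str]:
    """Breadth-first reachability, optionally crossing spawn edges."""
    seen = {root}
    queue = deque([root])
    while queue:
        fn = queue.popleft()
        for callee, kind in EDGES.get(fn, []):
            if kind == "spawn" and not follow_spawn:
                continue  # a bl-only analysis stops at the spawn boundary
            if callee not in seen:
                seen.add(callee)
                queue.append(callee)
    return seen


static_only = reachable("sub_82173990", follow_spawn=False)
with_spawns = reachable("sub_82173990", follow_spawn=True)
```

With `bl` edges alone, `sub_821CB030` (where the wedge event lives) is unreachable from `sub_82173990`; crossing the spawn edge reaches it.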
+
+## Alternative metric proposals
+
+### Option 1 — `draws ≥ 1` (sharp gate)
+
+**Pros**: directly measures the target.  Boolean.  Reproducible.
+**Cons**: gives no signal during iteration — every iterate before the
+breakthrough is `draws = 0`.  Loss function is non-smooth.
+
+### Option 2 — `swaps ≥ 2` (relaxed first-frame gate)
+
+**Pros**: still sharp; one bit looser than draws.  Distinguishes
+boot-init-only swap (`swaps=1`) from at-least-one-rendered-frame
+(`swaps≥2`).
+**Cons**: same non-smooth loss.  Achievable in principle by a crowbar
+without solving the underlying bug.
+
+### Option 3 — Renderer-thread liveness: `events_emitted_by_renderer_thread ≥ N`
+
+Compute: events emitted on the thread spawned at entry `0x822F1EE0`
+in any 90-s wallclock window.  Canary: 594,000.  Ours: ~0.
+
+**Pros**: smooth-ish (event count can move slowly).  Directly measures
+"is the renderer running."  Bypasses the diff-tool jitter problem
+because it's a per-engine internal count.
+**Cons**: requires a non-trivial 90-s wallclock run (not 50M instr
+ceiling).  Could be gamed by a crowbar that resumes the renderer
+without unblocking the wedge.
+
+### Option 4 — Worker-thread census: `count(threads_with_events ≥ 10k) ≥ 6`
+
+Compute: how many tids emit ≥10k events over 90 s wallclock.
+Canary at 90 s: 12 early tids meet this (1/2/4/6/9/10/11/12/13/14/15/16),
+plus the post-10 s workers 21/27/28/29, for 16 in total.  Ours at 50M
+instr: 5 tids.
+
+**Pros**: directly measures the AUDIT-057 thread-gap.  Smooth metric:
+each unwedged thread adds 1 to the count.
+**Cons**: requires 90-s wallclock runs — ours can't reach this
+without solving the wedge first, so it's pre-requisite-equivalent to
+Option 3.
+
+### Option 5 — `worker_semaphore_release_count` (AUDIT-069 S5)
+
+Compute: how many `NtReleaseSemaphore` calls on the work semaphore
+(handle `0xF800003C` in canary, equivalent in ours) over 90 s
+wallclock.  Canary: 414.  Ours: 99 (24%).
+
+**Pros**: pinpoints the under-production directly.  Mechanically
+measurable.  Already instrumented in canary (audit_70_semaphore_release_watch).
+**Cons**: same wallclock requirement; same gameability.
+
+### Option 6 — composite: `progression_score`
+
+Define:
+
+```
+progression_score = 1 * swaps + 10 * draws + 100 * unique_render_targets
+                  + 0.001 * matched_prefix
+```
+
+This recovers signal during iteration (matched-prefix moves)
+without pretending it's progression.  The 1000:1 per-unit weight
+ratio between `swaps` and `matched_prefix` reflects the relative
+bug-class severity.
+
+**Pros**: continuous gradient over both wedge-solving and
+canonicalization work.  Honest about which is more important.
+**Cons**: arbitrary weights.  Composite metrics drift in meaning.
+
+## Recommendation
+
+**Adopt Option 6 (composite progression_score) as the primary
+methodology metric**, with Option 2 (`swaps ≥ 2`) as a hard
+secondary gate: crossing it is what matters; everything else is
+fitness.
+
+Concrete proposal:
+
+1. The `digest.json` output gains a `progression_score` field
+   computed from the existing fields (zero new instrumentation).
+2. Every iterate must report Δprogression_score in its
+   re-validation.md.
+3. Iterates that only move `matched_prefix` (i.e., Δprogression_score
+   = 0.001 × Δmatched_prefix) MUST be tagged in their memory entry
+   as "**canonicalization only — no progression**" and counted
+   against a *budget*: max 5 consecutive iterates in this class
+   before mandatory pivot to wedge-attack work.
+4. Audits that move `swaps` or `draws` (the high-weight terms) are
+   tagged "**progression**" and given priority for resource
+   allocation.
+
+This methodology change costs ~10 LOC in the digest output and
+imposes a discipline cap of 5 canonicalization-only audits between
+progression attempts.
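Rules 3 and 4 amount to a tagging function plus a trailing-run counter.  A sketch under stated assumptions: `Delta` is a hypothetical per-iterate record, and any nonzero change in a high-weight term counts as "moving" it:

```python
from dataclasses import dataclass


@dataclass
class Delta:
    """Hypothetical per-iterate change in the digest fields."""
    swaps: int = 0
    draws: int = 0
    unique_render_targets: int = 0
    matched_prefix: int = 0


MAX_CONSECUTIVE_CANON = 5  # budget from rule 3


def tag(d: Delta) -> str:
    """Any movement in a high-weight term is progression (rule 4)."""
    if d.swaps or d.draws or d.unique_render_targets:
        return "progression"
    return "canonicalization only"


def must_pivot(history: list[Delta]) -> bool:
    """True once the trailing run of canonicalization-only iterates has
    used up the budget, so the next iterate must be wedge-attack work."""
    run = 0
    for d in reversed(history):
        if tag(d) != "canonicalization only":
            break
        run += 1
    return run >= MAX_CONSECUTIVE_CANON
```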
+
+## Falsification of the matched-prefix-as-proxy belief
+
+Phase C through C+25 explicitly assumed that matched-prefix is a
+**proxy** for progression.  This assumption is now empirically
+falsified:
+
+> +2,960 events of matched-prefix advancement produced exactly
+> ZERO units of progression.
+
+Reading-error #39 (newly registered by this review):
+
+> **#39 (matched-prefix as progression proxy)**: matched-prefix
+> measures *engine-to-engine divergence point*, not *game-to-game
+> functional gap*.  When the wedge is on a different thread than the
+> matched-prefix anchor thread, advancing matched-prefix is orthogonal
+> to unwedging.  Future audits MUST distinguish "ours's tid-X main
+> thread diverges from canary's tid-Y" from "ours's tid-X main thread
+> is *blocked because tid-Z is wedged*", and target the wedge directly
+> when present.
+
+## What "progression discipline" looks like in practice
+
+For the next 3 iterates:
+
+- Iterate N+1: **Step 1 of shortest-path-roadmap** (crowbar).  No
+  diff-tool work.  Target: `swaps ≥ 2`.
+- Iterate N+2: **Step 2 of roadmap** (trigger ID via canary jsonl
+  analysis).  No engine LOC.  Target: identification of the missing
+  kernel call(s).
+- Iterate N+3: **Step 3 of roadmap** (mirror the trigger).  Target:
+  ours unblocks without the crowbar.
+
+Each iterate must produce a `progression_score` delta report.  If
+3 iterates in a row produce Δprogression_score ≤ ε (where
+ε = 0.001 × 500 = 0.5, the score of a 500-event matched-prefix
+advance), the methodology should be reviewed again before continuing:
+this would mean even the crowbar approach failed and a deeper rethink
+is needed.
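The re-review trigger is a sliding-window check on score deltas.  A minimal sketch, assuming deltas are recorded per iterate in chronological order; the example values are the matched-prefix-only moves from the tables above scaled by 0.001:

```python
EPSILON = 0.001 * 500  # 0.5: the score of a 500-event matched-prefix advance


def needs_rereview(score_deltas: list[float], window: int = 3,
                   eps: float = EPSILON) -> bool:
    """True if each of the last `window` iterates moved the score by <= eps."""
    if len(score_deltas) < window:
        return False
    return all(d <= eps for d in score_deltas[-window:])


# Phase C+25 (+16), Phase W (+66) and D-extension (+439) as score deltas.
stalled = needs_rereview([0.016, 0.066, 0.439])
```

By this rule, even the largest single matched-prefix move in the chain (+439) stays under ε.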
+
+## Closing note
+
+The user's instinct in calling this strategic pause and review was
+correct.  The matched-prefix-only chain was producing real
+canonicalization work but had ceased producing progression.  The
+roadmap above is one principled attempt at breaking the cycle; if it
+fails, the next-level fallback is to formally accept Sylpheed's
+boot-state as currently unreachable in ours and pivot to a different
+title for the methodology demonstration.
--- a/audit-runs/review-a-boot-state/ours-wedge-localization.md
+++ b/audit-runs/review-a-boot-state/ours-wedge-localization.md
@@ -0,0 +1,205 @@
+# Ours wedge localization
+
+**Source data**: `phase-w-wedge-reattack/ours-postfix.jsonl` (50M-instr
+cold run, ~3 s wallclock, 121,569 events, 13 tids).
+`phase-w-wedge-reattack/halt-on-deadlock-dump.txt` (per-tid state @
+deadlock).
+
+## TL;DR
+
+Ours's wedge is **structurally identical** to AUDIT-049 (first found
+2026-05-10).  Across 25+ subsequent iterates (Phase C+1 … Phase C+25,
+Phase D, AUDIT-049 .. AUDIT-069), the wedge has **never moved**:
+
+- **tid=1 (main)** wedges at `sub_82173990+0x2D4` (PC `0x824ac578`,
+  `do_wait_single`) on **handle `0x12c8`** = `Thread(id=13)` — the
+  renderer thread's join handle.
+- **tid=13 (renderer / cache-IO worker)** wedges at
+  `sub_821CB030+0x1B0` (PC `0x824ac578`, `do_wait_single`) on
+  **handle `0x12d0`** = `Event/Auto`, created by tid=13 itself at
+  `sub_821CB030+0x128` via `NtCreateEvent`.  `<NO_SIGNALS_DESPITE_WAITS>`.
+- **`sub_825070F0` fires 0×** at any horizon probed (50M instr, 500M
+  instr, unbounded wallclock).  The 4 workers (entries
+  `0x82506528/58/88/B8`) never spawn in ours.
+
+This is what audits 049/058/059/060/062/063/064/065/066/067/068/069
+collectively call "the wedge."
+
+## Graph view: ours's actual reachable subgraph vs canary's
+
+### What runs in BOTH engines (matched-prefix 105,128)
+
+```
+entry_point
+ └─ early CRT init        ✓ ours ✓ canary
+ └─ subsystem init        ✓
+    ├─ VdInitializeEngines ×2 (init, VdShutdownEngines, init again)
+    ├─ VdInitializeRingBuffer
+    ├─ VdEnableRingBufferRPtrWriteBack
+    ├─ VdSetGraphicsInterruptCallback
+    └─ VdSetSystemCommandBufferGpuIdentifierAddress
+ └─ 10× ExCreateThread (the matched first spawn burst)
+    ├─ 0x82181830 / 0x8245A5D0 / 0x82450A28      ✓ ✓
+    ├─ 0x82457EF0 (spawned by tid=10 → tid=11)   ✓ ✓
+    ├─ 0x824CD458 (KeWait worker, susp=F)        ✓ ✓
+    ├─ 0x822F1EE0 (renderer, susp=T)             ✓ ✓
+    ├─ 0x824D2878 / 0x824D2940 (XAudio, susp=T)  ✓ ✓
+    ├─ 0x82178950 (XMA, susp=F)                  ✓ ✓
+    └─ 0x821748F0 (file IO spawner, susp=T)      ✓ ✓
+ └─ 1× boot-init VdSwap                          ✓ swaps=1
+ └─ tid=1 enters sub_8216EA68 → sub_822F1AA8
+    └─ bctrl vtable[0] of *(0x828E1F08)
+       └─ sub_82175330 → tail → sub_82173990
+          └─ sub_821746B0 → spawn worker (= ours tid=13, susp=F)
+          └─ KeWaitForSingleObject INFINITE on tid=13.handle    ← WEDGE
+```
+
+### What runs ONLY in canary (the missing subgraph)
+
+```
+After tid=6's tid=17 worker (= ours's tid=13) terminates:
+  sub_82173990 returns to sub_822F1AA8's outer loop
+   └─ iterates sub_821741C8 → sub_82172BA0 → vtable[6] = sub_821B55D8
+      → sub_824F8398 → sub_824F7CD0 → sub_824F7800 → vtable[1] = sub_825070F0
+         └─ 4× ExCreateThread(entry=0x82506528/58/88/B8, susp=T)
+            ├─ Worker 0 → tid=28 (file IO, 3.26M events)
+            ├─ Worker 1 → tid=27 (36k events)
+            ├─ Worker 2 → tid=29 (91k events)
+            └─ Worker 3 (0x825065B8 — never resumed in jitter-1 run)
+
+Also only in canary (some of this precedes the fan-out; see timestamps):
+  Canary's secondary spawn burst (1.94–2.15 s) — 8 helpers (tids 18–25)
+  Canary's tid=14/15 XAudio resumes (~ms after tid=6 spawns them in
+    susp=T; ours also spawns them susp=T but never resumes them)
+  Renderer tid=13 unblocks, starts emitting VdSwap at ~150 fps
+  Per-frame game loop: tid=6 emits `0x822F1BCC` 4040× / 60 s
+```
+
+## The wedge dependency graph (cyclic)
+
+```
+            [tid=1 (main) wedge]
+                       │
+                       ▼
+     wait on handle 0x12c8 (= tid=13.thread_handle)
+                       │
+                       ▼
+       only signaled when tid=13 calls ExTerminateThread
+                       │
+                       ▼
+         tid=13 needs to complete sub_821CB030 body
+                       │
+                       ▼
+         sub_821CB030 waits on event 0x12d0
+                       │
+                       ▼
+   only signaled by sub_825070F0 worker cluster
+                       │
+                       ▼
+        sub_825070F0 never fires in ours
+                       │
+                       ▼
+    sub_825070F0 is reached via:
+       sub_82172BA0 → ... → sub_824F7800 → bctrl vtable[1]
+    ↑↑↑ which is downstream of sub_822F1AA8's outer loop
+        which is downstream of sub_82173990 returning
+        which is downstream of tid=1's wait completing
+        ← BACK TO TOP
+```
+
+This is the **AUDIT-063 self-referential lock**: the activation chain
+that produces the signal that unwedges the wait is itself downstream
+of the wait completing.  In canary, the lock resolves because the
+tid=17 worker (= ours tid=13's analog) calls `ExTerminateThread`
+**by completing** its `sub_821CB030` body — and that completion is
+fed by some OTHER signal source that ours doesn't replicate.
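The self-referential lock is a cycle in a "blocked-on" graph.  A sketch using depth-first search with three-color marking; node names are descriptive labels for this example, not identifiers from the trace tooling:

```python
from typing import Optional

# Edge A -> [B] means "A cannot happen until B happens" (the graph above).
BLOCKED_ON = {
    "tid1 wait on 0x12c8": ["tid13 ExTerminateThread"],
    "tid13 ExTerminateThread": ["sub_821CB030 body completes"],
    "sub_821CB030 body completes": ["event 0x12d0 signaled"],
    "event 0x12d0 signaled": ["sub_825070F0 workers run"],
    "sub_825070F0 workers run": ["sub_825070F0 called"],
    "sub_825070F0 called": ["sub_82173990 returns"],
    "sub_82173990 returns": ["tid1 wait on 0x12c8"],
}


def find_cycle(graph: dict[str, list[str]]) -> Optional[list[str]]:
    """Return one cycle (first node repeated at the end), or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in graph}
    stack: list[str] = []

    def visit(node: str) -> Optional[list[str]]:
        color[node] = GREY
        stack.append(node)
        for nxt in graph.get(node, []):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is on the current path
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for n in graph:
        if color[n] == WHITE:
            cycle = visit(n)
            if cycle:
                return cycle
    return None


cycle = find_cycle(BLOCKED_ON)
# Canary's "other signal source" corresponds to event 0x12d0 no longer
# depending on the sub_825070F0 workers, which breaks the cycle.
unblocked = dict(BLOCKED_ON)
unblocked["event 0x12d0 signaled"] = []
```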
+
+## Where the "other signal source" lives (the actual root cause)
+
+From AUDIT-069 Session 5 (work-semaphore release-rate diff):
+
+> Canary 414 release events vs ours 99 (24% rate).  Worker (tid=10/5):
+> 382 vs 90.  Main (tid=6/1): 7 vs 8.  **Other producers: 25 vs 1**.
+
+The discrepancy in "other producers" (25 releases vs 1) is the key.
+**Canary has multiple non-worker threads that release the work
+semaphore during bootstrap; these releases feed the worker-side wait
+that eventually leads to sub_821CB030's event being signaled.** Ours
+has only one such release (from tid=13 itself, before it wedges).
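Breaking the AUDIT-069 S5 counts down per category shows where the shortfall concentrates.  The figures are the quoted ones; only the ratios are computed here:

```python
# Work-semaphore NtReleaseSemaphore counts quoted from AUDIT-069 S5.
CANARY = {"worker": 382, "main": 7, "other": 25}
OURS = {"worker": 90, "main": 8, "other": 1}

canary_total = sum(CANARY.values())
ours_total = sum(OURS.values())
for k in CANARY:
    ratio = OURS[k] / CANARY[k]
    print(f"{k:6s} canary={CANARY[k]:4d} ours={OURS[k]:4d} ratio={ratio:.0%}")
print(f"total  canary={canary_total} ours={ours_total} "
      f"ratio={ours_total / canary_total:.0%}")
```

The categories sum to the quoted totals (414 vs 99).  "Other producers" runs at 4% of canary's rate against ~24% overall, while "main" is at parity.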
+
+From AUDIT-069 Session 4 (`sub_82450A68` dispatch loop):
+
+> Ours r3=0x1 (semaphore acquired) 91/91 captures (100%); canary
+> r3=0x102 (TIMEOUT) 3/4 (75%).
+
+**Ours's work-semaphore has count > 0 every time tid=5 checks; canary's
+times out 75% of the time.**  This is a *paradox at face value*: how
+can ours have MORE semaphore signals available but still process
+LESS work?  The S5 reframe resolves it: ours's worker self-releases
+the work semaphore from `sub_82450B68+0xCDC/+0xD28` MORE OFTEN than
+it consumes, because the consume path early-exits when the dispatch
+table doesn't have an entry to process — and the dispatch table
+doesn't have entries because the producers (canary's "other 25 tids")
+aren't running.
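A toy model of the S5 reframe, not the guest code: the worker acquires the semaphore, and if the dispatch table is empty it takes the early-exit path that releases the semaphore again.  With no producers, every acquire succeeds and nothing is processed.  With a producer and no self-release, the count drains and later waits time out.  The class, its fields, and the step counts are illustrative assumptions.

```python
class ToyWorkQueue:
    """Toy semaphore + dispatch table (illustrative, not sub_82450A68)."""

    def __init__(self, initial_count: int) -> None:
        self.sem_count = initial_count
        self.table: list[str] = []
        self.processed = 0
        self.acquires_ok = 0
        self.timeouts = 0

    def produce(self, item: str) -> None:
        """A producer enqueues work and releases the semaphore once."""
        self.table.append(item)
        self.sem_count += 1

    def worker_step(self) -> None:
        if self.sem_count == 0:
            self.timeouts += 1  # analogous to r3=0x102 (STATUS_TIMEOUT)
            return
        self.sem_count -= 1
        self.acquires_ok += 1
        if self.table:
            self.table.pop(0)
            self.processed += 1
        else:
            self.sem_count += 1  # early-exit path self-releases


# Ours-shaped: no producers, one outstanding release.
ours = ToyWorkQueue(initial_count=1)
for _ in range(91):
    ours.worker_step()

# Canary-shaped: one produced item, then the queue drains.
canary_like = ToyWorkQueue(initial_count=0)
canary_like.produce("job")
for _ in range(4):
    canary_like.worker_step()
```

The ours-shaped run acquires 91/91 times with zero work done.  The canary-shaped run processes its one item and then times out on 3 of 4 steps.  This matches the shape of the S4 captures, but the model is too simple to confirm the mechanism.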
+
+## Bootstrap divergence (when does ours first diverge from canary?)
+
+Per the AUDIT-069 H3 framing: somewhere in the *bootstrap* of the
+worker-cluster, a producer thread that should be alive in canary
+isn't alive in ours.  Candidates:
+
+1. **XAudio render thread (canary tid=14/15)**: spawned suspended in
+   ours, **never resumed**.  Canary resumes within ~1 ms of spawn at
+   1.726 s.  Canary's tid=14 calls `XAudioGetVoiceCategoryVolumeChangeMask`
+   26,126× and is one of the top event producers.  This thread runs
+   the host-audio bridge feed loop — *if it isn't running, downstream
+   producers expecting audio cues block.*
+2. **XMA decoder (tid=16, entry `0x82178950`)**: spawned non-suspended
+   in both; ours emits 0 events from this thread because it presumably
+   waits on a kernel object that's never signaled.
+3. **NtWaitForMultipleObjectsEx worker (canary tid=21, entry
+   `0x824563E0`)**: 1M events in canary; absent in ours (the second
+   spawn burst never happens in ours).
+4. **The "tid=10 helper" (canary tid=10, entry `0x82450A28`)**: ours
+   has this thread (ours tid=5), but it's running the dispatch loop
+   `sub_82450A68` in a degenerate fast-path mode (S4 finding).
+
+The most defensible single-root claim:
+
+> **Ours never resumes the XAudio threads (tid=14/15), because the
+> guest API call that triggers their resume in canary doesn't fire in
+> ours, and as a knock-on the worker cluster never gets the bootstrap
+> producer it expects.**
+
+But this claim is not yet proven; AUDIT-068/069 stopped short of
+identifying the resume trigger.
+
+## Verified-but-doesn't-help LOC budget across recent audits
+
+(For methodology context — every recent audit landed correctness or
+diagnostic LOC but moved progression 0%.)
+
+| Audit / Phase | LOC added | Component | Effect on progression |
+|---|---:|---|---|
+| AUDIT-067 vptr-mem-watch | +422 (canary) | Mem-watch diagnostic | 0 |
+| AUDIT-068 S1-S4 | +520 cumul (canary) | Host-side write hooks | 0 (writer identified at guest PC) |
+| AUDIT-069 S1-S5 | +60 (canary), 0 (ours) | Wait/release watch | 0 (counts diverge, no fix) |
+| Phase D Stages 0-4 | +450-500 (ours+tools) | Contention manifest | 0 (104,607 cap unbroken) |
+| Phase D D-extension | +95 (tool) | Nested-CS absorber | +439 matched-prefix only |
+| Phase C+1 .. C+25 | varies | Allocator/event/thread shims | 0 (matched-prefix only) |
+| Phase W | +20 (ours) | VdInitializeEngines r3=1 | +66 matched-prefix only |
+| **Total LOC that broke the wedge: 0** | | | |
+
+This is the single most striking pattern from the audit chain: **every
+honest correctness fix advances matched-prefix; none move
+`draws / swaps / unique_render_targets`.**
+
+## Falsification budget for the wedge framing
+
+The wedge framing IS robust (no audit has falsified it since AUDIT-049).
+But it has limited explanatory power: it tells us *what is blocked*,
+not *what should unblock it*.  Reading-error #38 (cross-spawn producer
+paths missed by static reachability) and #36 (POD struct copy bypass)
+both proved that the install / wake mechanism in canary involves paths
+guest static analysis cannot see.  This is a methodology constraint,
+not an unsolvable problem.
--- a/audit-runs/review-a-boot-state/plan.md
+++ b/audit-runs/review-a-boot-state/plan.md
@@ -0,0 +1,333 @@
+# Review A — boot-state review and shortest-path roadmap
+
+**Session type**: PLAN-only.  No engine LOC changes; no canary
+instrumentation changes.  Read-only investigation across the
+existing audit chain artifacts.
+**Date**: 2026-05-21
+**Companion documents** (in this directory):
+- `canary-boot-trajectory.md` — canary's call chain from entry_point
+  to first gameplay draw, with wallclock timestamps.
+- `ours-wedge-localization.md` — precise where-ours-stops, in graph
+  terms.
+- `shortest-path-roadmap.md` — 3-5 step roadmap with expected
+  progression delta per step.
+- `methodology-assessment.md` — alternative metric proposal.
+
+This `plan.md` summarizes the five framing questions with answers
+backed by file:line citations.
+
+---
+
+## Q1 — What is "first draw" in canary's Sylpheed boot?
+
+**Two distinct "draws" must be disambiguated.**
+
+### Q1.a: First boot-init `VdSwap` (the swap=1 event)
+
+Canary's tid=6 (guest main) emits **one** `VdSwap` at ~9.5 s
+wallclock, immediately after the GPU subsystem init sequence
+`VdInitializeEngines → VdInitializeRingBuffer →
+VdEnableRingBufferRPtrWriteBack → VdSetGraphicsInterruptCallback →
+VdSetSystemCommandBufferGpuIdentifierAddress → VdGetSystemCommandBuffer`.
+This swap publishes the boot framebuffer and contains no draw packets.
+
+**Ours also reaches this swap** — visible in
+`phase-w-wedge-reattack/ours-postfix.jsonl` at idx 105283 (host_ns
+496,276,229).  This is what produces ours's `swaps=1` metric.
+
+Both engines reach this point.  **It is NOT the gate.**
+
+### Q1.b: First gameplay `VdSwap` (the swap≥2 / draws≥1 event)
+
+Canary's renderer tid=13 (entry `0x822F1EE0`, spawned suspended at
+1.671 s) wakes after the `sub_825070F0` worker fan-out at host_ns
+≈ 10.383 s and begins emitting `VdGetSystemCommandBuffer` /
+`VdSwap` pairs at ~150 fps.  Canary's tid=13 emits **12,092
+VdSwap calls in the 90-s window** (per
+`phase-nonmatch-investigation/canary-tid-profiles.md:21`).
+
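+The first of these is taken to be the **first gameplay draw**, fired
+at ~10.7 s wallclock: about 0.3 s after the `sub_825070F0` fan-out
+(≈ 10.383 s) triggers the worker cluster, and about 1.2 s after the
+boot-init swap (~9.5 s).  Whether this swap already carries PM4 draw
+packets has not been confirmed (see the open questions in
+`canary-boot-trajectory.md`).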
+
+**Pre-conditions canary establishes before this point** (per
+`canary-boot-trajectory.md`):
+
+1. Vtable `0x8200A1E8` of `ANON_Class_713383D7` installed at host_ns
+   ≈ 9.4-9.6 s via POD-copy at GUEST PC `sub_824FD240+0x24`
+   (per `project_audit_068_session4_2026_05_20`).
+2. Activation chain `sub_822F1AA8 → sub_82173990 (→ sub_821746B0;
+   returns once its join-wait completes) → sub_821741C8 → sub_82172BA0 →
+   sub_821B55D8 → sub_824F8398 → sub_824F7CD0 → sub_824F7800 →
+   bctrl vtable[1] = sub_825070F0` fires on tid=6.
+3. `sub_825070F0` spawns 4 worker threads with entries
+   `0x82506528/58/88/B8` and shared ctx `0xBCE251C0`.
+4. Workers (canary tids 27/28/29) emit signals that unwedge the
+   `sub_821CB030` Event waits across the cache-file IO completion
+   chain.
+5. Renderer tid=13's body (entered earlier but blocked on a
+   tid=14/15 XAudio-coordinated event) unblocks; per-frame
+   `VdGetSystemCommandBuffer` / `VdSwap` loop begins.
+
+---
+
+## Q2 — What is ours's actual progress, and what's the wedge root cause?
+
+**Ours stops at the first wait in the activation chain.** Specifically:
+
+- **tid=1 (main)** wedged in `do_wait_single` (PC `0x824ac578`),
+  called from `sub_82173990+0x2D4`, on handle `0x12c8` =
+  `Thread(id=13)`.  It is waiting for the renderer's thread handle to
+  signal, which happens only when tid=13 calls `ExTerminateThread`.
+- **tid=13 (renderer / cache-IO worker)** wedged at
+  `sub_821CB030+0x1B0` on handle `0x12d0` = `Event/Auto`, created by
+  itself via `NtCreateEvent` at `sub_821CB030+0x128`.  `signals=0,
+  wakes=0` — `<NO_SIGNALS_DESPITE_WAITS>`.
+- **`sub_825070F0` fires 0×** at any horizon probed.
+
+Citation: `phase-w-wedge-reattack/halt-on-deadlock-dump.txt` +
+`phase-w-wedge-reattack/current-state.md`.
+
+### Root cause (at one structural level deeper than the wedge symptom)
+
+**Per AUDIT-069 Session 5 (the most recent measurement):**
+
+- Canary fires 414 `NtReleaseSemaphore` calls on the work-queue
+  semaphore in the 90-s window.
+- Ours fires 99 (24%).
+- Breakdown: Worker (382 vs 90), Main (7 vs 8), **Other producers
+  (25 vs 1)**.
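+
+The breakdown is internally consistent, which is worth checking before
+leaning on it: both columns sum to the reported totals.  The relative
+shortfall is largest in the "other" row.  A small sketch using only the
+counts above (the table layout and function names are illustrative):
+
+```python
+# NtReleaseSemaphore counts on the work-queue semaphore, 90 s window,
+# as reported by AUDIT-069 Session 5: producer class -> (canary, ours).
+RELEASES = {
+    "worker": (382, 90),
+    "main": (7, 8),
+    "other": (25, 1),
+}
+
+
+def totals(table):
+    """Sum each engine's column."""
+    canary = sum(c for c, _ in table.values())
+    ours = sum(o for _, o in table.values())
+    return canary, ours
+
+
+def gap(table):
+    """Per-class shortfall of ours vs canary (positive = ours is missing releases)."""
+    return {name: c - o for name, (c, o) in table.items()}
+
+
+if __name__ == "__main__":
+    canary, ours = totals(RELEASES)
+    print(canary, ours, f"{ours / canary:.0%}")  # 414 99 24%
+    for name, missing in gap(RELEASES).items():
+        c, _ = RELEASES[name]
+        print(f"{name}: ours missing {missing} of {c} ({missing / c:.0%})")
+```
+
+The worker row carries the largest absolute gap (292 releases).  The
+"other" row is the one where ours is missing almost everything (24 of
+25, 96%), and that is why it is treated as the discriminating signal.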
+
+The "**other producers (25 vs 1)**" gap is the load-bearing
+discrepancy.  Threads outside the worker and main classes release the
+work semaphore 25 times during bootstrap in canary, but only once in
+ours.  That is a shortfall of 24 releases.  These correspond to
+threads that ours lacks or never runs:
+
+1. The 4 `sub_825070F0` workers (canary tids 27/28/29 + 1) — absent
+   in ours.
+2. XAudio render threads (canary tids 14/15, spawned suspended in
+   both engines, **resumed only in canary**).
+3. The secondary spawn burst at 1.94-2.15 s (canary tids 18-25) —
+   8 helpers including file-IO and NtWaitForMultipleObjectsEx workers
+   — absent in ours.
+
+### The ONE structural issue
+
+> **Ours never reaches `sub_825070F0` because the activation chain
+> that calls it is downstream of tid=13's wedge; and tid=13's wedge
+> is downstream of the worker cluster activation; and the worker
+> cluster activation is `sub_825070F0`.  This is a self-referential
+> lock.**
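+
+The lock can be stated as a "waits-on" graph, where an edge A → B
+means A cannot make progress until B does.  A deadlock is then a
+cycle.  A minimal sketch using a plain depth-first search.  The node
+names paraphrase the three clauses above; they are labels, not symbols
+from the binary:
+
+```python
+from typing import Dict, List, Optional
+
+# Edge A -> B: A cannot proceed until B has happened.
+WAITS_ON: Dict[str, List[str]] = {
+    "activation_chain_reaches_sub_825070F0": ["tid13_wait_completes"],
+    "tid13_wait_completes": ["worker_cluster_active"],
+    "worker_cluster_active": ["activation_chain_reaches_sub_825070F0"],
+}
+
+
+def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
+    """Return one cycle (first node repeated at the end), or None."""
+    WHITE, GREY, BLACK = 0, 1, 2
+    color = {node: WHITE for node in graph}
+    path: List[str] = []
+
+    def dfs(node: str) -> Optional[List[str]]:
+        color[node] = GREY
+        path.append(node)
+        for nxt in graph.get(node, []):
+            state = color.get(nxt, WHITE)
+            if state == GREY:  # back edge: nxt is already on the current path
+                return path[path.index(nxt):] + [nxt]
+            if state == WHITE:
+                found = dfs(nxt)
+                if found:
+                    return found
+        path.pop()
+        color[node] = BLACK
+        return None
+
+    for start in graph:
+        if color[start] == WHITE:
+            found = dfs(start)
+            if found:
+                return found
+    return None
+```
+
+In graph terms, a crowbar that spawns the workers directly deletes the
+`worker_cluster_active` edge.  With `WAITS_ON["worker_cluster_active"]
+= []`, `find_cycle` returns `None`.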
+
+Canary breaks the lock because some part of the bootstrap
+*pre-activates* the producers (probably via XAudio thread resume at
+1.726 s, which then runs ahead, populates the work queue, signals
+events, etc.).  Ours never resumes the XAudio threads — they're
+spawned suspended and stay that way.
+
+**The single highest-leverage gap is the XAudio thread resume,**
+because (a) it happens early (1.726 s in canary; ours wedges around
+1.4 s, so in ours the resume would have to land before that point),
+(b) it activates the dominant event producers, and (c) AUDIT-069 S5's
+"other producers 25 vs 1" finding implicates exactly this class of
+thread.
+
+---
+
+## Q3 — Shortest-path-to-first-draw roadmap
+
+Five steps (full detail in `shortest-path-roadmap.md`):
+
+- **Step 1 (~80-150 LOC, ours-side)**: add `--force-spawn-workers`
+  cvar that crowbars `sub_825070F0` activation by directly spawning
+  the 4 worker threads with the right ctx after `VdInitializeRingBuffer`
+  returns.  Tests "are the workers functionally correct if activated"
+  and "does activating them unwedge sub_821CB030."
+- **Step 2 (~0 LOC)**: with Step 1 active, mine the canary jsonl for
+  the kernel-call sequence on tid=6 in the wallclock window [9.4 s,
+  9.6 s] (the install epoch).  Identify what guest call triggers
+  `sub_824FD240+0x24`'s POD-copy of the vtable in canary.
+- **Step 3 (~10-500 LOC, depending on what Step 2 finds)**: mirror
+  that trigger in ours — likely a missing kernel-import return value
+  or a missing post-condition that the trigger inspects.
+- **Step 4 (~0 LOC; remove crowbar)**: re-test ours without
+  `--force-spawn-workers`.  Verify natural bootstrap reaches
+  `sub_825070F0` activation.
+- **Step 5 (~0-50 LOC)**: measure renderer-thread VdSwap rate over 90 s
+  wallclock; target ±30% of canary's 12,092 calls.
+
+Expected delta:
+
+| After step | `swaps` | `draws` | `unique_render_targets` |
+|---|---:|---:|---:|
+| Pre | 1 | 0 | 0 |
+| Step 1 (crowbar) | 2+ | 1+ | 1+ |
+| Step 4 (decrowbar) | 2+ | 1+ | 1+ |
+| Step 5 (parity) | 100+ | 100+ | 1-5 |
+
+---
+
+## Q4 — What's NOT on the shortest path
+
+Explicitly deferred (full rationale in `shortest-path-roadmap.md`):
+
+- **Audio (host-audio-* / XAudio implementation)** — even though
+  XAudio thread resume MAY be the trigger from Q2, ours's existing
+  XAudio shim is sufficient for the workers to bootstrap if they
+  receive the right kernel-call sequence.  Full XAudio
+  implementation is beyond first-draw scope.
+- **HID** — Sylpheed's intro/title screens are auto-advance; no
+  input needed.
+- **XAM content / save games** — not on first-draw path.
+- **Scheduler determinism work** (Phase D Stages 0-4 and beyond) —
+  null result; the wedge is upstream of contention scheduling.
+  Close or indefinitely defer.
+- **Diff-tool canonicalization** (Phase C+N for N > 25) — saturated
+  on matched-prefix without progression; halt this work class until
+  Step 4 lands and the workload re-baselines.
+- **AUDIT-068 host-side install probes** — superseded by AUDIT-068
+  Session 4 finding (writer is GUEST PC, not host).  The followup
+  question is what *triggers* the guest code path, which Step 2
+  addresses through cheaper means.
+
+---
+
+## Q5 — Methodology assessment
+
+**Current methodology relied on matched-prefix as a progression
+proxy.  This assumption is now empirically falsified**: +2,960
+events of matched-prefix advancement produced 0 units of progression
+(`swaps=1, draws=0` across 25+ iterates).
+
+### Proposed alternative metric
+
+**Option 6 (composite `progression_score`)**:
+
+```
+progression_score = 1 * swaps + 10 * draws + 100 * unique_render_targets
+                  + 0.001 * matched_prefix
+```
+
+Gives a continuous gradient that weights wedge-solving (swaps, draws,
+render targets) far above canonicalization (matched-prefix).  Requires
+~10 LOC to add the field to `digest.json`.
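+
+The formula above translates directly into code.  A minimal sketch,
+assuming the four inputs are available as integer fields of the digest.
+The field names are taken from the formula; the `Digest` type itself
+is illustrative:
+
+```python
+from dataclasses import dataclass
+
+
+@dataclass(frozen=True)
+class Digest:
+    """The subset of digest.json fields the composite score uses."""
+    swaps: int
+    draws: int
+    unique_render_targets: int
+    matched_prefix: int
+
+
+# Weights from Option 6.
+W_SWAPS, W_DRAWS, W_RT, W_PREFIX = 1.0, 10.0, 100.0, 0.001
+
+
+def progression_score(d: Digest) -> float:
+    return (W_SWAPS * d.swaps
+            + W_DRAWS * d.draws
+            + W_RT * d.unique_render_targets
+            + W_PREFIX * d.matched_prefix)
+
+
+if __name__ == "__main__":
+    before = Digest(swaps=1, draws=0, unique_render_targets=0, matched_prefix=102_168)
+    after = Digest(swaps=1, draws=0, unique_render_targets=0, matched_prefix=105_128)
+    first_draw = Digest(swaps=2, draws=1, unique_render_targets=1, matched_prefix=105_128)
+    # +2,960 matched-prefix events vs. one first-draw outcome
+    print(f"{progression_score(after) - progression_score(before):.2f}")
+    print(f"{progression_score(first_draw) - progression_score(after):.2f}")
+```
+
+Take the figures from this section: matched-prefix moved 102,168 →
+105,128 across 25+ canonicalization iterates, and that moves the score
+by about 2.96.  A single first-draw outcome (`swaps` 1 → 2, `draws`
+0 → 1, one render target) moves it by 111.  The prefix term is already
+around 105 in absolute terms, so the score is probably easiest to read
+as an iterate-to-iterate delta.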
+
+Discipline: tag every iterate as either
+"**canonicalization only — no progression**" or
+"**progression**".  Cap at 5 consecutive canonicalization-only
+iterates before mandatory pivot to wedge-attack work.
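+
+The cap is easy to enforce mechanically if each iterate's tag is
+recorded.  A sketch, assuming tags are stored as a chronological list
+of short strings.  The tag spellings and function names are
+hypothetical:
+
+```python
+from typing import Iterable
+
+CANONICALIZATION = "canonicalization"  # "canonicalization only, no progression"
+PROGRESSION = "progression"
+MAX_CONSECUTIVE_CANON = 5
+
+
+def trailing_canon_run(tags: Iterable[str]) -> int:
+    """Length of the run of canonicalization-only iterates at the end of the history."""
+    run = 0
+    for tag in tags:
+        if tag == PROGRESSION:
+            run = 0
+        elif tag == CANONICALIZATION:
+            run += 1
+        else:
+            raise ValueError(f"untagged or unknown iterate tag: {tag!r}")
+    return run
+
+
+def must_pivot(tags: Iterable[str]) -> bool:
+    """True once the cap is reached: the next iterate must be wedge-attack work."""
+    return trailing_canon_run(tags) >= MAX_CONSECUTIVE_CANON
+```
+
+Rejecting unknown tags makes an untagged iterate an error rather than
+silently resetting or extending the run.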
+
+### New reading-error #39
+
+> **#39 (matched-prefix as progression proxy)**: matched-prefix
+> measures engine-to-engine divergence point, NOT game-to-game
+> functional gap.  When the wedge is on a different thread than the
+> matched-prefix anchor thread, advancing matched-prefix is
+> orthogonal to unwedging.  Future audits MUST distinguish "ours's
+> tid-X diverges from canary's tid-Y" from "ours's tid-X is *blocked
+> because tid-Z is wedged*", and target the wedge directly when
+> present.
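+
+To make reading-error #39 concrete, assume matched-prefix is computed
+as the length of the common prefix of two event sequences.  That is an
+assumption; the diff tool's exact canonicalization is not reproduced
+here.  The metric then only looks at the anchor-thread pair, so a
+wedge on another thread is invisible to it.  A toy sketch with invented
+event streams:
+
+```python
+from typing import Sequence
+
+
+def matched_prefix(ours: Sequence[str], canary: Sequence[str]) -> int:
+    """Number of leading events on which the two streams agree."""
+    n = 0
+    for a, b in zip(ours, canary):
+        if a != b:
+            break
+        n += 1
+    return n
+
+
+# Toy per-thread streams (event names invented for illustration).
+canary_main = ["NtCreateEvent", "ExCreateThread", "NtWaitForSingleObjectEx", "VdSwap"]
+canary_render = ["NtWaitForSingleObjectEx", "VdGetSystemCommandBuffer", "VdSwap"]
+
+# Two successive iterates of ours: a canonicalization fix makes the main
+# thread agree for longer, while the renderer is still stuck at its first wait.
+ours_main_v1 = ["NtCreateEvent", "NtClose"]
+ours_main_v2 = ["NtCreateEvent", "ExCreateThread", "NtWaitForSingleObjectEx"]
+ours_render = ["NtWaitForSingleObjectEx"]
+```
+
+Going from `ours_main_v1` to `ours_main_v2` raises the main-thread
+prefix from 1 to 3.  The renderer pair stays at 1 in both iterates.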
+
+---
+
+## Counterintuitive findings (anti-anchoring)
+
+Per Tripstones in the task brief:
+
+### 1. Both engines reach `swaps=1`; ours is NOT behind on the boot swap.
+
+The shared boot-init `VdSwap` fires in both.  Ours's `swaps=1` is
+reached at the same point in the boot sequence as canary's.  The
+divergence is NOT "ours can't do the first swap"; it's "ours can't do
+the SECOND through Nth swap (the gameplay loop)".
+
+### 2. Tripstone 4 verified: canary does reach gameplay draws, ours does not.
+
+`canary-jitter-1.jsonl` shows 12,092 VdSwap calls on canary tid=13 in
+90 s wallclock — definitively in the gameplay rendering loop, not
+pre-first-draw.  Ours's tid analogous to canary tid=13 emits ~80
+events total before wedging — definitively before gameplay starts.
+The "both engines pre-first-draw" hypothesis is FALSE.
+
+### 3. The matched-prefix metric is on the WRONG thread.
+
+Matched-prefix tracks tid=6 (canary) vs tid=1 (ours), the main
+threads.  But the wedge is on ours's **tid=13**, the renderer thread
+(in canary, the same thread runs the gameplay swap loop).  Tid=1's
+matched-prefix can climb to 105,128 events without ever touching the
+wedge.
+
+### 4. The "boot-state-machine" framing is misleading.
+
+There's no monolithic boot state machine.  There are ~28 threads in
+canary, each running its own lifecycle and communicating via shared
+kernel objects.  The bottleneck isn't a state transition; it's a
+THREAD ACTIVATION GAP.
+
+### 5. AUDIT-069 Session 5's "other producers 25 vs 1" is the key forensic discovery, more than AUDIT-068's vtable install epoch.
+
+The vtable install IS interesting but it's downstream of the producer
+gap.  Producers must be running to populate the work queue, which
+lets the workers run, which signals the event ours's tid=13 is wedged
+on, which lets the activation chain continue, which calls
+`sub_824FD240+0x24`, which writes the vtable.  Fixing the vtable
+install in isolation (e.g., via a host-side mem-write hack) doesn't
+help if no producer is feeding work to the workers.
+
+---
+
+## Cascade prediction confidence
+
+- A — canary boot trajectory characterized: **DONE, HIGH** (canary-jitter-1.jsonl provides direct evidence).
+- B — ours's wedge root-cause localized deeper than "sub_821CB030 waits": **DONE, MEDIUM-HIGH** (AUDIT-069 S5 "other producers 25 vs 1" finding).
+- C — shortest-path roadmap with ≤5 steps: **DONE, MEDIUM** (5 steps; Step 1 confidence ~60%).
+- D — alternative metric proposed: **DONE, HIGH** (Option 6 composite, plus reading-error #39).
+
+---
+
+## Open questions / known unknowns
+
+1. **What is the bootstrap trigger for canary's `sub_824FD240+0x24`?**
+   Roadmap Step 2 addresses.  Could be answered in <1 session of
+   canary jsonl analysis.
+2. **Does Step 1's crowbar produce a clean wedge-unblock, or does it
+   reveal additional unmodelled state in the ctx object?** Empirical;
+   testable in one session.
+3. **Are canary's XAudio threads (tids 14/15) the actual missing
+   producer, or are they downstream of the same trigger?** Worth a
+   targeted probe before Step 1; ~50 LOC ours-side to log
+   NtResumeThread on the XAudio entry PCs.
+4. **Will the AUDIT-067 "vtable install is host-side" finding
+   resurface?** No — AUDIT-068 S4 falsified this; the writer is
+   GUEST PC `sub_824FD240+0x24`.  The "host-side" framing was a
+   mis-read of the POD-copy semantics (reading-error #36).
+
+---
+
+## Recommended next action
+
+**Dispatch a "progression iterate" implementing Step 1 of the
+roadmap** (`--force-spawn-workers` crowbar, ~80-150 LOC ours-side).
+This is a high-variance, high-reward iterate; expected outcome is
+either `swaps ≥ 2, draws ≥ 1` (success — wedge structurally
+isolated to thread activation) or an informative failure mode (e.g.,
+worker faults at first vtable bctrl indicating additional state
+needed in ctx object).  Time-box: 1 session, max 2h.
+
+If Step 1 succeeds in ANY way (even if draws stays 0), the next
+iterate is Step 2 (kernel-call sequence mining in canary-jitter-1.jsonl).
+This step has minimal risk and uses existing tooling.
+
+If Step 1 fails completely (panic / segfault unrecoverable), revert
+the crowbar and reframe: the wedge may be in ours's kernel-handler
+implementations themselves, not just bootstrap activation.  At that
+point a deeper Path β engine investigation is unavoidable.
+
+---
+
+## Memory hygiene note
+
+This review is read-only.  xenia-rs HEAD unchanged.  canary HEAD
+unchanged.  sylpheed.db unchanged.  No new artifacts beyond this
+directory.
+
+After dispatching Step 1, future memory entries should adopt the
+new `progression_score` + tagging discipline outlined in
+`methodology-assessment.md`.
--- a/audit-runs/review-a-boot-state/shortest-path-roadmap.md
+++ b/audit-runs/review-a-boot-state/shortest-path-roadmap.md
@@ -0,0 +1,253 @@
+# Shortest-path-to-first-gameplay-draw roadmap
+
+**Date**: 2026-05-21
+**Read-only investigation: no code changes are made here; the steps
+below propose them.**
+**Premise**: 25+ iterates have advanced matched-prefix 102,168 →
+105,128 (+2,960 events) but `draws=0, swaps=1, render_targets=0`
+have not moved.  This roadmap proposes a non-canonicalization path
+forward.
+
+## Definitions
+
+- **First gameplay draw** = the first `VdSwap` call by ours's
+  renderer (the thread spawned at entry `0x822F1EE0`, ours's tid
+  analog of canary tid=13) whose frame contains at least one
+  `PM4_TYPE3 DRAW_INDX` packet in the ringbuffer.
+- **Observable success criterion**: `draws ≥ 1, swaps ≥ 2,
+  unique_render_targets ≥ 1` in `xenia-rs check --stable-digest`
+  output.  At least one of those swaps must come from the **renderer
+  thread**, not the boot-init swap that ours already emits.
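+
+Counting draws per the definition above amounts to walking a PM4 dword
+stream and counting type-3 packets that carry the DRAW_INDX opcode.  A
+minimal sketch over a list of already byte-swapped 32-bit dwords.  The
+header decoding follows Xenia's command processor as we understand it:
+
+- packet type in bits 31:30
+- type-0/type-3 count in bits 29:16, stored as count − 1
+- type-3 opcode in bits 14:8
+
+The same goes for `0x22` as the DRAW_INDX opcode.  Check all of this
+against ours's own PM4 decoder before relying on it:
+
+```python
+from typing import List
+
+PM4_DRAW_INDX = 0x22  # type-3 opcode; DRAW_INDX_2 is a separate opcode, not counted here
+
+
+def count_draw_indx(dwords: List[int]) -> int:
+    """Count type-3 DRAW_INDX packets in one flat PM4 dword stream."""
+    i, draws = 0, 0
+    while i < len(dwords):
+        header = dwords[i]
+        ptype = header >> 30
+        if ptype == 0:    # register writes: header + (count field + 1) dwords
+            i += 1 + ((header >> 16) & 0x3FFF) + 1
+        elif ptype == 1:  # two register writes: header + 2 dwords
+            i += 3
+        elif ptype == 2:  # no-op filler: header only
+            i += 1
+        else:             # type 3: header + (count field + 1) payload dwords
+            count = ((header >> 16) & 0x3FFF) + 1
+            if (header >> 8) & 0x7F == PM4_DRAW_INDX:
+                draws += 1
+            i += 1 + count
+    return draws
+
+
+def type3_header(opcode: int, count: int) -> int:
+    """Build a type-3 header for `count` payload dwords (helper for examples)."""
+    return (3 << 30) | ((count - 1) << 16) | (opcode << 8)
+```
+
+A real counter would also have to follow indirect-buffer packets into
+the command buffers they reference.  This sketch only scans one flat
+stream.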
+
+## Why current iteration has stalled
+
+The wedge has been mapped and remapped 20+ times.  Every audit
+correctly identifies symptoms; every fix correctly canonicalizes a
+diff-tool divergence.  But the wedge is **structurally cyclic**: the
+worker cluster that signals the wait is downstream of the wait
+completing.  Standard "find the divergent kernel call, mirror canary's
+semantics" has saturated.
+
+Two strategies remain that have NOT been tried at full scope:
+
+1. **(A) Decouple the cycle by faking the worker activation**:
+   directly call `sub_825070F0` from a host shim, or directly spawn
+   the 4 worker threads with the right ctx, sidestepping the
+   activation chain.  This is a *crowbar*: it doesn't fix the
+   underlying bootstrap bug, but it tests "are the workers
+   functionally correct IF activated."  If they signal the wedge and
+   ours then reaches first draw, we know the bug is *exclusively* in
+   the activation gate, and we can attack just that.
+
+2. **(B) Find what triggers `sub_824FD240+0x24`'s POD-copy in canary**.
+   AUDIT-068 Session 4 pinned the install epoch of vtable
+   `0x8200A1E8` to this writer site.  But the *caller* of
+   `sub_824FD240` — what guest call leads to it firing — is
+   unidentified.  In ours, `sub_824FD240` fires 0× because the call
+   chain `sub_824F8398 → sub_824F7CD0 → sub_824F7800 → sub_824FD240`
+   is downstream of the tid=13 wedge.  So we have circular reasoning
+   again — UNLESS Strategy A is applied first.
+
+The roadmap below uses Strategy A as a wedge-crowbar and Strategy B
+as the principled fix that follows.
+
+## Roadmap
+
+### Step 1 — Crowbar: force-spawn the `sub_825070F0` workers (~80–150 LOC)
+
+**Action**: in `xenia-rs` add a debug-only cvar
+`--force-spawn-workers` that, when set, after some bootstrap
+checkpoint (e.g., first `VdInitializeRingBuffer` return), manually
+spawns 4 ExCreateThread-equivalent guest threads with:
+
+- entries `0x82506528 / 0x82506558 / 0x82506588 / 0x825065B8`
+- ctx_ptr = run-determined; allocate a fresh
+  `ANON_Class_713383D7`-shaped object on the unified heap and write
+  vtable `0x8200A1E8` to slot 0 (mirror the POD-copy at
+  `sub_824FD240+0x24`)
+- stack_size 65536, suspended=True initially, then NtResumeThread
+
+**Expected effect**:
+
+- If the workers run correctly and signal the wedge: ours's tid=13
+  unblocks, tid=1's join completes, normal game-loop begins.
+  `draws ≥ 1, swaps ≥ 2`.
+- If the workers fail (e.g., faulting because the ctx object's other
+  fields aren't initialized): we learn what *else* needs to be
+  installed alongside the vtable.
+
+**Failure modes to expect**:
+
+- The worker entries dispatch via vtable slots 35/36/37/38 of the
+  ANON_Class — those slots also need to be populated.  Audit-067
+  static analysis shows the vtable has 7 entries; the worker entries
+  use offsets 140/144/148/152 (= slots 35/36/37/38 of a wider vtable)
+  per `sub_825070F0.md` lines 32-37.  So we'll need a parent class /
+  derived class layout.
+- The ctx object also has refcount/header fields that must be
+  initialized — see AUDIT-068 Session 3 finding of 12-byte struct
+  copy `{vptr, self, self}` followed by refcount=1.
+
+**LOC budget**: 80-150 LOC ours-side; 0 LOC canary.
+**Read-only fallback**: if force-spawn fails immediately, we've still
+captured the failure mode, which is informative.
+**Risk**: high — this is structurally a hack.  Acceptable as a
+diagnostic.
+
+### Step 2 — Identify what triggers `sub_824FD240+0x24` in canary (~0 LOC)
+
+**Action**: with Step 1's crowbar enabled, ours reaches the
+post-wedge code path.  Compare ours and canary on what `import.call`
+(kernel API) sequence the **caller** of `sub_824FD240` makes
+immediately before the POD-copy install.
+
+The caller chain (per AUDIT-064/068) is:
+
+```
+sub_824F8398 → sub_824F7CD0 → sub_824F7800 → [bl at +0x38 = sub_824FD240] / [bctrl at +0x320 = sub_825070F0]
+```
+
+So `sub_824F7800` calls `sub_824FD240` at offset `+0x38`, BEFORE it
+calls `sub_825070F0` at offset `+0x320`.
+
+Question: what does `sub_824F8398`'s caller (one level up,
+`sub_821B55D8`) pass as arguments, and what kernel APIs run in
+between?  We need to trace tid=6's events in canary in the wallclock
+window [9.4 s, 9.6 s] — the install epoch.
+
+**LOC budget**: 0.  Pure event-stream analysis on captured canary
+jsonl (we already have `canary-jitter-1.jsonl`, 18.7M events).
+**Output**: an ordered list of kernel calls just before
+`sub_824FD240+0x24` fires.  If any are missing in ours, that's a
+candidate gap.
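+
+The event-stream analysis can be done with a streaming filter over the
+jsonl, so the 4.4 GB file never has to fit in memory.  A minimal
+sketch.  The field names (`tid`, `host_ns`, `type`, `name`) and the
+`import.call` type string are assumptions about the trace schema and
+will need adjusting to the real one:
+
+```python
+import json
+from typing import Iterable, Iterator, Tuple
+
+NS_PER_S = 1_000_000_000
+
+
+def kernel_calls_in_window(lines: Iterable[str], tid: int, start_s: float,
+                           end_s: float) -> Iterator[Tuple[int, str]]:
+    """Yield (host_ns, name) for import.call events on `tid` within [start_s, end_s].
+
+    ASSUMED schema: one JSON object per line with keys tid, host_ns, type, name.
+    """
+    lo, hi = round(start_s * NS_PER_S), round(end_s * NS_PER_S)
+    for line in lines:
+        if not line.strip():
+            continue
+        ev = json.loads(line)
+        if ev.get("type") != "import.call" or ev.get("tid") != tid:
+            continue
+        t = ev.get("host_ns")
+        if t is not None and lo <= t <= hi:
+            yield t, ev.get("name", "?")
+
+
+if __name__ == "__main__":
+    import sys
+    # Usage: python window.py < canary-jitter-1.jsonl
+    for host_ns, name in kernel_calls_in_window(sys.stdin, tid=6, start_s=9.4, end_s=9.6):
+        print(f"{host_ns / NS_PER_S:.6f} {name}")
+```
+
+Running the same filter over ours's trace yields the ordered list to
+diff against.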
+
+### Step 3 — Mirror the trigger in ours (variable LOC)
+
+Once Step 2 names the missing kernel call(s), implement them in ours
+following Phase C cadence (verify per-call return values match canary;
+add diff-tool tests; document in memory).
+
+**LOC budget**: depends on what's missing.  Could be 10–500 LOC.
+
+### Step 4 — Remove the crowbar; verify natural bootstrap (~0 LOC)
+
+With Step 3's fix in place, remove `--force-spawn-workers`.  Re-run
+ours.  If the natural bootstrap chain runs and `draws ≥ 1, swaps ≥ 2`,
+we've fixed the bug.
+
+If progression still fails without the crowbar, there's another gap;
+re-enter at Step 2 with a refined trigger search.
+
+### Step 5 — Validate gameplay frame parity (~0–50 LOC)
+
+Capture renderer-thread VdSwap counts at 90 s wallclock in both
+engines.  Target: ours's renderer emits within ±30% of canary's
+12,092 VdSwap/90s.  If yes: first-draw is reached and sustained.
+
+If ours's renderer emits but at a much lower rate, that's a follow-up
+performance issue, not a correctness one.  Defer.
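+
+Step 5's pass/fail check is a simple band test.  A sketch with the
+12,092 reference and the ±30% tolerance from above.  The function
+names and the four labels are hypothetical:
+
+```python
+from typing import Tuple
+
+CANARY_TID13_SWAPS_90S = 12_092  # canary renderer VdSwap calls per 90 s
+TOLERANCE = 0.30                 # Step 5 target: within +/-30%
+
+
+def parity_band(reference: int = CANARY_TID13_SWAPS_90S,
+                tol: float = TOLERANCE) -> Tuple[float, float]:
+    """Inclusive (low, high) swap-count band around the canary reference."""
+    return reference * (1 - tol), reference * (1 + tol)
+
+
+def classify(ours_swaps: int) -> str:
+    """'none', 'below', 'parity' or 'above' relative to the band."""
+    low, high = parity_band()
+    if ours_swaps == 0:
+        return "none"
+    if ours_swaps < low:
+        return "below"  # renders, but slower: a performance follow-up per Step 5
+    if ours_swaps > high:
+        return "above"
+    return "parity"
+```
+
+The band works out to roughly 8,464 to 15,720 renderer swaps per 90 s.
+For example, a run with 5,000 swaps would count as rendering, but
+below parity.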
+
+## Expected progression per step
+
+| Step | Expected `swaps` | Expected `draws` | Expected `unique_render_targets` | LOC delta |
+|---|---:|---:|---:|---:|
+| Pre-roadmap | 1 | 0 | 0 | — |
+| Step 1 (crowbar) | 2-N | 1-N | 1+ | ~150 |
+| Step 2 (trigger ID) | (unchanged) | (unchanged) | (unchanged) | 0 |
+| Step 3 (mirror) | 2-N | 1-N | 1+ | 10-500 |
+| Step 4 (decrowbar) | 2-N | 1-N | 1+ | -150 (remove) |
+| Step 5 (parity) | 100+ | 100+ | 1-5 | 0-50 |
+
+## What's NOT on this path (explicitly deferred)
+
+1. **Host-audio bridge / XAudio resume**: the XAudio thread tids 14/15
+   spawning suspended-and-never-resumed in ours is real but parallel
+   to the worker-cluster wedge.  In canary, both threads run; in ours,
+   neither runs.  Pursuing XAudio fixes does not address the
+   graphics-blocking wedge.  Defer to a separate
+   "post-first-draw" audit cluster.
+2. **HID / controller**: Sylpheed's intro movie / title screen play
+   without user input.  HID is irrelevant for first-draw.
+3. **XAM content / save games**: irrelevant for first-draw; the
+   intro/title screens don't require save-game enumeration.
+4. **Scheduler determinism** (per `scheduler_determinism_plan` /
+   Phase D Stages 0-4): null result, off-path.  The wedge is upstream
+   of any contention.  Defer indefinitely or close.
+5. **Diff-tool canonicalization** (Phase C-style fixes): saturated on
+   moving matched-prefix without moving progression.  **Halt** further
+   work in this class until Step 4 lands and re-baselines the diff
+   workload.
+6. **AUDIT-068 host-side install probes**: superseded by AUDIT-068
+   Session 4 (writer identified at GUEST PC `sub_824FD240+0x24`).
+   The remaining question is *what triggers* `sub_824FD240`, which
+   Step 2 addresses.
+
+## Alternative path (rejected)
+
+**Skip the crowbar; do the trigger investigation cold.**  Read canary
+source for `sub_824FD240` callers, walk upward, identify the trigger.
+Why rejected: `sub_824FD240` is GAME code, not canary engine code —
+the file we'd "read" is the disassembly of the XEX.  We'd need to
+disassemble Sylpheed's RE'd PE and trace the call graph by hand.  Per
+sylpheed.db, `sub_824FD240`'s static caller is `sub_824F7800+0x38`
+(in line with AUDIT-064).  But what guest *call* causes `sub_824F7800`
+to be invoked is itself a multi-fn upstream investigation that
+returns to the same wedge cycle.  The crowbar bypasses this paradox.
+
+## Risk assessment
+
+- **Step 1 catastrophic failure**: ours's emulator panics or
+  segfaults when the force-spawn workers run.  Mitigation: keep the
+  crowbar behind the debug-only `--force-spawn-workers` cvar; ensure
+  ours's CPU executes the worker
+  entries in normal sandboxed PPC JIT; if they fault on missing
+  guest state, log and exit cleanly.
+- **Step 1 "succeeds but draws=0 anyway"**: the workers run but
+  ours's tid=13 still doesn't unblock — there's an unmodelled state
+  beyond just the missing thread spawns.  Mitigation: log every event
+  the new workers emit; compare with canary's tid=27/28/29 streams in
+  `canary-jitter-1.jsonl`.
+- **Step 3 LOC explosion**: the trigger turns out to be a large
+  subsystem (XAM content, XCONFIG, etc.).  Mitigation: scope-cut to
+  a stub that returns "canary-equivalent" values without full
+  implementation.
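+
+The "log every event the new workers emit" mitigation can start from a
+coarse comparison: per-thread kernel-call histograms, before any
+attempt at ordered alignment.  A sketch using `collections.Counter`.
+Mapping ours's forced worker to a specific canary tid (e.g. tid=27) is
+an assumption the comparison itself does not validate:
+
+```python
+from collections import Counter
+from typing import Dict, Iterable
+
+
+def call_histogram_diff(ours: Iterable[str], canary: Iterable[str]) -> Dict[str, Dict[str, int]]:
+    """Calls canary makes more often than ours, and vice versa."""
+    o, c = Counter(ours), Counter(canary)
+    return {
+        "missing_in_ours": dict(c - o),  # Counter subtraction drops non-positive counts
+        "extra_in_ours": dict(o - c),
+    }
+```
+
+Calls that show up under `missing_in_ours` are the first candidates for
+the unmodelled state.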
+
+## Confidence levels
+
+- Step 1 unblocks the wedge if executed correctly: **MEDIUM** (60%).
+  Honest assessment: 25 prior audits have not unblocked it through
+  natural fixes, so the crowbar approach is novel and the failure
+  mode may not match expectations.
+- Step 2 identifies a trigger in ≤1 session: **HIGH** (85%) — the
+  canary jsonl already has the data; analysis is mechanical.
+- Step 3 LOC budget ≤500: **MEDIUM** (50%) — depends entirely on Step
+  2's answer.
+- Step 4 natural bootstrap works post-Step-3: **MEDIUM** (50%) —
+  there may be additional gaps the crowbar masked.
+
+## Memory hygiene
+
+After Step 1 lands (crowbar binary in place), check that
+`xenia-rs/target/release/xenia-rs` builds cleanly with the new cvar.
+Verify Phase B `image_canonical_sha256` is updated (the crowbar
+changes engine LOC); document the new baseline.  Confirm 3× cold
+runs produce identical digests with the crowbar enabled.
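+
+The "3× cold runs produce identical digests" check is easiest to
+automate by fingerprinting each digest in a key-order-independent way.
+A sketch, assuming each run's digest has been loaded as a JSON object.
+The helper names are hypothetical, and this fingerprint is separate
+from the existing `image_canonical_sha256`:
+
+```python
+import hashlib
+import json
+from typing import Dict, List
+
+
+def digest_fingerprint(digest: Dict) -> str:
+    """SHA-256 of the digest serialized with sorted keys and no whitespace."""
+    canonical = json.dumps(digest, sort_keys=True, separators=(",", ":"))
+    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
+
+
+def runs_reproducible(digests: List[Dict], required_runs: int = 3) -> bool:
+    """True if at least `required_runs` digests were collected and all are identical."""
+    if len(digests) < required_runs:
+        return False
+    return len({digest_fingerprint(d) for d in digests}) == 1
+```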
+
+## What "winning" looks like
+
+`xenia-rs check --stable-digest -n 50000000` (or higher cap, e.g.
+`-n 500000000` to reach 30 s wallclock) outputs a digest meeting these
+bounds (schematic, not literal JSON; `+` and `>=` mark lower bounds):
+
+```
+{
+  "instructions": 50000007,
+  "imports": 40390+,
+  "draws": >= 1,
+  "swaps": >= 2,
+  "unique_render_targets": >= 1,
+  "shader_blobs_live": >= 1,
+  "texture_cache_entries": >= 1
+}
+```
+
+…and the value is reproducible across 3 cold runs.  A non-zero
+`draws` value means at least one PM4_TYPE3 DRAW_INDX packet was
+emitted by the renderer thread.
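+
+The lower bounds above can be checked mechanically against a parsed
+digest.  A sketch that reads the `40390+` and `>=` entries as inclusive
+lower bounds; the function name is hypothetical:
+
+```python
+from typing import Dict, List
+
+# Inclusive lower bounds from the "winning" digest above.
+WIN_THRESHOLDS: Dict[str, int] = {
+    "imports": 40390,
+    "draws": 1,
+    "swaps": 2,
+    "unique_render_targets": 1,
+    "shader_blobs_live": 1,
+    "texture_cache_entries": 1,
+}
+
+
+def unmet(digest: Dict[str, int]) -> List[str]:
+    """Names of threshold fields that are missing or below their lower bound."""
+    return [k for k, lo in WIN_THRESHOLDS.items() if digest.get(k, 0) < lo]
+```
+
+Today's `swaps=1, draws=0` state fails every rendering bound.  A run
+counts as winning only when `unmet` returns an empty list on all three
+cold runs.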