# Review A: boot-state review and shortest-path roadmap

**Session type**: PLAN-only. No engine LOC changes; no canary instrumentation changes. Read-only investigation across the existing audit chain artifacts.

**Date**: 2026-05-21

**Companion documents** (in this directory):

- `canary-boot-trajectory.md`: canary's call chain from entry_point to first gameplay draw, with wallclock timestamps.
- `ours-wedge-localization.md`: precisely where ours stops, in graph terms.
- `shortest-path-roadmap.md`: 3-5 step roadmap with expected progression delta per step.
- `methodology-assessment.md`: alternative metric proposal.

This `plan.md` summarizes the five framing questions with answers backed by file:line citations.

---

## Q1: What is "first draw" in canary's Sylpheed boot?

**Two distinct "draws" must be disambiguated.**

### Q1.a: First boot-init `VdSwap` (the swap=1 event)

Canary's tid=6 (guest main) emits **one** `VdSwap` at ~9.5 s wallclock, immediately after the GPU subsystem init sequence `VdInitializeEngines → VdInitializeRingBuffer → VdEnableRingBufferRPtrWriteBack → VdSetGraphicsInterruptCallback → VdSetSystemCommandBufferGpuIdentifierAddress → VdGetSystemCommandBuffer`. This swap publishes the boot framebuffer and contains no draw packets.

**Ours also reaches this swap**: it is visible in `phase-w-wedge-reattack/ours-postfix.jsonl` at idx 105283 (host_ns 496,276,229). This is what produces ours's `swaps=1` metric. Both engines reach this point. **It is NOT the gate.**

### Q1.b: First gameplay `VdSwap` (the swap≥2 / draws≥1 event)

Canary's renderer tid=13 (entry `0x822F1EE0`, spawned suspended at 1.671 s) wakes after the `sub_825070F0` worker fan-out at host_ns ≈ 10.383 s and begins emitting `VdGetSystemCommandBuffer` / `VdSwap` pairs at ~150 fps. Canary's tid=13 emits **12,092 VdSwap calls in the 90-s window** (per `phase-nonmatch-investigation/canary-tid-profiles.md:21`).
The first of these is the **first gameplay draw**, fired at ~10.7 s wallclock, about 1.2 s after the `sub_825070F0` fan-out triggers the worker cluster.

**Pre-conditions canary establishes before this point** (per `canary-boot-trajectory.md`):

1. Vtable `0x8200A1E8` of `ANON_Class_713383D7` installed at host_ns ≈ 9.4-9.6 s via POD-copy at GUEST PC `sub_824FD240+0x24` (per `project_audit_068_session4_2026_05_20`).
2. Activation chain `sub_822F1AA8 → sub_82173990 → sub_821746B0 → sub_82172BA0 → sub_821B55D8 → sub_824F8398 → sub_824F7CD0 → sub_824F7800 → bctrl vtable[1] = sub_825070F0` fires on tid=6.
3. `sub_825070F0` spawns 4 worker threads with entries `0x82506528/58/88/B8` and shared ctx `0xBCE251C0`.
4. Workers (canary tids 27/28/29) emit signals that unwedge the `sub_821CB030` Event waits across the cache-file IO completion chain.
5. Renderer tid=13's body (entered earlier but blocked on a tid=14/15 XAudio-coordinated event) unblocks; the per-frame `VdGetSystemCommandBuffer` / `VdSwap` loop begins.

---

## Q2: What is ours's actual progress, and what's the wedge root cause?

**Ours stops at the first wait in the activation chain.** Specifically:

- **tid=1 (main)** is wedged at `sub_82173990+0x2D4` (PC `0x824ac578` = `do_wait_single`) on handle `0x12c8` = `Thread(id=13)`, waiting for the renderer's thread handle to signal (which happens only when tid=13 calls `ExTerminateThread`).
- **tid=13 (renderer / cache-IO worker)** is wedged at `sub_821CB030+0x1B0` on handle `0x12d0` = `Event/Auto`, created by itself via `NtCreateEvent` at `sub_821CB030+0x128`. The handle shows `signals=0, wakes=0`.
- **`sub_825070F0` fires 0×** at any horizon probed.

Citation: `phase-w-wedge-reattack/halt-on-deadlock-dump.txt` + `phase-w-wedge-reattack/current-state.md`.

### Root cause (at one structural level deeper than the wedge symptom)

**Per AUDIT-069 Session 5 (the most recent measurement):**

- Canary fires 414 `NtReleaseSemaphore` calls on the work-queue semaphore in the 90-s window.
- Ours fires 99 (24%).
- Breakdown: Worker (382 vs 90), Main (7 vs 8), **Other producers (25 vs 1)**.

The "**other producers (25 vs 1)**" gap is the load-bearing discrepancy. Canary's other producer threads account for **24 additional releases** of the work semaphore during bootstrap that ours does not have. These correspond to:

1. The 4 `sub_825070F0` workers (canary tids 27/28/29 + 1), absent in ours.
2. XAudio render threads (canary tids 14/15, spawned suspended in both engines, **resumed only in canary**).
3. The secondary spawn burst at 1.94-2.15 s (canary tids 18-25): 8 helpers including file-IO and NtWaitForMultipleObjectsEx workers, absent in ours.

### The ONE structural issue

> **Ours never reaches `sub_825070F0` because the activation chain
> that calls it is downstream of tid=13's wedge; and tid=13's wedge
> is downstream of the worker cluster activation; and the worker
> cluster activation is `sub_825070F0`. This is a self-referential
> lock.**

Canary breaks the lock because some part of the bootstrap *pre-activates* the producers (probably via XAudio thread resume at 1.726 s, which then runs ahead, populates the work queue, signals events, etc.). Ours never resumes the XAudio threads: they are spawned suspended and stay that way.

**The single highest-leverage gap is the XAudio thread resume,** because (a) it happens early (1.726 s in canary vs. ours's wedge, which fixes around 1.4 s; i.e. the resume should happen before the wedge), (b) it activates the dominant event producers, and (c) AUDIT-069 S5's "other producers 25 vs 1" finding implicates exactly this class of thread.

---

## Q3: Shortest-path-to-first-draw roadmap

Five steps (full detail in `shortest-path-roadmap.md`):

- **Step 1 (~80-150 LOC, ours-side)**: add `--force-spawn-workers` cvar that crowbars `sub_825070F0` activation by directly spawning the 4 worker threads with the right ctx after `VdInitializeRingBuffer` returns.
  Tests "are the workers functionally correct if activated" and "does activating them unwedge `sub_821CB030`."
- **Step 2 (~0 LOC)**: with Step 1 active, mine the canary jsonl for the kernel-call sequence on tid=6 in the wallclock window [9.4 s, 9.6 s] (the install epoch). Identify what guest call triggers `sub_824FD240+0x24`'s POD-copy of the vtable in canary.
- **Step 3 (~10-500 LOC, depending on what Step 2 finds)**: mirror that trigger in ours, likely a missing kernel-import return value or a missing post-condition that the trigger inspects.
- **Step 4 (~0 LOC; remove crowbar)**: re-test ours without `--force-spawn-workers`. Verify natural bootstrap reaches `sub_825070F0` activation.
- **Step 5 (~0-50 LOC)**: measure renderer-thread VdSwap rate over 90 s wallclock; target ±30% of canary's 12,092 calls.

Expected delta:

| After step | `swaps` | `draws` | `unique_render_targets` |
|---|---:|---:|---:|
| Pre | 1 | 0 | 0 |
| Step 1 (crowbar) | 2+ | 1+ | 1+ |
| Step 4 (decrowbar) | 2+ | 1+ | 1+ |
| Step 5 (parity) | 100+ | 100+ | 1-5 |

---

## Q4: What's NOT on the shortest path

Explicitly deferred (full rationale in `shortest-path-roadmap.md`):

- **Audio (`host-audio-*` / XAudio implementation)**: even though XAudio thread resume MAY be the trigger from Q2, ours's existing XAudio shim is sufficient for the workers to bootstrap if they receive the right kernel-call sequence. Full XAudio implementation is beyond first-draw scope.
- **HID**: Sylpheed's intro/title screens are auto-advance; no input needed.
- **XAM content / save games**: not on first-draw path.
- **Scheduler determinism work** (Phase D Stages 0-4 and beyond): null result; the wedge is upstream of contention scheduling. Close or indefinitely defer.
- **Diff-tool canonicalization** (Phase C+N for N > 25): saturated on matched-prefix without progression; halt this work class until Step 4 lands and the workload re-baselines.
- **AUDIT-068 host-side install probes**: superseded by AUDIT-068 Session 4 finding (writer is GUEST PC, not host). The followup question is what *triggers* the guest code path, which Step 2 addresses through cheaper means.

---

## Q5: Methodology assessment

**Current methodology relied on matched-prefix as a progression proxy. This assumption is now empirically falsified**: +2,960 events of matched-prefix advancement produced 0 units of progression (`swaps=1, draws=0` across 25+ iterates).

### Proposed alternative metric

**Option 6 (composite `progression_score`)**:

```
progression_score = 1 * swaps
                  + 10 * draws
                  + 100 * unique_render_targets
                  + 0.001 * matched_prefix
```

Continuous gradient; honest about wedge-solving vs. canonicalization priority. Requires ~10 LOC to add to `digest.json`.

Discipline: tag every iterate as either "**canonicalization only, no progression**" or "**progression**". Cap at 5 consecutive canonicalization-only iterates before mandatory pivot to wedge-attack work.

### New reading-error #39

> **#39 (matched-prefix as progression proxy)**: matched-prefix
> measures engine-to-engine divergence point, NOT game-to-game
> functional gap. When the wedge is on a different thread than the
> matched-prefix anchor thread, advancing matched-prefix is
> orthogonal to unwedging. Future audits MUST distinguish "ours's
> tid-X diverges from canary's tid-Y" from "ours's tid-X is *blocked
> because tid-Z is wedged*", and target the wedge directly when
> present.

---

## Counterintuitive findings (anti-anchoring)

Per Tripstones in the task brief:

### 1. Both engines reach `swaps=1`; ours is NOT behind on the boot swap.

The shared boot-init `VdSwap` fires in both. Ours's `swaps=1` metric is "achieved, just at the same point canary also did it". The divergence is NOT "ours can't do the first swap"; it's "ours can't do the SECOND through Nth swap (the gameplay loop)".

### 2. Tripstone 4 verified: canary does reach gameplay draws, ours does not.
`canary-jitter-1.jsonl` shows 12,092 VdSwap calls on canary tid=13 in 90 s wallclock: definitively in the gameplay rendering loop, not pre-first-draw. Ours's tid analogous to canary tid=13 emits ~80 events total before wedging, definitively before gameplay starts. The "both engines pre-first-draw" hypothesis is FALSE.

### 3. The matched-prefix metric is on the WRONG thread.

Matched-prefix tracks tid=6 (canary) vs tid=1 (ours), the main threads. But the wedge is on **tid=13 in both engines**, the renderer thread. Tid=1's matched-prefix can advance 105,128 events without ever touching the wedge.

### 4. The "boot-state-machine" framing is misleading.

There's no monolithic boot state machine. There are ~28 threads in canary, each running their own lifecycle, communicating via shared kernel objects. The bottleneck isn't a state transition; it's a THREAD ACTIVATION GAP.

### 5. AUDIT-069 Session 5's "other producers 25 vs 1" is the key forensic discovery, more than AUDIT-068's vtable install epoch.

The vtable install IS interesting but it's downstream of the producer gap. Producers must be running to populate the work queue, which gets the worker to do its thing, which signals the wedge, which lets the activation chain continue, which calls `sub_824FD240+0x24`, which writes the vtable. Fixing the vtable install in isolation (e.g., via a host-side mem-write hack) doesn't help if no producer is feeding work to the workers.

---

## Cascade prediction confidence

- A: canary boot trajectory characterized. **DONE, HIGH** (`canary-jitter-1.jsonl` provides direct evidence).
- B: ours's wedge root-cause localized deeper than "sub_821CB030 waits". **DONE, MEDIUM-HIGH** (AUDIT-069 S5 "other producers 25 vs 1" finding).
- C: shortest-path roadmap with ≤5 steps. **DONE, MEDIUM** (5 steps; Step 1 confidence ~60%).
- D: alternative metric proposed. **DONE, HIGH** (Option 6 composite, plus reading-error #39).

---

## Open questions / known unknowns

1. **What is the bootstrap trigger for canary's `sub_824FD240+0x24`?** Roadmap Step 2 addresses. Could be answered in <1 session of canary jsonl analysis.
2. **Does Step 1's crowbar produce a clean wedge-unblock, or does it reveal additional unmodelled state in the ctx object?** Empirical; testable in one session.
3. **Are canary's XAudio threads (tids 14/15) the actual missing producer, or are they downstream of the same trigger?** Worth a targeted probe before Step 1; ~50 LOC ours-side to log NtResumeThread on the XAudio entry PCs.
4. **Will the AUDIT-067 "vtable install is host-side" finding resurface?** No. AUDIT-068 S4 falsified this; the writer is GUEST PC `sub_824FD240+0x24`. The "host-side" framing was a mis-read of the POD-copy semantics (reading-error #36).

---

## Recommended next action

**Dispatch a "progression iterate" implementing Step 1 of the roadmap** (`--force-spawn-workers` crowbar, ~80-150 LOC ours-side). This is a high-variance, high-reward iterate; expected outcome is either `swaps ≥ 2, draws ≥ 1` (success: wedge structurally isolated to thread activation) or an informative failure mode (e.g., worker faults at first vtable bctrl indicating additional state needed in ctx object). Time-box: 1 session, max 2h.

If Step 1 succeeds in ANY way (even if draws stays 0), the next iterate is Step 2 (kernel-call sequence mining in `canary-jitter-1.jsonl`). This step has minimal risk and uses existing tooling.

If Step 1 fails completely (panic / segfault unrecoverable), revert the crowbar and reframe: the wedge may be in ours's kernel-handler implementations themselves, not just bootstrap activation. At that point a deeper Path β engine investigation is unavoidable.

---

## Memory hygiene note

This review is read-only. xenia-rs HEAD unchanged. canary HEAD unchanged. sylpheed.db unchanged. No new artifacts beyond this directory.
After dispatching Step 1, future memory entries should adopt the new `progression_score` + tagging discipline outlined in `methodology-assessment.md`.
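
As a rough illustration of the `progression_score` and tagging discipline referred to above, the sketch below computes the Option 6 composite and checks the 5-iterate canonicalization cap. The weights and the cap come from Q5; everything else is an assumption made for illustration: the `Iterate` record and its field layout, the helper names, and the rule that an iterate counts as "progression" only when `swaps`, `draws`, or `unique_render_targets` increases. It is not the actual `digest.json` code.

```python
from dataclasses import dataclass
from typing import List

# Weights taken from the Option 6 formula in Q5.
W_SWAPS = 1
W_DRAWS = 10
W_RENDER_TARGETS = 100
W_MATCHED_PREFIX = 0.001

# Cap from the Q5 discipline: pivot after 5 consecutive canonicalization-only iterates.
MAX_CANONICALIZATION_STREAK = 5

PROGRESSION = "progression"
CANONICALIZATION_ONLY = "canonicalization only"


@dataclass
class Iterate:
    """Hypothetical per-iterate metrics record (field names assumed)."""
    swaps: int
    draws: int
    unique_render_targets: int
    matched_prefix: int


def progression_score(it: Iterate) -> float:
    """Composite score: weighted sum of the four metrics."""
    return (W_SWAPS * it.swaps
            + W_DRAWS * it.draws
            + W_RENDER_TARGETS * it.unique_render_targets
            + W_MATCHED_PREFIX * it.matched_prefix)


def tag_iterate(prev: Iterate, cur: Iterate) -> str:
    """Assumed rule: progression means a rendering metric went up;
    a matched-prefix gain alone is canonicalization only."""
    advanced = (cur.swaps > prev.swaps
                or cur.draws > prev.draws
                or cur.unique_render_targets > prev.unique_render_targets)
    return PROGRESSION if advanced else CANONICALIZATION_ONLY


def must_pivot(tags: List[str]) -> bool:
    """True once the trailing run of canonicalization-only tags reaches the cap."""
    streak = 0
    for tag in reversed(tags):
        if tag != CANONICALIZATION_ONLY:
            break
        streak += 1
    return streak >= MAX_CANONICALIZATION_STREAK
```

Under these assumptions, an iterate that only advances `matched_prefix` moves the score by a fraction of a point and is tagged canonicalization only, while a single new draw adds 10 and resets the streak, which is the gradient the Q5 proposal asks for.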