handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,121 @@
# Canary boot-to-first-draw trajectory
**Source data:** `xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-1.jsonl`
(4.4 GB, 18.7M events, 90s wallclock, cold run). Profile builder at
`xenia-rs/audit-runs/phase-nonmatch-investigation/build_profiles.py`.
## TL;DR
- **First boot-time `VdSwap` fires on canary's tid=6 (guest main) at
~9.5 s wallclock**, immediately after the rendering subsystem is
initialized. This is the *empty / system-command-buffer* swap that
ours also reaches (ours's metric `swaps=1` is this swap).
- **First gameplay `VdSwap` (intro-movie frame) fires on canary's
tid=13 (renderer) starting at ~10.7 s wallclock**, after the
`sub_825070F0` worker fan-out at host_ns ≈ 10.382-10.384 s. Canary
tid=13 emits **12,092** `VdSwap` calls (each paired with a
`VdGetSystemCommandBuffer`) in the 90-s window, i.e. ~150 fps sustained.
- The gating event between "boot swap" and "first gameplay swap" is
the 4-worker fan-out spawned by `sub_825070F0` at PCs `0x82506528 /
0x82506558 / 0x82506588 / 0x825065B8` with ctx `0xBCE251C0`. Three
of the four workers begin emitting events at host_ns ≈ 10.705 s
(tids 27/28/29 — see `canary-tid-profiles.md` row 33-35).
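
A minimal sketch of how the per-tid `VdSwap` counts quoted above could be tallied from a trace. The JSONL field names `tid`, `name`, and `host_ns` are assumptions for illustration, not the confirmed schema of `canary-jitter-1.jsonl`, and the sample lines are made up.

```python
import json
from collections import Counter


def count_calls_by_tid(lines, call_name):
    """Count events named `call_name` per thread id in an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        # "name" and "tid" are assumed field names (hypothetical schema).
        if event.get("name") == call_name:
            counts[event.get("tid")] += 1
    return counts


# Made-up sample: one boot-init swap on tid=6, two renderer swaps on tid=13.
sample = [
    '{"tid": 6, "name": "VdSwap", "host_ns": 9500000000}',
    '{"tid": 13, "name": "VdGetSystemCommandBuffer", "host_ns": 10700000000}',
    '{"tid": 13, "name": "VdSwap", "host_ns": 10700100000}',
    '{"tid": 13, "name": "VdSwap", "host_ns": 10706800000}',
]
swaps = count_calls_by_tid(sample, "VdSwap")
print(dict(swaps))
```

Passing an open file handle instead of the sample list streams the trace line by line, which matters for a multi-gigabyte capture.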
## Phase-by-phase trajectory
| t (host_ns) | Phase | What | Citation |
|------:|-------|------|----------|
| 0-660 ms | XEX load / startup | `XexLoadImage`, ELF→guest init, kernel-state ctor. Spawn tid=6 ("guest main") at host_ns=660 ms. | `phase-nonmatch-investigation/canary-tid-profiles.md:14` |
| 660 ms-1.42 s | **Pre-spawn init** | tid=6 sets up TLS, runs CRT init. Establishes vtables / globals. *Sylpheed-specific*: writes `0x8200A1E8` (vtable for `ANON_Class_713383D7`) at the install-epoch host_ns ≈ 9.4-9.6 s via a 12-byte POD struct copy `{vptr, self, self}` (see `project_audit_068_session3`). **Critical**: this is the vtable whose slot 1 = `sub_825070F0`. | `project_audit_068_session3_2026_05_20.md` |
| 1.42-1.94 s | **Main init burst** | 10 thread spawns (tids 8-17) by tid=6. Ours matches this 1:1. Entries include `0x82181830`, `0x8245A5D0`, `0x82450A28`, `0x82457EF0`, `0x824CD458`, **`0x822F1EE0` (renderer, susp=T)**, `0x824D2878/0x824D2940` (XAudio, susp=T), `0x82178950` (XMA), `0x821748F0` (file IO spawner, susp=T). | `canary-tid-profiles.md:42-55` |
| 1.671 s | **Renderer spawn** | tid=6 calls `ExCreateThread` with entry `0x822F1EE0`, ctx `0xBCE24A40`, suspended=True. Becomes canary tid=13. | `canary-tid-profiles.md:21,49` |
| 1.726-1.728 s | **XAudio spawn** | tids 14/15 (XAudio voice-mask poll + sister) spawned suspended. Will dominate event volume (~11M events combined). | `canary-tid-profiles.md:50-51` |
| 1.94-2.15 s | **Secondary init burst** | 8 more spawns (tids 18-25), file-IO + XAM helpers. **Ours emits 0** here; it is already wedged. | `result.md:48` |
| 9.4-9.6 s | **vtable install epoch** | Host-side POD struct copy installs `0x8200A1E8` at run-specific arena address (`0xBCE25340` or `0xBCE251C0` per arena drift). This is the ANON_Class_713383D7 instance whose slot 1 = `sub_825070F0`. | `project_audit_068_session3_2026_05_20.md` |
| ~9.5 s | **Boot-init `VdSwap` (on tid=6)** | After `VdInitializeEngines + VdShutdownEngines + VdInitializeEngines + VdSetGraphicsInterruptCallback + VdSetSystemCommandBufferGpuIdentifierAddress + VdInitializeRingBuffer + VdEnableRingBufferRPtrWriteBack + VdGetSystemCommandBuffer`, tid=6 emits **one** `VdSwap` to publish the boot framebuffer. draws=0 still (no PM4 draw packets). | Mirror of `ours-postfix.jsonl` idx 105044-105285; canary same shape. |
| 10.080 s | tid=26 second-call helper | `0x821748F0` second invocation. | `canary-tid-profiles.md:32` |
| **10.383 s** | **sub_825070F0 worker fan-out** | **Four `ExCreateThread` calls in 1 ms** spawn entries `0x82506528 / 0x82506558 / 0x82506588 / 0x825065B8` all sharing ctx `0xBCE251C0` (the ANON_Class instance). These are the workers that consume cache-file IO and signal the wedge event(s) that AUDIT-049 found dangling in ours. | `canary-tid-profiles.md:63-66`, `sub_825070F0.md` |
| 10.7 s | **Worker resume / first events** | tids 27, 28, 29 emit their first events. tid=28 dominates (3.26M events) doing file IO (`530× NtReadFile` of `cache:\…`), heavy CS contention (1.07M RtlEnterCS), and signaling the wedge events. | `canary-tid-profiles.md:33-35`, `sub_82452DC0.md` |
| ~10.7+ s | **Renderer wakes** | Once `sub_825070F0` workers begin, the events that canary's tid=13 was waiting on get signaled. tid=13 transitions Blocked→Running, starts producing `VdGetSystemCommandBuffer`/`VdSwap` pairs at ~150 fps. | `canary-tid-profiles.md:21`, `result.md:30-39` |
| ~10.7-90 s | **Sustained rendering** | tid=13 emits 12,092 `VdSwap` calls. Intro movie ⇒ title screen ⇒ gameplay (depends on user input). In an unattended cold run, canary likely plateaus on the title screen but is genuinely rendering. | `canary-tid-profiles.md:21` |
## Canary call-chain from entry_point to first gameplay draw
```
canary tid=6 (guest main)
entry_point
→ sub_8216EA68 (post-init dispatcher)
→ sub_822F1AA8 (game-loop dispatcher) (sub_822F1AA8.md)
→ bctrl vtable[0]({sub_82175330 → tail → sub_82173990})
→ sub_82173990 (sync task-spawn-and-join) (sub_82173990.md)
→ bl sub_821746B0 (alloc task + spawn worker thid=17, F8000094)
[worker thid=17 runs body sub_821748F0
→ sub_821C4EB0 → sub_821CC3F8 → sub_821CBA08
→ sub_821CB030 (creates Event, submits work via sub_82452DC0)
→ … cache file loads (cache:\aab216c3\..., cache:\87719002\..., etc.)
→ spawns child workers via ExCreateThread(...,821C4AD0,...)
→ eventually ExTerminateThread(0)]
→ KeWaitForSingleObject(thid=17.handle) INFINITE
[blocks ~445 log lines wallclock; completes when thid=17 terminates]
← returns
← returns to sub_822F1AA8 outer loop
→ iterates sub_821741C8 → sub_82172BA0 → bctrl vtable[6]
→ sub_821B55D8 → sub_824F8398 → sub_824F7CD0 → sub_824F7800
→ bctrl vtable[1] = sub_825070F0 (sub_825070F0.md)
→ 4× ExCreateThread(...,0x82506528/58/88/B8, ctx=0xBCE25xxx, susp=T)
→ 4× NtResumeThread / scheduler enables the workers
[workers tids 27/28/29/+1 begin executing]
→ outer loop continues
→ KeWaitForSingleObject (4040×/60 s = ~67 fps frame-pacing wait)
→ bctrl vtable[2] → various per-frame work
→ tid=6's main loop produces no VdSwap directly past the init swap
canary tid=13 (renderer; spawned by tid=6 at 0x822F1EE0)
[stays suspended OR Blocked-on-event until worker fan-out at 10.38 s]
→ after wake, enters render loop:
while (running) {
VdGetSystemCommandBuffer(...) ; 12,092× / 90 s
… build per-frame command buffer …
VdSwap(buffer_ptr, fetch_ptr, …) ; 12,092× / 90 s
}
```
## Pre-conditions canary establishes before first gameplay draw
In time order, all must hold:
1. **GPU subsystem initialized**: `VdInitializeEngines → VdInitializeRingBuffer → VdEnableRingBufferRPtrWriteBack → VdSetGraphicsInterruptCallback`. Ours: ✓ (idx 105044-105117).
2. **Renderer thread alive**: tid=13 created suspended via `ExCreateThread(entry=0x822F1EE0, susp=T)`. Ours: ✓ (idx 105348).
3. **Worker-cluster activation**: 4 workers spawned by `sub_825070F0` consuming `sub_82452DC0` work. Ours: **✗ 0 fires**.
4. **`sub_821CB030`'s Event signaled**: the per-load completion event created at `sub_821CB030+0x128` and waited at `+0x1AC` must be signaled by a `sub_825070F0` worker. Ours: **`NO_SIGNALS_DESPITE_WAITS` on handle 0x12d0**.
5. **`sub_82173990`'s join-wait completes**: tid=6's wait at `sub_82173990+0x2D0` on the thid=17 thread handle. Ours: **✗ tid=1 stuck on handle 0x12c8 (= tid=13's thread handle)**.
6. **Renderer wakes**: per AUDIT-049, the worker-cluster must signal whatever guards tid=13's body. Canary: ✓. Ours: **✗ tid=13 itself wedges in sub_821CB030**.
## Numerical signature of canary over the 90-s wallclock window (for reference)
- 18.7 M events / 28 tids.
- Renderer tid=13: 594 k events, including 12,092 VdSwap.
- Worker tid=28 (sub_825070F0 worker 0): 3.26 M events.
- XAudio tid=14/15: 6.15 M / 4.78 M events.
- ours at 50 M-instr / ~3 s wallclock: 121 k events / 13 tids. Renderer
tid=13 in ours: ~80 events (wedged).
- The order of magnitude differs by ~150× because ours wedges ~7 s before
canary's `sub_825070F0` fan-out fires.
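
The two headline ratios in this section can be re-derived with a few lines of arithmetic. This is only a sanity-check sketch: the inputs are the rounded figures quoted in these notes, and the 10.7 s start of sustained rendering is taken from the trajectory table.

```python
def sustained_rate(count, start_s, end_s):
    """Average events per second over the window [start_s, end_s]."""
    return count / (end_s - start_s)


canary_events = 18_700_000   # 18.7 M events, 90-s canary run
ours_events = 121_000        # ~121 k events, 50M-instr ours run

# Event-volume gap between the two runs (the "~150x" figure).
event_ratio = canary_events / ours_events

# Renderer swap rate: 12,092 VdSwap calls between ~10.7 s and 90 s.
swap_rate = sustained_rate(12_092, 10.7, 90.0)

print(f"event ratio ~{event_ratio:.0f}x, swap rate ~{swap_rate:.0f}/s")
```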
## Uncertainty / open questions
- **What is the precise host-side install of the `ANON_Class_713383D7`
vtable `0x8200A1E8`?** AUDIT-068 sessions 1-4 localized this to a
POD struct copy in the install epoch [9.4 s, 9.6 s], with the writer
identified at GUEST PPC `sub_824FD240+0x24` (NOT a host-side kernel
import as initially feared). But in ours, `sub_824FD240` and its
callers `sub_824F7800/CD0/8398` fire 0× because that chain is
downstream of the tid=13 wedge. See `project_audit_068_session4`.
- **First "gameplay draw" precisely**: the first VdSwap that emits PM4
draw packets (e.g. `PM4_TYPE3 DRAW_INDX`) into the ringbuffer. Need
to inspect canary's PM4 ring at host_ns ≈ 10.7 s to confirm. AUDIT
history hasn't disambiguated boot/empty-swap from gameplay-swap at
the PM4-packet level. This is a methodology gap.
- **What unwedges canary's worker-cluster activation chain?** AUDIT-068
pinned the install epoch but not the **trigger** — what guest call
causes `sub_824FD240+0x24`'s POD-copy to fire? Identifying the
trigger and replaying it in ours is the unanswered Path β attack.


@@ -0,0 +1,193 @@
# Methodology assessment
## The matched-prefix metric: load-bearing or load-shedding?
Across 25+ iterates (audits 049 through 069; Phase C+1 through C+25;
Phase D Stages 0-4 plus D-extension; Phase W; Phase host-audio-*),
matched-prefix on the main thread (canary tid=6 ⇄ ours tid=1)
advanced:
| Phase | Matched-prefix | Δ |
|---|---:|---:|
| Phase B baseline (pre-C+1) | ~102,168 | — |
| Phase D D-extension landing | 104,607 → 105,046 | +439 |
| Phase W (VdInitializeEngines fix) | 105,046 → 105,112 | +66 |
| Phase C+25 (MmGetPhysicalAddress canon) | 105,112 → 105,128 | +16 |
Over the same phases, the progression metrics did not move:
| Phase | `swaps` | `draws` | `unique_render_targets` |
|---|---:|---:|---:|
| Phase B baseline | 1 | 0 | 0 |
| Phase W | 1 | 0 | 0 |
| Phase C+25 | 1 | 0 | 0 |
**The two metrics are decoupled.** Matched-prefix is moving along
ENGINE-internal divergences (kernel-call return values, thread IDs,
heap arena base addresses). The progression metric is gated by
boot-state activation, which lives one or more layers above the diff
points.
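
As a cross-check of the matched-prefix table, the sketch below recomputes the per-phase deltas and the cumulative advance from the Phase B baseline. The values are copied from the table; the baseline is itself approximate (`~102,168`), so the total is approximate too.

```python
# (phase, matched-prefix before, matched-prefix after), copied from the table.
phases = [
    ("Phase D D-extension", 104_607, 105_046),
    ("Phase W", 105_046, 105_112),
    ("Phase C+25", 105_112, 105_128),
]
baseline = 102_168  # Phase B baseline (approximate in the source)

deltas = {name: after - before for name, before, after in phases}
itemized = sum(deltas.values())        # advance covered by the three table rows
total = phases[-1][2] - baseline       # advance since the Phase B baseline
unitemized = total - itemized          # advance between baseline and 104,607

print(deltas, itemized, total, unitemized)
```

The total comes out at 2,960 events, of which only 521 are itemized in the table; the rest accrued between the baseline and the 104,607 cap.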
## Why the decoupling happened
Three reading-errors compound:
1. **#23 (cooperative-vs-preemptive scheduling jitter)**: canary's
default-scheduling produces different *intra-thread* event ordering
than ours's coroutine scheduler. Diff-tool absorbers (C+18, C+21,
D-extension) correctly hide this jitter — but they hide *real
bootstrap-time divergences too*. Phase W explicitly noted: "If
ours's worker fails to enqueue something canary's worker awaits,
we'd never see the gap because the matched-prefix isn't on the
worker tid in the first place."
2. **#30 (per-tid PC SID drift)**: shared-global SIDs work for
process-global dispatchers (e.g., the work-queue semaphore at
handle `0xF800003C` in canary). But the wedge handle `0x12d0`
uses a per-tid create-site SID that does NOT match across engines.
So even when the same logical event exists in both engines, the
diff harness reports SID mismatch and absorbs OR diverges
incorrectly.
3. **#38 (cross-spawn producer paths)**: static reachability (the
sylpheed.db `xrefs` table) misses producer paths that cross
thread-spawn boundaries. The result.md from Phase Non-match shows
canary's tid=14 (XAudio voice-mask poll) communicates with
downstream code via a path that has no static `bl` edge — it
crosses via guest kernel APIs.
## Alternative metric proposals
### Option 1 — `draws ≥ 1` (sharp gate)
**Pros**: directly measures the target. Boolean. Reproducible.
**Cons**: gives no signal during iteration — every iterate before the
breakthrough is `draws = 0`. Loss function is non-smooth.
### Option 2 — `swaps ≥ 2` (relaxed first-frame gate)
**Pros**: still sharp; one bit looser than draws. Distinguishes
boot-init-only swap (`swaps=1`) from at-least-one-rendered-frame
(`swaps≥2`).
**Cons**: same non-smooth loss. Achievable in principle by a crowbar
without solving the underlying bug.
### Option 3 — Renderer-thread liveness: `events_emitted_by_renderer_thread ≥ N`
Compute: events emitted on the thread spawned at entry `0x822F1EE0`
in any 90-s wallclock window. Canary: 594,000. Ours: ~0.
**Pros**: smooth-ish (event count can move slowly). Directly measures
"is the renderer running." Bypasses the diff-tool jitter problem
because it's a per-engine internal count.
**Cons**: requires a non-trivial 90-s wallclock run (not 50M instr
ceiling). Could be gamed by a crowbar that resumes the renderer
without unblocking the wedge.
### Option 4 — Worker-thread census: `count(threads_with_events ≥ 10k) ≥ 6`
Compute: how many tids in ours emit ≥10k events over 90 s wallclock.
Canary at 90 s: 12 early tids meet this (tids
1/2/4/6/9/10/11/12/13/14/15/16), plus the four post-10 s workers
21/27/28/29 (16 in total). Ours at 50M instr: 5 tids.
**Pros**: directly measures the AUDIT-057 thread-gap. Smooth metric:
each unwedged thread adds 1 to the count.
**Cons**: requires 90-s wallclock runs — ours can't reach this
without solving the wedge first, so it's pre-requisite-equivalent to
Option 3.
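
Options 3 and 4 both reduce to counting events per thread. A minimal sketch, assuming the per-tid event counts have already been extracted into a plain mapping; the canary subset uses the rounded counts quoted in these notes, and the function name is hypothetical.

```python
def thread_census(events_per_tid, threshold=10_000):
    """Option 4: number of tids that emitted at least `threshold` events."""
    return sum(1 for count in events_per_tid.values() if count >= threshold)


def renderer_liveness(events_per_tid, renderer_tid):
    """Option 3: raw event count on the renderer thread (0 if absent)."""
    return events_per_tid.get(renderer_tid, 0)


# Subset of canary tids with counts quoted in the notes (rounded).
canary_subset = {
    13: 594_000,     # renderer
    14: 6_150_000,   # XAudio
    15: 4_780_000,   # XAudio
    27: 36_000,      # sub_825070F0 worker
    28: 3_260_000,   # sub_825070F0 worker (file IO)
    29: 91_000,      # sub_825070F0 worker
}
ours_renderer_only = {13: 80}  # ours's wedged renderer, ~80 events

print(thread_census(canary_subset), thread_census(ours_renderer_only))
```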
### Option 5 — `worker_semaphore_release_count` (AUDIT-069 S5)
Compute: how many `NtReleaseSemaphore` calls on the work semaphore
(handle `0xF800003C` in canary, equivalent in ours) over 90 s
wallclock. Canary: 414. Ours: 99 (24%).
**Pros**: pinpoints the under-production directly. Mechanically
measurable. Already instrumented in canary (audit_70_semaphore_release_watch).
**Cons**: same wallclock requirement; same gameability.
### Option 6 — composite: `progression_score`
Define:
```
progression_score = 1 * swaps + 10 * draws + 100 * unique_render_targets
+ 0.001 * matched_prefix
```
This recovers signal during iteration (matched-prefix moves)
without pretending it's progression. The 1000:1 ratio between the
`swaps` weight and the `matched_prefix` weight matches the bug-class
severity.
**Pros**: continuous gradient over both wedge-solving and
canonicalization work. Honest about which is more important.
**Cons**: arbitrary weights. Composite metrics drift in meaning.
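
A direct transcription of the Option 6 formula, as a sketch of what the `digest.json` field could compute. The function name and argument names are assumptions; the weights are the ones given above.

```python
def progression_score(swaps, draws, unique_render_targets, matched_prefix):
    """Composite score from Option 6: heavy weights on rendering, light on matched-prefix."""
    return (1 * swaps
            + 10 * draws
            + 100 * unique_render_targets
            + 0.001 * matched_prefix)


# Phase B baseline vs Phase C+25: only matched-prefix moved.
baseline = progression_score(1, 0, 0, 102_168)
c25 = progression_score(1, 0, 0, 105_128)
canonicalization_delta = c25 - baseline

# Hypothetical first rendered frame: one extra swap, one draw, one render target.
first_frame_delta = progression_score(2, 1, 1, 105_128) - c25

print(round(canonicalization_delta, 3), round(first_frame_delta, 3))
```

Under these weights the whole Phase B to C+25 chain is worth roughly 3 points, while a single rendered frame is worth 111, which is the asymmetry the composite is meant to expose.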
## Recommendation
**Adopt Option 6 (composite progression_score) as the primary
methodology metric**, with a hard secondary gate of "Option 2
(`swaps ≥ 2`) is what matters; everything else is fitness."
Concrete proposal:
1. The `digest.json` output gains a `progression_score` field
computed from the existing fields (zero new instrumentation).
2. Every iterate must report Δprogression_score in its
re-validation.md.
3. Iterates that only move `matched_prefix` (i.e., Δprogression_score
= (small) × Δmatched_prefix) MUST be tagged in their memory entry
as "**canonicalization only — no progression**" and counted
against a *budget*: max 5 consecutive iterates in this class
before mandatory pivot to wedge-attack work.
4. Audits that move `swaps` or `draws` (the high-weight terms) are
tagged "**progression**" and given priority for resource
allocation.
This methodology change costs ~10 LOC in the digest output and
imposes a discipline cap of 5 canonicalization-only audits between
progression attempts.
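
The tagging rule and the 5-iterate budget could be enforced mechanically. The sketch below is one possible reading of the proposal, with hypothetical function names and shortened tag strings; it treats any increase in `swaps`, `draws`, or `unique_render_targets` as progression.

```python
CANON = "canonicalization-only"
PROGRESSION = "progression"


def classify_iterate(d_swaps, d_draws, d_render_targets=0):
    """Tag an iterate by whether any high-weight term moved."""
    if d_swaps > 0 or d_draws > 0 or d_render_targets > 0:
        return PROGRESSION
    return CANON


def must_pivot(tags, budget=5):
    """True once the trailing run of canonicalization-only iterates reaches the budget."""
    streak = 0
    for tag in reversed(tags):
        if tag != CANON:
            break
        streak += 1
    return streak >= budget


history = [classify_iterate(0, 0) for _ in range(5)]
print(must_pivot(history))  # budget exhausted: next iterate must attack the wedge
```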
## Falsification of the matched-prefix-as-proxy belief
Phase C through C+25 explicitly assumed that matched-prefix is a
**proxy** for progression. This assumption is now empirically
falsified:
> +2,960 events of matched-prefix advancement produced exactly
> ZERO units of progression.
Reading-error #39 (newly registered by this review):
> **#39 (matched-prefix as progression proxy)**: matched-prefix
> measures *engine-to-engine divergence point*, not *game-to-game
> functional gap*. When the wedge is on a different thread than the
> matched-prefix anchor thread, advancing matched-prefix is orthogonal
> to unwedging. Future audits MUST distinguish "ours's tid-X main
> thread diverges from canary's tid-Y" from "ours's tid-X main thread
> is *blocked because tid-Z is wedged*", and target the wedge directly
> when present.
## What "progression discipline" looks like in practice
For the next 3 iterates:
- Iterate N+1: **Step 1 of shortest-path-roadmap** (crowbar). No
diff-tool work. Target: `swaps ≥ 2`.
- Iterate N+2: **Step 2 of roadmap** (trigger ID via canary jsonl
analysis). No engine LOC. Target: identification of the missing
kernel call(s).
- Iterate N+3: **Step 3 of roadmap** (mirror the trigger). Target:
ours unblocks without the crowbar.
Each iterate must produce a `progression_score` delta report. If
3 iterates in a row produce Δprogression_score ≤ ε (where
ε = 0.001 × 500 = 0.5), the methodology should be re-reviewed
before continuing; this would mean even the crowbar approach
failed and a deeper rethink is needed.
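
The stall rule above is simple enough to check automatically. A minimal sketch, assuming the per-iterate Δprogression_score values are kept in a list in chronological order; the function name is hypothetical.

```python
EPSILON = 0.001 * 500  # 0.5: the score gained by 500 events of matched-prefix alone


def needs_re_review(score_deltas, window=3, epsilon=EPSILON):
    """True if the last `window` iterates each moved the score by at most epsilon."""
    if len(score_deltas) < window:
        return False
    return all(delta <= epsilon for delta in score_deltas[-window:])


print(needs_re_review([0.1, 0.2, 0.0]))    # three stalled iterates
print(needs_re_review([0.1, 111.0, 0.2]))  # a rendered frame breaks the streak
```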
## Closing note
The user's instinct in calling this strategic pause and review was
correct. The matched-prefix-only chain was producing real
canonicalization work but had ceased producing progression. The
roadmap above is one principled attempt at breaking the cycle; if it
fails, the next-level fallback is to formally accept Sylpheed's
boot-state as currently unreachable in ours and pivot to a different
title for the methodology demonstration.


@@ -0,0 +1,205 @@
# Ours wedge localization
**Source data**: `phase-w-wedge-reattack/ours-postfix.jsonl` (50M-instr
cold run, ~3 s wallclock, 121,569 events, 13 tids).
`phase-w-wedge-reattack/halt-on-deadlock-dump.txt` (per-tid state @
deadlock).
## TL;DR
Ours's wedge is **structurally identical** to AUDIT-049 (first found
2026-05-10). Across 25+ subsequent iterates (Phase C+1 … Phase C+25,
Phase D, AUDIT-049 .. AUDIT-069), the wedge has **never moved**:
- **tid=1 (main)** wedges at `sub_82173990+0x2D4` (PC `0x824ac578`,
`do_wait_single`) on **handle `0x12c8`** = `Thread(id=13)` — the
renderer thread's join handle.
- **tid=13 (renderer / cache-IO worker)** wedges at
`sub_821CB030+0x1B0` (PC `0x824ac578`, `do_wait_single`) on
**handle `0x12d0`** = `Event/Auto`, created by tid=13 itself at
`sub_821CB030+0x128` via `NtCreateEvent`. `<NO_SIGNALS_DESPITE_WAITS>`.
- **`sub_825070F0` fires 0×** at any horizon probed (50M, 500M, ∞
wallclock). The 4 workers (entries `0x82506528/58/88/B8`) never
spawn in ours.
This is what audits 049/058/059/060/062/063/064/065/066/067/068/069
collectively call "the wedge."
## Graph view: ours's actual reachable subgraph vs canary's
### What runs in BOTH engines (matched-prefix 105,128)
```
entry_point
└─ early CRT init ✓ ours ✓ canary
└─ subsystem init ✓
├─ VdInitializeEngines (×2, then VdShutdownEngines, then again)
├─ VdInitializeRingBuffer
├─ VdEnableRingBufferRPtrWriteBack
├─ VdSetGraphicsInterruptCallback
└─ VdSetSystemCommandBufferGpuIdentifierAddress
└─ 10× ExCreateThread (the matched first spawn burst)
├─ 0x82181830 / 0x8245A5D0 / 0x82450A28 ✓ ✓
├─ 0x82457EF0 (spawned by tid=10 → tid=11) ✓ ✓
├─ 0x824CD458 (KeWait worker, susp=F) ✓ ✓
├─ 0x822F1EE0 (renderer, susp=T) ✓ ✓
├─ 0x824D2878 / 0x824D2940 (XAudio, susp=T) ✓ ✓
├─ 0x82178950 (XMA, susp=F) ✓ ✓
└─ 0x821748F0 (file IO spawner, susp=T) ✓ ✓
└─ 1× boot-init VdSwap ✓ swaps=1
└─ tid=1 enters sub_8216EA68 → sub_822F1AA8
└─ bctrl vtable[0] of *(0x828E1F08)
└─ sub_82175330 → tail → sub_82173990
└─ sub_821746B0 → spawn worker (= ours tid=13, susp=F)
└─ KeWaitForSingleObject INFINITE on tid=13.handle ← WEDGE
```
### What runs ONLY in canary (the missing subgraph)
```
After tid=6's tid=17 worker (= ours's tid=13) terminates:
sub_82173990 returns to sub_822F1AA8's outer loop
└─ iterates sub_821741C8 → sub_82172BA0 → vtable[6] = sub_821B55D8
→ sub_824F8398 → sub_824F7CD0 → sub_824F7800 → vtable[1] = sub_825070F0
└─ 4× ExCreateThread(entry=0x82506528/58/88/B8, susp=T)
├─ Worker 0 → tid=28 (file IO, 3.26M events)
├─ Worker 1 → tid=27 (36k events)
├─ Worker 2 → tid=29 (91k events)
└─ Worker 3 (0x825065B8 — never resumed in jitter-1 run)
After workers come online:
Canary's secondary spawn burst (1.94-2.15 s): 8 helpers (tids 18-25)
Canary's tid=14/15 XAudio resumes (~ms after tid=6 spawns them in
susp=T; ours also spawns them susp=T but never resumes them)
Renderer tid=13 unblocks, starts emitting VdSwap at ~150 fps
Per-frame game loop: tid=6 emits `0x822F1BCC` 4040× / 60 s
```
## The wedge dependency graph (cyclic)
```
[tid=1 (main) wedge]
wait on handle 0x12c8 (= tid=13.thread_handle)
only signaled when tid=13 calls ExTerminateThread
tid=13 needs to complete sub_821CB030 body
sub_821CB030 waits on event 0x12d0
only signaled by sub_825070F0 worker cluster
sub_825070F0 never fires in ours
sub_825070F0 is reached via:
sub_82172BA0 → ... → sub_824F7800 → bctrl vtable[1]
↑↑↑ which is downstream of sub_822F1AA8's outer loop
which is downstream of sub_82173990 returning
which is downstream of tid=1's wait completing
← BACK TO TOP
```
This is the **AUDIT-063 self-referential lock**: the activation chain
that produces the signal that unwedges the wait is itself downstream
of the wait completing. In canary, the lock resolves because the
tid=17 worker (= ours tid=13's analog) calls `ExTerminateThread`
**by completing** its `sub_821CB030` body — and that completion is
fed by some OTHER signal source that ours doesn't replicate.
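
The self-referential lock can be stated as a cycle in a "depends on" graph. The sketch below encodes the edges from the diagram above as a single-successor mapping and walks it; the node labels are informal shorthand, and the `external producer` node in the second case is hypothetical, standing in for whatever signal source canary has.

```python
def find_cycle(depends_on, start):
    """Follow single-successor dependency edges from `start`.

    Returns the cycle as a list whose first and last entries are equal,
    or None if the chain ends without revisiting a node.
    """
    path = []
    index_of = {}
    node = start
    while node is not None:
        if node in index_of:
            return path[index_of[node]:] + [node]
        index_of[node] = len(path)
        path.append(node)
        node = depends_on.get(node)
    return None


# Each key can only happen after its value has happened (ours).
ours = {
    "tid=1 wait on 0x12c8 completes": "tid=13 calls ExTerminateThread",
    "tid=13 calls ExTerminateThread": "event 0x12d0 signaled",
    "event 0x12d0 signaled": "sub_825070F0 fires",
    "sub_825070F0 fires": "sub_82173990 returns",
    "sub_82173990 returns": "tid=1 wait on 0x12c8 completes",
}
cycle = find_cycle(ours, "tid=1 wait on 0x12c8 completes")

# Hypothetical canary-like variant: the event is fed by some other source.
canary_like = dict(ours)
canary_like["event 0x12d0 signaled"] = "external producer"
no_cycle = find_cycle(canary_like, "tid=1 wait on 0x12c8 completes")

print(len(cycle) - 1, no_cycle)
```

Redirecting a single edge to a node outside the loop is enough to break the cycle, which is the structural claim the paragraph above makes about canary.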
## Where the "other signal source" lives (the actual root cause)
From AUDIT-069 Session 5 (work-semaphore release-rate diff):
> Canary 414 release events vs ours 99 (24% rate). Worker (tid=10/5):
> 382 vs 90. Main (tid=6/1): 7 vs 8. **Other producers: 25 vs 1**.
The discrepancy in "other producers" (25 release events vs 1) is the key.
**Canary has multiple non-worker threads that release the work
semaphore during bootstrap — releasing this semaphore is what feeds
the worker-side wait that eventually causes sub_821CB030's event to
be signaled.** Ours has only one (tid=13 itself, before it wedges).
From AUDIT-069 Session 4 (`sub_82450A68` dispatch loop):
> Ours r3=0x1 (semaphore acquired) 91/91 captures (100%); canary
> r3=0x102 (TIMEOUT) 3/4 (75%).
**Ours's work-semaphore has count > 0 every time tid=5 checks; canary's
times out 75% of the time.** This is a *paradox at face value*: how
can ours have MORE semaphore signals available but still process
LESS work? The S5 reframe resolves it: ours's worker self-releases
the work semaphore from `sub_82450B68+0xCDC/+0xD28` MORE OFTEN than
it consumes, because the consume path early-exits when the dispatch
table doesn't have an entry to process, and the dispatch table
doesn't have entries because the producers (the threads behind
canary's 25 "other producer" releases) aren't running.
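
The Session 5 breakdown can be checked for internal consistency: the three categories should sum to the quoted totals, and the per-category ratios show where the shortfall concentrates. The numbers are copied from the quote above; the category keys are shorthand.

```python
# NtReleaseSemaphore calls on the work semaphore, per AUDIT-069 Session 5.
canary = {"worker": 382, "main": 7, "other": 25}
ours = {"worker": 90, "main": 8, "other": 1}

canary_total = sum(canary.values())
ours_total = sum(ours.values())
overall_ratio = ours_total / canary_total

# Ours-to-canary ratio per category.
per_category = {key: ours[key] / canary[key] for key in canary}

print(canary_total, ours_total, f"{overall_ratio:.1%}")
for key, ratio in per_category.items():
    print(f"{key}: {ratio:.2f}")
```

The main thread is at parity or slightly above, the worker tracks the overall ratio of roughly 24%, and the "other" category sits at 4%, the largest relative gap.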
## Bootstrap divergence (when does ours first diverge from canary?)
Per the AUDIT-069 H3 framing: somewhere in the *bootstrap* of the
worker-cluster, a producer thread that should be alive in canary
isn't alive in ours. Candidates:
1. **XAudio render thread (canary tid=14/15)**: spawned suspended in
ours, **never resumed**. Canary resumes within ~1 ms of spawn at
1.726 s. Canary's tid=14 calls `XAudioGetVoiceCategoryVolumeChangeMask`
26,126× and is one of the top event producers. This thread runs
the host-audio bridge feed loop — *if it isn't running, downstream
producers expecting audio cues block.*
2. **XMA decoder (tid=16, entry `0x82178950`)**: spawned non-suspended
in both; ours emits 0 events from this thread because it presumably
waits on a kernel object that's never signaled.
3. **NtWaitForMultipleObjectsEx worker (canary tid=21, entry
`0x824563E0`)**: 1M events in canary; absent in ours (canary's
second spawn burst doesn't happen).
4. **The "tid=10 helper" (canary tid=10, entry `0x82450A28`)**: ours
has this thread (ours tid=5), but it's running the dispatch loop
`sub_82450A68` in a degenerate fast-path mode (S4 finding).
The most defensible single-root claim:
> **Ours never resumes the XAudio threads (tid=14/15), because the
> guest API call that triggers their resume in canary doesn't fire in
> ours, and as a knock-on the worker cluster never gets the bootstrap
> producer it expects.**
But this claim is not yet proven; AUDIT-068/069 stopped short of
identifying the resume trigger.
## Verified-but-doesn't-help LOC budget across recent audits
(For methodology context — every recent audit landed correctness or
diagnostic LOC but moved progression 0%.)
| Audit / Phase | LOC added | Component | Effect on progression |
|---|---:|---|---|
| AUDIT-067 vptr-mem-watch | +422 (canary) | Mem-watch diagnostic | 0 |
| AUDIT-068 S1-S4 | +520 cumul (canary) | Host-side write hooks | 0 (writer identified at guest PC) |
| AUDIT-069 S1-S5 | +60 (canary), 0 (ours) | Wait/release watch | 0 (counts diverge, no fix) |
| Phase D Stages 0-4 | +450-500 (ours+tools) | Contention manifest | 0 (104,607 cap unbroken) |
| Phase D D-extension | +95 (tool) | Nested-CS absorber | +439 matched-prefix only |
| Phase C+1 .. C+25 | varies | Allocator/event/thread shims | 0 (matched-prefix only) |
| Phase W | +20 (ours) | VdInitializeEngines r3=1 | +66 matched-prefix only |
| **Total to break wedge: 0 LOC of any kind** | | | |
This is the single most striking pattern from the audit chain: **every
honest correctness fix advances matched-prefix; none move
`draws / swaps / unique_render_targets`.**
## Falsification budget for the wedge framing
The wedge framing IS robust (no audit has falsified it since AUDIT-049).
But it has limited explanatory power: it tells us *what is blocked*,
not *what should unblock it*. Reading-error #38 (cross-spawn producer
paths missed by static reachability) and #36 (POD struct copy bypass)
both proved that the install / wake mechanism in canary involves paths
guest static analysis cannot see. This is a methodology constraint,
not an unsolvable problem.


@@ -0,0 +1,333 @@
# Review A — boot-state review and shortest-path roadmap
**Session type**: PLAN-only. No engine LOC changes; no canary
instrumentation changes. Read-only investigation across the
existing audit chain artifacts.
**Date**: 2026-05-21
**Companion documents** (in this directory):
- `canary-boot-trajectory.md` — canary's call chain from entry_point
to first gameplay draw, with wallclock timestamps.
- `ours-wedge-localization.md` — precise where-ours-stops, in graph
terms.
- `shortest-path-roadmap.md` — 3-5 step roadmap with expected
progression delta per step.
- `methodology-assessment.md` — alternative metric proposal.
This `plan.md` summarizes the five framing questions with answers
backed by file:line citations.
---
## Q1 — What is "first draw" in canary's Sylpheed boot?
**Two distinct "draws" must be disambiguated.**
### Q1.a: First boot-init `VdSwap` (the swap=1 event)
Canary's tid=6 (guest main) emits **one** `VdSwap` at ~9.5 s
wallclock, immediately after the GPU subsystem init sequence
`VdInitializeEngines → VdInitializeRingBuffer →
VdEnableRingBufferRPtrWriteBack → VdSetGraphicsInterruptCallback →
VdSetSystemCommandBufferGpuIdentifierAddress → VdGetSystemCommandBuffer`.
This swap publishes the boot framebuffer and contains no draw packets.
**Ours also reaches this swap** — visible in
`phase-w-wedge-reattack/ours-postfix.jsonl` at idx 105283 (host_ns
496,276,229). This is what produces ours's `swaps=1` metric.
Both engines reach this point. **It is NOT the gate.**
### Q1.b: First gameplay `VdSwap` (the swap≥2 / draws≥1 event)
Canary's renderer tid=13 (entry `0x822F1EE0`, spawned suspended at
1.671 s) wakes after the `sub_825070F0` worker fan-out at host_ns
≈ 10.383 s and begins emitting `VdGetSystemCommandBuffer` /
`VdSwap` pairs at ~150 fps. Canary's tid=13 emits **12,092
VdSwap calls in the 90-s window** (per
`phase-nonmatch-investigation/canary-tid-profiles.md:21`).
The first of these is the **first gameplay draw**, fired at ~10.7 s
wallclock, about 0.3 s after the `sub_825070F0` fan-out (≈ 10.383 s)
triggers the worker cluster and about 1.2 s after the boot-init swap
(~9.5 s).
**Pre-conditions canary establishes before this point** (per
`canary-boot-trajectory.md`):
1. Vtable `0x8200A1E8` of `ANON_Class_713383D7` installed at host_ns
≈ 9.4-9.6 s via POD-copy at GUEST PC `sub_824FD240+0x24`
(per `project_audit_068_session4_2026_05_20`).
2. Activation chain `sub_822F1AA8 → sub_82173990 → sub_821746B0 →
sub_82172BA0 → sub_821B55D8 → sub_824F8398 → sub_824F7CD0 →
sub_824F7800 → bctrl vtable[1] = sub_825070F0` fires on tid=6.
3. `sub_825070F0` spawns 4 worker threads with entries
`0x82506528/58/88/B8` and shared ctx `0xBCE251C0`.
4. Workers (canary tids 27/28/29) emit signals that unwedge the
`sub_821CB030` Event waits across the cache-file IO completion
chain.
5. Renderer tid=13's body (entered earlier but blocked on a
tid=14/15 XAudio-coordinated event) unblocks; per-frame
`VdGetSystemCommandBuffer` / `VdSwap` loop begins.
---
## Q2 — What is ours's actual progress, and what's the wedge root cause?
**Ours stops at the first wait in the activation chain.** Specifically:
- **tid=1 (main)** wedged at `sub_82173990+0x2D4` (PC `0x824ac578` =
`do_wait_single`) on handle `0x12c8` = `Thread(id=13)` — waiting
for the renderer's thread handle to signal (which happens only when
tid=13 calls `ExTerminateThread`).
- **tid=13 (renderer / cache-IO worker)** wedged at
`sub_821CB030+0x1B0` on handle `0x12d0` = `Event/Auto`, created by
itself via `NtCreateEvent` at `sub_821CB030+0x128`. `signals=0,
wakes=0` — `<NO_SIGNALS_DESPITE_WAITS>`.
- **`sub_825070F0` fires 0×** at any horizon probed.
Citation: `phase-w-wedge-reattack/halt-on-deadlock-dump.txt` +
`phase-w-wedge-reattack/current-state.md`.
### Root cause (at one structural level deeper than the wedge symptom)
**Per AUDIT-069 Session 5 (the most recent measurement):**
- Canary fires 414 `NtReleaseSemaphore` calls on the work-queue
semaphore in the 90-s window.
- Ours fires 99 (24%).
- Breakdown: Worker (382 vs 90), Main (7 vs 8), **Other producers
(25 vs 1)**.
The "**other producers (25 vs 1)**" gap is the load-bearing
discrepancy. Canary records **24 more release events** on the work
semaphore from threads other than the worker and main during
bootstrap than ours does. The likely sources are:
1. The 4 `sub_825070F0` workers (canary tids 27/28/29 + 1) — absent
in ours.
2. XAudio render threads (canary tids 14/15, spawned suspended in
both engines, **resumed only in canary**).
3. The secondary spawn burst at 1.94-2.15 s (canary tids 18-25) —
8 helpers including file-IO and NtWaitForMultipleObjectsEx workers
— absent in ours.
### The ONE structural issue
> **Ours never reaches `sub_825070F0` because the activation chain
> that calls it is downstream of tid=13's wedge; and tid=13's wedge
> is downstream of the worker cluster activation; and the worker
> cluster activation is `sub_825070F0`. This is a self-referential
> lock.**
Canary breaks the lock because some part of the bootstrap
*pre-activates* the producers (probably via XAudio thread resume at
1.726 s, which then runs ahead, populates the work queue, signals
events, etc.). Ours never resumes the XAudio threads — they're
spawned suspended and stay that way.
**The single highest-leverage gap is the XAudio thread resume,**
because (a) it happens early (1.726 s in canary vs. ours's wedge,
which sets in around 1.4 s; i.e. the resume should happen before the
wedge), (b) it activates the dominant event producers, and (c) AUDIT-069
S5's "other producers 25 vs 1" finding implicates exactly this class
of thread.
---
## Q3 — Shortest-path-to-first-draw roadmap
Three to five steps (full detail in `shortest-path-roadmap.md`):
- **Step 1 (~80-150 LOC, ours-side)**: add `--force-spawn-workers`
cvar that crowbars `sub_825070F0` activation by directly spawning
the 4 worker threads with the right ctx after `VdInitializeRingBuffer`
returns. Tests "are the workers functionally correct if activated"
and "does activating them unwedge sub_821CB030."
- **Step 2 (~0 LOC)**: with Step 1 active, mine the canary jsonl for
the kernel-call sequence on tid=6 in the wallclock window [9.4 s,
9.6 s] (the install epoch). Identify what guest call triggers
`sub_824FD240+0x24`'s POD-copy of the vtable in canary.
- **Step 3 (~10-500 LOC, depending on what Step 2 finds)**: mirror
that trigger in ours — likely a missing kernel-import return value
or a missing post-condition that the trigger inspects.
- **Step 4 (~0 LOC; remove crowbar)**: re-test ours without
`--force-spawn-workers`. Verify natural bootstrap reaches
`sub_825070F0` activation.
- **Step 5 (~0-50 LOC)**: measure renderer-thread VdSwap rate over 90 s
wallclock; target ±30% of canary's 12,092 calls.
Expected delta:
| After step | `swaps` | `draws` | `unique_render_targets` |
|---|---:|---:|---:|
| Pre | 1 | 0 | 0 |
| Step 1 (crowbar) | 2+ | 1+ | 1+ |
| Step 4 (decrowbar) | 2+ | 1+ | 1+ |
| Step 5 (parity) | 100+ | 100+ | 1-5 |
---
## Q4 — What's NOT on the shortest path
Explicitly deferred (full rationale in `shortest-path-roadmap.md`):
- **Audio (host-audio-* / XAudio implementation)** — even though
XAudio thread resume MAY be the trigger from Q2, ours's existing
XAudio shim is sufficient for the workers to bootstrap if they
receive the right kernel-call sequence. Full XAudio
implementation is beyond first-draw scope.
- **HID** — Sylpheed's intro/title screens are auto-advance; no
input needed.
- **XAM content / save games** — not on first-draw path.
- **Scheduler determinism work** (Phase D Stages 0-4 and beyond) —
null result; the wedge is upstream of contention scheduling.
Close or indefinitely defer.
- **Diff-tool canonicalization** (Phase C+N for N > 25) — saturated
on matched-prefix without progression; halt this work class until
Step 4 lands and the workload re-baselines.
- **AUDIT-068 host-side install probes** — superseded by AUDIT-068
Session 4 finding (writer is GUEST PC, not host). The followup
question is what *triggers* the guest code path, which Step 2
addresses through cheaper means.
---
## Q5 — Methodology assessment
**Current methodology relied on matched-prefix as a progression
proxy. This assumption is now empirically falsified**: +2,960
events of matched-prefix advancement produced 0 units of progression
(`swaps=1, draws=0` across 25+ iterates).
### Proposed alternative metric
**Option 6 (composite `progression_score`)**:
```
progression_score = 1 * swaps + 10 * draws + 100 * unique_render_targets
+ 0.001 * matched_prefix
```
Continuous gradient; honest about wedge-solving vs. canonicalization
priority. Requires ~10 LOC to add to `digest.json`.
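
A minimal sketch of the composite score, assuming the digest exposes the counters under the names used in the formula. The `matched_prefix` argument name is an assumption; the example inputs are the matched-prefix values quoted elsewhere in these notes (102,168 and 105,128).

```python
def progression_score(swaps, draws, unique_render_targets, matched_prefix):
    """Composite score from the formula above: rendering progress dominates,
    matched-prefix contributes only a small tie-breaking gradient."""
    return (
        1 * swaps
        + 10 * draws
        + 100 * unique_render_targets
        + 0.001 * matched_prefix
    )


# 25+ iterates of canonicalization: matched-prefix moved, nothing else did.
before = progression_score(1, 0, 0, 102_168)
after = progression_score(1, 0, 0, 105_128)

# One extra swap carrying one draw into one render target.
first_draw = progression_score(2, 1, 1, 105_128)

print(f"canonicalization gain: {after - before:.3f}")
print(f"first-draw gain:       {first_draw - after:.3f}")
```

Under these weights the whole +2,960-event matched-prefix advance is worth less than three points, while a single gameplay draw is worth over a hundred.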
Discipline: tag every iterate as either
"**canonicalization only — no progression**" or
"**progression**". Cap at 5 consecutive canonicalization-only
iterates before mandatory pivot to wedge-attack work.
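
The tagging rule and the cap can be sketched as follows. The tag strings mirror the wording above; the digest keys and function names are assumptions for illustration rather than existing tooling.

```python
CAP = 5  # consecutive canonicalization-only iterates allowed before a pivot
PROGRESS_KEYS = ("swaps", "draws", "unique_render_targets")


def tag_iterate(before, after):
    """Tag an iterate by whether any rendering counter increased."""
    moved = any(after[k] > before[k] for k in PROGRESS_KEYS)
    return "progression" if moved else "canonicalization only"


def must_pivot(tags, cap=CAP):
    """True once the trailing run of canonicalization-only tags reaches cap."""
    streak = 0
    for tag in reversed(tags):
        if tag != "canonicalization only":
            break
        streak += 1
    return streak >= cap


stalled = {"swaps": 1, "draws": 0, "unique_render_targets": 0}
history = ["progression"] + [tag_iterate(stalled, stalled) for _ in range(5)]
print(history[-1], must_pivot(history))
```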
### New reading-error #39
> **#39 (matched-prefix as progression proxy)**: matched-prefix
> measures engine-to-engine divergence point, NOT game-to-game
> functional gap. When the wedge is on a different thread than the
> matched-prefix anchor thread, advancing matched-prefix is
> orthogonal to unwedging. Future audits MUST distinguish "ours's
> tid-X diverges from canary's tid-Y" from "ours's tid-X is *blocked
> because tid-Z is wedged*", and target the wedge directly when
> present.
---
## Counterintuitive findings (anti-anchoring)
Per Tripstones in the task brief:
### 1. Both engines reach `swaps=1`; ours is NOT behind on the boot swap.
The shared boot-init `VdSwap` fires in both. Ours's `swaps=1` metric
is "achieved, just at the same point canary also did it". The
divergence is NOT "ours can't do the first swap"; it's "ours can't do
the SECOND through Nth swap (the gameplay loop)".
### 2. Tripstone 4 verified: canary does reach gameplay draws, ours does not.
`canary-jitter-1.jsonl` shows 12,092 VdSwap calls on canary tid=13 in
90 s wallclock — definitively in the gameplay rendering loop, not
pre-first-draw. Ours's tid analogous to canary tid=13 emits ~80
events total before wedging — definitively before gameplay starts.
The "both engines pre-first-draw" hypothesis is FALSE.
### 3. The matched-prefix metric is on the WRONG thread.
Matched-prefix tracks tid=6 (canary) vs tid=1 (ours), the main
threads. But the wedge is on **tid=13 in both engines** — the
renderer thread. Tid=1's matched-prefix can advance to 105,128 events
without ever touching the wedge.
### 4. The "boot-state-machine" framing is misleading.
There's no monolithic boot state machine. There are ~28 threads in
canary, each running their own lifecycle, communicating via shared
kernel objects. The bottleneck isn't a state transition; it's a
THREAD ACTIVATION GAP.
### 5. AUDIT-069 Session 5's "other producers 25 vs 1" is the key forensic discovery, more than AUDIT-068's vtable install epoch.
The vtable install IS interesting but it's downstream of the producer
gap. Producers must be running to populate the work queue, which
gets the worker to do its thing, which signals the wedge, which lets
the activation chain continue, which calls `sub_824FD240+0x24`,
which writes the vtable. Fixing the vtable install in isolation
(e.g., via a host-side mem-write hack) doesn't help if no producer
is feeding work to the workers.
---
## Cascade prediction confidence
- A — canary boot trajectory characterized: **DONE, HIGH** (canary-jitter-1.jsonl provides direct evidence).
- B — ours's wedge root-cause localized deeper than "sub_821CB030 waits": **DONE, MEDIUM-HIGH** (AUDIT-069 S5 "other producers 25 vs 1" finding).
- C — shortest-path roadmap with ≤5 steps: **DONE, MEDIUM** (5 steps; Step 1 confidence ~60%).
- D — alternative metric proposed: **DONE, HIGH** (Option 6 composite, plus reading-error #39).
---
## Open questions / known unknowns
1. **What is the bootstrap trigger for canary's `sub_824FD240+0x24`?**
Roadmap Step 2 addresses. Could be answered in <1 session of
canary jsonl analysis.
2. **Does Step 1's crowbar produce a clean wedge-unblock, or does it
reveal additional unmodelled state in the ctx object?** Empirical;
testable in one session.
3. **Are canary's XAudio threads (tids 14/15) the actual missing
producer, or are they downstream of the same trigger?** Worth a
targeted probe before Step 1; ~50 LOC ours-side to log
NtResumeThread on the XAudio entry PCs.
4. **Will the AUDIT-067 "vtable install is host-side" finding
resurface?** No — AUDIT-068 S4 falsified this; the writer is
GUEST PC `sub_824FD240+0x24`. The "host-side" framing was a
mis-read of the POD-copy semantics (reading-error #36).
---
## Recommended next action
**Dispatch a "progression iterate" implementing Step 1 of the
roadmap** (`--force-spawn-workers` crowbar, ~80-150 LOC ours-side).
This is a high-variance, high-reward iterate; expected outcome is
either `swaps ≥ 2, draws ≥ 1` (success — wedge structurally
isolated to thread activation) or an informative failure mode (e.g.,
worker faults at first vtable bctrl indicating additional state
needed in ctx object). Time-box: 1 session, max 2h.
If Step 1 succeeds in ANY way (even if draws stays 0), the next
iterate is Step 2 (kernel-call sequence mining in canary-jitter-1.jsonl).
This step has minimal risk and uses existing tooling.
If Step 1 fails completely (panic / segfault unrecoverable), revert
the crowbar and reframe: the wedge may be in ours's kernel-handler
implementations themselves, not just bootstrap activation. At that
point a deeper Path β engine investigation is unavoidable.
---
## Memory hygiene note
This review is read-only. xenia-rs HEAD unchanged. canary HEAD
unchanged. sylpheed.db unchanged. No new artifacts beyond this
directory.
After dispatching Step 1, future memory entries should adopt the
new `progression_score` + tagging discipline outlined in
`methodology-assessment.md`.

@@ -0,0 +1,253 @@
# Shortest-path-to-first-gameplay-draw roadmap
**Date**: 2026-05-21
**Read-only investigation; no LOC changes made.**
**Premise**: 25+ iterates have advanced matched-prefix 102,168 →
105,128 (+2,960 events) but `draws=0, swaps=1, render_targets=0`
have not moved. This roadmap proposes a non-canonicalization path
forward.
## Definitions
- **First gameplay draw** = the first `VdSwap` call by ours's
renderer (the thread spawned at entry `0x822F1EE0`, ours's tid
analog of canary tid=13) that emits at least one `PM4_TYPE3
DRAW_INDX` packet into the ringbuffer.
- **Observable success criterion**: `draws ≥ 1, swaps ≥ 2,
unique_render_targets ≥ 1` in `xenia-rs check --stable-digest`
output. At least one frame from the **renderer thread** (not the
boot-init swap that ours already emits).
## Why current iteration has stalled
The wedge has been mapped and remapped 20+ times. Every audit
correctly identifies symptoms; every fix correctly canonicalizes a
diff-tool divergence. But the wedge is **structurally cyclic**: the
worker cluster that signals the wait is downstream of the wait
completing. Standard "find the divergent kernel call, mirror canary's
semantics" has saturated.
Two strategies remain that have NOT been tried at full scope:
1. **(A) Decouple the cycle by faking the worker activation**:
directly call `sub_825070F0` from a host shim, or directly spawn
the 4 worker threads with the right ctx, sidestepping the
activation chain. This is a *crowbar*: it doesn't fix the
underlying bootstrap bug, but it tests "are the workers
functionally correct IF activated." If they signal the wedge and
ours then reaches first draw, we know the bug is *exclusively* in
the activation gate, and we can attack just that.
2. **(B) Find what triggers `sub_824FD240+0x24`'s POD-copy in canary**.
AUDIT-068 Session 4 pinned the install epoch of vtable
`0x8200A1E8` to this writer site. But the *caller* of
`sub_824FD240` — what guest call leads to it firing — is
unidentified. In ours, `sub_824FD240` fires 0× because the call
chain `sub_824F8398 → sub_824F7CD0 → sub_824F7800 → sub_824FD240`
is downstream of the tid=13 wedge. So we have circular reasoning
again — UNLESS Strategy A is applied first.
The roadmap below uses Strategy A as a wedge-crowbar and Strategy B
as the principled fix that follows.
## Roadmap
### Step 1 — Crowbar: force-spawn the `sub_825070F0` workers (~80-150 LOC)
**Action**: in `xenia-rs` add a debug-only cvar
`--force-spawn-workers` that, when set, after some bootstrap
checkpoint (e.g., first `VdInitializeRingBuffer` return), manually
spawns 4 ExCreateThread-equivalent guest threads with:
- entries `0x82506528 / 0x82506558 / 0x82506588 / 0x825065B8`
- ctx_ptr = run-determined; allocate a fresh
`ANON_Class_713383D7`-shaped object on the unified heap and write
vtable `0x8200A1E8` to slot 0 (mirror the POD-copy at
`sub_824FD240+0x24`)
- stack_size 65536, suspended=True initially, then NtResumeThread
**Expected effect**:
- If the workers run correctly and signal the wedge: ours's tid=13
unblocks, tid=1's join completes, normal game-loop begins.
`draws ≥ 1, swaps ≥ 2`.
- If the workers fail (e.g., faulting because the ctx object's other
fields aren't initialized): we learn what *else* needs to be
installed alongside the vtable.
**Failure modes to expect**:
- The worker entries dispatch via vtable slots 35/36/37/38 of the
ANON_Class, so those slots also need to be populated. AUDIT-067
static analysis shows the vtable has 7 entries; the worker entries
use offsets 140/144/148/152 (= slots 35/36/37/38 of a wider vtable)
per `sub_825070F0.md` lines 32-37. So we'll need a parent class /
derived class layout.
- The ctx object also has refcount/header fields that must be
initialized — see AUDIT-068 Session 3 finding of 12-byte struct
copy `{vptr, self, self}` followed by refcount=1.
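
The slot arithmetic and the header shape in the two failure modes above can be sketched as follows. This is a hedged illustration, not the crowbar itself: the ctx address is a placeholder, and placing the refcount as a 32-bit word directly after the 12-byte struct is an assumption read from the AUDIT-068 Session 3 wording.

```python
import struct

PTR_SIZE = 4  # guest pointers are 32-bit on the Xbox 360 PowerPC target
VTABLE_ADDR = 0x8200A1E8  # vtable address named in these notes
HYPOTHETICAL_CTX_ADDR = 0x40001000  # placeholder; the real value is run-determined


def slot_for_offset(byte_offset):
    """Convert a vtable byte offset into a slot index."""
    if byte_offset % PTR_SIZE:
        raise ValueError(f"offset {byte_offset} is not pointer-aligned")
    return byte_offset // PTR_SIZE


def ctx_header(vptr, self_addr, refcount=1):
    """Pack {vptr, self, self} plus refcount as big-endian 32-bit words.

    Assumption: the refcount sits at offset 12, right after the struct.
    """
    return struct.pack(">4I", vptr, self_addr, self_addr, refcount)


slots = [slot_for_offset(o) for o in (140, 144, 148, 152)]
min_vtable_bytes = (max(slots) + 1) * PTR_SIZE
header = ctx_header(VTABLE_ADDR, HYPOTHETICAL_CTX_ADDR)

print(slots, min_vtable_bytes)
print(header.hex())
```

A table that reaches slot 38 needs at least 156 bytes, whereas a 7-entry vtable covers only 28, which is the mismatch behind the parent/derived layout remark.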
**LOC budget**: 80-150 LOC ours-side; 0 LOC canary.
**Read-only fallback**: if force-spawn fails immediately, we've still
captured the failure mode, which is informative.
**Risk**: high — this is structurally a hack. Acceptable as a
diagnostic.
### Step 2 — Identify what triggers `sub_824FD240+0x24` in canary (~0 LOC)
**Action**: with Step 1's crowbar enabled, ours reaches the
post-wedge code path. Compare ours and canary on what `import.call`
(kernel API) sequence the **caller** of `sub_824FD240` makes
immediately before the POD-copy install.
The caller chain (per AUDIT-064/068) is:
```
sub_824F8398 → sub_824F7CD0 → sub_824F7800 → [bl at +0x38 = sub_824FD240] / [bctrl at +0x320 = sub_825070F0]
```
So `sub_824F7800` calls `sub_824FD240` at offset `+0x38`, BEFORE it
calls `sub_825070F0` at offset `+0x320`.
Question: what does `sub_824F8398`'s caller (one level up,
`sub_821B55D8`) pass as arguments, and what kernel APIs run in
between? We need to trace tid=6's events in canary in the wallclock
window [9.4 s, 9.6 s] — the install epoch.
**LOC budget**: 0. Pure event-stream analysis on captured canary
jsonl (we already have `canary-jitter-1.jsonl`, 18.7M events).
**Output**: an ordered list of kernel calls just before
`sub_824FD240+0x24` fires. If any are missing in ours, that's a
candidate gap.
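
A minimal sketch of the window filter this step describes, assuming each jsonl line is an object with `tid`, `host_ns` (nanoseconds from run start), `kind`, and `name` fields. Those field names and the sample events are assumptions for illustration; the real schema of `canary-jitter-1.jsonl` may differ.

```python
import json


def events_in_window(lines, tid, start_s, end_s):
    """Yield events for one thread with a timestamp inside [start_s, end_s]."""
    lo, hi = round(start_s * 1e9), round(end_s * 1e9)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ev = json.loads(line)
        if ev.get("tid") == tid and lo <= ev.get("host_ns", -1) <= hi:
            yield ev


def calls_before(events, marker_name, keep=20):
    """Return up to `keep` import.call names seen before the marker event."""
    recent = []
    for ev in events:
        if ev.get("name") == marker_name:
            return recent[-keep:]
        if ev.get("kind") == "import.call":
            recent.append(ev.get("name"))
    return None  # marker never fired inside the window


# Synthetic sample lines; in practice, pass an open file handle instead.
sample = [
    '{"tid": 6, "host_ns": 9410000000, "kind": "import.call", "name": "NtCreateEvent"}',
    '{"tid": 13, "host_ns": 9420000000, "kind": "import.call", "name": "VdSwap"}',
    '{"tid": 6, "host_ns": 9450000000, "kind": "import.call", "name": "KeSetEvent"}',
    '{"tid": 6, "host_ns": 9500000000, "kind": "guest.pc", "name": "sub_824FD240+0x24"}',
    '{"tid": 6, "host_ns": 9700000000, "kind": "import.call", "name": "NtClose"}',
]

window = events_in_window(sample, tid=6, start_s=9.4, end_s=9.6)
preceding = calls_before(window, "sub_824FD240+0x24")
print(preceding)
```

Because the filter is a generator, it can stream the 4.4 GB capture line by line without loading it into memory.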
### Step 3 — Mirror the trigger in ours (variable LOC)
Once Step 2 names the missing kernel call(s), implement them in ours
following Phase C cadence (verify per-call return values match canary;
add diff-tool tests; document in memory).
**LOC budget**: depends on what's missing. Could be 10-500 LOC.
### Step 4 — Remove the crowbar; verify natural bootstrap (~0 LOC)
With Step 3's fix in place, remove `--force-spawn-workers`. Re-run
ours. If the natural bootstrap chain runs and `draws ≥ 1, swaps ≥ 2`,
we've fixed the bug.
If progression still fails without the crowbar, there's another gap;
re-enter at Step 2 with a refined trigger search.
### Step 5 — Validate gameplay frame parity (~0-50 LOC)
Capture renderer-thread VdSwap counts at 90 s wallclock in both
engines. Target: ours's renderer emits within ±30% of canary's
12,092 VdSwap/90s. If yes: first-draw is reached and sustained.
If ours's renderer emits but at a much lower rate, that's a follow-up
performance issue, not a correctness one. Defer.
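
The ±30% target translates into a concrete acceptance band. A small sketch, using the canary count quoted above; the function names are illustrative.

```python
CANARY_VDSWAP_90S = 12_092  # canary renderer-thread count quoted above
TOLERANCE = 0.30


def parity_band(reference, tolerance):
    """Return the (low, high) bounds of the acceptance band."""
    return reference * (1 - tolerance), reference * (1 + tolerance)


def within_parity(ours, reference=CANARY_VDSWAP_90S, tolerance=TOLERANCE):
    """True if ours's count falls inside the band around the reference."""
    low, high = parity_band(reference, tolerance)
    return low <= ours <= high


low, high = parity_band(CANARY_VDSWAP_90S, TOLERANCE)
print(f"accept between {low:.1f} and {high:.1f} VdSwap calls per 90 s")
print(within_parity(1), within_parity(9_000))
```

In whole calls, that means ours's renderer needs between 8,465 and 15,719 `VdSwap` calls in the 90 s window.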
## Expected progression per step
| Step | Expected `swaps` | Expected `draws` | Expected `unique_render_targets` | LOC delta |
|---|---:|---:|---:|---:|
| Pre-roadmap | 1 | 0 | 0 | — |
| Step 1 (crowbar) | 2-N | 1-N | 1+ | ~150 |
| Step 2 (trigger ID) | (unchanged) | (unchanged) | (unchanged) | 0 |
| Step 3 (mirror) | 2-N | 1-N | 1+ | 10-500 |
| Step 4 (decrowbar) | 2-N | 1-N | 1+ | -150 (remove) |
| Step 5 (parity) | 100+ | 100+ | 1-5 | 0-50 |
## What's NOT on this path (explicitly deferred)
1. **Host-audio bridge / XAudio resume**: the XAudio thread tids 14/15
spawning suspended-and-never-resumed in ours is real but parallel
to the worker-cluster wedge. In canary, both threads run; in ours,
neither runs. Pursuing XAudio fixes does not address the
graphics-blocking wedge. Defer to a separate
"post-first-draw" audit cluster.
2. **HID / controller**: Sylpheed's intro movie / title screen play
without user input. HID is irrelevant for first-draw.
3. **XAM content / save games**: irrelevant for first-draw; the
intro/title screens don't require save-game enumeration.
4. **Scheduler determinism** (per `scheduler_determinism_plan` /
Phase D Stages 0-4): null result, off-path. The wedge is upstream
of any contention. Defer indefinitely or close.
5. **Diff-tool canonicalization** (Phase C-style fixes): saturated on
moving matched-prefix without moving progression. **Halt** further
work in this class until Step 4 lands and re-baselines the diff
workload.
6. **AUDIT-068 host-side install probes**: superseded by AUDIT-068
Session 4 (writer identified at GUEST PC `sub_824FD240+0x24`).
The remaining question is *what triggers* `sub_824FD240`, which
Step 2 addresses.
## Alternative path (rejected)
**Skip the crowbar; do the trigger investigation cold.** Read canary
source for `sub_824FD240` callers, walk upward, identify the trigger.
Why rejected: `sub_824FD240` is GAME code, not canary engine code —
the file we'd "read" is the disassembly of the XEX. We'd need to
disassemble Sylpheed's RE'd PE and trace the call graph by hand. Per
sylpheed.db, `sub_824FD240`'s static caller is `sub_824F7800+0x38`
(in line with AUDIT-064). But what guest *call* causes `sub_824F7800`
to be invoked is itself a multi-fn upstream investigation that
returns to the same wedge cycle. The crowbar bypasses this paradox.
## Risk assessment
- **Step 1 catastrophic failure**: ours's emulator panics or
segfaults when the force-spawn workers run. Mitigation: gate
behind the debug-only `--force-spawn-workers` cvar; ensure ours's
CPU executes the worker entries in normal sandboxed PPC JIT; if
they fault on missing guest state, log and exit cleanly.
- **Step 1 "succeeds but draws=0 anyway"**: the workers run but
ours's tid=13 still doesn't unblock — there's an unmodelled state
beyond just the missing thread spawns. Mitigation: log every event
the new workers emit; compare with canary's tid=27/28/29 streams in
`canary-jitter-1.jsonl`.
- **Step 3 LOC explosion**: the trigger turns out to be a large
subsystem (XAM content, XCONFIG, etc.). Mitigation: scope-cut to
a stub that returns "canary-equivalent" values without full
implementation.
## Confidence levels
- Step 1 unblocks the wedge if executed correctly: **MEDIUM** (60%).
Honest assessment: 25 prior audits have not unblocked it through
natural fixes, so the crowbar approach is novel and the failure
mode may not match expectations.
- Step 2 identifies a trigger in ≤1 session: **HIGH** (85%) — the
canary jsonl already has the data; analysis is mechanical.
- Step 3 LOC budget ≤500: **MEDIUM** (50%) — depends entirely on Step
2's answer.
- Step 4 natural bootstrap works post-Step-3: **MEDIUM** (50%) —
there may be additional gaps the crowbar masked.
## Memory hygiene
After Step 1 lands (crowbar binary in place), check that
`xenia-rs/target/release/xenia-rs` builds cleanly with the new cvar.
Verify Phase B `image_canonical_sha256` is updated (the crowbar
changes engine LOC); document the new baseline. Confirm 3× cold
runs produce identical digests with the crowbar enabled.
## What "winning" looks like
`xenia-rs check --stable-digest -n 50000000` (or higher cap, e.g.
`-n 500000000` to reach 30 s wallclock) outputs:
```
{
"instructions": 50000007,
"imports": 40390+,
"draws": >= 1,
"swaps": >= 2,
"unique_render_targets": >= 1,
"shader_blobs_live": >= 1,
"texture_cache_entries": >= 1
}
```
…and the values are reproducible across 3 cold runs. A non-zero
`draws` value means at least one PM4_TYPE3 DRAW_INDX packet was
emitted by the renderer thread.
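
The success criterion can be checked mechanically once three digests are in hand. This is a sketch under assumptions: the thresholds are read from the template above, each digest is taken to be a flat dict of integer counters, and "reproducible" is interpreted as all three digests being identical.

```python
# Minimum values read from the digest template above.
MINIMUMS = {
    "imports": 40390,
    "draws": 1,
    "swaps": 2,
    "unique_render_targets": 1,
    "shader_blobs_live": 1,
    "texture_cache_entries": 1,
}


def meets_criteria(digest):
    """True if every tracked counter reaches its minimum."""
    return all(digest.get(key, 0) >= floor for key, floor in MINIMUMS.items())


def reproducible_win(digests, runs=3):
    """True if at least `runs` digests are identical and all meet the criteria."""
    if len(digests) < runs:
        return False
    identical = all(d == digests[0] for d in digests)
    return identical and all(meets_criteria(d) for d in digests)


# Current stalled state versus a hypothetical winning digest.
stalled = {"imports": 40390, "draws": 0, "swaps": 1, "unique_render_targets": 0,
           "shader_blobs_live": 0, "texture_cache_entries": 0}
winning = {"imports": 41000, "draws": 3, "swaps": 4, "unique_render_targets": 1,
           "shader_blobs_live": 2, "texture_cache_entries": 1}

print(meets_criteria(stalled), reproducible_win([winning] * 3))
```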