# Shortest-path-to-first-gameplay-draw roadmap

**Date**: 2026-05-21

**Read-only investigation; no LOC changes proposed.**

**Premise**: 25+ iterations have advanced the matched prefix from 102,168 to 105,128 events (+2,960), but `draws=0, swaps=1, render_targets=0` have not moved. This roadmap proposes a path forward that does not rely on further canonicalization.

## Definitions

- **First gameplay draw**: the first `VdSwap` call by ours's renderer whose frame emitted at least one `PM4_TYPE3 DRAW_INDX` packet into the ringbuffer. The renderer is the thread spawned at entry `0x822F1EE0`, ours's analog of canary tid=13.
- **Observable success criterion**: `draws ≥ 1, swaps ≥ 2, unique_render_targets ≥ 1` in `xenia-rs check --stable-digest` output. At least one frame must come from the **renderer thread**, not from the boot-init swap that ours already emits.

## Why the current iteration has stalled

The wedge has been mapped and remapped 20+ times. Every audit correctly identifies symptoms, and every fix correctly canonicalizes a diff-tool divergence. But the wedge is **structurally cyclic**: the worker cluster that signals the wait is downstream of the wait completing. The standard approach ("find the divergent kernel call, mirror canary's semantics") has saturated.

Two strategies remain that have NOT been tried at full scope:

1. **(A) Decouple the cycle by faking the worker activation**: either call `sub_825070F0` directly from a host shim, or spawn the 4 worker threads directly with the right ctx, sidestepping the activation chain. This is a *crowbar*. It doesn't fix the underlying bootstrap bug, but it tests whether the workers are functionally correct *if* activated. If they signal the wedge and ours then reaches first draw, we know the bug is *exclusively* in the activation gate, and we can attack just that.
2. **(B) Find what triggers `sub_824FD240+0x24`'s POD-copy in canary**. AUDIT-068 Session 4 pinned the install epoch of vtable `0x8200A1E8` to this writer site. But the *caller* of `sub_824FD240`, meaning the guest call that leads to it firing, is unidentified.
   In ours, `sub_824FD240` fires 0× because the call chain `sub_824F8398 → sub_824F7CD0 → sub_824F7800 → sub_824FD240` is downstream of the tid=13 wedge. That is circular reasoning again, UNLESS Strategy A is applied first.

The roadmap below uses Strategy A as a wedge crowbar and Strategy B as the principled fix that follows.

## Roadmap

### Step 1: Crowbar by force-spawning the `sub_825070F0` workers (~80–150 LOC)

**Action**: in `xenia-rs`, add a debug-only cvar `--force-spawn-workers`. When it is set, ours waits for a bootstrap checkpoint (e.g., the first `VdInitializeRingBuffer` return) and then manually spawns 4 ExCreateThread-equivalent guest threads with:

- entries `0x82506528` / `0x82506558` / `0x82506588` / `0x825065B8`
- ctx_ptr = run-determined: allocate a fresh `ANON_Class_713383D7`-shaped object on the unified heap and write vtable `0x8200A1E8` to slot 0, mirroring the POD-copy at `sub_824FD240+0x24`
- stack_size 65536, created with suspended=True, then resumed via NtResumeThread

**Expected effect**:

- If the workers run correctly and signal the wedge: ours's tid=13 unblocks, tid=1's join completes, and the normal game loop begins, giving `draws ≥ 1, swaps ≥ 2`.
- If the workers fail (e.g., they fault because the ctx object's other fields aren't initialized): we learn what *else* needs to be installed alongside the vtable.

**Failure modes to expect**:

- The worker entries dispatch via vtable slots 35/36/37/38 of the ANON_Class, so those slots also need to be populated. AUDIT-067 static analysis shows the vtable has only 7 entries. Per `sub_825070F0.md` lines 32–37, however, the worker entries use byte offsets 140/144/148/152, which at 4 bytes per slot are slots 35/36/37/38 of a wider vtable. We will therefore need a parent-class / derived-class layout.
- The ctx object also has refcount/header fields that must be initialized. See the AUDIT-068 Session 3 finding of a 12-byte struct copy `{vptr, self, self}` followed by refcount=1.

**LOC budget**: 80–150 LOC ours-side; 0 LOC canary.
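The Rust sketch below makes the Step 1 layout requirements concrete by listing the guest data the crowbar has to materialize. It assumes 32-bit big-endian guest pointers, as on the Xbox 360's PowerPC CPU. The addresses, byte offsets and stack size come from this roadmap. The following are hypothetical illustrations, not xenia-rs APIs:

- the `SpawnRequest` struct, `ctx_header`, and the placeholder ctx address;
- the refcount sitting immediately after the 12-byte copy at `+0xC`.

The main thing it shows is why the 7-entry vtable cannot serve the workers on its own.

```rust
// Sketch of the guest data Step 1's crowbar must create.
// Addresses/offsets come from the roadmap; SpawnRequest, the placeholder ctx
// address, and the refcount position (+0xC) are hypothetical.

/// Vtable installed by the POD-copy at `sub_824FD240+0x24`.
const VTABLE_ADDR: u32 = 0x8200_A1E8;
/// Worker thread entry points from Step 1.
const WORKER_ENTRIES: [u32; 4] = [0x8250_6528, 0x8250_6558, 0x8250_6588, 0x8250_65B8];
/// Byte offsets the workers dispatch through (per `sub_825070F0.md`).
const WORKER_VTABLE_OFFSETS: [u32; 4] = [140, 144, 148, 152];
/// Slot count AUDIT-067 found in the vtable at `VTABLE_ADDR`.
const KNOWN_VTABLE_SLOTS: u32 = 7;
const WORKER_STACK_SIZE: u32 = 65_536;

/// Guest pointers are 32-bit, so byte offset N addresses slot N / 4.
fn slot_index(byte_offset: u32) -> u32 {
    byte_offset / 4
}

/// Smallest vtable length (in slots) that covers every worker dispatch offset.
fn required_vtable_slots() -> u32 {
    WORKER_VTABLE_OFFSETS
        .iter()
        .map(|&off| slot_index(off) + 1)
        .max()
        .unwrap_or(0)
}

/// Big-endian ctx header: the 12-byte {vptr, self, self} copy, then
/// refcount = 1 (assumed to follow immediately at +0xC).
fn ctx_header(ctx_addr: u32) -> [u8; 16] {
    let words = [VTABLE_ADDR, ctx_addr, ctx_addr, 1];
    let mut out = [0u8; 16];
    for (i, word) in words.iter().enumerate() {
        out[i * 4..i * 4 + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

/// Hypothetical ExCreateThread-style parameters for one forced worker.
#[derive(Debug, Clone, Copy, PartialEq)]
struct SpawnRequest {
    entry: u32,
    ctx_ptr: u32,
    stack_size: u32,
    suspended: bool, // resumed later via NtResumeThread
}

fn spawn_requests(ctx_addr: u32) -> Vec<SpawnRequest> {
    WORKER_ENTRIES
        .iter()
        .map(|&entry| SpawnRequest {
            entry,
            ctx_ptr: ctx_addr,
            stack_size: WORKER_STACK_SIZE,
            suspended: true,
        })
        .collect()
}

fn main() {
    let ctx_addr = 0x4000_1000; // placeholder; the real address is run-determined
    println!("worker slots: {:?}", WORKER_VTABLE_OFFSETS.map(slot_index));
    println!(
        "vtable needs >= {} slots, known vtable has {}",
        required_vtable_slots(),
        KNOWN_VTABLE_SLOTS
    );
    println!("ctx header: {:02X?}", ctx_header(ctx_addr));
    for req in spawn_requests(ctx_addr) {
        println!("{req:X?}");
    }
}
```

Running it reports worker slots 35 through 38 and a minimum vtable length of 39. That is the mismatch with the 7-entry vtable described in the first failure mode.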
**Read-only fallback**: if force-spawn fails immediately, we have still captured the failure mode, which is informative.

**Risk**: high, since this is structurally a hack. Acceptable as a diagnostic.

### Step 2: Identify what triggers `sub_824FD240+0x24` in canary (~0 LOC)

**Action**: with Step 1's crowbar enabled, ours reaches the post-wedge code path. Compare ours and canary on the `import.call` (kernel API) sequence that the **caller** of `sub_824FD240` makes immediately before the POD-copy install. The caller chain (per AUDIT-064/068) is:

```
sub_824F8398 → sub_824F7CD0 → sub_824F7800 → [bl at +0x38 = sub_824FD240] / [bctrl at +0x320 = sub_825070F0]
```

So `sub_824F7800` calls `sub_824FD240` at offset `+0x38`, BEFORE it calls `sub_825070F0` at offset `+0x320`.

Question: what does `sub_824F8398`'s caller (one level up, `sub_821B55D8`) pass as arguments, and what kernel APIs run in between? We need to trace tid=6's events in canary in the wallclock window [9.4 s, 9.6 s], which is the install epoch.

**LOC budget**: 0. This is pure event-stream analysis on the captured canary jsonl (we already have `canary-jitter-1.jsonl`, 18.7M events).

**Output**: an ordered list of kernel calls made just before `sub_824FD240+0x24` fires. Any call missing in ours is a candidate gap.

### Step 3: Mirror the trigger in ours (variable LOC)

Once Step 2 names the missing kernel call(s), implement them in ours following the Phase C cadence:

- verify that per-call return values match canary;
- add diff-tool tests;
- document in memory.

**LOC budget**: depends on what's missing; could be 10–500 LOC.

### Step 4: Remove the crowbar; verify natural bootstrap (~0 LOC)

With Step 3's fix in place, remove `--force-spawn-workers` and re-run ours. If the natural bootstrap chain runs and yields `draws ≥ 1, swaps ≥ 2`, we have fixed the bug. If progression still fails without the crowbar, there is another gap; re-enter at Step 2 with a refined trigger search.
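The Step 2 trigger search, which Step 4 re-enters when the natural bootstrap still fails, reduces to an ordered, windowed filter followed by a set difference. The Rust sketch below assumes the jsonl has already been parsed into a hypothetical `Event` record with `tid`, `wallclock_s`, `kind` and `name` fields. The real `canary-jitter-1.jsonl` schema is not described here. The `"import.call"` / `"guest.write"` kind strings, the marker label and all sample events are illustrative assumptions, not captured data.

```rust
use std::collections::HashSet;

/// Hypothetical pre-parsed trace record; the real jsonl schema may differ.
#[derive(Debug, Clone)]
struct Event {
    tid: u32,
    wallclock_s: f64,
    kind: &'static str, // e.g. "import.call" for kernel API calls (assumed label)
    name: String,       // kernel API name, or a guest PC label for the marker
}

fn ev(tid: u32, wallclock_s: f64, kind: &'static str, name: &str) -> Event {
    Event { tid, wallclock_s, kind, name: name.to_string() }
}

/// Ordered kernel calls made by `tid` inside [start_s, end_s], stopping at the
/// first event named `marker`. If the marker never appears, every call in the
/// window is returned.
fn calls_before_marker(events: &[Event], tid: u32, start_s: f64, end_s: f64, marker: &str) -> Vec<String> {
    let mut calls = Vec::new();
    let in_window = events
        .iter()
        .filter(|e| e.tid == tid && e.wallclock_s >= start_s && e.wallclock_s <= end_s);
    for e in in_window {
        if e.name == marker {
            break;
        }
        if e.kind == "import.call" {
            calls.push(e.name.clone());
        }
    }
    calls
}

/// Calls canary makes that never appear in ours' sequence (candidate gaps),
/// deduplicated and kept in canary's order.
fn candidate_gaps(canary_calls: &[String], ours_calls: &[String]) -> Vec<String> {
    let seen: HashSet<&str> = ours_calls.iter().map(String::as_str).collect();
    let mut gaps: Vec<String> = Vec::new();
    for call in canary_calls {
        if !seen.contains(call.as_str()) && !gaps.contains(call) {
            gaps.push(call.clone());
        }
    }
    gaps
}

fn main() {
    const MARKER: &str = "sub_824FD240+0x24";
    // Synthetic events for illustration; not taken from canary-jitter-1.jsonl.
    let canary = vec![
        ev(6, 9.39, "import.call", "NtCreateEvent"),          // before the window
        ev(6, 9.41, "import.call", "NtCreateEvent"),
        ev(13, 9.42, "import.call", "KeWaitForSingleObject"), // other thread
        ev(6, 9.45, "import.call", "ExCreateThread"),
        ev(6, 9.47, "guest.write", MARKER),                   // POD-copy install
        ev(6, 9.50, "import.call", "NtResumeThread"),         // after the marker
    ];
    let ours = vec![ev(6, 1.20, "import.call", "NtCreateEvent")];

    let canary_calls = calls_before_marker(&canary, 6, 9.4, 9.6, MARKER);
    // Ours' timing differs from canary's, so search its whole trace.
    let ours_calls = calls_before_marker(&ours, 6, 0.0, f64::MAX, MARKER);
    println!("canary calls before install: {canary_calls:?}");
    println!("candidate gaps: {:?}", candidate_gaps(&canary_calls, &ours_calls));
}
```

On the synthetic data, canary's pre-install calls are `["NtCreateEvent", "ExCreateThread"]` and the only candidate gap is `ExCreateThread`. Ours is searched over its whole trace rather than canary's [9.4 s, 9.6 s] window, because wallclock timing is not expected to line up between the two engines.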
### Step 5: Validate gameplay frame parity (~0–50 LOC)

Capture renderer-thread VdSwap counts at 90 s wallclock in both engines. Target: ours's renderer emits within ±30% of canary's 12,092 VdSwaps per 90 s, i.e. about 8,464–15,720. If it does, first draw is reached and sustained. If ours's renderer emits but at a much lower rate, that is a follow-up performance issue, not a correctness one. Defer it.

## Expected progression per step

| Step | Expected `swaps` | Expected `draws` | Expected `unique_render_targets` | LOC delta |
|---|---:|---:|---:|---:|
| Pre-roadmap | 1 | 0 | 0 | n/a |
| Step 1 (crowbar) | 2–N | 1–N | 1+ | ~150 |
| Step 2 (trigger ID) | (unchanged) | (unchanged) | (unchanged) | 0 |
| Step 3 (mirror) | 2–N | 1–N | 1+ | 10–500 |
| Step 4 (decrowbar) | 2–N | 1–N | 1+ | -150 (remove) |
| Step 5 (parity) | 100+ | 100+ | 1–5 | 0–50 |

## What's NOT on this path (explicitly deferred)

1. **Host-audio bridge / XAudio resume**: ours spawns the XAudio threads (tids 14/15) suspended and never resumes them. This is a real bug, but it is parallel to the worker-cluster wedge: in canary both threads run, in ours neither does. Pursuing XAudio fixes does not address the graphics-blocking wedge. Defer to a separate "post-first-draw" audit cluster.
2. **HID / controller**: Sylpheed's intro movie and title screen play without user input, so HID is irrelevant for first draw.
3. **XAM content / save games**: irrelevant for first draw; the intro/title screens don't require save-game enumeration.
4. **Scheduler determinism** (per `scheduler_determinism_plan` / Phase D Stages 0–4): null result, off-path. The wedge is upstream of any contention. Defer indefinitely or close.
5. **Diff-tool canonicalization** (Phase C-style fixes): saturated; these fixes move the matched prefix without moving progression. **Halt** further work in this class until Step 4 lands and re-baselines the diff workload.
6. **AUDIT-068 host-side install probes**: superseded by AUDIT-068 Session 4, which identified the writer at GUEST PC `sub_824FD240+0x24`.
   The remaining question is *what triggers* `sub_824FD240`, which Step 2 addresses.

## Alternative path (rejected)

**Skip the crowbar; do the trigger investigation cold.** Read canary source for `sub_824FD240` callers, walk upward, and identify the trigger.

Why rejected: `sub_824FD240` is GAME code, not canary engine code, so the file we would "read" is the disassembly of the XEX. We would need to disassemble Sylpheed's RE'd PE and trace the call graph by hand. Per sylpheed.db, `sub_824FD240`'s static caller is `sub_824F7800+0x38`, consistent with AUDIT-064. But finding what guest *call* causes `sub_824F7800` to be invoked is itself a multi-function upstream investigation that returns to the same wedge cycle. The crowbar bypasses this paradox.

## Risk assessment

- **Step 1 catastrophic failure**: ours's emulator panics or segfaults when the force-spawned workers run.
  Mitigations:
  - gate the behavior behind a debug-only cvar, as Step 1 specifies for `--force-spawn-workers`;
  - ensure ours's CPU executes the worker entries in the normal sandboxed PPC JIT;
  - if the workers fault on missing guest state, log and exit cleanly.
- **Step 1 "succeeds but draws=0 anyway"**: the workers run but ours's tid=13 still doesn't unblock, meaning there is unmodelled state beyond the missing thread spawns. Mitigation: log every event the new workers emit and compare with canary's tid=27/28/29 streams in `canary-jitter-1.jsonl`.
- **Step 3 LOC explosion**: the trigger turns out to be a large subsystem (XAM content, XCONFIG, etc.). Mitigation: scope-cut to a stub that returns canary-equivalent values without a full implementation.

## Confidence levels

- Step 1 unblocks the wedge if executed correctly: **MEDIUM** (60%). Honest assessment: 25 prior audits have not unblocked it through natural fixes, so the crowbar approach is novel and the failure mode may not match expectations.
- Step 2 identifies a trigger in ≤1 session: **HIGH** (85%). The canary jsonl already has the data, and the analysis is mechanical.
- Step 3 LOC budget ≤500: **MEDIUM** (50%). This depends entirely on Step 2's answer.
- Step 4 natural bootstrap works post-Step-3: **MEDIUM** (50%). There may be additional gaps the crowbar masked.

## Memory hygiene

After Step 1 lands (crowbar binary in place):

- check that `xenia-rs/target/release/xenia-rs` builds cleanly with the new cvar;
- verify that the Phase B `image_canonical_sha256` is updated, since the crowbar changes engine LOC, and document the new baseline;
- confirm that 3 cold runs produce identical digests with the crowbar enabled.

## What "winning" looks like

`xenia-rs check --stable-digest -n 50000000` outputs the following. A higher cap, e.g. `-n 500000000`, can be used to reach 30 s wallclock. In the output below, `>=` and a trailing `+` mark lower bounds, not literal output.

```
{
  "instructions": 50000007,
  "imports": 40390+,
  "draws": >= 1,
  "swaps": >= 2,
  "unique_render_targets": >= 1,
  "shader_blobs_live": >= 1,
  "texture_cache_entries": >= 1
}
```

…and the values are reproducible across 3 cold runs. A non-zero `draws` value means at least one PM4_TYPE3 DRAW_INDX packet was emitted by the renderer thread.
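The Rust sketch below makes the `draws` criterion concrete by counting DRAW_INDX packets in a span of ringbuffer dwords. It rests on two assumptions:

- the PM4 header layout used by Xenia's command processor, with packet type in bits 31:30, payload length minus one in bits 29:16, and the type-3 opcode in bits 14:8;
- `DRAW_INDX` = `0x22`.

It is not xenia-rs's actual `draws` counter, which may also count other draw opcodes (for example `DRAW_INDX_2`). The `type3` helper and the ring contents are hypothetical.

```rust
/// DRAW_INDX type-3 opcode, taken to be 0x22 as in Xenia's `xenos.h` (assumption).
const PM4_DRAW_INDX: u32 = 0x22;

/// Counts type-3 DRAW_INDX packets in a span of ringbuffer dwords, walking
/// packet by packet. Assumed header layout: bits 31:30 = packet type,
/// bits 29:16 = payload length - 1 (types 0 and 3), bits 14:8 = type-3 opcode.
fn count_draw_indx(ring: &[u32]) -> usize {
    let mut draws = 0;
    let mut i = 0;
    while i < ring.len() {
        let header = ring[i];
        let payload_len = match header >> 30 {
            0 => ((header >> 16) & 0x3FFF) as usize + 1, // type 0: consecutive register writes
            1 => 2,                                       // type 1: two register writes
            2 => 0,                                       // type 2: no-op filler
            _ => {
                // type 3: opcode-bearing packet
                if ((header >> 8) & 0x7F) == PM4_DRAW_INDX {
                    draws += 1;
                }
                ((header >> 16) & 0x3FFF) as usize + 1
            }
        };
        i += 1 + payload_len; // skip the header and its payload
    }
    draws
}

/// Builds a type-3 header (`payload_dwords` must be >= 1).
fn type3(opcode: u32, payload_dwords: u32) -> u32 {
    (3 << 30) | ((payload_dwords - 1) << 16) | (opcode << 8)
}

fn main() {
    // Synthetic ringbuffer contents, for illustration only.
    let ring = [
        0x8000_0000, // type-2 no-op
        type3(PM4_DRAW_INDX, 3), 0, 0, 0,
        type3(0x10, 1), 0, // a non-draw type-3 packet
        type3(PM4_DRAW_INDX, 2), 0, 0,
    ];
    println!("DRAW_INDX packets: {}", count_draw_indx(&ring));
}
```

With the synthetic ring in `main`, it reports two DRAW_INDX packets. Walking packet by packet, instead of scanning every dword for the opcode, keeps register-write payloads that happen to look like a draw header from being miscounted.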