Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity
Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Shortest-path-to-first-gameplay-draw roadmap
Date: 2026-05-21
Read-only investigation; no LOC changes proposed.
Premise: 25+ iterates have advanced the matched prefix from 102,168 to
105,128 events (+2,960), but draws=0, swaps=1, and render_targets=0
have not moved. This roadmap proposes a path forward that does not
rely on further canonicalization.
Definitions
- First gameplay draw = the first VdSwap call by ours's renderer (the thread spawned at entry 0x822F1EE0, ours's analog of canary tid=13) that emits at least one PM4_TYPE3 DRAW_INDX packet into the ringbuffer.
- Observable success criterion: draws ≥ 1, swaps ≥ 2, unique_render_targets ≥ 1 in xenia-rs check --stable-digest output, with at least one frame coming from the renderer thread (not the boot-init swap that ours already emits).
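The counter part of the success criterion can be written as a single predicate. This is a minimal sketch that assumes the three counters have already been parsed out of the --stable-digest output. DigestCounters and meets_first_draw_criterion are hypothetical names, not xenia-rs APIs. The renderer-thread condition needs per-thread swap attribution, so it is deliberately not modeled here.

```rust
/// Hypothetical subset of the counters printed by `xenia-rs check --stable-digest`.
#[derive(Debug, Clone, Copy)]
struct DigestCounters {
    draws: u64,
    swaps: u64,
    unique_render_targets: u64,
}

/// Observable first-gameplay-draw criterion: at least one draw, at least two
/// swaps (one more than the boot-init swap), and at least one render target.
fn meets_first_draw_criterion(c: &DigestCounters) -> bool {
    c.draws >= 1 && c.swaps >= 2 && c.unique_render_targets >= 1
}

fn main() {
    // The current state from the premise: draws=0, swaps=1, render_targets=0.
    let today = DigestCounters { draws: 0, swaps: 1, unique_render_targets: 0 };
    // The smallest counters that satisfy the criterion.
    let goal = DigestCounters { draws: 1, swaps: 2, unique_render_targets: 1 };
    println!("today: {}", meets_first_draw_criterion(&today)); // prints "today: false"
    println!("goal: {}", meets_first_draw_criterion(&goal)); // prints "goal: true"
}
```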
Why current iteration has stalled
The wedge has been mapped and remapped 20+ times. Every audit correctly identifies symptoms; every fix correctly canonicalizes a diff-tool divergence. But the wedge is structurally cyclic: the worker cluster that signals the wait is downstream of the wait completing. The standard approach (find the divergent kernel call, mirror canary's semantics) has saturated.
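The cycle can be made concrete as a wait-for graph, where an edge a → b means a cannot proceed until b does. The sketch below is illustrative only. The node labels are hypothetical paraphrases of the chain described in this roadmap, not identifiers from either engine, and find_cycle is a plain depth-first search written for this note. Any edge on the reported cycle is a place where an outside actor could break it, which is the idea behind Strategy A below.

```rust
use std::collections::HashMap;

/// DFS over a wait-for graph. `state`: 1 = on the current path, 2 = finished.
fn dfs<'g>(
    node: &'g str,
    adj: &HashMap<&'g str, Vec<&'g str>>,
    state: &mut HashMap<&'g str, u8>,
    path: &mut Vec<&'g str>,
) -> Option<Vec<&'g str>> {
    state.insert(node, 1);
    path.push(node);
    for &next in adj.get(node).map(|v| v.as_slice()).unwrap_or(&[]) {
        match state.get(next).copied().unwrap_or(0) {
            // Back edge to a node on the current path: the path suffix
            // starting at `next` is a cycle.
            1 => {
                let start = path.iter().position(|&n| n == next).unwrap();
                return Some(path[start..].to_vec());
            }
            0 => {
                if let Some(cycle) = dfs(next, adj, state, path) {
                    return Some(cycle);
                }
            }
            _ => {}
        }
    }
    path.pop();
    state.insert(node, 2);
    None
}

/// Returns one cycle in the graph (edge `a -> b`: "a cannot proceed until b does"), if any.
fn find_cycle<'g>(edges: &[(&'g str, &'g str)]) -> Option<Vec<&'g str>> {
    let mut adj: HashMap<&'g str, Vec<&'g str>> = HashMap::new();
    for &(a, b) in edges {
        adj.entry(a).or_default().push(b);
    }
    let mut state: HashMap<&'g str, u8> = HashMap::new();
    let mut path: Vec<&'g str> = Vec::new();
    // Start from nodes in edge order so the reported cycle is deterministic.
    for &(start, _) in edges {
        if !state.contains_key(start) {
            if let Some(cycle) = dfs(start, &adj, &mut state, &mut path) {
                return Some(cycle);
            }
        }
    }
    None
}

fn main() {
    // Hypothetical labels paraphrasing the dependency chain described above.
    let edges = [
        ("renderer tid=13 wait", "worker-cluster signal"),
        ("worker-cluster signal", "worker activation (sub_825070F0)"),
        ("worker activation (sub_825070F0)", "sub_824F8398 -> sub_824F7CD0 -> sub_824F7800"),
        ("sub_824F8398 -> sub_824F7CD0 -> sub_824F7800", "renderer tid=13 wait"),
    ];
    match find_cycle(&edges) {
        Some(cycle) => println!("cycle: {}", cycle.join(" => ")),
        None => println!("no cycle"),
    }
}
```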
Two strategies remain that have NOT been tried at full scope:
- (A) Decouple the cycle by faking the worker activation: directly call sub_825070F0 from a host shim, or directly spawn the 4 worker threads with the right ctx, sidestepping the activation chain. This is a crowbar. It doesn't fix the underlying bootstrap bug, but it tests "are the workers functionally correct IF activated?" If they signal the wedge and ours then reaches first draw, we know the bug is exclusively in the activation gate, and we can attack just that.
- (B) Find what triggers sub_824FD240+0x24's POD-copy in canary. AUDIT-068 Session 4 pinned the install epoch of vtable 0x8200A1E8 to this writer site. But the caller of sub_824FD240 (i.e. which guest call leads to it firing) is unidentified. In ours, sub_824FD240 fires 0× because the call chain sub_824F8398 → sub_824F7CD0 → sub_824F7800 → sub_824FD240 is downstream of the tid=13 wedge. So the reasoning is circular again, UNLESS Strategy A is applied first.
The roadmap below uses Strategy A as a wedge-crowbar and Strategy B as the principled fix that follows.
Roadmap
Step 1 — Crowbar: force-spawn the sub_825070F0 workers (~80–150 LOC)
Action: in xenia-rs, add a debug-only cvar
--force-spawn-workers. When set, it manually spawns 4
ExCreateThread-equivalent guest threads after some bootstrap
checkpoint (e.g., the first VdInitializeRingBuffer return), with:
- entries 0x82506528 / 0x82506558 / 0x82506588 / 0x825065B8
- ctx_ptr: run-determined; allocate a fresh ANON_Class_713383D7-shaped object on the unified heap and write vtable 0x8200A1E8 to slot 0 (mirroring the POD-copy at sub_824FD240+0x24)
- stack_size 65536, suspended=True initially, then NtResumeThread
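The following is a minimal sketch of the spawn sequence. It assumes a hypothetical GuestKernel trait that stands in for whatever thread-creation path xenia-rs actually exposes; no real xenia-rs API is assumed. Allocating the ctx object and writing the vtable are left to the caller, and ctx_ptr is simply passed through. Creating all four threads suspended before resuming any is a design choice of this sketch, so that no worker runs while the cluster is only half spawned.

```rust
/// Hypothetical guest-kernel facade; not the real xenia-rs API.
trait GuestKernel {
    /// ExCreateThread-equivalent: creates a guest thread and returns its handle.
    fn create_thread(&mut self, entry: u32, ctx_ptr: u32, stack_size: u32, suspended: bool) -> u32;
    /// NtResumeThread-equivalent.
    fn resume_thread(&mut self, handle: u32);
}

/// Worker entry points listed in Step 1.
const WORKER_ENTRIES: [u32; 4] = [0x8250_6528, 0x8250_6558, 0x8250_6588, 0x8250_65B8];
const WORKER_STACK_SIZE: u32 = 65_536;

/// Spawns the four workers against one ctx object; returns handles in entry order.
fn force_spawn_workers<K: GuestKernel>(kernel: &mut K, ctx_ptr: u32) -> Vec<u32> {
    // Create every worker suspended first...
    let handles: Vec<u32> = WORKER_ENTRIES
        .iter()
        .map(|&entry| kernel.create_thread(entry, ctx_ptr, WORKER_STACK_SIZE, true))
        .collect();
    // ...then resume them, so no worker starts before all four exist.
    for &handle in &handles {
        kernel.resume_thread(handle);
    }
    handles
}

/// Test double that records calls instead of touching guest state.
#[derive(Default)]
struct RecordingKernel {
    created: Vec<(u32, u32, u32, bool)>,
    resumed: Vec<u32>,
    next_handle: u32,
}

impl GuestKernel for RecordingKernel {
    fn create_thread(&mut self, entry: u32, ctx_ptr: u32, stack_size: u32, suspended: bool) -> u32 {
        self.created.push((entry, ctx_ptr, stack_size, suspended));
        self.next_handle += 1;
        self.next_handle
    }
    fn resume_thread(&mut self, handle: u32) {
        self.resumed.push(handle);
    }
}

fn main() {
    let mut kernel = RecordingKernel::default();
    // ctx_ptr is run-determined; 0x4000_0000 is a placeholder, not a real address.
    let handles = force_spawn_workers(&mut kernel, 0x4000_0000);
    println!("handles {:?}, resumed {:?}", handles, kernel.resumed);
}
```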
Expected effect:
- If the workers run correctly and signal the wedge: ours's tid=13
unblocks, tid=1's join completes, and the normal game loop begins,
giving draws ≥ 1, swaps ≥ 2.
- If the workers fail (e.g., faulting because the ctx object's other fields aren't initialized): we learn what else needs to be installed alongside the vtable.
Failure modes to expect:
- The worker entries dispatch via vtable slots 35/36/37/38 of the
ANON_Class, so those slots also need to be populated. AUDIT-067
static analysis shows the vtable has 7 entries, but the worker entries
use offsets 140/144/148/152 (= slots 35/36/37/38 of a wider vtable)
per sub_825070F0.md lines 32-37. So we'll need the parent/derived
class layout that supplies those slots.
- The ctx object also has refcount/header fields that must be
initialized: see the AUDIT-068 Session 3 finding of a 12-byte struct
copy {vptr, self, self} followed by refcount=1.
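The offset-to-slot arithmetic and the assumed ctx header can both be checked mechanically. This sketch assumes 4-byte guest pointers, which is consistent with 140/4 = 35 through 152/4 = 38. It also assumes big-endian guest memory, as on the Xbox 360's PowerPC. The refcount's width (32 bits) and placement (directly after the 12-byte copy, at offset 12) are further assumptions. slot_index and ctx_header are hypothetical helpers.

```rust
/// Guest pointers are 32-bit, so each vtable slot is 4 bytes.
const SLOT_SIZE: u32 = 4;

/// Converts a vtable byte offset (as loaded by the worker entries) to a slot index.
fn slot_index(byte_offset: u32) -> Option<u32> {
    if byte_offset % SLOT_SIZE == 0 {
        Some(byte_offset / SLOT_SIZE)
    } else {
        None
    }
}

/// Builds the assumed ctx-object prefix: {vptr, self, self} (12 bytes) followed by
/// a 32-bit refcount of 1, every word stored big-endian as in guest memory.
fn ctx_header(vptr: u32, self_ptr: u32) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (i, word) in [vptr, self_ptr, self_ptr, 1].iter().enumerate() {
        out[i * 4..i * 4 + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}

fn main() {
    let slots: Vec<Option<u32>> = [140, 144, 148, 152].iter().map(|&o| slot_index(o)).collect();
    println!("{:?}", slots); // prints "[Some(35), Some(36), Some(37), Some(38)]"
    // A 7-entry vtable only covers slots 0..=6, so slots 35..=38 must come from a wider table.
    // 0x4000_0000 is a placeholder self pointer, not a real allocation.
    let header = ctx_header(0x8200_A1E8, 0x4000_0000);
    println!("{:02X?}", header);
}
```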
LOC budget: 80-150 LOC ours-side; 0 LOC canary. Read-only fallback: if force-spawn fails immediately, we've still captured the failure mode, which is informative. Risk: high, since this is structurally a hack. Acceptable as a diagnostic.
Step 2 — Identify what triggers sub_824FD240+0x24 in canary (~0 LOC)
Action: with Step 1's crowbar enabled, ours reaches the
post-wedge code path. Compare the import.call (kernel API) sequence
that the caller of sub_824FD240 makes immediately before the
POD-copy install in ours and in canary.
The caller chain (per AUDIT-064/068) is:
sub_824F8398 → sub_824F7CD0 → sub_824F7800 → [bl at +0x38 = sub_824FD240] / [bctrl at +0x320 = sub_825070F0]
So sub_824F7800 calls sub_824FD240 at offset +0x38, BEFORE it
calls sub_825070F0 at offset +0x320.
Question: what does sub_824F8398's caller (one level up,
sub_821B55D8) pass as arguments, and what kernel APIs run in
between? We need to trace tid=6's events in canary in the wallclock
window [9.4 s, 9.6 s] — the install epoch.
LOC budget: 0. Pure event-stream analysis on captured canary
jsonl (we already have canary-jitter-1.jsonl, 18.7M events).
Output: an ordered list of kernel calls just before
sub_824FD240+0x24 fires. If any are missing in ours, that's a
candidate gap.
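The Step 2 analysis comes down to two operations: filter one thread's events to a time window, then cut the list at the install. This sketch works under stated assumptions. It assumes the canary jsonl has already been parsed into the hypothetical Event type below; the real schema is not assumed. It also assumes a VtableInstall event marks the store at sub_824FD240+0x24. candidate_gaps compares by API name only, so it ignores call counts and arguments.

```rust
use std::collections::HashSet;

/// Hypothetical pre-parsed event; the real jsonl schema is not assumed.
#[derive(Debug, Clone)]
enum Kind {
    /// An import.call into a kernel API, by name.
    ImportCall(String),
    /// The guest store at sub_824FD240+0x24 (the vtable install).
    VtableInstall,
    /// Anything else in the stream.
    Other,
}

#[derive(Debug, Clone)]
struct Event {
    tid: u32,
    wall_s: f64,
    kind: Kind,
}

/// Kernel calls made by `tid` within [lo, hi] seconds, in order, up to (not
/// including) the first vtable install. None if no install falls in the window.
fn calls_before_install(events: &[Event], tid: u32, lo: f64, hi: f64) -> Option<Vec<String>> {
    let mut calls = Vec::new();
    for e in events.iter().filter(|e| e.tid == tid && e.wall_s >= lo && e.wall_s <= hi) {
        match &e.kind {
            Kind::ImportCall(name) => calls.push(name.clone()),
            Kind::VtableInstall => return Some(calls),
            Kind::Other => {}
        }
    }
    None
}

/// Canary calls (in canary order) whose API name never appears in ours: candidate gaps.
fn candidate_gaps(canary: &[String], ours: &[String]) -> Vec<String> {
    let seen: HashSet<&str> = ours.iter().map(String::as_str).collect();
    canary.iter().filter(|c| !seen.contains(c.as_str())).cloned().collect()
}

fn main() {
    // Made-up events for illustration only; "ApiA"/"ApiB" are placeholder names.
    let canary_events = vec![
        Event { tid: 6, wall_s: 9.41, kind: Kind::ImportCall("ApiA".into()) },
        Event { tid: 6, wall_s: 9.43, kind: Kind::Other },
        Event { tid: 6, wall_s: 9.45, kind: Kind::ImportCall("ApiB".into()) },
        Event { tid: 6, wall_s: 9.50, kind: Kind::VtableInstall },
    ];
    let canary = calls_before_install(&canary_events, 6, 9.4, 9.6).unwrap_or_default();
    let ours = vec!["ApiA".to_string()];
    println!("{:?}", candidate_gaps(&canary, &ours)); // prints ["ApiB"]
}
```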
Step 3 — Mirror the trigger in ours (variable LOC)
Once Step 2 names the missing kernel call(s), implement them in ours following Phase C cadence (verify per-call return values match canary; add diff-tool tests; document in memory).
LOC budget: depends on what's missing. Could be 10–500 LOC.
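For the Phase C check ("verify per-call return values match canary"), one approach is to walk both call streams in lockstep. The walk reports the first divergence together with the matched-prefix length. This is a minimal sketch; the Call record shape and the Divergence cases are hypothetical, not the actual diff-tool types.

```rust
/// One kernel call as a diff tool might see it: API name and return value (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
struct Call {
    api: &'static str,
    ret: u32,
}

#[derive(Debug, PartialEq)]
enum Divergence {
    /// Same API at this index, different return value.
    ReturnValue { index: usize, canary: u32, ours: u32 },
    /// A different API was called at this index.
    Api { index: usize, canary: &'static str, ours: &'static str },
    /// One stream ended first; `index` is the length of the shorter stream.
    Length { index: usize },
}

/// Returns the matched-prefix length and the first divergence, if any.
fn first_divergence(canary: &[Call], ours: &[Call]) -> (usize, Option<Divergence>) {
    for (i, (c, o)) in canary.iter().zip(ours).enumerate() {
        if c.api != o.api {
            return (i, Some(Divergence::Api { index: i, canary: c.api, ours: o.api }));
        }
        if c.ret != o.ret {
            return (i, Some(Divergence::ReturnValue { index: i, canary: c.ret, ours: o.ret }));
        }
    }
    let n = canary.len().min(ours.len());
    if canary.len() == ours.len() {
        (n, None)
    } else {
        (n, Some(Divergence::Length { index: n }))
    }
}

fn main() {
    // Made-up streams: same APIs, but the second call returns a different value in ours.
    let canary = [Call { api: "ApiA", ret: 0 }, Call { api: "ApiB", ret: 0 }];
    let ours = [Call { api: "ApiA", ret: 0 }, Call { api: "ApiB", ret: 7 }];
    let (matched, divergence) = first_divergence(&canary, &ours);
    println!("matched prefix: {matched}, first divergence: {divergence:?}");
}
```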
Step 4 — Remove the crowbar; verify natural bootstrap (~0 LOC)
With Step 3's fix in place, remove --force-spawn-workers. Re-run
ours. If the natural bootstrap chain runs and draws ≥ 1, swaps ≥ 2,
we've fixed the bug.
If progression still fails without the crowbar, there's another gap; re-enter at Step 2 with a refined trigger search.
Step 5 — Validate gameplay frame parity (~0–50 LOC)
Capture renderer-thread VdSwap counts at 90 s wallclock in both engines. Target: ours's renderer emits within ±30% of canary's 12,092 VdSwap calls per 90 s. If so, first draw is reached and sustained.
If ours's renderer emits but at a much lower rate, that's a follow-up performance issue, not a correctness one. Defer.
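The ±30% target translates into an integer acceptance window. Canary makes 12,092 swaps in 90 s (about 134 swaps/s), so ours passes with anywhere from 8,465 to 15,719 swaps. Below is a small sketch of the check. It uses integer math so the bounds are exact; within_pct is a hypothetical helper.

```rust
/// True if `ours` is within ±`pct` percent of `canary`. Integer math keeps the
/// bounds exact: (100 - pct) * canary <= 100 * ours <= (100 + pct) * canary.
fn within_pct(ours: u64, canary: u64, pct: u64) -> bool {
    let scaled = 100 * ours;
    100u64.saturating_sub(pct) * canary <= scaled && scaled <= (100 + pct) * canary
}

fn main() {
    let canary = 12_092; // renderer VdSwap calls in 90 s, from Step 5
    for ours in [8_464, 8_465, 15_719, 15_720] {
        println!("{ours:>6}: {}", within_pct(ours, canary, 30));
    }
    // Per-second rate for reference.
    println!("canary rate: {:.1} swaps/s", canary as f64 / 90.0);
}
```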
Expected progression per step
| Step | Expected swaps | Expected draws | Expected unique_render_targets | LOC delta |
|---|---|---|---|---|
| Pre-roadmap | 1 | 0 | 0 | — |
| Step 1 (crowbar) | 2-N | 1-N | 1+ | ~150 |
| Step 2 (trigger ID) | (unchanged) | (unchanged) | (unchanged) | 0 |
| Step 3 (mirror) | 2-N | 1-N | 1+ | 10-500 |
| Step 4 (decrowbar) | 2-N | 1-N | 1+ | -150 (remove) |
| Step 5 (parity) | 100+ | 100+ | 1-5 | 0-50 |
What's NOT on this path (explicitly deferred)
- Host-audio bridge / XAudio resume: in ours, the XAudio threads (tids 14/15) are spawned suspended and never resumed. This gap is real but parallel to the worker-cluster wedge. In canary, both threads run; in ours, neither runs. Pursuing XAudio fixes does not address the graphics-blocking wedge. Defer to a separate "post-first-draw" audit cluster.
- HID / controller: Sylpheed's intro movie / title screen play without user input. HID is irrelevant for first-draw.
- XAM content / save games: irrelevant for first-draw; the intro/title screens don't require save-game enumeration.
- Scheduler determinism (per scheduler_determinism_plan/ Phase D Stages 0-4): null result, off-path. The wedge is upstream of any contention. Defer indefinitely or close.
- Diff-tool canonicalization (Phase C-style fixes): saturated; these fixes keep moving the matched prefix without moving progression. Halt further work in this class until Step 4 lands and re-baselines the diff workload.
- AUDIT-068 host-side install probes: superseded by AUDIT-068
Session 4 (writer identified at GUEST PC
sub_824FD240+0x24). The remaining question is what triggers sub_824FD240, which Step 2 addresses.
Alternative path (rejected)
Skip the crowbar; do the trigger investigation cold. Read canary
source for sub_824FD240 callers, walk upward, identify the trigger.
Why rejected: sub_824FD240 is GAME code, not canary engine code —
the file we'd "read" is the disassembly of the XEX. We'd need to
disassemble Sylpheed's RE'd PE and trace the call graph by hand. Per
sylpheed.db, sub_824FD240's static caller is sub_824F7800+0x38
(in line with AUDIT-064). But what guest call causes sub_824F7800
to be invoked is itself a multi-fn upstream investigation that
returns to the same wedge cycle. The crowbar bypasses this paradox.
Risk assessment
- Step 1 catastrophic failure: ours's emulator panics or
segfaults when the force-spawned workers run. Mitigation: gate
behind a --debug-only cvar; ensure ours's CPU executes the worker entries in the normal sandboxed PPC JIT; if they fault on missing guest state, log and exit cleanly.
- Step 1 "succeeds but draws=0 anyway": the workers run but
ours's tid=13 still doesn't unblock, meaning there is unmodelled state
beyond the missing thread spawns. Mitigation: log every event
the new workers emit; compare with canary's tid=27/28/29 streams in
canary-jitter-1.jsonl.
- Step 3 LOC explosion: the trigger turns out to be a large subsystem (XAM content, XCONFIG, etc.). Mitigation: scope-cut to a stub that returns "canary-equivalent" values without full implementation.
Confidence levels
- Step 1 unblocks the wedge if executed correctly: MEDIUM (60%). Honest assessment: 25 prior audits have not unblocked it through natural fixes, so the crowbar approach is novel and the failure mode may not match expectations.
- Step 2 identifies a trigger in ≤1 session: HIGH (85%) — the canary jsonl already has the data; analysis is mechanical.
- Step 3 LOC budget ≤500: MEDIUM (50%) — depends entirely on Step 2's answer.
- Step 4 natural bootstrap works post-Step-3: MEDIUM (50%) — there may be additional gaps the crowbar masked.
Memory hygiene
After Step 1 lands (crowbar binary in place), check that
xenia-rs/target/release/xenia-rs builds cleanly with the new cvar.
Verify Phase B image_canonical_sha256 is updated (the crowbar
changes engine LOC); document the new baseline. Confirm 3× cold
runs produce identical digests with the crowbar enabled.
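The 3× cold-run check is a plain equality test over the digests. This is a minimal sketch: check_reproducible is hypothetical, and the digest strings stand in for the output of three cold runs.

```rust
/// Ok(()) if at least `min_runs` digests were collected and all are identical;
/// otherwise an error naming the first run that differs from run 0.
fn check_reproducible(digests: &[&str], min_runs: usize) -> Result<(), String> {
    if digests.len() < min_runs {
        return Err(format!("need {min_runs} runs, got {}", digests.len()));
    }
    match digests.iter().position(|d| *d != digests[0]) {
        None => Ok(()),
        Some(i) => Err(format!("run {i} differs from run 0")),
    }
}

fn main() {
    // Placeholder digest strings; in practice, the output of three cold
    // `xenia-rs check --stable-digest` runs.
    let runs = ["digest-A", "digest-A", "digest-A"];
    match check_reproducible(&runs, 3) {
        Ok(()) => println!("reproducible across {} runs", runs.len()),
        Err(e) => println!("not reproducible: {e}"),
    }
}
```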
What "winning" looks like
xenia-rs check --stable-digest -n 50000000 (or a higher cap, e.g.
-n 500000000 to reach 30 s wallclock) outputs:
{
"instructions": 50000007,
"imports": 40390+,
"draws": >= 1,
"swaps": >= 2,
"unique_render_targets": >= 1,
"shader_blobs_live": >= 1,
"texture_cache_entries": >= 1
}
…and the value is reproducible across 3 cold runs. A non-zero
draws value means at least one PM4_TYPE3 DRAW_INDX packet was
emitted by the renderer thread.