# Shortest-path-to-first-gameplay-draw roadmap

**Date**: 2026-05-21
**Read-only investigation; no LOC changes proposed.**
**Premise**: 25+ iterates have advanced matched-prefix 102,168 →
105,128 (+2,960 events) but `draws=0, swaps=1, render_targets=0`
have not moved. This roadmap proposes a non-canonicalization path
forward.

## Definitions

- **First gameplay draw** = the first `VdSwap` call by ours's
  renderer (the thread spawned at entry `0x822F1EE0`, ours's tid
  analog of canary tid=13) that emits at least one `PM4_TYPE3
  DRAW_INDX` packet into the ringbuffer.
- **Observable success criterion**: `draws ≥ 1, swaps ≥ 2,
  unique_render_targets ≥ 1` in `xenia-rs check --stable-digest`
  output. At least one frame from the **renderer thread** (not the
  boot-init swap that ours already emits).
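
The success criterion above can be written down as a small predicate. This is a minimal sketch, not code from the repository: the `DigestCounters` struct and both function names are hypothetical, and it assumes the three counters have already been read out of the `--stable-digest` output.

```rust
/// Hypothetical holder for the three counters named in the success criterion.
#[derive(Debug, Clone, Copy, PartialEq)]
struct DigestCounters {
    draws: u64,
    swaps: u64,
    unique_render_targets: u64,
}

/// True when all three thresholds from the Definitions section are met.
fn reached_first_gameplay_draw(d: &DigestCounters) -> bool {
    d.draws >= 1 && d.swaps >= 2 && d.unique_render_targets >= 1
}

/// Names of the thresholds that are still unmet, in a fixed order.
fn unmet_thresholds(d: &DigestCounters) -> Vec<&'static str> {
    let mut unmet = Vec::new();
    if d.draws < 1 {
        unmet.push("draws >= 1");
    }
    if d.swaps < 2 {
        unmet.push("swaps >= 2");
    }
    if d.unique_render_targets < 1 {
        unmet.push("unique_render_targets >= 1");
    }
    unmet
}
```

The counters alone cannot show which thread issued the second swap, so the renderer-thread condition still needs a separate check against the event stream.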

## Why current iteration has stalled

The wedge has been mapped and remapped 20+ times. Every audit
correctly identifies symptoms; every fix correctly canonicalizes a
diff-tool divergence. But the wedge is **structurally cyclic**: the
worker cluster that signals the wait is downstream of the wait
completing. The standard approach ("find the divergent kernel call,
mirror canary's semantics") has saturated.

Two strategies remain that have NOT been tried at full scope:

1. **(A) Decouple the cycle by faking the worker activation**:
   directly call `sub_825070F0` from a host shim, or directly spawn
   the 4 worker threads with the right ctx, sidestepping the
   activation chain. This is a *crowbar*: it doesn't fix the
   underlying bootstrap bug, but it tests "are the workers
   functionally correct IF activated." If they signal the wedge and
   ours then reaches first draw, we know the bug is *exclusively* in
   the activation gate, and we can attack just that.

2. **(B) Find what triggers `sub_824FD240+0x24`'s POD-copy in canary**.
   AUDIT-068 Session 4 pinned the install epoch of vtable
   `0x8200A1E8` to this writer site. But the *caller* of
   `sub_824FD240` (the guest call that leads to it firing) is
   unidentified. In ours, `sub_824FD240` fires 0× because the call
   chain `sub_824F8398 → sub_824F7CD0 → sub_824F7800 → sub_824FD240`
   is downstream of the tid=13 wedge. So we have circular reasoning
   again, UNLESS Strategy A is applied first.

The roadmap below uses Strategy A as a wedge-crowbar and Strategy B
as the principled fix that follows.
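
The cyclic structure described in this section can be modelled as a small dependency graph, where an edge `(a, b)` means "a cannot complete until b has happened". The sketch below is illustrative only: the node labels are shorthand invented here for the wait, the worker signal, and the activation chain, and the depth-first search is a generic cycle check rather than anything taken from the diff tool.

```rust
use std::collections::{HashMap, HashSet};

/// Returns true if the "depends on" graph given by `edges` contains a cycle.
fn has_cycle(edges: &[(&str, &str)]) -> bool {
    let mut adj: HashMap<&str, Vec<&str>> = HashMap::new();
    for &(from, to) in edges {
        adj.entry(from).or_default().push(to);
        adj.entry(to).or_default();
    }
    let mut done: HashSet<&str> = HashSet::new();
    let mut on_path: HashSet<&str> = HashSet::new();
    adj.keys().any(|&n| visit(n, &adj, &mut done, &mut on_path))
}

/// Depth-first visit; a node seen again while still on the current path is a cycle.
fn visit<'a>(
    node: &'a str,
    adj: &HashMap<&'a str, Vec<&'a str>>,
    done: &mut HashSet<&'a str>,
    on_path: &mut HashSet<&'a str>,
) -> bool {
    if on_path.contains(node) {
        return true;
    }
    if !done.insert(node) {
        return false; // already fully explored, no cycle through here
    }
    on_path.insert(node);
    let cyclic = adj[node].iter().any(|&next| visit(next, adj, done, on_path));
    on_path.remove(node);
    cyclic
}
```

With the edges `tid13_wait → worker_signal → activation_chain → tid13_wait` the check reports a cycle. Replacing the last dependency with a host-side spawn, which is what Strategy A does, leaves an acyclic graph.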

## Roadmap

### Step 1 — Crowbar: force-spawn the `sub_825070F0` workers (~80–150 LOC)

**Action**: in `xenia-rs` add a debug-only cvar
`--force-spawn-workers` that, when set, after some bootstrap
checkpoint (e.g., first `VdInitializeRingBuffer` return), manually
spawns 4 ExCreateThread-equivalent guest threads with:

- entries `0x82506528 / 0x82506558 / 0x82506588 / 0x825065B8`
- ctx_ptr = run-determined; allocate a fresh
  `ANON_Class_713383D7`-shaped object on the unified heap and write
  vtable `0x8200A1E8` to slot 0 (mirror the POD-copy at
  `sub_824FD240+0x24`)
- stack_size 65536, suspended=True initially, then NtResumeThread
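
The spawn parameters listed above can be collected into a plain data description before any emulator plumbing is written. This is a sketch under assumptions: `WorkerSpawn` and `plan_worker_spawns` are hypothetical names, not existing xenia-rs types, and the real thread-creation call they would feed is left out because its signature is not given here.

```rust
/// Hypothetical description of one force-spawned guest worker thread.
#[derive(Debug, Clone, PartialEq)]
struct WorkerSpawn {
    entry: u32,
    ctx_ptr: u32,
    stack_size: u32,
    create_suspended: bool,
}

/// The four worker entry points named in Step 1.
const WORKER_ENTRIES: [u32; 4] = [0x8250_6528, 0x8250_6558, 0x8250_6588, 0x8250_65B8];

/// Stack size from Step 1 (65536 bytes).
const WORKER_STACK_SIZE: u32 = 65_536;

/// Builds one spawn request per worker entry, all sharing the same ctx pointer.
/// Every thread starts suspended; the caller resumes them afterwards.
fn plan_worker_spawns(ctx_ptr: u32) -> Vec<WorkerSpawn> {
    WORKER_ENTRIES
        .iter()
        .map(|&entry| WorkerSpawn {
            entry,
            ctx_ptr,
            stack_size: WORKER_STACK_SIZE,
            create_suspended: true,
        })
        .collect()
}
```

Keeping the plan as data makes it easy to log exactly what the crowbar requested, which helps when comparing against the canary trace.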

**Expected effect**:

- If the workers run correctly and signal the wedge: ours's tid=13
  unblocks, tid=1's join completes, normal game-loop begins.
  `draws ≥ 1, swaps ≥ 2`.
- If the workers fail (e.g., faulting because the ctx object's other
  fields aren't initialized): we learn what *else* needs to be
  installed alongside the vtable.

**Failure modes to expect**:

- The worker entries dispatch via vtable slots 35/36/37/38 of the
  ANON_Class, so those slots also need to be populated. Audit-067
  static analysis shows the vtable has 7 entries; the worker entries
  use offsets 140/144/148/152 (= slots 35/36/37/38 of a wider vtable)
  per `sub_825070F0.md` lines 32-37. So we'll need a parent class /
  derived class layout.
- The ctx object also has refcount/header fields that must be
  initialized; see the AUDIT-068 Session 3 finding of a 12-byte struct
  copy `{vptr, self, self}` followed by refcount=1.
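
The two failure modes above come down to simple arithmetic and a small memory image. The sketch below assumes 4-byte guest pointers and big-endian byte order (the guest CPU is big-endian PowerPC), and it further assumes the refcount is a 32-bit word stored directly after the 12-byte copy; that last point is a guess that the audit notes would need to confirm. Both function names are hypothetical.

```rust
/// Vtable address installed by the POD-copy at `sub_824FD240+0x24`.
const VTABLE_ADDR: u32 = 0x8200_A1E8;

/// Size of one guest pointer in bytes (32-bit guest addresses).
const GUEST_PTR_SIZE: u32 = 4;

/// Converts a vtable byte offset into a slot index, or None if misaligned.
fn vtable_slot(byte_offset: u32) -> Option<u32> {
    if byte_offset % GUEST_PTR_SIZE == 0 {
        Some(byte_offset / GUEST_PTR_SIZE)
    } else {
        None
    }
}

/// Builds the assumed 16-byte header image for a ctx object at `self_addr`:
/// `{vptr, self, self}` followed by refcount = 1, each word big-endian.
fn ctx_header_image(self_addr: u32) -> [u8; 16] {
    let words = [VTABLE_ADDR, self_addr, self_addr, 1];
    let mut out = [0u8; 16];
    for (i, word) in words.iter().enumerate() {
        out[i * 4..i * 4 + 4].copy_from_slice(&word.to_be_bytes());
    }
    out
}
```

Offsets 140 through 152 map to slots 35 through 38, well past a 7-entry table, which is why the wider parent/derived layout is needed before the workers can dispatch.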

**LOC budget**: 80-150 LOC ours-side; 0 LOC canary.
**Read-only fallback**: if force-spawn fails immediately, we've still
captured the failure mode, which is informative.
**Risk**: high, since this is structurally a hack. Acceptable as a
diagnostic.

### Step 2 — Identify what triggers `sub_824FD240+0x24` in canary (~0 LOC)

**Action**: with Step 1's crowbar enabled, ours reaches the
post-wedge code path. Compare ours and canary on the `import.call`
(kernel API) sequence that the **caller** of `sub_824FD240` makes
immediately before the POD-copy install.

The caller chain (per AUDIT-064/068) is:

```
sub_824F8398 → sub_824F7CD0 → sub_824F7800 → [bl at +0x38 = sub_824FD240] / [bctrl at +0x320 = sub_825070F0]
```

So `sub_824F7800` calls `sub_824FD240` at offset `+0x38`, BEFORE it
calls `sub_825070F0` at offset `+0x320`.

Question: what does `sub_824F8398`'s caller (one level up,
`sub_821B55D8`) pass as arguments, and what kernel APIs run in
between? We need to trace tid=6's events in canary in the wallclock
window [9.4 s, 9.6 s], which is the install epoch.

**LOC budget**: 0. Pure event-stream analysis on captured canary
jsonl (we already have `canary-jitter-1.jsonl`, 18.7M events).
**Output**: an ordered list of kernel calls just before
`sub_824FD240+0x24` fires. If any are missing in ours, that's a
candidate gap.
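
The window extraction for Step 2 is mechanical once the events are parsed. The sketch below assumes each jsonl record has already been decoded into a `TraceEvent`; the struct and its field names are hypothetical, since the actual trace schema is not shown here, and only the `import.call` kind comes from the text above.

```rust
use std::collections::HashSet;

/// Hypothetical decoded trace record; field names are assumptions.
#[derive(Debug, Clone, PartialEq)]
struct TraceEvent {
    tid: u32,
    t_secs: f64,
    kind: String,
    name: String,
}

/// Ordered kernel-call names for one thread inside a closed wallclock window.
/// Assumes `events` is already in stream order.
fn kernel_calls_in_window(events: &[TraceEvent], tid: u32, start: f64, end: f64) -> Vec<&str> {
    events
        .iter()
        .filter(|e| e.tid == tid && e.kind == "import.call")
        .filter(|e| e.t_secs >= start && e.t_secs <= end)
        .map(|e| e.name.as_str())
        .collect()
}

/// Call names seen in canary's window but never in ours, in first-seen order.
fn missing_in_ours(canary: &[&str], ours: &[&str]) -> Vec<String> {
    let seen: HashSet<&str> = ours.iter().copied().collect();
    let mut reported: HashSet<&str> = HashSet::new();
    canary
        .iter()
        .copied()
        .filter(|name| !seen.contains(name) && reported.insert(*name))
        .map(String::from)
        .collect()
}
```

For the install epoch this would be called with `tid = 6`, `start = 9.4`, and `end = 9.6` on the canary stream, and the result compared against the equivalent list from ours.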

### Step 3 — Mirror the trigger in ours (variable LOC)

Once Step 2 names the missing kernel call(s), implement them in ours
following Phase C cadence (verify per-call return values match canary;
add diff-tool tests; document in memory).

**LOC budget**: depends on what's missing. Could be 10–500 LOC.
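
Verifying that per-call return values match canary amounts to finding the first index at which two call sequences disagree. A minimal sketch follows, assuming both streams have been reduced to `(name, return value)` pairs; `CallRecord` and `first_divergence` are hypothetical names, and treating the return value as a `u32` is an assumption.

```rust
/// Hypothetical reduced record of one kernel call and its return value.
#[derive(Debug, Clone, PartialEq)]
struct CallRecord {
    name: String,
    ret: u32,
}

/// Index of the first record where the two streams differ, or None if identical.
/// If one stream is a strict prefix of the other, the divergence is reported
/// at the length of the shorter stream.
fn first_divergence(canary: &[CallRecord], ours: &[CallRecord]) -> Option<usize> {
    let common = canary.len().min(ours.len());
    (0..common)
        .find(|&i| canary[i] != ours[i])
        .or(if canary.len() != ours.len() { Some(common) } else { None })
}
```

The returned index is also the length of the matched prefix, so the same helper can report progress in the terms the premise uses.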

### Step 4 — Remove the crowbar; verify natural bootstrap (~0 LOC)

With Step 3's fix in place, remove `--force-spawn-workers`. Re-run
ours. If the natural bootstrap chain runs and `draws ≥ 1, swaps ≥ 2`,
we've fixed the bug.

If progression still fails without the crowbar, there's another gap;
re-enter at Step 2 with a refined trigger search.

### Step 5 — Validate gameplay frame parity (~0–50 LOC)

Capture renderer-thread VdSwap counts at 90 s wallclock in both
engines. Target: ours's renderer emits within ±30% of canary's
12,092 VdSwap/90s. If yes: first-draw is reached and sustained.

If ours's renderer emits but at a much lower rate, that's a follow-up
performance issue, not a correctness one. Defer.
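
The ±30% target translates into a simple range check. This is a small sketch with hypothetical names; the only values taken from the text are the canary count of 12,092 swaps per 90 s and the 30% tolerance.

```rust
/// Canary's renderer-thread VdSwap count over 90 s, as quoted in Step 5.
const CANARY_SWAPS_PER_90S: u32 = 12_092;

/// True when `ours_swaps` lies within `tolerance` (a fraction, e.g. 0.30)
/// of `canary_swaps`, inclusive on both ends.
fn within_parity(ours_swaps: u32, canary_swaps: u32, tolerance: f64) -> bool {
    let lo = canary_swaps as f64 * (1.0 - tolerance);
    let hi = canary_swaps as f64 * (1.0 + tolerance);
    let v = ours_swaps as f64;
    v >= lo && v <= hi
}
```

With these numbers the accepted band is roughly 8,465 to 15,719 swaps; a count below the band is the "much lower rate" case that the step defers as a performance follow-up.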

## Expected progression per step

| Step | Expected `swaps` | Expected `draws` | Expected `unique_render_targets` | LOC delta |
|---|---:|---:|---:|---:|
| Pre-roadmap | 1 | 0 | 0 | — |
| Step 1 (crowbar) | 2-N | 1-N | 1+ | ~150 |
| Step 2 (trigger ID) | (unchanged) | (unchanged) | (unchanged) | 0 |
| Step 3 (mirror) | 2-N | 1-N | 1+ | 10-500 |
| Step 4 (decrowbar) | 2-N | 1-N | 1+ | -150 (remove) |
| Step 5 (parity) | 100+ | 100+ | 1-5 | 0-50 |

## What's NOT on this path (explicitly deferred)

1. **Host-audio bridge / XAudio resume**: the XAudio threads (tids
   14/15) spawn suspended and are never resumed in ours. This is real
   but parallel to the worker-cluster wedge. In canary, both threads
   run; in ours, neither runs. Pursuing XAudio fixes does not address
   the graphics-blocking wedge. Defer to a separate
   "post-first-draw" audit cluster.
2. **HID / controller**: Sylpheed's intro movie / title screen play
   without user input. HID is irrelevant for first-draw.
3. **XAM content / save games**: irrelevant for first-draw; the
   intro/title screens don't require save-game enumeration.
4. **Scheduler determinism** (per `scheduler_determinism_plan` /
   Phase D Stages 0-4): null result, off-path. The wedge is upstream
   of any contention. Defer indefinitely or close.
5. **Diff-tool canonicalization** (Phase C-style fixes): saturated on
   moving matched-prefix without moving progression. **Halt** further
   work in this class until Step 4 lands and re-baselines the diff
   workload.
6. **AUDIT-068 host-side install probes**: superseded by AUDIT-068
   Session 4 (writer identified at GUEST PC `sub_824FD240+0x24`).
   The remaining question is *what triggers* `sub_824FD240`, which
   Step 2 addresses.

## Alternative path (rejected)

**Skip the crowbar; do the trigger investigation cold.** Read canary
source for `sub_824FD240` callers, walk upward, identify the trigger.
Why rejected: `sub_824FD240` is GAME code, not canary engine code, so
the file we'd "read" is the disassembly of the XEX. We'd need to
disassemble Sylpheed's RE'd PE and trace the call graph by hand. Per
sylpheed.db, `sub_824FD240`'s static caller is `sub_824F7800+0x38`
(in line with AUDIT-064). But what guest *call* causes `sub_824F7800`
to be invoked is itself a multi-function upstream investigation that
returns to the same wedge cycle. The crowbar bypasses this paradox.

## Risk assessment

- **Step 1 catastrophic failure**: ours's emulator panics or
  segfaults when the force-spawned workers run. Mitigation: gate
  behind the debug-only `--force-spawn-workers` cvar; ensure ours's
  CPU executes the worker entries in the normal sandboxed PPC JIT; if
  they fault on missing guest state, log and exit cleanly.
- **Step 1 "succeeds but draws=0 anyway"**: the workers run but
  ours's tid=13 still doesn't unblock, meaning there is unmodelled
  state beyond just the missing thread spawns. Mitigation: log every
  event the new workers emit; compare with canary's tid=27/28/29
  streams in `canary-jitter-1.jsonl`.
- **Step 3 LOC explosion**: the trigger turns out to be a large
  subsystem (XAM content, XCONFIG, etc.). Mitigation: scope-cut to
  a stub that returns "canary-equivalent" values without full
  implementation.

## Confidence levels

- Step 1 unblocks the wedge if executed correctly: **MEDIUM** (60%).
  Honest assessment: 25 prior audits have not unblocked it through
  natural fixes, so the crowbar approach is novel and the failure
  mode may not match expectations.
- Step 2 identifies a trigger in ≤1 session: **HIGH** (85%). The
  canary jsonl already has the data; analysis is mechanical.
- Step 3 LOC budget ≤500: **MEDIUM** (50%). Depends entirely on Step
  2's answer.
- Step 4 natural bootstrap works post-Step-3: **MEDIUM** (50%).
  There may be additional gaps the crowbar masked.

## Memory hygiene

After Step 1 lands (crowbar binary in place), check that
`xenia-rs/target/release/xenia-rs` builds cleanly with the new cvar.
Verify Phase B `image_canonical_sha256` is updated (the crowbar
changes engine LOC); document the new baseline. Confirm 3× cold
runs produce identical digests with the crowbar enabled.
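
The 3× cold-run check can be expressed as a one-line comparison once each run's digest has been captured as a string. A minimal sketch follows; the function name is hypothetical and it makes no assumption about how the digest text is produced.

```rust
/// True when at least `required_runs` digests were collected and all are identical.
fn digests_reproducible(digests: &[&str], required_runs: usize) -> bool {
    digests.len() >= required_runs && digests.windows(2).all(|pair| pair[0] == pair[1])
}
```

Requiring a minimum run count guards against a single run trivially passing as "reproducible".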

## What "winning" looks like

`xenia-rs check --stable-digest -n 50000000` (or higher cap, e.g.
`-n 500000000` to reach 30 s wallclock) outputs the following, where
`+` and `>=` mark thresholds rather than literal values:

```
{
  "instructions": 50000007,
  "imports": 40390+,
  "draws": >= 1,
  "swaps": >= 2,
  "unique_render_targets": >= 1,
  "shader_blobs_live": >= 1,
  "texture_cache_entries": >= 1
}
```

The values must also be reproducible across 3 cold runs. A non-zero
`draws` value means at least one PM4_TYPE3 DRAW_INDX packet was
emitted by the renderer thread.