
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-05 07:19:08 +02:00

11 KiB


Shortest-path-to-first-gameplay-draw roadmap

Date: 2026-05-21

Read-only investigation; no LOC changes proposed.

Premise: 25+ iterates have advanced matched-prefix 102,168 → 105,128 (+2,960 events) but draws=0, swaps=1, unique_render_targets=0 have not moved. This roadmap proposes a non-canonicalization path forward.

Definitions

First gameplay draw = the first VdSwap call by ours's renderer (the thread spawned at entry 0x822F1EE0, ours's tid analog of canary tid=13) that emits at least one PM4_TYPE3 DRAW_INDX packet into the ringbuffer.
Observable success criterion: draws ≥ 1, swaps ≥ 2, unique_render_targets ≥ 1 in xenia-rs check --stable-digest output, with at least one frame coming from the renderer thread (not the boot-init swap that ours already emits).
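
The success criterion above can be written as a small predicate over the digest. This is a minimal sketch, assuming the `--stable-digest` output has been parsed into a dict whose counter keys are named as in this document; the function name `reached_first_draw` is hypothetical and not part of xenia-rs.

```python
def reached_first_draw(digest: dict) -> bool:
    """Return True when the digest meets the first-gameplay-draw criterion."""
    return (
        digest.get("draws", 0) >= 1
        and digest.get("swaps", 0) >= 2
        and digest.get("unique_render_targets", 0) >= 1
    )


# Baseline stated in the premise: draws=0, swaps=1, no render targets.
baseline = {"draws": 0, "swaps": 1, "unique_render_targets": 0}
print(reached_first_draw(baseline))  # False
```

The predicate only checks the counters; it cannot tell which thread produced the swap, so the "renderer thread, not boot-init" condition still has to be confirmed from the event stream.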

Why current iteration has stalled

The wedge has been mapped and remapped 20+ times. Every audit correctly identifies symptoms; every fix correctly canonicalizes a diff-tool divergence. But the wedge is structurally cyclic: the worker cluster that signals the wait is downstream of the wait completing. Standard "find the divergent kernel call, mirror canary's semantics" has saturated.

Two strategies remain that have NOT been tried at full scope:

(A) Decouple the cycle by faking the worker activation: directly call sub_825070F0 from a host shim, or directly spawn the 4 worker threads with the right ctx, sidestepping the activation chain. This is a crowbar: it doesn't fix the underlying bootstrap bug, but it tests "are the workers functionally correct IF activated." If they signal the wedge and ours then reaches first draw, we know the bug is exclusively in the activation gate, and we can attack just that.
(B) Find what triggers sub_824FD240+0x24's POD-copy in canary. AUDIT-068 Session 4 pinned the install epoch of vtable 0x8200A1E8 to this writer site. But the caller of sub_824FD240 — what guest call leads to it firing — is unidentified. In ours, sub_824FD240 fires 0× because the call chain sub_824F8398 → sub_824F7CD0 → sub_824F7800 → sub_824FD240 is downstream of the tid=13 wedge. So we have circular reasoning again — UNLESS Strategy A is applied first.

The roadmap below uses Strategy A as a wedge-crowbar and Strategy B as the principled fix that follows.

Roadmap

Step 1 — Crowbar: force-spawn the `sub_825070F0` workers (~80–150 LOC)

Action: in xenia-rs, add a debug-only cvar --force-spawn-workers. When set, it waits for a bootstrap checkpoint (e.g., the first VdInitializeRingBuffer return) and then manually spawns 4 ExCreateThread-equivalent guest threads with:

entries 0x82506528 / 0x82506558 / 0x82506588 / 0x825065B8
ctx_ptr = run-determined; allocate a fresh ANON_Class_713383D7-shaped object on the unified heap and write vtable 0x8200A1E8 to slot 0 (mirror the POD-copy at sub_824FD240+0x24)
stack_size 65536, suspended=True initially, then NtResumeThread
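
The spawn parameters above can be captured as plain data before any engine code is written. The sketch below is illustrative only: `WorkerSpawn` and `plan_force_spawn` are hypothetical names, the placeholder ctx address is made up, and sharing one ctx object across all four workers is an assumption that the text does not confirm.

```python
from dataclasses import dataclass

# Guest entry points of the four workers, as listed above.
WORKER_ENTRIES = (0x82506528, 0x82506558, 0x82506588, 0x825065B8)
STACK_SIZE = 65536


@dataclass(frozen=True)
class WorkerSpawn:
    """One forced thread creation (hypothetical descriptor, not an engine type)."""
    entry: int
    ctx_ptr: int
    stack_size: int = STACK_SIZE
    suspended: bool = True  # created suspended, resumed afterwards


def plan_force_spawn(ctx_ptr: int) -> list:
    """Build four descriptors; ASSUMES all workers share one ctx object."""
    return [WorkerSpawn(entry=e, ctx_ptr=ctx_ptr) for e in WORKER_ENTRIES]


# 0x40000000 is a placeholder; the real ctx_ptr is run-determined.
plan = plan_force_spawn(ctx_ptr=0x40000000)
print([hex(w.entry) for w in plan])
```

Keeping the plan as data makes it easy to log exactly what the crowbar attempted if the workers fault.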

Expected effect:

If the workers run correctly and signal the wedge: ours's tid=13 unblocks, tid=1's join completes, normal game-loop begins. draws ≥ 1, swaps ≥ 2.
If the workers fail (e.g., faulting because the ctx object's other fields aren't initialized): we learn what else needs to be installed alongside the vtable.

Failure modes to expect:

The worker entries dispatch via vtable slots 35/36/37/38 of the ANON_Class, so those slots also need to be populated. AUDIT-067 static analysis shows the vtable has 7 entries, yet the worker entries use byte offsets 140/144/148/152 (= slots 35/36/37/38 of a wider vtable, at 4 bytes per slot) per sub_825070F0.md lines 32-37. So we'll need a parent class / derived class layout that accounts for the wider table.
The ctx object also has refcount/header fields that must be initialized — see AUDIT-068 Session 3 finding of 12-byte struct copy {vptr, self, self} followed by refcount=1.
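
Both failure modes come down to byte layout, which can be checked before touching the engine. The sketch below converts the worker dispatch offsets to slot indices and packs the 12-byte {vptr, self, self} header followed by the refcount. It assumes 4-byte guest pointers and big-endian guest byte order, and it further assumes the refcount is a 32-bit word placed immediately after the header at offset 12; that placement is a guess from the AUDIT-068 wording. The helper names and the example address are hypothetical.

```python
import struct

VTABLE = 0x8200A1E8
PTR_SIZE = 4  # 32-bit guest pointers


def slot_for_offset(offset: int) -> int:
    """Convert a vtable byte offset to a slot index."""
    if offset % PTR_SIZE:
        raise ValueError(f"offset {offset} is not pointer-aligned")
    return offset // PTR_SIZE


def build_ctx_header(self_ptr: int) -> bytes:
    """Pack {vptr, self, self} plus refcount=1 (refcount position is ASSUMED)."""
    # ">" selects big-endian, matching the guest PowerPC byte order.
    return struct.pack(">IIII", VTABLE, self_ptr, self_ptr, 1)


print([slot_for_offset(o) for o in (140, 144, 148, 152)])  # [35, 36, 37, 38]
print(build_ctx_header(0x40001000).hex())
```

A 7-entry vtable ends at byte offset 28, so any dispatch through offset 140 or higher reads past it unless the table is really part of a wider layout.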

LOC budget: 80-150 LOC ours-side; 0 LOC canary.

Read-only fallback: if force-spawn fails immediately, we've still captured the failure mode, which is informative.

Risk: high; this is structurally a hack. Acceptable as a diagnostic.

Step 2 — Identify what triggers `sub_824FD240+0x24` in canary (~0 LOC)

Action: with Step 1's crowbar enabled, ours reaches the post-wedge code path. Compare ours and canary on what import.call (kernel API) sequence the caller of sub_824FD240 makes immediately before the POD-copy install.

The caller chain (per AUDIT-064/068) is:

sub_824F8398 → sub_824F7CD0 → sub_824F7800 → [bl at +0x38 = sub_824FD240] / [bctrl at +0x320 = sub_825070F0]

So sub_824F7800 calls sub_824FD240 at offset +0x38, BEFORE it calls sub_825070F0 at offset +0x320.

Question: what does sub_824F8398's caller (one level up, sub_821B55D8) pass as arguments, and what kernel APIs run in between? We need to trace tid=6's events in canary in the wallclock window [9.4 s, 9.6 s] — the install epoch.

LOC budget: 0. Pure event-stream analysis on captured canary jsonl (we already have canary-jitter-1.jsonl, 18.7M events). Output: an ordered list of kernel calls just before sub_824FD240+0x24 fires. If any are missing in ours, that's a candidate gap.
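
The window extraction described above could look like the following sketch. The event field names (`t`, `tid`, `kind`, `name`) are assumptions about the jsonl schema, not confirmed by this document, and the four-line sample is synthetic data with placeholder call names.

```python
import io
import json


def calls_in_window(lines, tid, t_start, t_end, kind="import.call"):
    """Return the names of matching events, preserving stream order."""
    names = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the capture
        event = json.loads(line)
        if (
            event.get("tid") == tid
            and event.get("kind") == kind
            and t_start <= event.get("t", float("-inf")) <= t_end
        ):
            names.append(event.get("name"))
    return names


# Synthetic four-event stream standing in for canary-jitter-1.jsonl.
sample = io.StringIO(
    '{"t": 9.39, "tid": 6, "kind": "import.call", "name": "CallA"}\n'
    '{"t": 9.45, "tid": 6, "kind": "import.call", "name": "CallB"}\n'
    '{"t": 9.50, "tid": 13, "kind": "import.call", "name": "CallC"}\n'
    '{"t": 9.55, "tid": 6, "kind": "import.call", "name": "CallD"}\n'
)
install_epoch_calls = calls_in_window(sample, tid=6, t_start=9.4, t_end=9.6)
print(install_epoch_calls)  # ['CallB', 'CallD']
```

Because the function iterates line by line, an open file handle can be passed in place of the `StringIO`, so an 18.7M-event capture never has to be loaded into memory at once.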

Step 3 — Mirror the trigger in ours (variable LOC)

Once Step 2 names the missing kernel call(s), implement them in ours following Phase C cadence (verify per-call return values match canary; add diff-tool tests; document in memory).

LOC budget: depends on what's missing. Could be 10–500 LOC.
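
The per-call verification in this step amounts to finding the first position where ours and canary disagree. A minimal sketch, assuming each engine's calls have been reduced to `(name, return_value)` tuples; that representation and the function name `first_divergence` are hypothetical, and this is not the project's existing diff tool.

```python
def first_divergence(ours, canary):
    """Return (index, ours_item, canary_item) at the first mismatch, else None."""
    for i, (a, b) in enumerate(zip(ours, canary)):
        if a != b:
            return i, a, b
    if len(ours) != len(canary):
        # One stream is a strict prefix of the other.
        i = min(len(ours), len(canary))
        return (
            i,
            ours[i] if i < len(ours) else None,
            canary[i] if i < len(canary) else None,
        )
    return None


# Synthetic streams: same call names, different return value at index 1.
ours_calls = [("CallB", 0), ("CallD", 0)]
canary_calls = [("CallB", 0), ("CallD", 1), ("CallE", 0)]
print(first_divergence(ours_calls, canary_calls))  # (1, ('CallD', 0), ('CallD', 1))
```

A `None` on the ours side of the result means ours stopped early, which is the shape a missing kernel call would take.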

Step 4 — Remove the crowbar; verify natural bootstrap (~0 LOC)

With Step 3's fix in place, remove --force-spawn-workers. Re-run ours. If the natural bootstrap chain runs and draws ≥ 1, swaps ≥ 2, we've fixed the bug.

If progression still fails without the crowbar, there's another gap; re-enter at Step 2 with a refined trigger search.

Step 5 — Validate gameplay frame parity (~0–50 LOC)

Capture renderer-thread VdSwap counts at 90 s wallclock in both engines. Target: ours's renderer emits within ±30% of canary's 12,092 VdSwap/90s. If yes: first-draw is reached and sustained.

If ours's renderer emits but at a much lower rate, that's a follow-up performance issue, not a correctness one. Defer.
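
The ±30% target translates to a concrete acceptance band of 8,465 to 15,719 swaps per 90 s. The sketch below encodes that check; the function and constant names are hypothetical, while the reference count and tolerance are the ones stated above.

```python
CANARY_SWAPS_PER_90S = 12_092  # canary reference from this roadmap
TOLERANCE = 0.30               # the ±30% band


def within_parity(ours_swaps, reference=CANARY_SWAPS_PER_90S, tol=TOLERANCE):
    """Return True when ours's swap count lies within ±tol of the reference."""
    return abs(ours_swaps - reference) <= tol * reference


for count in (8_464, 8_465, 15_719, 15_720):
    print(count, within_parity(count))
```

A count below the band would be the deferred performance issue described above; a count above it would more likely point to a measurement problem.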

Expected progression per step

| Step | Expected `swaps` | Expected `draws` | Expected `unique_render_targets` | LOC delta |
|---|---|---|---|---|
| Pre-roadmap | 1 | 0 | 0 | n/a |
| Step 1 (crowbar) | 2-N | 1-N | 1+ | ~150 |
| Step 2 (trigger ID) | (unchanged) | (unchanged) | (unchanged) | 0 |
| Step 3 (mirror) | 2-N | 1-N | 1+ | 10-500 |
| Step 4 (decrowbar) | 2-N | 1-N | 1+ | -150 (remove) |
| Step 5 (parity) | 100+ | 100+ | 1-5 | 0-50 |

What's NOT on this path (explicitly deferred)

Host-audio bridge / XAudio resume: the XAudio thread tids 14/15 spawning suspended-and-never-resumed in ours is real but parallel to the worker-cluster wedge. In canary, both threads run; in ours, neither runs. Pursuing XAudio fixes does not address the graphics-blocking wedge. Defer to a separate "post-first-draw" audit cluster.
HID / controller: Sylpheed's intro movie / title screen play without user input. HID is irrelevant for first-draw.
XAM content / save games: irrelevant for first-draw; the intro/title screens don't require save-game enumeration.
Scheduler determinism (per scheduler_determinism_plan / Phase D Stages 0-4): null result, off-path. The wedge is upstream of any contention. Defer indefinitely or close.
Diff-tool canonicalization (Phase C-style fixes): saturated on moving matched-prefix without moving progression. Halt further work in this class until Step 4 lands and re-baselines the diff workload.
AUDIT-068 host-side install probes: superseded by AUDIT-068 Session 4 (writer identified at GUEST PC sub_824FD240+0x24). The remaining question is what triggers sub_824FD240, which Step 2 addresses.

Alternative path (rejected)

Skip the crowbar; do the trigger investigation cold. Read canary source for sub_824FD240 callers, walk upward, identify the trigger.

Why rejected: sub_824FD240 is GAME code, not canary engine code, so the file we'd "read" is the disassembly of the XEX. We'd need to disassemble Sylpheed's RE'd PE and trace the call graph by hand. Per sylpheed.db, sub_824FD240's static caller is sub_824F7800+0x38 (in line with AUDIT-064). But what guest call causes sub_824F7800 to be invoked is itself a multi-function upstream investigation that returns to the same wedge cycle. The crowbar bypasses this paradox.

Risk assessment

Step 1 catastrophic failure: ours's emulator panics or segfaults when the force-spawned workers run. Mitigation: gate behind the debug-only --force-spawn-workers cvar; ensure ours's CPU executes the worker entries in the normal sandboxed PPC JIT; if they fault on missing guest state, log and exit cleanly.
Step 1 "succeeds but draws=0 anyway": the workers run but ours's tid=13 still doesn't unblock — there's an unmodelled state beyond just the missing thread spawns. Mitigation: log every event the new workers emit; compare with canary's tid=27/28/29 streams in canary-jitter-1.jsonl.
Step 3 LOC explosion: the trigger turns out to be a large subsystem (XAM content, XCONFIG, etc.). Mitigation: scope-cut to a stub that returns "canary-equivalent" values without full implementation.

Confidence levels

Step 1 unblocks the wedge if executed correctly: MEDIUM (60%). Honest assessment: 25 prior audits have not unblocked it through natural fixes, so the crowbar approach is novel and the failure mode may not match expectations.
Step 2 identifies a trigger in ≤1 session: HIGH (85%) — the canary jsonl already has the data; analysis is mechanical.
Step 3 LOC budget ≤500: MEDIUM (50%) — depends entirely on Step 2's answer.
Step 4 natural bootstrap works post-Step-3: MEDIUM (50%) — there may be additional gaps the crowbar masked.

Memory hygiene

After Step 1 lands (crowbar binary in place), check that xenia-rs/target/release/xenia-rs builds cleanly with the new cvar. Verify Phase B image_canonical_sha256 is updated (the crowbar changes engine LOC); document the new baseline. Confirm 3× cold runs produce identical digests with the crowbar enabled.
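
The "3× cold runs produce identical digests" check can be automated by fingerprinting each run's digest. This is a sketch under assumptions: each run's `--stable-digest` output is already parsed into a dict, and the fingerprint is a SHA-256 over a canonical JSON encoding. It is separate from, and not a replacement for, the Phase B image_canonical_sha256; both helper names are hypothetical.

```python
import hashlib
import json


def digest_fingerprint(digest: dict) -> str:
    """Hash a digest in canonical form so key order cannot affect the result."""
    canonical = json.dumps(digest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def runs_are_identical(digests) -> bool:
    """Return True when every run yields the same fingerprint (needs >= 1 run)."""
    fingerprints = {digest_fingerprint(d) for d in digests}
    return len(fingerprints) == 1


# Synthetic digests: the second differs only in key order, the third in value.
run_a = {"draws": 1, "swaps": 2}
run_b = {"swaps": 2, "draws": 1}
run_c = {"draws": 1, "swaps": 3}
print(runs_are_identical([run_a, run_b]))         # True
print(runs_are_identical([run_a, run_b, run_c]))  # False
```

Sorting keys before hashing keeps a reordering of the output fields from being reported as non-determinism.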

What "winning" looks like

xenia-rs check --stable-digest -n 50000000 (or higher cap, e.g. -n 500000000 to reach 30 s wallclock) outputs:

{
  "instructions": 50000007,
  "imports": 40390+,
  "draws": >= 1,
  "swaps": >= 2,
  "unique_render_targets": >= 1,
  "shader_blobs_live": >= 1,
  "texture_cache_entries": >= 1
}

…and the values are reproducible across 3 cold runs. A non-zero draws value means at least one PM4_TYPE3 DRAW_INDX packet was emitted by the renderer thread.
