
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-05 07:19:08 +02:00


Review A — boot-state review and shortest-path roadmap

Session type: PLAN-only. No engine LOC changes; no canary instrumentation changes. Read-only investigation across the existing audit chain artifacts.

Date: 2026-05-21

Companion documents (in this directory):

canary-boot-trajectory.md — canary's call chain from entry_point to first gameplay draw, with wallclock timestamps.
ours-wedge-localization.md — precise where-ours-stops, in graph terms.
shortest-path-roadmap.md — 3-5 step roadmap with expected progression delta per step.
methodology-assessment.md — alternative metric proposal.

This plan.md summarizes the five framing questions with answers backed by file:line citations.

Q1 — What is "first draw" in canary's Sylpheed boot?

Two distinct "draws" must be disambiguated.

Q1.a: First boot-init `VdSwap` (the swap=1 event)

Canary's tid=6 (guest main) emits one VdSwap at ~9.5 s wallclock, immediately after the GPU subsystem init sequence VdInitializeEngines → VdInitializeRingBuffer → VdEnableRingBufferRPtrWriteBack → VdSetGraphicsInterruptCallback → VdSetSystemCommandBufferGpuIdentifierAddress → VdGetSystemCommandBuffer. This swap publishes the boot framebuffer and contains no draw packets.

Ours also reaches this swap — visible in phase-w-wedge-reattack/ours-postfix.jsonl at idx 105283 (host_ns 496,276,229). This is what produces ours's swaps=1 metric.

Both engines reach this point. It is NOT the gate.

Q1.b: First gameplay `VdSwap` (the swap≥2 / draws≥1 event)

Canary's renderer tid=13 (entry 0x822F1EE0, spawned suspended at 1.671 s) wakes after the sub_825070F0 worker fan-out at host_ns ≈ 10.383 s and begins emitting VdGetSystemCommandBuffer / VdSwap pairs at ~150 fps. Canary's tid=13 emits 12,092 VdSwap calls in the 90-s window (per phase-nonmatch-investigation/canary-tid-profiles.md:21).

The first of these is the first gameplay draw, fired at ~10.7 s wallclock — about 1.2 s after the sub_825070F0 fan-out triggers the worker cluster.
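
As a quick consistency check on the ~150 fps figure, the sketch below divides the 12,092 swaps by the part of the 90 s window that follows the first gameplay swap. It assumes the swaps are spread over roughly 10.7 s to 90 s, which the numbers above suggest but do not state outright; `mean_swap_rate` is a hypothetical helper name.

```python
def mean_swap_rate(total_swaps: int, window_end_s: float, first_swap_s: float) -> float:
    """Average swaps per second between the first gameplay swap and the window end."""
    active_s = window_end_s - first_swap_s
    if active_s <= 0:
        raise ValueError("window must end after the first swap")
    return total_swaps / active_s


# Figures from this section: 12,092 swaps, 90 s window, first gameplay swap at ~10.7 s.
rate = mean_swap_rate(total_swaps=12_092, window_end_s=90.0, first_swap_s=10.7)
print(f"{rate:.1f} swaps/s")
```

The result lands close to the quoted ~150 fps, so the two figures agree under this assumption.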

Pre-conditions canary establishes before this point (per canary-boot-trajectory.md):

Vtable 0x8200A1E8 of ANON_Class_713383D7 installed at host_ns ≈ 9.4-9.6 s via POD-copy at GUEST PC sub_824FD240+0x24 (per project_audit_068_session4_2026_05_20).
Activation chain sub_822F1AA8 → sub_82173990 → sub_821746B0 → sub_82172BA0 → sub_821B55D8 → sub_824F8398 → sub_824F7CD0 → sub_824F7800 → bctrl vtable[1] = sub_825070F0 fires on tid=6.
sub_825070F0 spawns 4 worker threads with entries 0x82506528/58/88/B8 and shared ctx 0xBCE251C0.
Workers (canary tids 27/28/29) emit signals that unwedge the sub_821CB030 Event waits across the cache-file IO completion chain.
Renderer tid=13's body (entered earlier but blocked on a tid=14/15 XAudio-coordinated event) unblocks; per-frame VdGetSystemCommandBuffer / VdSwap loop begins.

Q2 — What is ours's actual progress, and what's the wedge root cause?

Ours stops at the first wait in the activation chain. Specifically:

tid=1 (main) wedged at sub_82173990+0x2D4 (PC 0x824ac578 = do_wait_single) on handle 0x12c8 = Thread(id=13) — waiting for the renderer's thread handle to signal (which happens only when tid=13 calls ExTerminateThread).
tid=13 (renderer / cache-IO worker) wedged at sub_821CB030+0x1B0 on handle 0x12d0 = Event/Auto, created by itself via NtCreateEvent at sub_821CB030+0x128. signals=0, wakes=0 — <NO_SIGNALS_DESPITE_WAITS>.
sub_825070F0 fires 0× at any horizon probed.

Citation: phase-w-wedge-reattack/halt-on-deadlock-dump.txt + phase-w-wedge-reattack/current-state.md.
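
The `<NO_SIGNALS_DESPITE_WAITS>` condition can be checked mechanically. The sketch below is a minimal illustration, assuming a hypothetical event stream of dicts with `op` and `handle` keys; the real format of halt-on-deadlock-dump.txt may differ, and handle 0x1100 is invented purely as a contrasting, healthy case.

```python
from collections import Counter


def unsignalled_waits(events):
    """Return handles that were waited on at least once but never signalled."""
    waits, signals = Counter(), Counter()
    for ev in events:
        if ev["op"] == "wait":
            waits[ev["handle"]] += 1
        elif ev["op"] == "signal":
            signals[ev["handle"]] += 1
    # Counter returns 0 for handles that never appeared as a signal.
    return sorted(h for h in waits if signals[h] == 0)


# Hypothetical sample: the two wedged handles named above, plus one healthy handle.
SAMPLE_EVENTS = [
    {"tid": 1, "op": "wait", "handle": 0x12C8},    # main waits on Thread(id=13)
    {"tid": 13, "op": "wait", "handle": 0x12D0},   # renderer waits on its own Event/Auto
    {"tid": 5, "op": "wait", "handle": 0x1100},    # invented handle, gets signalled
    {"tid": 6, "op": "signal", "handle": 0x1100},
]

stuck = unsignalled_waits(SAMPLE_EVENTS)
print([hex(h) for h in stuck])
```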

Root cause (at one structural level deeper than the wedge symptom)

Per AUDIT-069 Session 5 (the most recent measurement):

Canary fires 414 NtReleaseSemaphore calls on the work-queue semaphore in the 90-s window.
Ours fires 99 (24%).
Breakdown: Worker (382 vs 90), Main (7 vs 8), Other producers (25 vs 1).
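
The per-class coverage behind these counts can be worked out directly. The sketch below uses only the numbers listed above; the dictionary keys and the `coverage` helper are illustrative names, not anything taken from the audit tooling.

```python
# NtReleaseSemaphore calls on the work-queue semaphore, 90 s window (AUDIT-069 S5).
CANARY = {"worker": 382, "main": 7, "other": 25}
OURS = {"worker": 90, "main": 8, "other": 1}


def coverage(ours, canary):
    """Fraction of canary's release calls that ours reproduces, per producer class."""
    return {k: ours[k] / canary[k] for k in canary}


total_ratio = sum(OURS.values()) / sum(CANARY.values())
per_class = coverage(OURS, CANARY)

print(f"total: {total_ratio:.0%}")
for name, frac in per_class.items():
    print(f"{name}: {frac:.0%}")
```

The breakdown sums to the stated totals (414 and 99), and the "other" class is the only one where ours falls to a few percent of canary.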

The "other producers (25 vs 1)" gap is the load-bearing discrepancy. Canary records 24 more work-semaphore releases from other producers during bootstrap than ours does, issued by threads that never run in ours. These correspond to:

The 4 sub_825070F0 workers (canary tids 27/28/29 + 1) — absent in ours.
XAudio render threads (canary tids 14/15, spawned suspended in both engines, resumed only in canary).
The secondary spawn burst at 1.94-2.15 s (canary tids 18-25) — 8 helpers including file-IO and NtWaitForMultipleObjectsEx workers — absent in ours.

The ONE structural issue

Ours never reaches sub_825070F0 because the activation chain that calls it is downstream of tid=13's wedge; and tid=13's wedge is downstream of the worker cluster activation; and the worker cluster activation is sub_825070F0. This is a self-referential lock.
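
The self-referential lock can be pictured as a cycle in a dependency graph. The sketch below is a simplified model, not a reconstruction of the engine's state: node labels paraphrase the paragraph above, and the acyclic variant encodes the (unconfirmed) hypothesis that an external producer such as the XAudio threads satisfies tid=13's wait.

```python
def find_cycle(depends_on):
    """Return one dependency cycle as a list of nodes, or None if the graph is acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {n: WHITE for n in depends_on}
    stack = []

    def visit(node):
        color[node] = GREY
        stack.append(node)
        for nxt in depends_on.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:  # back edge: nxt is already on the current path
                return stack[stack.index(nxt):] + [nxt]
            if state == WHITE:
                found = visit(nxt)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for n in list(depends_on):
        if color[n] == WHITE:
            found = visit(n)
            if found:
                return found
    return None


# "X depends on Y" edges paraphrasing the lock described above.
WEDGE = {
    "sub_825070F0 fires": ["activation chain advances"],
    "activation chain advances": ["tid=13 unwedged"],
    "tid=13 unwedged": ["worker cluster active"],
    "worker cluster active": ["sub_825070F0 fires"],
}
# Hypothetical variant: an external producer unblocks tid=13 instead.
EXTERNAL_PRODUCER = dict(WEDGE, **{"tid=13 unwedged": ["XAudio threads resumed"]})

cycle = find_cycle(WEDGE)
print(" -> ".join(cycle))
print(find_cycle(EXTERNAL_PRODUCER))
```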

Canary breaks the lock because some part of the bootstrap pre-activates the producers (probably via XAudio thread resume at 1.726 s, which then runs ahead, populates the work queue, signals events, etc.). Ours never resumes the XAudio threads — they're spawned suspended and stay that way.

The single highest-leverage gap is the XAudio thread resume, because (a) it happens early (1.726 s in canary vs. ours's wedge, which sets in around 1.4 s; i.e. the resume should happen before the wedge), (b) it activates the dominant event producers, and (c) AUDIT-069 S5's "other producers 25 vs 1" finding implicates exactly this class of thread.

Q3 — Shortest-path-to-first-draw roadmap

Five steps (full detail in shortest-path-roadmap.md):

Step 1 (~80-150 LOC, ours-side): add --force-spawn-workers cvar that crowbars sub_825070F0 activation by directly spawning the 4 worker threads with the right ctx after VdInitializeRingBuffer returns. Tests "are the workers functionally correct if activated" and "does activating them unwedge sub_821CB030."
Step 2 (~0 LOC): with Step 1 active, mine the canary jsonl for the kernel-call sequence on tid=6 in the wallclock window [9.4 s, 9.6 s] (the install epoch). Identify what guest call triggers sub_824FD240+0x24's POD-copy of the vtable in canary.
Step 3 (~10-500 LOC, depending on what Step 2 finds): mirror that trigger in ours — likely a missing kernel-import return value or a missing post-condition that the trigger inspects.
Step 4 (~0 LOC; remove crowbar): re-test ours without --force-spawn-workers. Verify natural bootstrap reaches sub_825070F0 activation.
Step 5 (~0-50 LOC): measure renderer-thread VdSwap rate over 90 s wallclock; target ±30% of canary's 12,092 calls.

Expected delta:

| After step | `swaps` | `draws` | `unique_render_targets` |
|---|---|---|---|
| Pre | 1 | 0 | 0 |
| Step 1 (crowbar) | 2+ | 1+ | 1+ |
| Step 4 (decrowbar) | 2+ | 1+ | 1+ |
| Step 5 (parity) | 100+ | 100+ | 1-5 |
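
Step 5's ±30% target can be expressed as a small acceptance check. This is a sketch under the assumption that the tolerance is symmetric and relative to canary's count; `within_tolerance` is a hypothetical helper, not existing tooling.

```python
CANARY_SWAPS_90S = 12_092  # canary tid=13 VdSwap calls in the 90 s window


def within_tolerance(measured, reference, tolerance=0.30):
    """True when measured is within +/- tolerance (relative) of reference."""
    return abs(measured - reference) <= tolerance * reference


low = CANARY_SWAPS_90S * (1 - 0.30)
high = CANARY_SWAPS_90S * (1 + 0.30)
print(f"accept between {low:.0f} and {high:.0f} swaps")
print(within_tolerance(1, CANARY_SWAPS_90S))  # ours today, swaps=1
```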

Q4 — What's NOT on the shortest path

Explicitly deferred (full rationale in shortest-path-roadmap.md):

Audio (host-audio-* / XAudio implementation): even though XAudio thread resume MAY be the trigger from Q2, ours's existing XAudio shim is sufficient for the workers to bootstrap if they receive the right kernel-call sequence. Full XAudio implementation is beyond first-draw scope.
HID — Sylpheed's intro/title screens are auto-advance; no input needed.
XAM content / save games — not on first-draw path.
Scheduler determinism work (Phase D Stages 0-4 and beyond) — null result; the wedge is upstream of contention scheduling. Close or indefinitely defer.
Diff-tool canonicalization (Phase C+N for N > 25) — saturated on matched-prefix without progression; halt this work class until Step 4 lands and the workload re-baselines.
AUDIT-068 host-side install probes — superseded by AUDIT-068 Session 4 finding (writer is GUEST PC, not host). The followup question is what triggers the guest code path, which Step 2 addresses through cheaper means.

Q5 — Methodology assessment

Current methodology relied on matched-prefix as a progression proxy. This assumption is now empirically falsified: +2,960 events of matched-prefix advancement produced 0 units of progression (swaps=1, draws=0 across 25+ iterates).

Proposed alternative metric

Option 6 (composite progression_score):

progression_score = 1 * swaps + 10 * draws + 100 * unique_render_targets
                  + 0.001 * matched_prefix

Continuous gradient; honest about wedge-solving vs. canonicalization priority. Requires ~10 LOC to add to digest.json.
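
A minimal sketch of the composite score follows, using the weights from the formula above. The function signature is an assumption; how the four inputs are named inside digest.json is not specified here.

```python
def progression_score(swaps, draws, unique_render_targets, matched_prefix):
    """Composite score: progression terms dominate, matched-prefix is a small tiebreaker."""
    return (1 * swaps
            + 10 * draws
            + 100 * unique_render_targets
            + 0.001 * matched_prefix)


# +2,960 matched-prefix events with no new swaps or draws (the falsified proxy).
canonicalization_only = progression_score(1, 0, 0, 2_960)
# Roadmap Step 1 lower bound: swaps=2, draws=1, one render target.
first_draw = progression_score(2, 1, 1, 0)

print(canonicalization_only, first_draw)
```

Under these weights, a single draw with one render target outweighs any plausible matched-prefix gain, which is the priority ordering the metric is meant to encode.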

Discipline: tag every iterate as either "canonicalization only — no progression" or "progression". Cap at 5 consecutive canonicalization-only iterates before mandatory pivot to wedge-attack work.
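
The cap can be enforced with a few lines over the iterate log. The sketch assumes each iterate carries one of two string tags, `"canonicalization-only"` or `"progression"`; those literal tag values and the `must_pivot` name are illustrative.

```python
def must_pivot(tags, cap=5):
    """True when the trailing run of canonicalization-only iterates has reached the cap."""
    run = 0
    for tag in reversed(tags):
        if tag != "canonicalization-only":
            break
        run += 1
    return run >= cap


history = ["progression"] + ["canonicalization-only"] * 5
print(must_pivot(history))
```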

New reading-error #39

#39 (matched-prefix as progression proxy): matched-prefix measures engine-to-engine divergence point, NOT game-to-game functional gap. When the wedge is on a different thread than the matched-prefix anchor thread, advancing matched-prefix is orthogonal to unwedging. Future audits MUST distinguish "ours's tid-X diverges from canary's tid-Y" from "ours's tid-X is blocked because tid-Z is wedged", and target the wedge directly when present.

Counterintuitive findings (anti-anchoring)

Per Tripstones in the task brief:

1. Both engines reach `swaps=1`; ours is NOT behind on the boot swap.

The shared boot-init VdSwap fires in both engines, so ours's swaps=1 metric marks the same milestone canary reaches, not a deficit. The divergence is NOT "ours can't do the first swap"; it's "ours can't do the SECOND through Nth swap (the gameplay loop)".

2. Tripstone 4 verified: canary does reach gameplay draws, ours does not.

canary-jitter-1.jsonl shows 12,092 VdSwap calls on canary tid=13 in 90 s wallclock — definitively in the gameplay rendering loop, not pre-first-draw. Ours's tid analogous to canary tid=13 emits ~80 events total before wedging — definitively before gameplay starts. The "both engines pre-first-draw" hypothesis is FALSE.

3. The matched-prefix metric is on the WRONG thread.

Matched-prefix tracks tid=6 (canary) vs tid=1 (ours), the main threads. But the wedge is on ours's tid=13, the renderer thread (canary's renderer is also tid=13). Tid=1's matched-prefix can advance 105,128 events without ever touching the wedge.

4. The "boot-state-machine" framing is misleading.

There's no monolithic boot state machine. There are ~28 threads in canary, each running their own lifecycle, communicating via shared kernel objects. The bottleneck isn't a state transition; it's a THREAD ACTIVATION GAP.

5. AUDIT-069 Session 5's "other producers 25 vs 1" is the key forensic discovery, more than AUDIT-068's vtable install epoch.

The vtable install IS interesting but it's downstream of the producer gap. Producers must be running to populate the work queue, which lets the workers run, which signals the events behind the wedge, which lets the activation chain continue, which calls sub_824FD240+0x24, which writes the vtable. Fixing the vtable install in isolation (e.g., via a host-side mem-write hack) doesn't help if no producer is feeding work to the workers.

Cascade prediction confidence

A — canary boot trajectory characterized: DONE, HIGH (canary-jitter-1.jsonl provides direct evidence).
B — ours's wedge root-cause localized deeper than "sub_821CB030 waits": DONE, MEDIUM-HIGH (AUDIT-069 S5 "other producers 25 vs 1" finding).
C — shortest-path roadmap with ≤5 steps: DONE, MEDIUM (5 steps; Step 1 confidence ~60%).
D — alternative metric proposed: DONE, HIGH (Option 6 composite, plus reading-error #39).

Open questions / known unknowns

What is the bootstrap trigger for canary's sub_824FD240+0x24? Roadmap Step 2 addresses. Could be answered in <1 session of canary jsonl analysis.
Does Step 1's crowbar produce a clean wedge-unblock, or does it reveal additional unmodelled state in the ctx object? Empirical; testable in one session.
Are canary's XAudio threads (tids 14/15) the actual missing producer, or are they downstream of the same trigger? Worth a targeted probe before Step 1; ~50 LOC ours-side to log NtResumeThread on the XAudio entry PCs.
Will the AUDIT-067 "vtable install is host-side" finding resurface? No — AUDIT-068 S4 falsified this; the writer is GUEST PC sub_824FD240+0x24. The "host-side" framing was a mis-read of the POD-copy semantics (reading-error #36).

Recommended next action

Dispatch a "progression iterate" implementing Step 1 of the roadmap (--force-spawn-workers crowbar, ~80-150 LOC ours-side). This is a high-variance, high-reward iterate; expected outcome is either swaps ≥ 2, draws ≥ 1 (success — wedge structurally isolated to thread activation) or an informative failure mode (e.g., worker faults at first vtable bctrl indicating additional state needed in ctx object). Time-box: 1 session, max 2h.

If Step 1 succeeds in ANY way (even if draws stays 0), the next iterate is Step 2 (kernel-call sequence mining in canary-jitter-1.jsonl). This step has minimal risk and uses existing tooling.
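
The Step 2 mining pass could look roughly like the sketch below. It assumes each jsonl record is a JSON object with `tid`, `host_ns`, and `name` fields; `name` in particular is a guess at the schema, and the sample records are invented to exercise the filter, not taken from canary-jitter-1.jsonl.

```python
import json


def calls_in_window(lines, tid, start_s, end_s):
    """Return (host_ns, name) pairs for one thread inside a wallclock window, in time order."""
    start_ns, end_ns = round(start_s * 1e9), round(end_s * 1e9)
    out = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        rec = json.loads(line)
        if rec.get("tid") == tid and start_ns <= rec.get("host_ns", -1) <= end_ns:
            out.append((rec["host_ns"], rec.get("name")))
    out.sort(key=lambda pair: pair[0])
    return out


# Invented records; only the field layout matters for this illustration.
SAMPLE_LINES = [
    '{"tid": 6, "host_ns": 9550000000, "name": "call_b"}',
    '{"tid": 6, "host_ns": 9450000000, "name": "call_a"}',
    '{"tid": 13, "host_ns": 9500000000, "name": "other_thread"}',
    '{"tid": 6, "host_ns": 9700000000, "name": "too_late"}',
    '',
]

# Install-epoch window from Step 2: tid=6, [9.4 s, 9.6 s].
hits = calls_in_window(SAMPLE_LINES, tid=6, start_s=9.4, end_s=9.6)
print(hits)
```

For the real trace, the same function can be fed an open file handle, since iterating a text file yields one line at a time.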

If Step 1 fails completely (panic / segfault unrecoverable), revert the crowbar and reframe: the wedge may be in ours's kernel-handler implementations themselves, not just bootstrap activation. At that point a deeper Path β engine investigation is unavoidable.

Memory hygiene note

This review is read-only. xenia-rs HEAD unchanged. canary HEAD unchanged. sylpheed.db unchanged. No new artifacts beyond this directory.

After dispatching Step 1, future memory entries should adopt the new progression_score + tagging discipline outlined in methodology-assessment.md.
