Review A — boot-state review and shortest-path roadmap
Session type: PLAN-only. No engine LOC changes; no canary instrumentation changes. Read-only investigation across the existing audit chain artifacts.
Date: 2026-05-21
Companion documents (in this directory):
- canary-boot-trajectory.md: canary's call chain from entry_point to first gameplay draw, with wallclock timestamps.
- ours-wedge-localization.md: precise where-ours-stops, in graph terms.
- shortest-path-roadmap.md: 3-5 step roadmap with expected progression delta per step.
- methodology-assessment.md: alternative metric proposal.
This plan.md summarizes the five framing questions with answers
backed by file:line citations.
Q1 — What is "first draw" in canary's Sylpheed boot?
Two distinct "draws" must be disambiguated.
Q1.a: First boot-init VdSwap (the swap=1 event)
Canary's tid=6 (guest main) emits one VdSwap at ~9.5 s
wallclock, immediately after the GPU subsystem init sequence
VdInitializeEngines → VdInitializeRingBuffer → VdEnableRingBufferRPtrWriteBack → VdSetGraphicsInterruptCallback → VdSetSystemCommandBufferGpuIdentifierAddress → VdGetSystemCommandBuffer.
This swap publishes the boot framebuffer and contains no draw packets.
Ours also reaches this swap — visible in
phase-w-wedge-reattack/ours-postfix.jsonl at idx 105283 (host_ns
496,276,229). This is what produces ours's swaps=1 metric.
Both engines reach this point. It is NOT the gate.
Q1.b: First gameplay VdSwap (the swap≥2 / draws≥1 event)
Canary's renderer tid=13 (entry 0x822F1EE0, spawned suspended at
1.671 s) wakes after the sub_825070F0 worker fan-out at host_ns
≈ 10.383 s and begins emitting VdGetSystemCommandBuffer /
VdSwap pairs at ~150 fps. Canary's tid=13 emits 12,092
VdSwap calls in the 90-s window (per
phase-nonmatch-investigation/canary-tid-profiles.md:21).
The first of these is the first gameplay draw, fired at ~10.7 s
wallclock — about 1.2 s after the sub_825070F0 fan-out triggers
the worker cluster.
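As a sanity check on the ~150 fps figure, the arithmetic can be sketched as below. This assumes the 90-s window starts at wallclock 0 and that all 12,092 tid=13 swaps fall after the ~10.7 s first gameplay draw; both are assumptions rather than measured facts, and the names are illustrative.

```python
# Figures are taken from the text above; the window alignment is an assumption.
WINDOW_S = 90.0        # capture window length in seconds
FIRST_DRAW_S = 10.7    # approximate wallclock of the first gameplay VdSwap
SWAPS_TID13 = 12_092   # VdSwap calls emitted by canary tid=13


def mean_swap_rate(swaps: int, window_s: float, start_s: float) -> float:
    """Mean swaps per second over the active part of the window."""
    active_s = window_s - start_s
    if active_s <= 0:
        raise ValueError("start_s must fall inside the window")
    return swaps / active_s


rate = mean_swap_rate(SWAPS_TID13, WINDOW_S, FIRST_DRAW_S)
print(f"{rate:.1f} swaps/s over {WINDOW_S - FIRST_DRAW_S:.1f} s")
```

Under these assumptions the mean rate lands close to the quoted ~150 fps; averaging over the full 90 s instead would give a lower figure.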
Pre-conditions canary establishes before this point (per
canary-boot-trajectory.md):
- Vtable 0x8200A1E8 of ANON_Class_713383D7 installed at host_ns ≈ 9.4-9.6 s via POD-copy at GUEST PC sub_824FD240+0x24 (per project_audit_068_session4_2026_05_20).
- Activation chain sub_822F1AA8 → sub_82173990 → sub_821746B0 → sub_82172BA0 → sub_821B55D8 → sub_824F8398 → sub_824F7CD0 → sub_824F7800 → bctrl vtable[1] = sub_825070F0 fires on tid=6.
- sub_825070F0 spawns 4 worker threads with entries 0x82506528/58/88/B8 and shared ctx 0xBCE251C0.
- Workers (canary tids 27/28/29) emit signals that unwedge the sub_821CB030 Event waits across the cache-file IO completion chain.
- Renderer tid=13's body (entered earlier but blocked on a tid=14/15 XAudio-coordinated event) unblocks; per-frame VdGetSystemCommandBuffer/VdSwap loop begins.
Q2 — What is ours's actual progress, and what's the wedge root cause?
Ours stops at the first wait in the activation chain. Specifically:
- tid=1 (main) wedged at sub_82173990+0x2D4 (PC 0x824ac578 = do_wait_single) on handle 0x12c8 = Thread(id=13), waiting for the renderer's thread handle to signal (which happens only when tid=13 calls ExTerminateThread).
- tid=13 (renderer / cache-IO worker) wedged at sub_821CB030+0x1B0 on handle 0x12d0 = Event/Auto, created by itself via NtCreateEvent at sub_821CB030+0x128. signals=0, wakes=0: <NO_SIGNALS_DESPITE_WAITS>.
- sub_825070F0 fires 0× at any horizon probed.
Citation: phase-w-wedge-reattack/halt-on-deadlock-dump.txt +
phase-w-wedge-reattack/current-state.md.
Root cause (at one structural level deeper than the wedge symptom)
Per AUDIT-069 Session 5 (the most recent measurement):
- Canary fires 414 NtReleaseSemaphore calls on the work-queue semaphore in the 90-s window.
- Ours fires 99 (24%).
- Breakdown: Worker (382 vs 90), Main (7 vs 8), Other producers (25 vs 1).
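The breakdown can be cross-checked with a few lines of arithmetic. The sketch below only re-derives totals and ratios from the counts quoted above; the dictionary keys are informal labels, not names from the trace tooling.

```python
# NtReleaseSemaphore counts per producer class as (canary, ours), from AUDIT-069 S5.
RELEASES = {
    "worker": (382, 90),
    "main": (7, 8),
    "other_producers": (25, 1),
}

canary_total = sum(c for c, _ in RELEASES.values())
ours_total = sum(o for _, o in RELEASES.values())
overall_ratio = ours_total / canary_total

# Absolute shortfall of ours per class (negative means ours fires more).
shortfall = {name: c - o for name, (c, o) in RELEASES.items()}
# Relative coverage of ours per class (1.0 means parity with canary).
coverage = {name: o / c for name, (c, o) in RELEASES.items()}

print(canary_total, ours_total, f"{overall_ratio:.0%}")
for name in RELEASES:
    print(f"{name}: shortfall={shortfall[name]}, coverage={coverage[name]:.0%}")
```

The per-class coverage is one way to see why the "other producers" class stands out: it has the lowest relative coverage of the three, even though the worker class has the larger absolute shortfall.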
The "other producers (25 vs 1)" gap is the load-bearing discrepancy. Canary records 24 more releases of the work semaphore from other producer threads during bootstrap than ours does. These correspond to:
- The 4 sub_825070F0 workers (canary tids 27/28/29 + 1), absent in ours.
- XAudio render threads (canary tids 14/15, spawned suspended in both engines, resumed only in canary).
- The secondary spawn burst at 1.94-2.15 s (canary tids 18-25): 8 helpers including file-IO and NtWaitForMultipleObjectsEx workers, absent in ours.
The ONE structural issue
Ours never reaches sub_825070F0 because the activation chain that calls it is downstream of tid=13's wedge; and tid=13's wedge is downstream of the worker cluster activation; and the worker cluster activation is sub_825070F0. This is a self-referential lock.
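The self-referential lock can be illustrated as a cycle in a small "is blocked until" graph. This is a minimal sketch: the node labels paraphrase the steps above, and the "XAudio threads resumed" edge in the second graph is a hypothetical stand-in for whatever pre-activates the producers in canary.

```python
# Each key is blocked until the step it maps to has happened.
DEPENDS_ON = {
    "sub_825070F0 fires": "activation chain proceeds",
    "activation chain proceeds": "tid=13 unwedges",
    "tid=13 unwedges": "worker cluster active",
    "worker cluster active": "sub_825070F0 fires",
}


def find_cycle(graph: dict[str, str], start: str) -> list[str]:
    """Follow single-successor edges from start; return the cycle if one exists."""
    seen: list[str] = []
    node = start
    while node in graph:
        if node in seen:
            return seen[seen.index(node):] + [node]
        seen.append(node)
        node = graph[node]
    return []  # the chain ended at a step with no prerequisite


cycle = find_cycle(DEPENDS_ON, "sub_825070F0 fires")
print(" -> ".join(cycle))

# Hypothetical: an independent producer satisfies tid=13's wait instead.
with_external = dict(DEPENDS_ON)
with_external["tid=13 unwedges"] = "XAudio threads resumed"
print(find_cycle(with_external, "sub_825070F0 fires"))
```

In the first graph every step waits on another step in the same loop; in the second, one edge points outside the loop, so the chain terminates and no cycle is reported.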
Canary breaks the lock because some part of the bootstrap pre-activates the producers (probably via XAudio thread resume at 1.726 s, which then runs ahead, populates the work queue, signals events, etc.). Ours never resumes the XAudio threads — they're spawned suspended and stay that way.
The single highest-leverage gap is the XAudio thread resume, because (a) it happens early (1.726 s in canary vs. ours's wedge, which sets in around 1.4 s; i.e. the resume should happen before the wedge), (b) it activates the dominant event producers, and (c) AUDIT-069 S5's "other producers 25 vs 1" finding implicates exactly this class of thread.
Q3 — Shortest-path-to-first-draw roadmap
Five steps (full detail in shortest-path-roadmap.md):
- Step 1 (~80-150 LOC, ours-side): add --force-spawn-workers cvar that crowbars sub_825070F0 activation by directly spawning the 4 worker threads with the right ctx after VdInitializeRingBuffer returns. Tests "are the workers functionally correct if activated" and "does activating them unwedge sub_821CB030."
- Step 2 (~0 LOC): with Step 1 active, mine the canary jsonl for the kernel-call sequence on tid=6 in the wallclock window [9.4 s, 9.6 s] (the install epoch). Identify what guest call triggers sub_824FD240+0x24's POD-copy of the vtable in canary.
- Step 3 (~10-500 LOC, depending on what Step 2 finds): mirror that trigger in ours, likely a missing kernel-import return value or a missing post-condition that the trigger inspects.
- Step 4 (~0 LOC; remove crowbar): re-test ours without --force-spawn-workers. Verify natural bootstrap reaches sub_825070F0 activation.
- Step 5 (~0-50 LOC): measure renderer-thread VdSwap rate over 90 s wallclock; target ±30% of canary's 12,092 calls.
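Step 2's mining pass could look roughly like the sketch below. The record field names (`tid`, `host_ns`, `name`) and the sample records are assumptions made for illustration; the real jsonl schema and call sequence should be checked against canary-jitter-1.jsonl before relying on this.

```python
import io
import json
from typing import Iterable, Iterator


def calls_in_window(lines: Iterable[str], tid: int,
                    start_s: float, end_s: float) -> Iterator[dict]:
    """Yield records for one thread whose host_ns lies in [start_s, end_s]."""
    lo, hi = round(start_s * 1e9), round(end_s * 1e9)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the dump
        rec = json.loads(line)
        if rec.get("tid") == tid and lo <= rec.get("host_ns", -1) <= hi:
            yield rec


# Synthetic placeholder records, NOT taken from any real trace.
SAMPLE = io.StringIO("\n".join([
    '{"tid": 6, "host_ns": 9350000000, "name": "NtCreateEvent"}',
    '{"tid": 6, "host_ns": 9450000000, "name": "NtAllocateVirtualMemory"}',
    '{"tid": 13, "host_ns": 9500000000, "name": "KeWaitForSingleObject"}',
    '{"tid": 6, "host_ns": 9580000000, "name": "NtSetEvent"}',
]))

sequence = [r["name"] for r in calls_in_window(SAMPLE, tid=6, start_s=9.4, end_s=9.6)]
print(sequence)
```

For a real run, the `io.StringIO` sample would be replaced by an open file handle on the trace; the generator streams line by line, so the whole file does not need to fit in memory.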
Expected delta:
| After step | swaps | draws | unique_render_targets |
|---|---|---|---|
| Pre | 1 | 0 | 0 |
| Step 1 (crowbar) | 2+ | 1+ | 1+ |
| Step 4 (decrowbar) | 2+ | 1+ | 1+ |
| Step 5 (parity) | 100+ | 100+ | 1-5 |
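Step 5's ±30% target can be turned into a concrete pass/fail band. The sketch below is one possible reading of that target; the function names and the sample measurements are hypothetical.

```python
CANARY_SWAPS_90S = 12_092  # canary tid=13 VdSwap calls in the 90-s window (from the text)


def parity_band(reference: int, tolerance: float = 0.30) -> tuple[float, float]:
    """Return the inclusive (low, high) band around reference."""
    return reference * (1.0 - tolerance), reference * (1.0 + tolerance)


def within_parity(measured: int, reference: int = CANARY_SWAPS_90S,
                  tolerance: float = 0.30) -> bool:
    """True if measured falls inside the tolerance band around reference."""
    low, high = parity_band(reference, tolerance)
    return low <= measured <= high


low, high = parity_band(CANARY_SWAPS_90S)
print(f"target band: {low:.0f} to {high:.0f} swaps")
for measured in (1, 100, 9_000):  # hypothetical measurements
    print(measured, within_parity(measured))
```

Note that the "100+" entry in the table is a much looser floor than this band, so a run can satisfy the table row while still falling short of the ±30% target.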
Q4 — What's NOT on the shortest path
Explicitly deferred (full rationale in shortest-path-roadmap.md):
- Audio (host-audio-* / XAudio implementation): even though XAudio thread resume MAY be the trigger from Q2, ours's existing XAudio shim is sufficient for the workers to bootstrap if they receive the right kernel-call sequence. Full XAudio implementation is beyond first-draw scope.
- HID — Sylpheed's intro/title screens are auto-advance; no input needed.
- XAM content / save games — not on first-draw path.
- Scheduler determinism work (Phase D Stages 0-4 and beyond) — null result; the wedge is upstream of contention scheduling. Close or indefinitely defer.
- Diff-tool canonicalization (Phase C+N for N > 25) — saturated on matched-prefix without progression; halt this work class until Step 4 lands and the workload re-baselines.
- AUDIT-068 host-side install probes — superseded by AUDIT-068 Session 4 finding (writer is GUEST PC, not host). The followup question is what triggers the guest code path, which Step 2 addresses through cheaper means.
Q5 — Methodology assessment
Current methodology relied on matched-prefix as a progression
proxy. This assumption is now empirically falsified: +2,960
events of matched-prefix advancement produced 0 units of progression
(swaps=1, draws=0 across 25+ iterates).
Proposed alternative metric
Option 6 (composite progression_score):
progression_score = 1 * swaps + 10 * draws + 100 * unique_render_targets
+ 0.001 * matched_prefix
Continuous gradient; honest about wedge-solving vs. canonicalization
priority. Requires ~10 LOC to add to digest.json.
Discipline: tag every iterate as either "canonicalization only — no progression" or "progression". Cap at 5 consecutive canonicalization-only iterates before mandatory pivot to wedge-attack work.
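A minimal sketch of the composite score and the tagging cap follows. The `Digest` fields mirror the terms of the formula; the tag strings, the `must_pivot` helper, and the matched_prefix value used in the example are illustrative assumptions, not the actual digest.json schema.

```python
from dataclasses import dataclass


@dataclass
class Digest:
    swaps: int
    draws: int
    unique_render_targets: int
    matched_prefix: int


def progression_score(d: Digest) -> float:
    """Option 6 composite: weights exactly as in the formula above."""
    return (1 * d.swaps
            + 10 * d.draws
            + 100 * d.unique_render_targets
            + 0.001 * d.matched_prefix)


MAX_CANON_STREAK = 5  # cap on consecutive canonicalization-only iterates


def must_pivot(tags: list[str]) -> bool:
    """True once the trailing run of canonicalization-only iterates reaches the cap."""
    streak = 0
    for tag in reversed(tags):
        if tag != "canonicalization":
            break
        streak += 1
    return streak >= MAX_CANON_STREAK


# Illustrative values: current state vs. the same state with +2,960 matched-prefix,
# vs. the Step 1 row of the expected-delta table.
pre = Digest(swaps=1, draws=0, unique_render_targets=0, matched_prefix=105_128)
canon = Digest(swaps=1, draws=0, unique_render_targets=0, matched_prefix=108_088)
crowbar = Digest(swaps=2, draws=1, unique_render_targets=1, matched_prefix=105_128)
for label, d in (("pre", pre), ("canon", canon), ("crowbar", crowbar)):
    print(label, round(progression_score(d), 3))
print(must_pivot(["progression"] + ["canonicalization"] * 5))
```

With these weights a matched-prefix gain of a few thousand events moves the score by only a few points, while a single new draw or render target moves it by ten or a hundred, which is the intended gradient.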
New reading-error #39
#39 (matched-prefix as progression proxy): matched-prefix measures engine-to-engine divergence point, NOT game-to-game functional gap. When the wedge is on a different thread than the matched-prefix anchor thread, advancing matched-prefix is orthogonal to unwedging. Future audits MUST distinguish "ours's tid-X diverges from canary's tid-Y" from "ours's tid-X is blocked because tid-Z is wedged", and target the wedge directly when present.
Counterintuitive findings (anti-anchoring)
Per Tripstones in the task brief:
1. Both engines reach swaps=1; ours is NOT behind on the boot swap.
The shared boot-init VdSwap fires in both. Ours's swaps=1 metric
records the same boot swap that canary also performs. The
divergence is NOT "ours can't do the first swap"; it's "ours can't do
the SECOND through Nth swap (the gameplay loop)".
2. Tripstone 4 verified: canary does reach gameplay draws, ours does not.
canary-jitter-1.jsonl shows 12,092 VdSwap calls on canary tid=13 in
90 s wallclock — definitively in the gameplay rendering loop, not
pre-first-draw. Ours's tid analogous to canary tid=13 emits ~80
events total before wedging — definitively before gameplay starts.
The "both engines pre-first-draw" hypothesis is FALSE.
3. The matched-prefix metric is on the WRONG thread.
Matched-prefix tracks tid=6 (canary) vs tid=1 (ours), the main threads. But the wedge is on tid=13, which is the renderer thread in both engines. Tid=1's matched-prefix can advance 105,128 events without ever touching the wedge.
4. The "boot-state-machine" framing is misleading.
There's no monolithic boot state machine. There are ~28 threads in canary, each running their own lifecycle, communicating via shared kernel objects. The bottleneck isn't a state transition; it's a THREAD ACTIVATION GAP.
5. AUDIT-069 Session 5's "other producers 25 vs 1" is the key forensic discovery, more than AUDIT-068's vtable install epoch.
The vtable install IS interesting but it's downstream of the producer
gap. Producers must be running to populate the work queue, which
gets the worker to do its thing, which signals the wedge, which lets
the activation chain continue, which calls sub_824FD240+0x24,
which writes the vtable. Fixing the vtable install in isolation
(e.g., via a host-side mem-write hack) doesn't help if no producer
is feeding work to the workers.
Cascade prediction confidence
- A — canary boot trajectory characterized: DONE, HIGH (canary-jitter-1.jsonl provides direct evidence).
- B — ours's wedge root-cause localized deeper than "sub_821CB030 waits": DONE, MEDIUM-HIGH (AUDIT-069 S5 "other producers 25 vs 1" finding).
- C — shortest-path roadmap with ≤5 steps: DONE, MEDIUM (5 steps; Step 1 confidence ~60%).
- D — alternative metric proposed: DONE, HIGH (Option 6 composite, plus reading-error #39).
Open questions / known unknowns
- What is the bootstrap trigger for canary's sub_824FD240+0x24? Roadmap Step 2 addresses. Could be answered in <1 session of canary jsonl analysis.
- Does Step 1's crowbar produce a clean wedge-unblock, or does it reveal additional unmodelled state in the ctx object? Empirical; testable in one session.
- Are canary's XAudio threads (tids 14/15) the actual missing producer, or are they downstream of the same trigger? Worth a targeted probe before Step 1; ~50 LOC ours-side to log NtResumeThread on the XAudio entry PCs.
- Will the AUDIT-067 "vtable install is host-side" finding resurface? No: AUDIT-068 S4 falsified this; the writer is GUEST PC sub_824FD240+0x24. The "host-side" framing was a mis-read of the POD-copy semantics (reading-error #36).
Recommended next action
Dispatch a "progression iterate" implementing Step 1 of the
roadmap (--force-spawn-workers crowbar, ~80-150 LOC ours-side).
This is a high-variance, high-reward iterate; expected outcome is
either swaps ≥ 2, draws ≥ 1 (success — wedge structurally
isolated to thread activation) or an informative failure mode (e.g.,
worker faults at first vtable bctrl indicating additional state
needed in ctx object). Time-box: 1 session, max 2h.
If Step 1 succeeds in ANY way (even if draws stays 0), the next iterate is Step 2 (kernel-call sequence mining in canary-jitter-1.jsonl). This step has minimal risk and uses existing tooling.
If Step 1 fails completely (panic / segfault unrecoverable), revert the crowbar and reframe: the wedge may be in ours's kernel-handler implementations themselves, not just bootstrap activation. At that point a deeper Path β engine investigation is unavoidable.
Memory hygiene note
This review is read-only. xenia-rs HEAD unchanged. canary HEAD unchanged. sylpheed.db unchanged. No new artifacts beyond this directory.
After dispatching Step 1, future memory entries should adopt the
new progression_score + tagging discipline outlined in
methodology-assessment.md.