# Review A — boot-state review and shortest-path roadmap

**Session type**: PLAN-only. No engine LOC changes; no canary
instrumentation changes. Read-only investigation across the
existing audit chain artifacts.

**Date**: 2026-05-21

**Companion documents** (in this directory):

- `canary-boot-trajectory.md` — canary's call chain from entry_point
  to first gameplay draw, with wallclock timestamps.
- `ours-wedge-localization.md` — precisely where ours stops, in graph
  terms.
- `shortest-path-roadmap.md` — 3-5 step roadmap with expected
  progression delta per step.
- `methodology-assessment.md` — alternative metric proposal.

This `plan.md` summarizes the five framing questions with answers
backed by file:line citations.

---
## Q1 — What is "first draw" in canary's Sylpheed boot?

**Two distinct "draws" must be disambiguated.**

### Q1.a: First boot-init `VdSwap` (the swap=1 event)

Canary's tid=6 (guest main) emits **one** `VdSwap` at ~9.5 s
wallclock, immediately after the GPU subsystem init sequence
`VdInitializeEngines → VdInitializeRingBuffer →
VdEnableRingBufferRPtrWriteBack → VdSetGraphicsInterruptCallback →
VdSetSystemCommandBufferGpuIdentifierAddress → VdGetSystemCommandBuffer`.
This swap publishes the boot framebuffer and contains no draw packets.

**Ours also reaches this swap** — visible in
`phase-w-wedge-reattack/ours-postfix.jsonl` at idx 105283 (host_ns
496,276,229). This is what produces ours's `swaps=1` metric.

Both engines reach this point. **It is NOT the gate.**
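Below is a minimal sketch of how to detect this boot swap in one thread's trace. It assumes the thread's kernel calls have already been extracted as an ordered list of export names. `find_boot_swap` and that trace shape are hypothetical. Unrelated calls may appear between the init steps, because only their relative order is checked. A `VdSwap` counts only after the whole sequence has been seen.

```python
from typing import Optional, Sequence

# Boot-time GPU init order quoted above; unrelated calls may be interleaved.
GPU_INIT_SEQUENCE = (
    "VdInitializeEngines",
    "VdInitializeRingBuffer",
    "VdEnableRingBufferRPtrWriteBack",
    "VdSetGraphicsInterruptCallback",
    "VdSetSystemCommandBufferGpuIdentifierAddress",
    "VdGetSystemCommandBuffer",
)


def find_boot_swap(calls: Sequence[str]) -> Optional[int]:
    """Index of the first VdSwap after the full init sequence, else None."""
    step = 0
    for i, name in enumerate(calls):
        if step < len(GPU_INIT_SEQUENCE):
            # Still matching the init sequence, in order.
            if name == GPU_INIT_SEQUENCE[step]:
                step += 1
        elif name == "VdSwap":
            return i
    return None


# Made-up main-thread trace: one unrelated call, the init sequence, then the swap.
trace = ["NtCreateEvent", *GPU_INIT_SEQUENCE, "VdSwap"]
print(find_boot_swap(trace))
```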
### Q1.b: First gameplay `VdSwap` (the swap≥2 / draws≥1 event)

Canary's renderer tid=13 (entry `0x822F1EE0`, spawned suspended at
1.671 s) wakes at host_ns ≈ 10.383 s, after the `sub_825070F0` worker
fan-out, and begins emitting `VdGetSystemCommandBuffer` /
`VdSwap` pairs at ~150 fps. Canary's tid=13 emits **12,092
VdSwap calls in the 90-s window** (per
`phase-nonmatch-investigation/canary-tid-profiles.md:21`).

The first of these is the **first gameplay draw**, fired at ~10.7 s
wallclock, about 1.2 s after the `sub_825070F0` fan-out triggers
the worker cluster.

**Pre-conditions canary establishes before this point** (per
`canary-boot-trajectory.md`):

1. Vtable `0x8200A1E8` of `ANON_Class_713383D7` installed at host_ns
   ≈ 9.4-9.6 s via POD-copy at GUEST PC `sub_824FD240+0x24`
   (per `project_audit_068_session4_2026_05_20`).
2. Activation chain `sub_822F1AA8 → sub_82173990 → sub_821746B0 →
   sub_82172BA0 → sub_821B55D8 → sub_824F8398 → sub_824F7CD0 →
   sub_824F7800 → bctrl vtable[1] = sub_825070F0` fires on tid=6.
3. `sub_825070F0` spawns 4 worker threads with entries
   `0x82506528/58/88/B8` and shared ctx `0xBCE251C0`.
4. Workers (canary tids 27/28/29) emit signals that unwedge the
   `sub_821CB030` Event waits across the cache-file IO completion
   chain.
5. Renderer tid=13's body (entered earlier but blocked on a
   tid=14/15 XAudio-coordinated event) unblocks; the per-frame
   `VdGetSystemCommandBuffer` / `VdSwap` loop begins.
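The shorthand in item 3 expands to four full guest addresses, spaced 0x30 apart: 0x82506528, 0x82506558, 0x82506588 and 0x825065B8. A crowbar such as `--force-spawn-workers` needs exactly that list plus the shared ctx. The sketch below only builds that spawn list. `WorkerSpawn`, `expand_shorthand` and `worker_spawn_plan` are hypothetical names. The sketch does not model how the engine would actually create or resume the guest threads.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class WorkerSpawn:
    entry_pc: int  # guest entry point of the worker thread
    ctx: int       # shared context pointer handed to every worker


def expand_shorthand(spec: str) -> List[int]:
    """Expand '0x82506528/58/88/B8': each suffix after '/' replaces the
    same number of trailing hex digits of the first address."""
    head, *suffixes = spec.split("/")
    digits = head[2:] if head.lower().startswith("0x") else head
    addrs = [int(digits, 16)]
    for suffix in suffixes:
        addrs.append(int(digits[: len(digits) - len(suffix)] + suffix, 16))
    return addrs


def worker_spawn_plan(spec: str, ctx: int) -> List[WorkerSpawn]:
    """One spawn record per expanded entry point, all sharing `ctx`."""
    return [WorkerSpawn(entry_pc=pc, ctx=ctx) for pc in expand_shorthand(spec)]


plan = worker_spawn_plan("0x82506528/58/88/B8", ctx=0xBCE251C0)
for w in plan:
    print(f"entry={w.entry_pc:#010x} ctx={w.ctx:#010x}")
```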
---

## Q2 — What is ours's actual progress, and what's the wedge root cause?

**Ours stops at the first wait in the activation chain.** Specifically:

- **tid=1 (main)** wedged at `sub_82173990+0x2D4` (PC `0x824ac578` =
  `do_wait_single`) on handle `0x12c8` = `Thread(id=13)` — waiting
  for the renderer's thread handle to signal (which happens only when
  tid=13 calls `ExTerminateThread`).
- **tid=13 (renderer / cache-IO worker)** wedged at
  `sub_821CB030+0x1B0` on handle `0x12d0` = `Event/Auto`, created by
  itself via `NtCreateEvent` at `sub_821CB030+0x128`.
  `signals=0, wakes=0` — `<NO_SIGNALS_DESPITE_WAITS>`.
- **`sub_825070F0` fires 0×** at any horizon probed.

Citation: `phase-w-wedge-reattack/halt-on-deadlock-dump.txt` +
`phase-w-wedge-reattack/current-state.md`.
### Root cause (at one structural level deeper than the wedge symptom)

**Per AUDIT-069 Session 5 (the most recent measurement):**

- Canary fires 414 `NtReleaseSemaphore` calls on the work-queue
  semaphore in the 90-s window.
- Ours fires 99 (24%).
- Breakdown (canary vs ours): Worker (382 vs 90), Main (7 vs 8),
  **Other producers (25 vs 1)**.

The "**other producers (25 vs 1)**" gap is the load-bearing
discrepancy. During bootstrap, canary issues **24 more releases** of
the work semaphore from these other producers, coming from threads
that ours does not have. These correspond to:

1. The 4 `sub_825070F0` workers (canary tids 27/28/29 + 1) — absent
   in ours.
2. XAudio render threads (canary tids 14/15, spawned suspended in
   both engines, **resumed only in canary**).
3. The secondary spawn burst at 1.94-2.15 s (canary tids 18-25) —
   8 helpers including file-IO and NtWaitForMultipleObjectsEx workers
   — absent in ours.
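The sketch below shows how the per-class breakdown can be produced from a trace. It assumes each event is a dict with `name`, `handle` and `tid` keys, and that each thread's producer class is known up front. Both are assumptions, and the helper names are hypothetical. The synthetic traces are sized to reproduce the AUDIT-069 S5 counts. The output therefore shows the 414 vs 99 totals, the 24% ratio and the 24-release gap in the Other bucket.

```python
from collections import Counter
from typing import Iterable, List, Mapping

SEM = 0x1000  # hypothetical work-queue semaphore handle


def tally_releases(events: Iterable[Mapping], sem: int, producer_class: Mapping[int, str]) -> Counter:
    """Count NtReleaseSemaphore calls on `sem`, bucketed by releasing thread.
    Threads absent from `producer_class` are bucketed as "Other"."""
    counts: Counter = Counter()
    for ev in events:
        if ev["name"] == "NtReleaseSemaphore" and ev["handle"] == sem:
            counts[producer_class.get(ev["tid"], "Other")] += 1
    return counts


def releases(tid: int, n: int) -> List[dict]:
    """n synthetic release events from thread `tid`."""
    return [{"name": "NtReleaseSemaphore", "handle": SEM, "tid": tid}] * n


# Synthetic traces sized to the AUDIT-069 S5 counts; tids are placeholders.
canary = tally_releases(releases(100, 382) + releases(101, 7) + releases(102, 25),
                        SEM, {100: "Worker", 101: "Main"})
ours = tally_releases(releases(200, 90) + releases(201, 8) + releases(202, 1),
                      SEM, {200: "Worker", 201: "Main"})

for cls in ("Worker", "Main", "Other"):
    print(f"{cls:<6} canary={canary[cls]:>3} ours={ours[cls]:>3} gap={canary[cls] - ours[cls]:+d}")
total_c, total_o = sum(canary.values()), sum(ours.values())
print(f"total  canary={total_c} ours={total_o} ratio={total_o / total_c:.0%}")
```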
### The ONE structural issue

> **Ours never reaches `sub_825070F0` because the activation chain
> that calls it is downstream of tid=13's wedge; and tid=13's wedge
> is downstream of the worker cluster activation; and the worker
> cluster activation is `sub_825070F0`. This is a self-referential
> lock.**

Canary breaks the lock because some part of the bootstrap
*pre-activates* the producers (probably via XAudio thread resume at
1.726 s, which then runs ahead, populates the work queue, signals
events, etc.). Ours never resumes the XAudio threads; they're
spawned suspended and stay that way.

**The single highest-leverage gap is the XAudio thread resume,**
because (a) it happens early (1.726 s in canary; ours wedges around
1.4 s, so in ours the resume must happen before the wedge), (b) it
activates the dominant event producers, and (c) AUDIT-069 S5's
"other producers 25 vs 1" finding implicates exactly this class
of thread.

---
## Q3 — Shortest-path-to-first-draw roadmap

Five steps (full detail in `shortest-path-roadmap.md`):

- **Step 1 (~80-150 LOC, ours-side)**: add a `--force-spawn-workers`
  cvar that crowbars `sub_825070F0` activation by directly spawning
  the 4 worker threads with the right ctx after `VdInitializeRingBuffer`
  returns. Tests whether the workers are functionally correct once
  activated, and whether activating them unwedges `sub_821CB030`.
- **Step 2 (~0 LOC)**: with Step 1 active, mine the canary jsonl for
  the kernel-call sequence on tid=6 in the wallclock window [9.4 s,
  9.6 s] (the install epoch). Identify what guest call triggers
  `sub_824FD240+0x24`'s POD-copy of the vtable in canary.
- **Step 3 (~10-500 LOC, depending on what Step 2 finds)**: mirror
  that trigger in ours. The likely cause is a missing kernel-import return value
  or a missing post-condition that the trigger inspects.
- **Step 4 (~0 LOC; remove crowbar)**: re-test ours without
  `--force-spawn-workers`. Verify natural bootstrap reaches
  `sub_825070F0` activation.
- **Step 5 (~0-50 LOC)**: measure the renderer-thread VdSwap count over
  90 s wallclock; target within ±30% of canary's 12,092 calls
  (about 8,464 to 15,720).

Expected delta:

| After step | `swaps` | `draws` | `unique_render_targets` |
|---|---:|---:|---:|
| Pre | 1 | 0 | 0 |
| Step 1 (crowbar) | 2+ | 1+ | 1+ |
| Step 4 (decrowbar) | 2+ | 1+ | 1+ |
| Step 5 (parity) | 100+ | 100+ | 1-5 |
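Steps 2 and 5 are both trace queries, and the sketch below covers each one. It assumes the jsonl traces hold one object per line with `tid`, `host_ns` and `name` keys. Those field names are an assumption about the trace schema, and the sample lines are made up.

- `calls_in_window` lists one thread's kernel calls inside the install epoch.
- `within_parity` applies the ±30% target to a renderer-thread VdSwap count, which means roughly 8,464 to 15,720 calls against canary's 12,092.

```python
import json
from typing import Iterable, Iterator, List

NS_PER_S = 1_000_000_000
CANARY_RENDERER_SWAPS = 12_092  # canary tid=13 VdSwap calls in the 90-s window


def read_events(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-empty line (JSONL)."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def calls_in_window(events: Iterable[dict], tid: int, start_s: float, end_s: float) -> List[str]:
    """Step 2: names of kernel calls made by `tid` with start_s <= t <= end_s.
    Assumes `host_ns` is nanoseconds on the same clock as the window bounds."""
    lo, hi = round(start_s * NS_PER_S), round(end_s * NS_PER_S)
    return [ev["name"] for ev in events if ev["tid"] == tid and lo <= ev["host_ns"] <= hi]


def within_parity(ours_swaps: int, canary_swaps: int = CANARY_RENDERER_SWAPS,
                  tolerance: float = 0.30) -> bool:
    """Step 5: is ours's renderer VdSwap count within +/- tolerance of canary's?"""
    return abs(ours_swaps - canary_swaps) <= tolerance * canary_swaps


# Made-up sample lines; real input would be an open file such as canary-jitter-1.jsonl.
sample = [
    '{"tid": 6, "host_ns": 9350000000, "name": "NtWaitForSingleObjectEx"}',
    '{"tid": 6, "host_ns": 9450000000, "name": "NtCreateEvent"}',
    '{"tid": 13, "host_ns": 9500000000, "name": "VdSwap"}',
    '{"tid": 6, "host_ns": 9550000000, "name": "KeSetEvent"}',
    '{"tid": 6, "host_ns": 9700000000, "name": "VdSwap"}',
]
print(calls_in_window(read_events(sample), tid=6, start_s=9.4, end_s=9.6))
print(within_parity(9_000), within_parity(8_000))
```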
---

## Q4 — What's NOT on the shortest path

Explicitly deferred (full rationale in `shortest-path-roadmap.md`):

- **Audio (`host-audio-*` / XAudio implementation)** — even though
  XAudio thread resume MAY be the trigger from Q2, ours's existing
  XAudio shim is sufficient for the workers to bootstrap if they
  receive the right kernel-call sequence. Full XAudio
  implementation is beyond first-draw scope.
- **HID** — Sylpheed's intro/title screens are auto-advance; no
  input needed.
- **XAM content / save games** — not on first-draw path.
- **Scheduler determinism work** (Phase D Stages 0-4 and beyond) —
  null result; the wedge is upstream of contention scheduling.
  Close or indefinitely defer.
- **Diff-tool canonicalization** (Phase C+N for N > 25) — saturated
  on matched-prefix without progression; halt this work class until
  Step 4 lands and the workload re-baselines.
- **AUDIT-068 host-side install probes** — superseded by AUDIT-068
  Session 4 finding (writer is GUEST PC, not host). The followup
  question is what *triggers* the guest code path, which Step 2
  addresses through cheaper means.

---
## Q5 — Methodology assessment

**Current methodology relied on matched-prefix as a progression
proxy. This assumption is now empirically falsified**: +2,960
events of matched-prefix advancement produced 0 units of progression
(`swaps=1, draws=0` across 25+ iterates).
### Proposed alternative metric

**Option 6 (composite `progression_score`)**:

```
progression_score = 1 * swaps + 10 * draws + 100 * unique_render_targets
                  + 0.001 * matched_prefix
```

Gives a continuous gradient and makes the priority of wedge-solving
over canonicalization explicit: 1,000 events of matched-prefix
advancement count the same as one swap. Requires ~10 LOC to add to
`digest.json`.

Discipline: tag every iterate as either
"**canonicalization only — no progression**" or
"**progression**". Cap at 5 consecutive canonicalization-only
iterates before a mandatory pivot to wedge-attack work.
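Below is a minimal sketch of the score and of the 5-iterate cap. It assumes the digest exposes the four fields under the names used in the formula. It also assumes each iterate's tag is recorded as a string. The tag spelling `canonicalization-only` and the function names are hypothetical.

```python
from typing import Iterable, Mapping

# Weights from the progression_score formula above.
WEIGHTS = {
    "swaps": 1.0,
    "draws": 10.0,
    "unique_render_targets": 100.0,
    "matched_prefix": 0.001,
}
CANON_ONLY = "canonicalization-only"  # hypothetical short form of the tag
MAX_CANON_ONLY_STREAK = 5


def progression_score(digest: Mapping[str, float]) -> float:
    """Weighted sum of the digest fields; missing fields count as 0."""
    return sum(weight * digest.get(field, 0) for field, weight in WEIGHTS.items())


def must_pivot(tags: Iterable[str]) -> bool:
    """True if the tag history (oldest first) ends in a run of at least
    MAX_CANON_ONLY_STREAK canonicalization-only iterates."""
    streak = 0
    for tag in tags:
        streak = streak + 1 if tag == CANON_ONLY else 0
    return streak >= MAX_CANON_ONLY_STREAK


before = {"swaps": 1, "draws": 0, "unique_render_targets": 0, "matched_prefix": 0}
after_canon = dict(before, matched_prefix=2_960)  # +2,960 matched-prefix events
after_draw = dict(before, draws=1)                # a single gameplay draw

print(progression_score(after_canon) - progression_score(before))
print(progression_score(after_draw) - progression_score(before))
print(must_pivot(["progression"] + [CANON_ONLY] * 5))
```

With these weights, the +2,960 matched-prefix events from the falsification finding add about 2.96 points. That is less than a third of what a single gameplay draw adds.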
### New reading-error #39

> **#39 (matched-prefix as progression proxy)**: matched-prefix
> measures engine-to-engine divergence point, NOT game-to-game
> functional gap. When the wedge is on a different thread than the
> matched-prefix anchor thread, advancing matched-prefix is
> orthogonal to unwedging. Future audits MUST distinguish "ours's
> tid-X diverges from canary's tid-Y" from "ours's tid-X is *blocked
> because tid-Z is wedged*", and target the wedge directly when
> present.
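This is a toy illustration of reading-error #39, using made-up per-thread traces. Matched-prefix on the main-thread pair grows from iterate to iterate. The renderer-thread pair stays stuck at the same wait. The metric therefore advances while the renderer's swap count stays at zero. Here, `matched_prefix` is simply the longest common prefix of two event sequences, which is an assumption about how the real metric is computed.

```python
from typing import List, Sequence


def matched_prefix(ours: Sequence[str], canary: Sequence[str]) -> int:
    """Length of the longest common prefix of two per-thread event sequences."""
    n = 0
    for a, b in zip(ours, canary):
        if a != b:
            break
        n += 1
    return n


# Made-up traces. Main thread: canary tid=6 vs ours tid=1.
canary_main = ["A", "B", "C", "D", "E"]
ours_main_by_iterate: List[List[str]] = [
    ["A", "x"],
    ["A", "B", "C", "x"],
    ["A", "B", "C", "D", "E"],
]
# Renderer thread (tid=13): canary reaches the swap loop, ours stops at the wait.
canary_renderer = ["init", "wait", "VdSwap", "VdSwap"]
ours_renderer = ["init", "wait"]

for k, ours_main in enumerate(ours_main_by_iterate):
    main = matched_prefix(ours_main, canary_main)
    renderer = matched_prefix(ours_renderer, canary_renderer)
    swaps = ours_renderer.count("VdSwap")
    print(f"iterate {k}: main prefix={main} renderer prefix={renderer} renderer swaps={swaps}")
```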
---

## Counterintuitive findings (anti-anchoring)

Per Tripstones in the task brief:

### 1. Both engines reach `swaps=1`; ours is NOT behind on the boot swap.

The shared boot-init `VdSwap` fires in both. Ours's `swaps=1` metric
is "achieved, just at the same point canary also did it". The
divergence is NOT "ours can't do the first swap"; it's "ours can't do
the SECOND through Nth swap (the gameplay loop)".
### 2. Tripstone 4 verified: canary does reach gameplay draws, ours does not.

`canary-jitter-1.jsonl` shows 12,092 VdSwap calls on canary tid=13 in
90 s wallclock. That is definitively the gameplay rendering loop, not
pre-first-draw. Ours's thread analogous to canary's tid=13 emits ~80
events total before wedging, which is definitively before gameplay
starts. The "both engines pre-first-draw" hypothesis is FALSE.
### 3. The matched-prefix metric is on the WRONG thread.

Matched-prefix tracks tid=6 (canary) vs tid=1 (ours), the main
threads. But the wedge is on **tid=13 in both engines** — the
renderer thread. Tid=1's matched-prefix can advance 105,128 events
without ever touching the wedge.
### 4. The "boot-state-machine" framing is misleading.

There's no monolithic boot state machine. There are ~28 threads in
canary, each running its own lifecycle and communicating via shared
kernel objects. The bottleneck isn't a state transition; it's a
THREAD ACTIVATION GAP.
### 5. AUDIT-069 Session 5's "other producers 25 vs 1" is the key forensic discovery, more than AUDIT-068's vtable install epoch.

The vtable install IS interesting, but it's downstream of the producer
gap. Producers must be running to populate the work queue. That lets
the worker process the queued work, which signals the event tid=13 is
wedged on. That lets the activation chain continue, which calls
`sub_824FD240+0x24`, which writes the vtable. Fixing the vtable install
in isolation (e.g., via a host-side mem-write hack) doesn't help if no
producer is feeding work to the workers.

---
## Cascade prediction confidence

- A — canary boot trajectory characterized: **DONE, HIGH** (`canary-jitter-1.jsonl` provides direct evidence).
- B — ours's wedge root-cause localized deeper than "sub_821CB030 waits": **DONE, MEDIUM-HIGH** (AUDIT-069 S5 "other producers 25 vs 1" finding).
- C — shortest-path roadmap with ≤5 steps: **DONE, MEDIUM** (5 steps; Step 1 confidence ~60%).
- D — alternative metric proposed: **DONE, HIGH** (Option 6 composite, plus reading-error #39).

---
## Open questions / known unknowns

1. **What is the bootstrap trigger for canary's `sub_824FD240+0x24`?**
   Roadmap Step 2 addresses this. It could be answered in <1 session of
   canary jsonl analysis.
2. **Does Step 1's crowbar produce a clean wedge-unblock, or does it
   reveal additional unmodelled state in the ctx object?** Empirical;
   testable in one session.
3. **Are canary's XAudio threads (tids 14/15) the actual missing
   producer, or are they downstream of the same trigger?** Worth a
   targeted probe before Step 1; ~50 LOC ours-side to log
   `NtResumeThread` on the XAudio entry PCs.
4. **Will the AUDIT-067 "vtable install is host-side" finding
   resurface?** No. AUDIT-068 S4 falsified this; the writer is
   GUEST PC `sub_824FD240+0x24`. The "host-side" framing was a
   misreading of the POD-copy semantics (reading-error #36).

---
## Recommended next action

**Dispatch a "progression iterate" implementing Step 1 of the
roadmap** (`--force-spawn-workers` crowbar, ~80-150 LOC ours-side).
This is a high-variance, high-reward iterate. The expected outcome is
either `swaps ≥ 2, draws ≥ 1` (success: the wedge is structurally
isolated to thread activation) or an informative failure mode (e.g.,
worker faults at the first vtable bctrl, indicating additional state
needed in the ctx object). Time-box: 1 session, max 2h.

If Step 1 succeeds in ANY way (even if draws stays 0), the next
iterate is Step 2 (kernel-call sequence mining in `canary-jitter-1.jsonl`).
This step has minimal risk and uses existing tooling.

If Step 1 fails completely (unrecoverable panic or segfault), revert
the crowbar and reframe: the wedge may be in ours's kernel-handler
implementations themselves, not just bootstrap activation. At that
point a deeper Path β engine investigation is unavoidable.

---
## Memory hygiene note

This review is read-only. xenia-rs HEAD unchanged. canary HEAD
unchanged. sylpheed.db unchanged. No new artifacts beyond this
directory.

After dispatching Step 1, future memory entries should adopt the
new `progression_score` + tagging discipline outlined in
`methodology-assessment.md`.