handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
121
audit-runs/review-a-boot-state/canary-boot-trajectory.md
Normal file
@@ -0,0 +1,121 @@
# Canary boot-to-first-draw trajectory

**Source data:** `xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-1.jsonl`
(4.4 GB, 18.7M events, 90s wallclock, cold run). Profile builder at
`xenia-rs/audit-runs/phase-nonmatch-investigation/build_profiles.py`.

## TL;DR

- **First boot-time `VdSwap` fires on canary's tid=6 (guest main) at
  ~9.5 s wallclock**, immediately after the rendering subsystem is
  initialized. This is the *empty / system-command-buffer* swap that
  ours also reaches (ours's metric `swaps=1` is this swap).
- **First gameplay `VdSwap` (intro-movie frame) fires on canary's
  tid=13 (renderer) starting at ~10.7 s wallclock**, after the
  `sub_825070F0` worker fan-out at host_ns ≈ 10.382-10.384 s. Canary
  tid=13 emits **12,092** `VdSwap` / `VdGetSystemCommandBuffer` pairs
  in the 90-s window, i.e. ~150 fps sustained over the ~79 s after the
  renderer wakes.
- The gating event between "boot swap" and "first gameplay swap" is
  the 4-worker fan-out spawned by `sub_825070F0` at PCs `0x82506528 /
  0x82506558 / 0x82506588 / 0x825065B8` with ctx `0xBCE251C0`. Three
  of the four workers begin emitting events at host_ns ≈ 10.705 s
  (tids 27/28/29; see `canary-tid-profiles.md` row 33-35).

## Phase-by-phase trajectory

| t (host_ns) | Phase | What | Citation |
|------:|-------|------|----------|
| 0–660 ms | XEX load / startup | `XexLoadImage`, ELF→guest init, kernel-state ctor. Spawn tid=6 ("guest main") at host_ns=660 ms. | `phase-nonmatch-investigation/canary-tid-profiles.md:14` |
| 660 ms–1.42 s | **Pre-spawn init** | tid=6 sets up TLS, runs CRT init. Establishes vtables / globals. *Sylpheed-specific*: writes `0x8200A1E8` (vtable for `ANON_Class_713383D7`) at the install-epoch host_ns ≈ 9.4–9.6 s via a 12-byte POD struct copy `{vptr, self, self}` (see `project_audit_068_session3`). **Critical**: this is the vtable whose slot 1 = `sub_825070F0`. | `project_audit_068_session3_2026_05_20.md` |
| 1.42–1.94 s | **Main init burst** | 10 thread spawns (tids 8–17) by tid=6. Ours matches this 1:1. Entries include `0x82181830`, `0x8245A5D0`, `0x82450A28`, `0x82457EF0`, `0x824CD458`, **`0x822F1EE0` (renderer, susp=T)**, `0x824D2878/0x824D2940` (XAudio, susp=T), `0x82178950` (XMA), `0x821748F0` (file IO spawner, susp=T). | `canary-tid-profiles.md:42-55` |
| 1.671 s | **Renderer spawn** | tid=6 calls `ExCreateThread` with entry `0x822F1EE0`, ctx `0xBCE24A40`, suspended=True. Becomes canary tid=13. | `canary-tid-profiles.md:21,49` |
| 1.726–1.728 s | **XAudio spawn** | tids 14/15 (XAudio voice-mask poll + sister) spawned suspended. Will dominate event volume (~11M events combined). | `canary-tid-profiles.md:50-51` |
| 1.94–2.15 s | **Secondary init burst** | 8 more spawns (tids 18–25), file-IO + XAM helpers. **Ours emits 0** here (already wedged). | `result.md:48` |
| 9.4–9.6 s | **vtable install epoch** | Host-side POD struct copy installs `0x8200A1E8` at run-specific arena address (`0xBCE25340` or `0xBCE251C0` per arena drift). This is the ANON_Class_713383D7 instance whose slot 1 = `sub_825070F0`. | `project_audit_068_session3_2026_05_20.md` |
| ~9.5 s | **Boot-init `VdSwap` (on tid=6)** | After `VdInitializeEngines + VdShutdownEngines + VdInitializeEngines + VdSetGraphicsInterruptCallback + VdSetSystemCommandBufferGpuIdentifierAddress + VdInitializeRingBuffer + VdEnableRingBufferRPtrWriteBack + VdGetSystemCommandBuffer`, tid=6 emits **one** `VdSwap` to publish the boot framebuffer. draws=0 still (no PM4 draw packets). | Mirror of `ours-postfix.jsonl` idx 105044-105285; canary same shape. |
| 10.080 s | tid=26 second-call helper | `0x821748F0` second invocation. | `canary-tid-profiles.md:32` |
| **10.383 s** | **sub_825070F0 worker fan-out** | **Four `ExCreateThread` calls in 1 ms** spawn entries `0x82506528 / 0x82506558 / 0x82506588 / 0x825065B8` all sharing ctx `0xBCE251C0` (the ANON_Class instance). These are the workers that consume cache-file IO and signal the wedge event(s) that AUDIT-049 found dangling in ours. | `canary-tid-profiles.md:63-66`, `sub_825070F0.md` |
| 10.7 s | **Worker resume / first events** | tids 27, 28, 29 emit their first events. tid=28 dominates (3.26M events) doing file IO (`530× NtReadFile` of `cache:\…`), heavy CS contention (1.07M RtlEnterCS), and signaling the wedge events. | `canary-tid-profiles.md:33-35`, `sub_82452DC0.md` |
| ~10.7+ s | **Renderer wakes** | Once `sub_825070F0` workers begin, the events that canary's tid=13 was waiting on get signaled. tid=13 transitions Blocked→Running, starts producing `VdGetSystemCommandBuffer`/`VdSwap` pairs at ~150 fps. | `canary-tid-profiles.md:21`, `result.md:30-39` |
| ~10.7–90 s | **Sustained rendering** | tid=13 emits 12,092 `VdSwap` calls. Intro movie ⇒ title screen ⇒ gameplay (depends on user input). In an unattended cold run, canary likely plateaus on the title screen but is genuinely rendering. | `canary-tid-profiles.md:21` |

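The two swap milestones in the table (boot-init swap on tid=6, first gameplay swap on tid=13) can be recovered mechanically from a trace. The sketch below is a minimal illustration only: it assumes each JSONL line is an object with `tid`, `host_ns` and `name` keys, which is a hypothetical schema and not necessarily the layout of `canary-jitter-1.jsonl`. The function name `swap_summary` and the sample events are likewise made up for the example.

```python
import json
from collections import defaultdict


def swap_summary(lines, window_end_s):
    """Return {tid: first VdSwap time, swap count, swaps/s after first swap}.

    Assumes (hypothetically) that every non-empty line is a JSON object with
    'tid', 'host_ns' (nanoseconds since trace start) and 'name' keys.
    """
    first_s = {}
    swaps = defaultdict(int)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("name") != "VdSwap":
            continue
        tid = event["tid"]
        t = event["host_ns"] / 1e9
        first_s[tid] = min(t, first_s.get(tid, t))
        swaps[tid] += 1
    summary = {}
    for tid, t0 in first_s.items():
        active_s = window_end_s - t0
        summary[tid] = {
            "first_s": t0,
            "swaps": swaps[tid],
            "swaps_per_s": swaps[tid] / active_s if active_s > 0 else 0.0,
        }
    return summary


# Synthetic stand-in for a trace: one boot swap on tid=6, two swaps on tid=13.
sample = [
    json.dumps({"tid": 6, "host_ns": 9_500_000_000, "name": "VdSwap"}),
    json.dumps({"tid": 13, "host_ns": 10_700_000_000, "name": "VdGetSystemCommandBuffer"}),
    json.dumps({"tid": 13, "host_ns": 10_700_000_000, "name": "VdSwap"}),
    json.dumps({"tid": 13, "host_ns": 10_710_000_000, "name": "VdSwap"}),
]
summary = swap_summary(sample, window_end_s=90.0)

# Rate implied by the quoted figures: 12,092 swaps between ~10.7 s and 90 s.
quoted_rate = 12_092 / (90.0 - 10.7)
```

Measuring the rate from each thread's first swap, rather than over the whole window, is what makes the quoted ~150 fps figure line up with 12,092 swaps in a 90-s run.
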
## Canary call-chain from entry_point to first gameplay draw

```
canary tid=6 (guest main)
  entry_point
    → sub_8216EA68 (post-init dispatcher)
      → sub_822F1AA8 (game-loop dispatcher) (sub_822F1AA8.md)
        → bctrl vtable[0]({sub_82175330 → tail → sub_82173990})
          → sub_82173990 (sync task-spawn-and-join) (sub_82173990.md)
            → bl sub_821746B0 (alloc task + spawn worker thid=17, F8000094)
              [worker thid=17 runs body sub_821748F0
                → sub_821C4EB0 → sub_821CC3F8 → sub_821CBA08
                  → sub_821CB030 (creates Event, submits work via sub_82452DC0)
                  → … cache file loads (cache:\aab216c3\..., cache:\87719002\..., etc.)
                  → spawns child workers via ExCreateThread(...,821C4AD0,...)
                  → eventually ExTerminateThread(0)]
            → KeWaitForSingleObject(thid=17.handle) INFINITE
              [blocks ~445 log lines wallclock; completes when thid=17 terminates]
            ← returns
          ← returns to sub_822F1AA8 outer loop
        → iterates sub_821741C8 → sub_82172BA0 → bctrl vtable[6]
          → sub_821B55D8 → sub_824F8398 → sub_824F7CD0 → sub_824F7800
            → bctrl vtable[1] = sub_825070F0 (sub_825070F0.md)
              → 4× ExCreateThread(...,0x82506528/58/88/B8, ctx=0xBCE25xxx, susp=T)
              → 4× NtResumeThread / scheduler enables the workers
                [workers tids 27/28/29/+1 begin executing]
        → outer loop continues
          → KeWaitForSingleObject (4040×/60 s = ~67 fps frame-pacing wait)
          → bctrl vtable[2] → various per-frame work
          → tid=6's main loop produces no VdSwap directly past the init swap
canary tid=13 (renderer; spawned by tid=6 at 0x822F1EE0)
  [stays suspended OR Blocked-on-event until worker fan-out at 10.38 s]
  → after wake, enters render loop:
      while (running) {
        VdGetSystemCommandBuffer(...) ; 12,092× / 90 s
        … build per-frame command buffer …
        VdSwap(buffer_ptr, fetch_ptr, …) ; 12,092× / 90 s
      }
```

## Pre-conditions canary establishes before first gameplay draw

In time order, all must hold:

1. **GPU subsystem initialized**: `VdInitializeEngines → VdInitializeRingBuffer → VdEnableRingBufferRPtrWriteBack → VdSetGraphicsInterruptCallback`. Ours: ✓ (idx 105044-105117).
2. **Renderer thread alive**: tid=13 created suspended via `ExCreateThread(entry=0x822F1EE0, susp=T)`. Ours: ✓ (idx 105348).
3. **Worker-cluster activation**: 4 workers spawned by `sub_825070F0` consuming `sub_82452DC0` work. Ours: **✗ 0 fires**.
4. **`sub_821CB030`'s Event signaled**: the per-load completion event created at `sub_821CB030+0x128` and waited at `+0x1AC` must be signaled by a `sub_825070F0` worker. Ours: **✗ `NO_SIGNALS_DESPITE_WAITS` on handle 0x12d0**.
5. **`sub_82173990`'s join-wait completes**: tid=6's wait at `sub_82173990+0x2D0` on the thid=17 thread handle. Ours: **✗ tid=1 stuck on handle 0x12c8 (= tid=13's thread handle)**.
6. **Renderer wakes**: per AUDIT-049, the worker-cluster must signal whatever guards tid=13's body. Canary: ✓. Ours: **✗ tid=13 itself wedges in `sub_821CB030`**.

## Numerical signature of canary at 90 s wallclock (for reference)

- 18.7 M events / 28 tids.
- Renderer tid=13: 594 k events, including 12,092 VdSwap.
- Worker tid=28 (sub_825070F0 worker 0): 3.26 M events.
- XAudio tid=14/15: 6.15 M / 4.78 M events.
- ours at 50 M-instr / ~3 s wallclock: 121 k events / 13 tids. Renderer
  tid=13 in ours: ~80 events (wedged).
- Total event volume differs by ~150× (18.7 M vs 121 k) because ours
  wedges ~7 s before canary's `sub_825070F0` fan-out fires.

## Uncertainty / open questions

- **What is the precise host-side install of the `ANON_Class_713383D7`
  vtable `0x8200A1E8`?** AUDIT-068 sessions 1–4 localized this to a
  POD struct copy in the install epoch [9.4 s, 9.6 s], with the writer
  identified at GUEST PPC `sub_824FD240+0x24` (NOT a host-side kernel
  import as initially feared). But in ours, `sub_824FD240` and its
  callers `sub_824F7800/CD0/8398` fire 0× because that chain is
  downstream of the tid=13 wedge. See `project_audit_068_session4`.
- **First "gameplay draw" precisely**: the first VdSwap that emits PM4
  draw packets (e.g. `PM4_TYPE3 DRAW_INDX`) into the ringbuffer. Need
  to inspect canary's PM4 ring at host_ns ≈ 10.7 s to confirm. AUDIT
  history hasn't disambiguated boot/empty-swap from gameplay-swap at
  the PM4-packet level. This is a methodology gap.
- **What unwedges canary's worker-cluster activation chain?** AUDIT-068
  pinned the install epoch but not the **trigger**: what guest call
  causes `sub_824FD240+0x24`'s POD-copy to fire? Identifying the
  trigger and replaying it in ours is the unanswered Path β attack.

193
audit-runs/review-a-boot-state/methodology-assessment.md
Normal file
@@ -0,0 +1,193 @@
# Methodology assessment

## The matched-prefix metric: load-bearing or load-shedding?

Across 25+ iterates (audits 049 through 069; Phase C+1 through C+25;
Phase D Stages 0-4 plus D-extension; Phase W; Phase host-audio-*),
matched-prefix on the main thread (canary tid=6 ⇄ ours tid=1)
advanced:

| Phase | Matched-prefix | Δ |
|---|---:|---:|
| Phase B baseline (pre-C+1) | ~102,168 | n/a |
| Phase D D-extension landing | 104,607 → 105,046 | +439 |
| Phase W (VdInitializeEngines fix) | 105,046 → 105,112 | +66 |
| Phase C+25 (MmGetPhysicalAddress canon) | 105,112 → 105,128 | +16 |

Over the same phases, the progression metrics did not move:

| Phase | `swaps` | `draws` | `unique_render_targets` |
|---|---:|---:|---:|
| Phase B baseline | 1 | 0 | 0 |
| Phase W | 1 | 0 | 0 |
| Phase C+25 | 1 | 0 | 0 |

**The two metrics are decoupled.** Matched-prefix is moving along
ENGINE-internal divergences (kernel-call return values, thread IDs,
heap arena base addresses). The progression metric is gated by
boot-state activation, which lives one or more layers above the diff
points.

## Why the decoupling happened

Three reading-errors compound:

1. **#23 (cooperative-vs-preemptive scheduling jitter)**: canary's
   default-scheduling produces different *intra-thread* event ordering
   than ours's coroutine scheduler. Diff-tool absorbers (C+18, C+21,
   D-extension) correctly hide this jitter, but they hide *real
   bootstrap-time divergences too*. Phase W explicitly noted: "If
   ours's worker fails to enqueue something canary's worker awaits,
   we'd never see the gap because the matched-prefix isn't on the
   worker tid in the first place."
2. **#30 (per-tid PC SID drift)**: shared-global SIDs work for
   process-global dispatchers (e.g., the work-queue semaphore at
   handle `0xF800003C` in canary). But the wedge handle `0x12d0`
   uses a per-tid create-site SID that does NOT match across engines.
   So even when the same logical event exists in both engines, the
   diff harness reports SID mismatch and absorbs OR diverges
   incorrectly.
3. **#38 (cross-spawn producer paths)**: static reachability (the
   sylpheed.db `xrefs` table) misses producer paths that cross
   thread-spawn boundaries. The result.md from Phase Non-match shows
   canary's tid=14 (XAudio voice-mask poll) communicates with
   downstream code via a path that has no static `bl` edge; it
   crosses via guest kernel APIs.

## Alternative metric proposals

### Option 1: `draws ≥ 1` (sharp gate)

**Pros**: directly measures the target. Boolean. Reproducible.
**Cons**: gives no signal during iteration: every iterate before the
breakthrough is `draws = 0`. Loss function is non-smooth.

### Option 2: `swaps ≥ 2` (relaxed first-frame gate)

**Pros**: still sharp; one bit looser than draws. Distinguishes
boot-init-only swap (`swaps=1`) from at-least-one-rendered-frame
(`swaps≥2`).
**Cons**: same non-smooth loss. Achievable in principle by a crowbar
without solving the underlying bug.

### Option 3: renderer-thread liveness, `events_emitted_by_renderer_thread ≥ N`

Compute: events emitted on the thread spawned at entry `0x822F1EE0`
in any 90-s wallclock window. Canary: 594,000. Ours: ~0.

**Pros**: smooth-ish (event count can move slowly). Directly measures
"is the renderer running." Bypasses the diff-tool jitter problem
because it's a per-engine internal count.
**Cons**: requires a non-trivial 90-s wallclock run (not 50M instr
ceiling). Could be gamed by a crowbar that resumes the renderer
without unblocking the wedge.

### Option 4: worker-thread census, `count(threads_with_events ≥ 10k) ≥ 6`

Compute: how many tids in ours emit ≥10k events over 90 s wallclock.
Canary at 90 s: 12 tids meet this (tids 1/2/4/6/9/10/11/12/13/14/15/16),
plus the post-10s workers 21/27/28/29, i.e. 16 in total. Ours at 50M
instr: 5 tids.

**Pros**: directly measures the AUDIT-057 thread-gap. Smooth metric:
each unwedged thread adds 1 to the count.
**Cons**: requires 90-s wallclock runs; ours can't reach this
without solving the wedge first, so it's pre-requisite-equivalent to
Option 3.

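The census in Option 4 reduces to a per-tid event count with a threshold. A minimal sketch follows, assuming the trace has already been reduced to a stream of tids, one per event; the helper names `thread_census` and `census_gate` and the synthetic counts are illustrative and not part of the existing tooling.

```python
from collections import Counter


def thread_census(event_tids, min_events=10_000):
    """Return the sorted tids that emitted at least min_events events."""
    counts = Counter(event_tids)
    return sorted(tid for tid, n in counts.items() if n >= min_events)


def census_gate(event_tids, min_events=10_000, min_threads=6):
    """Option 4 gate: at least min_threads tids clear the event threshold."""
    return len(thread_census(event_tids, min_events)) >= min_threads


# Synthetic per-event tid stream (not real trace data): two busy tids, one wedged.
events = [5] * 12_000 + [1] * 30_000 + [13] * 80
busy = thread_census(events)
passes = census_gate(events)
```

Because the count only increases when another thread clears the threshold, the metric moves by exactly one per unwedged thread, which is the smoothness property claimed above.
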
### Option 5: `worker_semaphore_release_count` (AUDIT-069 S5)

Compute: how many `NtReleaseSemaphore` calls on the work semaphore
(handle `0xF800003C` in canary, equivalent in ours) over 90 s
wallclock. Canary: 414. Ours: 99 (24%).

**Pros**: pinpoints the under-production directly. Mechanically
measurable. Already instrumented in canary (audit_70_semaphore_release_watch).
**Cons**: same wallclock requirement; same gameability.

### Option 6: composite `progression_score`

Define:

```
progression_score = 1 * swaps + 10 * draws + 100 * unique_render_targets
                  + 0.001 * matched_prefix
```

This recovers signal during iteration (matched-prefix moves)
without pretending it's progression. The 1000:1 weight ratio
(one swap versus one matched-prefix event) matches the bug-class
severity.

**Pros**: continuous gradient over both wedge-solving and
canonicalization work. Honest about which is more important.
**Cons**: arbitrary weights. Composite metrics drift in meaning.

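To make the weighting concrete, the Option 6 formula can be evaluated on the Phase W and Phase C+25 rows from the tables above. This is a sketch under the assumption that the digest exposes fields named `swaps`, `draws`, `unique_render_targets` and `matched_prefix`; the key names and the `progression_score` helper are illustrative, and the `second_swap` digest is hypothetical.

```python
# Weights copied from the Option 6 formula.
WEIGHTS = {
    "swaps": 1.0,
    "draws": 10.0,
    "unique_render_targets": 100.0,
    "matched_prefix": 0.001,
}


def progression_score(digest):
    """Weighted sum from the Option 6 formula; digest keys are assumed names."""
    return sum(weight * digest[key] for key, weight in WEIGHTS.items())


# Values taken from the Phase W and Phase C+25 table rows.
phase_w = {"swaps": 1, "draws": 0, "unique_render_targets": 0, "matched_prefix": 105_112}
phase_c25 = dict(phase_w, matched_prefix=105_128)
# Hypothetical digest in which a second swap has landed.
second_swap = dict(phase_c25, swaps=2)

canon_delta = progression_score(phase_c25) - progression_score(phase_w)
swap_delta = progression_score(second_swap) - progression_score(phase_c25)
```

Under these weights the +16 matched-prefix events of Phase C+25 contribute about 0.016 to the score, while a single additional swap would contribute 1.0.
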
## Recommendation

**Adopt Option 6 (composite progression_score) as the primary
methodology metric**, with a hard secondary gate of "Option 2
(`swaps ≥ 2`) is what matters; everything else is fitness."

Concrete proposal:

1. The `digest.json` output gains a `progression_score` field
   computed from the existing fields (zero new instrumentation).
2. Every iterate must report Δprogression_score in its
   re-validation.md.
3. Iterates that only move `matched_prefix` (i.e., Δprogression_score
   = (small) × Δmatched_prefix) MUST be tagged in their memory entry
   as "**canonicalization only, no progression**" and counted
   against a *budget*: max 5 consecutive iterates in this class
   before mandatory pivot to wedge-attack work.
4. Audits that move `swaps` or `draws` (the high-weight terms) are
   tagged "**progression**" and given priority for resource
   allocation.

This methodology change costs ~10 LOC in the digest output and
imposes a discipline cap of 5 canonicalization-only audits between
progression attempts.

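The tagging and budget rules in items 3 and 4 can be expressed as a small check over consecutive digests. The sketch below is one possible reading of the proposal, not an existing tool: the function names, the short tag strings, the extra "no change" tag, and the digest history are all assumptions made for illustration.

```python
HIGH_WEIGHT = ("swaps", "draws")  # the terms the proposal treats as progression
MAX_CANON_STREAK = 5              # budget from item 3 of the proposal


def tag_iterate(prev, cur):
    """Tag one iterate by which digest fields moved relative to the previous one."""
    if any(cur[key] != prev[key] for key in HIGH_WEIGHT):
        return "progression"
    if cur["matched_prefix"] != prev["matched_prefix"]:
        return "canonicalization only"
    return "no change"  # case not covered by the proposal; assumed here


def must_pivot(tags):
    """True once the trailing run of canonicalization-only iterates hits the budget."""
    streak = 0
    for tag in reversed(tags):
        if tag != "canonicalization only":
            break
        streak += 1
    return streak >= MAX_CANON_STREAK


# Hypothetical digest history: five matched-prefix-only iterates in a row.
history = [{"swaps": 1, "draws": 0, "matched_prefix": 105_000 + 20 * i} for i in range(6)]
tags = [tag_iterate(a, b) for a, b in zip(history, history[1:])]
pivot_now = must_pivot(tags)
```

Counting only the trailing streak means a single progression iterate resets the budget, which matches the "consecutive" wording of the cap.
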
## Falsification of the matched-prefix-as-proxy belief

Phase C through C+25 explicitly assumed that matched-prefix is a
**proxy** for progression. This assumption is now empirically
falsified:

> +2,960 events of matched-prefix advancement produced exactly
> ZERO units of progression.

Reading-error #39 (newly registered by this review):

> **#39 (matched-prefix as progression proxy)**: matched-prefix
> measures *engine-to-engine divergence point*, not *game-to-game
> functional gap*. When the wedge is on a different thread than the
> matched-prefix anchor thread, advancing matched-prefix is orthogonal
> to unwedging. Future audits MUST distinguish "ours's tid-X main
> thread diverges from canary's tid-Y" from "ours's tid-X main thread
> is *blocked because tid-Z is wedged*", and target the wedge directly
> when present.

## What "progression discipline" looks like in practice

For the next 3 iterates:

- Iterate N+1: **Step 1 of shortest-path-roadmap** (crowbar). No
  diff-tool work. Target: `swaps ≥ 2`.
- Iterate N+2: **Step 2 of roadmap** (trigger ID via canary jsonl
  analysis). No engine LOC. Target: identification of the missing
  kernel call(s).
- Iterate N+3: **Step 3 of roadmap** (mirror the trigger). Target:
  ours unblocks without the crowbar.

Each iterate must produce a `progression_score` delta report. If
3 iterates in a row produce Δprogression_score ≤ ε (where
ε = 0.001 × 500 = 0.5, i.e. the score contribution of 500
matched-prefix events), the methodology should be re-reviewed
before continuing; this would mean even the crowbar approach
failed and a deeper rethink is needed.

## Closing note

The user's instinct in calling for this strategic pause and review was
correct. The matched-prefix-only chain was producing real
canonicalization work but had ceased producing progression. The
roadmap above is one principled attempt at breaking the cycle; if it
fails, the next-level fallback is to formally accept Sylpheed's
boot-state as currently unreachable in ours and pivot to a different
title for the methodology demonstration.

205
audit-runs/review-a-boot-state/ours-wedge-localization.md
Normal file
@@ -0,0 +1,205 @@
# Ours wedge localization

**Source data**: `phase-w-wedge-reattack/ours-postfix.jsonl` (50M-instr
cold run, ~3 s wallclock, 121,569 events, 13 tids).
`phase-w-wedge-reattack/halt-on-deadlock-dump.txt` (per-tid state @
deadlock).

## TL;DR

Ours's wedge is **structurally identical** to AUDIT-049 (first found
2026-05-10). Across 25+ subsequent iterates (Phase C+1 … Phase C+25,
Phase D, AUDIT-049 .. AUDIT-069), the wedge has **never moved**:

- **tid=1 (main)** wedges at `sub_82173990+0x2D4` (PC `0x824ac578`,
  `do_wait_single`) on **handle `0x12c8`** = `Thread(id=13)`, the
  renderer thread's join handle.
- **tid=13 (renderer / cache-IO worker)** wedges at
  `sub_821CB030+0x1B0` (PC `0x824ac578`, `do_wait_single`) on
  **handle `0x12d0`** = `Event/Auto`, created by tid=13 itself at
  `sub_821CB030+0x128` via `NtCreateEvent`. `<NO_SIGNALS_DESPITE_WAITS>`.
- **`sub_825070F0` fires 0×** at any horizon probed (50M, 500M, ∞
  wallclock). The 4 workers (entries `0x82506528/58/88/B8`) never
  spawn in ours.

This is what audits 049/058/059/060/062/063/064/065/066/067/068/069
collectively call "the wedge."

## Graph view: ours's actual reachable subgraph vs canary's

### What runs in BOTH engines (matched-prefix 105,128)

```
entry_point
 └─ early CRT init ✓ ours ✓ canary
 └─ subsystem init ✓
     ├─ VdInitializeEngines (×2, then VdShutdownEngines, then again)
     ├─ VdInitializeRingBuffer
     ├─ VdEnableRingBufferRPtrWriteBack
     ├─ VdSetGraphicsInterruptCallback
     └─ VdSetSystemCommandBufferGpuIdentifierAddress
 └─ 10× ExCreateThread (the matched first spawn burst)
     ├─ 0x82181830 / 0x8245A5D0 / 0x82450A28 ✓ ✓
     ├─ 0x82457EF0 (spawned by tid=10 → tid=11) ✓ ✓
     ├─ 0x824CD458 (KeWait worker, susp=F) ✓ ✓
     ├─ 0x822F1EE0 (renderer, susp=T) ✓ ✓
     ├─ 0x824D2878 / 0x824D2940 (XAudio, susp=T) ✓ ✓
     ├─ 0x82178950 (XMA, susp=F) ✓ ✓
     └─ 0x821748F0 (file IO spawner, susp=T) ✓ ✓
 └─ 1× boot-init VdSwap ✓ swaps=1
 └─ tid=1 enters sub_8216EA68 → sub_822F1AA8
     └─ bctrl vtable[0] of *(0x828E1F08)
         └─ sub_82175330 → tail → sub_82173990
             └─ sub_821746B0 → spawn worker (= ours tid=13, susp=F)
             └─ KeWaitForSingleObject INFINITE on tid=13.handle ← WEDGE
```

### What runs ONLY in canary (the missing subgraph)

```
After tid=6's tid=17 worker (= ours's tid=13) terminates:
  sub_82173990 returns to sub_822F1AA8's outer loop
   └─ iterates sub_821741C8 → sub_82172BA0 → vtable[6] = sub_821B55D8
       → sub_824F8398 → sub_824F7CD0 → sub_824F7800 → vtable[1] = sub_825070F0
       └─ 4× ExCreateThread(entry=0x82506528/58/88/B8, susp=T)
           ├─ Worker 0 → tid=28 (file IO, 3.26M events)
           ├─ Worker 1 → tid=27 (36k events)
           ├─ Worker 2 → tid=29 (91k events)
           └─ Worker 3 (0x825065B8; never resumed in jitter-1 run)

After workers come online:
  Canary's secondary spawn burst (1.94–2.15 s): 8 helpers (tids 18–25)
  Canary's tid=14/15 XAudio resumes (~ms after tid=6 spawns them in
    susp=T; ours also spawns them susp=T but never resumes them)
  Renderer tid=13 unblocks, starts emitting VdSwap at ~150 fps
  Per-frame game loop: tid=6 emits `0x822F1BCC` 4040× / 60 s
```

## The wedge dependency graph (cyclic)

```
[tid=1 (main) wedge]
        │
        ▼
wait on handle 0x12c8 (= tid=13.thread_handle)
        │
        ▼
only signaled when tid=13 calls ExTerminateThread
        │
        ▼
tid=13 needs to complete sub_821CB030 body
        │
        ▼
sub_821CB030 waits on event 0x12d0
        │
        ▼
only signaled by sub_825070F0 worker cluster
        │
        ▼
sub_825070F0 never fires in ours
        │
        ▼
sub_825070F0 is reached via:
  sub_82172BA0 → ... → sub_824F7800 → bctrl vtable[1]
  ↑↑↑ which is downstream of sub_822F1AA8's outer loop
      which is downstream of sub_82173990 returning
      which is downstream of tid=1's wait completing
      ← BACK TO TOP
```

This is the **AUDIT-063 self-referential lock**: the activation chain
that produces the signal that unwedges the wait is itself downstream
of the wait completing. In canary, the lock resolves because the
tid=17 worker (= ours tid=13's analog) calls `ExTerminateThread`
**by completing** its `sub_821CB030` body, and that completion is
fed by some OTHER signal source that ours doesn't replicate.

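The self-referential lock can be checked mechanically by treating each "is downstream of" relation as an edge and walking it until a node repeats. The sketch below encodes the diagram by hand; the node labels are paraphrases of the boxes above, and the `canary` variant simply models the unidentified "other signal source" as a node with no further dependency, which is an assumption for illustration.

```python
def find_cycle(depends_on, start):
    """Follow single-successor dependency edges; return the cycle path or None."""
    position = {}
    path = []
    node = start
    while node is not None:
        if node in position:
            # Seen before: the cycle is the path from its first visit back to it.
            return path[position[node]:] + [node]
        position[node] = len(path)
        path.append(node)
        node = depends_on.get(node)
    return None


# Each key can only complete after its value completes (hand-encoded from the diagram).
ours = {
    "tid=1 wait completes": "tid=13 terminates",
    "tid=13 terminates": "sub_821CB030 completes",
    "sub_821CB030 completes": "completion event signaled",
    "completion event signaled": "sub_825070F0 workers run",
    "sub_825070F0 workers run": "sub_82173990 returns",
    "sub_82173990 returns": "tid=1 wait completes",
}
# Hypothetical canary variant: the event is fed by an independent source.
canary = dict(ours)
canary["completion event signaled"] = "other signal source"

ours_cycle = find_cycle(ours, "tid=1 wait completes")
canary_cycle = find_cycle(canary, "tid=1 wait completes")
```

Breaking any single edge is enough to remove the cycle, which is why identifying the one independent signal source matters more than further work on the rest of the chain.
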
## Where the "other signal source" lives (the actual root cause)

From AUDIT-069 Session 5 (work-semaphore release-rate diff):

> Canary 414 release events vs ours 99 (24% rate). Worker (tid=10/5):
> 382 vs 90. Main (tid=6/1): 7 vs 8. **Other producers: 25 vs 1**.

The discrepancy in "other producers" (25 release events vs 1) is the key.
**Canary has multiple non-worker threads that release the work
semaphore during bootstrap; releasing this semaphore is what feeds
the worker-side wait that eventually causes sub_821CB030's event to
be signaled.** Ours has only one (tid=13 itself, before it wedges).

From AUDIT-069 Session 4 (`sub_82450A68` dispatch loop):

> Ours r3=0x1 (semaphore acquired) 91/91 captures (100%); canary
> r3=0x102 (TIMEOUT) 3/4 (75%).

**Ours's work-semaphore has count > 0 every time tid=5 checks; canary's
times out 75% of the time.** This is a *paradox at face value*: how
can ours have MORE semaphore signals available but still process
LESS work? The S5 reframe resolves it: ours's worker self-releases
the work semaphore from `sub_82450B68+0xCDC/+0xD28` MORE OFTEN than
it consumes, because the consume path early-exits when the dispatch
table doesn't have an entry to process, and the dispatch table
doesn't have entries because the producers (the threads behind
canary's 25 "other" releases) aren't running.

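The Session 5 counts can be broken down per producer class to show where the shortfall concentrates. The numbers below are the ones quoted above; the class labels and variable names are just labels for this sketch.

```python
# Release counts on the work semaphore, from the AUDIT-069 Session 5 quote.
canary = {"worker": 382, "main": 7, "other": 25}
ours = {"worker": 90, "main": 8, "other": 1}

canary_total = sum(canary.values())
ours_total = sum(ours.values())
overall_ratio = ours_total / canary_total

# Ours as a fraction of canary, per producer class.
per_class_ratio = {name: ours[name] / canary[name] for name in canary}
most_starved = min(per_class_ratio, key=per_class_ratio.get)
```

The per-class view shows the main thread roughly at parity, the worker at about a quarter of canary's rate, and the "other" class at about 4%, which is consistent with the claim that the missing producers are the threads outside the worker and main.
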
## Bootstrap divergence (when does ours first diverge from canary?)

Per the AUDIT-069 H3 framing: somewhere in the *bootstrap* of the
worker-cluster, a producer thread that should be alive in canary
isn't alive in ours. Candidates:

1. **XAudio render thread (canary tid=14/15)**: spawned suspended in
   ours, **never resumed**. Canary resumes within ~1 ms of spawn at
   1.726 s. Canary's tid=14 calls `XAudioGetVoiceCategoryVolumeChangeMask`
   26,126× and is one of the top event producers. This thread runs
   the host-audio bridge feed loop: *if it isn't running, downstream
   producers expecting audio cues block.*
2. **XMA decoder (tid=16, entry `0x82178950`)**: spawned non-suspended
   in both; ours emits 0 events from this thread because it presumably
   waits on a kernel object that's never signaled.
3. **NtWaitForMultipleObjectsEx worker (canary tid=21, entry
   `0x824563E0`)**: 1M events in canary; absent in ours (canary's
   second spawn burst doesn't happen).
4. **The "tid=10 helper" (canary tid=10, entry `0x82450A28`)**: ours
   has this thread (ours tid=5), but it's running the dispatch loop
   `sub_82450A68` in a degenerate fast-path mode (S4 finding).

The most defensible single-root claim:

> **Ours never resumes the XAudio threads (tid=14/15), because the
> guest API call that triggers their resume in canary doesn't fire in
> ours, and as a knock-on the worker cluster never gets the bootstrap
> producer it expects.**

But this claim is not yet proven; AUDIT-068/069 stopped short of
identifying the resume trigger.

## Verified-but-doesn't-help LOC budget across recent audits

(For methodology context: every recent audit landed correctness or
diagnostic LOC but moved progression 0%.)

| Audit / Phase | LOC added | Component | Effect on progression |
|---|---:|---|---|
| AUDIT-067 vptr-mem-watch | +422 (canary) | Mem-watch diagnostic | 0 |
| AUDIT-068 S1-S4 | +520 cumul (canary) | Host-side write hooks | 0 (writer identified at guest PC) |
| AUDIT-069 S1-S5 | +60 (canary), 0 (ours) | Wait/release watch | 0 (counts diverge, no fix) |
| Phase D Stages 0-4 | +450-500 (ours+tools) | Contention manifest | 0 (104,607 cap unbroken) |
| Phase D D-extension | +95 (tool) | Nested-CS absorber | +439 matched-prefix only |
| Phase C+1 .. C+25 | varies | Allocator/event/thread shims | 0 (matched-prefix only) |
| Phase W | +20 (ours) | VdInitializeEngines r3=1 | +66 matched-prefix only |
| **Total to break wedge: 0 LOC of any kind** | | | |

This is the single most striking pattern from the audit chain: **every
honest correctness fix advances matched-prefix; none move
`draws / swaps / unique_render_targets`.**

## Falsification budget for the wedge framing

The wedge framing IS robust (no audit has falsified it since AUDIT-049).
But it has limited explanatory power: it tells us *what is blocked*,
not *what should unblock it*. Reading-error #38 (cross-spawn producer
paths missed by static reachability) and #36 (POD struct copy bypass)
both proved that the install / wake mechanism in canary involves paths
guest static analysis cannot see. This is a methodology constraint,
not an unsolvable problem.

333
audit-runs/review-a-boot-state/plan.md
Normal file
@@ -0,0 +1,333 @@
# Review A: boot-state review and shortest-path roadmap

**Session type**: PLAN-only. No engine LOC changes; no canary
instrumentation changes. Read-only investigation across the
existing audit chain artifacts.
**Date**: 2026-05-21
**Companion documents** (in this directory):
- `canary-boot-trajectory.md`: canary's call chain from entry_point
  to first gameplay draw, with wallclock timestamps.
- `ours-wedge-localization.md`: precise where-ours-stops, in graph
  terms.
- `shortest-path-roadmap.md`: 3-5 step roadmap with expected
  progression delta per step.
- `methodology-assessment.md`: alternative metric proposal.

This `plan.md` summarizes the five framing questions with answers
backed by file:line citations.

---

## Q1: What is "first draw" in canary's Sylpheed boot?

**Two distinct "draws" must be disambiguated.**

### Q1.a: First boot-init `VdSwap` (the swap=1 event)

Canary's tid=6 (guest main) emits **one** `VdSwap` at ~9.5 s
wallclock, immediately after the GPU subsystem init sequence
`VdInitializeEngines → VdInitializeRingBuffer →
VdEnableRingBufferRPtrWriteBack → VdSetGraphicsInterruptCallback →
VdSetSystemCommandBufferGpuIdentifierAddress → VdGetSystemCommandBuffer`.
This swap publishes the boot framebuffer and contains no draw packets.

**Ours also reaches this swap**, visible in
`phase-w-wedge-reattack/ours-postfix.jsonl` at idx 105283 (host_ns
496,276,229). This is what produces ours's `swaps=1` metric.

Both engines reach this point. **It is NOT the gate.**

### Q1.b: First gameplay `VdSwap` (the swap≥2 / draws≥1 event)

Canary's renderer tid=13 (entry `0x822F1EE0`, spawned suspended at
1.671 s) wakes after the `sub_825070F0` worker fan-out at host_ns
≈ 10.383 s and begins emitting `VdGetSystemCommandBuffer` /
`VdSwap` pairs at ~150 fps. Canary's tid=13 emits **12,092
VdSwap calls in the 90-s window** (per
`phase-nonmatch-investigation/canary-tid-profiles.md:21`).

The first of these is the **first gameplay draw**, fired at ~10.7 s
wallclock: about 0.3 s after the `sub_825070F0` fan-out (host_ns
≈ 10.383 s) triggers the worker cluster, and about 1.2 s after the
boot-init swap at ~9.5 s.

**Pre-conditions canary establishes before this point** (per
`canary-boot-trajectory.md`):

1. Vtable `0x8200A1E8` of `ANON_Class_713383D7` installed at host_ns
   ≈ 9.4-9.6 s via POD-copy at GUEST PC `sub_824FD240+0x24`
   (per `project_audit_068_session4_2026_05_20`).
2. Activation chain `sub_822F1AA8 → sub_82173990 → sub_821746B0 →
   sub_82172BA0 → sub_821B55D8 → sub_824F8398 → sub_824F7CD0 →
   sub_824F7800 → bctrl vtable[1] = sub_825070F0` fires on tid=6.
3. `sub_825070F0` spawns 4 worker threads with entries
   `0x82506528/58/88/B8` and shared ctx `0xBCE251C0`.
4. Workers (canary tids 27/28/29) emit signals that unwedge the
   `sub_821CB030` Event waits across the cache-file IO completion
   chain.
5. Renderer tid=13's body (entered earlier but blocked on a
   tid=14/15 XAudio-coordinated event) unblocks; per-frame
   `VdGetSystemCommandBuffer` / `VdSwap` loop begins.

---

## Q2 — What is ours's actual progress, and what's the wedge root cause?
|
||||
|
||||
**Ours stops at the first wait in the activation chain.** Specifically:
|
||||
|
||||
- **tid=1 (main)** wedged at `sub_82173990+0x2D4` (PC `0x824ac578` =
|
||||
`do_wait_single`) on handle `0x12c8` = `Thread(id=13)` — waiting
|
||||
for the renderer's thread handle to signal (which happens only when
|
||||
tid=13 calls `ExTerminateThread`).
|
||||
- **tid=13 (renderer / cache-IO worker)** wedged at
|
||||
`sub_821CB030+0x1B0` on handle `0x12d0` = `Event/Auto`, created by
|
||||
itself via `NtCreateEvent` at `sub_821CB030+0x128`. `signals=0,
|
||||
wakes=0` — `<NO_SIGNALS_DESPITE_WAITS>`.
|
||||
- **`sub_825070F0` fires 0×** at any horizon probed.
|
||||
|
||||
Citation: `phase-w-wedge-reattack/halt-on-deadlock-dump.txt` +
|
||||
`phase-w-wedge-reattack/current-state.md`.
|
||||
|
||||
### Root cause (one structural level deeper than the wedge symptom)
|
||||
|
||||
**Per AUDIT-069 Session 5 (the most recent measurement):**
|
||||
|
||||
- Canary fires 414 `NtReleaseSemaphore` calls on the work-queue
|
||||
semaphore in the 90-s window.
|
||||
- Ours fires 99 (24%).
|
||||
- Breakdown: Worker (382 vs 90), Main (7 vs 8), **Other producers
|
||||
(25 vs 1)**.
|
||||
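The counts above can be cross-checked with a few lines of arithmetic. This is a minimal sketch that uses only the figures quoted in this section; the dictionary keys and variable names are chosen for illustration and do not come from any tool in the audit chain.

```python
# (canary, ours) NtReleaseSemaphore counts per producer class,
# copied from the AUDIT-069 Session 5 figures quoted above.
releases = {
    "worker": (382, 90),
    "main": (7, 8),
    "other": (25, 1),
}

canary_total = sum(canary for canary, _ in releases.values())
ours_total = sum(ours for _, ours in releases.values())

# Share of canary's release volume that ours reproduces, overall and per class.
overall_share = ours_total / canary_total
class_share = {name: ours / canary for name, (canary, ours) in releases.items()}

print(canary_total, ours_total, f"{overall_share:.0%}")
for name, share in class_share.items():
    print(f"{name}: {share:.0%}")
```

The per-class shares make the proportional gap visible: ours reproduces roughly a quarter of the worker releases but only 1 in 25 of the other-producer releases.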
|
||||
The "**other producers (25 vs 1)**" gap is the load-bearing
|
||||
discrepancy. Canary's other producers release the work semaphore
**24 more times** during bootstrap than ours's do. These producers
|
||||
correspond to:
|
||||
|
||||
1. The 4 `sub_825070F0` workers (canary tids 27/28/29 + 1) — absent
|
||||
in ours.
|
||||
2. XAudio render threads (canary tids 14/15, spawned suspended in
|
||||
both engines, **resumed only in canary**).
|
||||
3. The secondary spawn burst at 1.94-2.15 s (canary tids 18-25) —
|
||||
8 helpers including file-IO and NtWaitForMultipleObjectsEx workers
|
||||
— absent in ours.
|
||||
|
||||
### The ONE structural issue
|
||||
|
||||
> **Ours never reaches `sub_825070F0` because the activation chain
|
||||
> that calls it is downstream of tid=13's wedge; and tid=13's wedge
|
||||
> is downstream of the worker cluster activation; and the worker
|
||||
> cluster activation is `sub_825070F0`. This is a self-referential
|
||||
> lock.**
|
||||
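The self-referential lock quoted above can be pictured as a cycle in a "needs" graph. The sketch below is illustrative only: the node labels paraphrase the quote, and the second graph encodes the pre-activation hypothesis that this review considers probable for canary, not a measured fact.

```python
def find_cycle(graph, start):
    """Depth-first walk from start; return the first cycle found, else None."""
    path, on_path, done = [], set(), set()

    def visit(node):
        if node in on_path:
            # Close the loop: slice the current path from the repeated node.
            return path[path.index(node):] + [node]
        if node in done:
            return None
        path.append(node)
        on_path.add(node)
        for nxt in graph.get(node, ()):
            cycle = visit(nxt)
            if cycle:
                return cycle
        path.pop()
        on_path.discard(node)
        done.add(node)
        return None

    return visit(start)


# Ours: every node needs the next one, and the last needs the first.
ours = {
    "sub_825070F0 call": ["activation chain progress"],
    "activation chain progress": ["tid=13 unwedged"],
    "tid=13 unwedged": ["worker cluster active"],
    "worker cluster active": ["sub_825070F0 call"],
}

# Hypothetical canary variant: tid=13 is released by producers that were
# resumed independently of the worker cluster, which removes one edge.
canary_hypothesis = dict(ours)
canary_hypothesis["tid=13 unwedged"] = ["producers pre-activated"]
canary_hypothesis["producers pre-activated"] = []

ours_cycle = find_cycle(ours, "sub_825070F0 call")
canary_cycle = find_cycle(canary_hypothesis, "sub_825070F0 call")
print(ours_cycle)
print(canary_cycle)
```

Replacing any single edge with an externally satisfied dependency is enough to make the walk terminate, which is the property the crowbar in the roadmap relies on.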
|
||||
Canary breaks the lock because some part of the bootstrap
|
||||
*pre-activates* the producers (probably via XAudio thread resume at
|
||||
1.726 s, which then runs ahead, populates the work queue, signals
|
||||
events, etc.). Ours never resumes the XAudio threads — they're
|
||||
spawned suspended and stay that way.
|
||||
|
||||
**The single highest-leverage gap is the XAudio thread resume,**
|
||||
because (a) it happens early (1.726 s in canary vs. ours's wedge
|
||||
which sets in around 1.4 s; i.e. the resume should happen before the
|
||||
wedge), (b) it activates the dominant event producers, and (c) AUDIT-069
|
||||
S5's "other producers 25 vs 1" finding implicates exactly this class
|
||||
of thread.
|
||||
|
||||
---
|
||||
|
||||
## Q3 — Shortest-path-to-first-draw roadmap
|
||||
|
||||
Five steps (full detail in `shortest-path-roadmap.md`):
|
||||
|
||||
- **Step 1 (~80-150 LOC, ours-side)**: add `--force-spawn-workers`
|
||||
cvar that crowbars `sub_825070F0` activation by directly spawning
|
||||
the 4 worker threads with the right ctx after `VdInitializeRingBuffer`
|
||||
returns. Tests "are the workers functionally correct if activated"
|
||||
and "does activating them unwedge sub_821CB030."
|
||||
- **Step 2 (~0 LOC)**: with Step 1 active, mine the canary jsonl for
|
||||
the kernel-call sequence on tid=6 in the wallclock window [9.4 s,
|
||||
9.6 s] (the install epoch). Identify what guest call triggers
|
||||
`sub_824FD240+0x24`'s POD-copy of the vtable in canary.
|
||||
- **Step 3 (~10-500 LOC, depending on what Step 2 finds)**: mirror
|
||||
that trigger in ours — likely a missing kernel-import return value
|
||||
or a missing post-condition that the trigger inspects.
|
||||
- **Step 4 (~0 LOC; remove crowbar)**: re-test ours without
|
||||
`--force-spawn-workers`. Verify natural bootstrap reaches
|
||||
`sub_825070F0` activation.
|
||||
- **Step 5 (~0-50 LOC)**: measure renderer-thread VdSwap rate over 90 s
|
||||
wallclock; target ±30% of canary's 12,092 calls.
|
||||
|
||||
Expected delta:
|
||||
|
||||
| After step | `swaps` | `draws` | `unique_render_targets` |
|
||||
|---|---:|---:|---:|
|
||||
| Pre | 1 | 0 | 0 |
|
||||
| Step 1 (crowbar) | 2+ | 1+ | 1+ |
|
||||
| Step 4 (decrowbar) | 2+ | 1+ | 1+ |
|
||||
| Step 5 (parity) | 100+ | 100+ | 1-5 |
|
||||
|
||||
---
|
||||
|
||||
## Q4 — What's NOT on the shortest path
|
||||
|
||||
Explicitly deferred (full rationale in `shortest-path-roadmap.md`):
|
||||
|
||||
- **Audio (host-audio-* / XAudio implementation)** — even though
|
||||
XAudio thread resume MAY be the trigger from Q2, ours's existing
|
||||
XAudio shim is sufficient for the workers to bootstrap if they
|
||||
receive the right kernel-call sequence. Full XAudio
|
||||
implementation is beyond first-draw scope.
|
||||
- **HID** — Sylpheed's intro/title screens are auto-advance; no
|
||||
input needed.
|
||||
- **XAM content / save games** — not on first-draw path.
|
||||
- **Scheduler determinism work** (Phase D Stages 0-4 and beyond) —
|
||||
null result; the wedge is upstream of contention scheduling.
|
||||
Close or indefinitely defer.
|
||||
- **Diff-tool canonicalization** (Phase C+N for N > 25) — saturated
|
||||
on matched-prefix without progression; halt this work class until
|
||||
Step 4 lands and the workload re-baselines.
|
||||
- **AUDIT-068 host-side install probes** — superseded by AUDIT-068
|
||||
Session 4 finding (writer is GUEST PC, not host). The followup
|
||||
question is what *triggers* the guest code path, which Step 2
|
||||
addresses through cheaper means.
|
||||
|
||||
---
|
||||
|
||||
## Q5 — Methodology assessment
|
||||
|
||||
**Current methodology relied on matched-prefix as a progression
|
||||
proxy. This assumption is now empirically falsified**: +2,960
|
||||
events of matched-prefix advancement produced 0 units of progression
|
||||
(`swaps=1, draws=0` across 25+ iterates).
|
||||
|
||||
### Proposed alternative metric
|
||||
|
||||
**Option 6 (composite `progression_score`)**:
|
||||
|
||||
```
|
||||
progression_score = 1 * swaps + 10 * draws + 100 * unique_render_targets
|
||||
+ 0.001 * matched_prefix
|
||||
```
|
||||
|
||||
Continuous gradient; honest about wedge-solving vs. canonicalization
|
||||
priority. Requires ~10 LOC to add to `digest.json`.
|
||||
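A minimal sketch of the composite score, assuming the four inputs are already available as plain numbers. The function name and keyword arguments are illustrative; how the value would be written into `digest.json` is not specified here and is left out.

```python
def progression_score(swaps, draws, unique_render_targets, matched_prefix):
    """Composite score from the formula above: rendering progress dominates,
    matched-prefix contributes a small continuous gradient."""
    return (
        1 * swaps
        + 10 * draws
        + 100 * unique_render_targets
        + 0.001 * matched_prefix
    )


# Values taken from this review: matched-prefix moved 102,168 -> 105,128
# while swaps/draws/render targets stayed at 1/0/0.
before = progression_score(1, 0, 0, 102_168)
after = progression_score(1, 0, 0, 105_128)

# Hypothetical first-gameplay-draw state at the same matched-prefix.
first_draw = progression_score(2, 1, 1, 105_128)

print(round(after - before, 3), round(first_draw - after, 3))
```

Under these weights the 25+ canonicalization iterates move the score by about 3 points, while a single gameplay draw with one render target moves it by 111.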
|
||||
Discipline: tag every iterate as either
|
||||
"**canonicalization only — no progression**" or
|
||||
"**progression**". Cap at 5 consecutive canonicalization-only
|
||||
iterates before mandatory pivot to wedge-attack work.
|
||||
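The tagging cap can be enforced mechanically. The sketch below assumes iterates are recorded as an ordered list of tag strings, oldest first; the tag spellings and the function name are placeholders, not an existing convention in the memory entries.

```python
CANON_ONLY = "canonicalization-only"
PROGRESSION = "progression"


def must_pivot(tags, cap=5):
    """Return True once the most recent `cap` iterates are all
    canonicalization-only, i.e. a pivot to wedge-attack work is due."""
    streak = 0
    for tag in reversed(tags):
        if tag != CANON_ONLY:
            break
        streak += 1
    return streak >= cap


history = [PROGRESSION] + [CANON_ONLY] * 4
print(must_pivot(history))                  # four in a row: not yet
print(must_pivot(history + [CANON_ONLY]))   # five in a row: pivot
```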
|
||||
### New reading-error #39
|
||||
|
||||
> **#39 (matched-prefix as progression proxy)**: matched-prefix
|
||||
> measures engine-to-engine divergence point, NOT game-to-game
|
||||
> functional gap. When the wedge is on a different thread than the
|
||||
> matched-prefix anchor thread, advancing matched-prefix is
|
||||
> orthogonal to unwedging. Future audits MUST distinguish "ours's
|
||||
> tid-X diverges from canary's tid-Y" from "ours's tid-X is *blocked
|
||||
> because tid-Z is wedged*", and target the wedge directly when
|
||||
> present.
|
||||
|
||||
---
|
||||
|
||||
## Counterintuitive findings (anti-anchoring)
|
||||
|
||||
Per Tripstones in the task brief:
|
||||
|
||||
### 1. Both engines reach `swaps=1`; ours is NOT behind on the boot swap.
|
||||
|
||||
The shared boot-init `VdSwap` fires in both. Ours's `swaps=1` metric
|
||||
is achieved, and at the same point where canary reaches it. The
|
||||
divergence is NOT "ours can't do the first swap"; it's "ours can't do
|
||||
the SECOND through Nth swap (the gameplay loop)".
|
||||
|
||||
### 2. Tripstone 4 verified: canary does reach gameplay draws, ours does not.
|
||||
|
||||
`canary-jitter-1.jsonl` shows 12,092 VdSwap calls on canary tid=13 in
|
||||
90 s wallclock — definitively in the gameplay rendering loop, not
|
||||
pre-first-draw. Ours's analog of canary tid=13 emits ~80
|
||||
events total before wedging — definitively before gameplay starts.
|
||||
The "both engines pre-first-draw" hypothesis is FALSE.
|
||||
|
||||
### 3. The matched-prefix metric is on the WRONG thread.
|
||||
|
||||
Matched-prefix tracks tid=6 (canary) vs tid=1 (ours), the main
|
||||
threads. But the wedge is on **tid=13 in both engines** — the
|
||||
renderer thread. Tid=1's matched-prefix can advance 105,128 events
|
||||
without ever touching the wedge.
|
||||
|
||||
### 4. The "boot-state-machine" framing is misleading.
|
||||
|
||||
There's no monolithic boot state machine. There are ~28 threads in
|
||||
canary, each running their own lifecycle, communicating via shared
|
||||
kernel objects. The bottleneck isn't a state transition; it's a
|
||||
THREAD ACTIVATION GAP.
|
||||
|
||||
### 5. AUDIT-069 Session 5's "other producers 25 vs 1" is the key forensic discovery, more than AUDIT-068's vtable install epoch.
|
||||
|
||||
The vtable install IS interesting but it's downstream of the producer
|
||||
gap. Producers must be running to populate the work queue, which
|
||||
drives the workers, whose signals release the wedged wait, which lets
|
||||
the activation chain continue, which calls `sub_824FD240+0x24`,
|
||||
which writes the vtable. Fixing the vtable install in isolation
|
||||
(e.g., via a host-side mem-write hack) doesn't help if no producer
|
||||
is feeding work to the workers.
|
||||
|
||||
---
|
||||
|
||||
## Cascade prediction confidence
|
||||
|
||||
- A — canary boot trajectory characterized: **DONE, HIGH** (canary-jitter-1.jsonl provides direct evidence).
|
||||
- B — ours's wedge root-cause localized deeper than "sub_821CB030 waits": **DONE, MEDIUM-HIGH** (AUDIT-069 S5 "other producers 25 vs 1" finding).
|
||||
- C — shortest-path roadmap with ≤5 steps: **DONE, MEDIUM** (5 steps; Step 1 confidence ~60%).
|
||||
- D — alternative metric proposed: **DONE, HIGH** (Option 6 composite, plus reading-error #39).
|
||||
|
||||
---
|
||||
|
||||
## Open questions / known unknowns
|
||||
|
||||
1. **What is the bootstrap trigger for canary's `sub_824FD240+0x24`?**
|
||||
Roadmap Step 2 addresses. Could be answered in <1 session of
|
||||
canary jsonl analysis.
|
||||
2. **Does Step 1's crowbar produce a clean wedge-unblock, or does it
|
||||
reveal additional unmodelled state in the ctx object?** Empirical;
|
||||
testable in one session.
|
||||
3. **Are canary's XAudio threads (tids 14/15) the actual missing
|
||||
producer, or are they downstream of the same trigger?** Worth a
|
||||
targeted probe before Step 1; ~50 LOC ours-side to log
|
||||
NtResumeThread on the XAudio entry PCs.
|
||||
4. **Will the AUDIT-067 "vtable install is host-side" finding
|
||||
resurface?** No — AUDIT-068 S4 falsified this; the writer is
|
||||
GUEST PC `sub_824FD240+0x24`. The "host-side" framing was a
|
||||
mis-read of the POD-copy semantics (reading-error #36).
|
||||
|
||||
---
|
||||
|
||||
## Recommended next action
|
||||
|
||||
**Dispatch a "progression iterate" implementing Step 1 of the
|
||||
roadmap** (`--force-spawn-workers` crowbar, ~80-150 LOC ours-side).
|
||||
This is a high-variance, high-reward iterate; expected outcome is
|
||||
either `swaps ≥ 2, draws ≥ 1` (success — wedge structurally
|
||||
isolated to thread activation) or an informative failure mode (e.g.,
|
||||
worker faults at first vtable bctrl indicating additional state
|
||||
needed in ctx object). Time-box: 1 session, max 2h.
|
||||
|
||||
If Step 1 succeeds in ANY way (even if draws stays 0), the next
|
||||
iterate is Step 2 (kernel-call sequence mining in canary-jitter-1.jsonl).
|
||||
This step has minimal risk and uses existing tooling.
|
||||
|
||||
If Step 1 fails completely (panic / segfault unrecoverable), revert
|
||||
the crowbar and reframe: the wedge may be in ours's kernel-handler
|
||||
implementations themselves, not just bootstrap activation. At that
|
||||
point a deeper Path β engine investigation is unavoidable.
|
||||
|
||||
---
|
||||
|
||||
## Memory hygiene note
|
||||
|
||||
This review is read-only. xenia-rs HEAD unchanged. canary HEAD
|
||||
unchanged. sylpheed.db unchanged. No new artifacts beyond this
|
||||
directory.
|
||||
|
||||
After dispatching Step 1, future memory entries should adopt the
|
||||
new `progression_score` + tagging discipline outlined in
|
||||
`methodology-assessment.md`.
|
||||
253
audit-runs/review-a-boot-state/shortest-path-roadmap.md
Normal file
@@ -0,0 +1,253 @@
|
||||
# Shortest-path-to-first-gameplay-draw roadmap
|
||||
|
||||
**Date**: 2026-05-21
|
||||
**Read-only investigation; no LOC changes made in this session.**
|
||||
**Premise**: 25+ iterates have advanced matched-prefix 102,168 →
|
||||
105,128 (+2,960 events) but `draws=0, swaps=1, render_targets=0`
|
||||
have not moved. This roadmap proposes a non-canonicalization path
|
||||
forward.
|
||||
|
||||
## Definitions
|
||||
|
||||
- **First gameplay draw** = the first `VdSwap` call by ours's
|
||||
renderer (the thread spawned at entry `0x822F1EE0`, ours's tid
|
||||
analog of canary tid=13) that emits at least one `PM4_TYPE3
|
||||
DRAW_INDX` packet into the ringbuffer.
|
||||
- **Observable success criterion**: `draws ≥ 1, swaps ≥ 2,
|
||||
unique_render_targets ≥ 1` in `xenia-rs check --stable-digest`
|
||||
output. At least one frame from the **renderer thread** (not the
|
||||
boot-init swap that ours already emits).
|
||||
|
||||
## Why current iteration has stalled
|
||||
|
||||
The wedge has been mapped and remapped 20+ times. Every audit
|
||||
correctly identifies symptoms; every fix correctly canonicalizes a
|
||||
diff-tool divergence. But the wedge is **structurally cyclic**: the
|
||||
worker cluster that signals the wait is downstream of the wait
|
||||
completing. Standard "find the divergent kernel call, mirror canary's
|
||||
semantics" has saturated.
|
||||
|
||||
Two strategies remain that have NOT been tried at full scope:
|
||||
|
||||
1. **(A) Decouple the cycle by faking the worker activation**:
|
||||
directly call `sub_825070F0` from a host shim, or directly spawn
|
||||
the 4 worker threads with the right ctx, sidestepping the
|
||||
activation chain. This is a *crowbar*: it doesn't fix the
|
||||
underlying bootstrap bug, but it tests "are the workers
|
||||
functionally correct IF activated." If they signal the wedge and
|
||||
ours then reaches first draw, we know the bug is *exclusively* in
|
||||
the activation gate, and we can attack just that.
|
||||
|
||||
2. **(B) Find what triggers `sub_824FD240+0x24`'s POD-copy in canary**.
|
||||
AUDIT-068 Session 4 pinned the install epoch of vtable
|
||||
`0x8200A1E8` to this writer site. But the *caller* of
|
||||
`sub_824FD240` — what guest call leads to it firing — is
|
||||
unidentified. In ours, `sub_824FD240` fires 0× because the call
|
||||
chain `sub_824F8398 → sub_824F7CD0 → sub_824F7800 → sub_824FD240`
|
||||
is downstream of the tid=13 wedge. So we have circular reasoning
|
||||
again — UNLESS Strategy A is applied first.
|
||||
|
||||
The roadmap below uses Strategy A as a wedge-crowbar and Strategy B
|
||||
as the principled fix that follows.
|
||||
|
||||
## Roadmap
|
||||
|
||||
### Step 1 — Crowbar: force-spawn the `sub_825070F0` workers (~80–150 LOC)
|
||||
|
||||
**Action**: in `xenia-rs` add a debug-only cvar
|
||||
`--force-spawn-workers` that, when set, after some bootstrap
|
||||
checkpoint (e.g., first `VdInitializeRingBuffer` return), manually
|
||||
spawns 4 ExCreateThread-equivalent guest threads with:
|
||||
|
||||
- entries `0x82506528 / 0x82506558 / 0x82506588 / 0x825065B8`
|
||||
- ctx_ptr = run-determined; allocate a fresh
|
||||
`ANON_Class_713383D7`-shaped object on the unified heap and write
|
||||
vtable `0x8200A1E8` to slot 0 (mirror the POD-copy at
|
||||
`sub_824FD240+0x24`)
|
||||
- stack_size 65536, suspended=True initially, then NtResumeThread
|
||||
|
||||
**Expected effect**:
|
||||
|
||||
- If the workers run correctly and signal the wedge: ours's tid=13
|
||||
unblocks, tid=1's join completes, normal game-loop begins.
|
||||
`draws ≥ 1, swaps ≥ 2`.
|
||||
- If the workers fail (e.g., faulting because the ctx object's other
|
||||
fields aren't initialized): we learn what *else* needs to be
|
||||
installed alongside the vtable.
|
||||
|
||||
**Failure modes to expect**:
|
||||
|
||||
- The worker entries dispatch via vtable slots 35/36/37/38 of the
|
||||
ANON_Class; those slots also need to be populated. AUDIT-067
|
||||
static analysis shows the vtable has 7 entries; the worker entries
|
||||
use offsets 140/144/148/152 (= slots 35/36/37/38 of a wider vtable)
|
||||
per `sub_825070F0.md` lines 32-37. So we'll need a parent class /
|
||||
derived class layout.
|
||||
- The ctx object also has refcount/header fields that must be
|
||||
initialized — see AUDIT-068 Session 3 finding of 12-byte struct
|
||||
copy `{vptr, self, self}` followed by refcount=1.
|
||||
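The two failure modes above come down to layout arithmetic, sketched here under stated assumptions: guest pointers are 4 bytes and big-endian (the guest CPU is PowerPC), the refcount is a 32-bit word placed directly after the 12-byte `{vptr, self, self}` copy, and the object address `0x40001000` is a made-up placeholder. None of these helper names exist in `xenia-rs`.

```python
import struct

PTR_SIZE = 4                                  # 32-bit guest pointers
WORKER_VTABLE_OFFSETS = (140, 144, 148, 152)  # byte offsets quoted above
VTABLE_ADDR = 0x8200A1E8
STATIC_VTABLE_ENTRIES = 7                     # per the AUDIT-067 static analysis


def slot_index(byte_offset, ptr_size=PTR_SIZE):
    """Convert a vtable byte offset into a slot number."""
    if byte_offset % ptr_size:
        raise ValueError(f"offset {byte_offset} is not pointer-aligned")
    return byte_offset // ptr_size


def ctx_header(self_addr, vptr=VTABLE_ADDR, refcount=1):
    """Pack {vptr, self, self} followed by the refcount, big-endian.
    The 32-bit refcount width is an assumption of this sketch."""
    return struct.pack(">IIII", vptr, self_addr, self_addr, refcount)


slots = [slot_index(off) for off in WORKER_VTABLE_OFFSETS]
beyond_static_table = [s for s in slots if s >= STATIC_VTABLE_ENTRIES]
header = ctx_header(0x40001000)  # hypothetical heap address

print(slots, header.hex())
```

All four worker slots fall outside a 7-entry table, which is why a wider parent or derived class layout is expected.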
|
||||
**LOC budget**: 80-150 LOC ours-side; 0 LOC canary.
|
||||
**Read-only fallback**: if force-spawn fails immediately, we've still
|
||||
captured the failure mode, which is informative.
|
||||
**Risk**: high — this is structurally a hack. Acceptable as a
|
||||
diagnostic.
|
||||
|
||||
### Step 2 — Identify what triggers `sub_824FD240+0x24` in canary (~0 LOC)
|
||||
|
||||
**Action**: with Step 1's crowbar enabled, ours reaches the
|
||||
post-wedge code path. Compare ours and canary on what `import.call`
|
||||
(kernel API) sequence the **caller** of `sub_824FD240` makes
|
||||
immediately before the POD-copy install.
|
||||
|
||||
The caller chain (per AUDIT-064/068) is:
|
||||
|
||||
```
|
||||
sub_824F8398 → sub_824F7CD0 → sub_824F7800 → [bl at +0x38 = sub_824FD240] / [bctrl at +0x320 = sub_825070F0]
|
||||
```
|
||||
|
||||
So `sub_824F7800` calls `sub_824FD240` at offset `+0x38`, BEFORE it
|
||||
calls `sub_825070F0` at offset `+0x320`.
|
||||
|
||||
Question: what does `sub_824F8398`'s caller (one level up,
|
||||
`sub_821B55D8`) pass as arguments, and what kernel APIs run in
|
||||
between? We need to trace tid=6's events in canary in the wallclock
|
||||
window [9.4 s, 9.6 s] — the install epoch.
|
||||
|
||||
**LOC budget**: 0. Pure event-stream analysis on captured canary
|
||||
jsonl (we already have `canary-jitter-1.jsonl`, 18.7M events).
|
||||
**Output**: an ordered list of kernel calls just before
|
||||
`sub_824FD240+0x24` fires. If any are missing in ours, that's a
|
||||
candidate gap.
|
||||
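The event-stream analysis for this step could look roughly like the sketch below. The record schema is an assumption: it supposes one JSON object per line with `tid`, `host_ns`, `kind` and `name` fields, and that `host_ns` shares its origin with the wallclock seconds quoted in this review. The real field names in `canary-jitter-1.jsonl` should be checked before use.

```python
import json


def install_epoch_calls(lines, tid=6, start_s=9.4, end_s=9.6):
    """Return the names of import.call events on `tid` inside the
    [start_s, end_s] window, ordered by host_ns."""
    lo, hi = round(start_s * 1e9), round(end_s * 1e9)
    hits = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("tid") != tid or event.get("kind") != "import.call":
            continue
        if lo <= event.get("host_ns", -1) <= hi:
            hits.append((event["host_ns"], event.get("name")))
    hits.sort(key=lambda hit: hit[0])
    return [name for _, name in hits]


# Tiny synthetic stream standing in for the real 18.7M-event capture.
sample = [
    '{"tid": 6, "host_ns": 9450000000, "kind": "import.call", "name": "NtCreateEvent"}',
    '{"tid": 13, "host_ns": 9460000000, "kind": "import.call", "name": "VdSwap"}',
    '{"tid": 6, "host_ns": 9410000000, "kind": "import.call", "name": "ExCreateThread"}',
    '{"tid": 6, "host_ns": 9700000000, "kind": "import.call", "name": "NtResumeThread"}',
]
calls = install_epoch_calls(sample)
print(calls)
```

Because the function accepts any iterable of lines, the real capture can be streamed from an open file handle without loading the 4.4 GB file into memory.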
|
||||
### Step 3 — Mirror the trigger in ours (variable LOC)
|
||||
|
||||
Once Step 2 names the missing kernel call(s), implement them in ours
|
||||
following Phase C cadence (verify per-call return values match canary;
|
||||
add diff-tool tests; document in memory).
|
||||
|
||||
**LOC budget**: depends on what's missing. Could be 10–500 LOC.
|
||||
|
||||
### Step 4 — Remove the crowbar; verify natural bootstrap (~0 LOC)
|
||||
|
||||
With Step 3's fix in place, remove `--force-spawn-workers`. Re-run
|
||||
ours. If the natural bootstrap chain runs and `draws ≥ 1, swaps ≥ 2`,
|
||||
we've fixed the bug.
|
||||
|
||||
If progression still fails without the crowbar, there's another gap;
|
||||
re-enter at Step 2 with a refined trigger search.
|
||||
|
||||
### Step 5 — Validate gameplay frame parity (~0–50 LOC)
|
||||
|
||||
Capture renderer-thread VdSwap counts at 90 s wallclock in both
|
||||
engines. Target: ours's renderer emits within ±30% of canary's
|
||||
12,092 VdSwap/90s. If yes: first-draw is reached and sustained.
|
||||
|
||||
If ours's renderer emits but at a much lower rate, that's a follow-up
|
||||
performance issue, not a correctness one. Defer.
|
||||
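The ±30% target translates into a concrete acceptance band. A minimal sketch, using only the 12,092 figure from this review; the function and constant names are illustrative.

```python
CANARY_SWAPS_90S = 12_092  # canary tid=13 VdSwap calls in the 90-s window


def within_parity(ours_swaps, reference=CANARY_SWAPS_90S, tolerance=0.30):
    """True if ours's renderer swap count is within +/- tolerance of canary's."""
    return abs(ours_swaps - reference) <= tolerance * reference


# Integer bounds of the band implied by the tolerance.
lowest_passing = min(n for n in range(CANARY_SWAPS_90S) if within_parity(n))
highest_passing = max(
    n for n in range(CANARY_SWAPS_90S, 2 * CANARY_SWAPS_90S) if within_parity(n)
)
print(lowest_passing, highest_passing)
```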
|
||||
## Expected progression per step
|
||||
|
||||
| Step | Expected `swaps` | Expected `draws` | Expected `unique_render_targets` | LOC delta |
|
||||
|---|---:|---:|---:|---:|
|
||||
| Pre-roadmap | 1 | 0 | 0 | — |
|
||||
| Step 1 (crowbar) | 2-N | 1-N | 1+ | ~150 |
|
||||
| Step 2 (trigger ID) | (unchanged) | (unchanged) | (unchanged) | 0 |
|
||||
| Step 3 (mirror) | 2-N | 1-N | 1+ | 10-500 |
|
||||
| Step 4 (decrowbar) | 2-N | 1-N | 1+ | -150 (remove) |
|
||||
| Step 5 (parity) | 100+ | 100+ | 1-5 | 0-50 |
|
||||
|
||||
## What's NOT on this path (explicitly deferred)
|
||||
|
||||
1. **Host-audio bridge / XAudio resume**: the XAudio thread tids 14/15
|
||||
spawning suspended-and-never-resumed in ours is real but parallel
|
||||
to the worker-cluster wedge. In canary, both threads run; in ours,
|
||||
neither runs. Pursuing XAudio fixes does not address the
|
||||
graphics-blocking wedge. Defer to a separate
|
||||
"post-first-draw" audit cluster.
|
||||
2. **HID / controller**: Sylpheed's intro movie / title screen play
|
||||
without user input. HID is irrelevant for first-draw.
|
||||
3. **XAM content / save games**: irrelevant for first-draw; the
|
||||
intro/title screens don't require save-game enumeration.
|
||||
4. **Scheduler determinism** (per `scheduler_determinism_plan` /
|
||||
Phase D Stages 0-4): null result, off-path. The wedge is upstream
|
||||
of any contention. Defer indefinitely or close.
|
||||
5. **Diff-tool canonicalization** (Phase C-style fixes): saturated on
|
||||
moving matched-prefix without moving progression. **Halt** further
|
||||
work in this class until Step 4 lands and re-baselines the diff
|
||||
workload.
|
||||
6. **AUDIT-068 host-side install probes**: superseded by AUDIT-068
|
||||
Session 4 (writer identified at GUEST PC `sub_824FD240+0x24`).
|
||||
The remaining question is *what triggers* `sub_824FD240`, which
|
||||
Step 2 addresses.
|
||||
|
||||
## Alternative path (rejected)
|
||||
|
||||
**Skip the crowbar; do the trigger investigation cold.** Read canary
|
||||
source for `sub_824FD240` callers, walk upward, identify the trigger.
|
||||
Why rejected: `sub_824FD240` is GAME code, not canary engine code —
|
||||
the file we'd "read" is the disassembly of the XEX. We'd need to
|
||||
disassemble Sylpheed's RE'd PE and trace the call graph by hand. Per
|
||||
sylpheed.db, `sub_824FD240`'s static caller is `sub_824F7800+0x38`
|
||||
(in line with AUDIT-064). But what guest *call* causes `sub_824F7800`
|
||||
to be invoked is itself a multi-fn upstream investigation that
|
||||
returns to the same wedge cycle. The crowbar bypasses this paradox.
|
||||
|
||||
## Risk assessment
|
||||
|
||||
- **Step 1 catastrophic failure**: ours's emulator panics or
|
||||
segfaults when the force-spawn workers run. Mitigation: gate
|
||||
behind the debug-only `--force-spawn-workers` cvar; ensure ours's CPU executes the worker
|
||||
entries in normal sandboxed PPC JIT; if they fault on missing
|
||||
guest state, log and exit cleanly.
|
||||
- **Step 1 "succeeds but draws=0 anyway"**: the workers run but
|
||||
ours's tid=13 still doesn't unblock — there's an unmodelled state
|
||||
beyond just the missing thread spawns. Mitigation: log every event
|
||||
the new workers emit; compare with canary's tid=27/28/29 streams in
|
||||
`canary-jitter-1.jsonl`.
|
||||
- **Step 3 LOC explosion**: the trigger turns out to be a large
|
||||
subsystem (XAM content, XCONFIG, etc.). Mitigation: scope-cut to
|
||||
a stub that returns "canary-equivalent" values without full
|
||||
implementation.
|
||||
|
||||
## Confidence levels
|
||||
|
||||
- Step 1 unblocks the wedge if executed correctly: **MEDIUM** (60%).
|
||||
Honest assessment: 25 prior audits have not unblocked it through
|
||||
natural fixes, so the crowbar approach is novel and the failure
|
||||
mode may not match expectations.
|
||||
- Step 2 identifies a trigger in ≤1 session: **HIGH** (85%) — the
|
||||
canary jsonl already has the data; analysis is mechanical.
|
||||
- Step 3 LOC budget ≤500: **MEDIUM** (50%) — depends entirely on Step
|
||||
2's answer.
|
||||
- Step 4 natural bootstrap works post-Step-3: **MEDIUM** (50%) —
|
||||
there may be additional gaps the crowbar masked.
|
||||
|
||||
## Memory hygiene
|
||||
|
||||
After Step 1 lands (crowbar binary in place), check that
|
||||
`xenia-rs/target/release/xenia-rs` builds cleanly with the new cvar.
|
||||
Verify Phase B `image_canonical_sha256` is updated (the crowbar
|
||||
changes engine LOC); document the new baseline. Confirm 3× cold
|
||||
runs produce identical digests with the crowbar enabled.
|
||||
|
||||
## What "winning" looks like
|
||||
|
||||
`xenia-rs check --stable-digest -n 50000000` (or higher cap, e.g.
|
||||
`-n 500000000` to reach 30 s wallclock) outputs:
|
||||
|
||||
```json
|
||||
{
|
||||
"instructions": 50000007,
|
||||
"imports": 40390+,
|
||||
"draws": >= 1,
|
||||
"swaps": >= 2,
|
||||
"unique_render_targets": >= 1,
|
||||
"shader_blobs_live": >= 1,
|
||||
"texture_cache_entries": >= 1
|
||||
}
|
||||
```
|
||||
|
||||
…and the values are reproducible across 3 cold runs. A non-zero
|
||||
`draws` value means at least one PM4_TYPE3 DRAW_INDX packet was
|
||||
emitted by the renderer thread.
|
||||
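The thresholds in the block above are written as inequalities, so they cannot be compared against a digest verbatim. The sketch below shows one way to check them, assuming each run's digest has already been parsed into a dict with the same key names; the helper names are hypothetical and not part of `xenia-rs`.

```python
# Minimum values read off the target block above.
THRESHOLDS = {
    "imports": 40390,
    "draws": 1,
    "swaps": 2,
    "unique_render_targets": 1,
    "shader_blobs_live": 1,
    "texture_cache_entries": 1,
}


def meets_thresholds(digest):
    """True if every tracked counter reaches its minimum."""
    return all(digest.get(key, 0) >= minimum for key, minimum in THRESHOLDS.items())


def winning(digests, runs=3):
    """True if at least `runs` cold-run digests are identical and pass."""
    if len(digests) < runs:
        return False
    first = digests[0]
    return all(d == first for d in digests) and meets_thresholds(first)


# Current state per this review versus a hypothetical passing digest.
current = {"imports": 40390, "draws": 0, "swaps": 1, "unique_render_targets": 0}
target = {
    "imports": 40390, "draws": 1, "swaps": 2, "unique_render_targets": 1,
    "shader_blobs_live": 1, "texture_cache_entries": 1,
}
print(winning([current] * 3), winning([target] * 3))
```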