handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions

@@ -0,0 +1,121 @@
# Canary boot-to-first-draw trajectory
**Source data:** `xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-1.jsonl`
(4.4 GB, 18.7M events, 90s wallclock, cold run). Profile builder at
`xenia-rs/audit-runs/phase-nonmatch-investigation/build_profiles.py`.
## TL;DR
- **First boot-time `VdSwap` fires on canary's tid=6 (guest main) at
~9.5 s wallclock**, immediately after the rendering subsystem is
initialized. This is the *empty / system-command-buffer* swap that
ours also reaches (the `swaps=1` metric in ours counts this swap).
- **First gameplay `VdSwap` (intro-movie frame) fires on canary's
tid=13 (renderer) starting at ~10.7 s wallclock**, after the
`sub_825070F0` worker fan-out at host_ns ≈ 10.382-10.384 s. Canary
tid=13 emits **12,092** `VdSwap` calls, each paired with a
`VdGetSystemCommandBuffer` call, in the 90-s window, i.e. ~150 fps
sustained over the ~79 s after the renderer wakes.
- The gating event between "boot swap" and "first gameplay swap" is
the 4-worker fan-out spawned by `sub_825070F0` at PCs `0x82506528 /
0x82506558 / 0x82506588 / 0x825065B8` with ctx `0xBCE251C0`. Three
of the four workers begin emitting events at host_ns ≈ 10.705 s
(tids 27/28/29 — see `canary-tid-profiles.md` row 33-35).
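
The ~150 fps figure only holds over the renderer's active window, not the whole capture. A minimal sketch of that arithmetic, using only the numbers quoted above (the helper name `swap_rate` is illustrative and not part of `build_profiles.py`):

```python
def swap_rate(swaps: int, first_swap_s: float, capture_end_s: float) -> float:
    """Average swaps per second between the first swap and the end of capture."""
    active = capture_end_s - first_swap_s
    if active <= 0:
        raise ValueError("capture must end after the first swap")
    return swaps / active

# Figures from the TL;DR: 12,092 VdSwap calls, first at ~10.7 s, 90 s capture.
active_rate = swap_rate(12_092, 10.7, 90.0)  # over the ~79 s the renderer is awake
whole_rate = swap_rate(12_092, 0.0, 90.0)    # averaged over the full capture
print(f"{active_rate:.1f} swaps/s active, {whole_rate:.1f} swaps/s overall")
```

Averaged over the full 90 s the rate drops to roughly 134 swaps/s, so the window should be stated whenever the fps number is quoted.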
## Phase-by-phase trajectory
| t (host_ns) | Phase | What | Citation |
|------:|-------|------|----------|
| 0–660 ms | XEX load / startup | `XexLoadImage`, ELF→guest init, kernel-state ctor. Spawn tid=6 ("guest main") at host_ns=660 ms. | `phase-nonmatch-investigation/canary-tid-profiles.md:14` |
| 660 ms–1.42 s | **Pre-spawn init** | tid=6 sets up TLS, runs CRT init. Establishes vtables / globals. *Sylpheed-specific*: writes `0x8200A1E8` (vtable for `ANON_Class_713383D7`) at the install-epoch host_ns ≈ 9.4–9.6 s via a 12-byte POD struct copy `{vptr, self, self}` (see `project_audit_068_session3`). **Critical**: this is the vtable whose slot 1 = `sub_825070F0`. | `project_audit_068_session3_2026_05_20.md` |
| 1.42–1.94 s | **Main init burst** | 10 thread spawns (tids 8–17) by tid=6. Ours matches this 1:1. Entries include `0x82181830`, `0x8245A5D0`, `0x82450A28`, `0x82457EF0`, `0x824CD458`, **`0x822F1EE0` (renderer, susp=T)**, `0x824D2878/0x824D2940` (XAudio, susp=T), `0x82178950` (XMA), `0x821748F0` (file IO spawner, susp=T). | `canary-tid-profiles.md:42-55` |
| 1.671 s | **Renderer spawn** | tid=6 calls `ExCreateThread` with entry `0x822F1EE0`, ctx `0xBCE24A40`, suspended=True. Becomes canary tid=13. | `canary-tid-profiles.md:21,49` |
| 1.726–1.728 s | **XAudio spawn** | tids 14/15 (XAudio voice-mask poll + sister) spawned suspended. Will dominate event volume (~11M events combined). | `canary-tid-profiles.md:50-51` |
| 1.94–2.15 s | **Secondary init burst** | 8 more spawns (tids 18–25), file-IO + XAM helpers. **Ours emits 0** here — already wedged. | `result.md:48` |
| 9.4–9.6 s | **vtable install epoch** | Host-side POD struct copy installs `0x8200A1E8` at run-specific arena address (`0xBCE25340` or `0xBCE251C0` per arena drift). This is the ANON_Class_713383D7 instance whose slot 1 = `sub_825070F0`. | `project_audit_068_session3_2026_05_20.md` |
| ~9.5 s | **Boot-init `VdSwap` (on tid=6)** | After `VdInitializeEngines + VdShutdownEngines + VdInitializeEngines + VdSetGraphicsInterruptCallback + VdSetSystemCommandBufferGpuIdentifierAddress + VdInitializeRingBuffer + VdEnableRingBufferRPtrWriteBack + VdGetSystemCommandBuffer`, tid=6 emits **one** `VdSwap` to publish the boot framebuffer. draws=0 still (no PM4 draw packets). | Mirror of `ours-postfix.jsonl` idx 105044-105285; canary same shape. |
| 10.080 s | tid=26 second-call helper | `0x821748F0` second invocation. | `canary-tid-profiles.md:32` |
| **10.383 s** | **sub_825070F0 worker fan-out** | **Four `ExCreateThread` calls in 1 ms** spawn entries `0x82506528 / 0x82506558 / 0x82506588 / 0x825065B8` all sharing ctx `0xBCE251C0` (the ANON_Class instance). These are the workers that consume cache-file IO and signal the wedge event(s) that AUDIT-049 found dangling in ours. | `canary-tid-profiles.md:63-66`, `sub_825070F0.md` |
| 10.7 s | **Worker resume / first events** | tids 27, 28, 29 emit their first events. tid=28 dominates (3.26M events) doing file IO (`530× NtReadFile` of `cache:\…`), heavy CS contention (1.07M RtlEnterCS), and signaling the wedge events. | `canary-tid-profiles.md:33-35`, `sub_82452DC0.md` |
| ~10.7+ s | **Renderer wakes** | Once `sub_825070F0` workers begin, the events that canary's tid=13 was waiting on get signaled. tid=13 transitions Blocked→Running, starts producing `VdGetSystemCommandBuffer`/`VdSwap` pairs at ~150 fps. | `canary-tid-profiles.md:21`, `result.md:30-39` |
| ~10.7–90 s | **Sustained rendering** | tid=13 emits 12,092 `VdSwap` calls. Intro movie ⇒ title screen ⇒ gameplay (depends on user input). In an unattended cold run, canary likely plateaus on the title screen but is genuinely rendering. | `canary-tid-profiles.md:21` |
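
The fan-out row is the pivot of the table, so it helps to be able to locate it mechanically in a trace. The sketch below looks for a burst of thread spawns that share one ctx within a short window. The event keys (`name`, `host_ns`, `ctx`, `entry`) are an assumed, normalized schema, not the confirmed layout of `canary-jitter-1.jsonl`, and the timestamps in `sample` are synthetic.

```python
from collections import defaultdict

def find_fanouts(events, count=4, window_ns=1_000_000):
    """Return (ctx, [entry, ...]) for the first run of `count` ExCreateThread
    events per ctx whose timestamps span at most `window_ns`."""
    by_ctx = defaultdict(list)
    for ev in events:
        if ev.get("name") == "ExCreateThread":
            by_ctx[ev["ctx"]].append(ev)
    hits = []
    for ctx, spawns in by_ctx.items():
        spawns.sort(key=lambda e: e["host_ns"])
        for i in range(len(spawns) - count + 1):
            group = spawns[i:i + count]
            if group[-1]["host_ns"] - group[0]["host_ns"] <= window_ns:
                hits.append((ctx, [e["entry"] for e in group]))
                break  # only the first burst per ctx is reported
    return hits

# Synthetic events shaped like the table rows (timestamps are made up).
sample = [
    {"name": "ExCreateThread", "host_ns": 1_671_000_000,
     "ctx": 0xBCE24A40, "entry": 0x822F1EE0},  # renderer spawn, not a burst
]
for i, entry in enumerate((0x82506528, 0x82506558, 0x82506588, 0x825065B8)):
    sample.append({"name": "ExCreateThread",
                   "host_ns": 10_383_000_000 + i * 200_000,
                   "ctx": 0xBCE251C0, "entry": entry})

hits = find_fanouts(sample)
for ctx, entries in hits:
    print(hex(ctx), [hex(e) for e in entries])
```

Run against ours, an empty result from the same query would be consistent with the "0 fires" finding for `sub_825070F0`.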
## Canary call-chain from entry_point to first gameplay draw
```
canary tid=6 (guest main)
  entry_point
    → sub_8216EA68 (post-init dispatcher)
      → sub_822F1AA8 (game-loop dispatcher) (sub_822F1AA8.md)
        → bctrl vtable[0]({sub_82175330 → tail → sub_82173990})
          → sub_82173990 (sync task-spawn-and-join) (sub_82173990.md)
            → bl sub_821746B0 (alloc task + spawn worker thid=17, F8000094)
                [worker thid=17 runs body sub_821748F0
                  → sub_821C4EB0 → sub_821CC3F8 → sub_821CBA08
                    → sub_821CB030 (creates Event, submits work via sub_82452DC0)
                    → … cache file loads (cache:\aab216c3\..., cache:\87719002\..., etc.)
                    → spawns child workers via ExCreateThread(...,821C4AD0,...)
                    → eventually ExTerminateThread(0)]
            → KeWaitForSingleObject(thid=17.handle) INFINITE
                [blocks ~445 log lines wallclock; completes when thid=17 terminates]
            ← returns
          ← returns to sub_822F1AA8 outer loop
        → iterates sub_821741C8 → sub_82172BA0 → bctrl vtable[6]
          → sub_821B55D8 → sub_824F8398 → sub_824F7CD0 → sub_824F7800
            → bctrl vtable[1] = sub_825070F0 (sub_825070F0.md)
              → 4× ExCreateThread(...,0x82506528/58/88/B8, ctx=0xBCE25xxx, susp=T)
              → 4× NtResumeThread / scheduler enables the workers
                  [workers tids 27/28/29/+1 begin executing]
        → outer loop continues
          → KeWaitForSingleObject (4040×/60 s = ~67 fps frame-pacing wait)
          → bctrl vtable[2] → various per-frame work
          → tid=6's main loop produces no VdSwap directly past the init swap

canary tid=13 (renderer; spawned by tid=6 at 0x822F1EE0)
  [stays suspended OR Blocked-on-event until worker fan-out at 10.38 s]
  → after wake, enters render loop:
      while (running) {
        VdGetSystemCommandBuffer(...)      ; 12,092× / 90 s
        … build per-frame command buffer …
        VdSwap(buffer_ptr, fetch_ptr, …)   ; 12,092× / 90 s
      }
```
## Pre-conditions canary establishes before first gameplay draw
In time order, all must hold:
1. **GPU subsystem initialized**: `VdInitializeEngines → VdInitializeRingBuffer → VdEnableRingBufferRPtrWriteBack → VdSetGraphicsInterruptCallback`. Ours: ✓ (idx 105044-105117).
2. **Renderer thread alive**: tid=13 created suspended via `ExCreateThread(entry=0x822F1EE0, susp=T)`. Ours: ✓ (idx 105348).
3. **Worker-cluster activation**: 4 workers spawned by `sub_825070F0` consuming `sub_82452DC0` work. Ours: **✗ 0 fires**.
4. **`sub_821CB030`'s Event signaled**: the per-load completion event created at `sub_821CB030+0x128` and waited at `+0x1AC` must be signaled by a `sub_825070F0` worker. Ours: **`NO_SIGNALS_DESPITE_WAITS` on handle 0x12d0**.
5. **`sub_82173990`'s join-wait completes**: tid=6's wait at `sub_82173990+0x2D0` on the thid=17 thread handle. Ours: **✗ tid=1 stuck on handle 0x12c8 (= tid=13's thread handle)**.
6. **Renderer wakes**: per AUDIT-049, the worker-cluster must signal whatever guards tid=13's body. Canary: ✓. Ours: **✗ tid=13 itself wedges in sub_821CB030**.
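
Conditions 4 to 6 all reduce to the same check: a handle that is waited on but never signaled. A minimal sketch of that check follows. It assumes each event has been normalized to a dict with `name`, `tid` and `handle` keys, and that the listed export names are the ones of interest. Both are assumptions about the trace, and the sample events are synthetic (only handle `0x12d0` comes from the notes above).

```python
WAIT_NAMES = {"KeWaitForSingleObject", "NtWaitForSingleObjectEx"}
SIGNAL_NAMES = {"KeSetEvent", "NtSetEvent"}

def dangling_waits(events, wait_names=WAIT_NAMES, signal_names=SIGNAL_NAMES):
    """Map each handle that is waited on but never signaled to its waiter tids."""
    waited = {}       # handle -> set of waiting tids
    signaled = set()  # handles that saw at least one signal
    for ev in events:
        if ev["name"] in wait_names:
            waited.setdefault(ev["handle"], set()).add(ev["tid"])
        elif ev["name"] in signal_names:
            signaled.add(ev["handle"])
    return {h: sorted(tids) for h, tids in waited.items() if h not in signaled}

# Synthetic trace: one healthy wait/signal pair, one wait that never resolves.
sample_events = [
    {"name": "KeWaitForSingleObject", "tid": 6, "handle": 0x1300},
    {"name": "KeSetEvent", "tid": 28, "handle": 0x1300},
    {"name": "KeWaitForSingleObject", "tid": 13, "handle": 0x12D0},
]
stuck = dangling_waits(sample_events)
print({hex(h): tids for h, tids in stuck.items()})
```

Thread handles such as the one in condition 5 are released by thread termination rather than by an explicit set call, so a real check would also have to treat a terminate event as a signal for that thread's handle. This sketch does not.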
## Numerical signature of canary at ~50 s wallclock (for reference)
- 18.7 M events / 28 tids.
- Renderer tid=13: 594 k events, including 12,092 VdSwap.
- Worker tid=28 (sub_825070F0 worker 0): 3.26 M events.
- XAudio tid=14/15: 6.15 M / 4.78 M events.
- Ours at 50 M-instr / ~3 s wallclock: 121 k events / 13 tids. Renderer
tid=13 in ours: ~80 events (wedged).
- Total event volume differs by ~150× because ours wedges ~7 s before
canary's `sub_825070F0` fan-out fires.
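
As a cross-check on the ~150× figure and the per-thread shares, the arithmetic can be redone from the rounded counts listed above (variable names are illustrative):

```python
# Rounded event counts copied from the bullets above.
canary_total = 18_700_000
ours_total = 121_000
per_tid = {
    "renderer tid=13": 594_000,
    "worker tid=28": 3_260_000,
    "XAudio tid=14": 6_150_000,
    "XAudio tid=15": 4_780_000,
}

ratio = canary_total / ours_total
xaudio_total = per_tid["XAudio tid=14"] + per_tid["XAudio tid=15"]
shares = {name: count / canary_total for name, count in per_tid.items()}

print(f"canary/ours event ratio: {ratio:.0f}x")
print(f"XAudio combined: {xaudio_total / 1e6:.2f} M events")
for name, share in shares.items():
    print(f"{name}: {share:.1%} of canary events")
```

The two XAudio threads together account for more than half of canary's events, so comparisons of thread activity are easier to read with those tids filtered out first.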
## Uncertainty / open questions
- **What is the precise host-side install of the `ANON_Class_713383D7`
vtable `0x8200A1E8`?** AUDIT-068 sessions 1–4 localized this to a
POD struct copy in the install epoch [9.4 s, 9.6 s], with the writer
identified at GUEST PPC `sub_824FD240+0x24` (NOT a host-side kernel
import as initially feared). But in ours, `sub_824FD240` and its
callers `sub_824F7800/CD0/8398` fire 0× because that chain is
downstream of the tid=13 wedge. See `project_audit_068_session4`.
- **First "gameplay draw" precisely**: the first VdSwap that emits PM4
draw packets (e.g. `PM4_TYPE3 DRAW_INDX`) into the ringbuffer. Need
to inspect canary's PM4 ring at host_ns ≈ 10.7 s to confirm. AUDIT
history hasn't disambiguated boot/empty-swap from gameplay-swap at
the PM4-packet level. This is a methodology gap.
- **What unwedges canary's worker-cluster activation chain?** AUDIT-068
pinned the install epoch but not the **trigger** — what guest call
causes `sub_824FD240+0x24`'s POD-copy to fire? Identifying the
trigger and replaying it in ours is the unanswered Path β attack.
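
For the second open question, one way to separate an empty swap from a gameplay swap is to walk the ring-buffer dwords and count type-3 draw packets. The sketch below assumes the usual PM4 header layout (packet type in bits 31:30, payload count minus one in bits 29:16, type-3 opcode in bits 14:8) and draw opcodes `0x22` / `0x36` as recalled from public Xenos/Adreno PM4 definitions. Both should be verified against the emulator's own headers before relying on the result. It also does not follow indirect buffers, so draws submitted that way are not counted.

```python
PM4_DRAW_INDX = 0x22    # assumed opcode value; verify against the GPU headers
PM4_DRAW_INDX_2 = 0x36  # assumed opcode value; verify against the GPU headers
DRAW_OPCODES = {PM4_DRAW_INDX, PM4_DRAW_INDX_2}

def count_draw_packets(dwords):
    """Count type-3 draw packets in a flat list of 32-bit PM4 dwords."""
    draws = 0
    i = 0
    while i < len(dwords):
        header = dwords[i]
        ptype = header >> 30
        if ptype == 0:    # register write: header + (count + 1) values
            i += 1 + ((header >> 16) & 0x3FFF) + 1
        elif ptype == 1:  # two-register write: header + 2 values
            i += 3
        elif ptype == 2:  # filler NOP: header only
            i += 1
        else:             # type 3: header + (count + 1) payload dwords
            if (header >> 8) & 0x7F in DRAW_OPCODES:
                draws += 1
            i += 1 + ((header >> 16) & 0x3FFF) + 1
    return draws

def type3(opcode, payload):
    """Build a type-3 packet (hypothetical helper, used only for the samples)."""
    return [(3 << 30) | ((len(payload) - 1) << 16) | (opcode << 8)] + list(payload)

# Synthetic buffers: a NOP plus one register write, then the same with two draws.
empty_swap = [0x80000000, 0x00002000, 0x00000000]
frame = empty_swap + type3(PM4_DRAW_INDX, [0, 0, 0]) + type3(PM4_DRAW_INDX_2, [0])
print(count_draw_packets(empty_swap), count_draw_packets(frame))
```

Applied to canary's ring contents around host_ns ≈ 10.7 s, the first swap with a non-zero count would mark the first gameplay draw in the sense defined above.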