xenia-rs/audit-runs/review-a-boot-state/canary-boot-trajectory.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

Canary boot-to-first-draw trajectory

Source data: xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-1.jsonl (4.4 GB, 18.7M events, 90s wallclock, cold run). Profile builder at xenia-rs/audit-runs/phase-nonmatch-investigation/build_profiles.py.

TL;DR

  • First boot-time VdSwap fires on canary's tid=6 (guest main) at ~9.5 s wallclock, immediately after the rendering subsystem is initialized. This is the empty / system-command-buffer swap that ours also reaches (the swaps=1 metric in ours counts this swap).
  • First gameplay VdSwap (intro-movie frame) fires on canary's tid=13 (renderer) starting at ~10.7 s wallclock, after the sub_825070F0 worker fan-out at host_ns ≈ 10.382-10.384 s. Canary tid=13 emits 12,092 VdSwap calls, each paired with a VdGetSystemCommandBuffer call, in the 90-s window. That is ~150 fps sustained over the ~79 s that follow the first gameplay swap (about 134 per second if averaged over the full 90 s).
  • The gating event between "boot swap" and "first gameplay swap" is the 4-worker fan-out spawned by sub_825070F0 at PCs 0x82506528 / 0x82506558 / 0x82506588 / 0x825065B8 with ctx 0xBCE251C0. Three of the four workers begin emitting events at host_ns ≈ 10.705 s (tids 27/28/29 — see canary-tid-profiles.md row 33-35).
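
The swap-rate figure can be cross-checked from the numbers quoted in the bullets above. This is a minimal arithmetic sketch; the constant and function names are illustrative, and the 10.7 s start of the active window is the approximate value from the second bullet.

```python
# Figures quoted in the TL;DR bullets above.
SWAPS = 12_092        # VdSwap calls emitted by canary tid=13
CAPTURE_S = 90.0      # total wallclock length of the capture
FIRST_SWAP_S = 10.7   # approximate time of the first gameplay VdSwap


def swap_rate(swaps: int, start_s: float, end_s: float) -> float:
    """Average swaps per second over the interval [start_s, end_s]."""
    return swaps / (end_s - start_s)


# Averaged over the full capture, including the ~10.7 s with no gameplay swaps.
whole_capture = swap_rate(SWAPS, 0.0, CAPTURE_S)
# Averaged only over the window in which tid=13 is actually rendering.
active_window = swap_rate(SWAPS, FIRST_SWAP_S, CAPTURE_S)

print(f"{whole_capture:.1f} swaps/s over the whole capture")
print(f"{active_window:.1f} swaps/s over the active window")
```

The "~150 fps" figure corresponds to the active-window average; the whole-capture average is lower because the renderer is idle for the first ~10.7 s.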

Phase-by-phase trajectory

| t (host_ns) | Phase | What | Citation |
|---|---|---|---|
| 0–660 ms | XEX load / startup | XexLoadImage, ELF→guest init, kernel-state ctor. Spawn tid=6 ("guest main") at host_ns=660 ms. | phase-nonmatch-investigation/canary-tid-profiles.md:14 |
| 660 ms–1.42 s | Pre-spawn init | tid=6 sets up TLS, runs CRT init. Establishes vtables / globals. Sylpheed-specific: writes 0x8200A1E8 (vtable for ANON_Class_713383D7) at the install-epoch host_ns ≈ 9.4–9.6 s via a 12-byte POD struct copy {vptr, self, self} (see project_audit_068_session3). Critical: this is the vtable whose slot 1 = sub_825070F0. | project_audit_068_session3_2026_05_20.md |
| 1.42–1.94 s | Main init burst | 10 thread spawns (tids 8–17) by tid=6. Ours matches this 1:1. Entries include 0x82181830, 0x8245A5D0, 0x82450A28, 0x82457EF0, 0x824CD458, 0x822F1EE0 (renderer, susp=T), 0x824D2878/0x824D2940 (XAudio, susp=T), 0x82178950 (XMA), 0x821748F0 (file IO spawner, susp=T). | canary-tid-profiles.md:42-55 |
| 1.671 s | Renderer spawn | tid=6 calls ExCreateThread with entry 0x822F1EE0, ctx 0xBCE24A40, suspended=True. Becomes canary tid=13. | canary-tid-profiles.md:21,49 |
| 1.726–1.728 s | XAudio spawn | tids 14/15 (XAudio voice-mask poll + sister) spawned suspended. Will dominate event volume (~11M events combined). | canary-tid-profiles.md:50-51 |
| 1.94–2.15 s | Secondary init burst | 8 more spawns (tids 18–25), file-IO + XAM helpers. Ours emits 0 here (already wedged). | result.md:48 |
| 9.4–9.6 s | vtable install epoch | Host-side POD struct copy installs 0x8200A1E8 at run-specific arena address (0xBCE25340 or 0xBCE251C0 per arena drift). This is the ANON_Class_713383D7 instance whose slot 1 = sub_825070F0. | project_audit_068_session3_2026_05_20.md |
| ~9.5 s | Boot-init VdSwap (on tid=6) | After VdInitializeEngines + VdShutdownEngines + VdInitializeEngines + VdSetGraphicsInterruptCallback + VdSetSystemCommandBufferGpuIdentifierAddress + VdInitializeRingBuffer + VdEnableRingBufferRPtrWriteBack + VdGetSystemCommandBuffer, tid=6 emits one VdSwap to publish the boot framebuffer. draws=0 still (no PM4 draw packets). | Mirror of ours-postfix.jsonl idx 105044-105285; canary same shape. |
| 10.080 s | tid=26 second-call helper | 0x821748F0 second invocation. | canary-tid-profiles.md:32 |
| 10.383 s | sub_825070F0 worker fan-out | Four ExCreateThread calls in 1 ms spawn entries 0x82506528 / 0x82506558 / 0x82506588 / 0x825065B8 all sharing ctx 0xBCE251C0 (the ANON_Class instance). These are the workers that consume cache-file IO and signal the wedge event(s) that AUDIT-049 found dangling in ours. | canary-tid-profiles.md:63-66, sub_825070F0.md |
| 10.7 s | Worker resume / first events | tids 27, 28, 29 emit their first events. tid=28 dominates (3.26M events) doing file IO (530× NtReadFile of cache:\…), heavy CS contention (1.07M RtlEnterCS), and signaling the wedge events. | canary-tid-profiles.md:33-35, sub_82452DC0.md |
| ~10.7+ s | Renderer wakes | Once sub_825070F0 workers begin, the events that canary's tid=13 was waiting on get signaled. tid=13 transitions Blocked→Running, starts producing VdGetSystemCommandBuffer/VdSwap pairs at ~150 fps. | canary-tid-profiles.md:21, result.md:30-39 |
| ~10.7–90 s | Sustained rendering | tid=13 emits 12,092 VdSwap calls. Intro movie ⇒ title screen ⇒ gameplay (depends on user input). In an unattended cold run, canary likely plateaus on the title screen but is genuinely rendering. | canary-tid-profiles.md:21 |
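
Timestamps like the two VdSwap rows above can be pulled from a trace with a streaming scan. The sketch below assumes a hypothetical schema (one JSON object per line with integer `tid`, integer `host_ns`, and string `name` fields); the real field names in canary-jitter-1.jsonl are not shown in this note and may differ. The sample data is synthetic and only mimics the shape of the trajectory.

```python
import io
import json
from typing import Dict, Iterable


def first_call_per_tid(lines: Iterable[str], call: str = "VdSwap") -> Dict[int, int]:
    """Return {tid: host_ns of the earliest event named `call`}.

    Assumes a HYPOTHETICAL schema: each line is a JSON object with
    `tid` (int), `host_ns` (int) and `name` (str). Lines are consumed one
    at a time, so a multi-gigabyte file never has to fit in memory.
    """
    first: Dict[int, int] = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        if event.get("name") != call:
            continue
        tid, t = event["tid"], event["host_ns"]
        # Keep the minimum rather than the first seen, in case the
        # trace is not strictly ordered by host_ns.
        if tid not in first or t < first[tid]:
            first[tid] = t
    return first


# Synthetic sample, NOT real trace data.
SAMPLE = io.StringIO(
    '{"tid": 6, "host_ns": 9500000000, "name": "VdSwap"}\n'
    '{"tid": 13, "host_ns": 10700000000, "name": "VdGetSystemCommandBuffer"}\n'
    '{"tid": 13, "host_ns": 10700000500, "name": "VdSwap"}\n'
    '{"tid": 13, "host_ns": 10707000000, "name": "VdSwap"}\n'
)
first_swaps = first_call_per_tid(SAMPLE)
print(first_swaps)
```

Passing an open file handle instead of the `io.StringIO` object would scan a real trace the same way, provided its schema matches the assumption above.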

Canary call-chain from entry_point to first gameplay draw

canary tid=6 (guest main)
  entry_point
   → sub_8216EA68 (post-init dispatcher)
      → sub_822F1AA8 (game-loop dispatcher)            (sub_822F1AA8.md)
         → bctrl vtable[0]({sub_82175330 → tail → sub_82173990})
            → sub_82173990 (sync task-spawn-and-join)  (sub_82173990.md)
               → bl sub_821746B0 (alloc task + spawn worker thid=17, F8000094)
                 [worker thid=17 runs body sub_821748F0
                   → sub_821C4EB0 → sub_821CC3F8 → sub_821CBA08
                   → sub_821CB030 (creates Event, submits work via sub_82452DC0)
                   → … cache file loads (cache:\aab216c3\..., cache:\87719002\..., etc.)
                   → spawns child workers via ExCreateThread(...,821C4AD0,...)
                   → eventually ExTerminateThread(0)]
               → KeWaitForSingleObject(thid=17.handle) INFINITE
                 [blocks ~445 log lines wallclock; completes when thid=17 terminates]
               ← returns
         ← returns to sub_822F1AA8 outer loop
         → iterates sub_821741C8 → sub_82172BA0 → bctrl vtable[6]
            → sub_821B55D8 → sub_824F8398 → sub_824F7CD0 → sub_824F7800
               → bctrl vtable[1] = sub_825070F0    (sub_825070F0.md)
                  → 4× ExCreateThread(...,0x82506528/58/88/B8, ctx=0xBCE25xxx, susp=T)
                  → 4× NtResumeThread / scheduler enables the workers
            [workers tids 27/28/29/+1 begin executing]
         → outer loop continues
            → KeWaitForSingleObject (4040×/60 s = ~67 fps frame-pacing wait)
            → bctrl vtable[2] → various per-frame work
            → tid=6's main loop produces no VdSwap directly past the init swap
canary tid=13 (renderer; spawned by tid=6 at 0x822F1EE0)
  [stays suspended OR Blocked-on-event until worker fan-out at 10.38 s]
  → after wake, enters render loop:
     while (running) {
       VdGetSystemCommandBuffer(...)        ; 12,092× / 90 s
       … build per-frame command buffer …
       VdSwap(buffer_ptr, fetch_ptr, …)      ; 12,092× / 90 s
     }
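
The dependency in the chain above (the renderer produces no gameplay swap until a fan-out worker signals) can be illustrated with a toy model. This is a sketch of the gating shape only, not the guest's code: `run_boot`, `load_done`, and the bounded timeout are hypothetical, and the real wait is INFINITE rather than timed.

```python
import threading


def run_boot(spawn_workers: bool, timeout_s: float = 2.0) -> dict:
    """Toy model: the renderer swaps only after a worker signals `load_done`."""
    load_done = threading.Event()  # stands in for the per-load completion event
    swaps = []

    def worker() -> None:
        # A fan-out worker finishes its load and signals completion.
        load_done.set()

    def renderer() -> None:
        # Bounded wait so the model terminates even when it wedges;
        # Event.wait returns True only if the event was set in time.
        if load_done.wait(timeout_s):
            swaps.append("VdSwap")

    threads = [threading.Thread(target=renderer)]
    if spawn_workers:
        threads.append(threading.Thread(target=worker))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {"swaps": len(swaps), "wedged": not load_done.is_set()}


canary_like = run_boot(spawn_workers=True)
ours_like = run_boot(spawn_workers=False, timeout_s=0.05)
print(canary_like, ours_like)
```

With workers spawned the renderer gets its signal and swaps once; without them the wait expires with no signal, which mirrors the "waits but no signals" shape of the wedge.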

Pre-conditions canary establishes before first gameplay draw

In time order, all must hold:

  1. GPU subsystem initialized: VdInitializeEngines → VdInitializeRingBuffer → VdEnableRingBufferRPtrWriteBack → VdSetGraphicsInterruptCallback. Ours: ✓ (idx 105044-105117).
  2. Renderer thread alive: tid=13 created suspended via ExCreateThread(entry=0x822F1EE0, susp=T). Ours: ✓ (idx 105348).
  3. Worker-cluster activation: 4 workers spawned by sub_825070F0 consuming sub_82452DC0 work. Ours: ✗ 0 fires.
  4. sub_821CB030's Event signaled: the per-load completion event created at sub_821CB030+0x128 and waited at +0x1AC must be signaled by a sub_825070F0 worker. Ours: NO_SIGNALS_DESPITE_WAITS on handle 0x12d0.
  5. sub_82173990's join-wait completes: tid=6's wait at sub_82173990+0x2D0 on the thid=17 thread handle. Ours: ✗ tid=1 stuck on handle 0x12c8 (= tid=13's thread handle).
  6. Renderer wakes: per AUDIT-049, the worker-cluster must signal whatever guards tid=13's body. Canary: ✓. Ours: ✗ tid=13 itself wedges in sub_821CB030.
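
Because the pre-conditions are ordered in time, the earliest unmet one is the natural place to start looking. A minimal sketch of that triage follows; the status values are transcribed from the "Ours" annotations in the list above (item 4 is recorded as unmet because the list reports waits with no signals), and the helper name `first_unmet` is illustrative.

```python
from typing import List, Optional, Tuple

# (pre-condition, holds in ours?) in the time order of the list above.
PRECONDITIONS: List[Tuple[str, bool]] = [
    ("GPU subsystem initialized", True),
    ("Renderer thread alive", True),
    ("Worker-cluster activation", False),
    ("sub_821CB030's Event signaled", False),  # waits observed, no signals
    ("sub_82173990's join-wait completes", False),
    ("Renderer wakes", False),
]


def first_unmet(checks: List[Tuple[str, bool]]) -> Optional[str]:
    """Return the earliest pre-condition that does not hold, or None."""
    for name, holds in checks:
        if not holds:
            return name
    return None


blocker = first_unmet(PRECONDITIONS)
print(blocker)
```

Later unmet items may simply be downstream consequences of the first one, so they are weaker evidence of independent bugs.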

Numerical signature of canary over the 90 s capture (for reference)

  • 18.7 M events / 28 tids.
  • Renderer tid=13: 594 k events, including 12,092 VdSwap.
  • Worker tid=28 (sub_825070F0 worker 0): 3.26 M events.
  • XAudio tid=14/15: 6.15 M / 4.78 M events.
  • ours at 50 M-instr / ~3 s wallclock: 121 k events / 13 tids. Renderer tid=13 in ours: ~80 events (wedged).
  • Total event volume differs by ~150× (18.7 M vs 121 k) because ours wedges ~7 s before canary's sub_825070F0 fan-out fires.

Uncertainty / open questions

  • What precisely installs the ANON_Class_713383D7 vtable 0x8200A1E8? AUDIT-068 sessions 1–4 localized this to a POD struct copy in the install epoch [9.4 s, 9.6 s], with the writer identified at GUEST PPC sub_824FD240+0x24 (NOT a host-side kernel import as initially feared). But in ours, sub_824FD240 and its callers sub_824F7800/CD0/8398 fire 0× because that chain is downstream of the tid=13 wedge. See project_audit_068_session4.
  • First "gameplay draw" precisely: the first VdSwap that emits PM4 draw packets (e.g. PM4_TYPE3 DRAW_INDX) into the ringbuffer. Need to inspect canary's PM4 ring at host_ns ≈ 10.7 s to confirm. AUDIT history hasn't disambiguated boot/empty-swap from gameplay-swap at the PM4-packet level. This is a methodology gap.
  • What unwedges canary's worker-cluster activation chain? AUDIT-068 pinned the install epoch but not the trigger — what guest call causes sub_824FD240+0x24's POD-copy to fire? Identifying the trigger and replaying it in ours is the unanswered Path β attack.
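
For the PM4-level question above, one way to separate an empty swap from a gameplay swap is to count draw packets in the dwords submitted before each VdSwap. The sketch below rests on assumptions that should be checked against the Xenia sources before use: packet type in bits 31:30, payload count minus one in bits 29:16 for type-0 and type-3 packets, a 7-bit type-3 opcode starting at bit 8, type-1 packets carrying two payload dwords, type-2 packets carrying none, and draw opcodes 0x22 (DRAW_INDX) and 0x36 (DRAW_INDX_2). The helper names are hypothetical and the sample ring is synthetic.

```python
from typing import Sequence

# ASSUMED opcode values; verify against the emulator's PM4 definitions.
PM4_DRAW_INDX = 0x22
PM4_DRAW_INDX_2 = 0x36
DRAW_OPCODES = {PM4_DRAW_INDX, PM4_DRAW_INDX_2}


def type3_header(opcode: int, payload_dwords: int) -> int:
    """Build a type-3 header under the assumed bit layout (for test data)."""
    return (3 << 30) | ((payload_dwords - 1) << 16) | (opcode << 8)


def count_draws(dwords: Sequence[int]) -> int:
    """Count type-3 draw packets in a flat sequence of 32-bit dwords."""
    draws = 0
    i = 0
    while i < len(dwords):
        header = dwords[i] & 0xFFFFFFFF
        ptype = header >> 30
        if ptype == 2:   # assumed: type-2 is a no-op with no payload
            i += 1
            continue
        if ptype == 1:   # assumed: type-1 carries two payload dwords
            i += 3
            continue
        payload = ((header >> 16) & 0x3FFF) + 1   # types 0 and 3
        if ptype == 3 and ((header >> 8) & 0x7F) in DRAW_OPCODES:
            draws += 1
        i += 1 + payload  # skip the header and its payload
    return draws


# Synthetic ring: a no-op, one draw with 3 payload dwords, one non-draw packet.
ring = [0x80000000, type3_header(PM4_DRAW_INDX, 3), 0, 0, 0, type3_header(0x10, 1), 0]
print(count_draws(ring))
```

Under these assumptions a boot swap would report zero draws for its submitted range, while the first gameplay swap would report at least one; indirect buffers are not followed here and would need separate handling.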