Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase Non-match Investigation — Results
Date: 2026-05-19
Source: xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-1.jsonl (4.4 GB, 18.7M events, 28 tids)
Companion ours data: audit-runs/phase-w-wedge-reattack/ours-postfix.jsonl (121,569 events, 13 tids)
Outcome: (A) — AUDIT-058/063/067 framing CONFIRMED end-to-end using new Phase A thread.create events.
TL;DR
Per Phase A thread.create events (wired in C+15-α), canary spawns 23 threads; the final 4
fire at host_ns ≈ 10.38 s and have entry PCs 0x82506528 / 0x82506558 / 0x82506588 / 0x825065B8
with shared context 0xBCE251C0 and stack 65,536 — these are exactly the 4 worker entries
documented in the sub_825070F0 dossier. The historical AUDIT-058/063 framing is correct:
sub_825070F0 is the one-shot 4-worker fan-out that ours never reaches.
Three of those four canary workers go on to emit events, and tid=28 is among the heaviest threads in the trace:
tid=28 (3.26M events, sub_82506528), tid=27 (36k events, sub_82506558), tid=29 (91k events, sub_82506588).
The fourth (0x825065B8) was never resumed in this 90 s window.
Ours emits 10 thread.create events vs canary's 23, stops after spawn #10 (0x821748F0 at 1.727s),
and never produces another thread.create for the rest of the run. The 13 subsequent canary
spawns including the critical sub_825070F0 batch are entirely missing.
What canary's heavy workers DO
| tid | events | role | entry_pc |
|---|---|---|---|
| 14 | 6.15 M | XAudio voice-mask poll (26,126× XAudioGetVoiceCategoryVolumeChangeMask) | 0x824D2878 (aff=16) |
| 15 | 4.78 M | XAudio sister (KeWaitForSingleObject + heavy IRQL spinlock cycle) | 0x824D2940 (aff=32) |
| 28 | 3.26 M | sub_825070F0 worker 0 (1.07 M × RtlEnterCS, 530× NtReadFile) | 0x82506528 (ctx 0xBCE251C0) |
| 16 | 1.80 M | XMA decoder (XMACreateContext, RtlEnterCS heavy) | 0x82178950 |
| 21 | 1.00 M | NtWaitForMultipleObjectsEx worker | 0x824563E0 |
| 13 | 594 k | Renderer (12,092× VdSwap, VdGetSystemCommandBuffer; 1,805× Ke/NtSetEvent; 475× wait.begin) | 0x822F1EE0 |
The biggest workers (tid=14, tid=15) are NOT sub_825070F0 workers — they are spawned much earlier (1.726/1.727s)
via sub_824D2878 / sub_824D2940 and run forever as XAudio render/voice threads. Ours spawns these two
suspended (1.626s) but they never receive the resume call that would activate them — ours produces 0
XAudio* events on these tids (verifiable from ours's tid event counts: ours has only 13 tids total, none
with the 6M-event signature).
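
The per-tid totals used for this check can be reproduced with a streaming pass over the jsonl. The following is a minimal sketch, not the actual build_profiles.py: it assumes each line is one JSON object carrying a `tid` field, and the `kind` field and the sample records are made up for illustration.

```python
import io
import json
from collections import Counter


def count_events_per_tid(lines):
    """Count events per tid from an iterable of JSONL lines.

    Lines are consumed one at a time, so a multi-GB trace never has to
    fit in memory. Assumes every record has a "tid" key.
    """
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        counts[json.loads(line)["tid"]] += 1
    return counts


# Synthetic stand-in for a trace file; a real run would pass an open file.
sample = io.StringIO(
    '{"tid": 14, "kind": "kernel.call"}\n'
    '{"tid": 14, "kind": "kernel.call"}\n'
    '{"tid": 13, "kind": "wait.begin"}\n'
)
counts = count_events_per_tid(sample)
for tid, n in counts.most_common():
    print(f"tid={tid} events={n}")
```

Running the same function over both traces and comparing `len(counts)` and the largest totals is enough to see whether any tid in ours carries the multi-million-event signature.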
Spawn-chain summary (full table in canary-tid-profiles.md)
Three distinct fan-out clusters in canary, all from tid=6 (guest main):
- 1.42–1.94 s, main init burst: 10 spawns (tids 8–17). Ours matches this 1:1 in spawn count and entries.
- 1.94–2.15 s, secondary burst (XAM/XCONFIG helpers, tids 18–25): 8 additional spawns. Ours emits 0.
- 10.08–10.38 s, XAudio worker fan-out: 5 spawns (tids 26, 27, 28, 29, +1 unresumed). The last 4 are the sub_825070F0 workers. Ours emits 0.
sub_825070F0 spawn-chain confirmation (static + runtime)
- sylpheed.db confirms sub_825070F0 lives in vtable 0x8200A208 slot 1 and 0x8200A928 slot 1 (anonymous class ANON_Class_713383D7, 7 slots each).
- Zero vptr_writes / zero xrefs / zero indirect_dispatch_candidates reach either vtable. AUDIT-067's host-side install hypothesis is confirmed by static-analysis exhaustion.
- Function body contains the 4 sequential addi rN, r0, 0x8250652X + bl sub_824AA388 (= ExCreateThread wrapper) blocks at PCs 0x825071F8 / 0x82507244 / 0x82507290 / 0x825072DC.
- The 4 worker entry thunks (0x82506528 / 0x82506558 / 0x82506588 / 0x825065B8) are uniform vtable-slot callers: each loads r3->vtable->[140|144|148|152] and dispatches via CTR (offsets 35/36/37/38).
- Runtime ctx 0xBCE251C0 is referenced 4× in canary jsonl (the 4 spawn events) and 0× in ours-postfix.jsonl. Ours never allocates the dispatcher object that holds the 0x8200A208 vtable.
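
The relation between the byte offsets 140/144/148/152 and the figures 35/36/37/38 is plain division by the pointer size. A small sketch of that arithmetic, assuming 4-byte guest pointers (which is what the numbers above imply); the helper name is hypothetical:

```python
POINTER_SIZE = 4  # assumption: 32-bit guest pointers


def slot_index(byte_offset: int) -> int:
    """Convert a vtable byte offset into a zero-based slot index."""
    if byte_offset % POINTER_SIZE:
        raise ValueError(f"offset {byte_offset} is not pointer-aligned")
    return byte_offset // POINTER_SIZE


worker_offsets = [140, 144, 148, 152]
slots = [slot_index(off) for off in worker_offsets]
print(slots)  # [35, 36, 37, 38]
```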
Wake/signal chain to wedge (partial)
- Phase W: ours's wedge handle 0x12d0 (Event/Auto waited at sub_821CB030+0x1B0 on tid=13, the renderer); main tid=1 join-waits on Thread(id=13) at sub_82173990+0x2D4.
- Canary tid=13 (renderer) creates 10 handles, calls Ke/NtSetEvent 1,805×, and calls wait.begin 475×, so it is alive and signaling. Earliest tid=13 handle.create at 2.396 s; explosion at 10.7 s once the sub_825070F0 workers come online.
- Canary tid=13's signals correlate with the sub_825070F0 worker batch coming up at 10.7 s (tid=27/28/29 first-events are all 10.705 s). Without those workers, ours's renderer has no producer to wake the event it waits on, and main joins-on-renderer → full deadlock.
- Full SID-level mapping of "which canary worker fires the NtSetEvent that wakes the renderer's wait" was not attempted (handle IDs and SIDs don't cross-correlate run-to-run; would require source-level read of sub_821CB030). The class of producer (sub_825070F0 workers) is identified.
Reading-error / methodology notes
- #16 EH-handler caution: the sub_824AA388 spawn helper is reached via bl (direct call, not via EH unwind), so there is no risk of misanchoring on a catch handler.
- #28 framing: Phase A thread.create.payload.parent_tid redundantly equals the event's tid field (per event_log.cc:312-326: emitted ON the parent thread's stream, child tid is NOT in payload). Child-tid is recovered by FIFO matching to first_event[tid] chronologically.
- #30 cross-engine SIDs: ours's wedge handle SID d5e23609d3948568 does not appear in canary because these are worker-local Event handles, not process-global dispatchers; only the shared-global recipe is scheduling-invariant.
- Cold-run jitter was not a factor here: only one canary jsonl was processed; the spawn-chain identification is robust because the SID-independent entry_pc + ctx_ptr + stack_size triplet is effectively a content-addressed fingerprint that survives reruns.
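
The FIFO child-tid recovery described under #28 can be sketched as follows. This is an illustration of the idea, not the code that produced thread-creates.json: the function name, the tuple layout, and the sample timestamps are all hypothetical.

```python
def fifo_match_children(creates, first_event):
    """Pair thread.create events with child tids in chronological order.

    creates:     list of (host_ns, parent_tid) tuples, one per thread.create
    first_event: dict mapping tid -> host_ns of that tid's first event
    Returns a dict {index into creates: child tid, or None if unmatched}.
    """
    # Candidate children ordered by when they first appear in the trace.
    pending = sorted(first_event.items(), key=lambda kv: kv[1])
    order = sorted(range(len(creates)), key=lambda i: creates[i][0])
    matched = {}
    j = 0
    for i in order:
        create_ns = creates[i][0]
        # A tid that was already running before this create cannot be its child.
        while j < len(pending) and pending[j][1] < create_ns:
            j += 1
        if j < len(pending):
            matched[i] = pending[j][0]
            j += 1
        else:
            matched[i] = None  # e.g. a thread that was never resumed
    return matched


creates = [(100, 6), (110, 6), (300, 6)]
first_event = {6: 0, 8: 120, 9: 130}
print(fifo_match_children(creates, first_event))  # {0: 8, 1: 9, 2: None}
```

A thread created suspended and resumed out of order would be mispaired by this scheme, which is why the inferred child_tid should be treated as a best-effort label rather than ground truth.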
Outcome: (A) — historical framing confirmed
The Phase A thread.create data directly corroborates AUDIT-058/063/067:
- sub_825070F0 IS the function that spawns the 4 sub_82506528-family workers (confirmed in canary trace, never fires in ours).
- The dispatcher class ANON_Class_713383D7, whose vtable 0x8200A208 slot 1 points at sub_825070F0, has its vtable installed via a path invisible to static guest analysis (AUDIT-067 unresolved).
- The HEAVY workers (tid=14/15 → XAudio; tid=16 → XMA; tid=21 → NtWait worker) are spawned earlier via different entries (sub_824D2878, sub_824D2940, sub_82178950, sub_824563E0) but are all suspended; their resume gate is also missing in ours (those threads exist in ours-postfix but emit < 100 events each, all from the spawn-time bookkeeping).
Recommended next attack target
Re-attempt the deferred AUDIT-067 / AUDIT-068 host-side vptr install probe with current tooling. Specific subtasks:
- Identify the allocator that produces the ANON_Class_713383D7 instance with vtable 0x8200A208.
  - Static search: which fn loads 0x8200A208 as a constant? (database says nothing; confirm with a fresh ghidra script that includes split-pair detection.)
  - Runtime probe: instrument both engines to log every stw vptr, 0(obj) where vptr ∈ {0x8200A208, 0x8200A928}. In canary, this MUST fire ≥ 1× before the 10.38 s spawn burst; in ours, it presumably never fires. Identify the PC.
- If host-side: trace through the kernel exports table. The most likely path is one of XAudio2*Create, XMACreateContext, XMPCreate*, or an undocumented XAudio API. Per the tid=14 call profile, XAudioGetVoiceCategoryVolumeChangeMask is the only XAudio API actively touched; look at its dossier (or canary's xboxkrnl_audio.cc / xam_audio.cc) for object-construction side-effects.
- Alternative: identify which Sylpheed API call is the trigger for the 10.38 s sub_825070F0 firing. Canary main (tid=6) at host_ns ≈ 10.30–10.38 s does the work that leads up to this; ~300 ms before, tid=6 has activity that ours doesn't reach. Diff tid=6's event stream in canary vs ours's tid=1 in the time window [10 s, 10.4 s] (canary) / [whatever ours's wallclock-equivalent is]. Ours doesn't reach 10 s wallclock either, however, so the divergence is upstream.
- Secondary attack: the XAudio tid=14/15 resume gate. Those threads are spawned suspended in BOTH engines (canary at 1.726/1.727 s, ours at 1.626 s); canary resumes them within ~1 ms and they emit 11 M events combined. What guest call resumes them in canary? Cross-thread NtResumeThread on the tid=14 handle. Sylpheed presumably resumes them via an XAudio2 API. If we can identify the resume call site in canary and figure out why ours doesn't reach it, we unblock 60% of the missing event volume (XAudio) independent of sub_825070F0.
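
For the split-pair part of the static search, it helps to know which 16-bit immediates to look for. PowerPC code normally builds a 32-bit constant from two halves, and because addi sign-extends its immediate, the high half is bumped by one whenever bit 15 of the low half is set. The sketch below computes both common forms for the two vtable addresses; the function name is hypothetical and it makes no claim about which form, if any, the guest binary uses.

```python
def split_pairs(addr: int):
    """Return ((hi, lo), (ha, signed_lo)) for a 32-bit constant.

    (hi, lo)        matches lis + ori, where ori zero-extends its immediate.
    (ha, signed_lo) matches lis + addi, where addi sign-extends its
                    immediate, so the high half is pre-adjusted.
    """
    lo = addr & 0xFFFF
    hi = (addr >> 16) & 0xFFFF
    ha = ((addr + 0x8000) >> 16) & 0xFFFF
    signed_lo = lo - 0x10000 if lo & 0x8000 else lo
    return (hi, lo), (ha, signed_lo)


for vtable in (0x8200A208, 0x8200A928):
    (hi, lo), (ha, slo) = split_pairs(vtable)
    # Sanity check: both forms must reconstruct the original address.
    assert (hi << 16) | lo == vtable
    assert ((ha << 16) + slo) & 0xFFFFFFFF == vtable
    print(f"{vtable:#010x}: lis {hi:#06x} + ori {lo:#06x}"
          f" | lis {ha:#06x} + addi {slo:#x}")
# 0x8200A208 yields lis 0x8201 + addi -0x5df8 in the sign-extended form.
```

Both vtable addresses have a low half above 0x7FFF, so a search that only looks for a lis 0x8200 high half would miss the sign-extended form. The same adjustment applies when the low half is folded into a load or store displacement instead of an addi.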
Artifacts
All artifacts in xenia-rs/audit-runs/phase-nonmatch-investigation/:
- build_profiles.py: streaming jsonl profile builder (~200 LOC)
- tid-event-counts.csv: per-tid totals (28 rows)
- tid-top-calls.txt: per-tid top-20 kernel.call names
- tid-ntset-handles.txt: per-tid Ke/NtSetEvent handle distribution (EMPTY: canary's kernel.call payloads have args:{} for NtSetEvent; handle is in resolved-arg JSON not exposed in current args_resolved. Not needed for Outcome (A) determination. Future Phase: extend Phase A kernel.call to also surface ALL register args in args for diff-tool consumption.)
- tid-wait-handles.txt: per-tid wait.begin handle distribution (EMPTY for same reason: the wait.begin events I sampled have raw_handle_id=None because the payload uses a handle_semantic_ids array, not a single raw_handle_id. The handle.create map is populated correctly; see handle-create.json.)
- thread-creates.json: canary thread.create payloads keyed by child_tid (note: child_tid is FIFO-inferred, see profiles doc)
- thread-exits.json: canary thread.exit events (3 in this trace: tid=17/18/26)
- excreate-events.json: all ExCreateThread import.call events with idx/host_ns
- create-thread-events.json: full thread.create event payloads
- handle-create.json: all handle.create with raw_handle, sid, object_type
- spawn-chain.json: auto-correlated spawn → ExCreateThread linkage
- canary-tid-profiles.md: human-readable per-tid catalogue + spawn-chain tables
- result.md: this file