Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Step 2 — Natural install-trigger sequence and ours divergence point
Date: 2026-05-21
Mode: PLAN-only (investigation; no engine LOC changes).
Sources: canary-jitter-1.jsonl (4.4 GB, 18.7M events) and
phase-w-wedge-reattack/ours-postfix.jsonl (28 MB, 121,569 events).
TL;DR
The Step 2 plan's framing — "identify the canary tid=6 kernel-call sequence in the install window [9.4s, 9.6s]" — cannot be applied because ours never reaches host_ns ≥ 1.73s. Ours's tid=1 wedges 8 seconds before the install epoch. The reframed question — "what canary-tid=6 sequence between the matched-prefix wedge point and the install epoch fails in ours?" — resolves to a single root cause one level upstream of the wedge:
Canary's spawned cache-loader worker (canary tid=17, entry
0x821748F0) executes ~4140 events and calls ExTerminateThread at host_ns = 2.092s, taking 154ms. Ours's analog (ours tid=13) executes 435 events, never reaches its second wait iteration, and wedges at its FIRST NtWaitForSingleObjectEx (no signaler ever fires). Ours's tid=13 takes a different guest-code branch from the first wait onward: it calls NtReleaseSemaphore instead of NtSetEvent between NtCreateEvent and NtWaitForSingleObjectEx, so the event it then waits on is unsignaled.
This is a branch divergence inside guest code sub_821CB030's
body, NOT a missing kernel call in ours and NOT a wrong return
value from ours's kernel.
Step 0 outcome — install epoch reachable on canary, not on ours
| Source | What was captured | Install-epoch reachability |
|---|---|---|
| canary tid=6 events in [9.0s..11.0s] | 16,175 kernel.calls captured | install epoch + worker-spawn covered ✓ |
| ours tid=1 events | last event at 1.728s (just before the wedge) | install epoch is at ~9.5s, about 8s in the future |
Ours physically cannot reach 9.4s. Tid=1 blocks on tid=13's thread handle
at host_ns=1.728s, and all other tids subsequently block too (see
phase-w-wedge-reattack/halt-on-deadlock-dump.txt). Therefore the
canary "kernel-call sequence ours doesn't make in the install window"
question is degenerate: ours makes none of canary's 16,175 calls
in that window because ours stops emitting at host_ns=1.73s.
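A minimal sketch of the windowed scan behind the Step 0 table follows. It assumes each trace line is one JSON object with `tid`, `host_ns`, `kind` and `name` fields. Those field names are assumptions, not the documented jsonl schema. The scan streams the file instead of loading it, because the canary trace is 4.4 GB. It also shows why the question is degenerate: if the last `host_ns` on ours's tid=1 is below the window start, the ours-side count for that window is necessarily empty.

```python
import json
from collections import Counter
from typing import Iterable, Iterator, Optional

# ASSUMED record shape (hypothetical, not the real schema):
# {"tid": 6, "host_ns": 9412345678, "kind": "kernel.call", "name": "NtSetEvent"}


def iter_events(lines: Iterable[str]) -> Iterator[dict]:
    """Parse one JSON object per non-blank line, streaming."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def window_call_counts(events: Iterable[dict], tid: int, lo_ns: int, hi_ns: int) -> Counter:
    """Count kernel.call names on `tid` with lo_ns <= host_ns < hi_ns."""
    counts: Counter = Counter()
    for ev in events:
        if ev.get("tid") != tid or ev.get("kind") != "kernel.call":
            continue
        if lo_ns <= ev["host_ns"] < hi_ns:
            counts[ev["name"]] += 1
    return counts


def last_host_ns(events: Iterable[dict], tid: int) -> Optional[int]:
    """Latest host_ns seen on `tid`, or None if the tid never appears."""
    last: Optional[int] = None
    for ev in events:
        if ev.get("tid") == tid:
            ns = ev["host_ns"]
            last = ns if last is None else max(last, ns)
    return last
```

Typical use would be `window_call_counts(iter_events(open(path)), 6, 9_000_000_000, 11_000_000_000)`. The path and window bounds come from this report, and the helper names are illustrative.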
The substantive Step 2 question reframes to: "What does canary
do between matched-prefix idx ~108,476 (= ours's last events) and
the install epoch?" Answer: it RUNS the worker tid=17 to
completion, which causes the join-wait on canary tid=6 (the analog of ours's tid=1) to return, after
which tid=6 iterates sub_822F1AA8's main loop further and
eventually triggers sub_824FD240 and sub_825070F0. Everything
hinges on tid=17 completing.
Step 1 outcome — canary tid=6 spawns sub_821748F0 at host_ns=1.935s
Exact anchor:
canary tid=6 host_ns=1935433700 idx=108476
ExCreateThread(entry=0x821748f0, ctx=0xbc365620, stack=524288, susp=T)
→ handle.create raw=0xf80000a0 hsid=3bd922fbb385c2c9
canary tid=6 host_ns=1937223600 idx=108498
NtResumeThread
NtWaitForSingleObjectEx handles=[3bd922fbb385c2c9] timeout=-1
→ wait.begin
canary tid=6 host_ns=2092000000 idx=108499 (155 ms later)
kernel.return NtWaitForSingleObjectEx rv=0 status=0x00000000
The wait IS infinite (timeout_ns=-1) — yet it returns in 155ms because
the worker terminates (canary tid=17's last call is ExTerminateThread
at host_ns=2.0918s).
Ours's mirror:
ours tid=1 host_ns=1727479660 idx=108481
ExCreateThread(entry=0x821748f0, ctx=0x4024d640, stack=0, susp=T)
→ handle.create raw=0x000012c8 hsid=8a25e09a8a739c1b
ours tid=1 host_ns=1727611893 idx=108505
wait.begin handles=[8a25e09a8a739c1b] timeout=-1
ours tid=1 host_ns=1727614433 idx=108506
kernel.return NtWaitForSingleObjectEx rv=0 ← but this is just the
return record from the entry probe, NOT actual unblock
(Note: ours-postfix.jsonl schema emits the entry-probe kernel.return
even on an infinite wait, because the probe wraps the wait wrapper.
Per halt-on-deadlock-dump.txt, tid=1 is in fact still Blocked on
handle 0x000012c8 = Thread(id=13) at deadlock-detection time.)
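The canary/ours contrast above comes down to pairing each wait.begin with the next NtWaitForSingleObjectEx kernel.return on the same tid. The sketch below shows that pairing. It uses the same hypothetical `tid`/`idx`/`host_ns`/`kind`/`name` record shape as an assumption. On the canary anchor it yields ~154.8 ms. On ours it yields ~2.5 µs, which is the entry-probe return described in the note, not a real unblock. Any latency computed from ours-postfix.jsonl therefore has to be cross-checked against halt-on-deadlock-dump.txt.

```python
from typing import Iterable, List, Optional, Tuple


def wait_latencies_ns(events: Iterable[dict], tid: int) -> List[Tuple[int, Optional[int]]]:
    """Pair each wait.begin on `tid` with the next NtWaitForSingleObjectEx return.

    Returns (begin_idx, latency_ns); latency is None when no return follows
    (e.g. a wait still open when the trace ends).
    """
    out: List[Tuple[int, Optional[int]]] = []
    pending: Optional[Tuple[int, int]] = None  # (idx, host_ns) of the open wait
    for ev in events:
        if ev.get("tid") != tid:
            continue
        kind = ev.get("kind")
        if kind == "wait.begin":
            if pending is not None:  # previous wait never saw a return
                out.append((pending[0], None))
            pending = (ev["idx"], ev["host_ns"])
        elif (kind == "kernel.return" and ev.get("name") == "NtWaitForSingleObjectEx"
              and pending is not None):
            out.append((pending[0], ev["host_ns"] - pending[1]))
            pending = None
    if pending is not None:
        out.append((pending[0], None))
    return out
```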
The spawn parameters look identical in shape (same entry PC; ctx and stack are run-specific). Spawn semantics match.
Step 2 outcome — canary tid=17 vs ours tid=13 kernel-call differential
Lifetimes:
| Metric | canary tid=17 | ours tid=13 |
|---|---|---|
| first event | host_ns=1.9378s | host_ns=1.7276s |
| last event | host_ns=2.0918s | host_ns=1.7307s |
| duration | 154 ms | 3 ms |
| total events | 4140 | 435 |
| kernel.call count | 1351 | 142 |
| terminates? | yes, via ExTerminateThread | no, wedged on wait |
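The lifetime rows above can be derived per tid in a single pass. The following is a sketch only. It assumes the same hypothetical record fields and treats a kernel.call named ExTerminateThread as the termination marker.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Lifetime:
    first_ns: Optional[int] = None
    last_ns: Optional[int] = None
    events: int = 0
    kernel_calls: int = 0
    terminated: bool = False  # saw an ExTerminateThread kernel.call

    @property
    def duration_ms(self) -> float:
        if self.first_ns is None or self.last_ns is None:
            return 0.0
        return (self.last_ns - self.first_ns) / 1e6


def lifetime(events: Iterable[dict], tid: int) -> Lifetime:
    """Summarize one tid's first/last host_ns, event count and kernel.call count."""
    lt = Lifetime()
    for ev in events:
        if ev.get("tid") != tid:
            continue
        ns = ev["host_ns"]
        lt.first_ns = ns if lt.first_ns is None else min(lt.first_ns, ns)
        lt.last_ns = ns if lt.last_ns is None else max(lt.last_ns, ns)
        lt.events += 1
        if ev.get("kind") == "kernel.call":
            lt.kernel_calls += 1
            if ev.get("name") == "ExTerminateThread":
                lt.terminated = True
    return lt
```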
Per-call differential (top entries by |canary − ours|):
| kernel.call | canary tid=17 | ours tid=13 | Δ |
|---|---|---|---|
| RtlEnterCriticalSection | 607 | 58 | +549 |
| RtlLeaveCriticalSection | 607 | 58 | +549 |
| NtClose | 19 | 2 | +17 |
| NtCreateEvent | 18 | 3 | +15 |
| NtDuplicateObject | 16 | 2 | +14 |
| RtlInitAnsiString | 11 | 1 | +10 |
| NtWaitForSingleObjectEx | 11 | 2 | +9 |
| RtlInitializeCriticalSectionAndSpinCount | 15 | 6 | +9 |
| NtQueryFullAttributesFile | 9 | 1 | +8 |
| NtReleaseSemaphore | 9 | 1 | +8 |
| RtlNtStatusToDosError | 9 | 1 | +8 |
| NtSetEvent | 8 | 1 | +7 |
| KeTlsSetValue | 2 | 0 | +2 |
| NtCreateFile | 2 | 0 | +2 |
| ExCreateThread | 1 | 0 | +1 |
| ExTerminateThread | 1 | 0 | +1 |
| KeTlsGetValue | 1 | 0 | +1 |
| KeQueryPerformanceFrequency | 0 | 1 | -1 |
Set-difference of unique kernel-call names: apart from
KeQueryPerformanceFrequency (which canary calls outside this window),
ours's set of called APIs is a strict subset of canary's. No kernel API
that canary uses is missing from ours's implementation. All of these
APIs already work in ours (they are called successfully on tid=5, tid=1,
or tid=10 elsewhere in the same run).
The differential isn't "ours fails to implement a kernel call"; it's "ours executes roughly one tenth as many iterations of the same loop body" (4140 vs 435 events, 1351 vs 142 kernel calls).
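A sketch of the per-call differential and the set-difference check follows. Each side is a Counter of kernel.call names, for example as produced by a windowed scan. The helper names are illustrative. Rows are sorted by |canary − ours|, which matches the ordering of the table.

```python
from collections import Counter
from typing import List, Set, Tuple


def call_differential(canary: Counter, ours: Counter) -> List[Tuple[str, int, int, int]]:
    """Rows of (name, canary_count, ours_count, delta), largest |delta| first."""
    names = set(canary) | set(ours)
    # Counter returns 0 for missing keys, so one-sided names still get a row.
    rows = [(n, canary[n], ours[n], canary[n] - ours[n]) for n in names]
    rows.sort(key=lambda r: (-abs(r[3]), r[0]))  # ties broken by name for stability
    return rows


def only_in(a: Counter, b: Counter) -> Set[str]:
    """Kernel-call names present in `a` but never called in `b`."""
    return set(a) - set(b)
```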
The control-flow divergence (the root cause)
Canary tid=17, idx 339-356 — the FIRST wait pattern:
idx=339 NtCreateEvent
idx=340 handle.create raw=0xf80000b8 hsid=1070523eb111c6ea object_type=1 (Event)
idx=343 NtDuplicateObject → handle.create at idx=344
idx=347 NtSetEvent ← THE EVENT IS SIGNALED BEFORE THE WAIT
idx=350 NtClose → handle.destroy at idx=351
idx=354 NtWaitForSingleObjectEx
idx=355 wait.begin handles=[1070523eb111c6ea] timeout=-1
idx=356 kernel.return rv=0 ← wait completes in 23µs because event was signaled
Ours tid=13, idx 175-434 — the analog wait pattern:
idx=175 NtCreateEvent
idx=177 handle.create raw=0x000012d0 hsid=d5e23609d3948568 object_type=1 (Event)
… ~240 events in between, mostly RtlEnterCriticalSection / RtlLeaveCriticalSection traffic (tid=13 makes 58 of each over its whole lifetime) …
idx=419 NtDuplicateObject → handle.create at idx=420
idx=429 NtReleaseSemaphore ← DIFFERENT API — semaphore, not event-set
idx=432 NtWaitForSingleObjectEx
idx=433 wait.begin handles=[d5e23609d3948568] timeout=-1
idx=434 kernel.return rv=0 (entry probe only; actual wait blocks forever)
⏸ WEDGE — event d5e23609d3948568 is never signaled.
The key observation: between NtCreateEvent and the corresponding
NtWaitForSingleObjectEx, canary calls NtSetEvent to signal
the very event it is about to wait on (idiomatic self-signaled
wait-pump barrier). Ours skips the NtSetEvent, calls
NtReleaseSemaphore instead, and then blocks on the unsignaled event.
This is a guest-code branch divergence inside the helper
hierarchy sub_821CB030 → sub_821CBA08 → sub_821CC3F8 → sub_821C4EB0
(per sub_82173990.md chain). The branch predicate is some state
read between NtCreateEvent and the call site of NtSetEvent /
NtReleaseSemaphore.
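The two wait patterns above can be told apart mechanically. The sketch below classifies each wait on a freshly created event as "self-signaled" (an NtSetEvent call appears after the NtCreateEvent and before the wait) or "external". It is a name-sequence heuristic. Because args_resolved is empty for most calls in this schema, it cannot confirm that the NtSetEvent targets the same handle. It binds only the wait to the event through the hsid of the handle.create that follows NtCreateEvent. Record fields are the same hypothetical shape as elsewhere.

```python
from typing import Iterable, List, Optional, Tuple


def classify_event_waits(events: Iterable[dict]) -> List[Tuple[int, str]]:
    """Label each wait on the most recently created event by how it was armed."""
    out: List[Tuple[int, str]] = []
    pending_create = False          # NtCreateEvent seen, its handle.create not yet
    event_hsid: Optional[str] = None
    set_since_create = False
    for ev in events:
        kind, name = ev.get("kind"), ev.get("name")
        if kind == "kernel.call" and name == "NtCreateEvent":
            pending_create, event_hsid, set_since_create = True, None, False
        elif kind == "handle.create" and pending_create:
            # First handle after NtCreateEvent is the event; later ones
            # (e.g. from NtDuplicateObject) are ignored.
            event_hsid, pending_create = ev["hsid"], False
        elif kind == "kernel.call" and name == "NtSetEvent" and event_hsid is not None:
            set_since_create = True  # heuristic: target handle is not captured
        elif kind == "wait.begin" and event_hsid is not None and event_hsid in ev.get("handles", []):
            out.append((ev["idx"], "self-signaled" if set_since_create else "external"))
    return out
```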
Step 3/4 — Why does the predicate differ between engines?
The deep root: this exact divergence pattern is what AUDIT-069 S5 already found at a different lens:
AUDIT-069 S5: "Other producers: canary 25 vs ours 1." Canary has 24 additional thread sources releasing the work semaphore that ours doesn't have.
Combining S5 with this Step 2 finding:
- Ours's tid=13 emits ONLY 1 NtReleaseSemaphore before wedging (consistent with the 1 "other producer" S5 measured).
- Canary's tid=17 emits 9 NtReleaseSemaphore + 8 NtSetEvent before reaching ExTerminateThread. Each release/set comes from a different cache-load iteration.
- The iteration count is gated by the loop body completing each iteration. Each iteration begins by waiting on an event that must be PRE-SIGNALED to advance.
In canary, the event gets pre-signaled (NtSetEvent before NtWaitForSingleObjectEx). In ours, the same code path takes the "release semaphore + wait on an event signaled by an external producer" branch instead of the "set event + wait on event" branch. The state read by the predicate at the branch differs.
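The toy model below illustrates the mechanism just described. It uses Python's threading.Event and threading.Semaphore as stand-ins for the guest event and the work semaphore. The `cache_item_ready` predicate is hypothetical and represents the still-unidentified branch predicate. A finite timeout replaces the real timeout=-1 wait so the demo terminates. The model shows that the "set event" path advances every iteration, and that the "release semaphore" path stalls at the first wait when no external party ever signals the event.

```python
import threading
from typing import Callable


def run_loop(iterations: int, cache_item_ready: Callable[[int], bool],
             wait_timeout_s: float = 0.05) -> int:
    """Return how many iterations complete before the first wait times out.

    Hypothetical model: nothing consumes work_sem or sets the event on the
    release path, which mirrors the missing signaler in ours.
    """
    work_sem = threading.Semaphore(0)
    completed = 0
    for i in range(iterations):
        ev = threading.Event()            # NtCreateEvent analog
        if cache_item_ready(i):
            ev.set()                      # NtSetEvent path: pre-signal, then wait
        else:
            work_sem.release()            # NtReleaseSemaphore path: hand off work
        if not ev.wait(timeout=wait_timeout_s):
            return completed              # stands in for the infinite-wait wedge
        completed += 1
    return completed
```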
What state? Without disassembling sub_821CB030/sub_821CBA08
and binding the branch PC to the guest memory location the predicate
reads, we cannot say definitively. Candidate state sources:
- A bit/flag in the ctx (0x4024d640 in ours vs 0xbc365620 in canary: different addresses but same shape). Could be uninitialized in ours due to the ANON_Class vtable install at sub_824FD240+0x24 not having fired (AUDIT-068 S4). But that vtable install fires much later (host_ns=9.4s in canary), so this is unlikely.
- The result of a prior NtQueryFullAttributesFile call. Canary tid=17 calls this 9× before reaching ExTerminateThread; ours tid=13 calls it 1× before wedging. The file being queried is in the cache:\ filesystem (per sub_82173990.md chain).
- A guest-memory shared CS-protected pointer set by another tid (canary tids 4/10/14 do 38+90+38 signal events in the [1.9..2.1s] window; in ours, tids 4/5/14 are STILL working in [0..1.73s] but their output is shifted to ours's tid=5, which per AUDIT-069 S5 matches canary's tid=10 producer count almost exactly: 90 NtReleaseSemaphore each).
Cause attribution
Per the Step 5 framework:
- Missing ours implementation? NO. Every kernel API canary tid=17 calls is also implemented in ours and works (verified by other tids using them successfully).
- Incorrect return value in ours? UNLIKELY but unverified. The Phase A schema doesn't capture args/return values for most calls; args_resolved={} is empty for nearly every call in this window.
- Missing side effect in ours? POSSIBLY. If NtQueryFullAttributesFile or NtCreateFile on cache:\<hash>\... behaves slightly differently in ours (e.g., succeeds where canary fails, or vice versa), the resulting branch could diverge.
- Upstream state divergence (most likely): a guest-memory value read by a predicate inside sub_821CB030/sub_821CBA08 differs between engines. The earlier CS-heavy block on this tid (~240 events of enter/leave traffic between idx 177 and idx 423) processes some data structure, and its result selects the branch.
Best single guess (MEDIUM confidence): a NtQueryFullAttributesFile
on a cache:\<hash>\<filename> path returns a different value in
ours than in canary (file present vs not, size mismatch, or attrib
mismatch). The branch chooses "we need to recompute the cache item"
(NtReleaseSemaphore path) instead of "cache item is ready, signal
event and proceed" (NtSetEvent path).
Disjoint-gap count
ONE gap — the predicate divergence inside sub_821CB030's
body. However, the predicate divergence likely has a complex
upstream cause that involves either filesystem state or
guest-memory state initialized by another tid that ALSO has the
same kind of subtle drift. So:
- disjoint divergence sites in this trajectory: 1 (control-flow branch in sub_821CB030 chain).
- disjoint hypothesized causes: 2-3 (file-attribute return value, shared-memory state from the tid=10/5 dispatch worker, or an upstream vtable-install bypass).
This is NOT the "50+ disjoint missing kernel patterns" failure mode predicted in tripstone 7. It's a single branch divergence with multiple candidate first-causes. Methodology pivot to Option C (critical-path sweep) is NOT indicated; targeted iterate per candidate first-cause IS indicated.
Recommended next concrete action
Iterate plan, ordered by minimum LOC + maximum signal:
Iterate Step 2.A — branch-probe inside sub_821CB030 body (~50-80 LOC ours + ~50 LOC canary)
Use existing audit_61_branch_probe_pcs to pin the divergent
branch inside sub_821CB030 / sub_821CBA08 / sub_821CC3F8.
Specifically probe every bne/beq PC inside these guest fns
that has reachable bl NtSetEvent on one branch and bl NtReleaseSemaphore on the other. Use sylpheed.db cross-references
to enumerate bl 0x824AA2F0 (NtSetEvent wrapper) and bl 0x824AB158
(NtReleaseSemaphore wrapper) call sites in these fns.
Capture both engines, diff branch-counts. The first divergent branch is the answer.
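Ours runs roughly one tenth as many loop iterations, so raw per-PC branch counts will differ almost everywhere. One way to find "the first divergent branch" is to compare the ordered per-thread stream of (pc, taken) outcomes up to the point where the shorter stream ends. The sketch below assumes the branch probe can emit such an ordered stream for the worker tid. That output format is an assumption about audit_61_branch_probe_pcs, not its documented behavior, and the PCs in the usage data are hypothetical.

```python
from typing import Optional, Sequence, Tuple

Outcome = Tuple[int, bool]  # (guest branch PC, taken?)


def first_divergence(canary: Sequence[Outcome],
                     ours: Sequence[Outcome]) -> Optional[Tuple[int, Outcome, Outcome]]:
    """Return (position, canary_outcome, ours_outcome) at the first mismatch.

    Returns None when one stream is a prefix of the other (for example,
    when ours simply wedges before reaching the next branch).
    """
    for i, (c, o) in enumerate(zip(canary, ours)):
        if c != o:  # different PC reached, or same PC with a different direction
            return i, c, o
    return None
```

Scheduling jitter (tripstone #32) can add spurious mismatches in contended regions, so a mismatch should be confirmed on a second canary run before it is treated as the predicate site.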
Iterate Step 2.B — args/return-value capture for the 9 NtQueryFullAttributesFile calls on canary tid=17 (~30 LOC canary)
Extend audit_61 or write a dedicated probe to log r3 on entry (the
object-attributes pointer that carries the filename) and r3 on return
(the NTSTATUS; the PowerPC ABI returns values in r3, not r0) for every
NtQueryFullAttributesFile call inside this 154-ms window. Compare
against ours's 1 call. If file-attribute return values differ on a
shared file, that's the trigger.
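The capture itself would live in the engines. The offline comparison is small, and a sketch follows. It assumes each probe emits (path, NTSTATUS) pairs in call order. Paths that both engines query are compared on their first status, because ours makes only one call. The paths in the usage data are hypothetical. 0xC0000034 is the standard STATUS_OBJECT_NAME_NOT_FOUND code.

```python
from typing import Dict, Iterable, List, Tuple


def status_by_path(calls: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group NTSTATUS values (as unsigned 32-bit) by queried path, in call order."""
    out: Dict[str, List[int]] = {}
    for path, status in calls:
        out.setdefault(path, []).append(status & 0xFFFFFFFF)
    return out


def mismatched_paths(canary: Iterable[Tuple[str, int]],
                     ours: Iterable[Tuple[str, int]]) -> Dict[str, Tuple[List[int], List[int]]]:
    """Shared paths whose first NtQueryFullAttributesFile status differs."""
    c, o = status_by_path(canary), status_by_path(ours)
    return {p: (c[p], o[p]) for p in sorted(c.keys() & o.keys()) if c[p][0] != o[p][0]}
```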
Iterate Step 2.C — guest-memory read-watch on the ctx struct (~20 LOC, reuses AUDIT-068 S3 read-probe)
Use audit_68_host_mem_read_probe to sample the worker ctx
(0xbc365620 in canary / 0x4024d640 in ours) at ~1ms cadence in
the window [1.7..2.1s]. Identify whether a flag/byte in the ctx
differs at the predicate-read time. This pinpoints the actual
read location if Step 2.A's branch-probe doesn't immediately reveal
the predicate source.
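Once the read probe yields ctx snapshots from both engines at comparable points, for example just before the branch, the remaining step is a field-wise diff. The sketch below compares big-endian 32-bit words, since the guest is big-endian PowerPC. It accepts an `ignore` set of offsets known to hold pointers, which will differ between runs anyway. The snapshot bytes in the usage data are made up for illustration.

```python
import struct
from typing import Collection, List, Tuple


def differing_words(canary_ctx: bytes, ours_ctx: bytes,
                    ignore: Collection[int] = frozenset()) -> List[Tuple[int, int, int]]:
    """Return (offset, canary_word, ours_word) for each differing 32-bit word."""
    n = min(len(canary_ctx), len(ours_ctx)) // 4 * 4  # compare whole words only
    out: List[Tuple[int, int, int]] = []
    for off in range(0, n, 4):
        if off in ignore:  # e.g. heap/self pointers that are run-specific
            continue
        (c,) = struct.unpack_from(">I", canary_ctx, off)
        (o,) = struct.unpack_from(">I", ours_ctx, off)
        if c != o:
            out.append((off, c, o))
    return out
```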
Tripstones honored
- #28: verified canary's actual behavior by reading the jsonl directly; the AUDIT-069 S5 framing is corroborated, not assumed.
- #32: contention regions may jitter; ours tid=13's CS enter/leave count is NOT identical to canary tid=17's (58 vs 607 of each). The differential here may include scheduling-determinism noise. Mitigation: cross-validate with a 2nd cold canary run if Step 2.A doesn't immediately converge.
- #39: matched-prefix did NOT drive this; first-draw progression is the goal.
- #5 of plan tripstones: AUDIT-069 S5 "25 producers" finding IS downstream of Step 2's identified branch divergence. The 25 producers correspond to canary tid=17's loop iterations that ours tid=13 doesn't reach.
Cascade
- A (acquire canary install-epoch event log): ✓ HIGH (16,175 kernel calls captured cleanly in [9..11s] window).
- B (identify install-trigger sequence in canary): ✓ HIGH (canary tid=6 spawns sub_821748F0 at host_ns=1.935s, join-wait returns at 2.092s). The "install trigger" is not a single kernel call but the completion of worker tid=17, which causes the join wait to release tid=6 into the rest of the main-loop dispatch.
- C (identify where ours diverges from canary): ✓ HIGH (ours tid=13 wedges 3ms into its lifetime, vs canary tid=17 running 154ms; first kernel-call sequence divergence at the NtSetEvent vs NtReleaseSemaphore branch).
- D (attribute the divergence to a specific cause): MEDIUM (3 candidate root causes; need iterate 2.A/2.B/2.C to disambiguate).
- E (produce Δ-gap count + roadmap): ✓ HIGH (1 divergence site; 3 candidate first-causes; ~50-200 LOC iterate plan).
Honest assessment
- The wedge framing established by AUDIT-049 .. AUDIT-069 holds.
- Step 2 narrows the trigger from "the install epoch at 9.4s" down to "the worker tid=13's first wait at 1.73s": about 7.7s earlier, and from a time window to a single wait site.
- The 25-producer finding from AUDIT-069 S5 IS a consequence of the Step 2 branch divergence: each missing iteration of canary tid=17's load loop is a missing "other producer" signal.
- The fix is NOT to mirror canary's kernel calls; ours implements them correctly. The fix is to find why ours's sub_821CB030 predicate evaluates differently.
- Confidence that the fix is a single guest-state correction (file-attribute mismatch, ctx-field uninitialized, or shared-memory flag race): MEDIUM.
Artifacts produced this session
All under xenia-rs/audit-runs/review-a-step2-natural-trigger/:
- extract_canary_install_window.py: scanner for canary in [9..11s].
- extract_canary_tid6_pre_install.py: scanner for tid=6 [1.5..11s].
- extract_canary_worker_tid.py: locates spawn worker by hsid.
- extract_canary_tid17_full.py: tid=17 timeline + diff vs ours tid=13.
- extract_ours_tid1_full.py: ours tid=1 timeline.
- extract_ours_tid13_final.py: ours tid=13 timeline.
- find_signaler.py: finds canary tid=17 wait signalers.
- ours_signal_counts.py: ours per-tid signal counts.
- canary-tid6-install-window.csv: 32,383 events.
- canary-tid6-install-window.summary: kernel.call frequencies.
- canary-tid6-from-anchor.csv: 139,202 events.
- canary-tid17-worker-timeline.csv: 4140 events.
- ours-tid13-full-timeline.csv: 435 events.
- ours-tid1-final-150.csv: last 150 events on ours tid=1.
- ours-tid1-summary: kernel.call frequencies.
- canary-tid17-waits.csv: 29 wait.begin events with handle binding.
- differential-canary-tid17-vs-ours-tid13.txt: full call-name diff.
- step2-report.md: this report.
LOC delta in this session: 0 to xenia-rs/canary engines; 0 to sylpheed.db; ~600 LOC analysis scripts under audit-runs/.