handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions

@@ -0,0 +1,43 @@
Canary tid=6 install-window event summary [host_ns 9000000000..11000000000]
=== Top kernel.call by frequency ===
7525 RtlLeaveCriticalSection
7523 RtlEnterCriticalSection
237 XamInputGetCapabilities
178 NtWaitForSingleObjectEx
162 NtSetEvent
98 XNotifyGetNext
80 NtReleaseSemaphore
79 XamInputGetKeystrokeEx
79 XamInputGetState
43 RtlInitializeCriticalSection
19 NtAllocateVirtualMemory
18 NtCreateEvent
16 NtReadFile
16 RtlNtStatusToDosError
14 MmAllocatePhysicalMemoryEx
12 KeRaiseIrqlToDpcLevel
12 KeAcquireSpinLockAtRaisedIrql
12 KeReleaseSpinLockFromRaisedIrql
12 KfLowerIrql
5 ExCreateThread
4 RtlInitializeCriticalSectionAndSpinCount
4 ObReferenceObjectByHandle
4 KeSetAffinityThread
4 ObDereferenceObject
3 NtClose
2 RtlInitAnsiString
2 NtCreateFile
2 NtQueryInformationFile
2 KeQueryPerformanceFrequency
1 NtResumeThread
1 NtCreateSemaphore
1 XamGetSystemVersion
1 XMsgInProcessCall
1 XMsgStartIORequestEx
1 XamResetInactivity
1 XamEnableInactivityProcessing
1 XGetVideoMode
=== Unique kernel.call names ===
37

@@ -0,0 +1,30 @@
Differential canary tid=17 (sub_821748F0 worker) vs ours tid=13
canary tid=17 total events: 4140, ours tid=13 total: 435
canary tid=17 duration: 1.9378s..2.0918s (154ms, terminates)
ours tid=13 duration: until wedge, never terminates

kernel.call                               canary    ours   delta
RtlEnterCriticalSection                      607      58    +549
RtlLeaveCriticalSection                      607      58    +549
NtClose                                       19       2     +17
NtCreateEvent                                 18       3     +15
NtDuplicateObject                             16       2     +14
RtlInitAnsiString                             11       1     +10
NtWaitForSingleObjectEx                       11       2      +9
RtlInitializeCriticalSectionAndSpinCount      15       6      +9
NtQueryFullAttributesFile                      9       1      +8
NtReleaseSemaphore                             9       1      +8
RtlNtStatusToDosError                          9       1      +8
NtSetEvent                                     8       1      +7
KeTlsSetValue                                  2       0      +2
NtCreateFile                                   2       0      +2
ExCreateThread                                 1       0      +1
ExTerminateThread                              1       0      +1
KeQueryPerformanceFrequency                    0       1      -1
KeTlsGetValue                                  1       0      +1
ExGetXConfigSetting                            1       1      +0
KeSetAffinityThread                            1       1      +0
ObDereferenceObject                            1       1      +0
ObReferenceObjectByHandle                      1       1      +0
XNotifyPositionUI                              1       1      +0

@@ -0,0 +1,121 @@
#!/usr/bin/env python3
"""Extract tid=6 kernel call sequence from canary-jitter-1.jsonl in install window.
Install window: host_ns in [9_000_000_000, 11_000_000_000] (9s..11s).
Per AUDIT-068 S3: vtable install at ~9.4-9.6s, sub_825070F0 spawn at 10.383s.
Outputs are written next to this script.
"""
import json
import os
import sys
from collections import Counter
INPUT = "/home/fabi/RE - Project Sylpheed/xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-1.jsonl"
OUTDIR = os.path.dirname(os.path.abspath(__file__))
T_LO = 9_000_000_000
T_HI = 11_000_000_000
TARGET_TIDS = {6}
import_calls = []
kernel_calls = []
kernel_returns = []
handle_creates = []
thread_events = []
mem_events = []
other_events = []
count_in_window = 0
count_total = 0
with open(INPUT, "r") as f:
    for line in f:
        count_total += 1
        if '"host_ns":' not in line:
            continue
        # Cheap prefilter: parse host_ns by hand before paying for json.loads.
        try:
            i = line.index('"host_ns":') + len('"host_ns":')
            j = i
            while j < len(line) and (line[j].isdigit() or line[j] == '-'):
                j += 1
            host_ns = int(line[i:j])
        except (ValueError, IndexError):
            continue
        if host_ns < T_LO:
            continue
        if host_ns >= T_HI:
            break
        try:
            ev = json.loads(line)
        except json.JSONDecodeError:
            continue
        if ev.get("tid") not in TARGET_TIDS:
            continue
        count_in_window += 1
        kind = ev.get("kind", "")
        if kind == "import.call":
            import_calls.append(ev)
        elif kind == "kernel.call":
            kernel_calls.append(ev)
        elif kind == "kernel.return":
            kernel_returns.append(ev)
        elif kind == "handle.create":
            handle_creates.append(ev)
        elif kind in ("thread.create", "thread.exit"):
            thread_events.append(ev)
        elif kind in ("mem.write", "mem.read"):
            mem_events.append(ev)
        else:
            other_events.append(ev)

print(f"Total lines scanned: {count_total}")
print(f"Events in window for tid in {TARGET_TIDS}: {count_in_window}")
print(f" import.call: {len(import_calls)}")
print(f" kernel.call: {len(kernel_calls)}")
print(f" kernel.return: {len(kernel_returns)}")
print(f" handle.create: {len(handle_creates)}")
print(f" thread.create/exit: {len(thread_events)}")
print(f" mem.read/write: {len(mem_events)}")
print(f" other: {len(other_events)}")

with open(os.path.join(OUTDIR, "canary-tid6-install-window.csv"), "w") as f:
    f.write("host_ns,tid_event_idx,kind,name,raw_handle,detail\n")
    all_evts = []
    for ev in kernel_calls:
        name = ev["payload"].get("name", "?")
        detail = json.dumps(ev["payload"].get("args_resolved") or ev["payload"].get("args", {}))[:200]
        all_evts.append((ev["host_ns"], ev["tid_event_idx"], "kernel.call", name, "", detail))
    for ev in kernel_returns:
        name = ev["payload"].get("name", "?")
        rv = ev["payload"].get("return_value", "")
        st = ev["payload"].get("status", "")
        detail = f"rv={rv} status={st}"
        all_evts.append((ev["host_ns"], ev["tid_event_idx"], "kernel.return", name, "", detail))
    for ev in handle_creates:
        rh = ev["payload"].get("raw_handle_id", "")
        ot = ev["payload"].get("object_type", "")
        detail = f"object_type={ot}"
        all_evts.append((ev["host_ns"], ev["tid_event_idx"], "handle.create", "", rh, detail))
    for ev in thread_events:
        detail = json.dumps(ev["payload"])[:200]
        all_evts.append((ev["host_ns"], ev["tid_event_idx"], ev["kind"], "", "", detail))
    all_evts.sort()
    for ev in all_evts:
        host_ns, idx, kind, name, rh, detail = ev
        detail_escaped = detail.replace('"', '""')
        f.write(f'{host_ns},{idx},{kind},{name},{rh},"{detail_escaped}"\n')
print(f"Wrote canary-tid6-install-window.csv with {len(all_evts)} ordered events.")

call_counts = Counter()
for ev in kernel_calls:
    call_counts[ev["payload"].get("name", "?")] += 1
with open(os.path.join(OUTDIR, "canary-tid6-install-window.summary"), "w") as f:
    f.write(f"Canary tid=6 install-window event summary [host_ns {T_LO}..{T_HI}]\n")
    f.write(f"\n=== Top kernel.call by frequency ===\n")
    for name, c in call_counts.most_common(80):
        f.write(f" {c:6d} {name}\n")
    f.write(f"\n=== Unique kernel.call names ===\n")
    f.write(f" {len(call_counts)}\n")
print(f"Wrote canary-tid6-install-window.summary")

@@ -0,0 +1,117 @@
#!/usr/bin/env python3
"""Capture canary tid=17 (the sub_821748F0 worker) FULL timeline.
Lifetime: 1.9378s to 2.0918s = 154ms.
4140 events total. Compare to ours's tid=13 which has only 80 events before wedge.
"""
import json
import os
from collections import Counter
INPUT = "/home/fabi/RE - Project Sylpheed/xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-1.jsonl"
OUTDIR = os.path.dirname(os.path.abspath(__file__))
TARGET_TID = 17
T_LO = 1_900_000_000
T_HI = 2_200_000_000
evts = []
with open(INPUT, "r") as f:
    for line in f:
        if '"host_ns":' not in line:
            continue
        try:
            i = line.index('"host_ns":') + len('"host_ns":')
            j = i
            while j < len(line) and (line[j].isdigit() or line[j] == '-'):
                j += 1
            host_ns = int(line[i:j])
        except (ValueError, IndexError):
            continue
        if host_ns < T_LO:
            continue
        if host_ns >= T_HI:
            break
        if f'"tid":{TARGET_TID},' not in line:
            continue
        try:
            ev = json.loads(line)
        except json.JSONDecodeError:
            continue
        if ev.get("tid") != TARGET_TID:
            continue
        evts.append(ev)

print(f"canary tid={TARGET_TID}: {len(evts)} events")
if evts:
    print(f" first host_ns: {evts[0]['host_ns']/1e9:.4f}s")
    print(f" last host_ns: {evts[-1]['host_ns']/1e9:.4f}s")

# Top kernel calls.
sum_calls = Counter()
for ev in evts:
    if ev["kind"] == "kernel.call":
        sum_calls[ev["payload"].get("name", "?")] += 1
print(f"\n=== Top kernel.calls ({len(sum_calls)} unique) ===")
for n, c in sum_calls.most_common(40):
    print(f" {c:5d} {n}")

# Save timeline.
with open(os.path.join(OUTDIR, f"canary-tid{TARGET_TID}-worker-timeline.csv"), "w") as f:
    f.write("host_ns,tid_event_idx,kind,name,detail\n")
    for ev in evts:
        name = ev["payload"].get("name", "")
        detail = json.dumps(ev["payload"])[:400].replace('"', '""')
        f.write(f'{ev["host_ns"]},{ev["tid_event_idx"]},{ev["kind"]},{name},"{detail}"\n')

# Compare against ours tid=13.
print("\n=== Now comparing ours tid=13 ===")
OURS_INPUT = "/home/fabi/RE - Project Sylpheed/xenia-rs/audit-runs/phase-w-wedge-reattack/ours-postfix.jsonl"
ours_evts = []
with open(OURS_INPUT, "r") as f:
    for line in f:
        if f'"tid":13' not in line:
            continue
        try:
            ev = json.loads(line)
        except json.JSONDecodeError:
            continue
        if ev.get("tid") != 13:
            continue
        ours_evts.append(ev)
print(f"ours tid=13: {len(ours_evts)} events")
if ours_evts:
    print(f" first host_ns: {ours_evts[0]['host_ns']/1e9:.4f}s")
    print(f" last host_ns: {ours_evts[-1]['host_ns']/1e9:.4f}s")
ours_sum = Counter()
for ev in ours_evts:
    if ev["kind"] == "kernel.call":
        ours_sum[ev["payload"].get("name", "?")] += 1
print(f"\n=== ours tid=13 kernel.calls ({len(ours_sum)} unique) ===")
for n, c in ours_sum.most_common(40):
    print(f" {c:5d} {n}")

# Differential table.
all_names = set(sum_calls.keys()) | set(ours_sum.keys())
print(f"\n=== Differential canary tid=17 vs ours tid=13 ===")
print(f"{'kernel.call':<45s} {'canary':>8s} {'ours':>8s} {'delta':>8s}")
diffs = []
for n in sorted(all_names):
    cc = sum_calls.get(n, 0)
    oc = ours_sum.get(n, 0)
    diffs.append((cc - oc, n, cc, oc))
# Largest absolute difference first; ties keep the alphabetical order from above.
diffs.sort(key=lambda x: -abs(x[0]))
for delta, n, cc, oc in diffs[:80]:
    print(f" {n:<45s} {cc:>8d} {oc:>8d} {delta:>+8d}")
with open(os.path.join(OUTDIR, "differential-canary-tid17-vs-ours-tid13.txt"), "w") as f:
    f.write(f"Differential canary tid=17 (sub_821748F0 worker) vs ours tid=13\n\n")
    f.write(f"canary tid=17 total events: {len(evts)}, ours tid=13 total: {len(ours_evts)}\n")
    f.write(f"canary tid=17 duration: 1.9378s..2.0918s (154ms, terminates)\n")
    f.write(f"ours tid=13 duration: until wedge, never terminates\n\n")
    f.write(f"{'kernel.call':<45s} {'canary':>8s} {'ours':>8s} {'delta':>8s}\n")
    for delta, n, cc, oc in diffs:
        f.write(f" {n:<45s} {cc:>8d} {oc:>8d} {delta:>+8d}\n")

@@ -0,0 +1,146 @@
#!/usr/bin/env python3
"""Extract canary tid=6 timeline at and just before the same point ours wedges.
The matched-prefix endpoint is when ours's tid=1 calls
NtWaitForSingleObjectEx on tid=13.handle at host_ns=1.727s.
In canary, tid=6's analog wait is the sub_82173990 KeWaitForSingleObject
INFINITE — but in canary it completes when the spawned worker (tid=17 =
sub_821748F0 body) terminates. Need to find that wait in canary's stream.
Output: ordered timeline of canary tid=6 from spawn-of-sub_821748F0
through install-epoch.
"""
import json
import os
from collections import Counter
INPUT = "/home/fabi/RE - Project Sylpheed/xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-1.jsonl"
OUTDIR = os.path.dirname(os.path.abspath(__file__))
TARGET_TID = 6
# Capture canary tid=6 from t=1.5s through t=11s (sub_821748F0 spawn through worker fan-out).
T_LO = 1_500_000_000
T_HI = 11_000_000_000
kernel_calls = []
kernel_returns = []
import_calls = []
handle_creates = []
thread_events = []
wait_events = []
other = []
with open(INPUT, "r") as f:
    for line in f:
        if '"host_ns":' not in line:
            continue
        try:
            i = line.index('"host_ns":') + len('"host_ns":')
            j = i
            while j < len(line) and (line[j].isdigit() or line[j] == '-'):
                j += 1
            host_ns = int(line[i:j])
        except (ValueError, IndexError):
            continue
        if host_ns < T_LO:
            continue
        if host_ns >= T_HI:
            break
        # Quick tid filter.
        if f'"tid":{TARGET_TID},' not in line:
            continue
        try:
            ev = json.loads(line)
        except json.JSONDecodeError:
            continue
        if ev.get("tid") != TARGET_TID:
            continue
        kind = ev.get("kind", "")
        if kind == "kernel.call":
            kernel_calls.append(ev)
        elif kind == "kernel.return":
            kernel_returns.append(ev)
        elif kind == "handle.create":
            handle_creates.append(ev)
        elif kind in ("thread.create", "thread.exit"):
            thread_events.append(ev)
        elif kind == "import.call":
            import_calls.append(ev)
        elif kind in ("wait.begin", "wait.end", "wait.wake"):
            wait_events.append(ev)
        else:
            other.append(ev)

print(f"canary tid={TARGET_TID} in window [{T_LO/1e9}..{T_HI/1e9}s]")
print(f" kernel.call: {len(kernel_calls)}")
print(f" kernel.return: {len(kernel_returns)}")
print(f" handle.create: {len(handle_creates)}")
print(f" thread.create/exit: {len(thread_events)}")
print(f" import.call: {len(import_calls)}")
print(f" wait.* {len(wait_events)}")

# Save full timeline.
all_evts = []
for ev in kernel_calls + kernel_returns + handle_creates + thread_events + wait_events:
    all_evts.append((ev["host_ns"], ev["tid_event_idx"], ev["kind"], ev["payload"]))
all_evts.sort()

# Find anchor points:
# 1. ExCreateThread with entry=0x821748f0 (the matched spawn site).
# 2. NtWaitForSingleObjectEx on the resulting handle (the analog of ours's wedge).
# 3. Wait return time.
# 4. Subsequent calls that lead to sub_825070F0 fan-out at host_ns ~10.383s.
print("\n=== Looking for anchor: ExCreateThread on entry 0x821748f0 ===")
anchor_idx = -1
anchor_ns = -1
anchor_handle = None
for i, (host_ns, idx, kind, payload) in enumerate(all_evts):
    if kind == "thread.create":
        entry = payload.get("entry_pc", "")
        if entry == "0x821748f0" or entry == "0x821748F0":
            print(f" Found at idx={idx} host_ns={host_ns} ({host_ns/1e9:.3f}s)")
            print(f" payload={json.dumps(payload)}")
            anchor_idx = i
            anchor_ns = host_ns
            anchor_handle = payload.get("handle_semantic_id")
            break
# Without an anchor, the -1 sentinel would index from the end of the list
# in the loops and slice below, so stop here instead.
if anchor_idx < 0:
    raise SystemExit("Anchor thread.create for entry 0x821748f0 not found in window.")

# Locate the next NtWaitForSingleObjectEx on tid=6 - that's the join wait.
print("\n=== Finding the join-wait on tid=6 after ExCreateThread ===")
join_wait_start_ns = None
for i in range(anchor_idx, min(anchor_idx + 200, len(all_evts))):
    host_ns, idx, kind, payload = all_evts[i]
    if kind == "wait.begin":
        if anchor_handle and anchor_handle in payload.get("handles_semantic_ids", []):
            print(f" Join wait.begin at idx={idx} host_ns={host_ns} ({host_ns/1e9:.3f}s)")
            print(f" timeout_ns={payload.get('timeout_ns')}")
            join_wait_start_ns = host_ns
            join_wait_start_eidx = i
            break
    elif kind == "kernel.call" and payload.get("name") == "KeWaitForSingleObject":
        print(f" KeWait at idx={idx} host_ns={host_ns} ({host_ns/1e9:.3f}s)")

# Look for wait.end / wait.wake.
print("\n=== Finding the join-wait completion ===")
for i in range(anchor_idx, len(all_evts)):
    host_ns, idx, kind, payload = all_evts[i]
    if kind in ("wait.end", "wait.wake"):
        if anchor_handle and anchor_handle in payload.get("handles_semantic_ids", []):
            print(f" Wait wake at idx={idx} host_ns={host_ns} ({host_ns/1e9:.3f}s) kind={kind}")
            print(f" payload={json.dumps(payload)[:300]}")
            join_wait_end_ns = host_ns
            join_wait_end_eidx = i
            # The duration is only defined if the matching wait.begin was seen.
            if join_wait_start_ns is not None:
                wait_duration_ns = host_ns - join_wait_start_ns
                print(f" DURATION: {wait_duration_ns/1e9:.3f} s")
            break

# Save the full timeline window from join-wait spawn (anchor) through end.
with open(os.path.join(OUTDIR, "canary-tid6-from-anchor.csv"), "w") as f:
    f.write("host_ns,tid_event_idx,kind,name,detail\n")
    for host_ns, idx, kind, payload in all_evts[anchor_idx:]:
        name = payload.get("name", "")
        detail = json.dumps(payload)[:400].replace('"', '""')
        f.write(f'{host_ns},{idx},{kind},{name},"{detail}"\n')
print(f"\nWrote canary-tid6-from-anchor.csv with {len(all_evts) - anchor_idx} events.")

@@ -0,0 +1,108 @@
#!/usr/bin/env python3
"""Find the canary tid that runs the sub_821748F0 worker body and see
what it does in its 155ms lifetime (host_ns 1.935s to ~2.09s).
The spawn semantic-id is 3bd922fbb385c2c9.
"""
import json
import os
from collections import Counter
INPUT = "/home/fabi/RE - Project Sylpheed/xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-1.jsonl"
OUTDIR = os.path.dirname(os.path.abspath(__file__))
TARGET_HSID = "3bd922fbb385c2c9"
# First pass: locate this thread's tid by matching the spawn handle.
target_tid = None
with open(INPUT, "r") as f:
    for line in f:
        if '"thread.create"' in line and TARGET_HSID in line:
            ev = json.loads(line)
            if ev.get("payload", {}).get("handle_semantic_id") == TARGET_HSID:
                child_handle = TARGET_HSID
                print(f" spawn at host_ns={ev['host_ns']}, entry={ev['payload'].get('entry_pc')}")
                # The thread's own tid will be in events the thread itself emits.
                # Look for the FIRST event whose tid is NOT the parent and whose
                # subsequent guest_pc/handle matches this child handle.
                break

# We need to find tid by looking at next thread.create's tid_event_idx=0 emission.
# Simpler: scan events after this point for new tid that emits with this handle.
# Canary's convention: each new thread emits its first events under its own tid.
# Better approach: find thread.create events with handle_semantic_id=TARGET_HSID,
# then find the smallest tid > 6 that emits AFTER that timestamp.
T_LO = 1_900_000_000 # 1.9s
T_HI = 3_000_000_000 # 3.0s
events_by_tid = {}
with open(INPUT, "r") as f:
    for line in f:
        if '"host_ns":' not in line:
            continue
        try:
            i = line.index('"host_ns":') + len('"host_ns":')
            j = i
            while j < len(line) and (line[j].isdigit() or line[j] == '-'):
                j += 1
            host_ns = int(line[i:j])
        except (ValueError, IndexError):
            continue
        if host_ns < T_LO:
            continue
        if host_ns >= T_HI:
            break
        try:
            ev = json.loads(line)
        except json.JSONDecodeError:
            continue
        tid = ev.get("tid")
        if tid is None:
            continue
        events_by_tid.setdefault(tid, []).append(ev)

# tid=6 is parent. Look for new tids that emit between 1.935s and 2.1s.
print("\n=== tids active in window [1.9..3.0s] ===")
for tid, evts in sorted(events_by_tid.items()):
    if not evts:
        continue
    first_ns = evts[0]["host_ns"]
    last_ns = evts[-1]["host_ns"]
    print(f" tid={tid}: {len(evts)} events from {first_ns/1e9:.4f}s to {last_ns/1e9:.4f}s")

# Find tid that spawns at ~1.935s and ends at ~2.09s.
candidate_tid = None
for tid, evts in events_by_tid.items():
    if tid == 6:
        continue
    first_ns = evts[0]["host_ns"]
    if first_ns < 1_930_000_000 or first_ns > 1_960_000_000:
        continue
    # Check exit time.
    last_ns = evts[-1]["host_ns"]
    print(f" CANDIDATE tid={tid}: first={first_ns/1e9:.4f}s last={last_ns/1e9:.4f}s events={len(evts)}")
    candidate_tid = tid

if candidate_tid is None:
    print("\nNo single candidate found.")
else:
    print(f"\n=== Worker tid={candidate_tid} timeline (sub_821748F0 body) ===")
    evts = events_by_tid[candidate_tid]
    # All kernel.calls and thread.create.
    summary = Counter()
    for ev in evts:
        if ev["kind"] == "kernel.call":
            summary[ev["payload"].get("name", "?")] += 1
    print(f"Total events: {len(evts)}")
    print(f"Top kernel.call names:")
    for n, c in summary.most_common(40):
        print(f" {c:5d} {n}")
    # Save full timeline.
    with open(os.path.join(OUTDIR, f"canary-worker-tid{candidate_tid}-timeline.csv"), "w") as f:
        f.write("host_ns,tid_event_idx,kind,name,detail\n")
        for ev in evts:
            name = ev["payload"].get("name", "")
            detail = json.dumps(ev["payload"])[:400].replace('"', '""')
            f.write(f'{ev["host_ns"]},{ev["tid_event_idx"]},{ev["kind"]},{name},"{detail}"\n')
    print(f"Wrote canary-worker-tid{candidate_tid}-timeline.csv")

@@ -0,0 +1,37 @@
#!/usr/bin/env python3
"""Extract ours tid=13 timeline plus matching canary tid=17 prefix."""
import json
import os
OURS = "/home/fabi/RE - Project Sylpheed/xenia-rs/audit-runs/phase-w-wedge-reattack/ours-postfix.jsonl"
OUTDIR = os.path.dirname(os.path.abspath(__file__))
ours_evts = []
with open(OURS, "r") as f:
    for line in f:
        if '"tid":13' not in line:
            continue
        try:
            ev = json.loads(line)
        except json.JSONDecodeError:
            continue
        if ev.get("tid") != 13:
            continue
        ours_evts.append(ev)
print(f"ours tid=13: {len(ours_evts)} events")

print(f"\n=== FULL ours tid=13 timeline (kernel.call, handle.create, wait.*) ===")
with open(os.path.join(OUTDIR, "ours-tid13-full-timeline.csv"), "w") as f:
    f.write("host_ns,tid_event_idx,kind,name,detail\n")
    for ev in ours_evts:
        name = ev["payload"].get("name", "") if isinstance(ev.get("payload"), dict) else ""
        detail = json.dumps(ev.get("payload", {}))[:400].replace('"', '""')
        f.write(f'{ev["host_ns"]},{ev["tid_event_idx"]},{ev["kind"]},{name},"{detail}"\n')
print(f"Wrote ours-tid13-full-timeline.csv")

# Show the last 40 events.
print("\n=== Last 40 events on ours tid=13 ===")
for ev in ours_evts[-40:]:
    name = ev["payload"].get("name", "") if isinstance(ev.get("payload"), dict) else ""
    detail = json.dumps(ev.get("payload", {}))[:200]
    print(f" {ev['host_ns']/1e9:.5f}s idx={ev['tid_event_idx']} {ev['kind']:18s} {name:30s} {detail}")

@@ -0,0 +1,86 @@
#!/usr/bin/env python3
"""Extract ours tid=1 timeline, focused on the final wedge region.
Ours wedges by ~1.7s, so we want the FINAL kernel-call sequence ours emits
on tid=1 before the wait that wedges. Compare against canary tid=6 at the
matched-prefix point.
"""
import json
import os
from collections import Counter
INPUT = "/home/fabi/RE - Project Sylpheed/xenia-rs/audit-runs/phase-w-wedge-reattack/ours-postfix.jsonl"
OUTDIR = os.path.dirname(os.path.abspath(__file__))
TARGET_TID = 1
kernel_calls = []
kernel_returns = []
handle_creates = []
thread_events = []
import_calls = []
wait_events = []
other = []
last_idx = -1
last_host_ns = 0
with open(INPUT, "r") as f:
    for line in f:
        try:
            ev = json.loads(line)
        except json.JSONDecodeError:
            continue
        if ev.get("tid") != TARGET_TID:
            continue
        kind = ev.get("kind", "")
        last_idx = ev.get("tid_event_idx", last_idx)
        last_host_ns = ev.get("host_ns", last_host_ns)
        if kind == "kernel.call":
            kernel_calls.append(ev)
        elif kind == "kernel.return":
            kernel_returns.append(ev)
        elif kind == "handle.create":
            handle_creates.append(ev)
        elif kind in ("thread.create", "thread.exit"):
            thread_events.append(ev)
        elif kind == "import.call":
            import_calls.append(ev)
        elif kind in ("wait.begin", "wait.end", "wait.wake"):
            wait_events.append(ev)
        else:
            other.append(ev)

print(f"Ours tid={TARGET_TID} total kernel.call: {len(kernel_calls)}")
print(f"Ours tid={TARGET_TID} last tid_event_idx: {last_idx}")
print(f"Ours tid={TARGET_TID} last host_ns: {last_host_ns} ({last_host_ns/1e9:.3f} s)")
call_counts = Counter()
for ev in kernel_calls:
    call_counts[ev["payload"].get("name", "?")] += 1

# Show LAST 150 events on tid=1 to identify the wedge approach.
last_evts = []
for ev in kernel_calls + kernel_returns + wait_events + handle_creates + thread_events:
    last_evts.append((ev["host_ns"], ev["tid_event_idx"], ev["kind"], ev["payload"]))
last_evts.sort()
last_evts = last_evts[-150:]
with open(os.path.join(OUTDIR, "ours-tid1-final-150.csv"), "w") as f:
    f.write("host_ns,tid_event_idx,kind,name,detail\n")
    for host_ns, idx, kind, payload in last_evts:
        name = payload.get("name", "")
        detail = json.dumps(payload)[:300].replace('"', '""')
        f.write(f'{host_ns},{idx},{kind},{name},"{detail}"\n')
with open(os.path.join(OUTDIR, "ours-tid1-summary"), "w") as f:
    f.write(f"Ours tid={TARGET_TID} summary\n")
    f.write(f"\nTotal kernel.call: {len(kernel_calls)}\n")
    f.write(f"Last tid_event_idx: {last_idx}\n")
    f.write(f"Last host_ns: {last_host_ns} ({last_host_ns/1e9:.3f} s)\n")
    f.write(f"\n=== Top kernel.call by frequency ===\n")
    for name, c in call_counts.most_common(80):
        f.write(f" {c:6d} {name}\n")
    f.write(f"\n=== Unique kernel.call names ===\n")
    f.write(f" {len(call_counts)}\n")
print(f"Wrote ours-tid1-final-150.csv")
print(f"Wrote ours-tid1-summary")

@@ -0,0 +1,135 @@
#!/usr/bin/env python3
"""Find what signals canary tid=17's NtWaitForSingleObjectEx event.
The wait at idx=432-435 is for an event created by tid=17 itself.
Find handle_semantic_ids for events tid=17 creates, then find
wait.wake / NtSetEvent on those handles from OTHER tids.
"""
import json
import os
from collections import defaultdict, Counter
INPUT = "/home/fabi/RE - Project Sylpheed/xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-1.jsonl"
OUTDIR = os.path.dirname(os.path.abspath(__file__))
# Look at window [1.9..2.1s] for all events.
T_LO = 1_900_000_000
T_HI = 2_100_000_000
# Handles created by tid=17:
tid17_creates = []
# wait.begin by tid=17:
tid17_waits = []
# NtSetEvent / NtReleaseSemaphore by ALL tids in window with handle ids:
signal_events = []
# wait.wake/end events with handle ids:
wake_events = []
# Also track all handle.create events in window.
all_handles = {}
with open(INPUT, "r") as f:
    for line in f:
        if '"host_ns":' not in line:
            continue
        try:
            i = line.index('"host_ns":') + len('"host_ns":')
            j = i
            while j < len(line) and (line[j].isdigit() or line[j] == '-'):
                j += 1
            host_ns = int(line[i:j])
        except (ValueError, IndexError):
            continue
        if host_ns < T_LO:
            continue
        if host_ns >= T_HI:
            break
        try:
            ev = json.loads(line)
        except json.JSONDecodeError:
            continue
        kind = ev.get("kind", "")
        tid = ev.get("tid")
        payload = ev.get("payload", {})
        if kind == "handle.create":
            hsid = payload.get("handle_semantic_id")
            rh = payload.get("raw_handle_id")
            ot = payload.get("object_type")
            all_handles[hsid] = {"raw_handle_id": rh, "object_type": ot, "creator_tid": tid, "host_ns": host_ns}
            if tid == 17:
                tid17_creates.append(ev)
        elif kind == "wait.begin" and tid == 17:
            tid17_waits.append(ev)
        elif kind in ("wait.wake", "wait.end"):
            wake_events.append(ev)
        elif kind == "import.call":
            n = payload.get("name", "")
            if n in ("NtSetEvent", "NtReleaseSemaphore", "NtSetEventBoostPriority", "KeSetEvent"):
                signal_events.append((host_ns, ev["tid_event_idx"], tid, n, payload))

print(f"tid=17 handle.create events: {len(tid17_creates)}")
print(f"tid=17 wait.begin events: {len(tid17_waits)}")
print(f"signal events (NtSetEvent/NtReleaseSemaphore/...): {len(signal_events)}")
print(f"wait.wake/end events: {len(wake_events)}")

# Show the wait.begin events on tid=17.
print("\n=== tid=17 wait.begin events ===")
for ev in tid17_waits[:30]:
    pl = ev["payload"]
    hids = pl.get("handles_semantic_ids", [])
    timeout = pl.get("timeout_ns")
    # Resolve handle:
    info = [all_handles.get(h, {}) for h in hids]
    print(f" t={ev['host_ns']/1e9:.5f}s idx={ev['tid_event_idx']} handles={hids} timeout={timeout}")
    for h, inf in zip(hids, info):
        print(f" {h} -> {inf}")

# Wake events for tid=17 in window.
print("\n=== wake events targeting tid=17 ===")
for ev in wake_events:
    pl = ev["payload"]
    if ev.get("tid") == 17:
        print(f" t={ev['host_ns']/1e9:.5f}s idx={ev['tid_event_idx']} kind={ev['kind']} payload={json.dumps(pl)[:250]}")

# Now find signalers of tid=17's wait handles.
# Build set of hsid that tid=17 waited on.
wait_hsids = set()
for ev in tid17_waits:
    for h in ev["payload"].get("handles_semantic_ids", []):
        wait_hsids.add(h)
print(f"\n=== Unique handle semantic IDs waited on by tid=17: {len(wait_hsids)} ===")
for h in list(wait_hsids)[:20]:
    info = all_handles.get(h, {})
    print(f" {h} -> {info}")

# Check: for each, who created it? Object type? In this window's signal_events, who signals?
# The NtSetEvent / NtReleaseSemaphore events don't carry handle info in payload by default
# (payload is empty args:{}). Print first few to confirm.
print("\n=== Sample signal events (first 10) ===")
for s in signal_events[:10]:
    print(f" t={s[0]/1e9:.5f}s idx={s[1]} tid={s[2]} name={s[3]} payload_keys={list(s[4].keys())}")

# Count signal events per tid in the window.
sig_counts = Counter()
for s in signal_events:
    sig_counts[(s[2], s[3])] += 1
print(f"\n=== Signal event counts by (tid, name) in window [{T_LO/1e9}..{T_HI/1e9}s] ===")
for (tid, name), c in sorted(sig_counts.items()):
    print(f" tid={tid:3d} {name:30s} {c}")

# Save tid=17 timeline showing wait.begin/wait.wake.
with open(os.path.join(OUTDIR, "canary-tid17-waits.csv"), "w") as f:
    f.write("host_ns,tid_event_idx,kind,handles,timeout_ns,result\n")
    for ev in tid17_waits:
        pl = ev["payload"]
        hids = ",".join(pl.get("handles_semantic_ids", []))
        timeout = pl.get("timeout_ns", "")
        f.write(f'{ev["host_ns"]},{ev["tid_event_idx"]},{ev["kind"]},"{hids}",{timeout},\n')
    for ev in wake_events:
        if ev.get("tid") != 17:
            continue
        pl = ev["payload"]
        hids = ",".join(pl.get("handles_semantic_ids", []))
        # Double embedded quotes so the JSON payload stays a single CSV field.
        result = json.dumps(pl)[:200].replace('"', '""')
        f.write(f'{ev["host_ns"]},{ev["tid_event_idx"]},{ev["kind"]},"{hids}","","{result}"\n')
print(f"\nWrote canary-tid17-waits.csv")

@@ -0,0 +1,87 @@
Ours tid=1 summary
Total kernel.call: 36136
Last tid_event_idx: 108506
Last host_ns: 1727614433 (1.728 s)
=== Top kernel.call by frequency ===
17834 RtlEnterCriticalSection
17834 RtlLeaveCriticalSection
65 RtlInitializeCriticalSectionAndSpinCount
33 KeRaiseIrqlToDpcLevel
29 RtlInitializeCriticalSection
27 KeAcquireSpinLockAtRaisedIrql
27 KeReleaseSpinLockFromRaisedIrql
26 KfLowerIrql
22 NtClose
20 NtWriteFile
19 NtCreateEvent
17 RtlInitAnsiString
14 NtSetEvent
13 NtWaitForSingleObjectEx
11 MmAllocatePhysicalMemoryEx
9 RtlNtStatusToDosError
9 ExCreateThread
9 NtDuplicateObject
8 NtCreateFile
8 NtReleaseSemaphore
7 NtQueryFullAttributesFile
5 KeQueryPerformanceFrequency
3 KeGetCurrentProcessType
3 NtOpenFile
3 KeEnterCriticalRegion
3 KeLeaveCriticalRegion
3 NtCreateSemaphore
3 KeSetBasePriorityThread
3 ObReferenceObjectByHandle
3 ObDereferenceObject
2 RtlImageXexHeaderField
2 NtAllocateVirtualMemory
2 XexCheckExecutablePrivilege
2 KeTlsAlloc
2 KeTlsSetValue
2 KeQuerySystemTime
2 NtReadFile
2 NtDeviceIoControlFile
2 NtQueryVolumeInformationFile
2 MmFreePhysicalMemory
2 VdInitializeEngines
2 ExRegisterTitleTerminateNotification
2 ExGetXConfigSetting
2 VdSetSystemCommandBufferGpuIdentifierAddress
2 VdQueryVideoMode
2 VdQueryVideoFlags
2 VdRetrainEDRAM
2 NtResumeThread
2 KeResumeThread
1 XGetAVPack
1 XeCryptSha
1 XeKeysConsolePrivateKeySign
1 XamTaskSchedule
1 XamTaskCloseHandle
1 KeWaitForSingleObject
1 KeResetEvent
1 XamContentCreateEnumerator
1 XamNotifyCreateListener
1 VdShutdownEngines
1 VdSetGraphicsInterruptCallback
1 MmGetPhysicalAddress
1 VdInitializeRingBuffer
1 VdEnableRingBufferRPtrWriteBack
1 KiApcNormalRoutineNop
1 VdCallGraphicsNotificationRoutines
1 VdRetrainEDRAMWorker
1 VdIsHSIOTrainingSucceeded
1 VdGetSystemCommandBuffer
1 VdSwap
1 VdGetCurrentDisplayGamma
1 VdSetDisplayMode
1 VdGetCurrentDisplayInformation
1 RtlFillMemoryUlong
1 VdInitializeScalerCommandBuffer
1 VdPersistDisplay
1 KeInitializeSemaphore
1 XAudioRegisterRenderDriverClient
=== Unique kernel.call names ===
77

@@ -0,0 +1,44 @@
#!/usr/bin/env python3
"""Count signal events on each tid in ours-postfix.jsonl during the full
run window. Compare against canary [1.9..2.1s]."""
import json
from collections import Counter
OURS = "/home/fabi/RE - Project Sylpheed/xenia-rs/audit-runs/phase-w-wedge-reattack/ours-postfix.jsonl"
counts = Counter()
last_ns_per_tid = {}
first_ns_per_tid = {}
event_count_per_tid = Counter()
with open(OURS, "r") as f:
    for line in f:
        try:
            ev = json.loads(line)
        except json.JSONDecodeError:
            continue
        tid = ev.get("tid")
        if tid is None:
            continue
        host_ns = ev.get("host_ns", 0)
        event_count_per_tid[tid] += 1
        if tid not in first_ns_per_tid:
            first_ns_per_tid[tid] = host_ns
        last_ns_per_tid[tid] = host_ns
        kind = ev.get("kind", "")
        if kind == "kernel.call":
            n = ev["payload"].get("name", "")
            if n in ("NtSetEvent", "NtReleaseSemaphore", "NtSetEventBoostPriority", "KeSetEvent"):
                counts[(tid, n)] += 1

print("=== Signal counts in ours-postfix.jsonl (full window ~1.73s wallclock) ===")
total = 0
for (tid, n), c in sorted(counts.items()):
    print(f" tid={tid:3d} {n:30s} {c}")
    total += c
print(f" TOTAL signals: {total}")
print("\n=== Per-tid event counts + lifetime ===")
for tid in sorted(event_count_per_tid):
    first = first_ns_per_tid.get(tid, 0) / 1e9
    last = last_ns_per_tid.get(tid, 0) / 1e9
    print(f" tid={tid:3d} {event_count_per_tid[tid]:7d} events first={first:.4f}s last={last:.4f}s")

@@ -0,0 +1,384 @@
# Step 2 — Natural install-trigger sequence and ours divergence point
**Date:** 2026-05-21
**Mode:** PLAN-only (investigation; no engine LOC changes).
**Sources:** `canary-jitter-1.jsonl` (4.4 GB, 18.7M events) and
`phase-w-wedge-reattack/ours-postfix.jsonl` (28 MB, 121,569 events).
## TL;DR
The Step 2 plan's framing —
"identify the canary tid=6 kernel-call sequence in the install window
[9.4s, 9.6s]" — **cannot be applied because ours never reaches
host_ns ≥ 1.73s.** Ours's tid=1 wedges 8 seconds before the install
epoch. The reframed question — "what canary-tid=6 sequence between
the matched-prefix wedge point and the install epoch fails in ours?"
— resolves to a **single root cause one level upstream of the wedge**:
> Canary's spawned cache-loader worker (canary tid=17, entry
> `0x821748F0`) executes ~4140 events and calls `ExTerminateThread`
> at host_ns = 2.092s, taking 154ms. Ours's analog (ours tid=13)
> executes 435 events, **never reaches its second wait iteration**,
> and wedges at its FIRST `NtWaitForSingleObjectEx` (no signaler ever
> fires). **Ours's tid=13 takes a different guest-code branch from
> the first wait onward — it calls `NtReleaseSemaphore` instead of
> `NtSetEvent` between `NtCreateEvent` and `NtWaitForSingleObjectEx`,
> so the event it then waits on is unsignaled.**
This is a **branch divergence inside guest code `sub_821CB030`'s
body**, NOT a missing kernel call in ours and NOT a wrong return
value from ours's kernel.
## Step 0 outcome — install epoch reachable on canary, not on ours
| Source | Observation | Install-epoch coverage |
|---|---|---|
| canary tid=6 events in [9.0s..11.0s] | 16,175 kernel.calls captured | install epoch + worker-spawn covered ✓ |
| ours tid=1 events | 1.728s (last event before wedge) | install epoch is at ~9.5s — **8s in the future** |
Ours physically cannot reach 9.4s: tid=1 blocks on tid=13's thread handle
at host_ns=1.728s, and all other tids subsequently block too (see
`phase-w-wedge-reattack/halt-on-deadlock-dump.txt`). Therefore the
canary "kernel-call sequence ours doesn't make in the install window"
question is degenerate: ours makes **none** of canary's 16,175 calls
in that window because ours stops emitting at host_ns=1.73s.
The substantive Step 2 question reframes to: **"What does canary
do between matched-prefix idx ~108,476 (= ours's last events) and
the install epoch?"** Answer: it RUNS the worker tid=17 to
completion, which causes the join-wait on tid=1/6 to return, after
which tid=6 iterates `sub_822F1AA8`'s main loop further and
eventually triggers `sub_824FD240` and `sub_825070F0`. Everything
hinges on tid=17 completing.
## Step 1 outcome — canary tid=6 spawns sub_821748F0 at host_ns=1.935s
Exact anchor:
```
canary tid=6 host_ns=1935433700 idx=108476
ExCreateThread(entry=0x821748f0, ctx=0xbc365620, stack=524288, susp=T)
→ handle.create raw=0xf80000a0 hsid=3bd922fbb385c2c9
canary tid=6 host_ns=1937223600 idx=108498
NtResumeThread
NtWaitForSingleObjectEx handles=[3bd922fbb385c2c9] timeout=-1
→ wait.begin
canary tid=6 host_ns=2092000000 idx=108499 (155 ms later)
kernel.return NtWaitForSingleObjectEx rv=0 status=0x00000000
```
The wait IS infinite (timeout_ns=-1) — yet it returns in 155ms because
the worker terminates (canary tid=17's last call is `ExTerminateThread`
at host_ns=2.0918s).
Ours's mirror:
```
ours tid=1 host_ns=1727479660 idx=108481
ExCreateThread(entry=0x821748f0, ctx=0x4024d640, stack=0, susp=T)
→ handle.create raw=0x000012c8 hsid=8a25e09a8a739c1b
ours tid=1 host_ns=1727611893 idx=108505
wait.begin handles=[8a25e09a8a739c1b] timeout=-1
ours tid=1 host_ns=1727614433 idx=108506
kernel.return NtWaitForSingleObjectEx rv=0 ← but this is just the
return record from the entry probe, NOT actual unblock
```
(Note: `ours-postfix.jsonl` schema emits the entry-probe `kernel.return`
even on an infinite wait, because the probe wraps the wait wrapper.
Per `halt-on-deadlock-dump.txt`, tid=1 is in fact still `Blocked` on
handle `0x000012c8` = Thread(id=13) at deadlock-detection time.)
The spawn parameters look identical in shape (same entry PC; ctx and
stack are run-specific). **Spawn semantics match.**
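As a quick check of the timings quoted above, the intervals can be recomputed from the `host_ns` anchors. This is a minimal sketch: `elapsed_ms` is a hypothetical helper (not one of the session's scripts), and the canary interval is measured from the idx=108498 record on the assumption that it marks the start of the wait.

```python
def elapsed_ms(begin_ns: int, end_ns: int) -> float:
    """Elapsed time between two host_ns stamps, in milliseconds."""
    return (end_ns - begin_ns) / 1e6

# Canary tid=6: idx=108498 record -> NtWaitForSingleObjectEx return (idx=108499).
canary_join_ms = elapsed_ms(1_937_223_600, 2_092_000_000)

# Ours tid=1: ExCreateThread (idx=108481) -> wait.begin (idx=108505).
ours_spawn_to_wait_ms = elapsed_ms(1_727_479_660, 1_727_611_893)

print(f"canary join-wait:         {canary_join_ms:.1f} ms")
print(f"ours spawn -> wait.begin: {ours_spawn_to_wait_ms:.3f} ms")
```

The canary value comes out just under 155 ms, matching the annotation in the anchor listing; ours enters its wait well under a millisecond after the spawn and, per the deadlock dump, never leaves it.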
## Step 2 outcome — canary tid=17 vs ours tid=13 kernel-call differential
Lifetimes:
| | canary tid=17 | ours tid=13 |
|---|---|---|
| first event | host_ns=1.9378s | host_ns=1.7276s |
| last event | host_ns=2.0918s | host_ns=1.7307s |
| duration | **154 ms** | **3 ms** |
| total events | 4140 | 435 |
| kernel.call count | 1351 | 142 |
| terminates? | yes via `ExTerminateThread` | no — wedged on wait |
Per-call differential (top entries by |canary - ours|):
| kernel.call | canary tid=17 | ours tid=13 | Δ |
|---|---:|---:|---:|
| RtlEnterCriticalSection | 607 | 58 | +549 |
| RtlLeaveCriticalSection | 607 | 58 | +549 |
| NtClose | 19 | 2 | +17 |
| NtCreateEvent | 18 | 3 | +15 |
| NtDuplicateObject | 16 | 2 | +14 |
| RtlInitAnsiString | 11 | 1 | +10 |
| NtWaitForSingleObjectEx | 11 | 2 | +9 |
| RtlInitializeCriticalSectionAndSpinCount | 15 | 6 | +9 |
| NtQueryFullAttributesFile | 9 | 1 | +8 |
| NtReleaseSemaphore | 9 | 1 | +8 |
| RtlNtStatusToDosError | 9 | 1 | +8 |
| NtSetEvent | 8 | 1 | +7 |
| KeTlsSetValue | 2 | 0 | +2 |
| NtCreateFile | 2 | 0 | +2 |
| ExCreateThread | 1 | 0 | +1 |
| ExTerminateThread | 1 | 0 | +1 |
| KeTlsGetValue | 1 | 0 | +1 |
| KeQueryPerformanceFrequency | 0 | 1 | -1 |
**Set-difference of unique kernel-call names**: apart from
`KeQueryPerformanceFrequency` (which canary calls outside this window),
every API ours calls here is also called by canary, so ours's set is a
strict subset of canary's. **No kernel API that canary uses is missing
from ours's implementation.** All of these APIs already work in ours
(they are called successfully on tid=5, tid=1, or tid=10 elsewhere in
the same run).
The differential isn't "ours fails to implement a kernel call";
it's "ours executes roughly one-tenth as many iterations of the same loop body."
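The differential table and the set-difference can be derived mechanically from two per-thread call-name counters. The following is a sketch under assumptions, not the session's `extract_canary_tid17_full.py`: `call_differential` is a hypothetical name, and the sample counts are only a handful of rows copied from the table above.

```python
from collections import Counter

def call_differential(canary_calls: Counter, ours_calls: Counter):
    """Return (rows, only_canary, only_ours).

    rows holds (name, canary_count, ours_count, delta) tuples sorted by
    |delta| descending, then by name, so the largest gaps come first.
    """
    names = set(canary_calls) | set(ours_calls)
    rows = [
        (n, canary_calls[n], ours_calls[n], canary_calls[n] - ours_calls[n])
        for n in names
    ]
    rows.sort(key=lambda r: (-abs(r[3]), r[0]))
    # Counter returns 0 for absent keys, so these are the one-sided names.
    only_canary = sorted(n for n in names if ours_calls[n] == 0)
    only_ours = sorted(n for n in names if canary_calls[n] == 0)
    return rows, only_canary, only_ours

# Partial sample taken from the table above (not the full call sets).
canary = Counter({"RtlEnterCriticalSection": 607, "NtSetEvent": 8,
                  "ExTerminateThread": 1})
ours = Counter({"RtlEnterCriticalSection": 58, "NtSetEvent": 1,
                "KeQueryPerformanceFrequency": 1})

rows, only_canary, only_ours = call_differential(canary, ours)
for name, c, o, delta in rows:
    print(f"{name:32s} {c:5d} {o:5d} {delta:+d}")
print("only in canary:", only_canary)
print("only in ours:  ", only_ours)
```

A name appearing in `only_ours` would be the thing to check against the "no missing API" claim; in this sample it is only `KeQueryPerformanceFrequency`.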
## The control-flow divergence (the root cause)
Canary tid=17, idx 339-356 — the FIRST wait pattern:
```
idx=339 NtCreateEvent
idx=340 handle.create raw=0xf80000b8 hsid=1070523eb111c6ea object_type=1 (Event)
idx=343 NtDuplicateObject → handle.create at idx=344
idx=347 NtSetEvent ← THE EVENT IS SIGNALED BEFORE THE WAIT
idx=350 NtClose → handle.destroy at idx=351
idx=354 NtWaitForSingleObjectEx
idx=355 wait.begin handles=[1070523eb111c6ea] timeout=-1
idx=356 kernel.return rv=0 ← wait completes in 23µs because event was signaled
```
Ours tid=13, idx 175-434 — the analog wait pattern:
```
idx=175 NtCreateEvent
idx=177 handle.create raw=0x000012d0 hsid=d5e23609d3948568 object_type=1 (Event)
… 240 RtlEnterCriticalSection / RtlLeaveCriticalSection ops in between …
idx=419 NtDuplicateObject → handle.create at idx=420
idx=429 NtReleaseSemaphore ← DIFFERENT API — semaphore, not event-set
idx=432 NtWaitForSingleObjectEx
idx=433 wait.begin handles=[d5e23609d3948568] timeout=-1
idx=434 kernel.return rv=0 (entry probe only; actual wait blocks forever)
⏸ WEDGE — event d5e23609d3948568 is never signaled.
```
The key observation: **between `NtCreateEvent` and the corresponding
`NtWaitForSingleObjectEx`**, canary calls `NtSetEvent` to signal
the very event it is about to wait on (idiomatic self-signaled
wait-pump barrier). Ours **skips the NtSetEvent**, calls
`NtReleaseSemaphore` instead, and then blocks on the unsignaled event.
This is a **guest-code branch divergence** inside the helper
hierarchy `sub_821CB030 → sub_821CBA08 → sub_821CC3F8 → sub_821C4EB0`
(per `sub_82173990.md` chain). The branch predicate is some state
read between `NtCreateEvent` and the call site of `NtSetEvent` /
`NtReleaseSemaphore`.
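The pattern above (which signal call, if any, sits between `NtCreateEvent` and the following `NtWaitForSingleObjectEx`) can be extracted per thread from the ordered call names. This is a rough sketch: `signals_before_waits` is a hypothetical helper, and it delimits windows by call name only, since the trace's `args_resolved` is mostly empty and handles cannot be bound from `kernel.call` records alone.

```python
SIGNAL_CALLS = {"NtSetEvent", "NtReleaseSemaphore"}

def signals_before_waits(call_names):
    """For one thread's ordered call names, return one list per
    NtCreateEvent ... NtWaitForSingleObjectEx window containing the
    signal calls seen inside that window."""
    windows = []
    current = None  # None until an NtCreateEvent opens a window
    for name in call_names:
        if name == "NtCreateEvent":
            current = []  # a nested create restarts the window
        elif current is not None and name in SIGNAL_CALLS:
            current.append(name)
        elif current is not None and name == "NtWaitForSingleObjectEx":
            windows.append(current)
            current = None
    return windows

# Call order of the first wait pattern on each engine, as listed above.
canary_first = ["NtCreateEvent", "NtDuplicateObject", "NtSetEvent",
                "NtClose", "NtWaitForSingleObjectEx"]
ours_first = ["NtCreateEvent", "RtlEnterCriticalSection",
              "RtlLeaveCriticalSection", "NtDuplicateObject",
              "NtReleaseSemaphore", "NtWaitForSingleObjectEx"]

print(signals_before_waits(canary_first))
print(signals_before_waits(ours_first))
```

Comparing the first window of each engine is enough to surface the `NtSetEvent` vs `NtReleaseSemaphore` split; confirming that the signaled object is the waited-on event still needs the `handle.create` / `wait.begin` hsid records.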
## Step 3/4 — Why does the predicate differ between engines?
The deep root: this exact divergence pattern is what AUDIT-069 S5
already found through a different lens:
> **AUDIT-069 S5**: "Other producers: canary 25 vs ours 1." Canary
> has 24 additional thread sources releasing the work semaphore that
> ours doesn't have.
Combining S5 with this Step 2 finding:
1. Ours's tid=13 emits ONLY 1 NtReleaseSemaphore before wedging
(consistent with the 1 "other producer" S5 measured).
2. Canary's tid=17 emits 9 NtReleaseSemaphore + 8 NtSetEvent before
reaching ExTerminateThread. Each release/set comes from a
different cache-load iteration.
3. The iteration count is gated by the loop body completing each
iteration. Each iteration begins by waiting on an event that
must be PRE-SIGNALED to advance.
In canary, the event gets pre-signaled (NtSetEvent before NtWait).
In ours, the same code path takes the "release semaphore + wait
on event signaled by external" branch instead of the "set event +
wait on event" branch. **The state read by the predicate at the
branch differs.**
What state? Without disassembling `sub_821CB030`/`sub_821CBA08`
and binding the branch PC to the guest memory location the predicate
reads, we cannot say definitively. Candidate state sources:
- A bit/flag in the ctx (`0x4024d640` in ours vs `0xbc365620` in
canary — different addresses but same shape). Could be uninitialized
in ours due to ANON_Class vtable install at `sub_824FD240+0x24`
not having fired (AUDIT-068 S4). But that vtable install fires
much later (host_ns=9.4s in canary), so this is unlikely.
- The result of a prior `NtQueryFullAttributesFile` call. Canary
tid=17 calls this 9× before reaching ExTerminateThread; ours
tid=13 calls it 1× before wedging. The file being queried is in
the `cache:\` filesystem (per `sub_82173990.md` chain).
- A guest-memory shared CS-protected pointer set by another tid
(canary tids 4/10/14 do 38+90+38 signal events in the
[1.9..2.1s] window; in ours, tids 4/5/14 are STILL working in
[0..1.73s] but their output is shifted to ours's tid=5, which
per AUDIT-069 S5 matches canary's tid=10 producer count almost
exactly — 90 NtReleaseSemaphore each).
## Cause attribution
Per the Step 5 framework:
1. **Missing ours implementation?** NO. Every kernel API canary
tid=17 calls is also implemented in ours and works (verified by
other tids using them successfully).
2. **Incorrect return value in ours?** UNLIKELY but unverified. Phase
A schema doesn't capture args/return values for most calls;
`args_resolved={}` is empty for nearly every call in this window.
3. **Missing side effect in ours?** POSSIBLY. If `NtQueryFullAttributesFile`
or `NtCreateFile` on `cache:\<hash>\...` has a slightly different
behavior in ours (e.g., succeeds when canary fails, or vice-versa),
the resulting branch could diverge.
4. **Upstream state divergence (most likely)**: a guest-memory value
read by a predicate inside `sub_821CB030`/`sub_821CBA08` differs
between engines. The CS-heavy run earlier in this tid (the ~240
enter/leave ops between idx 177 and idx 423) processes some data structure,
the result of which selects the branch.
**Best single guess (MEDIUM confidence)**: a `NtQueryFullAttributesFile`
on a `cache:\<hash>\<filename>` path returns a different value in
ours than in canary (file present vs not, size mismatch, or attrib
mismatch). The branch chooses "we need to recompute the cache item"
(NtReleaseSemaphore path) instead of "cache item is ready, signal
event and proceed" (NtSetEvent path).
## Disjoint-gap count
**ONE gap** — the predicate divergence inside `sub_821CB030`'s
body. However, the predicate divergence likely has a **complex
upstream cause** that involves either filesystem state or
guest-memory state initialized by another tid that ALSO has the
same kind of subtle drift. So:
- **disjoint divergence sites in this trajectory**: 1 (control-flow
branch in sub_821CB030 chain).
- **disjoint hypothesized causes**: 2-3 (file-attribute return value,
shared-memory state from the tid=10/5 dispatch worker, or an upstream
vtable-install bypass).
This is **NOT** the "50+ disjoint missing kernel patterns" failure
mode predicted in tripstone 7. It's a single branch divergence with
multiple candidate first-causes. Methodology pivot to Option C
(critical-path sweep) is **NOT** indicated; targeted iterate per
candidate first-cause IS indicated.
## Recommended next concrete action
**Iterate plan, ordered by minimum LOC + maximum signal**:
### Iterate Step 2.A — branch-probe inside sub_821CB030 body (~50-80 LOC ours + ~50 LOC canary)
Use existing `audit_61_branch_probe_pcs` to pin the divergent
branch inside `sub_821CB030` / `sub_821CBA08` / `sub_821CC3F8`.
Specifically probe every `bne`/`beq` PC inside these guest fns
that has reachable `bl NtSetEvent` on one branch and `bl
NtReleaseSemaphore` on the other. Use sylpheed.db cross-references
to enumerate `bl 0x824AA2F0` (NtSetEvent wrapper) and `bl 0x824AB158`
(NtReleaseSemaphore wrapper) call sites in these fns.
Capture both engines, diff branch-counts. The first divergent
branch is the answer.
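Once both engines emit branch-probe records, the diff itself is small. The sketch below assumes a hypothetical probe output of ordered `(pc, taken)` tuples for a single thread; the real `audit_61_branch_probe_pcs` format is not shown in this report, and the PCs used here are placeholders, not addresses from the guest binary.

```python
from collections import Counter
from itertools import zip_longest

def first_divergent_branch(canary_trace, ours_trace):
    """Return (index, canary_record, ours_record) at the first position
    where the two ordered traces differ, or None if they are identical."""
    for i, (c, o) in enumerate(zip_longest(canary_trace, ours_trace)):
        if c != o:
            return i, c, o
    return None

def branch_count_diff(canary_trace, ours_trace):
    """Map each (pc, taken) whose hit count differs to (canary, ours)."""
    c, o = Counter(canary_trace), Counter(ours_trace)
    return {k: (c[k], o[k]) for k in sorted(set(c) | set(o)) if c[k] != o[k]}

# Placeholder PCs for illustration only.
canary_trace = [(0x10, True), (0x20, False), (0x30, True)]
ours_trace = [(0x10, True), (0x20, True)]

print(first_divergent_branch(canary_trace, ours_trace))
print(branch_count_diff(canary_trace, ours_trace))
```

The ordered comparison is only meaningful while both threads follow the same path; after the first divergent record, the count diff is the more robust view.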
### Iterate Step 2.B — args/return-value capture for the 9 NtQueryFullAttributesFile calls on canary tid=17 (~30 LOC canary)
Extend `audit_61` or write a dedicated probe to log `r3` on entry (the
first argument, which leads to the filename) and `r3` on return (the
NTSTATUS; the PowerPC calling convention returns values in `r3`, not
`r0`) for every `NtQueryFullAttributesFile` call inside this 154-ms
window. Compare against ours's 1 call. If file-attribute return values
differ on a shared file, that's the trigger.
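With the probe in place, the comparison reduces to matching statuses by path. A minimal sketch, assuming a hypothetical probe output of `(path, ntstatus)` tuples per engine; the paths below are invented placeholders, and only the two NTSTATUS constants are real values.

```python
STATUS_SUCCESS = 0x00000000
STATUS_OBJECT_NAME_NOT_FOUND = 0xC0000034

def compare_query_results(canary_records, ours_records):
    """Return {path: (canary_status, ours_status)} for every path that
    both engines queried but that returned a different NTSTATUS.
    If a path repeats, its last recorded canary status is used."""
    canary_by_path = dict(canary_records)
    mismatches = {}
    for path, status in ours_records:
        if path in canary_by_path and canary_by_path[path] != status:
            mismatches[path] = (canary_by_path[path], status)
    return mismatches

# Placeholder paths, not taken from the trace.
canary_records = [(r"cache:\example\a.bin", STATUS_SUCCESS),
                  (r"cache:\example\b.bin", STATUS_OBJECT_NAME_NOT_FOUND)]
ours_records = [(r"cache:\example\a.bin", STATUS_OBJECT_NAME_NOT_FOUND)]

for path, (c, o) in compare_query_results(canary_records, ours_records).items():
    print(f"{path}: canary=0x{c:08X} ours=0x{o:08X}")
```

An empty result for ours's single call would push the investigation toward the ctx or shared-memory candidates instead.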
### Iterate Step 2.C — guest-memory read-watch on the ctx struct (~20 LOC, reuses AUDIT-068 S3 read-probe)
Use `audit_68_host_mem_read_probe` to sample the worker ctx
(`0xbc365620` in canary / `0x4024d640` in ours) at ~1ms cadence in
the window [1.7..2.1s]. Identify whether a flag/byte in the ctx
differs at the predicate-read time. This pinpoints the actual
read location if Step 2.A's branch-probe doesn't immediately reveal
the predicate source.
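Given one ctx snapshot per engine taken at comparable points, the differing bytes can be listed directly. This is a sketch only: `differing_offsets` is a hypothetical helper, and the snapshot bytes are made up, since the output format of `audit_68_host_mem_read_probe` is not described in this report.

```python
def differing_offsets(canary_ctx: bytes, ours_ctx: bytes):
    """Return (offset, canary_byte, ours_byte) for every offset, within
    the common length of both snapshots, where the bytes differ."""
    n = min(len(canary_ctx), len(ours_ctx))
    return [(off, canary_ctx[off], ours_ctx[off])
            for off in range(n) if canary_ctx[off] != ours_ctx[off]]

# Invented snapshots for illustration.
canary_ctx = bytes([0x00, 0x01, 0x02, 0x03])
ours_ctx = bytes([0x00, 0x01, 0x09, 0x03])

for off, c, o in differing_offsets(canary_ctx, ours_ctx):
    print(f"+0x{off:02x}: canary=0x{c:02x} ours=0x{o:02x}")
```

Pointer-valued fields will differ between engines regardless, because the two runs place the ctx and heap at different guest addresses, so only flag-like or counter-like offsets are candidates for the predicate.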
## Tripstones honored
- **#28**: verified canary's actual behavior by reading the jsonl
directly; the AUDIT-069 S5 framing is corroborated, not assumed.
- **#32**: contention regions may jitter; the CS enter/leave counts
differ sharply between canary tid=17 and ours tid=13 (607 vs 58 of
each call), and part of that differential may be scheduling-determinism
noise. Mitigation: cross-validate with a 2nd cold canary run if
Step 2.A doesn't immediately converge.
- **#39**: matched-prefix did NOT drive this; first-draw progression
is the goal.
- **#5 of plan tripstones**: AUDIT-069 S5 "25 producers" finding IS
downstream of Step 2's identified branch divergence. The 25
producers correspond to canary tid=17's loop iterations that ours
tid=13 doesn't reach.
## Cascade
- A (acquire canary install-epoch event log): ✓ HIGH (16,175 kernel
calls captured cleanly in [9..11s] window).
- B (identify install-trigger sequence in canary): ✓ HIGH
(canary tid=6 spawns sub_821748F0 at host_ns=1.935s, join-wait
returns at 2.092s). The "install trigger" is not a single
kernel call but the **completion of worker tid=17**, which
causes the join wait to release tid=6 into the rest of the
main-loop dispatch.
- C (identify where ours diverges from canary): ✓ HIGH (ours
tid=13 wedges 3ms into its lifetime, vs canary tid=17 running
154ms; first kernel-call sequence divergence at the
NtSetEvent vs NtReleaseSemaphore branch).
- D (attribute the divergence to a specific cause): MEDIUM (3
candidate root causes; need iterate 2.A/2.B/2.C to disambiguate).
- E (produce Δ-gap count + roadmap): ✓ HIGH (1 divergence site;
3 candidate first-causes; ~50-200 LOC iterate plan).
## Honest assessment
- The wedge framing established by AUDIT-049 .. AUDIT-069 holds.
- Step 2 narrows the trigger from "the install epoch at 9.4s" down
to "the worker tid=13's first wait at 1.73s", moving the point of
interest roughly 7.7 s earlier in the run.
- The 25-producer finding from AUDIT-069 S5 IS a consequence of
the Step 2 branch divergence: each missing iteration of canary
tid=17's load loop is a missing "other producer" signal.
- The fix is NOT to mirror canary's kernel calls; ours implements
them correctly. The fix is to find why ours's `sub_821CB030`
predicate evaluates differently.
- Confidence that the fix is a single guest-state correction
(file-attribute mismatch, ctx-field uninitialized, or shared-memory
flag race): MEDIUM.
## Artifacts produced this session
All under `xenia-rs/audit-runs/review-a-step2-natural-trigger/`:
- `extract_canary_install_window.py` — scanner for canary in [9..11s].
- `extract_canary_tid6_pre_install.py` — scanner for tid=6 [1.5..11s].
- `extract_canary_worker_tid.py` — locates spawn worker by hsid.
- `extract_canary_tid17_full.py` — tid=17 timeline + diff vs ours tid=13.
- `extract_ours_tid1_full.py` — ours tid=1 timeline.
- `extract_ours_tid13_final.py` — ours tid=13 timeline.
- `find_signaler.py` — finds canary tid=17 wait signalers.
- `ours_signal_counts.py` — ours per-tid signal counts.
- `canary-tid6-install-window.csv` — 32,383 events.
- `canary-tid6-install-window.summary` — kernel.call frequencies.
- `canary-tid6-from-anchor.csv` — 139,202 events.
- `canary-tid17-worker-timeline.csv` — 4140 events.
- `ours-tid13-full-timeline.csv` — 435 events.
- `ours-tid1-final-150.csv` — last 150 events on ours tid=1.
- `ours-tid1-summary` — kernel.call frequencies.
- `canary-tid17-waits.csv` — 29 wait.begin events with handle binding.
- `differential-canary-tid17-vs-ours-tid13.txt` — full call-name diff.
- `step2-report.md` — this report.
**LOC delta in this session**: 0 to xenia-rs/canary engines; 0 to
sylpheed.db; ~600 LOC analysis scripts under audit-runs/.