handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions
--- a/audit-runs/audit-069-wait-signal-producer/fix-canary-s5.diff
+++ b/audit-runs/audit-069-wait-signal-producer/fix-canary-s5.diff
@@ -0,0 +1,304 @@
+diff --git a/src/xenia/cpu/cpu_flags.cc b/src/xenia/cpu/cpu_flags.cc
+index 3ff067e15..e6f412f91 100644
+--- a/src/xenia/cpu/cpu_flags.cc
+++ b/src/xenia/cpu/cpu_flags.cc
+@@ -57,3 +57,110 @@ DEFINE_bool(break_condition_truncate, true, "truncate value to 32-bits", "CPU");
+ 
+ DEFINE_bool(break_on_debugbreak, true, "int3 on JITed __debugbreak requests.",
+             "CPU");
+
+// AUDIT-DEMO: smoke marker (memory entry: emulator.cc:225,283). Always-on bool.
+DEFINE_bool(audit_demo_setup_trace, true,
+            "Audit smoke marker: log AUDIT-DEMO-SETUP-BEGIN at emulator setup.",
+            "Audit");
+
+// AUDIT-061: comma-separated list of guest PCs to log on each fire.
+// Format: "0xPC1,0xPC2,..." (max 32 PCs). Each fire emits
+// AUDIT-061-BR pc=X lr=X cr0=LGE cr6=LGE r3=X r4=X r5=X r6=X r31=X tid=N.
+// Default empty (off); no perf cost when empty.
+DEFINE_string(audit_61_branch_probe_pcs, "",
+              "AUDIT-061: CSV of guest PCs to trace (cr0/cr6 + regs/tid).",
+              "Audit");
+
+// AUDIT-067: comma-separated list of u32 values to watch. When non-empty,
+// every 4-byte guest store (stw/stwu/stwx/stwux/stmw) emits a runtime
+// equality check; matches log AUDIT-067-VAL pc=X lr=X val=X dst=X r3..r6 r31 tid=N.
+// Max 4 values. Default empty (off); zero overhead when empty.
+DEFINE_string(audit_67_value_watch, "",
+              "AUDIT-067: CSV of u32 values (max 4) — log every guest "
+              "store whose value matches.",
+              "Audit");
+
+// AUDIT-068: host-side memory-write watch. See cpu_flags.h header for format.
+// Mirrors AUDIT-067 but covers host-side writes (xe::store_and_swap<T>,
+// Memory::Zero/Fill/Copy). Empty default = zero cost.
+DEFINE_string(audit_68_host_mem_watch_values, "",
+              "AUDIT-068: CSV of u32 values (max 8) — log every host-side "
+              "guest-memory write whose value matches.",
+              "Audit");
+DEFINE_string(audit_68_host_mem_watch_addrs, "",
+              "AUDIT-068: CSV of guest VAs or VA ranges 'START-END' (max 8) "
+              "— log every host-side guest-memory write whose guest VA falls "
+              "within the configured set.",
+              "Audit");
+
+// AUDIT-068 Session 3: read-mode probe. See cpu_flags.h for format.
+DEFINE_string(audit_68_host_mem_read_probe, "",
+              "AUDIT-068 Session 3: CSV of 'VA:SIZE:PERIOD_NS' tuples (max 8) "
+              "— a dedicated poll thread reads the value at each VA every "
+              "PERIOD_NS and emits AUDIT-068-READ-CHANGE on transition.",
+              "Audit");
+
+// AUDIT-069: see cpu_flags.h header. Empty default = zero cost.
+DEFINE_string(audit_69_event_signal_watch, "",
+              "AUDIT-069: CSV of guest event-handle IDs (max 4) — log each "
+              "XEvent::Set / Ke*Event / Nt*Event fire whose target matches.",
+              "Audit");
+DEFINE_string(audit_69_event_signal_native_ptr, "",
+              "AUDIT-069: CSV of guest event native VAs (X_KEVENT*) (max 4) "
+              "— log each set fire whose native pointer matches.",
+              "Audit");
+DEFINE_bool(audit_69_log_all_sets, false,
+            "AUDIT-069: when true, log EVERY XEvent::Set/Pulse fire (used "
+            "for one-run wait→signal correlation across handle drift). "
+            "Default false; use only with --mute=true.",
+            "Audit");
+
+// AUDIT-070 (S5 of AUDIT-069 family): semaphore-release watch. See header.
+DEFINE_string(audit_70_semaphore_release_watch, "",
+              "AUDIT-070: CSV of guest semaphore handle IDs (max 4) — log "
+              "each NtReleaseSemaphore / xeKeReleaseSemaphore fire whose "
+              "target matches.",
+              "Audit");
+DEFINE_bool(audit_70_log_all_releases, false,
+            "AUDIT-070: when true, log EVERY NtReleaseSemaphore / "
+            "xeKeReleaseSemaphore fire (used to identify the work-semaphore "
+            "handle on first run). Default false; use only with --mute=true.",
+            "Audit");
+
+// Phase A — see kernel/event_log.h.
+DEFINE_string(phase_a_event_log_path, "",
+              "Phase A: write schema-v1 JSONL event log to this path. "
+              "Empty (default) = disabled.",
+              "Audit");
+DEFINE_bool(phase_a_event_log_mem_writes, false,
+            "Phase A: include mem.write events in the JSONL log. RESERVED — "
+            "not wired in this phase. Default false.",
+            "Audit");
+
+// Phase D Stage 1 — see kernel/event_log.h `EmitContentionObserved`.
+DEFINE_bool(kernel_emit_contention, false,
+            "Phase D Stage 1: emit `contention.observed` events when "
+            "RtlEnterCriticalSection's spin loop is exhausted and the call "
+            "falls through to xeKeWaitForSingleObject. Default false (zero "
+            "cost when disabled). Requires --phase_a_event_log_path to be "
+            "set as well.",
+            "Audit");
+
+// Phase B — see kernel/phase_b_snapshot.h.
+DEFINE_string(phase_b_snapshot_dir, "",
+              "Phase B: write 5-file structured state snapshot to "
+              "<dir>/canary/ at the moment immediately before the first "
+              "guest PPC instruction of entry_point. Empty (default) = "
+              "disabled, zero overhead.",
+              "Audit");
+DEFINE_bool(phase_b_snapshot_and_exit, false,
+            "Phase B: after writing the snapshot, exit the process "
+            "immediately (std::_Exit(0)) so re-runs are byte-deterministic.",
+            "Audit");
+DEFINE_bool(phase_b_dump_section_content, false,
+            "Phase B: in memory.json, populate section_contents[].content_b64 "
+            "with raw bytes of every committed XEX-image region. Default "
+            "false — per-region SHA-256 is enough for the routine diff; "
+            "this is the escape hatch for the STOP-and-report condition "
+            "(image_loaded_sha256 mismatch).",
+            "Audit");diff --git a/src/xenia/cpu/cpu_flags.h b/src/xenia/cpu/cpu_flags.h
+index 38c4f98ba..95fe8cb22 100644
+--- a/src/xenia/cpu/cpu_flags.h
+++ b/src/xenia/cpu/cpu_flags.h
+@@ -35,4 +35,76 @@ DECLARE_bool(break_condition_truncate);
+ 
+ DECLARE_bool(break_on_debugbreak);
+ 
+// AUDIT-DEMO smoke marker.
+DECLARE_bool(audit_demo_setup_trace);
+
+// AUDIT-061: multi-PC branch probe — emits one log line per fire with
+// (pc, lr, cr0 LGE, cr6 LGE, r3, r4, r5, r6, r31, tid). CSV of guest PCs.
+DECLARE_string(audit_61_branch_probe_pcs);
+
+// AUDIT-067: value-watch — emit a log line for each 32-bit guest store whose
+// value-to-be-stored matches any configured value. CSV of u32 values
+// ("0xDEADBEEF,..."), max 4 entries. Default empty (off); zero cost when empty.
+DECLARE_string(audit_67_value_watch);
+
+// AUDIT-068: host-side memory-write watch — emit a log line for each host-side
+// write to guest memory whose VALUE matches any configured u32 value, or whose
+// guest VA falls within any configured ADDR or ADDR-range. Mirrors AUDIT-067
+// but covers the host-side write paths (xe::store_and_swap<T>, Memory::Zero/
+// Fill/Copy) that AUDIT-067's JIT store-opcode hooks cannot see.
+//
+// VALUES: CSV of u32 values, max 8 entries; e.g. "0x8200A208,0x8200A928".
+// ADDRS:  CSV of guest VAs or VA ranges, max 8 entries; range form is
+//         "0xSTART-0xEND" (inclusive). e.g. "0x42500000-0x42600000,0xBCE25340".
+// Default empty (off); zero cost on the hot path when both are empty.
+DECLARE_string(audit_68_host_mem_watch_values);
+DECLARE_string(audit_68_host_mem_watch_addrs);
+
+// AUDIT-068 Session 3: read-mode probe. CSV of "VA:SIZE:PERIOD_NS" tuples
+// (max 8). A dedicated low-priority thread polls each VA every PERIOD_NS and
+// emits AUDIT-068-READ-CHANGE when the value transitions. SIZE in {1,2,4,8}.
+// Example: "0xBCE25340:4:1000000" = poll u32 at 0xBCE25340 every 1 ms.
+// Default empty (off); the poll thread is not spawned when empty.
+DECLARE_string(audit_68_host_mem_read_probe);
+
+// AUDIT-069: event-signal watch. CSV of guest handle IDs (e.g. "0xF8000098")
+// to log on every XEvent::Set / KeSetEvent / NtSetEvent / KePulseEvent /
+// NtPulseEvent fire whose target matches. Max 4 entries. Default empty (off);
+// zero cost on the hot path when empty.
+DECLARE_string(audit_69_event_signal_watch);
+// AUDIT-069: event-signal watch by native guest VA (X_KEVENT*). CSV of guest
+// VAs (max 4). Default empty (off). Use when the handle id varies across
+// boots but the native dispatcher pointer is stable.
+DECLARE_string(audit_69_event_signal_native_ptr);
+// AUDIT-069: when true, log EVERY XEvent::Set / XEvent::Pulse fire (subject
+// to the slowpath gate). Use only with --mute=true and short windows — high
+// volume. Default false (off).
+DECLARE_bool(audit_69_log_all_sets);
+
+// AUDIT-070 (S5 of AUDIT-069 family): semaphore-release watch. CSV of guest
+// handle IDs (e.g. "0xF8000098") to log on every NtReleaseSemaphore /
+// xeKeReleaseSemaphore fire whose target matches. Max 4 entries. Default
+// empty (off); zero cost on the hot path when empty.
+DECLARE_string(audit_70_semaphore_release_watch);
+// AUDIT-070: when true, log EVERY NtReleaseSemaphore / xeKeReleaseSemaphore
+// fire. Use only with --mute=true and short windows — used to identify the
+// canary work-semaphore handle on first run. Default false (off).
+DECLARE_bool(audit_70_log_all_releases);
+
+// Phase A: JSONL event-log emitter path. When non-empty, the engine writes
+// schema-v1 JSONL events to this file. Empty (default) = no overhead, no
+// behavior change. Schema: xenia-rs/audit-runs/phase-a-diff-harness/schema-v1.md
+DECLARE_string(phase_a_event_log_path);
+DECLARE_bool(phase_a_event_log_mem_writes);
+
+// Phase B: initial-state snapshot. When the dir cvar is non-empty, the
+// engine writes a five-file structured state snapshot (cpu_state.json,
+// memory.json, kernel.json, vfs.json, config.json, plus manifest.json) to
+// `<dir>/canary/` at the moment immediately before the first guest PPC
+// instruction of the XEX entry_point executes. See
+// `xenia-rs/audit-runs/phase-b-state-equivalence/`.
+DECLARE_string(phase_b_snapshot_dir);
+DECLARE_bool(phase_b_snapshot_and_exit);
+DECLARE_bool(phase_b_dump_section_content);
+
+ #endif  // XENIA_CPU_CPU_FLAGS_H_diff --git a/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc b/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc
+index ced21a600..e1c74d7ec 100644
+--- a/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc
+++ b/src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc
+@@ -12,6 +12,8 @@
+ #include "xenia/base/clock.h"
+ #include "xenia/base/platform.h"
+ #include "xenia/cpu/processor.h"
+#include "xenia/kernel/audit_70_semaphore_release_watch.h"
+#include "xenia/kernel/event_log.h"
+ #include "xenia/kernel/util/shim_utils.h"
+ #include "xenia/kernel/xboxkrnl/xboxkrnl_private.h"
+ #include "xenia/kernel/xsemaphore.h"
+@@ -147,6 +149,25 @@ uint32_t ExCreateThread(xe::be<uint32_t>* handle_ptr, uint32_t stack_size,
+     if (thread_id_ptr) {
+       *thread_id_ptr = thread->thread_id();
+     }
+    // Phase C+15-α: schema-v1 `thread.create` event. Symmetric with
+    // ours's `ex_create_thread`. Emitted by the **parent** thread.
+    // handle.create for the thread handle itself was already emitted
+    // via ObjectTable::AddHandle inside XThread::Create. Here we
+    // surface the spawn-specific metadata.
+    if (phase_a::IsEnabled()) {
+      uint64_t sid = phase_a::LookupHandleSemanticId(thread->handle());
+      XThread* parent = XThread::TryGetCurrentThread();
+      uint32_t parent_tid = 0;
+      if (parent) {
+        parent_tid = static_cast<uint32_t>(
+            parent->guest_object<X_KTHREAD>()->thread_id);
+      }
+      uint32_t affinity = (creation_flags >> 24) & 0xFF;
+      bool suspended = (creation_flags & 0x1) != 0;
+      phase_a::EmitThreadCreate(sid, parent_tid, start_address, start_context,
+                                /* priority */ 0, affinity, actual_stack_size,
+                                suspended);
+    }
+   }
+   return result;
+ }
+@@ -165,6 +186,9 @@ DECLARE_XBOXKRNL_EXPORT1(ExCreateThread, kThreading, kImplemented);
+ 
+ uint32_t ExTerminateThread(uint32_t exit_code) {
+   XThread* thread = XThread::GetCurrentThread();
+  // Phase C+15-α: schema-v1 `thread.exit` is emitted inside
+  // `XThread::Exit` (covers both explicit ExTerminateThread and
+  // implicit thread-entry returns).
+ 
+   // NOTE: this kills us right now. We won't return from it.
+   return thread->Exit(exit_code);
+@@ -718,6 +742,9 @@ uint32_t xeKeReleaseSemaphore(X_KSEMAPHORE* semaphore_ptr, uint32_t increment,
+   int32_t previous_count = 0;
+   [[maybe_unused]] bool success =
+       sem->ReleaseSemaphore(adjustment, &previous_count);
+  // AUDIT-070: log Ke-form release fires whose target handle matches.
+  audit_70::check_release(sem->handle(), "xeKeReleaseSemaphore",
+                          static_cast<int32_t>(adjustment), previous_count);
+   return static_cast<uint32_t>(previous_count);
+ }
+ 
+@@ -786,6 +813,13 @@ dword_result_t NtReleaseSemaphore_entry(dword_t sem_handle,
+           uint32_t(release_count), previous_count);
+       result = X_STATUS_SEMAPHORE_LIMIT_EXCEEDED;
+     }
+    // AUDIT-070: log Nt-form release fires whose target handle matches.
+    // Logged regardless of success/limit-exceeded — distinguished by
+    // result/previous_count in subsequent analysis.
+    audit_70::check_release(static_cast<uint32_t>(sem_handle),
+                            "NtReleaseSemaphore",
+                            static_cast<int32_t>(release_count),
+                            previous_count);
+   } else {
+     result = X_STATUS_INVALID_HANDLE;
+   }
+@@ -954,6 +988,19 @@ uint32_t xeKeWaitForSingleObject(void* object_ptr, uint32_t wait_reason,
+     return X_STATUS_ABANDONED_WAIT_0;
+   }
+ 
+  // Phase C+15-α: schema-v1 `wait.begin` event. Symmetric with ours's
+  // `ke_wait_for_single_object`. Resolve the SID via the object's
+  // first registered handle.
+  if (phase_a::IsEnabled()) {
+    uint64_t sid = 0;
+    if (!object->handles().empty()) {
+      sid = phase_a::LookupHandleSemanticId(object->handles()[0]);
+    }
+    int64_t timeout_ns = timeout_ptr ? (static_cast<int64_t>(*timeout_ptr) * 100) : -1;
+    phase_a::EmitWaitBegin(&sid, 1, timeout_ns, alertable != 0,
+                           /* wait_all */ false);
+  }
+
+   X_STATUS result =
+       object->Wait(wait_reason, processor_mode, alertable, timeout_ptr);
+   if (alertable) {
+@@ -980,6 +1027,16 @@ uint32_t NtWaitForSingleObjectEx(uint32_t object_handle, uint32_t wait_mode,
+                                  uint32_t alertable, uint64_t* timeout_ptr) {
+   X_STATUS result = X_STATUS_SUCCESS;
+ 
+  // Phase C+15-α: schema-v1 `wait.begin` event. Symmetric with ours's
+  // `nt_wait_for_single_object_ex`. Resolve SID directly from the
+  // handle.
+  if (phase_a::IsEnabled()) {
+    uint64_t sid = phase_a::LookupHandleSemanticId(object_handle);
+    int64_t timeout_ns = timeout_ptr ? (static_cast<int64_t>(*timeout_ptr) * 100) : -1;
+    phase_a::EmitWaitBegin(&sid, 1, timeout_ns, alertable != 0,
+                           /* wait_all */ false);
+  }
+
+   auto object =
+       kernel_state()->object_table()->LookupObject<XObject>(object_handle);
+   if (object) {
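Note on the `wait.begin` hunks in fix-canary-s5.diff above: both compute `timeout_ns` as `*timeout_ptr * 100`. A minimal sketch of that conversion follows, assuming the usual NT convention that kernel timeouts count 100 ns ticks (negative for relative waits) and that a null pointer means an unbounded wait. `TimeoutToNs` is a hypothetical helper name used for illustration only; the patch keeps the expression inline.

```cpp
#include <cstdint>

// Hypothetical helper (not part of the patch) mirroring the inline
// expression used before phase_a::EmitWaitBegin.
// Assumption: *timeout_ptr is a count of 100 ns ticks, stored as the raw
// 64-bit pattern of a signed value (negative = relative wait).
// A null pointer means "no timeout" and is reported as -1.
inline int64_t TimeoutToNs(const uint64_t* timeout_ptr) {
  if (!timeout_ptr) {
    return -1;  // sentinel: wait without a timeout
  }
  // Reinterpret as signed first so relative (negative) timeouts keep
  // their sign, then scale 100 ns ticks to nanoseconds.
  return static_cast<int64_t>(*timeout_ptr) * 100;
}
```

Every converted value is a multiple of 100, so the `-1` sentinel cannot be confused with a real converted timeout in the event log.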
--- a/audit-runs/audit-069-wait-signal-producer/fix-canary.diff
+++ b/audit-runs/audit-069-wait-signal-producer/fix-canary.diff
@@ -0,0 +1,206 @@
+diff --git a/src/xenia/cpu/cpu_flags.cc b/src/xenia/cpu/cpu_flags.cc
+index 3ff067e15..e024bfb26 100644
+--- a/src/xenia/cpu/cpu_flags.cc
+++ b/src/xenia/cpu/cpu_flags.cc
+@@ -57,3 +57,98 @@ DEFINE_bool(break_condition_truncate, true, "truncate value to 32-bits", "CPU");
+ 
+ DEFINE_bool(break_on_debugbreak, true, "int3 on JITed __debugbreak requests.",
+             "CPU");
+
+// AUDIT-DEMO: smoke marker (memory entry: emulator.cc:225,283). Always-on bool.
+DEFINE_bool(audit_demo_setup_trace, true,
+            "Audit smoke marker: log AUDIT-DEMO-SETUP-BEGIN at emulator setup.",
+            "Audit");
+
+// AUDIT-061: comma-separated list of guest PCs to log on each fire.
+// Format: "0xPC1,0xPC2,..." (max 32 PCs). Each fire emits
+// AUDIT-061-BR pc=X lr=X cr0=LGE cr6=LGE r3=X r4=X r5=X r6=X r31=X tid=N.
+// Default empty (off); no perf cost when empty.
+DEFINE_string(audit_61_branch_probe_pcs, "",
+              "AUDIT-061: CSV of guest PCs to trace (cr0/cr6 + regs/tid).",
+              "Audit");
+
+// AUDIT-067: comma-separated list of u32 values to watch. When non-empty,
+// every 4-byte guest store (stw/stwu/stwx/stwux/stmw) emits a runtime
+// equality check; matches log AUDIT-067-VAL pc=X lr=X val=X dst=X r3..r6 r31 tid=N.
+// Max 4 values. Default empty (off); zero overhead when empty.
+DEFINE_string(audit_67_value_watch, "",
+              "AUDIT-067: CSV of u32 values (max 4) — log every guest "
+              "store whose value matches.",
+              "Audit");
+
+// AUDIT-068: host-side memory-write watch. See cpu_flags.h header for format.
+// Mirrors AUDIT-067 but covers host-side writes (xe::store_and_swap<T>,
+// Memory::Zero/Fill/Copy). Empty default = zero cost.
+DEFINE_string(audit_68_host_mem_watch_values, "",
+              "AUDIT-068: CSV of u32 values (max 8) — log every host-side "
+              "guest-memory write whose value matches.",
+              "Audit");
+DEFINE_string(audit_68_host_mem_watch_addrs, "",
+              "AUDIT-068: CSV of guest VAs or VA ranges 'START-END' (max 8) "
+              "— log every host-side guest-memory write whose guest VA falls "
+              "within the configured set.",
+              "Audit");
+
+// AUDIT-068 Session 3: read-mode probe. See cpu_flags.h for format.
+DEFINE_string(audit_68_host_mem_read_probe, "",
+              "AUDIT-068 Session 3: CSV of 'VA:SIZE:PERIOD_NS' tuples (max 8) "
+              "— a dedicated poll thread reads the value at each VA every "
+              "PERIOD_NS and emits AUDIT-068-READ-CHANGE on transition.",
+              "Audit");
+
+// AUDIT-069: see cpu_flags.h header. Empty default = zero cost.
+DEFINE_string(audit_69_event_signal_watch, "",
+              "AUDIT-069: CSV of guest event-handle IDs (max 4) — log each "
+              "XEvent::Set / Ke*Event / Nt*Event fire whose target matches.",
+              "Audit");
+DEFINE_string(audit_69_event_signal_native_ptr, "",
+              "AUDIT-069: CSV of guest event native VAs (X_KEVENT*) (max 4) "
+              "— log each set fire whose native pointer matches.",
+              "Audit");
+DEFINE_bool(audit_69_log_all_sets, false,
+            "AUDIT-069: when true, log EVERY XEvent::Set/Pulse fire (used "
+            "for one-run wait→signal correlation across handle drift). "
+            "Default false; use only with --mute=true.",
+            "Audit");
+
+// Phase A — see kernel/event_log.h.
+DEFINE_string(phase_a_event_log_path, "",
+              "Phase A: write schema-v1 JSONL event log to this path. "
+              "Empty (default) = disabled.",
+              "Audit");
+DEFINE_bool(phase_a_event_log_mem_writes, false,
+            "Phase A: include mem.write events in the JSONL log. RESERVED — "
+            "not wired in this phase. Default false.",
+            "Audit");
+
+// Phase D Stage 1 — see kernel/event_log.h `EmitContentionObserved`.
+DEFINE_bool(kernel_emit_contention, false,
+            "Phase D Stage 1: emit `contention.observed` events when "
+            "RtlEnterCriticalSection's spin loop is exhausted and the call "
+            "falls through to xeKeWaitForSingleObject. Default false (zero "
+            "cost when disabled). Requires --phase_a_event_log_path to be "
+            "set as well.",
+            "Audit");
+
+// Phase B — see kernel/phase_b_snapshot.h.
+DEFINE_string(phase_b_snapshot_dir, "",
+              "Phase B: write 5-file structured state snapshot to "
+              "<dir>/canary/ at the moment immediately before the first "
+              "guest PPC instruction of entry_point. Empty (default) = "
+              "disabled, zero overhead.",
+              "Audit");
+DEFINE_bool(phase_b_snapshot_and_exit, false,
+            "Phase B: after writing the snapshot, exit the process "
+            "immediately (std::_Exit(0)) so re-runs are byte-deterministic.",
+            "Audit");
+DEFINE_bool(phase_b_dump_section_content, false,
+            "Phase B: in memory.json, populate section_contents[].content_b64 "
+            "with raw bytes of every committed XEX-image region. Default "
+            "false — per-region SHA-256 is enough for the routine diff; "
+            "this is the escape hatch for the STOP-and-report condition "
+            "(image_loaded_sha256 mismatch).",
+            "Audit");
+diff --git a/src/xenia/cpu/cpu_flags.h b/src/xenia/cpu/cpu_flags.h
+index 38c4f98ba..cf5719b8b 100644
+--- a/src/xenia/cpu/cpu_flags.h
+++ b/src/xenia/cpu/cpu_flags.h
+@@ -35,4 +35,66 @@ DECLARE_bool(break_condition_truncate);
+ 
+ DECLARE_bool(break_on_debugbreak);
+ 
+// AUDIT-DEMO smoke marker.
+DECLARE_bool(audit_demo_setup_trace);
+
+// AUDIT-061: multi-PC branch probe — emits one log line per fire with
+// (pc, lr, cr0 LGE, cr6 LGE, r3, r4, r5, r6, r31, tid). CSV of guest PCs.
+DECLARE_string(audit_61_branch_probe_pcs);
+
+// AUDIT-067: value-watch — emit a log line for each 32-bit guest store whose
+// value-to-be-stored matches any configured value. CSV of u32 values
+// ("0xDEADBEEF,..."), max 4 entries. Default empty (off); zero cost when empty.
+DECLARE_string(audit_67_value_watch);
+
+// AUDIT-068: host-side memory-write watch — emit a log line for each host-side
+// write to guest memory whose VALUE matches any configured u32 value, or whose
+// guest VA falls within any configured ADDR or ADDR-range. Mirrors AUDIT-067
+// but covers the host-side write paths (xe::store_and_swap<T>, Memory::Zero/
+// Fill/Copy) that AUDIT-067's JIT store-opcode hooks cannot see.
+//
+// VALUES: CSV of u32 values, max 8 entries; e.g. "0x8200A208,0x8200A928".
+// ADDRS:  CSV of guest VAs or VA ranges, max 8 entries; range form is
+//         "0xSTART-0xEND" (inclusive). e.g. "0x42500000-0x42600000,0xBCE25340".
+// Default empty (off); zero cost on the hot path when both are empty.
+DECLARE_string(audit_68_host_mem_watch_values);
+DECLARE_string(audit_68_host_mem_watch_addrs);
+
+// AUDIT-068 Session 3: read-mode probe. CSV of "VA:SIZE:PERIOD_NS" tuples
+// (max 8). A dedicated low-priority thread polls each VA every PERIOD_NS and
+// emits AUDIT-068-READ-CHANGE when the value transitions. SIZE in {1,2,4,8}.
+// Example: "0xBCE25340:4:1000000" = poll u32 at 0xBCE25340 every 1 ms.
+// Default empty (off); the poll thread is not spawned when empty.
+DECLARE_string(audit_68_host_mem_read_probe);
+
+// AUDIT-069: event-signal watch. CSV of guest handle IDs (e.g. "0xF8000098")
+// to log on every XEvent::Set / KeSetEvent / NtSetEvent / KePulseEvent /
+// NtPulseEvent fire whose target matches. Max 4 entries. Default empty (off);
+// zero cost on the hot path when empty.
+DECLARE_string(audit_69_event_signal_watch);
+// AUDIT-069: event-signal watch by native guest VA (X_KEVENT*). CSV of guest
+// VAs (max 4). Default empty (off). Use when the handle id varies across
+// boots but the native dispatcher pointer is stable.
+DECLARE_string(audit_69_event_signal_native_ptr);
+// AUDIT-069: when true, log EVERY XEvent::Set / XEvent::Pulse fire (subject
+// to the slowpath gate). Use only with --mute=true and short windows — high
+// volume. Default false (off).
+DECLARE_bool(audit_69_log_all_sets);
+
+// Phase A: JSONL event-log emitter path. When non-empty, the engine writes
+// schema-v1 JSONL events to this file. Empty (default) = no overhead, no
+// behavior change. Schema: xenia-rs/audit-runs/phase-a-diff-harness/schema-v1.md
+DECLARE_string(phase_a_event_log_path);
+DECLARE_bool(phase_a_event_log_mem_writes);
+
+// Phase B: initial-state snapshot. When the dir cvar is non-empty, the
+// engine writes a five-file structured state snapshot (cpu_state.json,
+// memory.json, kernel.json, vfs.json, config.json, plus manifest.json) to
+// `<dir>/canary/` at the moment immediately before the first guest PPC
+// instruction of the XEX entry_point executes. See
+// `xenia-rs/audit-runs/phase-b-state-equivalence/`.
+DECLARE_string(phase_b_snapshot_dir);
+DECLARE_bool(phase_b_snapshot_and_exit);
+DECLARE_bool(phase_b_dump_section_content);
+
+ #endif  // XENIA_CPU_CPU_FLAGS_H_
+diff --git a/src/xenia/kernel/xevent.cc b/src/xenia/kernel/xevent.cc
+index b583bf732..f8bf47952 100644
+--- a/src/xenia/kernel/xevent.cc
+++ b/src/xenia/kernel/xevent.cc
+@@ -11,6 +11,7 @@
+ 
+ #include "xenia/base/byte_stream.h"
+ #include "xenia/base/logging.h"
+#include "xenia/kernel/audit_69_event_signal_watch.h"
+ 
+ namespace xe {
+ namespace kernel {
+@@ -58,12 +59,19 @@ void XEvent::InitializeNative(void* native_ptr, X_DISPATCH_HEADER* header) {
+ }
+ 
+ int32_t XEvent::Set(uint32_t priority_increment, bool wait) {
+  // AUDIT-069: log event-signal fires whose target matches the configured
+  // handle ID or native VA. Hot path is a single relaxed atomic load when
+  // the cvars are empty (default).
+  audit_69::check_event_set(this->handle(), this->guest_object(),
+                            "XEvent::Set");
+   set_priority_increment(priority_increment);
+   event_->Set();
+   return 1;
+ }
+ 
+ int32_t XEvent::Pulse(uint32_t priority_increment, bool wait) {
+  audit_69::check_event_set(this->handle(), this->guest_object(),
+                            "XEvent::Pulse");
+   set_priority_increment(priority_increment);
+   event_->Pulse();
+   return 1;
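Note on the watch cvars in fix-canary.diff above: the `audit_69` header that implements `check_event_set` is not part of this diff, so the following is only a sketch of how a "CSV of handle IDs, max 4, one relaxed atomic load when empty" gate could look. `ParseWatchCsv` and `HandleWatch` are hypothetical names, and the sketch assumes the list is configured once, before any guest thread signals an event.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <string>
#include <vector>

// Hypothetical parser: splits "0xF8000098,0x1040" into u32 values and keeps
// at most max_entries of them. Empty tokens are skipped; a token that is not
// a number parses as 0 (strtoull behavior), which a real parser might reject.
inline std::vector<uint32_t> ParseWatchCsv(const std::string& csv,
                                           std::size_t max_entries) {
  std::vector<uint32_t> out;
  std::size_t pos = 0;
  while (pos < csv.size() && out.size() < max_entries) {
    std::size_t comma = csv.find(',', pos);
    if (comma == std::string::npos) comma = csv.size();
    const std::string tok = csv.substr(pos, comma - pos);
    if (!tok.empty()) {
      // Base 0 accepts both "0x..." hex and plain decimal.
      out.push_back(
          static_cast<uint32_t>(std::strtoull(tok.c_str(), nullptr, 0)));
    }
    pos = comma + 1;
  }
  return out;
}

// Hypothetical gate: when nothing is configured, Matches() costs a single
// relaxed atomic load. Assumes Configure() runs before any concurrent
// Matches() call, so handles_ is never mutated while being read.
class HandleWatch {
 public:
  void Configure(const std::string& csv) {
    handles_ = ParseWatchCsv(csv, 4);
    enabled_.store(!handles_.empty(), std::memory_order_release);
  }

  bool Matches(uint32_t handle) const {
    if (!enabled_.load(std::memory_order_relaxed)) return false;
    for (uint32_t h : handles_) {
      if (h == handle) return true;
    }
    return false;
  }

 private:
  std::vector<uint32_t> handles_;
  std::atomic<bool> enabled_{false};
};
```

A linear scan is used because the list is capped at four entries; a set or hash lookup would likely cost more than it saves at that size.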
--- a/audit-runs/audit-069-wait-signal-producer/s3/handle-sequence-diff.md
+++ b/audit-runs/audit-069-wait-signal-producer/s3/handle-sequence-diff.md
@@ -0,0 +1,143 @@
+# AUDIT-069 Session 3 — handle-sequence diff (ours tid=5 vs canary tid=10)
+
+Both engines run the γ-signaler family on the same guest thread (entry=0x82450A28, ctx=0x828F3B68).
+ours labels this thread tid=5; canary labels it tid=10 (cross-engine tid mismatch, AUDIT-068 reading-error #28).
+
+## Fire-count summary
+
+| caller LR | symbol | wrapper PC | ours fires | canary fires | ratio (ours/canary) |
+|---|---|---|---|---|---|
+| 0x8245DA44 | γ-D-A (sub_8245D9D8) | 0x824AA2F0 (NtSetEvent) | 5 | 23 | 22% |
+| 0x8245DB08 | γ-D-B (sub_8245DA78) | 0x824AA2F0 (NtSetEvent) | 1 | 8 | 12% |
+| 0x8245DC5C | γ-DB40 (sub_8245DB40) | 0x824AAF50 (Ke wrapper) | 75 | 461 | 16% |
+| **TOTAL tid=5/tid=10 signaler work** | | | **81** | **492** | **16%** |
+
+**Headline divergence**: ours completes ~16% of canary's producer-loop iterations.
+The problem is not (only) "wrong handles": ours produces far fewer signals (81 vs 492).
+
+## Per-LR position-aligned sequence (handle = r3)
+
+Note: ours uses the normal slot-id namespace (0x10xx); canary uses the pseudo-handle namespace (0xF8000xxx).
+Handles therefore cannot be compared by raw ID; compare by position within each per-LR sequence and by call args (size in r5).
+
+### γ-DB40 dispatch (lr=0x8245DC5C) — Ke wrapper @ 0x824AAF50
+
+Args: r3=handle, r4=buf_ptr, r5=size, r6=0
+
+| pos | ours r3 | ours r5(size) | ours r4(buf) | canary r3 | canary r5(size) | canary r4(buf) |
+|---:|---|---|---|---|---|---|
+| 0 | 0x00001040 | 0x00000800 | 0x41a01cd0 | 0xf8000030 | 0x00000800 | 0xbdb18cd0 |
+| 1 | 0x0000105c | 0x00000800 | 0x41a01cd0 | 0xf8000034 | 0x00000800 | 0xbdb19cd0 |
+| 2 | 0x00001098 | 0x00019000 | 0x42c12090 | 0xf8000044 | 0x00000800 | 0xbdb19cd0 |
+| 3 | 0x000010ac | 0x00000800 | 0x41a01cd0 | 0xf8000044 | 0x00019000 | 0xbed2a090 |
+| 4 | 0x000010d0 | 0x0001c000 | 0x431520d0 | 0xf8000078 | 0x0001c000 | 0xbf26a0d0 |
+| 5 | 0x000010e0 | 0x00020000 | 0x4c946800 | 0xf8000078 | 0x00000800 | 0xbdb19cd0 |
+| 6 | 0x000010e0 | 0x00020000 | 0x4c966800 | 0xf8000078 | 0x00020000 | 0xb2cb0800 |
+| 7 | 0x000010e0 | 0x00020000 | 0x4c986800 | 0xf8000078 | 0x00020000 | 0xb2cd0800 |
+| 8 | 0x000010e0 | 0x00020000 | 0x4c9a6800 | 0xf8000078 | 0x00020000 | 0xb2cf0800 |
+| 9 | 0x000010e0 | 0x00020000 | 0x4c9c6800 | 0xf8000078 | 0x00020000 | 0xb2d10800 |
+| 10 | 0x000010e0 | 0x00020000 | 0x4c9e6800 | 0xf8000078 | 0x00020000 | 0xb2d30800 |
+| 11 | 0x000010e0 | 0x00020000 | 0x4ca06800 | 0xf8000078 | 0x00020000 | 0xb2d50800 |
+| 12 | 0x000010e0 | 0x00020000 | 0x4ca26800 | 0xf8000078 | 0x00020000 | 0xb2d70800 |
+| 13 | 0x000010e0 | 0x00020000 | 0x4ca46800 | 0xf8000078 | 0x00020000 | 0xb2d90800 |
+| 14 | 0x000010e0 | 0x00020000 | 0x4ca66800 | 0xf8000078 | 0x00020000 | 0xb2db0800 |
+| 15 | 0x000010e0 | 0x00020000 | 0x4ca86800 | 0xf8000078 | 0x00020000 | 0xb2dd0800 |
+| 16 | 0x000010e0 | 0x00020000 | 0x4caa6800 | 0xf8000078 | 0x00020000 | 0xb2df0800 |
+| 17 | 0x000010e0 | 0x00020000 | 0x4cac6800 | 0xf8000078 | 0x00020000 | 0xb2e10800 |
+| 18 | 0x000010e0 | 0x00020000 | 0x4cae6800 | 0xf8000078 | 0x00020000 | 0xb2e30800 |
+| 19 | 0x000010e0 | 0x00020000 | 0x4cb06800 | 0xf8000078 | 0x00020000 | 0xb2e50800 |
+... (ours total 75, canary total 461)
+
+### γ-D-A dispatch (lr=0x8245DA44) — NtSetEvent wrapper @ 0x824AA2F0
+
+Args: r3=handle, r4=2(SignalKind=Set), r5=handle (dup), r6=ctx
+
+| pos | ours r3 | ours r4 | canary r3 | canary r4 |
+|---:|---|---|---|---|
+| 0 | 0x00001054 | 0x00000002 | 0xf8000044 | 0x00000002 |
+| 1 | 0x00001064 | 0x00000002 | 0xf8000048 | 0x00000002 |
+| 2 | 0x000010a0 | 0x00000002 | 0xf8000074 | 0x00000002 |
+| 3 | 0x000010b4 | 0x00000002 | 0xf8000080 | 0x00000002 |
+| 4 | 0x000010ec | 0x00000002 | 0xf8000098 | 0x00000002 |
+| 5 | --- | --- | 0xf80000a8 | 0x00000002 |
+| 6 | --- | --- | 0xf80000b8 | 0x00000002 |
+| 7 | --- | --- | 0xf80000c4 | 0x00000002 |
+| 8 | --- | --- | 0xf80000d4 | 0x00000002 |
+| 9 | --- | --- | 0xf80000e0 | 0x00000002 |
+| 10 | --- | --- | 0xf80000e8 | 0x00000002 |
+| 11 | --- | --- | 0xf80000f0 | 0x00000002 |
+| 12 | --- | --- | 0xf80000f8 | 0x00000002 |
+| 13 | --- | --- | 0xf80000fc | 0x00000002 |
+| 14 | --- | --- | 0xf80000c4 | 0x00000002 |
+| 15 | --- | --- | 0xf800009c | 0x00000002 |
+| 16 | --- | --- | 0xf80000d4 | 0x00000002 |
+| 17 | --- | --- | 0xf80000d4 | 0x00000002 |
+| 18 | --- | --- | 0xf80000d4 | 0x00000002 |
+| 19 | --- | --- | 0xf80000d0 | 0x00000002 |
+| 20 | --- | --- | 0xf80000d0 | 0x00000002 |
+| 21 | --- | --- | 0xf80000d0 | 0x00000002 |
+| 22 | --- | --- | 0xf8000124 | 0x00000002 |
+... (ours total 5, canary total 23)
+
+### γ-D-B dispatch (lr=0x8245DB08) — NtSetEvent wrapper @ 0x824AA2F0
+
+| pos | ours r3 | ours r4 | canary r3 | canary r4 |
+|---:|---|---|---|---|
+| 0 | 0x000010d8 | 0x7116fc40 | 0xf8000044 | 0x7033fc10 |
+| 1 | --- | --- | 0xf8000080 | 0x7033fc10 |
+| 2 | --- | --- | 0xf80000c0 | 0x7033fc10 |
+| 3 | --- | --- | 0xf80000d0 | 0x7033fc10 |
+| 4 | --- | --- | 0xf80000b4 | 0x7033fc10 |
+| 5 | --- | --- | 0xf80000d4 | 0x7033fc10 |
+| 6 | --- | --- | 0xf80000d0 | 0x7033fc10 |
+| 7 | --- | --- | 0xf80000c8 | 0x7033fc10 |
+
+## First-mismatch identification
+
+Per-LR position 0:
+
+- γ-DB40 pos[0]: ours r3=0x1040 r5=0x800 r4=0x41a01cd0 | canary r3=0xF8000030 r5=0x800 r4=0xBDB18CD0
+  - **r5 (size) MATCHES** = 0x800.
+  - r4 (buf pointer) DIFFERS in absolute address (0x41a01cd0 vs 0xBDB18CD0) — different memory layouts, expected.
+  - r3 DIFFERS by handle namespace, as expected (canary pseudo-handle vs ours slot ID).
+
+- γ-D-A pos[0]: ours r3=0x1054 r4=0x2 | canary r3=0xF8000044 r4=0x2
+  - r4 (signal-kind=Set) MATCHES.
+  - Args structurally match.
+
+- γ-D-B pos[0]: ours r3=0x10D8 r4=0x7116FC40 r5=0x2 | canary r3=0xF8000044 r4=0x7033FC10 r5=0x2
+  - r5 (signal-kind) MATCHES.
+  - r4 (ctx pointer) DIFFERS in absolute address — different stack layout.
+
+Position-0 invocations are STRUCTURALLY consistent. The divergence in per-fire COUNT (5 vs 23, 1 vs 8, 75 vs 461; 81 vs 492 in total) means ours's producer LOOP runs roughly 6× fewer iterations overall before exiting.
+
+## Wedge handle status in ours
+
+**AUDIT-062 archive** (~9 days old) recorded ours wedge handles `0x12AC` and `0x12B8` (kind=Event/Auto)
+with `<NO_SIGNALS_DESPITE_WAITS>` annotation.
+
+In THIS run's ours lr-trace: handle 0x12AC count = **0**, handle 0x12B8 count = **0**.
+
+Max handle seen in lr-trace: 0x121C (cache file handle).
+The wedge handles `0x12AC`/`0x12B8` were NOT created in this 5B-instruction run — boot terminates early.
+
+## Boot-termination evidence
+
+- ours exec completed 1.5B instr / 47s wallclock, OR 5B instr / 159s wallclock — same handle universe.
+- `--halt-on-deadlock` did NOT trigger.
+- import_calls = 39,290 identical on both runs.
+- tid=5 producer fires 81 events then goes quiet; consumer threads remain blocked on existing handles indefinitely.
+- Wedge `0x12AC`/`0x12B8` from the AUDIT-062 archive likely formed in a deeper-boot trajectory (NtCreateEvent calls after a graphics-frame-tick or similar event that doesn't fire here).
+
+## Classification: missing-signal vs race
+
+**ours produces 81 signals where canary produces 492 from the SAME caller chain on the SAME guest thread.**
+
+This is a **producer-loop-underrun** classification:
+- The signaler thread (tid=5) runs the EXACT SAME guest-code path (PCs match, LRs match).
+- Position-0 args match structurally.
+- But the loop ITERATES far fewer times before going idle.
+
+The "wrong handles" framing from AUDIT-062 is partial: the bigger problem is that **the loop exits early** — most of the work that canary completes never gets touched by ours.
+
+Mechanism: sub_82450A68 dispatch loop reads work from a guest-memory work queue. Each iteration enqueues a new task once the previous fires. If the producer FEEDING that queue under-fires, the dispatch loop's read-head reaches the tail early and the loop exits (or blocks on a dispatcher event with no pending work).
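
The per-LR fire counts quoted in the notes above (5 vs 23, 1 vs 8, 75 vs 461) can be totalled to cross-check the 81 vs 492 headline. The following is a minimal Python sketch, not part of the audit tooling; the dictionary keys are labels chosen here for illustration, and attaching the 75/461 pair to the γ-DB40 caller is an assumption based on the order in which the counts are listed.

```python
# Per-LR fire counts as (ours, canary), copied from the notes above.
FIRES = {
    "gamma-D-A": (5, 23),
    "gamma-D-B": (1, 8),
    "gamma-DB40": (75, 461),  # label assumed; the counts come from the text
}


def summarize(fires):
    """Return (ours_total, canary_total, ours/canary fraction)."""
    ours = sum(o for o, _ in fires.values())
    canary = sum(c for _, c in fires.values())
    return ours, canary, ours / canary


ours_total, canary_total, fraction = summarize(FIRES)
print(f"ours={ours_total} canary={canary_total} ours/canary={fraction:.1%}")
for name, (o, c) in FIRES.items():
    # Per-caller ratio: how many canary fires there are for each ours fire.
    print(f"{name}: {o} vs {c} -> canary/ours = {c / o:.1f}x")
```

The totals come out at 81 and 492, so the overall gap is closer to 6× than 5×, while the per-caller ratios vary between roughly 4.6× and 8×.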
--- a/audit-runs/audit-069-wait-signal-producer/s4/divergence-analysis.md
+++ b/audit-runs/audit-069-wait-signal-producer/s4/divergence-analysis.md
@@ -0,0 +1,209 @@
+# AUDIT-069 Session 4 — divergence analysis
+
+Date: 2026-05-20
+xenia-rs HEAD: `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` (UNCHANGED)
+
+## Headline (HIGH confidence — direct per-iteration measurement)
+
+The S3 framing of "producer-loop underrun" was directionally right but
+mis-located the divergence. The loop in `sub_82450A68` **does not take
+an early-exit branch in either engine** — neither ours nor canary ever
+reaches `0x82450B50` (the exit path). Both stay in the loop indefinitely.
+
+The divergence is **WHAT the NtWaitForMultipleObjectsEx call returns at
+each iteration**:
+
+- **Ours: r3 = 1 (WAIT_OBJECT_0+1, semaphore signaled) EVERY iteration.**
+- **Canary: r3 = 0x102 (WAIT_TIMEOUT) mostly, r3 = 1 occasionally.**
+
+This refines the producer-loop classification: it is NOT loop-underrun
+(both engines' loops run continuously). It is a **semaphore-state
+divergence** — ours's work semaphore is over-released or never properly
+drained; canary's drains correctly and the wait times out per 16ms tick.
+
+## Loop structure (sub_82450A68 disasm at s4/sub_82450A68-disasm.txt)
+
+```
+0x82450A28: sub_82450A28 = thread entry (KeSetThreadPriority(-2, 3); bl sub_82450A68)
+0x82450A68: prolog (mflr, alloc 128B frame, r31=ctx_arg)
+0x82450A78-94: stack handle array [r1+80]=[r31+88]=handle[0]=STOP_EVENT (=0x104C in ours),
+                                  [r1+84]=[r31+92]=handle[1]=WORK_SEMAPHORE (=0x1050 in ours).
+0x82450A98:  bl 0x824AB240  ; NtWaitForMultipleObjectsEx wrapper, 16ms timeout
+0x82450A9C-A0:  cmplwi/beq cr6, r3, 0  → 0x82450B50  [EXIT-WAIT1: r3==0 → exit (stop signaled)]
+0x82450AA4-A8:  li r29,0; li r28,4    [FIRST-ITER body entry]
+0x82450AAC:  lwz r11, 212(r31)          [BACK-EDGE TARGET; reads "fast-path flag"]
+0x82450AB0-BC:  cntlzw / extrwi / cmplwi / bne cr6, 0xAC8  [BR-A: flag@212==0 → search path; cntlzw/extrwi yields 1 iff the word is 0]
+0x82450AC0-C4:  li r4,5; b 0xB2C  [BR-B: flag@212!=0 → direct dispatch w/ r4=5]
+0x82450AC8-CC:  mr r30,r29; addi r11,r31,112  [search-path setup]
+0x82450AD0-E0:  lwz r10,0(r11); cntlzw; extrwi; cmplwi; beq cr6, 0xAF8  [BR-C: candidate found]
+0x82450AE4-F0:  addi r30,1; addi r11,20; cmplwi cr6, r30, 5; blt cr6, 0xAD0  [BR-D: search continue]
+0x82450AF4:  b 0xB34  [BR-E: search exhausted → skip dispatch, re-wait]
+0x82450AF8:  lwz r11, 224(r31)  [budget check]
+0x82450AFC-00:  cmplwi cr6, r11, 0; beq cr6, 0xB28  [BR-F: budget@224==0 → skip refresh]
+0x82450B04-0C:  lwz r11, 220(r31); cmpw cr6, r11, r30; bge cr6, 0xB28  [BR-G: budget cmp]
+0x82450B10:  bl 0x824AA830  [KeQueryPerformanceCounter; sub_824AA830]
+0x82450B14-1C:  lwz r11,224(r31); cmplw cr6,r3,r11; blt cr6, 0xB34  [BR-H: budget exceeded → re-wait]
+0x82450B20-24:  stw r28, 220(r31); stw r29, 224(r31)
+0x82450B28:  mr r4, r30
+0x82450B2C-30:  mr r3, r31; bl 0x82450B68  [DISPATCH: calls γ-signaler family]
+0x82450B34-44:  li r6,16; li r5,0; addi r4,r1,80; li r3,2; bl 0x824AB240  [RE-WAIT]
+0x82450B48-4C:  cmplwi cr6, r3, 0; bne cr6, 0x82450AAC  [BACK-EDGE: r3!=0 → loop]
+0x82450B50-58:  li r3,0; addi r1,r1,128; b 0x825F0FD8  [EXIT path]
+```
+
+## Handle slots (ours, mem-watch confirmed)
+
+```
+[r31+88] = [0x828F3BC0]  written at PC 0x8244FFB0 from NtCreateEvent  → ours handle 0x104C
+[r31+92] = [0x828F3BC4]  written at PC 0x8244FFCC from NtCreateSemaphore → ours handle 0x1050
+```
+
+Created in `sub_8244FF50` (the spawn helper) BEFORE ExCreateThread:
+- handle[0] = NtCreateEvent(EventType=NotificationEvent, InitialState=0)
+- handle[1] = NtCreateSemaphore(InitialCount=0, MaximumCount=0x7FFFFFFF)
+
+This is a **stop-event + work-semaphore** pattern, NOT two events.
+NtWaitForMultipleObjectsEx with WaitAny:
+- r3 = WAIT_OBJECT_0  = 0     → handle[0] (stop event) signaled → EXIT
+- r3 = WAIT_OBJECT_0+1 = 1   → handle[1] (semaphore) acquired (decremented) → DO WORK
+- r3 = WAIT_TIMEOUT = 0x102  → 16ms elapsed with no signal → continue (poll)
+
+## Per-PC iteration counts (HIGH confidence, direct branch-probe)
+
+| PC | path | ours fires | canary fires | ratio |
+|---|---|---:|---:|---:|
+| 0x82450AA4 | FIRST-ITER entry | 1 | 1 | 1× |
+| 0x82450AAC | BACK-EDGE target | 91 | 4 | (canary crashed early) |
+| 0x82450AC0 | BR-B: flag@212!=0 direct-dispatch r4=5 | 2 | 0 | — |
+| 0x82450AC8 | BR-A: flag@212==0 search path | 90 | 4 | — |
+| 0x82450AE4 | inner-search continue | 72 | 17 | — |
+| 0x82450AF4 | BR-E: search exhausted | 8 | 3 | — |
+| 0x82450AF8 | BR-C: candidate found | 82 | 1 | — |
+| 0x82450B04 | BR-F: budget skip | 81 | 0 | — |
+| 0x82450B10 | budget refresh (KeQuery) | 8 | 0 | — |
+| 0x82450B28 | dispatch entry (r4=r30) | 74 | 1 | — |
+| 0x82450B34 | re-wait entry | 92 | 4 | — |
+| **0x82450B50** | **EXIT path** | **0** | **0** | **never exits** |
+
+Canary's run was cut short at ~5 iterations by a vkd3d-proton fault on
+exit. The relevant signal is in the **r3 distribution at the back-edge**,
+not the absolute counts.
+
+## r3 distribution at the back-edge (HIGH confidence)
+
+### Ours (91 captures at PC=0x82450AAC, lr=0x82450B48)
+
+```
+r3=0x00000001 × 91/91 (100%)
+r3=0x00000102 ×  0/91 (0%)
+```
+
+### Canary (4 captures at PC=0x82450AAC, lr=0x82450B48)
+
+```
+r3=0x00000001 × 1/4 (25%)
+r3=0x00000102 × 3/4 (75%)
+```
+
+Pattern visible in canary trace: first re-wait returns 0x1 (work
+available immediately), subsequent re-waits return 0x102 (timeout).
+
+## The divergent guest-memory location
+
+The "divergent load" the user's framing predicted (a guest load reading
+some flag whose value differs ours-vs-canary) is **the wait return
+value, computed inside the kernel** — not a guest-memory load. The
+return r3 comes from `NtWaitForMultipleObjectsEx` (a kernel import).
+
+The kernel-side state that differs is the **WORK SEMAPHORE COUNT**:
+
+- Ours: count > 0 at every wait → wait succeeds (decrement, r3=1)
+- Canary: count = 0 at every wait (mostly) → wait times out (r3=0x102)
+
+The semaphore count is influenced by:
+- `NtReleaseSemaphore(handle[1], 1)` calls (increments count by 1)
+- `NtWaitForMultipleObjectsEx` success on handle[1] (decrements by 1)
+
+So either:
+- (a) ours's NtReleaseSemaphore is called more aggressively than canary's
+- (b) ours's NtWaitForMultipleObjectsEx doesn't decrement on success (kernel bug)
+- (c) ours's NtCreateSemaphore creates with InitialCount > 0 (creation bug)
+- (d) ours's NtReleaseSemaphore over-releases (adds more than the requested count per call)
+
+## NtReleaseSemaphore callers (18 unique fns listed, from sylpheed.db xrefs)
+
+```
+sub_822c6748, sub_822c6808, sub_822c8b50 (×6 inline call sites),
+sub_822f2328,
+sub_823dd770, sub_823dd838, sub_823de4b8 (×3),
+sub_823df320,
+sub_82450218 ← in dispatch-loop module (callers: sub_82452DC0 ×2)
+sub_824503A0 ← in dispatch-loop module (callers: sub_82452690, sub_8245E1D8)
+sub_82450B68 ← THE DISPATCH FUNCTION ITSELF (×2 internal release sites at 0xCDC, 0xD28)
+sub_824569C0 (j-call), sub_82457FE0, sub_82458468, sub_824591C0,
+sub_8245AAF0, sub_8245ABD8, sub_8245AD00
+```
+
+The most-suspicious sites for this audit are the three in the
+dispatch-loop module: `sub_82450218`, `sub_824503A0`, and the
+self-release in `sub_82450B68`.
+
+## Most-recent kernel calls before the divergent load (ours tid=5)
+
+The "divergent load" is the kernel-side return of `NtWaitForMultipleObjectsEx`.
+No guest-memory load is the proximate cause. Most-recent kernel calls
+before each wait on ours tid=5 (from S3's ours-lr-trace data):
+
+- `sub_824AB158` ↔ `NtReleaseSemaphore` (via wrapper)
+- `sub_824AA2F0` ↔ `NtSetEvent`
+- `sub_824AAF50` ↔ `KeSetEvent`-style with ptr+size args
+- `sub_824AA830` ↔ `KeQueryPerformanceCounter`-like
+- `sub_824AB240` ↔ `NtWaitForMultipleObjectsEx` itself
+
+## Hypothesis (MEDIUM-HIGH confidence)
+
+The semaphore is being **over-released** in ours. Specifically, one of
+the producer-side enqueue paths (sub_82452DC0, sub_82452690, sub_8245E1D8,
+or any of the 22 other release-call sites) is firing release more often
+than the dispatch loop is consuming work — OR — ours's wait kernel
+handler in `xenia-kernel/src/exports.rs` is not atomically decrementing
+the semaphore count on WAIT_OBJECT_0+N.
+
+Ranked S5 leads:
+
+1. **Audit ours's `NtWaitForMultipleObjectsEx` handler implementation**:
+   does it decrement the semaphore on success? (Likely yes — would
+   regress many things otherwise. Test with a small probe.)
+2. **Probe `NtReleaseSemaphore` call rate on handle 0x1050** in ours.
+   Compare to canary on equivalent handle (some F8000xxx in canary).
+   Hypothesis: ours releases more often per dispatch.
+3. **Cross-check the canary equivalent handle**: canary uses
+   `XSemaphore::native_object()` pseudo-handle for handle[1]. Use
+   `audit_69_event_signal_watch` extension (or grep S1's
+   `signal-probe-correlated.log` for KeReleaseSemaphore + the relevant
+   ptr) to identify canary's semaphore handle ID, then run the same probe.
+
+## Classification
+
+NOT a loop-exit-branch divergence (neither engine exits).
+NOT a missing-thread / missing-spawn divergence (S2 closed that).
+NOT a wrong-handle-selection divergence (S3 confirmed args match).
+
+It IS a **semaphore-state divergence**: ours's NtWaitForMultipleObjectsEx
+keeps returning WAIT_OBJECT_0+1 (semaphore signaled) where canary's
+returns WAIT_TIMEOUT. The semaphore count is non-zero at wait-entry in
+ours; zero in canary.
+
+## Confidence flags
+
+| finding | confidence | reasoning |
+|---|---|---|
+| both loops never exit (B50 never fires) | HIGH | direct measurement |
+| ours r3=1 always at back-edge | HIGH | 91/91 captures direct measurement |
+| canary r3=0x102 mostly at back-edge | HIGH | 3/4 captures direct measurement |
+| handle[1] is NtCreateSemaphore w/ InitialCount=0, Max=0x7FFFFFFF | HIGH | mem-watch + disasm confirmed |
+| handle[0] is NtCreateEvent | HIGH | disasm confirmed at 0x824A9F18 |
+| ours handle slot values 0x104C, 0x1050 | HIGH | mem-watch confirmed |
+| no exit-branch divergence in matching iter | HIGH | exit branch never taken in either |
+| semaphore-state divergence root cause | MEDIUM-HIGH | r3 differs → wait kernel return differs → semaphore state must differ; haven't directly proved which (over-release vs no-decrement vs wrong-init) |
+| S5 path-1 (NtWaitForMultiple decrement bug) | MEDIUM | most likely culprit given kernel-side state divergence pattern, but other hypotheses still open |
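
The stop-event + work-semaphore WaitAny pattern and hypotheses (a) to (d) from the divergence analysis above can be illustrated with a toy model. This is a hedged sketch only: `ToyWait`, `run`, and their parameters are hypothetical names invented here, the model ignores real scheduling and timing, and it is not the implementation in `xenia-kernel/src/exports.rs` or in canary.

```python
WAIT_OBJECT_0 = 0x000
WAIT_TIMEOUT = 0x102


class ToyWait:
    """Toy WaitAny over [stop_event, work_semaphore]. Hypothetical, not xenia code."""

    def __init__(self, initial_count=0, decrement_on_acquire=True):
        self.stop_signaled = False
        self.count = initial_count
        self.decrement_on_acquire = decrement_on_acquire

    def release(self, n=1):
        """Model of NtReleaseSemaphore(handle[1], n)."""
        self.count += n

    def wait_any(self):
        """Return the r3 value the loop would see after one wait."""
        if self.stop_signaled:
            return WAIT_OBJECT_0          # handle[0]: stop event
        if self.count > 0:
            if self.decrement_on_acquire:
                self.count -= 1           # a successful acquire consumes one unit
            return WAIT_OBJECT_0 + 1      # handle[1]: work semaphore
        return WAIT_TIMEOUT               # stands in for the 16 ms timeout


def run(kernel, releases):
    """releases[i] units are released just before wait i; returns the r3 sequence."""
    results = []
    for n in releases:
        if n:
            kernel.release(n)
        results.append(kernel.wait_any())
    return results


balanced = run(ToyWait(), [1, 0, 0, 0])                                # canary-like
over_released = run(ToyWait(), [1, 1, 1, 1])                           # (a) / (d)
no_decrement = run(ToyWait(decrement_on_acquire=False), [1, 0, 0, 0])  # (b)
wrong_init = run(ToyWait(initial_count=4), [0, 0, 0, 0])               # (c)
print([hex(r) for r in balanced])
```

In this model the balanced case reproduces the canary-like pattern (one `0x1`, then `0x102`), while over-release, a missing decrement, and a non-zero initial count all yield `0x1` on every wait. That is consistent with the MEDIUM-HIGH rating above: the r3 sequence alone does not separate the hypotheses, so a direct probe of the semaphore count or the release rate is still needed.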
--- a/audit-runs/audit-069-wait-signal-producer/s4/sub_82450A68-disasm.txt
+++ b/audit-runs/audit-069-wait-signal-producer/s4/sub_82450A68-disasm.txt
@@ -0,0 +1,80 @@
+  0x82450a28:  mflr    r12
+  0x82450a2c:  stw     r12, -8(r1)
+  0x82450a30:  std     r31, -16(r1)
+  0x82450a34:  stwu    r1, -96(r1)
+  0x82450a38:  mr      r31, r3
+  0x82450a3c:  li      r4, 3
+  0x82450a40:  li      r3, -2
+  0x82450a44:  bl      0x824AA658
+  0x82450a48:  mr      r3, r31
+  0x82450a4c:  bl      0x82450A68
+  0x82450a50:  addi    r1, r1, 96
+  0x82450a54:  lwz     r12, -8(r1)
+  0x82450a58:  mtlr    r12
+  0x82450a5c:  ld      r31, -16(r1)
+  0x82450a60:  blr
+  0x82450a64:  .long   0x00000000
+  0x82450a68:  mflr    r12
+  0x82450a6c:  bl      0x825F0F88
+  0x82450a70:  stwu    r1, -128(r1)
+  0x82450a74:  mr      r31, r3
+  0x82450a78:  li      r6, 16
+  0x82450a7c:  li      r5, 0
+  0x82450a80:  addi    r4, r1, 80
+  0x82450a84:  li      r3, 2
+  0x82450a88:  lwz     r11, 88(r31)
+  0x82450a8c:  stw     r11, 80(r1)
+  0x82450a90:  lwz     r11, 92(r31)
+  0x82450a94:  stw     r11, 84(r1)
+  0x82450a98:  bl      0x824AB240
+  0x82450a9c:  cmplwi  cr6, r3, 0x0
+  0x82450aa0:  beq     cr6, 0x82450B50
+  0x82450aa4:  li      r29, 0
+  0x82450aa8:  li      r28, 4
+  0x82450aac:  lwz     r11, 212(r31)
+  0x82450ab0:  cntlzw  r11, r11
+  0x82450ab4:  extrwi  r11, r11, 1, 26
+  0x82450ab8:  cmplwi  cr6, r11, 0x0
+  0x82450abc:  bne     cr6, 0x82450AC8
+  0x82450ac0:  li      r4, 5
+  0x82450ac4:  b       0x82450B2C
+  0x82450ac8:  mr      r30, r29
+  0x82450acc:  addi    r11, r31, 112
+  0x82450ad0:  lwz     r10, 0(r11)
+  0x82450ad4:  cntlzw  r10, r10
+  0x82450ad8:  extrwi  r10, r10, 1, 26
+  0x82450adc:  cmplwi  cr6, r10, 0x0
+  0x82450ae0:  beq     cr6, 0x82450AF8
+  0x82450ae4:  addi    r30, r30, 1
+  0x82450ae8:  addi    r11, r11, 20
+  0x82450aec:  cmplwi  cr6, r30, 0x5
+  0x82450af0:  blt     cr6, 0x82450AD0
+  0x82450af4:  b       0x82450B34
+  0x82450af8:  lwz     r11, 224(r31)
+  0x82450afc:  cmplwi  cr6, r11, 0x0
+  0x82450b00:  beq     cr6, 0x82450B28
+  0x82450b04:  lwz     r11, 220(r31)
+  0x82450b08:  cmpw    cr6, r11, r30
+  0x82450b0c:  bge     cr6, 0x82450B28
+  0x82450b10:  bl      0x824AA830
+  0x82450b14:  lwz     r11, 224(r31)
+  0x82450b18:  cmplw   cr6, r3, r11
+  0x82450b1c:  blt     cr6, 0x82450B34
+  0x82450b20:  stw     r28, 220(r31)
+  0x82450b24:  stw     r29, 224(r31)
+  0x82450b28:  mr      r4, r30
+  0x82450b2c:  mr      r3, r31
+  0x82450b30:  bl      0x82450B68
+  0x82450b34:  li      r6, 16
+  0x82450b38:  li      r5, 0
+  0x82450b3c:  addi    r4, r1, 80
+  0x82450b40:  li      r3, 2
+  0x82450b44:  bl      0x824AB240
+  0x82450b48:  cmplwi  cr6, r3, 0x0
+  0x82450b4c:  bne     cr6, 0x82450AAC
+  0x82450b50:  li      r3, 0
+  0x82450b54:  addi    r1, r1, 128
+  0x82450b58:  b       0x825F0FD8
+  0x82450b5c:  .long   0x00000000
+  0x82450b60:  lwz     r18, 9792(r31)
+  0x82450b64:  lwz     r16, 13880(r14)
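
The `cntlzw` / `extrwi rX, rX, 1, 26` pair that appears at 0x82450ab0 and 0x82450ad4 in the listing above is a common PowerPC way to compute "word == 0" without a branch. The sketch below emulates the two instructions in Python to make the branch polarity explicit; the helper names are chosen here for illustration and the 32-bit big-endian bit numbering (bit 0 = most significant) is the one used by the extended mnemonic.

```python
def cntlzw(x):
    """Count leading zero bits of a 32-bit word (32 when x == 0)."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()


def extrwi(x, n, b):
    """Extract n bits starting at big-endian bit b (bit 0 = MSB), right-justified."""
    x &= 0xFFFFFFFF
    return (x >> (32 - b - n)) & ((1 << n) - 1)


def is_zero_idiom(x):
    """cntlzw r, r ; extrwi r, r, 1, 26  ->  1 if x == 0 else 0."""
    # cntlzw only reaches 32 (binary 100000) for x == 0, and big-endian
    # bit 26 is exactly the bit with weight 32.
    return extrwi(cntlzw(x), 1, 26)


def target_after_abc(word_at_212):
    """bne cr6 at 0x82450abc: taken when the idiom result is non-zero."""
    return 0x82450AC8 if is_zero_idiom(word_at_212) != 0 else 0x82450AC0


def target_after_ae0(queue_word):
    """beq cr6 at 0x82450ae0: taken when the idiom result is zero."""
    return 0x82450AF8 if is_zero_idiom(queue_word) == 0 else 0x82450AE4


for value in (0, 1, 0x80000000):
    print(hex(value), is_zero_idiom(value), hex(target_after_abc(value)))
```

Read this way, the `bne` at 0x82450abc goes to the search path at 0x82450ac8 when the word at 212(r31) is zero and falls through to the `r4=5` dispatch when it is non-zero, while the `beq` at 0x82450ae0 reaches 0x82450af8 when the probed word is non-zero.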
--- a/audit-runs/audit-069-wait-signal-producer/s5/sub_82450B68-disasm.txt
+++ b/audit-runs/audit-069-wait-signal-producer/s5/sub_82450B68-disasm.txt
@@ -0,0 +1,202 @@
+Disassembly from requested address 0x82450b68 (200 instructions):
+
+  0x82450b68:  mflr    r12
+  0x82450b6c:  bl      0x825F0F74
+  0x82450b70:  subi    r31, r1, 176
+  0x82450b74:  stwu    r1, -176(r1)
+  0x82450b78:  mr      r29, r4
+  0x82450b7c:  mr      r27, r3
+  0x82450b80:  cmpwi   cr6, r29, 5
+  0x82450b84:  bne     cr6, 0x82450B94
+  0x82450b88:  addi    r28, r27, 196
+  0x82450b8c:  addi    r26, r27, 28
+  0x82450b90:  b       0x82450BAC
+  0x82450b94:  slwi    r11, r29, 2
+  0x82450b98:  mr      r26, r27
+  0x82450b9c:  add     r11, r29, r11
+  0x82450ba0:  slwi    r11, r11, 2
+  0x82450ba4:  add     r11, r11, r27
+  0x82450ba8:  addi    r28, r11, 96
+  0x82450bac:  addi    r23, r27, 56
+  0x82450bb0:  mr      r3, r23
+  0x82450bb4:  stw     r23, 84(r31)
+  0x82450bb8:  bl      0x8284DCFC
+  0x82450bbc:  mr      r3, r26
+  0x82450bc0:  bl      0x8284DCFC
+  0x82450bc4:  lwz     r7, 16(r28)
+  0x82450bc8:  cntlzw  r11, r7
+  0x82450bcc:  extrwi  r11, r11, 1, 26
+  0x82450bd0:  cmplwi  cr6, r11, 0x0
+  0x82450bd4:  beq     cr6, 0x82450BEC
+  0x82450bd8:  mr      r3, r26
+  0x82450bdc:  bl      0x8284DD0C
+  0x82450be0:  mr      r3, r23
+  0x82450be4:  bl      0x8284DD0C
+  0x82450be8:  b       0x82450EE8
+  0x82450bec:  lwz     r11, 12(r28)
+  0x82450bf0:  lwz     r9, 8(r28)
+  0x82450bf4:  srwi    r10, r11, 2
+  0x82450bf8:  clrlwi  r8, r11, 30
+  0x82450bfc:  cmplw   cr6, r9, r10
+  0x82450c00:  bgt     cr6, 0x82450C08
+  0x82450c04:  sub     r10, r10, r9
+  0x82450c08:  lwz     r9, 4(r28)
+  0x82450c0c:  slwi    r10, r10, 2
+  0x82450c10:  slwi    r8, r8, 2
+  0x82450c14:  lwz     r6, 8(r28)
+  0x82450c18:  addi    r11, r11, 1
+  0x82450c1c:  slwi    r6, r6, 2
+  0x82450c20:  li      r24, 0
+  0x82450c24:  lwzx    r10, r10, r9
+  0x82450c28:  cmplw   cr6, r6, r11
+  0x82450c2c:  lwzx    r30, r10, r8
+  0x82450c30:  stw     r11, 12(r28)
+  0x82450c34:  stw     r30, 80(r31)
+  0x82450c38:  bgt     cr6, 0x82450C40
+  0x82450c3c:  stw     r24, 12(r28)
+  0x82450c40:  subic.  r11, r7, 1
+  0x82450c44:  stw     r11, 16(r28)
+  0x82450c48:  bne     0x82450C50
+  0x82450c4c:  stw     r24, 12(r28)
+  0x82450c50:  addi    r25, r27, 28
+  0x82450c54:  mr      r3, r25
+  0x82450c58:  bl      0x8284DCFC
+  0x82450c5c:  mr      r3, r25
+  0x82450c60:  stw     r30, 216(r27)
+  0x82450c64:  bl      0x8284DD0C
+  0x82450c68:  mr      r3, r26
+  0x82450c6c:  bl      0x8284DD0C
+  0x82450c70:  lwz     r11, 28(r30)
+  0x82450c74:  clrlwi  r11, r11, 31
+  0x82450c78:  cmplwi  cr6, r11, 0x0
+  0x82450c7c:  bne     cr6, 0x82450D30
+  0x82450c80:  lwz     r11, 8(r30)
+  0x82450c84:  cmplwi  cr6, r11, 0x1
+  0x82450c88:  blt     cr6, 0x82450CE4
+  0x82450c8c:  bne     cr6, 0x82450D3C
+  0x82450c90:  lwz     r11, 28(r30)
+  0x82450c94:  rlwinm  r11, r11, 0, 29, 29
+  0x82450c98:  cmplwi  cr6, r11, 0x0
+  0x82450c9c:  beq     cr6, 0x82450CB0
+  0x82450ca0:  mr      r4, r30
+  0x82450ca4:  mr      r3, r27
+  0x82450ca8:  bl      0x824510E0
+  0x82450cac:  b       0x82450CBC
+  0x82450cb0:  mr      r4, r30
+  0x82450cb4:  mr      r3, r27
+  0x82450cb8:  bl      0x824517B0
+  0x82450cbc:  stw     r29, 220(r27)
+  0x82450cc0:  bl      0x824AA830
+  0x82450cc4:  mr      r11, r3
+  0x82450cc8:  lwz     r3, 92(r27)
+  0x82450ccc:  li      r5, 0
+  0x82450cd0:  addi    r11, r11, 66
+  0x82450cd4:  li      r4, 1
+  0x82450cd8:  stw     r11, 224(r27)
+  0x82450cdc:  bl      0x824AB158
+  0x82450ce0:  b       0x82450D3C
+  0x82450ce4:  lwz     r11, 28(r30)
+  0x82450ce8:  mr      r4, r30
+  0x82450cec:  mr      r3, r27
+  0x82450cf0:  rlwinm  r11, r11, 0, 29, 29
+  0x82450cf4:  cmplwi  cr6, r11, 0x0
+  0x82450cf8:  beq     cr6, 0x82450D04
+  0x82450cfc:  bl      0x82450F68
+  0x82450d00:  b       0x82450D08
+  0x82450d04:  bl      0x82451238
+  0x82450d08:  stw     r29, 220(r27)
+  0x82450d0c:  bl      0x824AA830
+  0x82450d10:  mr      r11, r3
+  0x82450d14:  lwz     r3, 92(r27)
+  0x82450d18:  li      r5, 0
+  0x82450d1c:  addi    r11, r11, 66
+  0x82450d20:  li      r4, 1
+  0x82450d24:  stw     r11, 224(r27)
+  0x82450d28:  bl      0x824AB158
+  0x82450d2c:  b       0x82450D3C
+  0x82450d30:  lwz     r11, 28(r30)
+  0x82450d34:  ori     r11, r11, 0x2
+  0x82450d38:  stw     r11, 28(r30)
+  0x82450d3c:  lwz     r11, 8(r30)
+  0x82450d40:  mr      r29, r24
+  0x82450d44:  cmpwi   cr6, r11, 2
+  0x82450d48:  blt     cr6, 0x82450E08
+  0x82450d4c:  cmpwi   cr6, r11, 3
+  0x82450d50:  ble     cr6, 0x82450DA0
+  0x82450d54:  cmpwi   cr6, r11, 4
+  0x82450d58:  bne     cr6, 0x82450E08
+  0x82450d5c:  lwz     r11, 28(r30)
+  0x82450d60:  rlwinm  r11, r11, 0, 29, 29
+  0x82450d64:  cmplwi  cr6, r11, 0x0
+  0x82450d68:  bne     cr6, 0x82450D98
+  0x82450d6c:  lwz     r29, 36(r30)
+  0x82450d70:  mr      r3, r29
+  0x82450d74:  lwz     r11, 0(r29)
+  0x82450d78:  lwz     r11, 4(r11)
+  0x82450d7c:  mtctr   r11
+  0x82450d80:  bctrl
+  0x82450d84:  clrlwi  r11, r3, 24
+  0x82450d88:  cmplwi  cr6, r11, 0x0
+  0x82450d8c:  beq     cr6, 0x82450D98
+  0x82450d90:  mr      r3, r29
+  0x82450d94:  bl      0x8244FB38
+  0x82450d98:  li      r29, 1
+  0x82450d9c:  b       0x82450E28
+  0x82450da0:  addi    r3, r30, 40
+  0x82450da4:  bl      0x82451DB8
+  0x82450da8:  lwz     r11, 32(r30)
+  0x82450dac:  cmplwi  cr6, r11, 0x0
+  0x82450db0:  beq     cr6, 0x82450DCC
+  0x82450db4:  rlwinm  r11, r11, 0, 0, 31
+  0x82450db8:  lwz     r10, 4(r30)
+  0x82450dbc:  lwz     r11, 4(r11)
+  0x82450dc0:  cmplw   cr6, r10, r11
+  0x82450dc4:  li      r11, 1
+  0x82450dc8:  beq     cr6, 0x82450DD0
+  0x82450dcc:  mr      r11, r24
+  0x82450dd0:  clrlwi  r11, r11, 24
+  0x82450dd4:  cmplwi  cr6, r11, 0x0
+  0x82450dd8:  beq     cr6, 0x82450E00
+  0x82450ddc:  lwz     r4, 8(r30)
+  0x82450de0:  lwz     r5, 0(r30)
+  0x82450de4:  lwz     r3, 32(r30)
+  0x82450de8:  cmpwi   cr6, r4, 1
+  0x82450dec:  ble     cr6, 0x82450DFC
+  0x82450df0:  bl      0x8245D9D8
+  0x82450df4:  li      r29, 1
+  0x82450df8:  b       0x82450E28
+  0x82450dfc:  stw     r4, 8(r3)
+  0x82450e00:  li      r29, 1
+  0x82450e04:  b       0x82450E28
+  0x82450e08:  mr      r3, r26
+  0x82450e0c:  stw     r26, 88(r31)
+  0x82450e10:  bl      0x8284DCFC
+  0x82450e14:  addi    r4, r31, 80
+  0x82450e18:  mr      r3, r28
+  0x82450e1c:  bl      0x823232C0
+  0x82450e20:  mr      r3, r26
+  0x82450e24:  bl      0x8284DD0C
+  0x82450e28:  clrlwi  r11, r29, 24
+  0x82450e2c:  cmplwi  cr6, r11, 0x0
+  0x82450e30:  beq     cr6, 0x82450ECC
+  0x82450e34:  lwz     r11, 28(r30)
+  0x82450e38:  rlwinm  r11, r11, 0, 30, 30
+  0x82450e3c:  cmplwi  cr6, r11, 0x0
+  0x82450e40:  beq     cr6, 0x82450E68
+  0x82450e44:  mr      r3, r26
+  0x82450e48:  stw     r26, 88(r31)
+  0x82450e4c:  bl      0x8284DCFC
+  0x82450e50:  addi    r4, r31, 80
+  0x82450e54:  mr      r3, r28
+  0x82450e58:  bl      0x823232C0
+  0x82450e5c:  mr      r3, r26
+  0x82450e60:  bl      0x8284DD0C
+  0x82450e64:  b       0x82450ECC
+  0x82450e68:  lwz     r11, 40(r30)
+  0x82450e6c:  cmplwi  cr6, r11, 0x0
+  0x82450e70:  beq     cr6, 0x82450EA4
+  0x82450e74:  rlwinm  r3, r11, 0, 0, 31
+  0x82450e78:  bl      0x82458A70
+  0x82450e7c:  lwz     r29, 40(r30)
+  0x82450e80:  lwz     r3, 0(r29)
+  0x82450e84:  bl      0x824583E8
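
The queue selection at 0x82450b80..0x82450ba8 and the dequeue at 0x82450bec..0x82450c4c in the listing above can be paraphrased in Python. This is a hedged reading of the disassembly, not recovered source: the field names (`blocks`, `map_size`, `offset`, `count`) are labels assigned here to the words at +4, +8, +12 and +16 of the 20-byte record, and the interpretation as a block-based queue with 4 entries per block is an inference from the index arithmetic.

```python
from dataclasses import dataclass

QUEUE_BASE = 96     # addi r28, r11, 96
QUEUE_STRIDE = 20   # (r4 + r4 * 4) * 4
COUNT_FIELD = 16    # lwz r7, 16(r28)


def queue_offset(index):
    """Offset of queue `index` from the ctx pointer held in r27."""
    return QUEUE_BASE + QUEUE_STRIDE * index


@dataclass
class ToyQueue:
    blocks: list      # word at +4: table of 4-entry blocks
    map_size: int     # word at +8
    offset: int = 0   # word at +12
    count: int = 0    # word at +16


def pop_front(q):
    """Paraphrase of 0x82450bec..0x82450c4c; the caller has checked count != 0."""
    block, slot = q.offset >> 2, q.offset & 3
    if q.map_size <= block:          # cmplw r9, r10 ; bgt skips the subtract
        block -= q.map_size
    item = q.blocks[block][slot]
    q.offset += 1
    if q.map_size * 4 <= q.offset:   # wrap the running offset
        q.offset = 0
    q.count -= 1
    if q.count == 0:                 # an empty queue resets its offset
        q.offset = 0
    return item


q = ToyQueue(blocks=[["a", "b", "c", "d"], ["e", None, None, None]],
             map_size=2, offset=3, count=2)
print(pop_front(q), pop_front(q), q.offset, q.count)
```

Under this reading, `queue_offset(5)` equals 196, which is the base the `r4 == 5` path computes directly, and `queue_offset(5) + COUNT_FIELD` equals 212, the word the wait loop tests before dispatching. Likewise `queue_offset(0) + COUNT_FIELD` equals 112, the starting address of the loop's five-entry search.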
--- a/audit-runs/audit-069-wait-signal-producer/writer-report-v2.md
+++ b/audit-runs/audit-069-wait-signal-producer/writer-report-v2.md
@@ -0,0 +1,192 @@
+# AUDIT-069 Session 2 — writer report v2
+
+Date: 2026-05-20
+xenia-rs HEAD: `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` (UNCHANGED from S1)
+`git diff HEAD | sha256sum`: `ed30fd526643918f67311caff0a10d1346d73fd0c0323e02477883cf5ff20357` (UNCHANGED from S1 end)
+No canary instrumentation added this session.
+
+## Headline
+
+**S1's framing is FALSIFIED.**  ours does NOT lack a "canary-tid=10
+equivalent" thread. The spawn chain executes identically:
+
+  main (ours tid=1) → sub_8244FEA8 → sub_8244FF50
+                    → ExCreateThread(entry=0x82450A28, ctx=0x828F3B68)
+                    → ours tid=5 starts
+                    → sub_82450A28 (1×) → sub_82450A68 (1×)
+                    → γ-signaler family (sub_8245D9D8 6×, sub_8245DA78 1×, sub_8245DB40 75×)
+
+  This is bit-equivalent to canary's chain, modulo the tid label
+  (canary calls it tid=10, ours calls it tid=5 — same entry, same ctx,
+  same dispatch loop, same γ-signaler family fires from inside it).
+
+  The signaler spawn-chain is NOT the bug. S1's "the bug is at the
+  thread-spawn layer" hypothesis is wrong.
+
+## Spawn chain (DB-derived, READ-ONLY DuckDB)
+
+| Fn | callers in DB | role |
+|---|---|---|
+| 0x82450A28 | 1 ref-edge from 0x8244FFF8 (sub_8244FF50+0xA8) | thread entry (data ptr only) |
+| 0x8244FF50 | 1 call-edge from 0x8244FEE8 (sub_8244FEA8+0x40) | ExCreateThread caller |
+| 0x8244FEA8 | 11 call-edges (10 unique callers across sub_821A5150, sub_821CB968, sub_821CC2E8, sub_821D2850, sub_82237EC8, sub_8225EE20, sub_822E0350, sub_824528A8, sub_82452DC0 (2×), sub_8245E528) | spawn helper |
+
+## Per-PC fire counts (ours-cold, 1.5B instr, fresh today)
+
+| PC | symbol | fires | tid |
+|---|---|---|---|
+| 0x8244FEA8 | sub_8244FEA8 (spawn helper) | 7 | 1 |
+| 0x8244FF50 | sub_8244FF50 (ExCreateThread caller) | 1 | 1 |
+| 0x82450A28 | sub_82450A28 (thread entry) | 1 | 5 |
+| 0x82450A68 | sub_82450A68 (worker dispatch loop) | 1 | 5 |
+| 0x8245D9D8 | γ-signaler D | 6 | 5 |
+| 0x8245DA78 | γ-signaler D-B | 1 | 5 |
+| 0x8245DB40 | γ-signaler D-NEW | 75 | 5 |
+
+Spawn event log confirms `ExCreateThread: tid=5 handle=0x1050 entry=0x82450a28 start_ctx=0x828f3b68`.
+Total `kernel.calls{name=ExCreateThread} = 10`.
+
+## Comparison with canary (S1 data — fresh today, not stale)
+
+| metric | canary | ours |
+|---|---|---|
+| thread with entry=0x82450A28 | tid=10 | tid=5 |
+| start_ctx | 0x828F3B68 | 0x828F3B68 |
+| γ-D family signaler firings | all on tid=10 | all on tid=5 |
+| NtSetEvent fires from γ-D (via wrapper 0x824AA2F0) | confirmed | confirmed |
+
+The spawn chain and γ-signaler invocation match. The only divergence at the
+signaler call site is **which handle gets signaled**, not whether the
+signaler runs.
+
+## Divergence point (parent fires, child also fires)
+
+NONE — every node in the spawn chain fires in ours. The S1-prescribed
+"first ancestor that fires while child does not" never materialises because
+the entire chain is reached identically.
+
+The actual divergence is downstream of the spawn-chain — at the
+**handle-selection** step inside the γ-signaler family, per AUDIT-062's
+prior finding ("ours's γ-signalers signal WRONG handles — neighbors of the
+wedge handle, not the wedge itself").
+
+## Gate condition
+
+There is no gate that ours fails. The control flow reaches the γ-signaler
+and invokes the NtSetEvent wrapper (`sub_824AA2F0`) with bit-identical
+control flow. The argument to NtSetEvent (the handle) is the
+divergent term.
+
+In the AUDIT-062 archive ours-ntset.jsonl, the γ-D signaler on ours tid=5
+calls NtSetEvent on handles `0x103C`, `0x1068`, `0x106C`, `0x1094`, ...
+These are guest-side handle slots that the *waiter* is NOT waiting on.
+
+Per S1, canary's wedge waiters (tid=17, tid=26) wait on `F80000A4` and
+`F8000110`. Note that canary's handles are *pseudo-handles* (high-bit
+encoded), while ours's slot allocator hands out normal `0x10xx` IDs —
+a known cross-engine handle convention mismatch already documented
+in AUDIT-019/043/062.
+
+The semantic question is therefore: **what does the producer compute as
+the "next handle to signal", and is the computation reading
+a different value of the bookkeeping struct in ours vs canary?**
+This is the question AUDIT-062 hit and parked; it must be re-opened
+now that S1 has clarified the producer thread is reached identically.
+
+## ours-side analog status
+
+The relevant kernel handlers are:
+
+- `NtSetEvent` — ours `xenia-kernel/src/exports.rs` is per-AUDIT-062 archive
+  bit-equivalent to canary in semantics (signals the event, schedules wakeup).
+  Returns SUCCESS in both.
+
+- `ExCreateThread` — ours bit-equivalent (S2 spawn matches canary trajectory
+  ctx + entry + suspended flag).
+
+- `xeKeWaitForSingleObject` (wedge wait at 0x821CB1DC) — ours behaviour
+  matches per AUDIT-049/065 prior work; the WAIT itself is fine, what
+  remains broken is the signaler picking the right handle on tid=5.
+
+  Net: NO kernel handler bug. The divergence is **guest-state computed
+  inside the γ-signaler family at sub_8245D9D8 / sub_8245DA78 /
+  sub_8245DB40** — i.e. data that lives in the queue/list dispatched
+  by sub_82450A68.
+
+## Reading-error #28 reclassification
+
+S1 inadvertently committed the same class of error documented as #28 in
+prior audit memory: "treating per-engine tid label numerically across
+engines without a tid-mapping translation." S1 used canary's "tid=10"
+verbatim and AUDIT-062's "tid=10: 0 fires" verbatim, concluding "ours's
+thread set lacks the canary-tid=10 equivalent." In reality the same
+guest thread exists on both, with renumbered host-side tid labels.
+
+The correct cross-engine identity is `(entry_pc, start_ctx)`, not the
+tid integer. S2 re-validates by `entry=0x82450a28 ∧ ctx=0x828f3b68`,
+which uniquely identifies the spawn on both engines.
+
+Do NOT register a new reading-error #; this is the existing #28 surface.
+
+## Session 3 recommendation (refined)
+
+Drop the spawn-chain investigation entirely. The producer thread runs.
+
+**Path A (RECOMMENDED, ~80 LOC ours-only)**: build a probe of the
+**handle-passed-to-NtSetEvent** on tid=5 (ours) inside the γ-signaler
+PCs, paired with the symmetric `audit_69_event_signal_watch` capture
+from S1 in canary. Compare the *sequence of handle IDs* per signaler
+invocation. The first mismatch identifies the guest-state divergence
+that drives wrong-handle selection.
+
+Plumbing path: extend `--lr-trace` in ours (`crates/xenia-app/src/main.rs:233-243`)
+to also capture an `r3` snapshot at multiple PCs, matching canary's
+audit_69 wrapper-entry capture. The register capture itself already exists
+(M12 lr_trace lists pc/tid/hw/cycle/r3/r4/r5/r6/lr), so probe ours's
+`0x824AA2F0` and `0x824AAF50` entry PCs.
+
+**Path B (~50 LOC diff-tool)**: extend the diff-events JSONL absorber to
+treat the canary→ours handle-ID mapping as a runtime-discovered alias
+when the underlying dispatcher pointer matches. Doesn't fix the bug,
+absorbs the symptom.
+
+**Path C (root-cause, larger)**: walk sub_82450A68 dispatch loop body
+disassembly + AUDIT-062 archive to identify which guest-memory struct
+holds the queue of "handles to signal." The wrong handles on ours mean
+this struct gets populated wrong somewhere upstream of tid=5's dispatch
+loop — likely from sub_8244FEA8's 7 fires (which call sites enqueue
+work, and what data is enqueued).
+
+LOC budget for S3: Path A ~80, Path B ~50, Path C unknown (~200+).
+
+## Cascade A/B/C/D
+
+- **A** (DB-derived spawn chain): PASS (11 call-edges into FEA8, 1 unique call edge to FF50).
+- **B** (per-fn fire counts ours+canary): PASS (ours fresh, canary from S1 fresh).
+- **C** (divergence-point identification): N/A — no divergence in spawn chain;
+  S1 framing falsified. Re-direction recommended.
+- **D** (kernel-handler bit-equivalence check): PASS (NtSetEvent / ExCreateThread
+  per AUDIT-062 archive; no new kernel bug detected).
+
+Net: 3/4 PASS, 1/4 N/A (because the postulated divergence wasn't there).
+
+## Discipline
+
+- xenia-rs HEAD UNCHANGED (sha256 of `git diff HEAD` matches S1 end).
+- No canary instrumentation added this session — S1's data is fresh.
+- ours-rs ran with `--ctor-probe` (read-only, lockstep-digest-unaffected
+  flag already in main.rs:194).
+- No source modifications to ours.
+- ours-rs cache (none on this host); no canary run, no canary cache to wipe.
+
+## Artifacts
+
+```
+audit-runs/audit-069-wait-signal-producer/
+  session-2-spawn-walk.log    (combined probe + DB queries + fires table)
+  writer-report-v2.md         (this file)
+  s2/ours-probe.stdout        (780 lines, 91 CTOR-PROBE records)
+  s2/ours-probe.stderr        (241 lines, all spawn events + summary)
+```
+
+No `fix-canary-v2.diff` (no canary instrumentation added).
--- a/audit-runs/audit-069-wait-signal-producer/writer-report-v3.md
+++ b/audit-runs/audit-069-wait-signal-producer/writer-report-v3.md
@@ -0,0 +1,229 @@
+# AUDIT-069 Session 3 — writer report v3
+
+Date: 2026-05-20
+xenia-rs HEAD: `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` (UNCHANGED from S1/S2)
+`git diff HEAD | sha256sum`: `ed30fd526643918f67311caff0a10d1346d73fd0c0323e02477883cf5ff20357`
+   (UNCHANGED at start AND end of S3)
+No canary instrumentation added this session.
+No ours source modifications. `--lr-trace` is a runtime flag (main.rs:233-243).
+
+## Headline (HIGH confidence, direct measurement)
+
+ours's tid=5 (= canary tid=10 by entry/ctx identity) fires the γ-signaler
+family from the SAME guest LRs as canary — but **only 81 times where
+canary fires 492 times (16%)**. This is NOT a "wrong-handle" bug — it is
+a **producer-loop underrun**. The dispatch loop in `sub_82450A68` exits
+early or starves; consumer threads then block on events that ours never
+gets to signal.
+
+S2's "the producer fires identically, just selects wrong handles" framing
+is REFINED, not falsified: the producer reaches the wrappers via the
+EXACT same call sites but completes ~6× fewer iterations (81 vs 492).
+
+## Method
+
+Read-only `--lr-trace=0x824AA2F0,0x824AAF50` on cold ours boot, 1.5B
+instructions / 47 s wallclock (and re-validated at 5B / 159s — same 81
+fires, same handle universe, same import_calls=39290 → no new work after
+the producer's initial burst). JSONL output to s3/ours-lr-trace.jsonl.
+Cross-engine paired against S1's `signal-probe-correlated.log` (canary
+data, fresh 2026-05-20).
+
+## Per-LR fire counts
+
+| caller LR | symbol | wrapper PC | canary tid=10 | ours tid=5 | ratio |
+|---|---|---|---:|---:|---:|
+| 0x8245DA44 | γ-D-A (sub_8245D9D8) | 0x824AA2F0 | 23 | 5 | 22% |
+| 0x8245DB08 | γ-D-B (sub_8245DA78) | 0x824AA2F0 | 8 | 1 | 12% |
+| 0x8245DC5C | γ-DB40 (sub_8245DB40 NEW) | 0x824AAF50 | 461 | 75 | 16% |
+| **TOTAL** | | | **492** | **81** | **16%** |
+
+ours runs the same producer code, but the loop terminates early. S2's per-PC
+fire-count table also shows ours = 6/1/75 for the three γ-fns — this S3
+data agrees with S2 for the wrapper-entry side too.
+
+## Handle namespaces are incomparable by raw ID
+
+- canary uses `XEvent::native_object()` pseudo-handles `F8000xxx` (high bit
+  set, encodes a synthetic ID assigned by `XObject::GetNativeObject`).
+- ours uses normal slot IDs `0x10xx` from the handle-slot allocator.
+
+Comparison must be by (a) **position in the per-LR sequence** and (b)
+**call args** (r5: size or signal-kind).
+
+## Position-0 args MATCH (HIGH confidence, direct measurement)
+
+| LR | r5 (size / kind) | matches? |
+|---|---|---|
+| 0x8245DC5C | ours=0x800 / canary=0x800 | YES |
+| 0x8245DA44 | ours=2 (Set) / canary=2 | YES |
+| 0x8245DB08 | ours=2 / canary=2 | YES |
+
+r4 (buffer/ctx pointers) DIFFERS in absolute address (different memory
+layouts) but has the same type shape in both engines. The first invocation of each
+signaler is structurally identical. The divergence is in COUNT of
+subsequent loop iterations, not in handle-selection of position-0.
+
+See `s3/handle-sequence-diff.md` for full position-aligned table.
+
+## γ-DB40 signal-target distribution (the 461-vs-75 case)
+
+| canary handle | count | ours handle | count |
+|---|---:|---|---:|
+| F80000C8 | 229 | 0x000010E0 | 69 |
+| F80000DC | 79 | 0x00001040 | 1 |
+| F8000078 | 71 | 0x0000105C | 1 |
+| F80000BC | 39 | 0x00001098 | 1 |
+| F800012C | 28 | 0x000010AC | 1 |
+| F80000B4 | 7 | 0x000010D0 | 1 |
+| F8000044 | 4 | 0x0000121C | 1 |
+
+Shape: both have one dominant handle that absorbs ~half the signals
+(canary 229/461=50%, ours 69/75=92%) and a long tail. ours's tail is
+truncated — only 7 distinct handles in γ-DB40 vs canary's 10+.
+
+This is consistent with **the producer enqueues the same kinds of work
+items but the upstream feeder under-fires**, so the dominant work-item
+(handle `0x10E0` ≈ `F80000C8` by position) gets some iterations,
+the next-most-common items get truncated to 1×, and the long tail
+(canary's `F80000DC` 79× / `F8000078` 71×) is mostly missing.
+
+## Wedge handle status (HIGH confidence)
+
+AUDIT-062 archive recorded ours wedge handles `0x12AC` and `0x12B8` with
+`<NO_SIGNALS_DESPITE_WAITS>` annotation in a deeper-boot run.
+
+In S3's lr-trace: **handle 0x12AC count = 0, handle 0x12B8 count = 0**.
+**No handle > 0x121C appears in tid=5's signal trace at all.**
+
+Max handle observed in this run: 0x121C (cache:/aab216c3 NtCreateFile).
+
+The wedge handles are NEVER allocated in this 5B-instruction run, because
+boot terminates **before** the trajectory that would create them. The
+producer fires 81 times, then tid=5 goes quiet; the import_call counter
+freezes at 39,290; `--halt-on-deadlock` does NOT trigger (consumers wait
+on existing events that were never the wedge in this run).
+
+**This is a stronger statement than "the wedge handle is never signaled":
+the wedge handle is never even CREATED, because the boot never reaches
+the point of creating it.** ours's boot trajectory is truncated by the
+producer underrun upstream.
+
+## Classification: producer-loop underrun (HIGH confidence)
+
+NOT a race (timing-dependent), NOT a wrong-handle bug (the args at
+matching positions are structurally identical), NOT a missing-kernel-
+handler bug (the signals that DO fire pass through bit-equivalent
+wrappers).
+
+It is **producer-loop underrun**: sub_82450A68's dispatch loop iterates
+fewer times. Either:
+1. The work queue (read from guest memory by sub_82450A68) is populated
+   with fewer items by some upstream feeder.
+2. The dispatch loop's exit condition trips early.
+3. The thread blocks on a dispatcher event that never gets re-signaled.
+
+Mechanism candidates (S4 to discriminate):
+- **upstream feeder**: callers of sub_8244FEA8 (11 sites in DB) — one
+  enqueues less work in ours. Most likely the audio cluster
+  (sub_8225EE20) or sub_82452DC0 (2 calls) given they relate to APUBUG-
+  PRODUCER-001 territory.
+- **dispatch loop exit**: the loop reads a flag from the dispatcher
+  struct at `0x828F3B68 + offset`; a state divergence there exits early.
+- **inner KeWait at 0x824AB240** (mentioned in S1 spawn-chain notes):
+  if this wait times out / fails differently in ours, the loop exits.
+
+## Reading-error registry
+
+NO new reading-error class needed. This session confirms one existing
+class:
+
+- **#28 cross-engine tid label mismatch** — used correctly here
+  (compared by entry/ctx, not by tid integer).
+- **AUDIT-062 "wrong handles" framing** is a SYMPTOM of the producer
+  underrun (fewer signals → some handles signaled, others starved),
+  not a separate bug.
+
+## Cascade
+
+- **A** (capture ours per-PC signaler firings): PASS (137 records, 81 on tid=5).
+- **B** (parallel canary sequence from S1): PASS (492 records on tid=10).
+- **C** (first-mismatch identification): PASS — divergence is in iteration
+  count, not in handle-at-position-0. Position-0 args match structurally.
+- **D** (race-vs-missing-signal classification): PASS — neither pure race
+  nor pure missing-signal. It is **producer-loop underrun** (boot doesn't
+  reach the wedge-handle-creating subsystem).
+
+Net 4/4 PASS.
+
+## S4 recommendation (refined)
+
+**Drop the "wrong-handles-from-γ-signaler" framing.** Focus upstream on
+WHY tid=5's dispatch loop runs ~6× fewer iterations.
+
+### Path A (RECOMMENDED, ~30 LOC ours-only diagnostic, no source mod)
+
+Use `--lr-trace=0x82450A68` (the dispatch-loop body PC) plus the existing
+`--branch-probe` to see WHERE in the loop body ours exits. If the loop has
+a backward branch at offset X and ours's last fire is at offset Y < X, the
+loop is exiting early. Pair with the inner `bl 0x824AB240` (KeWaitForMultipleObjects)
+to see if the loop blocks on a wait that returns differently than canary.
+
+### Path B (~80 LOC ours-only) — feeder-side capture
+
+`--lr-trace=0x8244FEA8` on cold ours AND canary. The spawn helper has 11
+static call sites in the DB-derived caller list; at runtime it fires 7× in S2's
+ours run. Pair r3/r4 (the spawned thread's start_ctx args) with canary's
+equivalent. ours may be missing one or more enqueues — the missing
+enqueue is the upstream root cause.
+
+### Path C (~250 LOC, larger) — work-queue struct disassembly
+
+Disassemble sub_82450A68 body, identify the work-queue struct it reads
+from (likely at `[r29 + N]` where r29 = start_ctx 0x828F3B68 or a derived
+pointer). Watch the struct with `--mem-watch` to identify the populator
+(which fn writes the queue items). Trace that populator upstream.
+
+LOC budget for S4: Path A ~30, Path B ~80, Path C ~250.
+
+**Path A first**: it gives the precise exit condition (loop-body branch vs
+inner-wait timeout) at the lowest LOC cost of the three paths.
+
+## Discipline
+
+- xenia-rs HEAD UNCHANGED (sha256 of `git diff HEAD` matches S1/S2 end).
+- No source modifications.
+- `--lr-trace` is read-only, lockstep-digest-unaffected (per state.rs:1463-1500).
+- No canary run this session (S1's data is fresh).
+- No canary cache to wipe (no canary run).
+- ours runs cold (no cache pre-population).
+
+## Artifacts
+
+```
+audit-runs/audit-069-wait-signal-producer/s3/
+  ours-lr-trace.jsonl              (137 records, both PCs, all tids)
+  ours-lr-trace.stderr             (run log + counters)
+  ours-lr-trace.stdout             (empty under --quiet)
+  ours-lr-trace-824AA2F0.log       (60 records, NtSetEvent wrapper)
+  ours-lr-trace-824AAF50.log       (77 records, Ke wrapper)
+  ours-lr-trace-extended.{jsonl,stderr,stdout}  (5B-instr re-validation: same 81 fires)
+  handle-sequence-diff.md          (parallel comparison + first-mismatch table)
+  writer-report-v3.md              (this file)
+```
+
+No fresh canary run was needed — S1's `signal-probe-correlated.log`
+(154,187 lines) carries all canary signal-probe data.
+
+## Summary of S1 → S2 → S3 progression
+
+- **S1**: identified canary's tid=10 as the signaler; claimed ours lacks
+  this thread (FALSIFIED by S2).
+- **S2**: spawn-chain runs identically on ours tid=5; refined to "wrong-
+  handle selection" downstream (REFINED by S3).
+- **S3**: ours runs identical PC/LR chain but with ~6× fewer iterations.
+  Loop underrun classification. Wedge handle never even gets created in
+  ours's truncated boot trajectory.
+
+The bug is **upstream of the γ-signaler**: in WHAT the dispatch loop
+reads from the work queue, or in the loop's exit condition.
--- a/audit-runs/audit-069-wait-signal-producer/writer-report-v4.md
+++ b/audit-runs/audit-069-wait-signal-producer/writer-report-v4.md
@@ -0,0 +1,357 @@
+# AUDIT-069 Session 4 — writer report v4
+
+Date: 2026-05-20
+xenia-rs HEAD: `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` (UNCHANGED from S1/S2/S3)
+`git diff HEAD | sha256sum`: `ed30fd526643918f67311caff0a10d1346d73fd0c0323e02477883cf5ff20357`
+   (UNCHANGED at start AND end of S4)
+No ours source modifications. No canary instrumentation added.
+Canary `audit_61_branch_probe_pcs` cvar used (pre-existing from S1).
+Canary cache restored from `/tmp/canary-cache-bak-audit-068`.
+
+## Headline (HIGH confidence — direct per-iteration measurement)
+
+S3's "producer-loop underrun" framing pointed in the right direction
+but mis-located the divergence. **Neither engine ever takes the
+exit-branch in `sub_82450A68` (PC=0x82450B50, the LR=epilog path)**.
+Both engines' dispatch threads stay in the loop indefinitely (no
+deadlock; just waiting).
+
+The actual divergence is in the **return value of the
+`NtWaitForMultipleObjectsEx` call at PC=0x82450B44**:
+
+- **Ours: r3 = 0x00000001 in 91/91 captures (100%)** — semaphore acquired.
+- **Canary: r3 = 0x00000102 in 3/4 captures (75%)** — WAIT_TIMEOUT.
+
+The two handles being waited on are:
+- **handle[0] = NtCreateEvent** at `[r31+88]` — the STOP event (signal → exit).
+- **handle[1] = NtCreateSemaphore(InitialCount=0, MaximumCount=0x7FFFFFFF)**
+  at `[r31+92]` — the WORK semaphore (signal → process work).
+
+Both created by `sub_8244FF50` (spawn helper) BEFORE `ExCreateThread`.
+mem-watch confirms handle slots in ours: `0x104C` (event) / `0x1050`
+(semaphore) at run-1; absolute IDs drift across runs but the slot
+layout is invariant.
+
+This is **NOT an exit-branch divergence, NOT loop-underrun in the
+literal sense — it is a SEMAPHORE-STATE divergence**. In ours the
+work-semaphore count is non-zero at every wait entry (so the wait
+always returns immediately with success); in canary the count is zero
+at most wait entries (so the wait times out per the 16ms relative
+timeout).
+
+## Method (READ-ONLY, no source mod)
+
+1. Disassembled `sub_82450A68` body (80 instructions) via
+   `xenia-rs disasm --at 0x82450A68 -n 200`. Saved to
+   `s4/sub_82450A68-disasm.txt`.
+2. Identified loop topology: prolog → wait-#1 → body (with inner search
+   over 5-slot table at [r31+112..212]) → dispatch (bl 0x82450B68 →
+   γ-signaler family) → re-wait → back-edge OR exit.
+3. Ran ours-cold with `--branch-probe=` on 14 BB-entry PCs covering all
+   loop-body paths. Captured 696 records over ~80s wallclock /
+   91 loop iterations.
+4. Ran canary-cold (cache wiped → restored from
+   `/tmp/canary-cache-bak-audit-068`) with same `audit_61_branch_probe_pcs`
+   cvar set. Canary process faulted in vkd3d-proton at ~10s wallclock;
+   captured 35 records / 4 loop iterations. Sufficient to surface the
+   r3 distribution.
+5. Used `--mem-watch=0x828F3BC0,0x828F3BC4` to identify which ours
+   handle IDs live in slots `[r31+88]` and `[r31+92]`. Then
+   disassembled `sub_8244FF50` to confirm event-vs-semaphore types via
+   the import jumps (`NtCreateEvent` at 0x824A9F18, `NtCreateSemaphore`
+   at 0x824AB0C0).
+6. Cross-checked ours's kernel handlers (`nt_wait_for_multiple_objects_ex`,
+   `do_wait_multiple`, `handle_consume`, `nt_release_semaphore`,
+   `try_release_semaphore`, `wake_eligible_waiters`) — code looks
+   correct in isolation; the divergence is NOT in these handlers
+   directly.
+
+## Per-PC iteration counts
+
+| PC | path | ours fires | canary fires | note |
+|---|---|---:|---:|---|
+| 0x82450AA4 | first-iter entry | 1 | 1 | both entered once |
+| 0x82450AAC | back-edge target | 91 | 4 | canary crashed early |
+| 0x82450AC0 | flag@212==0 → r4=5 | 2 | 0 | rare path |
+| 0x82450AC8 | flag@212!=0 → search | 90 | 4 | dominant |
+| 0x82450AE4 | inner-search continue | 72 | 17 | |
+| 0x82450AF4 | search-exhausted | 8 | 3 | no candidate found |
+| 0x82450AF8 | candidate-found | 82 | 1 | |
+| 0x82450B04 | budget skip | 81 | 0 | |
+| 0x82450B10 | budget refresh | 8 | 0 | |
+| 0x82450B28 | dispatch entry | 74 | 1 | bl 0x82450B68 |
+| 0x82450B34 | re-wait entry | 92 | 4 | |
+| **0x82450B50** | **EXIT (epilog)** | **0** | **0** | **never reached** |
+
+## r3 at back-edge (the divergence signal)
+
+| | ours | canary |
+|---|---|---|
+| r3=0x1 | 91/91 (100%) | 1/4 (25%) |
+| r3=0x102 (TIMEOUT) | 0/91 (0%) | 3/4 (75%) |
+| r3=0x0 (handle[0] signaled) | 0/91 | 0/4 |
+| r3=other | 0/91 | 0/4 |
+
+This is the **per-iteration measurement** the user's framing predicted.
+The matching iterations show different r3 values at the SAME PC. The
+"load feeding the predicate" is, however, NOT a guest-memory load — it
+is the kernel-side return of `NtWaitForMultipleObjectsEx`. The
+divergent KERNEL STATE is the work-semaphore count.
+
+## Wait wrapper chain (disasm-derived)
+
+```
+sub_824AB240:
+  li r7, 0          ; alertable = 0
+  b 0x824AB190      ; tail-jump
+
+sub_824AB190(r3=numObj, r4=&handles, r5=WaitMode, r6=Timeout(=16 ms), r7=Alertable):
+  ...
+  bl 0x824ACA88     ; converts r4=16 ms → LARGE_INTEGER -160000 (relative 100-ns ticks)
+  ...
+  bl 0x8284E08C     ; NtWaitForMultipleObjectsEx (ord 254, import @ VA 0x8284E08C)
+  ; returns NTSTATUS in r3:
+  ;   0      = WAIT_OBJECT_0   = handle[0] (stop event) signaled
+  ;   1      = WAIT_OBJECT_0+1 = handle[1] (work semaphore) acquired (atomically decrements count by 1)
+  ;   0x102  = WAIT_TIMEOUT    = 16 ms elapsed with no signal
+```
+
+`sub_82450A68` branches on this:
+- `cmplwi cr6, r3, 0; beq cr6, 0xB50` → r3 == 0 → EXIT (stop event signaled)
+- `cmplwi cr6, r3, 0; bne cr6, 0xAAC` → r3 != 0 (including 0x102) → CONTINUE
+  - r3 == 1 → at least one work-item is available → run the inner table search
+  - r3 == 0x102 → just a 16ms timer wake; inner search will likely find no candidate
+    and the loop just re-waits
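+
+The return-value handling above can be sketched in a few lines. This is a
+minimal illustration, not code from either engine: `ms_to_relative_ticks`
+and `classify_wait` are hypothetical names, and the only facts taken from
+the trace are the three status values and the 16 ms to -160000 conversion.
+
+```python
+# NTSTATUS values the loop in sub_82450A68 distinguishes (from the notes above).
+WAIT_OBJECT_0 = 0x000   # handle[0], the stop event
+WAIT_TIMEOUT = 0x102    # relative timeout elapsed
+
+def ms_to_relative_ticks(ms):
+    """Relative NT timeout: a negative count of 100 ns units."""
+    return -(ms * 10_000)
+
+def classify_wait(r3):
+    """Map the wait's r3 to what the dispatch loop does next (labels are ours)."""
+    if r3 == WAIT_OBJECT_0:
+        return "exit"          # beq cr6 -> epilog at 0x82450B50
+    if r3 == WAIT_OBJECT_0 + 1:
+        return "work"          # semaphore acquired, run the table search
+    if r3 == WAIT_TIMEOUT:
+        return "timer-wake"    # loop continues and re-waits
+    return "other"
+
+print(ms_to_relative_ticks(16))                  # -160000
+print([classify_wait(v) for v in (0x1, 0x102)])  # ['work', 'timer-wake']
+```
+
+Under this reading, ours's 91/91 captures all classify as "work", while
+3 of canary's 4 classify as "timer-wake".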
+
+In canary's brief 4-iteration captured window, only iteration-0 had real
+work (`r3=1`); iterations 1-3 were timer-wakes (`r3=0x102`). In ours's
+91-iteration window, all back-edges saw `r3=1`: someone has released
+the semaphore at least once between each consume.
+
+## Handle slot identification (HIGH confidence)
+
+Via `--mem-watch=0x828F3BC0,0x828F3BC4`:
+
+```
+MEM-WATCH addr=0x828f3bc0 old=0x00000000 new=0x0000104c
+   store_addr=0x828f3bc0 store_len=4 tid=1 pc=0x8244ffb0 lr=0x8244ffb0
+MEM-WATCH addr=0x828f3bc4 old=0x00000000 new=0x00001050
+   store_addr=0x828f3bc4 store_len=4 tid=1 pc=0x8244ffcc lr=0x8244ffcc
+```
+
+Static disasm of writer PCs:
+```
+0x8244FFAC: bl 0x824A9F18    ; NtCreateEvent wrapper
+0x8244FFB0: stw r3, 88(r30)  ; handle[0] = event = ours 0x104C
+0x8244FFC8: bl 0x824AB0C0    ; NtCreateSemaphore wrapper (r4=0=Initial, r5=0x7FFFFFFF=Max)
+0x8244FFCC: stw r3, 92(r30)  ; handle[1] = semaphore = ours 0x1050
+```
+
+The semaphore is created with **InitialCount=0**. So if no one ever
+calls `NtReleaseSemaphore` on it, the wait will only ever return
+`STATUS_TIMEOUT`. Canary's behavior (mostly 0x102, occasionally 0x1)
+is consistent with a low release rate: of the 4 captured waits, only 1 saw a release within its 16 ms window.
+
+Ours's behavior (always 0x1) means **producers release the semaphore
+FASTER THAN the consumer drains it**.
+
+## NtReleaseSemaphore call graph (xrefs to wrapper sub_824AB158)
+
+Wrapper sub_824AB158 calls NtReleaseSemaphore (ord 243, import @
+VA 0x8284E07C). Called from 22 sites across 18 functions:
+
+```
+0x822c6770 fn=0x822c6748
+0x822c6848 fn=0x822c6808
+0x822c95c4 .. 0x822c9718 fn=0x822c8b50 (×6 inline call sites)
+0x822f23e8 fn=0x822f2328
+0x823dd7f8 fn=0x823dd770
+0x823dda3c fn=0x823dd838
+0x823df008..1b4 fn=0x823de4b8 (×3)
+0x823df604 fn=0x823df320
+0x82450310 fn=0x82450218   ← dispatcher-module enqueuer (callers: sub_82452DC0 ×2)
+0x824504c4 fn=0x824503A0   ← dispatcher-module enqueuer (callers: sub_82452690, sub_8245E1D8)
+0x82450cdc fn=0x82450b68   ← THE DISPATCH FUNCTION itself (self-release)
+0x82450d28 fn=0x82450b68   ← THE DISPATCH FUNCTION itself (self-release)
+0x82456b48 fn=0x824569c0 (jump form)
+0x82458020 fn=0x82457fe0
+0x824584c8 fn=0x82458468
+0x82459424 fn=0x824591c0
+0x8245ab6c fn=0x8245aaf0
+0x8245ac6c fn=0x8245abd8
+0x8245ade0 fn=0x8245ad00
+```
+
+**Critical observation**: the dispatch function `sub_82450B68`
+contains TWO release sites (at offsets 0xCDC, 0xD28). Each successful
+dispatch run can release the semaphore again. If both branches release
+1 token, and the wait consumes only 1 token per iteration, the count would
+drift up. This is consistent with the "ours over-released" hypothesis.
+
+Some sub_82450B68 branches release the semaphore via `lwz r3, 92(r27)`
+which is `handle[1]` of the dispatcher itself. So the dispatch function
+re-fills its own pipe.
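+
+The self-release reasoning can be illustrated with a toy model. It is a
+sketch under stated assumptions, not a model of the real kernel objects:
+`simulate` and its parameters are hypothetical, one external release is
+assumed to have happened before the first wait, and no other producer
+runs afterwards.
+
+```python
+WAIT_ACQUIRED = 0x1     # WAIT_OBJECT_0+1, work semaphore acquired
+WAIT_TIMEOUT = 0x102    # 16 ms elapsed with no token available
+
+def simulate(iterations, releases_per_dispatch, initial_count=1):
+    """Toy consumer loop: each successful wait takes 1 token, then the
+    dispatch step puts `releases_per_dispatch` tokens back."""
+    count = initial_count
+    returns = []
+    for _ in range(iterations):
+        if count > 0:
+            count -= 1                      # the wait consumes one token
+            returns.append(WAIT_ACQUIRED)
+            count += releases_per_dispatch  # self-release in the dispatch tail
+        else:
+            returns.append(WAIT_TIMEOUT)    # nothing to acquire: timer wake
+    return returns, count
+
+for n in (0, 1, 2):
+    returns, final = simulate(4, n)
+    print(n, [hex(r) for r in returns], final)
+```
+
+With zero self-releases the model yields one acquire followed by
+timeouts; with one self-release per dispatch every wait returns 0x1 and
+the count stays flat; with two it drifts upward. The model only shows
+that a steady self-release is enough to produce an all-0x1 trace; it does
+not show that this is what happens in ours.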
+
+## Hypothesis (MEDIUM-HIGH confidence)
+
+The semaphore is being over-released in ours due to a divergent
+**dispatch-loop control flow inside `sub_82450B68`** that
+differentially decides whether to fire the self-release. Either:
+(a) ours takes a sub_82450B68 branch that releases when canary's doesn't
+(this is the dual of S3's question: which sub-branches differ?), OR
+(b) ours's parse_timeout scales the 16 ms relative timeout by /100
+    (exports.rs:4495 — `magnitude.max(1) / 100`), turning a 16 ms wall-clock
+    timeout into 1,600 emulator-ticks. This may differentially interact
+    with how often the semaphore gets a release between wait entries.
+
+The exit-branch-at-matching-iteration framing from the user's task spec
+does NOT apply here: there IS no exit-branch divergence (both never
+exit). The divergence is in the wait return value, which has no
+proximate guest-memory load. The "load feeding the predicate" is a
+kernel-state read (the semaphore count) performed inside the kernel
+import handler itself.
+
+## Most-recent kernel calls (tid=5 in ours, from S3 lr-trace data + S4 cross-check)
+
+Most-recent kernel calls before each wait at PC=0x82450B44 (re-wait
+site), on ours tid=5:
+
+- `NtReleaseSemaphore(handle=0x1050, count=1)` via wrapper
+  sub_824AB158, lr=0x82450CDC OR lr=0x82450D28 (both inside sub_82450B68
+  dispatch body) — self-release in the dispatch tail.
+- `KeSetEvent(handle=0x10xx)` via wrapper sub_824AA2F0 OR sub_824AAF50 —
+  γ-signaler family fires (the audit's original signaler PCs from S1/S3).
+- `KeQueryPerformanceCounter`-like via sub_824AA830 — used in budget
+  refresh path.
+
+In **canary**, the equivalent sequence per S1's signal-probe-correlated.log
+(180s window) is similar (γ-signalers fire 492× on tid=10), but the
+SELF-RELEASE rate matters more — that determines whether the consumer
+keeps seeing a non-zero semaphore.
+
+## S5 recommendation (refined)
+
+The right next step is **NOT** to walk further upstream in the
+γ-signaler chain (S3's lead). It is to **measure the per-branch flow
+inside `sub_82450B68` itself** — find which of its many branches
+release the semaphore and how that branch is selected.
+
+### Path A (RECOMMENDED, ~0 LOC, read-only)
+
+`--branch-probe` covering `sub_82450B68` body (PCs 0x82450B68 ..
+0x82451238, the dispatch body). Want to capture:
+
+1. Frequency at the two release sites `0x82450CDC` and `0x82450D28`
+   (per-call cumulative count on tid=5).
+2. Frequency at the OTHER exit sites in sub_82450B68 (e.g. the early
+   return at `0x82450EE8` which does NOT release).
+
+If ours's release-rate at CDC/D28 is significantly higher than canary's,
+that confirms (a). If similar, then (b) becomes the next theory.
+
+### Path B (~80 LOC ours-side probe, needs source instrumentation)
+
+Confirm whether the magnitude/100 scaling in
+`xenia_kernel::exports::parse_timeout` actually causes the divergence.
+`--branch-probe` cannot reach it because parse_timeout is Rust host code,
+not guest code, so this path requires source instrumentation. Mid-priority.
+
+### Path C (~30 LOC canary diagnostic)
+
+Add canary cvar `audit_69_semaphore_count_probe = VA` that emits the
+post-Set count for the semaphore at native VA matching ours's
+[r31+92]'s underlying X_KSEMAPHORE. Compare per-iteration count
+progression canary-vs-ours.
+
+LOC budget for S5: Path A = 0, Path B = ~80, Path C = ~30.
+
+**Path A first** — narrows the divergence to specific sub_82450B68
+sub-branch behavior at zero LOC cost.
+
+## Cascade
+
+- **A** (disasm sub_82450A68): PASS (HIGH) — 80-instruction body,
+  3 BB-paths, 12 BB-entries identified.
+- **B** (ours per-iteration loop-branch trace): PASS (HIGH) —
+  91 back-edge captures, all r3=0x1.
+- **C** (canary same trace): PARTIAL (MEDIUM) — canary crashed at
+  4 iterations in vkd3d-proton on exit; 4 captures sufficient to surface
+  r3=0x102 dominance, but not a long-window comparison.
+- **D** (identify divergent load): PARTIAL (MEDIUM) — no guest-memory
+  load is the proximate cause; the divergence is in the kernel-side
+  semaphore-count state. The "load" is conceptually inside
+  `do_wait_multiple`'s read of `KernelObject::Semaphore.count`.
+
+Net 2/4 PASS-HIGH, 2/4 PARTIAL-MEDIUM. Methodology learned: when both
+engines stay in a loop, "which branch did ours take differently" is the
+WRONG question — ask "what's different at the SAME branch."
+
+## Confidence flags (summary)
+
+| finding | confidence |
+|---|---|
+| Both engines never take exit-branch (B50) | HIGH |
+| ours back-edge r3=1 always (91/91) | HIGH |
+| canary back-edge r3=0x102 mostly (3/4) | HIGH |
+| handle[1] is NtCreateSemaphore w/ InitialCount=0 | HIGH |
+| handle[0] is NtCreateEvent | HIGH |
+| Divergence is kernel-side semaphore-count state | MEDIUM-HIGH |
+| sub_82450B68 self-release over-fires in ours | MEDIUM |
+| parse_timeout /100 scaling is contributing | LOW-MEDIUM |
+
+## Discipline
+
+- xenia-rs HEAD `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` UNCHANGED
+  (sha256 of `git diff HEAD` matches S1/S2/S3 end at session start AND end).
+- READ-ONLY ours. No source mod. `--branch-probe` / `--lr-trace` /
+  `--mem-watch` / `--trace-handles-focus` are runtime read-only flags
+  documented as "lockstep digest unaffected" (state.rs comments).
+- Canary `audit_61_branch_probe_pcs` cvar enabled with our PC set; set
+  back to "" at session end. Verified.
+- Canary `mute = true` set during run, restored to `false` at session end.
+- Canary cache wiped before cold canary run, restored from
+  `/tmp/canary-cache-bak-audit-068` at session end.
+
+## Artifacts
+
+```
+audit-runs/audit-069-wait-signal-producer/s4/
+  sub_82450A68-disasm.txt          (80 ins disasm: sub_82450A68 entry + body)
+  ours-loop-branch-trace.stdout    (696 BRANCH-PROBE records, ours-cold)
+  ours-loop-branch-trace.stderr    (empty under --quiet)
+  canary-loop-branch-trace.stdout  (1074 lines, 35 AUDIT-061-BR records)
+  canary-loop-branch-trace.stderr  (89 lines, wine/vkd3d setup + final fault)
+  ours-mem-watch.stderr            (2 MEM-WATCH records identifying handle slots)
+  ours-mem-watch.stdout            (empty)
+  ours-signaler.jsonl              (95 lr-trace records on wrapper PCs)
+  ours-handles.{stdout,stderr}     (probe for handle dump; --halt-on-deadlock didn't trigger)
+  ours-trace-handles-summary.log   (21 lines: focus startup + 8 ExCreateThread spawns)
+  divergence-analysis.md           (per-iter table, hypothesis, S5 leads)
+  writer-report-v4.md              (this file)
+```
+
+No canary instrumentation diff this session. No `fix-canary-s4.diff`.
+
+## Summary of S1 → S2 → S3 → S4 arc
+
+- **S1** (2026-05-20 AM): identified canary tid=10 as the signaler;
+  claimed ours lacks this thread (FALSIFIED by S2).
+- **S2** (2026-05-20 noon): spawn-chain runs identically on ours tid=5;
+  refined to "wrong-handle selection" downstream (REFINED by S3).
+- **S3** (2026-05-20 PM): ours runs identical PC/LR chain but with
+  ~6× fewer iterations. Producer-loop underrun classification.
+  Wedge handle never even created in ours's truncated boot.
+- **S4** (2026-05-20 evening): per-iteration branch-probe shows
+  **NEITHER engine ever exits the loop**. Divergence is in
+  `NtWaitForMultipleObjectsEx` return: ours r3=1 always (semaphore
+  acquired), canary r3=0x102 mostly (timeout). Root cause is
+  **semaphore-count state divergence** — ours's work-semaphore is
+  over-released relative to consume rate, OR ours's timeout never
+  fires before signal. Hypothesis: divergence inside `sub_82450B68`
+  dispatch body's self-release logic.
+
+The S5 question is no longer "which earlier kernel call differs" —
+it is "which sub-branch of `sub_82450B68` releases the semaphore in
+ours that canary's doesn't release in." Read-only branch-probe on
+sub_82450B68 body PCs.
--- a/audit-runs/audit-069-wait-signal-producer/writer-report-v5.md
+++ b/audit-runs/audit-069-wait-signal-producer/writer-report-v5.md
@@ -0,0 +1,122 @@
+# AUDIT-069 Session 5 — writer report (RECOVERED from captured data; agent timed out before authoring)
+
+Date: 2026-05-20.
+
+Status: The dispatched agent (`a9380b477f5cb4b3f`) ran ~50 min and timed out via API stream-idle error. The instrumentation, builds, and capture runs completed. The agent did NOT author the final analysis. This report is composed by the parent agent from the captured artifact files (canary-release-trace.log, ours-release-trace.jsonl, fix-canary-s5.diff).
+
+xenia-rs HEAD `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` UNCHANGED. `sha256(git diff HEAD)` = `ed30fd526643918f67311caff0a10d1346d73fd0c0323e02477883cf5ff20357` UNCHANGED (matches S1-S4 end).
+
+## Canary handle identification
+
+Canary's work-semaphore: handle `0xF800003C` (single semaphore released across 414 events). Wrapper inside canary captures every release through `lr=0x824AB168` (the post-call PC inside `sub_824AB158`). To get the GUEST-side caller LR, S5 would need to probe at the wrapper-entry PC and capture the caller's LR; this was not done in this session.
+
+## Per-tid release counts
+
+### Canary (`canary-release-trace.log`, 414 events)
+
+| tid | count | role |
+|---:|---:|---|
+| 10 | 382 | worker (self-release inside dispatch fn) |
+| 18 | 14 | producer |
+| 17 | 9 | producer |
+| 6 | 7 | main thread |
+| 16 | 1 | producer |
+| 26 | 1 | producer |
+
+### Ours (`ours-release-trace.jsonl`, 99 events)
+
+| tid | count | role |
+|---:|---:|---|
+| 5 | 90 | worker (= canary tid=10 by entry/ctx identity) |
+| 1 | 8 | main thread (= canary tid=6) |
+| 13 | 1 | producer (the wedged thread) |
+
+## Per-LR release counts (ours only — canary lr field captured wrapper-internal addr, not useful)
+
+| ours lr | count | likely site (fn = containing function per S4's xref list; lr = call site + 4) |
+|---|---:|---|
+| 0x82450ce0 | 68 | inside sub_82450B68 dispatch fn (the dominant self-release) |
+| 0x82450d2c | 7 | second self-release in same fn |
+| 0x82450314 | 7 | sub_82450218+0xFC (producer A) |
+| 0x8245ab70 | 7 | sub_8245AAF0+0x80 (producer B) |
+| 0x824584cc | 4 | sub_82458468+0x64 (producer C) |
+| 0x82458024 | 4 | sub_82457FE0+0x44 (producer D) |
+| 0x824504c8 | 1 | sub_824503A0+0x128 (producer E) |
+| 0x822f23ec | 1 | sub_822F2328+0xC4 (main-thread producer F) |
+
+## Hypothesis verdict
+
+- **H1 (ours over-releases the work-semaphore)**: **FALSIFIED.** Ours releases 99 total vs canary 414 (24% of canary's count, even though canary's capture window was shorter). The worker thread releases 90 times in ours vs 382 in canary (24%). Ours does NOT over-release.
+
+- **H2 (canary processes a batch per iteration)**: **PARTIALLY SUPPORTED but insufficient.** Per-iteration rates (combining S4's iteration data):
+  - Canary: 4 iterations in 10s with 382 worker releases ≈ 95 releases per iteration (HIGH variance, n=4 is too small)
+  - Ours: 91 iterations (S4: ~80s wallclock) with 90 worker releases ≈ 1 release per iteration
+
+  The per-iteration ratio is suggestive but the canary sample size remains too thin for a HIGH-confidence claim.
+
+- **H3 (new): SYSTEMIC under-production of work in ours.** Producer-tid releases:
+  - Canary: 32 events across 5 producer tids (16, 17, 18, 26 + main 6)
+  - Ours: 9 events across 2 producer tids (1, 13)
+
+  Ours has fewer producer threads contributing AND fewer events per producer. The bug isn't localized to a single fn or handle — it's distributed across the production-side of the work-queue. Ratio ~28%, consistent with the worker self-release ratio.
+
+## Reconciliation with S3
+
+S3 measured γ-signals: ours 81 / canary 492 (16%). S5 measures semaphore releases: ours 99 / canary 414 (24%). Same shape of disparity, slightly different ratio because the two events are at different points in the dispatch path. Both consistent with H3.
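+
+The ratios quoted in this report can be recomputed from the raw counts. The snippet below is only a convenience check: the counts are copied from the S3 and S5 tables, and the dictionary keys and the `pct` helper are illustrative names, not part of any tool used in the audit.
+
+```python
+# (ours, canary) event counts copied from the S3 and S5 tables.
+COUNTS = {
+    "gamma-signals (S3)": (81, 492),
+    "all releases (S5)": (99, 414),
+    "worker releases (S5)": (90, 382),
+    "producer releases (S5)": (9, 32),
+}
+
+def pct(ours, canary):
+    """Ours as a percentage of canary, rounded to one decimal place."""
+    return round(100.0 * ours / canary, 1)
+
+for name, (ours, canary) in COUNTS.items():
+    print(f"{name}: {ours}/{canary} = {pct(ours, canary)}%")
+```
+
+All four ratios fall between roughly 16% and 28%, which is the range H3 relies on.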
+
+## Confidence labels
+
+- Per-tid release counts (ours): HIGH (n=99 measured directly).
+- Per-tid release counts (canary): HIGH for the count itself (n=414 measured), MEDIUM for "which canary tid is the worker" (relies on S2's entry/ctx-identity mapping).
+- H1 falsification: HIGH.
+- H2 partial support: LOW (canary iteration data still n=4).
+- H3 (systemic under-production): MEDIUM-HIGH (consistent across two independent measurements — γ-signals from S3, releases from S5).
+
+## Methodology pattern note
+
+S1→S5 has been a sequence of progressively refined framings, each falsifying the prior:
+- S1: "spawn-layer bug" — falsified by S2.
+- S2: "wrong-handle queue" (per archive) — falsified by S3.
+- S3: "producer-loop underrun" — refined by S4 (it's not underrun, it's overrun per S4's branch-probe).
+- S4: "ours self-releases too much" → H1 — FALSIFIED by S5.
+- S5: H3 — "systemic under-production" — at least testable across multiple measurements, NOT yet a fix point.
+
+S5's H3 is not a localized bug. It says "ours's entire work-queue-producer ecosystem fires at only ~24-28% of canary's rate". That's a symptom description, not a root cause. The next session needs to identify WHICH producer fn fails to fire as often, and WHY.
+
+## S6 recommendation
+
+Given S5's H3, the next session should **identify the specific producer-tid divergence**, not continue investigating the dispatch fn. Compare:
+- Canary tid=18 (14 releases) vs ours's analog tid — does ours have an analog? Per-tid count divergence at the producer level.
+- Canary tid=17 (9 releases) — note: per S1, canary tid=17 is the thread that completes 16+ `sub_821CB030` calls (the wedge wait site). It contributes 9 work-semaphore releases as a producer. Ours's analog is tid=13 (the wedged thread, releases 1).
+
+**The wedge IS the producer divergence**: ours's tid=13 is wedged in `sub_821CB030+0x1AC` and can only release the semaphore 1× before blocking. Canary's tid=17 completes its loop and releases ~9×. So the system has been circular all along:
+- Worker (tid=5/10) needs work-items enqueued by producers.
+- One major producer is tid=13/17 (the cache thread).
+- tid=13 wedges in ours at sub_821CB030 because the worker doesn't process enough items to wake it.
+- Worker doesn't process enough items because tid=13 doesn't produce enough.
+
+This is **self-consistent with the AUDIT-049 framing**: the wedge is a producer-consumer ladder where one side can't progress without the other, and they share the work-semaphore at handle 0x1050.
+
+The TRUE first divergence point is upstream of all this: **whatever bootstraps the system so that tid=17 (canary's cache thread) completes its initial work cycle.** Canary's first releases at host_ns=6600 and 9503200 (tid=6 main) happen before tid=10 starts. Ours's tid=1 main also fires releases. The QUESTION: does ours's tid=1 release the right semaphore at the right host_ns?
+
+## S6 path
+
+Capture the **first N=20 release events on each engine, time-ordered**. Compare wallclock + tid + LR. Find the first event canary fires that ours does NOT fire (or vice versa). That's the bootstrap divergence.
+
+LOC: 0 ours, 0 canary (data already captured). Just analysis of the existing logs.
+
+## Cascade outcome
+
+- A (canary cvar implemented + captured): PASS HIGH
+- B (ours captured): PASS HIGH (existing --lr-trace)
+- C (cadence comparison): PASS MEDIUM (H1 falsified high-confidence; H2 partial-low; H3 medium-high)
+- D (root cause identified): N/A — narrowed but not pinpointed.
+
+3 PASS / 1 N/A.
+
+## Discipline
+
+- xenia-rs HEAD UNCHANGED.
+- Canary instrumentation: 2 new files, cvar-gated, default-off (audit_70_semaphore_release_watch.h + .cc).
+- Canary cache will need restore from `/tmp/canary-cache-bak-audit-068` (agent timed out before doing so — manual cleanup needed).
+- `--mute=true` honored on canary runs.
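
The S6 path in the report above (time-order the first N=20 release events on each engine and find the first mismatch) could be sketched as below. This is a minimal sketch under stated assumptions, not the audit's tooling: the `Release` record shape and the `first_divergence` helper are hypothetical, and events are matched on LR only because tids are engine-local (canary main is tid=6, ours main is tid=1). The two canary timestamps come from the report; every other value is made up for illustration.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Release(NamedTuple):
    host_ns: int  # host timestamp of the release event
    tid: int      # engine-local thread id (not comparable across engines)
    lr: int       # guest return address of the releasing call


def first_divergence(
    canary: Sequence[Release], ours: Sequence[Release], n: int = 20
) -> Optional[Tuple[int, Optional[Release], Optional[Release]]]:
    """Return (index, canary_event, ours_event) for the first position where
    the time-ordered LR sequences differ within the first n events, or None
    if the two prefixes match."""
    a = sorted(canary, key=lambda e: e.host_ns)[:n]
    b = sorted(ours, key=lambda e: e.host_ns)[:n]
    for i in range(max(len(a), len(b))):
        ca = a[i] if i < len(a) else None
        ob = b[i] if i < len(b) else None
        # A missing event on either side also counts as a divergence.
        if ca is None or ob is None or ca.lr != ob.lr:
            return i, ca, ob
    return None


# Illustrative records only: all LR values and all ours timestamps are invented.
canary_events = [
    Release(6600, 6, 0x82000010),
    Release(9503200, 6, 0x82000020),
    Release(12000000, 10, 0x82000030),
]
ours_events = [
    Release(7000, 1, 0x82000010),
    Release(9900000, 1, 0x82000020),
    Release(15000000, 5, 0x82000040),
]

print(first_divergence(canary_events, ours_events))
```

Matching on LR alone ignores which semaphore is released; if the logs also carry the handle, adding it to the comparison key would address the "right semaphore at the right host_ns" question more directly.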
--- a/audit-runs/audit-069-wait-signal-producer/writer-report.md
+++ b/audit-runs/audit-069-wait-signal-producer/writer-report.md
@@ -0,0 +1,271 @@
+# AUDIT-069 Session 1 — wait-signal producer identification
+
+Date: 2026-05-20
+Status: **LANDED — signaler tid + caller fns identified; AUDIT-066 circular framing FALSIFIED**
+
+## Headline
+
+The wait at `sub_821CB030+0x1AC` (PC `0x821CB1DC`) — the canonical
+AUDIT-049/065 wedge wait — fires in canary on two tids (worker tid=17 and
+cache-loader tid=26). Both wedges are signaled by **tid=10**, a worker
+thread spawned EARLY (via `sub_8244FF50` → `ExCreateThread(entry=sub_82450A28)`),
+NOT by any of the four workers spawned by `sub_825070F0`. This refutes
+AUDIT-066's circular framing ("γ-signaler running inside the 4 workers
+spawned by sub_825070F0"): the actual signaler reaches the production
+phase WITHOUT depending on sub_825070F0 firing.
+
+## Step 1 — wait site capture (canary)
+
+Probe: `--audit_61_branch_probe_pcs=0x821CB1DC --mute=true`, 180s cold.
+
+| tid | r3 (handle) | r4 (timeout) | r5 (wait_mode) | r6 (ctx) | r31 (stack) | lr |
+|----:|------------:|-------------:|---------------:|---------:|------------:|---:|
+| 17  | `F80000A4`  | `FFFFFFFF`   | `0` (auto)     | `BC65CEC0` | `7064FA70` | `0x821CB1D0` |
+| 26  | `F8000110`  | `FFFFFFFF`   | `0` (auto)     | `BC667F80` | `708FF990` | `0x821CB1D0` |
+
+Two distinct fires (one per logical caller). Both have r4=INFINITE timeout
+matching dossier. The lr=`0x821CB1D0` is `sub_821CB030+0x1A0` = the
+instruction AFTER the bl-wait — consistent with branch-probe firing at the
+basic-block-entry following the wait-call's return.
+
+Handle drift across cold runs is real: Step 1 vs Step 3 vs Step 4 trajectories
+produced wait handles `{F80000A0,F8000108}` / `{F80000A0,F8000108}` /
+`{F80000A4,F8000110}`. Handles are stable within a run; the absolute IDs
+vary from run to run.
+
+**Important framing correction**: The brief expected "~16 fires" per
+AUDIT-065. This was already partly retracted by AUDIT-066 (which observed
+that thid=17 "terminates via `ExTerminateThread(0)` WITHOUT ever calling
+Wait inside its cache loop"). Step 1 confirms AUDIT-066's correction:
+the wait at `+0x1AC` fires ~2× per boot (one for the work-queue load
+that ANON_Class_713383D7 work goes through; one for the cache-loader
+sister-flow). Not 16. The wait is the WORK-QUEUE wait, not a per-cache-file
+IO wait.
+
+Confidence: HIGH (probe fired, r3/r4/r5 match expected wait-call ABI,
+two distinct logical fires reproducible across cold runs).
+
+## Step 2 — instrumentation (canary, ~280 LOC additive)
+
+New `audit_69_*` cvars + slowpath module:
+- **cpu_flags.{h,cc}** (+23/+48 LOC cumulative, of which ~30 LOC are new in this audit):
+  - `--audit_69_event_signal_watch` (CSV of guest handle IDs, max 4)
+  - `--audit_69_event_signal_native_ptr` (CSV of guest VAs, max 4)
+  - `--audit_69_log_all_sets` (bool — log EVERY XEvent::Set/Pulse fire)
+- **xenia-kernel/audit_69_event_signal_watch.h** (51 LOC) — fwd decls,
+  hot-path inline wrapper (single relaxed atomic load + branch).
+- **xenia-kernel/audit_69_event_signal_watch.cc** (193 LOC) — lazy parse +
+  UINT32_MAX sentinel + `XThread::TryGetCurrentThread()` for lr/tid capture.
+  Mirrors AUDIT-068's static-init gate pattern.
+- **xenia-kernel/xevent.cc** (+9 LOC) — hook at `XEvent::Set` and
+  `XEvent::Pulse` (the deepest convergence of Ke/Nt set + pulse paths).
+
+Reading-error registration: `XThread::GetCurrentThread()` asserts on host
+threads; first iteration used it and crashed. Fixed by switching to
+`TryGetCurrentThread()`. (Same lesson as AUDIT-067's bool-vs-pointer
+asymmetry but in a different fn.)
+
+Cumulative cross-run canary additions retained in tree (AUDIT-061/067/068/069).
+
+## Step 3 — correlated capture
+
+Run: cold, 180s, `--mute=true --audit_61_branch_probe_pcs=0x821CB1DC,0x824AA2F0,0x824AAF50 --audit_69_log_all_sets=true`.
+
+Volume: 122,165 log lines (Step 3) / 155,627 lines (Step 4 with wrapper probes).
+
+Wait fires (Step 4): 2 (tid=17, tid=26, as in Step 1 but with handle drift to F80000A4/F8000110).
+
+Signals on wedge handles (Step 4):
+
+| wedge handle (waited on) | wait tid | signal fires | signal lr | signaling fn | signal tid |
+|---|---|---|---|---|---|
+| `0xF80000A4` | 17 | **1** | `0x824AA304` | `sub_824AA2F0` (NtSetEvent wrapper) | **10** |
+| `0xF8000110` | 26 | **100** | `0x824AAFC8` | `sub_824AAF50` (a generic event-set-with-arg wrapper) | **10** |
+
+The 100 fires on F8000110 are mostly repeats: only the first signal releases
+the auto-reset event's waiter; the rest are no-ops. Volume reflects how often
+the work-queue processes items targeting this synchronizer.
+
+## Step 4 — signaler-fn resolution (sylpheed.db cross-check)
+
+Wrapper-entry probe data for the two event-set wrappers, filtered to tid=10:
+
+| wrapper | lr-of-caller | caller fn | tid=10 fire count |
+|---|---|---|---|
+| `sub_824AA2F0` (NtSetEvent wrapper) | `0x8245DA44` | **`sub_8245D9D8`** (γ-signaler D-A per AUDIT-062) | 23 |
+| `sub_824AA2F0` (NtSetEvent wrapper) | `0x8245DB08` | **`sub_8245DA78`** (γ-signaler D-B per AUDIT-062) | 8 |
+| `sub_824AAF50` (Ke-style wrapper)   | `0x8245DC5C` | **`sub_8245DB40`** (NEW — not previously named) | 461 |
+
+`sub_824AAF50` disasm needs follow-up, but lr=0x824AAFC8 (`sub_824AAF50+0x78`)
+is consistent with a `bl xeKeSetEvent` followed by a status check
+in an N-arg helper. The wrapper takes `(handle, ptr, size)`, and the
+event it signals internally has a different handle from the input.
+
+Containing-fn cross-check (`sylpheed.db`):
+- `sub_8245D9D8` and `sub_8245DA78` are in the worker cluster
+  (0x82450000-0x8245C000). Per AUDIT-062: both are γ-signaler-D family,
+  hot from worker-side, missed by AUDIT-059/060 enumeration.
+- `sub_8245DB40` is in the same cluster; callers are `sub_824528A8+0x54`
+  and `sub_8245EE50+0x20` (both worker-cluster internal).
+- All three are reached from tid=10's body fn `sub_82450A68`, the
+  trampoline body for the entry `sub_82450A28` (which `sub_8244FF50`
+  registers via `ExCreateThread`).
+
+**tid=10 caller chain (canary)**:
+```
+sub_8244FEA8       (caller of sub_8244FF50; itself called from 11 sites)
+  → sub_8244FF50   (spawner; calls ExCreateThread w/ entry=sub_82450A28)
+    → sub_82450A28   (thread-entry trampoline:
+                      KeSetThreadPriority(-2, 3); bl sub_82450A68)
+      → sub_82450A68   (worker dispatch loop)
+        → ... γ-signalers sub_8245D9D8 / sub_8245DA78 / sub_8245DB40
+```
+
+`sub_82450A28` is referenced as a data pointer at `0x8244FFF8` (inside
+`sub_8244FF50`). No call edges to it — it's purely a thread-entry data
+constant passed to ExCreateThread.
+
+## Step 5 — ours cross-reference
+
+All identified signaler fns (`sub_8245D9D8`, `sub_8245DA78`, `sub_8245DB40`,
+`sub_824AA2F0`, `sub_824AAF50`, `sub_82450A28`, `sub_8244FF50`) are GAME
+(XEX) code — not kernel-imports. In ours these execute under the JIT, with
+no host-side analog to compare. The relevant question is whether the
+trajectory in ours REACHES these PCs.
+
+Direct evidence from prior runs:
+
+**AUDIT-062 ours `--lr-trace=0x824aa2f0`** trace (`ours-ntset.jsonl`, 136
+fires across cold boot up to deadlock):
+- tid=6: 82 NtSet fires
+- tid=1: 28 fires
+- tid=5: 22 fires
+- tid=8: 2 fires
+- tid=13: 2 fires
+- **tid=10: 0 fires**
+
+ours NEVER spawns the canary-equivalent of tid=10 (the
+`sub_8244FF50/sub_82450A28/sub_82450A68` worker). This is consistent with
+AUDIT-057's "thread-gap" finding: ours has fewer threads than canary.
+
+Within ours, the γ-signalers DO fire — but on tid=5 (calling sub_824AA2F0
+from lr=`0x8245DA44` = `sub_8245D9D8+0x6C`) per AUDIT-062's
+`ours-ntset.jsonl:line 1`. AUDIT-062 already established these signal
+WRONG handles in ours (neighbors of `0x12AC` are signaled; the wedge
+handle itself is not).
+
+**Conclusion**: ours's signaler PCs exist and run, but on the wrong tids
+(no tid=10 equivalent), and target the wrong handles. The PRODUCER →
+SIGNALER chain in ours is structurally broken at the **thread-spawn**
+layer, not the kernel-import layer.
+
+Confidence (Step 5): MEDIUM-HIGH for the chain identification (data is
+internally consistent and matches AUDIT-062's prior independent capture).
+LOW on the ours-side resolution mechanism (this audit did not re-run
+ours; cross-ref is read-only against prior dumps which may be stale
+relative to current ours HEAD `e6d43a23…`).
+
+## AUDIT-066 framing refutation
+
+AUDIT-066 stated:
+
+> the producer-side signal for THAT event comes from a γ-signaler running
+> inside the 4 workers spawned by sub_825070F0 — per AUDIT-063's
+> static-reachability survey of NtSet wrapper callers.
+
+This is **falsified** by AUDIT-069 Step 3+4 evidence:
+
+1. The signaler runs on **tid=10**, spawned by `sub_8244FF50` via
+   `ExCreateThread(entry=sub_82450A28)`. This is NOT one of sub_825070F0's
+   4 workers.
+2. sub_8244FF50's caller chain does NOT require ANON_Class_713383D7's
+   vtable to be installed; it does NOT require sub_825070F0 to fire.
+3. The circular-bootstrap concern AUDIT-066 raised ("workers can't signal
+   until they spawn; they can't spawn until the wedge clears") would be a
+   structurally correct framing IF the signaler were inside the
+   sub_825070F0 4-worker family. Since the actual signaler is tid=10
+   (independently spawned), the circle is **broken**: the signaler IS
+   reachable without the wedge clearing.
+
+Reading-error class **#37**: static-reachability surveys (AUDIT-063 walked
+12 hops from sub_82452DC0 to NtSet wrapper callers) are scoped to a
+particular caller chain; they miss alternative producer paths reached via
+unrelated thread-spawn sites. Always probe at the runtime SIGNAL site to
+confirm which exact caller fired, not just which static path could fire.
+
+## Cascade outcome
+
+- **A** (capture wait site PC + r3=handle in canary): **PASS**. PC
+  `0x821CB1DC`, r3 captures the handle on first fire reproducibly.
+- **B** (capture signal fires on the wait targets): **PASS**. 1 fire on
+  F80000A4 (wedge handle 1), 100 fires on F8000110 (wedge handle 2).
+- **C** (resolve signaling fn + immediate caller fn): **PASS**.
+  `sub_824AA2F0` ← `sub_8245D9D8` / `sub_8245DA78` (γ-signaler D family);
+  `sub_824AAF50` ← `sub_8245DB40` (new). All on tid=10.
+- **D** (ours-side cross-ref): **PARTIAL**. tid=10 IS missing in ours
+  per existing AUDIT-062 data; γ-signalers DO fire but on wrong tids.
+  Did not re-run ours in this session (per task discipline; cross-ref
+  read-only against prior dumps).
+
+Net 3/4 PASS, 1/4 PARTIAL.
+
+## Discipline
+
+- xenia-rs HEAD `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` UNCHANGED.
+  `git diff HEAD | sha256sum` at session start =
+  `ed30fd526643918f67311caff0a10d1346d73fd0c0323e02477883cf5ff20357`
+  and at session end IDENTICAL.
+- Canary patch is purely additive, cvar-gated default-off, UINT32_MAX
+  sentinel + one-time lazy-parse pattern (per AUDIT-068 discipline).
+- Every canary run used `--mute=true`.
+- Cache wiped before each cold run (5 cold runs total: Step 1 90s,
+  Step 1 180s rerun, Step 3 with handle watch, Step 3 with log_all_sets,
+  Step 4 with wrapper probes). Each cache moved to `/tmp/_audit_069_step*`
+  before next cold run.
+- Cache restoration from `/tmp/canary-cache-bak-audit-068` deferred to
+  session end (done after this report).
+
+## Artifacts
+
+```
+xenia-rs/audit-runs/audit-069-wait-signal-producer/
+  step1-wait-probe.log               (90s baseline; 2 wait fires)
+  step1-wait-probe.stdout
+  step1-wait-probe-180s.log          (180s rerun; 2 wait fires)
+  step1-wait-probe-180s.stdout
+  step3-signal-probe.log             (180s; first signal-watch test;
+                                      handles drifted, partial correlation)
+  step3-signal-probe.stdout
+  step3-correlated.log               (180s; log_all_sets; 120k signal fires)
+  step3-correlated.stdout
+  step4-wrapper-callers.log          (180s; log_all_sets + wrapper entries;
+                                      155k events; correlated lr-to-caller)
+  step4-wrapper-callers.stdout
+  fix-canary.diff                    (cumulative canary diff vs 6de80dffe)
+  writer-report.md                   (this file)
+```
+
+## Session 2 recommendation
+
+Two paths, both <100 LOC (Path 1 ours-side, Path 2 canary-side):
+
+**Path 1 (ours read-only probe + targeted root-cause)**: re-run ours with
+`--ctor-probe=0x82450A28` (the canary-tid=10 entry) — confirm it never
+fires. Then `--ctor-probe=0x8244FF50` (the spawner). If sub_8244FF50 also
+never fires, walk up its 11 callers in sylpheed.db — likely one of them
+gates on a flag/event that's not set in ours's early-boot trajectory.
+
+**Path 2 (canary additional capture)**: probe canary's tid=10 spawn
+sequence in detail. Add `audit_69_thread_spawn_watch` cvar that logs
+every ExCreateThread call with (entry_pc, ctx, suspend_flag, caller_lr).
+~40 LOC. Compare to ours's spawn list — find which call goes
+unfired in ours.
+
+Both paths are cheaper than continuing on the wedge directly. Path 1 is
+preferred: it stays on the ours side which is the failing engine.
+
+Predicted Session 2 cascade:
+- A (find sub_82450A28's first-non-fire ancestor in ours): 75-85%
+- B (identify the missing precondition for that ancestor): 50-60%
+- C (fix LOC in ours ≤ 50): 30-40%
+- D (draws>0): 15-25% (single wedge unlock)
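
The lr-to-caller resolution used in Steps 1 and 4 of the report above (for example lr=`0x8245DA44` mapping to `sub_8245D9D8+0x6C`) amounts to finding the nearest function entry at or below an address. The sketch below illustrates that lookup with a sorted table and `bisect`. It assumes that each `sub_XXXXXXXX` name encodes the function's entry PC, and the table holds only the functions named in the report; a real lookup would need the complete function list (the report uses `sylpheed.db`, whose schema is not shown here), otherwise a PC inside an unlisted function would be attributed to the wrong neighbor.

```python
import bisect

# Entry PCs of functions named in the report (assumed from the sub_XXXXXXXX names).
FUNCTION_STARTS = sorted([
    0x821CB030,  # wedge wait site function
    0x8245D9D8,  # γ-signaler D-A
    0x8245DA78,  # γ-signaler D-B
    0x8245DB40,  # newly identified signaler
    0x824AA2F0,  # NtSetEvent wrapper
    0x824AAF50,  # event-set-with-arg wrapper
])


def resolve(pc: int) -> str:
    """Map a guest PC to 'sub_<entry>+0x<offset>' using the nearest
    function entry at or below pc."""
    i = bisect.bisect_right(FUNCTION_STARTS, pc) - 1
    if i < 0:
        raise ValueError(f"pc 0x{pc:08X} is below every known function entry")
    start = FUNCTION_STARTS[i]
    return f"sub_{start:08X}+0x{pc - start:X}"


for lr in (0x821CB1D0, 0x821CB1DC, 0x824AA304, 0x824AAFC8, 0x8245DA44):
    print(f"0x{lr:08X} -> {resolve(lr)}")
```

Running it over the addresses quoted in the report reproduces the offsets stated there (`+0x1A0`, `+0x1AC`, `+0x78`, `+0x6C`), which is a cheap consistency check on hand-computed offsets.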