# AUDIT-069 Session 4: divergence analysis

Date: 2026-05-20

xenia-rs HEAD: `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` (UNCHANGED)

## Headline (HIGH confidence, direct per-iteration measurement)

The S3 framing of "producer-loop underrun" was directionally right but mis-located the divergence. The loop in `sub_82450A68` **does not take an early-exit branch in either engine**: neither ours nor canary ever reaches `0x82450B50` (the exit path). Both stay in the loop indefinitely.

The divergence is **WHAT the NtWaitForMultipleObjectsEx call returns at each iteration**:

- **Ours: r3 = 1 (WAIT_OBJECT_0+1, semaphore signaled) EVERY iteration.**
- **Canary: r3 = 0x102 (WAIT_TIMEOUT) mostly, r3 = 1 occasionally.**

This refines the producer-loop classification: it is NOT loop-underrun (both engines' loops run continuously). It is a **semaphore-state divergence**: our work semaphore is over-released or never properly drained; canary's drains correctly and the wait times out per 16ms tick.

## Loop structure (sub_82450A68 disasm at s4/sub_82450A68-disasm.txt)

```
0x82450A28: sub_82450A28 = thread entry (KeSetThreadPriority(-2, 3); bl sub_82450A68)
0x82450A68: prolog (mflr, alloc 128B frame, r31=ctx_arg)
0x82450A78-94: stack handle array [r1+80]=[r31+88]=handle[0]=STOP_EVENT (=0x104C in ours),
               [r1+84]=[r31+92]=handle[1]=WORK_SEMAPHORE (=0x1050 in ours).
0x82450A98: bl 0x824AB240 ; NtWaitForMultipleObjectsEx wrapper, 16ms timeout
0x82450A9C-A0: cmplwi/beq cr6, r3, 0 → 0x82450B50 [EXIT-WAIT1: r3==0 → exit (stop signaled)]
0x82450AA4-A8: li r29,0; li r28,4 [FIRST-ITER body entry]
0x82450AAC: lwz r11, 212(r31) [BACK-EDGE TARGET; reads "fast-path flag"]
0x82450AB0-BC: cntlzw / extrwi / cmplwi / bne cr6, 0xAC8 [BR-A: flag@212!=0 → search path]
0x82450AC0-C4: li r4,5; b 0xB2C [BR-B: flag@212==0 → direct dispatch w/ r4=5]
0x82450AC8-CC: mr r30,r29; addi r11,r31,112 [search-path setup]
0x82450AD0-E0: lwz r10,0(r11); cntlzw; extrwi; cmplwi; beq cr6, 0xAF8 [BR-C: candidate found]
0x82450AE4-F0: addi r30,1; addi r11,20; cmplwi cr6, r30, 5; blt cr6, 0xAD0 [BR-D: search continue]
0x82450AF4: b 0xB34 [BR-E: search exhausted → skip dispatch, re-wait]
0x82450AF8: lwz r11, 224(r31) [budget check]
0x82450AFC-00: cmplwi cr6, r11, 0; beq cr6, 0xB28 [BR-F: budget@224==0 → skip refresh]
0x82450B04-0C: lwz r11, 220(r31); cmpw cr6, r11, r30; bge cr6, 0xB28 [BR-G: budget cmp]
0x82450B10: bl 0x824AA830 [KeQueryPerformanceCounter; sub_824AA830]
0x82450B14-1C: lwz r11,224(r31); cmplw cr6,r3,r11; blt cr6, 0xB34 [BR-H: budget exceeded → re-wait]
0x82450B20-24: stw r28, 220(r31); stw r29, 224(r31)
0x82450B28: mr r4, r30
0x82450B2C-30: mr r3, r31; bl 0x82450B68 [DISPATCH: calls γ-signaler family]
0x82450B34-44: li r6,16; li r5,0; addi r4,r1,80; li r3,2; bl 0x824AB240 [RE-WAIT]
0x82450B48-4C: cmplwi cr6, r3, 0; bne cr6, 0x82450AAC [BACK-EDGE: r3!=0 → loop]
0x82450B50-58: li r3,0; addi r1,r1,128; b 0x825F0FD8 [EXIT path]
```

## Handle slots (ours, mem-watch confirmed)

```
[r31+88] = [0x828F3BC0] written at PC 0x8244FFB0 from NtCreateEvent → ours handle 0x104C
[r31+92] = [0x828F3BC4] written at PC 0x8244FFCC from NtCreateSemaphore → ours handle 0x1050
```

Created in `sub_8244FF50` (the spawn helper) BEFORE ExCreateThread:

- handle[0] = NtCreateEvent(EventType=NotificationEvent, InitialState=0)
- handle[1] = NtCreateSemaphore(InitialCount=0,
  MaximumCount=0x7FFFFFFF)

This is a **stop-event + work-semaphore** pattern, NOT two events.

NtWaitForMultipleObjectsEx with WaitAny:

- r3 = WAIT_OBJECT_0 = 0 → handle[0] (stop event) signaled → EXIT
- r3 = WAIT_OBJECT_0+1 = 1 → handle[1] (semaphore) acquired (decremented) → DO WORK
- r3 = WAIT_TIMEOUT = 0x102 → 16ms elapsed with no signal → continue (poll)

## Per-PC iteration counts (HIGH confidence, direct branch-probe)

| PC | path | ours fires | canary fires | ratio |
|---|---|---:|---:|---:|
| 0x82450AA4 | FIRST-ITER entry | 1 | 1 | 1× |
| 0x82450AAC | BACK-EDGE target | 91 | 4 | (canary crashed early) |
| 0x82450AC0 | BR-B: flag@212==0 direct-dispatch r4=5 | 2 | 0 | n/a |
| 0x82450AC8 | BR-A: flag@212!=0 search path | 90 | 4 | n/a |
| 0x82450AE4 | inner-search continue | 72 | 17 | n/a |
| 0x82450AF4 | BR-E: search exhausted | 8 | 3 | n/a |
| 0x82450AF8 | BR-C: candidate found | 82 | 1 | n/a |
| 0x82450B04 | BR-F: budget skip | 81 | 0 | n/a |
| 0x82450B10 | budget refresh (KeQuery) | 8 | 0 | n/a |
| 0x82450B28 | dispatch entry (r4=r30) | 74 | 1 | n/a |
| 0x82450B34 | re-wait entry | 92 | 4 | n/a |
| **0x82450B50** | **EXIT path** | **0** | **0** | **never exits** |

Canary's run was cut short at ~5 iterations by a vkd3d-proton fault on exit. The relevant signal is in the **r3 distribution at the back-edge**, not the absolute counts.

## r3 distribution at the back-edge (HIGH confidence)

### Ours (91 captures at PC=0x82450AAC, lr=0x82450B48)

```
r3=0x00000001 × 91/91 (100%)
r3=0x00000102 × 0/91 (0%)
```

### Canary (4 captures at PC=0x82450AAC, lr=0x82450B48)

```
r3=0x00000001 × 1/4 (25%)
r3=0x00000102 × 3/4 (75%)
```

Pattern visible in canary trace: first re-wait returns 0x1 (work available immediately), subsequent re-waits return 0x102 (timeout).
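
The r3 decoding and the back-edge distributions above can be reproduced mechanically from a list of captured return values. The following is a minimal sketch, assuming the captures are available as plain `u32` values; `WaitOutcome`, `classify`, and `tally` are hypothetical helper names and are not part of xenia-rs or canary.

```rust
use std::collections::BTreeMap;

pub const WAIT_OBJECT_0: u32 = 0;
pub const WAIT_TIMEOUT: u32 = 0x102;

/// Meaning of r3 for the two-handle WaitAny call in sub_82450A68.
/// (Hypothetical helper type for offline trace analysis.)
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum WaitOutcome {
    /// r3 = 0: handle[0] (stop event) signaled, loop exits.
    Stop,
    /// r3 = 1: handle[1] (work semaphore) acquired, loop does work.
    Work,
    /// r3 = 0x102: 16ms elapsed with nothing signaled, loop polls again.
    Timeout,
    /// Any other status; not observed in either trace.
    Other(u32),
}

/// Decode one captured r3 value.
pub fn classify(r3: u32) -> WaitOutcome {
    match r3 {
        WAIT_OBJECT_0 => WaitOutcome::Stop,
        1 => WaitOutcome::Work,
        WAIT_TIMEOUT => WaitOutcome::Timeout,
        other => WaitOutcome::Other(other),
    }
}

/// Count how often each outcome appears in a list of captures.
pub fn tally(samples: &[u32]) -> BTreeMap<WaitOutcome, usize> {
    let mut counts = BTreeMap::new();
    for &r3 in samples {
        *counts.entry(classify(r3)).or_insert(0) += 1;
    }
    counts
}
```

Feeding it 91 captures of `1` gives a single `Work` bucket, while the canary sequence `[1, 0x102, 0x102, 0x102]` splits into one `Work` and three `Timeout` entries, matching the two distributions listed above.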
## The divergent guest-memory location

The "divergent load" the user's framing predicted (a guest load reading some flag whose value differs ours-vs-canary) is **the wait return value, computed inside the kernel**, not a guest-memory load. The return r3 comes from `NtWaitForMultipleObjectsEx` (a kernel import).

The kernel-side state that differs is the **WORK SEMAPHORE COUNT**:

- Ours: count > 0 at every wait → wait succeeds (decrement, r3=1)
- Canary: count = 0 at every wait (mostly) → wait times out (r3=0x102)

The semaphore count is influenced by:

- `NtReleaseSemaphore(handle[1], 1)` calls (increments count by 1)
- `NtWaitForMultipleObjectsEx` success on handle[1] (decrements by 1)

So either:

- (a) our NtReleaseSemaphore is called more aggressively than canary's
- (b) our NtWaitForMultipleObjectsEx doesn't decrement on success (kernel bug)
- (c) our NtCreateSemaphore creates with InitialCount > 0 (creation bug)
- (d) our NtReleaseSemaphore over-releases (adds extra count per call)

## NtReleaseSemaphore callers (15 unique fns from sylpheed.db xrefs)

```
sub_822c6748, sub_822c6808, sub_822c8b50 (×6 inline call sites),
sub_822f2328, sub_823dd770, sub_823dd838, sub_823de4b8 (×3), sub_823df320,
sub_82450218 ← in dispatch-loop module (callers: sub_82452DC0 ×2)
sub_824503A0 ← in dispatch-loop module (callers: sub_82452690, sub_8245E1D8)
sub_82450B68 ← THE DISPATCH FUNCTION ITSELF (×2 internal release sites at 0xCDC, 0xD28)
sub_824569C0 (j-call), sub_82457FE0, sub_82458468, sub_824591C0,
sub_8245AAF0, sub_8245ABD8, sub_8245AD00
```

The most-suspicious sites for this audit are the three in the dispatch-loop module: `sub_82450218`, `sub_824503A0`, and the self-release in `sub_82450B68`.

## Most-recent kernel calls before the divergent load (ours tid=5)

The "divergent load" is the kernel-side return of `NtWaitForMultipleObjectsEx`. No guest-memory load is the proximate cause.
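
Since the proximate cause is kernel-side count bookkeeping, one way to separate a high release rate from a missing decrement is to bucket release calls by call site and compare the total released against the number of successful semaphore waits. The sketch below assumes a probe that emits one record per `NtReleaseSemaphore` call; the `ReleaseRecord` layout, `per_site`, and `net_balance` are hypothetical and do not describe an existing xenia-rs probe.

```rust
use std::collections::BTreeMap;

/// One hypothetical probe record per NtReleaseSemaphore call.
#[derive(Debug, Clone, Copy)]
pub struct ReleaseRecord {
    /// Guest handle passed to the call (0x1050 for the work semaphore in ours).
    pub handle: u32,
    /// Link register at the call, identifying the guest call site.
    pub caller_lr: u32,
    /// ReleaseCount argument requested by the guest.
    pub release_count: u32,
}

/// Aggregate for one call site.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct SiteStats {
    pub calls: u32,
    pub total_released: u32,
}

/// Group releases on `handle` by caller LR.
pub fn per_site(records: &[ReleaseRecord], handle: u32) -> BTreeMap<u32, SiteStats> {
    let mut out: BTreeMap<u32, SiteStats> = BTreeMap::new();
    for r in records.iter().filter(|r| r.handle == handle) {
        let stats = out.entry(r.caller_lr).or_default();
        stats.calls += 1;
        stats.total_released += r.release_count;
    }
    out
}

/// Total released on `handle` minus the number of waits that returned
/// WAIT_OBJECT_0+1. With InitialCount=0 and correct decrementing, this is
/// the count the semaphore should hold at the end of the trace.
pub fn net_balance(records: &[ReleaseRecord], handle: u32, successful_waits: u64) -> i64 {
    let released: u64 = records
        .iter()
        .filter(|r| r.handle == handle)
        .map(|r| u64::from(r.release_count))
        .sum();
    released as i64 - successful_waits as i64
}
```

A balance near zero alongside r3 = 1 on every wait would point away from a high release rate and toward the wait handler not consuming the count; a large positive balance would point at the release side.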
Most-recent kernel calls before each wait on ours tid=5 (from S3's ours-lr-trace data):

- `sub_824AB158` ↔ `NtReleaseSemaphore` (via wrapper)
- `sub_824AA2F0` ↔ `NtSetEvent`
- `sub_824AAF50` ↔ `KeSetEvent`-style with ptr+size args
- `sub_824AA830` ↔ `KeQueryPerformanceCounter`-like
- `sub_824AB240` ↔ `NtWaitForMultipleObjectsEx` itself

## Hypothesis (MEDIUM-HIGH confidence)

The semaphore is being **over-released** in ours. Specifically, one of the producer-side enqueue paths (sub_82452DC0, sub_82452690, sub_8245E1D8, or any of the 22 other release-call sites) is firing release more often than the dispatch loop is consuming work, OR our wait kernel handler in `xenia-kernel/src/exports.rs` is not atomically decrementing the semaphore count on WAIT_OBJECT_0+N.

Ranked S5 leads:

1. **Audit our `NtWaitForMultipleObjectsEx` handler implementation**: does it decrement the semaphore on success? (Likely yes; it would regress many things otherwise. Test with a small probe.)
2. **Probe `NtReleaseSemaphore` call rate on handle 0x1050** in ours. Compare to canary on equivalent handle (some F8000xxx in canary). Hypothesis: ours releases more often per dispatch.
3. **Cross-check the canary equivalent handle**: canary uses `XSemaphore::native_object()` pseudo-handle for handle[1]. Use `audit_69_event_signal_watch` extension (or grep S1's `signal-probe-correlated.log` for KeReleaseSemaphore + the relevant ptr) to identify canary's semaphore handle ID, then run the same probe.

## Classification

NOT a loop-exit-branch divergence (neither engine exits). NOT a missing-thread / missing-spawn divergence (S2 closed that). NOT a wrong-handle-selection divergence (S3 confirmed args match).

It IS a **semaphore-state divergence**: our NtWaitForMultipleObjectsEx keeps returning WAIT_OBJECT_0+1 (semaphore signaled) where canary's returns WAIT_TIMEOUT. The semaphore count is non-zero at wait-entry in ours; zero in canary.
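
The difference between a wait that consumes the semaphore count and one that does not can be illustrated with a toy model. This is a sketch under stated assumptions: `ToySemaphore`, `wait_any`, and `one_release_then_poll` are hypothetical names, the 16ms timeout is modeled as an immediate return when nothing is signaled, and none of this reflects the actual handler in `xenia-kernel/src/exports.rs`.

```rust
pub const WAIT_OBJECT_0: u32 = 0;
pub const WAIT_TIMEOUT: u32 = 0x102;

/// Toy counting semaphore (illustrative only, not a xenia-rs type).
pub struct ToySemaphore {
    pub count: i32,
    pub max: i32,
}

impl ToySemaphore {
    pub fn new(initial: i32, max: i32) -> Self {
        Self { count: initial, max }
    }

    /// Add `n` to the count and return the previous count.
    /// Returns None (count unchanged) if `n` is not positive or the
    /// result would exceed `max`.
    pub fn release(&mut self, n: i32) -> Option<i32> {
        if n <= 0 {
            return None;
        }
        let prev = self.count;
        let next = prev.checked_add(n)?;
        if next > self.max {
            return None;
        }
        self.count = next;
        Some(prev)
    }
}

/// One WaitAny poll over [stop event, work semaphore].
/// `consume` selects whether a satisfied semaphore wait decrements the
/// count (expected behavior) or leaves it untouched (hypothesis (b)).
pub fn wait_any(stop_signaled: bool, sem: &mut ToySemaphore, consume: bool) -> u32 {
    if stop_signaled {
        return WAIT_OBJECT_0; // handle[0]: stop event
    }
    if sem.count > 0 {
        if consume {
            sem.count -= 1;
        }
        return WAIT_OBJECT_0 + 1; // handle[1]: work semaphore
    }
    WAIT_TIMEOUT // stands in for the 16ms timeout expiring
}

/// Create the semaphore as the guest does (InitialCount=0,
/// MaximumCount=0x7FFFFFFF), release once, then poll `polls` times.
pub fn one_release_then_poll(polls: usize, consume: bool) -> Vec<u32> {
    let mut sem = ToySemaphore::new(0, 0x7FFF_FFFF);
    let _prev = sem.release(1);
    let results: Vec<u32> = (0..polls)
        .map(|_| wait_any(false, &mut sem, consume))
        .collect();
    results
}
```

In this model, a single release followed by four polls yields `[1, 0x102, 0x102, 0x102]` when the wait consumes the count, which has the same shape as the canary trace, and `[1, 1, 1, 1]` when it does not, which has the same shape as ours. The same all-ones pattern would also appear if a producer released before every poll, so the model alone cannot separate the hypotheses; it only shows what a direct probe of the handler should look for.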
## Confidence flags

| finding | confidence | reasoning |
|---|---|---|
| both loops never exit (B50 never fires) | HIGH | direct measurement |
| ours r3=1 always at back-edge | HIGH | 91/91 captures direct measurement |
| canary r3=0x102 mostly at back-edge | HIGH | 3/4 captures direct measurement |
| handle[1] is NtCreateSemaphore w/ InitialCount=0, Max=0x7FFFFFFF | HIGH | mem-watch + disasm confirmed |
| handle[0] is NtCreateEvent | HIGH | disasm confirmed at 0x824A9F18 |
| ours handle slot values 0x104C, 0x1050 | HIGH | mem-watch confirmed |
| no exit-branch divergence in matching iter | HIGH | exit branch never taken in either |
| semaphore-state divergence root cause | MEDIUM-HIGH | r3 differs → wait kernel return differs → semaphore state must differ; haven't directly proved which (over-release vs no-decrement vs wrong-init) |
| S5 path-1 (NtWaitForMultiple decrement bug) | MEDIUM | most likely culprit given kernel-side state divergence pattern, but other hypotheses still open |