xenia-rs/audit-runs/audit-069-wait-signal-producer/s4/divergence-analysis.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


AUDIT-069 Session 4 — divergence analysis

Date: 2026-05-20
xenia-rs HEAD: e6d43a23ac393004d2e5adf2f0395fd0b5e6448b (UNCHANGED)

Headline (HIGH confidence — direct per-iteration measurement)

The S3 framing of "producer-loop underrun" was directionally right but mis-located the divergence. The loop in sub_82450A68 does not take an early-exit branch in either engine — neither ours nor canary ever reaches 0x82450B50 (the exit path). Both stay in the loop indefinitely.

The divergence is WHAT the NtWaitForMultipleObjectsEx call returns at each iteration:

  • Ours: r3 = 1 (WAIT_OBJECT_0+1, semaphore signaled) EVERY iteration.
  • Canary: r3 = 0x102 (WAIT_TIMEOUT) mostly, r3 = 1 occasionally.

This refines the producer-loop classification: it is NOT loop-underrun (both engines' loops run continuously). It is a semaphore-state divergence: the work semaphore in ours is over-released or never properly drained, whereas canary's drains correctly and the wait times out on each 16ms tick.

Loop structure (sub_82450A68 disasm at s4/sub_82450A68-disasm.txt)

0x82450A28: sub_82450A28 = thread entry (KeSetThreadPriority(-2, 3); bl sub_82450A68)
0x82450A68: prolog (mflr, alloc 128B frame, r31=ctx_arg)
0x82450A78-94: stack handle array [r1+80]=[r31+88]=handle[0]=STOP_EVENT (=0x104C in ours),
                                  [r1+84]=[r31+92]=handle[1]=WORK_SEMAPHORE (=0x1050 in ours).
0x82450A98:  bl 0x824AB240  ; NtWaitForMultipleObjectsEx wrapper, 16ms timeout
0x82450A9C-A0:  cmplwi/beq cr6, r3, 0  → 0x82450B50  [EXIT-WAIT1: r3==0 → exit (stop signaled)]
0x82450AA4-A8:  li r29,0; li r28,4    [FIRST-ITER body entry]
0x82450AAC:  lwz r11, 212(r31)          [BACK-EDGE TARGET; reads "fast-path flag"]
0x82450AB0-BC:  cntlzw / extrwi / cmplwi / bne cr6, 0xAC8  [BR-A: flag@212!=0 → search path]
0x82450AC0-C4:  li r4,5; b 0xB2C  [BR-B: flag@212==0 → direct dispatch w/ r4=5]
0x82450AC8-CC:  mr r30,r29; addi r11,r31,112  [search-path setup]
0x82450AD0-E0:  lwz r10,0(r11); cntlzw; extrwi; cmplwi; beq cr6, 0xAF8  [BR-C: candidate found]
0x82450AE4-F0:  addi r30,1; addi r11,20; cmplwi cr6, r30, 5; blt cr6, 0xAD0  [BR-D: search continue]
0x82450AF4:  b 0xB34  [BR-E: search exhausted → skip dispatch, re-wait]
0x82450AF8:  lwz r11, 224(r31)  [budget check]
0x82450AFC-00:  cmplwi cr6, r11, 0; beq cr6, 0xB28  [BR-F: budget@224==0 → skip refresh]
0x82450B04-0C:  lwz r11, 220(r31); cmpw cr6, r11, r30; bge cr6, 0xB28  [BR-G: budget cmp]
0x82450B10:  bl 0x824AA830  [KeQueryPerformanceCounter; sub_824AA830]
0x82450B14-1C:  lwz r11,224(r31); cmplw cr6,r3,r11; blt cr6, 0xB34  [BR-H: budget exceeded → re-wait]
0x82450B20-24:  stw r28, 220(r31); stw r29, 224(r31)
0x82450B28:  mr r4, r30
0x82450B2C-30:  mr r3, r31; bl 0x82450B68  [DISPATCH: calls γ-signaler family]
0x82450B34-44:  li r6,16; li r5,0; addi r4,r1,80; li r3,2; bl 0x824AB240  [RE-WAIT]
0x82450B48-4C:  cmplwi cr6, r3, 0; bne cr6, 0x82450AAC  [BACK-EDGE: r3!=0 → loop]
0x82450B50-58:  li r3,0; addi r1,r1,128; b 0x825F0FD8  [EXIT path]
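
The control flow annotated above can be restated as a small Rust sketch. It is a reading aid built from the branch labels in the listing, not decompiler output: the `Ctx` struct, its field names, and the `wait`/`now` callbacks are hypothetical stand-ins, and the "non-zero word means candidate" test in the search is an assumption, since the listing only says a candidate was found.

```rust
/// Hypothetical mirror of the fields the loop reads through r31.
struct Ctx {
    flag_212: u32,        // "fast-path flag" at +212
    slots: [u32; 5],      // word 0 of five 20-byte records starting at +112
    budget_slot_220: i32, // compared against the found slot index (BR-G)
    budget_tick_224: u32, // compared against the performance counter (BR-H)
}

/// Drives the loop until a wait returns 0 (stop event) and returns the r4
/// argument of every dispatch. `wait` stands in for the 16ms two-handle wait,
/// `now` for the counter read at 0x82450B10.
fn run_loop(
    ctx: &mut Ctx,
    mut wait: impl FnMut() -> u32,
    mut now: impl FnMut() -> u32,
) -> Vec<u32> {
    let mut dispatched = Vec::new();
    if wait() == 0 {
        return dispatched; // EXIT-WAIT1
    }
    loop {
        let target = if ctx.flag_212 == 0 {
            Some(5) // BR-B: direct dispatch with r4 = 5
        } else {
            // BR-C/BR-D: scan the five slots.
            // ASSUMPTION: a non-zero word marks a candidate.
            match ctx.slots.iter().position(|&word| word != 0) {
                None => None, // BR-E: search exhausted, no dispatch
                Some(i) => {
                    let idx = i as u32;
                    // BR-F/BR-G: the counter is read only when a budget is
                    // armed and the stored slot index is below the found one.
                    if ctx.budget_tick_224 != 0 && ctx.budget_slot_220 < i as i32 {
                        if now() < ctx.budget_tick_224 {
                            None // BR-H: back to the wait without dispatching
                        } else {
                            ctx.budget_slot_220 = 4; // stw r28, 220(r31)
                            ctx.budget_tick_224 = 0; // stw r29, 224(r31)
                            Some(idx)
                        }
                    } else {
                        Some(idx)
                    }
                }
            }
        };
        if let Some(r4) = target {
            dispatched.push(r4); // stands in for bl 0x82450B68
        }
        if wait() == 0 {
            return dispatched; // EXIT path at 0x82450B50
        }
    }
}
```

Note that the back-edge tests only `r3 != 0`, so a timeout (0x102) re-enters the body just like a semaphore acquire does. Feeding the sketch a scripted wait sequence of `1, 0x102, 0` with one populated slot at index 2 yields two dispatches with `r4 = 2` before the exit.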

Handle slots (ours, mem-watch confirmed)

[r31+88] = [0x828F3BC0]  written at PC 0x8244FFB0 from NtCreateEvent  → ours handle 0x104C
[r31+92] = [0x828F3BC4]  written at PC 0x8244FFCC from NtCreateSemaphore → ours handle 0x1050

Created in sub_8244FF50 (the spawn helper) BEFORE ExCreateThread:

  • handle[0] = NtCreateEvent(EventType=NotificationEvent, InitialState=0)
  • handle[1] = NtCreateSemaphore(InitialCount=0, MaximumCount=0x7FFFFFFF)

This is a stop-event + work-semaphore pattern, NOT two events. NtWaitForMultipleObjectsEx with WaitAny:

  • r3 = WAIT_OBJECT_0 = 0 → handle[0] (stop event) signaled → EXIT
  • r3 = WAIT_OBJECT_0+1 = 1 → handle[1] (semaphore) acquired (decremented) → DO WORK
  • r3 = WAIT_TIMEOUT = 0x102 → 16ms elapsed with no signal → continue (poll)
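
For reference, the three return values listed above can be decoded with a few lines of Rust. This is a minimal sketch: the `WaitOutcome` enum and the function names are illustrative and do not come from xenia-rs; only the numeric values are taken from this note.

```rust
/// Status values the dispatch loop can see in r3 after the two-handle WaitAny.
const WAIT_OBJECT_0: u32 = 0x0000_0000;
const WAIT_OBJECT_1: u32 = WAIT_OBJECT_0 + 1;
const WAIT_TIMEOUT: u32 = 0x0000_0102;

/// Illustrative classification of r3 (names are hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum WaitOutcome {
    StopSignaled, // handle[0], the stop event
    WorkAcquired, // handle[1], the work semaphore (count decremented)
    TimedOut,     // 16ms elapsed with neither object signaled
    Other(u32),   // anything else, e.g. an error status
}

fn classify_wait(r3: u32) -> WaitOutcome {
    match r3 {
        WAIT_OBJECT_0 => WaitOutcome::StopSignaled,
        WAIT_OBJECT_1 => WaitOutcome::WorkAcquired,
        WAIT_TIMEOUT => WaitOutcome::TimedOut,
        other => WaitOutcome::Other(other),
    }
}

/// Mirrors the back-edge test at 0x82450B48: only r3 == 0 leaves the loop.
fn loop_continues(r3: u32) -> bool {
    r3 != WAIT_OBJECT_0
}
```

With these helpers, `classify_wait(0x102)` yields `WaitOutcome::TimedOut` and `loop_continues(0x102)` is `true`, which matches the polling behaviour described above.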

Per-PC iteration counts (HIGH confidence, direct branch-probe)

| PC | path | ours fires | canary fires | ratio |
|---|---|---|---|---|
| 0x82450AA4 | FIRST-ITER entry | 1 | 1 | 1× |
| 0x82450AAC | BACK-EDGE target | 91 | 4 | (canary crashed early) |
| 0x82450AC0 | BR-B: flag@212==0 direct-dispatch r4=5 | 2 | 0 | |
| 0x82450AC8 | BR-A: flag@212!=0 search path | 90 | 4 | |
| 0x82450AE4 | inner-search continue | 72 | 17 | |
| 0x82450AF4 | BR-E: search exhausted | 8 | 3 | |
| 0x82450AF8 | BR-C: candidate found | 82 | 1 | |
| 0x82450B04 | BR-F: budget skip | 81 | 0 | |
| 0x82450B10 | budget refresh (KeQuery) | 8 | 0 | |
| 0x82450B28 | dispatch entry (r4=r30) | 74 | 1 | |
| 0x82450B34 | re-wait entry | 92 | 4 | |
| 0x82450B50 | EXIT path | 0 | 0 | never exits |

Canary's run was cut short at ~5 iterations by a vkd3d-proton fault on exit. The relevant signal is in the r3 distribution at the back-edge, not the absolute counts.

r3 distribution at the back-edge (HIGH confidence)

Ours (91 captures at PC=0x82450AAC, lr=0x82450B48)

r3=0x00000001 × 91/91 (100%)
r3=0x00000102 ×  0/91 (0%)

Canary (4 captures at PC=0x82450AAC, lr=0x82450B48)

r3=0x00000001 × 1/4 (25%)
r3=0x00000102 × 3/4 (75%)

Pattern visible in canary trace: first re-wait returns 0x1 (work available immediately), subsequent re-waits return 0x102 (timeout).
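
The percentages above are a plain tally over the captured r3 values. A minimal Rust sketch of that tally follows, assuming the captures have already been extracted into a slice of `u32`; the function names are illustrative and the trace-parsing step is not shown.

```rust
use std::collections::BTreeMap;

/// Counts how often each r3 value appears in a list of back-edge captures.
fn r3_histogram(captures: &[u32]) -> BTreeMap<u32, usize> {
    let mut hist = BTreeMap::new();
    for &r3 in captures {
        *hist.entry(r3).or_insert(0) += 1;
    }
    hist
}

/// Fraction of captures that carried `value`; 0.0 for an empty histogram.
fn share(hist: &BTreeMap<u32, usize>, value: u32) -> f64 {
    let total: usize = hist.values().sum();
    if total == 0 {
        return 0.0;
    }
    hist.get(&value).copied().unwrap_or(0) as f64 / total as f64
}
```

Applied to the canary sequence reported here (`0x1, 0x102, 0x102, 0x102`), `share(&hist, 0x102)` is 0.75; applied to 91 captures of `0x1`, the histogram has no `0x102` entry at all.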

The divergent guest-memory location

The "divergent load" the user's framing predicted (a guest load reading some flag whose value differs ours-vs-canary) is the wait return value, computed inside the kernel — not a guest-memory load. The return r3 comes from NtWaitForMultipleObjectsEx (a kernel import).

The kernel-side state that differs is the WORK SEMAPHORE COUNT:

  • Ours: count > 0 at every wait → wait succeeds (decrement, r3=1)
  • Canary: count = 0 at most waits → wait times out (r3=0x102)

The semaphore count is influenced by:

  • NtReleaseSemaphore(handle[1], 1) calls (increments count by 1)
  • NtWaitForMultipleObjectsEx success on handle[1] (decrements by 1)

So either:

  • (a) ours's NtReleaseSemaphore is called more aggressively than canary's
  • (b) ours's NtWaitForMultipleObjectsEx doesn't decrement on success (kernel bug)
  • (c) ours's NtCreateSemaphore creates with InitialCount > 0 (creation bug)
  • (d) ours's NtReleaseSemaphore over-releases (adds more than the requested count per call)
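
A toy model makes it clear why the r3 stream alone cannot separate these four cases. The sketch below is not the xenia-rs kernel code: `Model`, its fields, and `simulate` are hypothetical, and the wait is reduced to "acquire if count > 0, otherwise time out".

```rust
const WAIT_SEMAPHORE: u32 = 1; // WAIT_OBJECT_0 + 1
const WAIT_TIMEOUT: u32 = 0x102;

/// Knobs for the hypotheses: (b) decrement_on_acquire = false,
/// (c) initial_count > 0, (d) extra_per_release > 0.
/// Hypothesis (a) is expressed through the release schedule instead.
struct Model {
    initial_count: u32,
    decrement_on_acquire: bool,
    extra_per_release: u32,
}

/// `releases_before_wait[i]` is the number of NtReleaseSemaphore(handle, 1)
/// calls that land before the i-th wait. Returns the r3 of each wait.
fn simulate(model: &Model, releases_before_wait: &[u32]) -> Vec<u32> {
    let mut count = model.initial_count;
    let mut returns = Vec::with_capacity(releases_before_wait.len());
    for &calls in releases_before_wait {
        // Each call adds 1, plus the bogus extra under hypothesis (d).
        count += calls * (1 + model.extra_per_release);
        if count > 0 {
            if model.decrement_on_acquire {
                count -= 1;
            }
            returns.push(WAIT_SEMAPHORE);
        } else {
            returns.push(WAIT_TIMEOUT);
        }
    }
    returns
}
```

With a correct model and one release followed by three idle ticks, the output is `1, 0x102, 0x102, 0x102`, the canary-like pattern. Each of (a) through (d) on its own turns the same four waits into `1, 1, 1, 1`, so telling them apart needs a probe on the release and acquire sides rather than on r3.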

NtReleaseSemaphore callers (15 unique fns from sylpheed.db xrefs)

sub_822c6748, sub_822c6808, sub_822c8b50 (×6 inline call sites),
sub_822f2328,
sub_823dd770, sub_823dd838, sub_823de4b8 (×3),
sub_823df320,
sub_82450218 ← in dispatch-loop module (callers: sub_82452DC0 ×2)
sub_824503A0 ← in dispatch-loop module (callers: sub_82452690, sub_8245E1D8)
sub_82450B68 ← THE DISPATCH FUNCTION ITSELF (×2 internal release sites at 0xCDC, 0xD28)
sub_824569C0 (j-call), sub_82457FE0, sub_82458468, sub_824591C0,
sub_8245AAF0, sub_8245ABD8, sub_8245AD00

The most-suspicious sites for this audit are the three in the dispatch-loop module: sub_82450218, sub_824503A0, and the self-release in sub_82450B68.

Most-recent kernel calls before the divergent load (ours tid=5)

The "divergent load" is the kernel-side return of NtWaitForMultipleObjectsEx. No guest-memory load is the proximate cause. Most-recent kernel calls before each wait on ours tid=5 (from S3's ours-lr-trace data):

  • sub_824AB158 → NtReleaseSemaphore (via wrapper)
  • sub_824AA2F0 → NtSetEvent
  • sub_824AAF50 → KeSetEvent-style with ptr+size args
  • sub_824AA830 → KeQueryPerformanceCounter-like
  • sub_824AB240 → NtWaitForMultipleObjectsEx itself

Hypothesis (MEDIUM-HIGH confidence)

The semaphore is being over-released in ours. Specifically, one of the producer-side enqueue paths (sub_82452DC0, sub_82452690, sub_8245E1D8, or any of the 22 other release-call sites) is firing release more often than the dispatch loop is consuming work — OR — ours's wait kernel handler in xenia-kernel/src/exports.rs is not atomically decrementing the semaphore count on WAIT_OBJECT_0+N.

Ranked S5 leads:

  1. Audit ours's NtWaitForMultipleObjectsEx handler implementation: does it decrement the semaphore on success? (Likely yes — would regress many things otherwise. Test with a small probe.)
  2. Probe NtReleaseSemaphore call rate on handle 0x1050 in ours. Compare to canary on equivalent handle (some F8000xxx in canary). Hypothesis: ours releases more often per dispatch.
  3. Cross-check the canary equivalent handle: canary uses XSemaphore::native_object() pseudo-handle for handle[1]. Use audit_69_event_signal_watch extension (or grep S1's signal-probe-correlated.log for KeReleaseSemaphore + the relevant ptr) to identify canary's semaphore handle ID, then run the same probe.
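
Leads 1 and 2 both reduce to bookkeeping over a per-handle event stream. The sketch below shows one way such a probe's output could be tallied. It assumes a hypothetical `SemEvent` log with one record per release call and one per successful semaphore wait; nothing here is an existing xenia-rs or canary interface.

```rust
use std::collections::HashMap;

/// Hypothetical probe record; not an existing trace format.
#[derive(Clone, Copy)]
enum SemEvent {
    Release { handle: u32, count: u32 }, // NtReleaseSemaphore(handle, count)
    Acquire { handle: u32 },             // wait returned WAIT_OBJECT_0+N for handle
}

#[derive(Default, Debug, PartialEq, Eq)]
struct Tally {
    release_calls: u32,
    released: u64,
    acquired: u64,
}

impl Tally {
    /// Count the semaphore should hold if every acquire decremented by one.
    fn implied_count(&self, initial: u64) -> i64 {
        initial as i64 + self.released as i64 - self.acquired as i64
    }
}

/// Groups the event stream by handle.
fn tally(events: &[SemEvent]) -> HashMap<u32, Tally> {
    let mut out: HashMap<u32, Tally> = HashMap::new();
    for ev in events {
        match *ev {
            SemEvent::Release { handle, count } => {
                let t = out.entry(handle).or_default();
                t.release_calls += 1;
                t.released += u64::from(count);
            }
            SemEvent::Acquire { handle } => out.entry(handle).or_default().acquired += 1,
        }
    }
    out
}
```

If the count reported by the kernel object at wait-entry exceeds `implied_count(0)` for handle 0x1050, the decrement path (lead 1) is the suspect. If the two agree but `release_calls` per dispatch is higher than on canary's equivalent handle, the release callers (lead 2) are.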

Classification

NOT a loop-exit-branch divergence (neither engine exits). NOT a missing-thread / missing-spawn divergence (S2 closed that). NOT a wrong-handle-selection divergence (S3 confirmed args match).

It IS a semaphore-state divergence: ours's NtWaitForMultipleObjectsEx keeps returning WAIT_OBJECT_0+1 (semaphore signaled) where canary's returns WAIT_TIMEOUT. The semaphore count is non-zero at wait-entry in ours; zero in canary.

Confidence flags

| finding | confidence | reasoning |
|---|---|---|
| both loops never exit (B50 never fires) | HIGH | direct measurement |
| ours r3=1 always at back-edge | HIGH | 91/91 captures, direct measurement |
| canary r3=0x102 mostly at back-edge | HIGH | 3/4 captures, direct measurement |
| handle[1] is NtCreateSemaphore w/ InitialCount=0, Max=0x7FFFFFFF | HIGH | mem-watch + disasm confirmed |
| handle[0] is NtCreateEvent | HIGH | disasm confirmed at 0x824A9F18 |
| ours handle slot values 0x104C, 0x1050 | HIGH | mem-watch confirmed |
| no exit-branch divergence in matching iter | HIGH | exit branch never taken in either |
| semaphore-state divergence root cause | MEDIUM-HIGH | r3 differs → wait kernel return differs → semaphore state must differ; haven't directly proved which (over-release vs no-decrement vs wrong-init) |
| S5 path-1 (NtWaitForMultiple decrement bug) | MEDIUM | most likely culprit given kernel-side state divergence pattern, but other hypotheses still open |