xenia-rs/audit-runs/audit-069-wait-signal-producer/s4/divergence-analysis.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# AUDIT-069 Session 4 — divergence analysis
Date: 2026-05-20
xenia-rs HEAD: `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` (UNCHANGED)
## Headline (HIGH confidence — direct per-iteration measurement)
The S3 framing of "producer-loop underrun" was directionally right but
mis-located the divergence. The loop in `sub_82450A68` **does not take
an early-exit branch in either engine** — neither ours nor canary ever
reaches `0x82450B50` (the exit path). Both stay in the loop indefinitely.
The divergence is **WHAT the NtWaitForMultipleObjectsEx call returns at
each iteration**:
- **Ours: r3 = 1 (WAIT_OBJECT_0+1, semaphore signaled) EVERY iteration.**
- **Canary: r3 = 0x102 (WAIT_TIMEOUT) mostly, r3 = 1 occasionally.**

This refines the producer-loop classification: it is NOT loop-underrun
(both engines' loops run continuously). It is a **semaphore-state
divergence**: our work semaphore is over-released or never properly
drained, while canary's drains correctly and its wait times out on each
16ms tick.
## Loop structure (sub_82450A68 disasm at s4/sub_82450A68-disasm.txt)
```
0x82450A28: sub_82450A28 = thread entry (KeSetThreadPriority(-2, 3); bl sub_82450A68)
0x82450A68: prolog (mflr, alloc 128B frame, r31=ctx_arg)
0x82450A78-94: stack handle array [r1+80]=[r31+88]=handle[0]=STOP_EVENT (=0x104C in ours),
[r1+84]=[r31+92]=handle[1]=WORK_SEMAPHORE (=0x1050 in ours).
0x82450A98: bl 0x824AB240 ; NtWaitForMultipleObjectsEx wrapper, 16ms timeout
0x82450A9C-A0: cmplwi/beq cr6, r3, 0 → 0x82450B50 [EXIT-WAIT1: r3==0 → exit (stop signaled)]
0x82450AA4-A8: li r29,0; li r28,4 [FIRST-ITER body entry]
0x82450AAC: lwz r11, 212(r31) [BACK-EDGE TARGET; reads "fast-path flag"]
0x82450AB0-BC: cntlzw / extrwi / cmplwi / bne cr6, 0xAC8 [BR-A: flag@212!=0 → search path]
0x82450AC0-C4: li r4,5; b 0xB2C [BR-B: flag@212==0 → direct dispatch w/ r4=5]
0x82450AC8-CC: mr r30,r29; addi r11,r31,112 [search-path setup]
0x82450AD0-E0: lwz r10,0(r11); cntlzw; extrwi; cmplwi; beq cr6, 0xAF8 [BR-C: candidate found]
0x82450AE4-F0: addi r30,1; addi r11,20; cmplwi cr6, r30, 5; blt cr6, 0xAD0 [BR-D: search continue]
0x82450AF4: b 0xB34 [BR-E: search exhausted → skip dispatch, re-wait]
0x82450AF8: lwz r11, 224(r31) [budget check]
0x82450AFC-00: cmplwi cr6, r11, 0; beq cr6, 0xB28 [BR-F: budget@224==0 → skip refresh]
0x82450B04-0C: lwz r11, 220(r31); cmpw cr6, r11, r30; bge cr6, 0xB28 [BR-G: budget cmp]
0x82450B10: bl 0x824AA830 [KeQueryPerformanceCounter; sub_824AA830]
0x82450B14-1C: lwz r11,224(r31); cmplw cr6,r3,r11; blt cr6, 0xB34 [BR-H: budget exceeded → re-wait]
0x82450B20-24: stw r28, 220(r31); stw r29, 224(r31)
0x82450B28: mr r4, r30
0x82450B2C-30: mr r3, r31; bl 0x82450B68 [DISPATCH: calls γ-signaler family]
0x82450B34-44: li r6,16; li r5,0; addi r4,r1,80; li r3,2; bl 0x824AB240 [RE-WAIT]
0x82450B48-4C: cmplwi cr6, r3, 0; bne cr6, 0x82450AAC [BACK-EDGE: r3!=0 → loop]
0x82450B50-58: li r3,0; addi r1,r1,128; b 0x825F0FD8 [EXIT path]
```
## Handle slots (ours, mem-watch confirmed)
```
[r31+88] = [0x828F3BC0] written at PC 0x8244FFB0 from NtCreateEvent → ours handle 0x104C
[r31+92] = [0x828F3BC4] written at PC 0x8244FFCC from NtCreateSemaphore → ours handle 0x1050
```
Created in `sub_8244FF50` (the spawn helper) BEFORE ExCreateThread:
- handle[0] = NtCreateEvent(EventType=NotificationEvent, InitialState=0)
- handle[1] = NtCreateSemaphore(InitialCount=0, MaximumCount=0x7FFFFFFF)
This is a **stop-event + work-semaphore** pattern, NOT two events.
NtWaitForMultipleObjectsEx with WaitAny:
- r3 = WAIT_OBJECT_0 = 0 → handle[0] (stop event) signaled → EXIT
- r3 = WAIT_OBJECT_0+1 = 1 → handle[1] (semaphore) acquired (decremented) → DO WORK
- r3 = WAIT_TIMEOUT = 0x102 → 16ms elapsed with no signal → continue (poll)
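
The three return cases above can be written as a small decoder. This is only a sketch for reading trace captures: the names `LoopAction` and `classify_wait_result` are made up for illustration and do not exist in either engine.

```python
from enum import Enum

WAIT_OBJECT_0 = 0x000
WAIT_TIMEOUT = 0x102


class LoopAction(Enum):
    EXIT = "stop event signaled"
    DO_WORK = "semaphore acquired"
    POLL = "timed out, wait again"


def classify_wait_result(r3: int, handle_count: int = 2) -> LoopAction:
    """Map a WaitAny return value onto the three cases listed above."""
    if r3 == WAIT_TIMEOUT:
        return LoopAction.POLL
    index = r3 - WAIT_OBJECT_0
    if not 0 <= index < handle_count:
        raise ValueError(f"unexpected wait status {r3:#x}")
    # handle[0] is the stop event, handle[1] the work semaphore.
    return LoopAction.EXIT if index == 0 else LoopAction.DO_WORK
```

Note that the guest's own branches at 0x82450A9C and 0x82450B48 only compare r3 against 0, so both r3=1 and r3=0x102 re-enter the loop body. The DO_WORK/POLL split matters for the semaphore count, not for the branch taken.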
## Per-PC iteration counts (HIGH confidence, direct branch-probe)
| PC | path | ours fires | canary fires | ratio |
|---|---|---:|---:|---:|
| 0x82450AA4 | FIRST-ITER entry | 1 | 1 | 1× |
| 0x82450AAC | BACK-EDGE target | 91 | 4 | (canary crashed early) |
| 0x82450AC0 | BR-B: flag@212==0 direct-dispatch r4=5 | 2 | 0 | — |
| 0x82450AC8 | BR-A: flag@212!=0 search path | 90 | 4 | — |
| 0x82450AE4 | inner-search continue | 72 | 17 | — |
| 0x82450AF4 | BR-E: search exhausted | 8 | 3 | — |
| 0x82450AF8 | BR-C: candidate found | 82 | 1 | — |
| 0x82450B04 | BR-G compare (BR-F not taken: budget@224!=0) | 81 | 0 | — |
| 0x82450B10 | budget refresh (KeQuery) | 8 | 0 | — |
| 0x82450B28 | dispatch entry (r4=r30) | 74 | 1 | — |
| 0x82450B34 | re-wait entry | 92 | 4 | — |
| **0x82450B50** | **EXIT path** | **0** | **0** | **never exits** |

Canary's run was cut short at ~5 iterations by a vkd3d-proton fault on
exit. The relevant signal is in the **r3 distribution at the back-edge**,
not the absolute counts.
## r3 distribution at the back-edge (HIGH confidence)
### Ours (91 captures at PC=0x82450AAC, lr=0x82450B48)
```
r3=0x00000001 × 91/91 (100%)
r3=0x00000102 × 0/91 (0%)
```
### Canary (4 captures at PC=0x82450AAC, lr=0x82450B48)
```
r3=0x00000001 × 1/4 (25%)
r3=0x00000102 × 3/4 (75%)
```
Pattern visible in canary trace: first re-wait returns 0x1 (work
available immediately), subsequent re-waits return 0x102 (timeout).
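
The percentages above can be reproduced with a few lines. This is a sketch: the capture lists are rebuilt from the counts reported in this section, not read from the raw trace files, and `r3_distribution` is an illustrative helper name.

```python
from collections import Counter


def r3_distribution(captures):
    """Return {r3_value: (count, fraction)} for a list of captured r3 values."""
    counts = Counter(captures)
    total = len(captures)
    return {value: (n, n / total) for value, n in counts.items()}


# Rebuilt from the reported counts (order for canary follows the trace pattern
# described above: one immediate success, then timeouts).
ours = [0x1] * 91
canary = [0x1, 0x102, 0x102, 0x102]

for name, captures in (("ours", ours), ("canary", canary)):
    for value, (n, frac) in sorted(r3_distribution(captures).items()):
        print(f"{name}: r3={value:#010x} x {n}/{len(captures)} ({frac:.0%})")
```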
## The divergent guest-memory location
The "divergent load" the user's framing predicted (a guest load reading
some flag whose value differs ours-vs-canary) is **the wait return
value, computed inside the kernel** — not a guest-memory load. The
return r3 comes from `NtWaitForMultipleObjectsEx` (a kernel import).
The kernel-side state that differs is the **WORK SEMAPHORE COUNT**:
- Ours: count > 0 at every wait → wait succeeds (decrement, r3=1)
- Canary: count = 0 at most waits → wait times out (r3=0x102)

The semaphore count is influenced by:
- `NtReleaseSemaphore(handle[1], 1)` calls (increments count by 1)
- `NtWaitForMultipleObjectsEx` success on handle[1] (decrements by 1)
So either:
- (a) NtReleaseSemaphore is called more often in ours than in canary
- (b) our NtWaitForMultipleObjectsEx doesn't decrement on success (kernel bug)
- (c) our NtCreateSemaphore creates with InitialCount > 0 (creation bug)
- (d) our NtReleaseSemaphore adds more to the count than the call requested (kernel bug)
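
A toy model makes the four candidates concrete. It is a sketch under simplifying assumptions (single thread, one wait per iteration, no stop event), and `ToySemaphore` and `run_waits` are hypothetical names, not code from xenia-rs or canary.

```python
WAIT_OBJECT_0_PLUS_1 = 0x1
WAIT_TIMEOUT = 0x102


class ToySemaphore:
    def __init__(self, initial_count=0, decrement_on_wait=True):
        self.count = initial_count
        self.decrement_on_wait = decrement_on_wait

    def release(self, amount=1):
        self.count += amount

    def wait(self):
        """Return the r3 a WaitAny on handle[1] would produce in this model."""
        if self.count > 0:
            if self.decrement_on_wait:
                self.count -= 1
            return WAIT_OBJECT_0_PLUS_1
        return WAIT_TIMEOUT


def run_waits(sem, releases_per_iteration):
    """Apply the given releases before each wait and collect the r3 values."""
    results = []
    for amount in releases_per_iteration:
        if amount:
            sem.release(amount)
        results.append(sem.wait())
    return results


healthy = run_waits(ToySemaphore(), [1, 0, 0, 0])                       # canary-like
over_release = run_waits(ToySemaphore(), [1, 1, 1, 1])                  # (a) / (d)
no_decrement = run_waits(ToySemaphore(decrement_on_wait=False), [1, 0, 0, 0])  # (b)
bad_init = run_waits(ToySemaphore(initial_count=4), [0, 0, 0, 0])       # (c)
```

In this model `healthy` yields one success followed by timeouts, while the other three all yield r3=1 on every wait. The r3 trace alone therefore cannot tell (a) through (d) apart; the release and decrement sides need to be probed separately.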
## NtReleaseSemaphore callers (18 unique fns from sylpheed.db xrefs)
```
sub_822c6748, sub_822c6808, sub_822c8b50 (×6 inline call sites),
sub_822f2328,
sub_823dd770, sub_823dd838, sub_823de4b8 (×3),
sub_823df320,
sub_82450218 ← in dispatch-loop module (callers: sub_82452DC0 ×2)
sub_824503A0 ← in dispatch-loop module (callers: sub_82452690, sub_8245E1D8)
sub_82450B68 ← THE DISPATCH FUNCTION ITSELF (×2 internal release sites at 0xCDC, 0xD28)
sub_824569C0 (j-call), sub_82457FE0, sub_82458468, sub_824591C0,
sub_8245AAF0, sub_8245ABD8, sub_8245AD00
```
The most-suspicious sites for this audit are the three in the
dispatch-loop module: `sub_82450218`, `sub_824503A0`, and the
self-release in `sub_82450B68`.
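
The caller list can be tallied mechanically. The sketch below transcribes the list as printed and assumes that an entry without a ×N annotation is a single call site; the parenthesised "callers:" notes are callers of the wrapper function, not additional release sites.

```python
# Function -> number of NtReleaseSemaphore call sites, transcribed from the list above.
RELEASE_SITES = {
    "sub_822c6748": 1, "sub_822c6808": 1, "sub_822c8b50": 6,
    "sub_822f2328": 1,
    "sub_823dd770": 1, "sub_823dd838": 1, "sub_823de4b8": 3,
    "sub_823df320": 1,
    "sub_82450218": 1, "sub_824503A0": 1, "sub_82450B68": 2,
    "sub_824569C0": 1, "sub_82457FE0": 1, "sub_82458468": 1,
    "sub_824591C0": 1, "sub_8245AAF0": 1, "sub_8245ABD8": 1,
    "sub_8245AD00": 1,
}
DISPATCH_MODULE = {"sub_82450218", "sub_824503A0", "sub_82450B68"}


def tally(sites, module):
    """Return (functions, total sites, sites in module, sites elsewhere)."""
    total = sum(sites.values())
    in_module = sum(n for fn, n in sites.items() if fn in module)
    return len(sites), total, in_module, total - in_module


print(tally(RELEASE_SITES, DISPATCH_MODULE))  # (18, 26, 4, 22)
```

Under that assumption there are 4 release sites inside the dispatch-loop module and 22 elsewhere.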
## Most-recent kernel calls before the divergent load (ours tid=5)
The "divergent load" is the kernel-side return of `NtWaitForMultipleObjectsEx`.
No guest-memory load is the proximate cause. Most-recent kernel calls
before each wait on ours tid=5 (from S3's ours-lr-trace data):
- `sub_824AB158` → `NtReleaseSemaphore` (via wrapper)
- `sub_824AA2F0` → `NtSetEvent`
- `sub_824AAF50` → `KeSetEvent`-style with ptr+size args
- `sub_824AA830` → `KeQueryPerformanceCounter`-like
- `sub_824AB240` → `NtWaitForMultipleObjectsEx` itself
## Hypothesis (MEDIUM-HIGH confidence)
The semaphore is being **over-released** in ours. Specifically, either
one of the producer-side enqueue paths (sub_82452DC0, sub_82452690,
sub_8245E1D8, or any of the 22 other release-call sites) is firing
release more often than the dispatch loop is consuming work, or our wait
handler in `xenia-kernel/src/exports.rs` is not atomically decrementing
the semaphore count on WAIT_OBJECT_0+N.
Ranked S5 leads:
1. **Audit our `NtWaitForMultipleObjectsEx` handler implementation**:
does it decrement the semaphore on success? (Likely yes — would
regress many things otherwise. Test with a small probe.)
2. **Probe `NtReleaseSemaphore` call rate on handle 0x1050** in ours.
Compare to canary on equivalent handle (some F8000xxx in canary).
Hypothesis: ours releases more often per dispatch.
3. **Cross-check the canary equivalent handle**: canary uses
`XSemaphore::native_object()` pseudo-handle for handle[1]. Use
`audit_69_event_signal_watch` extension (or grep S1's
`signal-probe-correlated.log` for KeReleaseSemaphore + the relevant
ptr) to identify canary's semaphore handle ID, then run the same probe.
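
For lead 2, the comparison boils down to counting releases against successful acquires on one handle. The sketch below assumes a hypothetical probe log already parsed into `(kind, handle, amount)` tuples; that record shape, the `"release"`/`"acquire"` labels, and the sample events are all invented for illustration and do not describe an existing probe format.

```python
from typing import Iterable, NamedTuple


class SemaphoreTraffic(NamedTuple):
    released: int   # sum of release amounts on the handle
    acquired: int   # number of successful waits on the handle
    net: int        # released - acquired


def summarize_traffic(events: Iterable[tuple], handle: int) -> SemaphoreTraffic:
    """Summarize release/acquire traffic for one handle from parsed probe events."""
    released = 0
    acquired = 0
    for kind, event_handle, amount in events:
        if event_handle != handle:
            continue
        if kind == "release":
            released += amount
        elif kind == "acquire":
            acquired += 1
    return SemaphoreTraffic(released, acquired, released - acquired)


# Made-up sample data, only to show the call shape.
sample = [
    ("release", 0x1050, 1),
    ("acquire", 0x1050, 1),
    ("release", 0x1050, 1),
    ("release", 0x104C, 1),  # different handle, ignored
]
print(summarize_traffic(sample, 0x1050))
```

With InitialCount=0, a `net` that keeps growing would point toward the release side (more releases than acquires), whereas a `net` near zero alongside waits that always succeed would point back at the wait or create handlers. That reading is an interpretation aid, not a measured result.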
## Classification
NOT a loop-exit-branch divergence (neither engine exits).
NOT a missing-thread / missing-spawn divergence (S2 closed that).
NOT a wrong-handle-selection divergence (S3 confirmed args match).
It IS a **semaphore-state divergence**: our NtWaitForMultipleObjectsEx
keeps returning WAIT_OBJECT_0+1 (semaphore signaled) where canary's
returns WAIT_TIMEOUT. The semaphore count is non-zero at wait-entry in
ours; zero in canary.
## Confidence flags
| finding | confidence | reasoning |
|---|---|---|
| both loops never exit (B50 never fires) | HIGH | direct measurement |
| ours r3=1 always at back-edge | HIGH | 91/91 captures direct measurement |
| canary r3=0x102 mostly at back-edge | HIGH | 3/4 captures direct measurement |
| handle[1] is NtCreateSemaphore w/ InitialCount=0, Max=0x7FFFFFFF | HIGH | mem-watch + disasm confirmed |
| handle[0] is NtCreateEvent | HIGH | disasm confirmed at 0x824A9F18 |
| ours handle slot values 0x104C, 0x1050 | HIGH | mem-watch confirmed |
| no exit-branch divergence in matching iter | HIGH | exit branch never taken in either |
| semaphore-state divergence root cause | MEDIUM-HIGH | r3 differs → wait kernel return differs → semaphore state must differ; haven't directly proved which (over-release vs no-decrement vs wrong-init) |
| S5 path-1 (NtWaitForMultiple decrement bug) | MEDIUM | most likely culprit given kernel-side state divergence pattern, but other hypotheses still open |