handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions

@@ -0,0 +1,209 @@
# AUDIT-069 Session 4 — divergence analysis
Date: 2026-05-20
xenia-rs HEAD: `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` (UNCHANGED)
## Headline (HIGH confidence — direct per-iteration measurement)
The S3 framing of "producer-loop underrun" was directionally right but
mis-located the divergence. The loop in `sub_82450A68` **does not take
an early-exit branch in either engine** — neither ours nor canary ever
reaches `0x82450B50` (the exit path). Both stay in the loop indefinitely.
The divergence is **WHAT the NtWaitForMultipleObjectsEx call returns at
each iteration**:
- **Ours: r3 = 1 (WAIT_OBJECT_0+1, semaphore signaled) EVERY iteration.**
- **Canary: r3 = 0x102 (WAIT_TIMEOUT) mostly, r3 = 1 occasionally.**
This refines the producer-loop classification: it is NOT loop-underrun
(both engines' loops run continuously). It is a **semaphore-state
divergence** — ours's work semaphore is over-released or never properly
drained; canary's drains correctly and the wait times out per 16ms tick.
## Loop structure (sub_82450A68 disasm at s4/sub_82450A68-disasm.txt)
```
0x82450A28: sub_82450A28 = thread entry (KeSetThreadPriority(-2, 3); bl sub_82450A68)
0x82450A68: prolog (mflr, alloc 128B frame, r31=ctx_arg)
0x82450A78-94: stack handle array [r1+80]=[r31+88]=handle[0]=STOP_EVENT (=0x104C in ours),
[r1+84]=[r31+92]=handle[1]=WORK_SEMAPHORE (=0x1050 in ours).
0x82450A98: bl 0x824AB240 ; NtWaitForMultipleObjectsEx wrapper, 16ms timeout
0x82450A9C-A0: cmplwi/beq cr6, r3, 0 → 0x82450B50 [EXIT-WAIT1: r3==0 → exit (stop signaled)]
0x82450AA4-A8: li r29,0; li r28,4 [FIRST-ITER body entry]
0x82450AAC: lwz r11, 212(r31) [BACK-EDGE TARGET; reads "fast-path flag"]
0x82450AB0-BC: cntlzw / extrwi / cmplwi / bne cr6, 0xAC8 [BR-A: flag@212!=0 → search path]
0x82450AC0-C4: li r4,5; b 0xB2C [BR-B: flag@212==0 → direct dispatch w/ r4=5]
0x82450AC8-CC: mr r30,r29; addi r11,r31,112 [search-path setup]
0x82450AD0-E0: lwz r10,0(r11); cntlzw; extrwi; cmplwi; beq cr6, 0xAF8 [BR-C: candidate found]
0x82450AE4-F0: addi r30,1; addi r11,20; cmplwi cr6, r30, 5; blt cr6, 0xAD0 [BR-D: search continue]
0x82450AF4: b 0xB34 [BR-E: search exhausted → skip dispatch, re-wait]
0x82450AF8: lwz r11, 224(r31) [budget check]
0x82450AFC-00: cmplwi cr6, r11, 0; beq cr6, 0xB28 [BR-F: budget@224==0 → skip refresh]
0x82450B04-0C: lwz r11, 220(r31); cmpw cr6, r11, r30; bge cr6, 0xB28 [BR-G: budget cmp]
0x82450B10: bl 0x824AA830 [KeQueryPerformanceCounter; sub_824AA830]
0x82450B14-1C: lwz r11,224(r31); cmplw cr6,r3,r11; blt cr6, 0xB34 [BR-H: budget exceeded → re-wait]
0x82450B20-24: stw r28, 220(r31); stw r29, 224(r31)
0x82450B28: mr r4, r30
0x82450B2C-30: mr r3, r31; bl 0x82450B68 [DISPATCH: calls γ-signaler family]
0x82450B34-44: li r6,16; li r5,0; addi r4,r1,80; li r3,2; bl 0x824AB240 [RE-WAIT]
0x82450B48-4C: cmplwi cr6, r3, 0; bne cr6, 0x82450AAC [BACK-EDGE: r3!=0 → loop]
0x82450B50-58: li r3,0; addi r1,r1,128; b 0x825F0FD8 [EXIT path]
```
## Handle slots (ours, mem-watch confirmed)
```
[r31+88] = [0x828F3BC0] written at PC 0x8244FFB0 from NtCreateEvent → ours handle 0x104C
[r31+92] = [0x828F3BC4] written at PC 0x8244FFCC from NtCreateSemaphore → ours handle 0x1050
```
Created in `sub_8244FF50` (the spawn helper) BEFORE ExCreateThread:
- handle[0] = NtCreateEvent(EventType=NotificationEvent, InitialState=0)
- handle[1] = NtCreateSemaphore(InitialCount=0, MaximumCount=0x7FFFFFFF)
This is a **stop-event + work-semaphore** pattern, NOT two events.
NtWaitForMultipleObjectsEx with WaitAny:
- r3 = WAIT_OBJECT_0 = 0 → handle[0] (stop event) signaled → EXIT
- r3 = WAIT_OBJECT_0+1 = 1 → handle[1] (semaphore) acquired (decremented) → DO WORK
- r3 = WAIT_TIMEOUT = 0x102 → 16ms elapsed with no signal → continue (poll)
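The three return cases above can be written down as a small classifier. This is a sketch for reading traces, not code from xenia-rs: `WaitOutcome`, `classify_wait` and `loop_continues` are hypothetical names, and the mapping assumes the two-handle WaitAny layout described in this section (handle[0] = stop event, handle[1] = work semaphore).

```rust
/// Status values returned in r3 by the two-handle WaitAny call.
pub const WAIT_OBJECT_0: u32 = 0;
pub const WAIT_OBJECT_1: u32 = 1; // WAIT_OBJECT_0 + 1
pub const WAIT_TIMEOUT: u32 = 0x102;

/// What the worker loop does with a given r3 (hypothetical helper type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WaitOutcome {
    /// handle[0] (stop event) signaled: the loop exits.
    StopSignaled,
    /// handle[1] (work semaphore) acquired: the loop does work.
    WorkAcquired,
    /// 16ms elapsed with no signal: the loop polls again.
    Timeout,
    /// Any status this section does not describe.
    Other(u32),
}

pub fn classify_wait(r3: u32) -> WaitOutcome {
    match r3 {
        WAIT_OBJECT_0 => WaitOutcome::StopSignaled,
        WAIT_OBJECT_1 => WaitOutcome::WorkAcquired,
        WAIT_TIMEOUT => WaitOutcome::Timeout,
        other => WaitOutcome::Other(other),
    }
}

/// Mirrors the back-edge test at 0x82450B48-4C: any non-zero r3 loops.
pub fn loop_continues(r3: u32) -> bool {
    r3 != 0
}
```

Note that the back-edge only tests r3 against zero, so both `WorkAcquired` and `Timeout` keep the thread in the loop; the two cases differ only in what the wait did to the semaphore count.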
## Per-PC iteration counts (HIGH confidence, direct branch-probe)
| PC | path | ours fires | canary fires | ratio |
|---|---|---:|---:|---:|
| 0x82450AA4 | FIRST-ITER entry | 1 | 1 | 1× |
| 0x82450AAC | BACK-EDGE target | 91 | 4 | (canary crashed early) |
| 0x82450AC0 | BR-B: flag@212==0 direct-dispatch r4=5 | 2 | 0 | — |
| 0x82450AC8 | BR-A: flag@212!=0 search path | 90 | 4 | — |
| 0x82450AE4 | inner-search continue | 72 | 17 | — |
| 0x82450AF4 | BR-E: search exhausted | 8 | 3 | — |
| 0x82450AF8 | BR-C: candidate found | 82 | 1 | — |
| 0x82450B04 | BR-F fall-through: budget@224!=0 → BR-G compare | 81 | 0 | — |
| 0x82450B10 | budget refresh (KeQuery) | 8 | 0 | — |
| 0x82450B28 | dispatch entry (r4=r30) | 74 | 1 | — |
| 0x82450B34 | re-wait entry | 92 | 4 | — |
| **0x82450B50** | **EXIT path** | **0** | **0** | **never exits** |
Canary's run was cut short at ~5 iterations by a vkd3d-proton fault on
exit. The relevant signal is in the **r3 distribution at the back-edge**,
not the absolute counts.
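As a sanity check on the probe data, the per-PC counts can be tested for flow conservation against the disassembly. The sketch below is an assumption-laden cross-check, not part of the audit tooling: `ProbeCounts`, `refresh_passed` and `flow_is_consistent` are hypothetical names, and the identities are read off the branch targets in the loop listing (0x82450B28 is reached by every found candidate that skips the counter call, plus every counter call that passes the budget compare).

```rust
/// Fire counts for the probed PCs, taken from the per-PC table.
#[derive(Debug, Clone, Copy)]
pub struct ProbeCounts {
    pub direct_dispatch: i64,  // 0x82450AC0
    pub search_path: i64,      // 0x82450AC8
    pub search_exhausted: i64, // 0x82450AF4
    pub candidate_found: i64,  // 0x82450AF8
    pub budget_refresh: i64,   // 0x82450B10
    pub dispatch_entry: i64,   // 0x82450B28
    pub rewait: i64,           // 0x82450B34
}

pub const OURS: ProbeCounts = ProbeCounts {
    direct_dispatch: 2, search_path: 90, search_exhausted: 8,
    candidate_found: 82, budget_refresh: 8, dispatch_entry: 74, rewait: 92,
};

pub const CANARY: ProbeCounts = ProbeCounts {
    direct_dispatch: 0, search_path: 4, search_exhausted: 3,
    candidate_found: 1, budget_refresh: 0, dispatch_entry: 1, rewait: 4,
};

/// Counter calls at 0x82450B10 that fell through to 0x82450B20/B28.
/// Derived: B28 entries minus the candidates that never reached B10.
pub fn refresh_passed(c: &ProbeCounts) -> i64 {
    c.dispatch_entry - (c.candidate_found - c.budget_refresh)
}

/// True when the counts are mutually consistent with the loop's branch graph.
pub fn flow_is_consistent(c: &ProbeCounts) -> bool {
    let passed = refresh_passed(c);
    let rejected = c.budget_refresh - passed; // BR-H taken → re-wait
    passed >= 0
        && rejected >= 0
        // Every search ends either exhausted (BR-E) or with a candidate (BR-C).
        && c.search_path == c.search_exhausted + c.candidate_found
        // Every loop body starts at BR-A or BR-B and ends at the re-wait.
        && c.rewait == c.direct_dispatch + c.search_path
}
```

Under these assumptions both columns of the table are internally consistent, and for ours the derived `refresh_passed` is 0: all 8 counter calls at 0x82450B10 took BR-H straight to the re-wait.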
## r3 distribution at the back-edge (HIGH confidence)
### Ours (91 captures at PC=0x82450AAC, lr=0x82450B48)
```
r3=0x00000001 × 91/91 (100%)
r3=0x00000102 × 0/91 (0%)
```
### Canary (4 captures at PC=0x82450AAC, lr=0x82450B48)
```
r3=0x00000001 × 1/4 (25%)
r3=0x00000102 × 3/4 (75%)
```
Pattern visible in canary trace: first re-wait returns 0x1 (work
available immediately), subsequent re-waits return 0x102 (timeout).
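The distributions above are plain tallies over the captured r3 values. A minimal sketch of that tally, with hypothetical helper names (`r3_histogram`, `timeout_share`) and the capture list passed in as a slice rather than parsed from the trace files:

```rust
use std::collections::BTreeMap;

/// Count how often each r3 value appears in a list of back-edge captures.
pub fn r3_histogram(captures: &[u32]) -> BTreeMap<u32, usize> {
    let mut hist = BTreeMap::new();
    for &r3 in captures {
        *hist.entry(r3).or_insert(0) += 1;
    }
    hist
}

/// Fraction of captures that were WAIT_TIMEOUT (0x102); None if there are no captures.
pub fn timeout_share(captures: &[u32]) -> Option<f64> {
    if captures.is_empty() {
        return None;
    }
    let timeouts = captures.iter().filter(|&&r3| r3 == 0x102).count();
    Some(timeouts as f64 / captures.len() as f64)
}
```

With only 4 canary captures the share is a coarse number; the useful signal is that ours never produces a single 0x102.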
## The divergent guest-memory location
The "divergent load" the user's framing predicted (a guest load reading
some flag whose value differs ours-vs-canary) is **the wait return
value, computed inside the kernel** — not a guest-memory load. The
return r3 comes from `NtWaitForMultipleObjectsEx` (a kernel import).
The kernel-side state that differs is the **WORK SEMAPHORE COUNT**:
- Ours: count > 0 at every wait → wait succeeds (decrement, r3=1)
- Canary: count = 0 at most waits → wait times out (r3=0x102)
The semaphore count is influenced by:
- `NtReleaseSemaphore(handle[1], 1)` calls (increments count by 1)
- `NtWaitForMultipleObjectsEx` success on handle[1] (decrements by 1)
So either:
- (a) ours's NtReleaseSemaphore is called more aggressively than canary's
- (b) ours's NtWaitForMultipleObjectsEx doesn't decrement on success (kernel bug)
- (c) ours's NtCreateSemaphore creates with InitialCount > 0 (creation bug)
- (d) ours's NtReleaseSemaphore over-releases per call (adds more than the requested count)
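A toy model makes it easier to see which hypotheses reproduce the observed r3 sequences. This is a sketch under simplifying assumptions, not the xenia-rs or canary implementation: `ModelSemaphore` and `simulate` are hypothetical, one wait happens per tick after that tick's releases, and a release past the maximum is clamped (the real call would report an error instead).

```rust
pub const WAIT_WORK: u32 = 1; // WAIT_OBJECT_0 + 1
pub const WAIT_TIMEOUT: u32 = 0x102;

/// Counting semaphore reduced to the two operations that matter here.
pub struct ModelSemaphore {
    count: u32,
    max: u32,
}

impl ModelSemaphore {
    pub fn new(initial: u32, max: u32) -> Self {
        Self { count: initial.min(max), max }
    }

    /// Add `n` to the count (clamped at `max` in this model).
    pub fn release(&mut self, n: u32) {
        self.count = self.count.saturating_add(n).min(self.max);
    }

    /// One wait attempt. `decrement_on_success = false` models hypothesis (b).
    pub fn wait(&mut self, decrement_on_success: bool) -> u32 {
        if self.count == 0 {
            return WAIT_TIMEOUT;
        }
        if decrement_on_success {
            self.count -= 1;
        }
        WAIT_WORK
    }
}

/// Per tick: apply that tick's release count, then wait once. Returns the r3 sequence.
pub fn simulate(initial: u32, releases: &[u32], decrement_on_success: bool) -> Vec<u32> {
    let mut sem = ModelSemaphore::new(initial, 0x7FFF_FFFF);
    releases
        .iter()
        .map(|&n| {
            sem.release(n);
            sem.wait(decrement_on_success)
        })
        .collect()
}
```

In this model, one release followed by idle ticks gives `1, 0x102, 0x102, 0x102`, the shape of the canary trace. A release on every tick (a), a wait that never decrements (b), and a large initial count (c) all give an unbroken run of `1`, so the r3 sequence alone cannot separate them; hypothesis (d) behaves like (a) with a larger per-tick count.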
## NtReleaseSemaphore callers (15 unique fns from sylpheed.db xrefs)
```
sub_822c6748, sub_822c6808, sub_822c8b50 (×6 inline call sites),
sub_822f2328,
sub_823dd770, sub_823dd838, sub_823de4b8 (×3),
sub_823df320,
sub_82450218 ← in dispatch-loop module (callers: sub_82452DC0 ×2)
sub_824503A0 ← in dispatch-loop module (callers: sub_82452690, sub_8245E1D8)
sub_82450B68 ← THE DISPATCH FUNCTION ITSELF (×2 internal release sites at 0xCDC, 0xD28)
sub_824569C0 (j-call), sub_82457FE0, sub_82458468, sub_824591C0,
sub_8245AAF0, sub_8245ABD8, sub_8245AD00
```
The most-suspicious sites for this audit are the three in the
dispatch-loop module: `sub_82450218`, `sub_824503A0`, and the
self-release in `sub_82450B68`.
## Most-recent kernel calls before the divergent load (ours tid=5)
The "divergent load" is the kernel-side return of `NtWaitForMultipleObjectsEx`.
No guest-memory load is the proximate cause. Most-recent kernel calls
before each wait on ours tid=5 (from S3's ours-lr-trace data):
- `sub_824AB158` → `NtReleaseSemaphore` (via wrapper)
- `sub_824AA2F0` → `NtSetEvent`
- `sub_824AAF50` → `KeSetEvent`-style with ptr+size args
- `sub_824AA830` → `KeQueryPerformanceCounter`-like
- `sub_824AB240` → `NtWaitForMultipleObjectsEx` itself
## Hypothesis (MEDIUM-HIGH confidence)
The semaphore is being **over-released** in ours. Specifically, one of
the producer-side enqueue paths (sub_82452DC0, sub_82452690, sub_8245E1D8,
or any of the 22 other release-call sites) is firing release more often
than the dispatch loop is consuming work — OR — ours's wait kernel
handler in `xenia-kernel/src/exports.rs` is not atomically decrementing
the semaphore count on WAIT_OBJECT_0+N.
Ranked S5 leads:
1. **Audit ours's `NtWaitForMultipleObjectsEx` handler implementation**:
does it decrement the semaphore on success? (Likely yes — would
regress many things otherwise. Test with a small probe.)
2. **Probe `NtReleaseSemaphore` call rate on handle 0x1050** in ours.
Compare to canary on equivalent handle (some F8000xxx in canary).
Hypothesis: ours releases more often per dispatch.
3. **Cross-check the canary equivalent handle**: canary uses
`XSemaphore::native_object()` pseudo-handle for handle[1]. Use
`audit_69_event_signal_watch` extension (or grep S1's
`signal-probe-correlated.log` for KeReleaseSemaphore + the relevant
ptr) to identify canary's semaphore handle ID, then run the same probe.
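Leads 1 and 2 both reduce to bookkeeping per handle: how many units were released versus how many successful waits consumed one. A possible shape for such a probe is sketched below; `SemaphoreAudit`, `Ledger` and the hook names are hypothetical and do not correspond to existing xenia-rs code, and where the hooks would be called from inside the kernel handlers is left open.

```rust
use std::collections::HashMap;

/// Running totals for one semaphore handle.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
pub struct Ledger {
    pub release_calls: u64, // number of release calls seen
    pub released: u64,      // sum of the release counts passed in
    pub acquired: u64,      // successful waits that consumed one unit
}

/// Per-handle release/acquire accounting (hypothetical probe).
#[derive(Debug, Default)]
pub struct SemaphoreAudit {
    by_handle: HashMap<u32, Ledger>,
}

impl SemaphoreAudit {
    /// Call from the release path with the handle and requested count.
    pub fn on_release(&mut self, handle: u32, count: u32) {
        let ledger = self.by_handle.entry(handle).or_default();
        ledger.release_calls += 1;
        ledger.released += u64::from(count);
    }

    /// Call when a wait returns WAIT_OBJECT_0+N for this handle.
    pub fn on_wait_acquired(&mut self, handle: u32) {
        self.by_handle.entry(handle).or_default().acquired += 1;
    }

    /// Released minus acquired; assumes InitialCount = 0 as for handle[1].
    pub fn outstanding(&self, handle: u32) -> i64 {
        let ledger = self.by_handle.get(&handle).copied().unwrap_or_default();
        ledger.released as i64 - ledger.acquired as i64
    }

    pub fn ledger(&self, handle: u32) -> Ledger {
        self.by_handle.get(&handle).copied().unwrap_or_default()
    }
}
```

If `outstanding(0x1050)` grows without bound, the producers are out-running the loop (lead 2). If it stays near zero or goes negative while the waits keep returning 1, the handler is granting waits without consuming released units, which points at the wait path (lead 1) rather than at the release callers.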
## Classification
NOT a loop-exit-branch divergence (neither engine exits).
NOT a missing-thread / missing-spawn divergence (S2 closed that).
NOT a wrong-handle-selection divergence (S3 confirmed args match).
It IS a **semaphore-state divergence**: ours's NtWaitForMultipleObjectsEx
keeps returning WAIT_OBJECT_0+1 (semaphore signaled) where canary's
returns WAIT_TIMEOUT. The semaphore count is non-zero at wait-entry in
ours; zero in canary.
## Confidence flags
| finding | confidence | reasoning |
|---|---|---|
| both loops never exit (B50 never fires) | HIGH | direct measurement |
| ours r3=1 always at back-edge | HIGH | 91/91 captures direct measurement |
| canary r3=0x102 mostly at back-edge | HIGH | 3/4 captures direct measurement |
| handle[1] is NtCreateSemaphore w/ InitialCount=0, Max=0x7FFFFFFF | HIGH | mem-watch + disasm confirmed |
| handle[0] is NtCreateEvent | HIGH | disasm confirmed at 0x824A9F18 |
| ours handle slot values 0x104C, 0x1050 | HIGH | mem-watch confirmed |
| no exit-branch divergence in matching iter | HIGH | exit branch never taken in either |
| semaphore-state divergence root cause | MEDIUM-HIGH | r3 differs → wait kernel return differs → semaphore state must differ; haven't directly proved which (over-release vs no-decrement vs wrong-init) |
| S5 path-1 (NtWaitForMultiple decrement bug) | MEDIUM | most likely culprit given kernel-side state divergence pattern, but other hypotheses still open |