handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@@ -0,0 +1,209 @@
# AUDIT-069 Session 4: divergence analysis

Date: 2026-05-20
xenia-rs HEAD: `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` (UNCHANGED)

## Headline (HIGH confidence, direct per-iteration measurement)

The S3 framing of "producer-loop underrun" was directionally right but
mis-located the divergence. The loop in `sub_82450A68` **does not take
an early-exit branch in either engine**: neither ours nor canary ever
reaches `0x82450B50` (the exit path). Both stay in the loop indefinitely.

The divergence is **WHAT the NtWaitForMultipleObjectsEx call returns at
each iteration**:

- **Ours: r3 = 1 (WAIT_OBJECT_0+1, semaphore signaled) EVERY iteration.**
- **Canary: r3 = 0x102 (WAIT_TIMEOUT) mostly, r3 = 1 occasionally.**

This refines the producer-loop classification: it is NOT loop-underrun
(both engines' loops run continuously). It is a **semaphore-state
divergence**: our work semaphore is over-released or never properly
drained; canary's drains correctly and the wait times out on each 16ms tick.

## Loop structure (sub_82450A68 disasm at s4/sub_82450A68-disasm.txt)

```
0x82450A28:    sub_82450A28 = thread entry (KeSetThreadPriority(-2, 3); bl sub_82450A68)
0x82450A68:    prolog (mflr, alloc 128B frame, r31=ctx_arg)
0x82450A78-94: stack handle array [r1+80]=[r31+88]=handle[0]=STOP_EVENT (=0x104C in ours),
               [r1+84]=[r31+92]=handle[1]=WORK_SEMAPHORE (=0x1050 in ours).
0x82450A98:    bl 0x824AB240 ; NtWaitForMultipleObjectsEx wrapper, 16ms timeout
0x82450A9C-A0: cmplwi/beq cr6, r3, 0 → 0x82450B50                          [EXIT-WAIT1: r3==0 → exit (stop signaled)]
0x82450AA4-A8: li r29,0; li r28,4                                          [FIRST-ITER body entry]
0x82450AAC:    lwz r11, 212(r31)                                           [BACK-EDGE TARGET; reads "fast-path flag"]
0x82450AB0-BC: cntlzw / extrwi / cmplwi / bne cr6, 0xAC8                   [BR-A: flag@212==0 → search path]
0x82450AC0-C4: li r4,5; b 0xB2C                                            [BR-B: flag@212!=0 → direct dispatch w/ r4=5]
0x82450AC8-CC: mr r30,r29; addi r11,r31,112                                [search-path setup]
0x82450AD0-E0: lwz r10,0(r11); cntlzw; extrwi; cmplwi; beq cr6, 0xAF8      [BR-C: slot word !=0 → candidate found]
0x82450AE4-F0: addi r30,1; addi r11,20; cmplwi cr6, r30, 5; blt cr6, 0xAD0 [BR-D: search continue]
0x82450AF4:    b 0xB34                                                     [BR-E: search exhausted → skip dispatch, re-wait]
0x82450AF8:    lwz r11, 224(r31)                                           [budget check]
0x82450AFC-00: cmplwi cr6, r11, 0; beq cr6, 0xB28                          [BR-F: budget@224==0 → skip refresh]
0x82450B04-0C: lwz r11, 220(r31); cmpw cr6, r11, r30; bge cr6, 0xB28       [BR-G: budget cmp]
0x82450B10:    bl 0x824AA830                                               [KeQueryPerformanceCounter; sub_824AA830]
0x82450B14-1C: lwz r11,224(r31); cmplw cr6,r3,r11; blt cr6, 0xB34          [BR-H: budget exceeded → re-wait]
0x82450B20-24: stw r28, 220(r31); stw r29, 224(r31)
0x82450B28:    mr r4, r30
0x82450B2C-30: mr r3, r31; bl 0x82450B68                                   [DISPATCH: calls γ-signaler family]
0x82450B34-44: li r6,16; li r5,0; addi r4,r1,80; li r3,2; bl 0x824AB240    [RE-WAIT]
0x82450B48-4C: cmplwi cr6, r3, 0; bne cr6, 0x82450AAC                      [BACK-EDGE: r3!=0 → loop]
0x82450B50-58: li r3,0; addi r1,r1,128; b 0x825F0FD8                       [EXIT path]
```
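
The polarity of BR-A/BR-B and BR-C hinges on the `cntlzw` / `extrwi ..., 1, 26` pair in the raw listing, so it is worth checking in isolation. The sketch below models that pair on a 32-bit value: `cntlzw` yields 0..=32, and the extracted bit (weight 32) is set only when the count is exactly 32, i.e. when the loaded word was zero. The function names are hypothetical helpers for this note, not xenia-rs code.

```rust
/// Models `cntlzw rX, rX; extrwi rX, rX, 1, 26` on a 32-bit register value.
/// Result is 1 when the input was zero, 0 otherwise.
fn is_zero_idiom(x: u32) -> u32 {
    let clz = x.leading_zeros(); // 32 only when x == 0
    (clz >> 5) & 1               // the bit with weight 32
}

/// Models the branch at 0x82450ABC: `cmplwi cr6, r11, 0; bne cr6, 0x82450AC8`.
/// Returns true when the branch to the search path (0x82450AC8) is taken.
fn takes_search_path(flag_at_212: u32) -> bool {
    is_zero_idiom(flag_at_212) != 0
}
```

Under this reading, the `bne` to `0x82450AC8` is taken when the word at offset 212 is zero, and the fall-through to `0x82450AC0` (r4=5) happens when it is non-zero.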

## Handle slots (ours, mem-watch confirmed)

```
[r31+88] = [0x828F3BC0] written at PC 0x8244FFB0 from NtCreateEvent     → ours handle 0x104C
[r31+92] = [0x828F3BC4] written at PC 0x8244FFCC from NtCreateSemaphore → ours handle 0x1050
```

Created in `sub_8244FF50` (the spawn helper) BEFORE ExCreateThread:
- handle[0] = NtCreateEvent(EventType=NotificationEvent, InitialState=0)
- handle[1] = NtCreateSemaphore(InitialCount=0, MaximumCount=0x7FFFFFFF)

This is a **stop-event + work-semaphore** pattern, NOT two events.
NtWaitForMultipleObjectsEx with WaitAny:
- r3 = WAIT_OBJECT_0 = 0 → handle[0] (stop event) signaled → EXIT
- r3 = WAIT_OBJECT_0+1 = 1 → handle[1] (semaphore) acquired (decremented) → DO WORK
- r3 = WAIT_TIMEOUT = 0x102 → 16ms elapsed with no signal → continue (poll)
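
The three return codes above can be sketched as a tiny model of a WaitAny over `[stop_event, work_semaphore]`. This is a simplified, non-blocking illustration under stated assumptions: `StopEvent`, `WorkSemaphore`, `wait_any`, `LoopAction`, and `classify` are hypothetical names, the 16ms timeout is modeled as "nothing was signaled when polled", and none of this is the xenia-rs or canary implementation.

```rust
const WAIT_OBJECT_0: u32 = 0;
const WAIT_TIMEOUT: u32 = 0x102;

/// Hypothetical stand-in for handle[0]: a notification event stays signaled.
struct StopEvent { signaled: bool }

/// Hypothetical stand-in for handle[1]: a counting semaphore.
struct WorkSemaphore { count: u32 }

/// Non-blocking WaitAny: the lowest signaled index wins.
fn wait_any(ev: &StopEvent, sem: &mut WorkSemaphore) -> u32 {
    if ev.signaled {
        return WAIT_OBJECT_0; // a notification event is not consumed by the wait
    }
    if sem.count > 0 {
        sem.count -= 1; // satisfying the wait consumes one unit
        return WAIT_OBJECT_0 + 1;
    }
    WAIT_TIMEOUT
}

#[derive(Debug, PartialEq)]
enum LoopAction { Exit, DoWork, Poll, Other(u32) }

/// Maps an r3 value to the action the notes assign to it.
fn classify(r3: u32) -> LoopAction {
    match r3 {
        0 => LoopAction::Exit,
        1 => LoopAction::DoWork,
        0x102 => LoopAction::Poll,
        other => LoopAction::Other(other),
    }
}
```

In this model a semaphore with count 1 yields r3=1 once and then r3=0x102 until the next release, which is the shape seen in the canary trace.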

## Per-PC iteration counts (HIGH confidence, direct branch-probe)

| PC | path | ours fires | canary fires | ratio |
|---|---|---:|---:|---:|
| 0x82450AA4 | FIRST-ITER entry | 1 | 1 | 1× |
| 0x82450AAC | BACK-EDGE target | 91 | 4 | (canary crashed early) |
| 0x82450AC0 | BR-B: flag@212!=0 direct-dispatch r4=5 | 2 | 0 | - |
| 0x82450AC8 | BR-A: flag@212==0 search path | 90 | 4 | - |
| 0x82450AE4 | inner-search continue | 72 | 17 | - |
| 0x82450AF4 | BR-E: search exhausted | 8 | 3 | - |
| 0x82450AF8 | BR-C: candidate found | 82 | 1 | - |
| 0x82450B04 | budget compare (BR-F not taken, budget@224!=0) | 81 | 0 | - |
| 0x82450B10 | budget refresh (KeQuery) | 8 | 0 | - |
| 0x82450B28 | dispatch entry (r4=r30) | 74 | 1 | - |
| 0x82450B34 | re-wait entry | 92 | 4 | - |
| **0x82450B50** | **EXIT path** | **0** | **0** | **never exits** |

Canary's run was cut short at ~5 iterations by a vkd3d-proton fault on
exit. The relevant signal is in the **r3 distribution at the back-edge**,
not the absolute counts.

## r3 distribution at the back-edge (HIGH confidence)

### Ours (91 captures at PC=0x82450AAC, lr=0x82450B48)

```
r3=0x00000001 × 91/91 (100%)
r3=0x00000102 × 0/91 (0%)
```

### Canary (4 captures at PC=0x82450AAC, lr=0x82450B48)

```
r3=0x00000001 × 1/4 (25%)
r3=0x00000102 × 3/4 (75%)
```

Pattern visible in canary trace: first re-wait returns 0x1 (work
available immediately), subsequent re-waits return 0x102 (timeout).
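
For reference, a distribution like the ones above can be tallied from a list of captured r3 values with a few lines of Rust. This is only a sketch: `r3_histogram` and `format_histogram` are hypothetical helper names, and the input is assumed to be the r3 values already extracted from the probe log (the log format itself is not modeled here).

```rust
use std::collections::BTreeMap;

/// Counts how often each r3 value occurs in the captures.
fn r3_histogram(captures: &[u32]) -> BTreeMap<u32, usize> {
    let mut hist = BTreeMap::new();
    for &r3 in captures {
        *hist.entry(r3).or_insert(0) += 1;
    }
    hist
}

/// Renders one line per distinct r3 value, in ascending r3 order.
/// Values that never occur are not listed.
fn format_histogram(captures: &[u32]) -> Vec<String> {
    let total = captures.len();
    r3_histogram(captures)
        .iter()
        .map(|(r3, n)| {
            let pct = 100.0 * *n as f64 / total as f64;
            format!("r3=0x{:08X} × {}/{} ({:.0}%)", r3, n, total, pct)
        })
        .collect()
}
```

Feeding it the four canary captures in trace order (`[0x1, 0x102, 0x102, 0x102]`) reproduces the two canary lines shown above.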

## The divergent guest-memory location

The "divergent load" the user's framing predicted (a guest load reading
some flag whose value differs ours-vs-canary) is **the wait return
value, computed inside the kernel**, not a guest-memory load. The
return r3 comes from `NtWaitForMultipleObjectsEx` (a kernel import).

The kernel-side state that differs is the **WORK SEMAPHORE COUNT**:

- Ours: count > 0 at every wait → wait succeeds (decrement, r3=1)
- Canary: count = 0 at most waits → wait times out (r3=0x102)

The semaphore count is influenced by:
- `NtReleaseSemaphore(handle[1], 1)` calls (increments count by 1)
- `NtWaitForMultipleObjectsEx` success on handle[1] (decrements by 1)

So either:
- (a) our NtReleaseSemaphore is called more often than canary's
- (b) our NtWaitForMultipleObjectsEx doesn't decrement on success (kernel bug)
- (c) our NtCreateSemaphore creates with InitialCount > 0 (creation bug)
- (d) our NtReleaseSemaphore over-releases (adds extra count per call)
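
A minimal sketch of the count bookkeeping shows why the r3 trace alone cannot separate hypotheses (a) to (d): each of them keeps the count above zero at wait entry, so each produces the same all-ones r3 sequence. `simulate` is a hypothetical model written for this note, with one wait per tick and the releases for that tick applied just before the wait; it is not derived from either engine's code.

```rust
const WAIT_TIMEOUT: u32 = 0x102;

/// Simulates `ticks` consecutive waits on the work semaphore.
/// `releases_before(t)` is the total count added before wait number `t`.
/// Returns the modeled r3 value of each wait (1 = acquired, 0x102 = timeout).
fn simulate(
    initial_count: u32,
    ticks: usize,
    releases_before: impl Fn(usize) -> u32,
    decrement_on_success: bool,
) -> Vec<u32> {
    let mut count = initial_count;
    let mut r3_trace = Vec::with_capacity(ticks);
    for t in 0..ticks {
        count += releases_before(t);
        if count > 0 {
            if decrement_on_success {
                count -= 1; // correct behaviour: a satisfied wait consumes one unit
            }
            r3_trace.push(1);
        } else {
            r3_trace.push(WAIT_TIMEOUT);
        }
    }
    r3_trace
}
```

With a single release before the first wait and a correct decrement, the model yields `[1, 0x102, 0x102, 0x102]`, the canary shape. A release before every wait (a, d), a missing decrement (b), or a large initial count (c) each yield all ones, which is why a direct probe of release calls and of the count itself is needed to tell them apart.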

## NtReleaseSemaphore callers (15 unique fns from sylpheed.db xrefs)

```
sub_822c6748, sub_822c6808, sub_822c8b50 (×6 inline call sites),
sub_822f2328,
sub_823dd770, sub_823dd838, sub_823de4b8 (×3),
sub_823df320,
sub_82450218 ← in dispatch-loop module (callers: sub_82452DC0 ×2)
sub_824503A0 ← in dispatch-loop module (callers: sub_82452690, sub_8245E1D8)
sub_82450B68 ← THE DISPATCH FUNCTION ITSELF (×2 internal release sites at 0xCDC, 0xD28)
sub_824569C0 (j-call), sub_82457FE0, sub_82458468, sub_824591C0,
sub_8245AAF0, sub_8245ABD8, sub_8245AD00
```

The most-suspicious sites for this audit are the three in the
dispatch-loop module: `sub_82450218`, `sub_824503A0`, and the
self-release in `sub_82450B68`.

## Most-recent kernel calls before the divergent load (ours tid=5)

The "divergent load" is the kernel-side return of `NtWaitForMultipleObjectsEx`.
No guest-memory load is the proximate cause. Most-recent kernel calls
before each wait on ours tid=5 (from S3's ours-lr-trace data):

- `sub_824AB158` ↔ `NtReleaseSemaphore` (via wrapper)
- `sub_824AA2F0` ↔ `NtSetEvent`
- `sub_824AAF50` ↔ `KeSetEvent`-style with ptr+size args
- `sub_824AA830` ↔ `KeQueryPerformanceCounter`-like
- `sub_824AB240` ↔ `NtWaitForMultipleObjectsEx` itself

## Hypothesis (MEDIUM-HIGH confidence)

The semaphore is being **over-released** in ours. Specifically, one of
the producer-side enqueue paths (sub_82452DC0, sub_82452690, sub_8245E1D8,
or any of the 22 other release-call sites) is firing release more often
than the dispatch loop is consuming work, OR our wait kernel
handler in `xenia-kernel/src/exports.rs` is not atomically decrementing
the semaphore count on WAIT_OBJECT_0+N.

Ranked S5 leads:

1. **Audit our `NtWaitForMultipleObjectsEx` handler implementation**:
   does it decrement the semaphore on success? (Likely yes, since a
   missing decrement would regress many other things. Test with a small probe.)
2. **Probe `NtReleaseSemaphore` call rate on handle 0x1050** in ours.
   Compare to canary on the equivalent handle (some F8000xxx in canary).
   Hypothesis: ours releases more often per dispatch.
3. **Cross-check the canary equivalent handle**: canary uses the
   `XSemaphore::native_object()` pseudo-handle for handle[1]. Use the
   `audit_69_event_signal_watch` extension (or grep S1's
   `signal-probe-correlated.log` for KeReleaseSemaphore + the relevant
   ptr) to identify canary's semaphore handle ID, then run the same probe.

## Classification

NOT a loop-exit-branch divergence (neither engine exits).
NOT a missing-thread / missing-spawn divergence (S2 closed that).
NOT a wrong-handle-selection divergence (S3 confirmed args match).

It IS a **semaphore-state divergence**: our NtWaitForMultipleObjectsEx
keeps returning WAIT_OBJECT_0+1 (semaphore signaled) where canary's
returns WAIT_TIMEOUT. The semaphore count is non-zero at wait-entry in
ours; zero in canary.

## Confidence flags

| finding | confidence | reasoning |
|---|---|---|
| both loops never exit (B50 never fires) | HIGH | direct measurement |
| ours r3=1 always at back-edge | HIGH | 91/91 captures direct measurement |
| canary r3=0x102 mostly at back-edge | HIGH | 3/4 captures direct measurement |
| handle[1] is NtCreateSemaphore w/ InitialCount=0, Max=0x7FFFFFFF | HIGH | mem-watch + disasm confirmed |
| handle[0] is NtCreateEvent | HIGH | disasm confirmed at 0x824A9F18 |
| ours handle slot values 0x104C, 0x1050 | HIGH | mem-watch confirmed |
| no exit-branch divergence in matching iter | HIGH | exit branch never taken in either |
| semaphore-state divergence root cause | MEDIUM-HIGH | r3 differs → wait kernel return differs → semaphore state must differ; haven't directly proved which (over-release vs no-decrement vs wrong-init) |
| S5 path-1 (NtWaitForMultiple decrement bug) | MEDIUM | most likely culprit given kernel-side state divergence pattern, but other hypotheses still open |
@@ -0,0 +1,80 @@
0x82450a28: mflr r12
0x82450a2c: stw r12, -8(r1)
0x82450a30: std r31, -16(r1)
0x82450a34: stwu r1, -96(r1)
0x82450a38: mr r31, r3
0x82450a3c: li r4, 3
0x82450a40: li r3, -2
0x82450a44: bl 0x824AA658
0x82450a48: mr r3, r31
0x82450a4c: bl 0x82450A68
0x82450a50: addi r1, r1, 96
0x82450a54: lwz r12, -8(r1)
0x82450a58: mtlr r12
0x82450a5c: ld r31, -16(r1)
0x82450a60: blr
0x82450a64: .long 0x00000000
0x82450a68: mflr r12
0x82450a6c: bl 0x825F0F88
0x82450a70: stwu r1, -128(r1)
0x82450a74: mr r31, r3
0x82450a78: li r6, 16
0x82450a7c: li r5, 0
0x82450a80: addi r4, r1, 80
0x82450a84: li r3, 2
0x82450a88: lwz r11, 88(r31)
0x82450a8c: stw r11, 80(r1)
0x82450a90: lwz r11, 92(r31)
0x82450a94: stw r11, 84(r1)
0x82450a98: bl 0x824AB240
0x82450a9c: cmplwi cr6, r3, 0x0
0x82450aa0: beq cr6, 0x82450B50
0x82450aa4: li r29, 0
0x82450aa8: li r28, 4
0x82450aac: lwz r11, 212(r31)
0x82450ab0: cntlzw r11, r11
0x82450ab4: extrwi r11, r11, 1, 26
0x82450ab8: cmplwi cr6, r11, 0x0
0x82450abc: bne cr6, 0x82450AC8
0x82450ac0: li r4, 5
0x82450ac4: b 0x82450B2C
0x82450ac8: mr r30, r29
0x82450acc: addi r11, r31, 112
0x82450ad0: lwz r10, 0(r11)
0x82450ad4: cntlzw r10, r10
0x82450ad8: extrwi r10, r10, 1, 26
0x82450adc: cmplwi cr6, r10, 0x0
0x82450ae0: beq cr6, 0x82450AF8
0x82450ae4: addi r30, r30, 1
0x82450ae8: addi r11, r11, 20
0x82450aec: cmplwi cr6, r30, 0x5
0x82450af0: blt cr6, 0x82450AD0
0x82450af4: b 0x82450B34
0x82450af8: lwz r11, 224(r31)
0x82450afc: cmplwi cr6, r11, 0x0
0x82450b00: beq cr6, 0x82450B28
0x82450b04: lwz r11, 220(r31)
0x82450b08: cmpw cr6, r11, r30
0x82450b0c: bge cr6, 0x82450B28
0x82450b10: bl 0x824AA830
0x82450b14: lwz r11, 224(r31)
0x82450b18: cmplw cr6, r3, r11
0x82450b1c: blt cr6, 0x82450B34
0x82450b20: stw r28, 220(r31)
0x82450b24: stw r29, 224(r31)
0x82450b28: mr r4, r30
0x82450b2c: mr r3, r31
0x82450b30: bl 0x82450B68
0x82450b34: li r6, 16
0x82450b38: li r5, 0
0x82450b3c: addi r4, r1, 80
0x82450b40: li r3, 2
0x82450b44: bl 0x824AB240
0x82450b48: cmplwi cr6, r3, 0x0
0x82450b4c: bne cr6, 0x82450AAC
0x82450b50: li r3, 0
0x82450b54: addi r1, r1, 128
0x82450b58: b 0x825F0FD8
0x82450b5c: .long 0x00000000
0x82450b60: lwz r18, 9792(r31)
0x82450b64: lwz r16, 13880(r14)