handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
177
audit-runs/phase-xaudio-resume/escalation.md
Normal file
@@ -0,0 +1,177 @@
# Phase XAudio-Resume: ESCALATION (case IV)

**Date**: 2026-05-19
**Outcome**: Resume mechanism is correctly implemented. The 60% missing event volume
is gated on a DOWNSTREAM application-level spin-poll, not on the resume itself.
No engine change landed.

## Canary's resume mechanism (Step 1+2)

For each suspended XAudio worker (`entry_pc=0x824D2878` aff=16 → tid=14;
`entry_pc=0x824D2940` aff=32 → tid=15), canary tid=6 (main) emits an identical
5-call, 17-event sequence that starts with `ExCreateThread`:

```
canary tid=6 idx=106750..106766 (host_ns 1726.0..1726.2 ms)
  106750 import.call   ExCreateThread
  106751 kernel.call   ExCreateThread
  106752 handle.create (raw_handle 0x????????, tid=14 handle)
  106753 thread.create (entry_pc=0x824d2878, suspended=true)
  106754 kernel.return ExCreateThread
  106755 import.call   ObReferenceObjectByHandle
  106756 kernel.call   ObReferenceObjectByHandle
  106757 kernel.return ObReferenceObjectByHandle
  106758 import.call   KeSetBasePriorityThread
  106759 kernel.call   KeSetBasePriorityThread
  106760 kernel.return KeSetBasePriorityThread
  106761 import.call   KeResumeThread            ← RESUME (xboxkrnl ord 146)
  106762 kernel.call   KeResumeThread
  106763 kernel.return KeResumeThread
  106764 import.call   ObDereferenceObject
  106765 kernel.call   ObDereferenceObject
  106766 kernel.return ObDereferenceObject
```

The block repeats verbatim at idx 106767-106783 for `entry_pc=0x824D2940`. The
containing function is `XAudioRegisterRenderDriverClient` (visible at idx 106817).
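
The spawn/resume pattern above can be located mechanically in a trace. The following is a minimal sketch, not the tooling used for this audit: the `kind` and `name` keys are assumed field names for illustration (only the call order is taken from the listing), and `find_spawn_resume_blocks` is a hypothetical helper.

```python
# Hypothetical event shape: {"idx": int, "kind": str, "name": str}.
# Only the order of the five imports comes from the trace listing.
EXPECTED_CALLS = [
    "ExCreateThread",
    "ObReferenceObjectByHandle",
    "KeSetBasePriorityThread",
    "KeResumeThread",
    "ObDereferenceObject",
]


def find_spawn_resume_blocks(events):
    """Return the idx of each import.call ExCreateThread that is followed,
    in order, by the other four imports of the spawn/resume sequence."""
    calls = [e for e in events if e.get("kind") == "import.call"]
    names = [e.get("name") for e in calls]
    width = len(EXPECTED_CALLS)
    return [
        calls[i]["idx"]
        for i in range(len(calls) - width + 1)
        if names[i:i + width] == EXPECTED_CALLS
    ]
```

Run against one thread's event list, a result with two entries would correspond to the two worker spawns described above; an empty result would point at case (I) or (II) rather than case (IV).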

## Ours's behavior at the matched site (Step 3)

Cold ours (-n 500M, --halt-on-deadlock, fresh cache wipe), checked against
`/tmp/ours-xaudio.jsonl` (121,569 events captured before halt):

```
ours tid=1 idx=106756..106786 (host_ns 1626 ms; boot is ~100 ms ahead of canary)
  106756 import.call   ExCreateThread
  106757 kernel.call   ExCreateThread
  106758 handle.create
  106759 thread.create (entry_pc=0x824d2878, suspended=true)   ← matches canary
  106760 kernel.return ExCreateThread
  ...
  106767 import.call   KeResumeThread                          ← RESUME fires
  106768 kernel.call   KeResumeThread
  106769 kernel.return KeResumeThread
  ...
  106776 thread.create (entry_pc=0x824d2940, suspended=true)   ← matches canary
  ...
  106784 import.call   KeResumeThread                          ← second RESUME fires
  106785 kernel.call   KeResumeThread
  106786 kernel.return KeResumeThread
```

ours's per-tid first-events (cold) for the spawned children:

```
tid=9  (=canary tid=14, entry 0x824d2878): 77 events, idx 0..76 identical to canary tid=14
tid=10 (=canary tid=15, entry 0x824d2940): 17 events, idx 0..16 identical to canary tid=15
```
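
A "first N events identical" check like the one above amounts to a matched-prefix count between two per-thread event lists. A minimal sketch follows, assuming each event can be compared on a `(kind, name)` pair; those keys and the `matched_prefix_len` helper are illustrative, not the actual comparison tool.

```python
def matched_prefix_len(ours, canary, key=None):
    """Count leading events that agree between two per-thread event lists.

    `key` reduces an event to the fields worth comparing. The default uses
    hypothetical "kind" and "name" keys and ignores host timestamps.
    """
    if key is None:
        key = lambda e: (e.get("kind"), e.get("name"))
    n = 0
    for a, b in zip(ours, canary):
        if key(a) != key(b):
            break
        n += 1
    return n
```

Comparing on a reduced key rather than on whole events is a deliberate choice here: fields such as `host_ns` differ between runs even when the guest behaves identically.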

Ours's tid=9 / tid=10 EXECUTE the canary-matching XAudio init sequence:
`KeWaitForSingleObject (with immediate signal)` → spinlock/IRQL cycle →
`XAudioGetVoiceCategoryVolumeChangeMask` → `KeReleaseSemaphore` →
more IRQL cycles. Then halt.

Halt-on-deadlock diagnostic shows tids 9 and 10 in state **Ready** at
`pc=0x824d1404 lr=0x824d22b4`. They are NOT blocked on a missing kernel
API; they are inside a guest-side spin-poll loop:

```
0x824d1400: beqlr cr6                 ; return if poll succeeded
0x824d1404: cmpd  cr6, r10, r11       ; r10 vs r11
0x824d1408: beq   cr6, 0x824D1420     ; ok-branch
0x824d140c: mr    r31, r31            ; nop (yield hint)
0x824d1410: ld    r11, 0(r4)          ; reload [r4]
0x824d1414: cmpdi cr6, r11, 0
0x824d1418: bne   cr6, 0x824D1404     ; if nonzero, loop
0x824d141c: blr
```

`r4 = r31+356` (the caller sets it up with `addi r4, r31, 356` at 0x824d22a8).
The threads are spin-polling guest memory at `[r31+356]`, waiting for it to reach 0.
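
To make the exit conditions of that loop explicit, here is a small Python model of the instructions from 0x824d1404 to 0x824d141c. It is a sketch under stated assumptions: `load_r4` is a hypothetical stand-in for the guest load `ld r11, 0(r4)`, the iteration cap exists only so the model terminates, and what `r10` holds is not established by these notes.

```python
def spin_poll(load_r4, r10, r11, max_iters=1_000_000):
    """Model the guest loop; returns (reason, number_of_reloads).

    load_r4() stands in for `ld r11, 0(r4)`, i.e. a read of [r31+356].
    """
    for iters in range(max_iters):
        if r10 == r11:        # cmpd cr6, r10, r11 ; beq -> 0x824D1420
            return "ok-branch", iters
        r11 = load_r4()       # ld r11, 0(r4)
        if r11 == 0:          # cmpdi cr6, r11, 0 ; bne loops, otherwise blr
            return "returned", iters + 1
    return "still-spinning", max_iters
```

With a producer that eventually stores 0, the model returns after a finite number of reloads. With `load_r4` returning a constant non-zero value that differs from `r10`, it only stops at the cap, which is the situation the halt diagnostic reports for tids 9 and 10.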

## Classification: case IV (not I / II / III)

The plan's original classification anticipated:
- (I) ours doesn't reach the spawn LR ← refuted: spawn fires at idx 106756/106773
- (II) ours reaches spawn but no resume ← refuted: KeResumeThread fires at idx 106768/106785
- (III) ours's NtResumeThread is misimplemented ← refuted: `resume_ref()` correctly
  clears `Blocked(BlockReason::Suspended)` → `Ready`; halt diagnostic confirms
  post-resume Ready state and identical first-77/17 events to canary
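
The state transition that refutes case (III) can be pictured with a toy model. This is a sketch only: it is Python rather than the project's Rust, the `GuestThread` class and `State` names are invented for illustration, and it does not model suspend counts or anything else `resume_ref()` may do beyond the transition named above.

```python
from enum import Enum, auto


class State(Enum):
    READY = auto()
    BLOCKED_SUSPENDED = auto()   # mirrors Blocked(BlockReason::Suspended)
    BLOCKED_OTHER = auto()       # any other block reason


class GuestThread:
    def __init__(self, suspended):
        # A thread created with suspended=true starts out blocked.
        self.state = State.BLOCKED_SUSPENDED if suspended else State.READY

    def resume(self):
        """Clear a suspension; leave every other state untouched."""
        if self.state is State.BLOCKED_SUSPENDED:
            self.state = State.READY
            return True
        return False
```

In this picture, a thread that is `READY` and still makes no progress is not a resume problem, which is the distinction the classification draws.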

**Actual classification (IV)**: Resume succeeds; XAudio threads start running and
execute their init sequence verbatim against canary; then enter a guest-side
application spin-poll on `[r31+356]` that never resolves in ours. The producer
of the 0-write to that location is part of canary's audio/GPU host bridge chain
that AUDIT-048 only partially restored (cascades A/B/D landed; cascade C,
XAudioSubmitRenderDriverFrame, remained `0` per that audit's own assessment).

## Why the 60% volume claim doesn't follow from a resume-only fix

Phase NonMatch's "60% missing event volume" attribution to XAudio assumed
the threads simply weren't running. They ARE running: they emit identical
first events, get scheduled, and reach the spin loop. The volume bottleneck
is the post-init *steady-state pump*: canary's 6.15 M tid=14 events come from
26,126 repeated iterations of the `XAudioGetVoiceCategoryVolumeChangeMask` /
`KeReleaseSemaphore` / IRQL-cycle loop, each iteration gated on the host
bridge clearing the `[r31+356]` flag. With the flag stuck non-zero in ours,
the loop never re-enters; only the single first iteration (idx 0-76) ever
executes. No amount of resume-side change will unstick this.

## Out-of-scope for this session

Per session authorization, fixing the host-bridge memory-write that clears
`[r31+356]` requires touching xenia-apu/xenia-gpu host code, which is
explicitly forbidden ("the host bridge is separate"). Therefore no engine
change lands in this session.

## Progression metric (re-validation gate, baseline-only)

Not re-measured, since no change landed. Pre-existing baseline
remains the C+23+absorber state (`23cf4c4cbf61a577caa4118ab2308ba6` /
`ba5b5e07…` depending on Phase D stage). swaps and draws unchanged. Per-chain
matched-prefixes from MEMORY.md remain:
- main tid=6→1: 105,046 (with Phase D D-extension absorber)
- sister chains 11/32/4/41/16: preserved

## Recommended next attack target

The remaining XAudio gate is **AUDIT-048 cascade C**: producer of the
`[r31+356]=0` write. This is the part of the audio host-bridge chain that did
NOT land in AUDIT-048. It likely involves:

1. `XAudioSubmitRenderDriverFrame` host-side callback firing the buffer-complete
   event with a side effect that decrements/clears a counter at offset 356 of
   the XAudio client struct.
2. `KeReleaseSemaphore` on a paired semaphore that produces the host-side
   buffer-complete notification.

A targeted re-attack would:

1. Read xenia-canary's `apu/audio_system.cc` + `apu/xma_decoder.cc` to find
   the host-side write that clears `r31+356` (likely an XAUDIO_CLIENT_STATE
   struct field).
2. Mirror it in xenia-rs's `xaudio.rs` / audio worker context.
3. Re-validate the cold cycle. swaps may move 1→2 if the audio pump reaches
   the renderer fence; draws likely remain 0 (audio ≠ renderer per AUDIT-048).

That work is the AUDIT-048-cascade-C completion task, NOT the resume gate.
It's the natural sister of the deferred sub_825070F0 main-gate Path P.

## Per-chain delta (no change this session)

| chain | pre | post | delta |
|------:|----:|-----:|------:|
| tid=6→1 main | 105,046 | 105,046 | 0 |
| tid=11→11 | preserved | preserved | 0 |
| tid=14→9 XAudio | 41 | 41 | 0 |
| tid=15→10 XAudio | 16 | 16 | 0 |
| tid=4→4 | preserved | preserved | 0 |
| tid=16→16 | preserved | preserved | 0 |

## Artifacts

- `tid6_window.json`: canary tid=6 events idx 106700..108200 around the
  XAudio spawn burst
- `tid14_first.json` / `tid15_first.json`: canary tid=14/15 first 120 events
- `extract_window.py`: extraction script
- `escalation.md`: this file
47
audit-runs/phase-xaudio-resume/extract_window.py
Normal file
@@ -0,0 +1,47 @@
#!/usr/bin/env python3
"""Extract canary tid=6 events around the XAudio spawn window (idx 106700-108200)
and tid=14/15 first events. Looking for the resume mechanism."""
import json
import os

PATH = "/home/fabi/RE - Project Sylpheed/xenia-canary/build-cross/bin/Windows/Debug/canary-jitter-1.jsonl"

tid6_events = []  # idx 106700 to 108200
tid14_first = []
tid15_first = []
tid14_count = 0
tid15_count = 0
seen_tid6_high = False

with open(PATH, "rb") as f:
    for line in f:
        try:
            obj = json.loads(line)
        except ValueError:  # skip malformed or truncated lines
            continue
        tid = obj.get("tid")
        idx = obj.get("tid_event_idx", obj.get("idx"))
        if tid == 6 and idx is not None and 106700 <= idx <= 108200:
            tid6_events.append(obj)
            if idx > 108000:
                seen_tid6_high = True
        elif tid == 14 and tid14_count < 120:
            tid14_first.append(obj)
            tid14_count += 1
        elif tid == 15 and tid15_count < 120:
            tid15_first.append(obj)
            tid15_count += 1
        # Stop early once the tid=6 window and both 120-event heads are in hand.
        if seen_tid6_high and tid14_count >= 120 and tid15_count >= 120:
            break

OUT = os.path.dirname(os.path.abspath(__file__))
with open(os.path.join(OUT, "tid6_window.json"), "w") as f:
    json.dump(tid6_events, f, indent=2)
with open(os.path.join(OUT, "tid14_first.json"), "w") as f:
    json.dump(tid14_first, f, indent=2)
with open(os.path.join(OUT, "tid15_first.json"), "w") as f:
    json.dump(tid15_first, f, indent=2)

print(f"tid6 window: {len(tid6_events)}")
print(f"tid14 first: {len(tid14_first)}")
print(f"tid15 first: {len(tid15_first)}")
1848
audit-runs/phase-xaudio-resume/tid14_first.json
Normal file
File diff suppressed because it is too large
1850
audit-runs/phase-xaudio-resume/tid15_first.json
Normal file
File diff suppressed because it is too large
19979
audit-runs/phase-xaudio-resume/tid6_window.json
Normal file
File diff suppressed because it is too large