handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions

@@ -0,0 +1,196 @@
# Phase Host-Audio-Bridge — Investigation (2026-05-19)
**Outcome**: AUDIT-048 cascade C NOT landed this session. Root cause is upstream
state divergence in XAudio voice struct initialization, NOT a missing host-side
write. Progression metric (swaps=1, draws=0) unchanged. Per-chain matched-prefix
unchanged.
## Diagnosis
### Canary semantics (verified)
- `xenia-canary/src/xenia/apu/audio_system.cc:84-159`: `AudioSystem::WorkerThreadMain`
is a HOST thread that loops: `WaitAny(client_semaphores_)` → `processor_->Execute(callback)`.
Semaphores are seeded by `RegisterClient` line 210: `client_semaphore->Release(queued_frames_=8, nullptr)`.
- After SDL plays a frame, `sdl_audio_driver.cc:199` calls `semaphore_->Release(1)` — re-arming
the loop. With `--mute=true`, SDL still consumes frames via `SDLCallback` and still releases.
- **There is NO host-side write to a guest field at offset +356**. The XAudio voice struct
at `r31` is GUEST-allocated and managed entirely by the GUEST callback code at
`0x824D6640` and the XAudio worker thread bodies at `0x824D2878 / 0x824D2940`.
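
The semaphore-driven loop described above can be sketched as a toy model. This is a minimal sketch, not canary's code: `ClientSemaphore` and `run_worker` are hypothetical names, the semaphore only counts permits instead of blocking, and the driver is assumed to release exactly one permit per played frame.

```rust
/// Toy counting semaphore standing in for canary's per-client semaphore.
/// Hypothetical type: it models the permit count only, not real blocking.
struct ClientSemaphore {
    permits: u32,
}

impl ClientSemaphore {
    fn release(&mut self, n: u32) {
        self.permits += n;
    }

    /// Non-blocking stand-in for `WaitAny`: true if a permit was consumed.
    fn try_acquire(&mut self) -> bool {
        if self.permits > 0 {
            self.permits -= 1;
            true
        } else {
            false
        }
    }
}

/// Runs the modelled worker loop for at most `steps` iterations and returns
/// how many times the callback would have been executed.
/// `frames_played` is how many frames the driver plays back (assumption:
/// each played frame releases one permit).
fn run_worker(seed: u32, frames_played: u32, steps: u32) -> u32 {
    let mut sem = ClientSemaphore { permits: 0 };
    sem.release(seed); // RegisterClient seeds `queued_frames_` permits
    let mut callbacks = 0u32;
    let mut remaining_playback = frames_played;
    for _ in 0..steps {
        if !sem.try_acquire() {
            break; // the real worker would block here
        }
        callbacks += 1; // stands in for processor_->Execute(callback)
        if remaining_playback > 0 {
            remaining_playback -= 1;
            sem.release(1); // driver re-arms the loop after playing a frame
        }
    }
    callbacks
}
```

With a seed of 8 and no playback the model runs the callback 8 times and then stalls; each played frame buys one more callback.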
### Ours's current state
- AUDIT-048 Plan B (dedicated guest worker thread, parked on synthetic handle, injected
by ticker) is wired in `xenia-kernel/src/exports.rs:4048-4168` + `xenia-kernel/src/xaudio.rs`
+ `xenia-app/src/main.rs:3461-3536`.
- Ours's tid=11 (entry=0x824D6640 = the registered callback) DOES execute. Per
deadlock dump: `pc=0x824d2a94 lr=0x824d2a94 state=Blocked(WaitAny { handles:
[0x82928B04, 0x82928AE0] })` — the callback called `KeWaitForMultipleObjects`
on two guest dispatchers and is now waiting.
- `xaudio.callback.delivered=1` — only one injection, because `is_in_callback`
stays true while tid=11 is blocked on the real handles (the saved context is
only cleared on `LR_HALT_SENTINEL` return, which tid=11 never reaches).
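
Why `delivered` stays at 1 can be illustrated with a small model of the injection gate. This is a sketch under assumptions: `CallbackInjector`, `tick`, and `on_sentinel_return` are hypothetical names and do not mirror the actual code in `xaudio.rs`; only the gating behaviour described above is modelled.

```rust
/// Hypothetical model of the ticker's injection gate.
#[derive(Default)]
struct CallbackInjector {
    is_in_callback: bool,
    delivered: u32,
}

impl CallbackInjector {
    /// Called once per ticker period. Injects only when no callback is in flight.
    fn tick(&mut self) -> bool {
        if self.is_in_callback {
            return false;
        }
        self.is_in_callback = true;
        self.delivered += 1;
        true
    }

    /// Called when the callback thread returns to the halt sentinel.
    fn on_sentinel_return(&mut self) {
        self.is_in_callback = false;
    }
}

/// Counts deliveries over `ticks` periods. `callback_returns` selects whether
/// the injected callback ever reaches the sentinel (false models a callback
/// that stays blocked on guest handles).
fn delivered_after(ticks: u32, callback_returns: bool) -> u32 {
    let mut injector = CallbackInjector::default();
    for _ in 0..ticks {
        if injector.tick() && callback_returns {
            injector.on_sentinel_return();
        }
    }
    injector.delivered
}
```

A callback that never returns pins the gate closed after the first injection, which matches the observed `xaudio.callback.delivered=1`.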
### The spin loop in tid=9/10 (the XAudio worker guest threads)
PCs `0x824D1400-0x824D141C` (canary tid=14 / ours tid=9):
```
0x824d1400: beqlr cr6 ; return if cr6.eq (success)
0x824d1404: cmpd cr6, r10, r11 ; compare r10 with r11
0x824d1408: beq cr6, 0x824D1420 ; ok-branch on match
0x824d140c: mr r31, r31 ; yield nop
0x824d1410: ld r11, 0(r4) ; reload [r4]
0x824d1414: cmpdi cr6, r11, 0 ; check r11 == 0
0x824d1418: bne cr6, 0x824D1404 ; loop if nonzero
0x824d141c: blr ; return on r11 == 0
```
LR=0x824D22B4 (caller does `addi r4, r31, 356; bl ...; <0x824D22B4 is next>`).
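
The control flow of the loop at `0x824D1404-0x824D141C` can be restated as a small function. This is a sketch of the disassembly above, not emulator code: `PollOutcome` and `poll` are hypothetical names, the guest load is replaced by a closure, and an iteration cap is added so the model terminates.

```rust
#[derive(Debug, PartialEq)]
enum PollOutcome {
    /// `beq cr6, 0x824D1420` taken: r10 == r11.
    Matched,
    /// `blr` at 0x824D141C: the reloaded value was zero.
    ReturnedZero,
    /// Iteration cap reached (cap is an addition of this sketch).
    StillSpinning,
}

/// `expected` plays the role of r10, `current` of r11, and `load` of
/// `ld r11, 0(r4)`.
fn poll(
    expected: u64,
    mut current: u64,
    mut load: impl FnMut() -> u64,
    max_iters: u32,
) -> PollOutcome {
    for _ in 0..max_iters {
        if expected == current {
            return PollOutcome::Matched; // cmpd cr6, r10, r11 / beq
        }
        current = load(); // ld r11, 0(r4)
        if current == 0 {
            return PollOutcome::ReturnedZero; // cmpdi cr6, r11, 0 / blr
        }
        // bne cr6, 0x824D1404: loop again
    }
    PollOutcome::StillSpinning
}
```

The loop exits only when the polled doubleword equals `r10` or becomes zero; any other stable value spins indefinitely.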
### Live runtime probe (ours, 100M instr, --dump-addr 0x42511040 and 0x42510edc)
At halt (tid=9 still spinning at pc=0x824d140c):
- `r3 = r31 = 0x42510edc` (an XAudio voice/driver struct in heap-mapped guest mem).
- `r4 = r31 + 0x164 (=356) = 0x42511040`.
- `r10 = 0x01010000` (expected success value).
- Last-known `r11 = 0x00000000` (from the load), **yet the spin continues**. Either
the value at `[r4]` keeps changing, or the snapshot does not reflect steady state,
possibly because the register print shows only the low 32 bits of the 64-bit load.
Memory dump at `0x42511040`:
```
+0x00: 01 00 00 00 00 00 00 00 → ld interpretation = 0x0100000000000000
+0x10: 00 00 00 03 42 51 10 54 → linked-list head with 3 entries
+0x20...: list nodes with prev/next pointers in 0x4251xxxx range
```
This is a **GUEST-OWNED linked-list / voice-state struct**. Byte 0 = 0x01 is
clearly a "state flag" that distinguishes the poll target. `ld` reads 8 bytes
BE → 0x0100000000000000. r10 expected is 0x01010000 (zero-extended). r11 read
is 0x0100000000000000. Not equal, not zero → spin.
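
The address arithmetic and the big-endian decode can be checked mechanically. A minimal sketch with hypothetical helper names (`poll_address`, `ld_big_endian`, `classify`); the constants are the ones from the dump above.

```rust
/// Address polled by the spin loop: r4 = r31 + 0x164 (356 decimal).
fn poll_address(r31: u32) -> u32 {
    r31.wrapping_add(0x164)
}

/// What a PowerPC `ld` yields for 8 bytes of guest memory: a big-endian u64.
fn ld_big_endian(bytes: [u8; 8]) -> u64 {
    u64::from_be_bytes(bytes)
}

/// Classifies a loaded value the way the loop's exit conditions see it.
fn classify(value: u64, expected: u64) -> &'static str {
    if value == expected {
        "match"
    } else if value == 0 {
        "zero"
    } else {
        "spin"
    }
}
```

For the dumped bytes `01 00 00 00 00 00 00 00` the decoded value is `0x0100000000000000`, which `classify` reports as `"spin"` against `r10 = 0x01010000`. Its low 32 bits are zero.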
Memory dump at `r31=0x42510edc`:
```
+0x00: 82 00 6c f4 → VTABLE POINTER (code at .rdata 0x82006cf4)
+0x04: 00 00 00 02 → refcount or count
+0x08: 42 51 0e c0 → back-pointer
+0x40: 41 ea 0d 5c → matches XAudio register callback_arg
```
So `r31` is the XAudio voice object Sylpheed allocates and passes as the callback
argument. The vtable at `0x82006cf4` is `ANON_Class_*` per `sylpheed.db`. The
voice owns a linked list (head at +0x164) that tracks audio buffers / voice state.
### Why ours diverges from canary
Ours's tid=9 sees `[r31+356]` as `0x0100000000000000`; canary's tid=14 sees it as
`0x0000000000000000` (or `0x0000000001010000` matching r10). Both engines run
identical guest code starting from the same .data values. So the divergence
must be a **kernel-call return value** OR a **memory write** that happens
between thread spawn and the spin loop.
Per cross-trace of tid=9 events idx 0..76 vs canary tid=14 events 0..~80,
the **kernel return values match** (KeWaitForSingleObject→0, KeRaiseIrqlToDpcLevel
returns the same sequence 0,2,2,2 etc., KfLowerIrql→0). The setup chain emits
the same events. **But host_ns wall-clock diverges**: canary's KeWaitForSingleObject
blocks for ~85ms (1727→1813 ms); ours's wait returns essentially immediately
(host_ns 1603890→1603904, a 14 ns delta as logged).
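
The cross-trace check described here (same calls and return values, different blocking time) can be sketched as two small passes over paired events. This is an illustration only: the `KernelEvent` struct and its field names are hypothetical and do not reflect the actual `.jsonl` trace schema.

```rust
/// Hypothetical, simplified trace record; not the real trace schema.
#[derive(Clone, Debug, PartialEq)]
struct KernelEvent {
    name: &'static str,
    ret: u32,
    enter_ns: u64,
    exit_ns: u64,
}

/// Index of the first paired event whose (name, ret) differs, if any.
fn first_semantic_divergence(a: &[KernelEvent], b: &[KernelEvent]) -> Option<usize> {
    a.iter()
        .zip(b.iter())
        .position(|(x, y)| x.name != y.name || x.ret != y.ret)
}

/// Indices where one trace spent at least `ratio` times longer inside the
/// call than the other (durations of zero are treated as 1 ns).
fn timing_outliers(a: &[KernelEvent], b: &[KernelEvent], ratio: u64) -> Vec<usize> {
    a.iter()
        .zip(b.iter())
        .enumerate()
        .filter(|(_, (x, y))| {
            let dx = x.exit_ns.saturating_sub(x.enter_ns);
            let dy = y.exit_ns.saturating_sub(y.enter_ns);
            dx.max(dy) >= dx.min(dy).max(1).saturating_mul(ratio)
        })
        .map(|(i, _)| i)
        .collect()
}
```

Running both passes separates "the kernel returned something different" from "the kernel returned the same thing at a very different time", which is the distinction this section relies on.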
### The root cause class
This is **upstream scheduling divergence**, not a missing host bridge:
1. In **canary**: tid=11 (host AudioSystem WorkerThreadMain) starts FIRST and runs
the callback at 0x824D6640. The callback modifies the XAudio voice struct
(clearing byte at +0x164). Then tid=14 spawns, hits the spin, sees zero,
proceeds.
2. In **ours**: tid=9/10 are spawned by main and resumed via `KeResumeThread`.
They start running BEFORE the audio ticker (period 48,000 instructions) ever
fires. tid=9 hits the spin loop with the struct in its uninitialized state
(byte +0x164 = 0x01). Stuck forever.
The audio worker (tid=11) DOES eventually get injected and runs, but by then
tid=9/10 are stuck in the spin loop and the callback blocks on guest dispatchers
that only tid=9/10 can signal — circular deadlock.
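
The circular dependency can be made explicit as a wait-for graph with a cycle check. This is a sketch under the hypothesis stated above: the edges from tid=9/10 to tid=11 assume that the callback is what clears the polled byte, which this investigation has not yet confirmed. `has_cycle` and `ours_wait_graph` are hypothetical names.

```rust
use std::collections::{HashMap, HashSet};

/// Depth-first cycle detection over a "thread waits on thread" graph.
fn has_cycle(waits_on: &HashMap<u32, Vec<u32>>) -> bool {
    fn visit(
        node: u32,
        graph: &HashMap<u32, Vec<u32>>,
        on_stack: &mut HashSet<u32>,
        done: &mut HashSet<u32>,
    ) -> bool {
        if on_stack.contains(&node) {
            return true; // back edge: the node is waiting on itself transitively
        }
        if done.contains(&node) {
            return false; // already fully explored
        }
        on_stack.insert(node);
        let mut cyclic = false;
        if let Some(next) = graph.get(&node) {
            for &n in next {
                if visit(n, graph, on_stack, done) {
                    cyclic = true;
                    break;
                }
            }
        }
        on_stack.remove(&node);
        done.insert(node);
        cyclic
    }

    let mut on_stack = HashSet::new();
    let mut done = HashSet::new();
    waits_on
        .keys()
        .any(|&tid| visit(tid, waits_on, &mut on_stack, &mut done))
}

/// Wait-for edges as hypothesised for ours: the spinning workers wait on the
/// callback thread, and the callback waits on dispatchers the workers signal.
fn ours_wait_graph() -> HashMap<u32, Vec<u32>> {
    let mut graph = HashMap::new();
    graph.insert(9, vec![11]);
    graph.insert(10, vec![11]);
    graph.insert(11, vec![9, 10]);
    graph
}
```

Removing either direction of the dependency (for example, if the byte is already clear when the workers arrive) leaves the graph acyclic.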
## Why a host-side write is the wrong fix
The session brief hypothesizes a missing host-side write to clear `[r31+356]`.
This is not correct:
- Canary's host audio worker does NOT write to any guest VA in the +356 range.
It only calls `processor_->Execute(callback)` and waits on its host semaphore.
- The byte at offset 0x164 of the voice struct is touched only by GUEST code
(the callback or the worker functions). No host code in either engine reaches
into that field.
- The "missing write" framing came from assuming the host audio worker does
something analogous to SubmitFrame's buffer-complete bookkeeping. SubmitFrame
only acks the host SDL driver semaphore (line 199); it does not modify the
voice struct at +356.
Writing 0 to `[r31+356]` from host code would be a band-aid that crosses
reading-error #23 (matching divergent guest behavior) and risks corrupting
the voice struct's invariants.
## What the correct fix shape would be
To make ours converge to canary's behavior, the audio worker callback at
0x824D6640 needs to RUN AND COMPLETE before tid=9/10 reach the spin loop.
**Option A — Force-fire callback at register time**: Inside
`xaudio_register_render_driver`, after spawning tid=11, synchronously execute
the callback to completion (treat as a synchronous shim). Tricky because the
callback calls KeWaitForMultipleObjects which would block.
**Option B — Defer spawn of tid=9/10 in guest**: Not feasible — guest controls
spawn timing.
**Option C — Inject the callback eagerly + spin tid=11 forward**: Tick the
audio loop hundreds of times immediately at register. But tid=11 blocks on
guest objects that need tid=9/10 to be running already.
**Option D — Match canary's actual concurrency model**: Spawn tid=11 as a
native HOST thread that runs `processor_->Execute(callback)`-equivalent. This
is a significant rework of the threading model.
**Option E — Identify the specific guest write that clears +0x164 in canary**:
Disassemble sub_824D6640 (the callback) and find the store. Then ensure ours's
execution of the callback reaches that store before tid=9/10 spin. Requires
fixing the deeper scheduling-ordering issue.
None of these is a 30-150 LOC fix. Each requires at least one of:
- Architectural threading-model changes
- Sub-cycle ordering control between guest threads
- Deep guest-code disassembly + emulation of the byte-clear path
## Recommendation
This session declines to land a fix. The session brief's hypothesis (missing
host-side write to clear [r31+356]) is empirically wrong: the byte is owned
by guest code in both engines. The deferred AUDIT-048 cascade C is correctly
deferred — the necessary work is **scheduling-ordering matching**, not
host-bridge wiring.
Next-session recommendation: probe-instrument the byte at `r31+0x164` of the
XAudio voice (around `0x42511040` in ours, a similar address in canary) to catch
the FIRST guest write. Identify in the canary trace which PC writes the byte and
on which tid. That's the actual fix target.
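
A first-write probe of this kind could look roughly like the following. This is a minimal sketch, not an existing hook in either engine: `FirstWriteWatch`, `WriteHit`, and `on_store` are hypothetical names, and it assumes the interpreter can call `on_store` with the PC, thread id, guest address, and size of every guest store.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct WriteHit {
    pc: u32,
    tid: u32,
    addr: u32,
    size: u32,
}

/// Latches the first guest store that overlaps a watched byte range.
struct FirstWriteWatch {
    start: u32,
    len: u32,
    first: Option<WriteHit>,
}

impl FirstWriteWatch {
    fn new(start: u32, len: u32) -> Self {
        Self { start, len, first: None }
    }

    /// Hypothetical hook: to be called for every guest store.
    fn on_store(&mut self, pc: u32, tid: u32, addr: u32, size: u32) {
        if self.first.is_some() {
            return; // only the first overlapping write is of interest
        }
        // Half-open ranges in u64 so `addr + size` cannot overflow.
        let (store_lo, store_hi) = (addr as u64, addr as u64 + size as u64);
        let (watch_lo, watch_hi) = (self.start as u64, self.start as u64 + self.len as u64);
        if store_lo < watch_hi && watch_lo < store_hi {
            self.first = Some(WriteHit { pc, tid, addr, size });
        }
    }
}
```

Overlap is tested on byte ranges rather than exact addresses, so a wider store that merely covers `0x42511040` (for example an 8-byte store starting 4 bytes earlier) is still caught.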
## Per-chain delta (no change)
| chain | pre | post | delta |
|------:|----:|-----:|------:|
| main tid=6→1 | 105,046 | 105,046 | 0 |
| sister tid=14→9 | 41 | 41 | 0 |
| sister tid=15→10 | 16 | 16 | 0 |
| sister tid=4→4 | preserved | preserved | 0 |
| sister tid=11→11 | preserved | preserved | 0 |
| sister tid=16→16 | preserved | preserved | 0 |
| **swaps** | **1** | **1** | **0** |
| **draws** | **0** | **0** | **0** |
## Artifacts
- `investigation.md` (this file)
- Cold trace: `/tmp/ours-cold.jsonl` (121k events, halt-on-deadlock after 100M instr cap)
- Memory dumps captured via `--dump-addr 0x42511040` and `--dump-addr 0x42510edc`
- Phase B `image_canonical_sha256 = ea8d160e…` UNCHANGED (no engine modification)