# Phase Host-Audio-Bridge — Investigation (2026-05-19)

**Outcome**: AUDIT-048 cascade C NOT landed this session. Root cause is upstream
state divergence in XAudio voice struct initialization, NOT a missing host-side
write. Progression metric (swaps=1, draws=0) unchanged. Per-chain matched-prefix
unchanged.

## Diagnosis

### Canary semantics (verified)

- `xenia-canary/src/xenia/apu/audio_system.cc:84-159` — `AudioSystem::WorkerThreadMain`
  is a HOST thread that loops: `WaitAny(client_semaphores_)` → `processor_->Execute(callback)`.
  Semaphores are seeded by `RegisterClient` line 210: `client_semaphore->Release(queued_frames_=8, nullptr)`.
- After SDL plays a frame, `sdl_audio_driver.cc:199` calls `semaphore_->Release(1)` — re-arming
  the loop. With `--mute=true`, SDL still consumes frames via `SDLCallback` and still releases.
- **There is NO host-side write to a guest field at offset +356**. The XAudio voice struct
  at `r31` is GUEST-allocated and managed entirely by the GUEST callback code at
  `0x824D6640` and the XAudio worker thread bodies at `0x824D2878 / 0x824D2940`.

### Our current state

- AUDIT-048 Plan B (dedicated guest worker thread, parked on synthetic handle, injected
  by ticker) is wired in `xenia-kernel/src/exports.rs:4048-4168` + `xenia-kernel/src/xaudio.rs`
  + `xenia-app/src/main.rs:3461-3536`.
- Our tid=11 (entry=0x824D6640 = the registered callback) DOES execute. Per
  deadlock dump: `pc=0x824d2a94 lr=0x824d2a94 state=Blocked(WaitAny { handles:
  [0x82928B04, 0x82928AE0] })` — the callback called `KeWaitForMultipleObjects`
  on two guest dispatchers and is now waiting.
- `xaudio.callback.delivered=1` — only one injection, because `is_in_callback`
  stays true while tid=11 is blocked on the real handles (the saved context is
  only cleared on `LR_HALT_SENTINEL` return, which tid=11 never reaches).
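
The single-injection behavior in the last bullet can be illustrated with a small model. This is a sketch under assumptions, not the code in `xaudio.rs`: the type `CallbackInjector` and its methods are hypothetical names, and the only behavior taken from the notes above is that the ticker injects when no callback is in flight and that the in-flight flag clears only on a return to the halt sentinel.

```rust
/// Hypothetical model of the ticker-driven callback injection gate.
#[derive(Debug, Default)]
pub struct CallbackInjector {
    /// Mirrors the `is_in_callback` flag described above.
    in_callback: bool,
    /// Mirrors the `xaudio.callback.delivered` counter.
    delivered: u32,
}

impl CallbackInjector {
    /// Ticker entry point: injects only when no callback is in flight.
    /// Returns true if an injection happened on this tick.
    pub fn tick(&mut self) -> bool {
        if self.in_callback {
            return false;
        }
        self.in_callback = true;
        self.delivered += 1;
        true
    }

    /// Called when the callback returns to the halt sentinel.
    pub fn on_sentinel_return(&mut self) {
        self.in_callback = false;
    }

    /// Number of injections delivered so far.
    pub fn delivered(&self) -> u32 {
        self.delivered
    }
}
```

In this model, a callback that blocks and never returns to the sentinel pins the counter at 1 no matter how many ticks follow, which matches the observed `delivered=1`.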

### The spin loop in tid=9/10 (the XAudio worker guest threads)

PCs `0x824D1400-0x824D141C` (canary tid=14 / ours tid=9):

```
0x824d1400: beqlr cr6 ; return if cr6.eq (success)
0x824d1404: cmpd cr6, r10, r11 ; compare r10 with r11
0x824d1408: beq cr6, 0x824D1420 ; ok-branch on match
0x824d140c: mr r31, r31 ; yield nop
0x824d1410: ld r11, 0(r4) ; reload [r4]
0x824d1414: cmpdi cr6, r11, 0 ; check r11 == 0
0x824d1418: bne cr6, 0x824D1404 ; loop if nonzero
0x824d141c: blr ; return on r11 == 0
```

LR=0x824D22B4: the caller does `addi r4, r31, 356; bl ...`, and 0x824D22B4 is
the instruction after the `bl`.
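
The control flow of the listing can be restated as a bounded Rust model, which makes the three exits explicit. This is a sketch under assumptions: `poll_word`, `PollOutcome`, and the `max_iters` budget are hypothetical (the guest loop has no budget), and `first` stands in for whatever r11 held before 0x824D1404, since that load is not part of the listing.

```rust
/// Outcome of the poll loop at 0x824D1404..0x824D141C.
#[derive(Debug, PartialEq, Eq)]
pub enum PollOutcome {
    /// `beq cr6, 0x824D1420`: the value in r11 equals the expected value in r10.
    Matched,
    /// `blr` at 0x824D141C: the reloaded value was zero.
    ReturnedOnZero,
    /// The iteration budget ran out. The real loop has no such budget.
    StillSpinning,
}

/// Models the loop body. `reload` plays the role of `ld r11, 0(r4)`.
pub fn poll_word(
    expected: u64,
    first: u64,
    mut reload: impl FnMut() -> u64,
    max_iters: usize,
) -> PollOutcome {
    let mut r11 = first;
    for _ in 0..max_iters {
        // cmpd cr6, r10, r11 ; beq cr6, 0x824D1420
        if r11 == expected {
            return PollOutcome::Matched;
        }
        // ld r11, 0(r4) ; cmpdi cr6, r11, 0 ; bne cr6, 0x824D1404
        r11 = reload();
        if r11 == 0 {
            return PollOutcome::ReturnedOnZero;
        }
    }
    PollOutcome::StillSpinning
}
```

A word that is neither zero nor equal to the expected value never leaves the loop, so the thread only makes progress if some other thread changes the polled memory.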

### Live runtime probe (ours, 100M instr, --dump-addr 0x42511040 and 0x42510edc)

At halt (tid=9 still spinning at pc=0x824d140c):

- `r3 = r31 = 0x42510edc` (an XAudio voice/driver struct in heap-mapped guest mem).
- `r4 = r31 + 0x164 (=356) = 0x42511040`.
- `r10 = 0x01010000` (expected success value).
- Last-known `r11 = 0x00000000` (from the load), **but the spin continues**, so
  either the value at `[r4]` keeps changing or the snapshot does not reflect
  steady state.

Memory dump at `0x42511040`:

```
+0x00: 01 00 00 00 00 00 00 00 → ld interpretation = 0x0100000000000000
+0x10: 00 00 00 03 42 51 10 54 → linked-list head with 3 entries
+0x20...: list nodes with prev/next pointers in 0x4251xxxx range
```

This is a **GUEST-OWNED linked-list / voice-state struct**. Byte 0 = 0x01 is
clearly a "state flag" that distinguishes the poll target. `ld` reads 8 bytes
BE → 0x0100000000000000. The expected value in r10 is 0x01010000 (zero-extended);
the value read into r11 is 0x0100000000000000. Not equal and not zero, so the
loop spins.
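
The address arithmetic and the big-endian decode above can be checked mechanically. The helper names below (`ld_be`, `poll_address`, `keeps_spinning`) are hypothetical and exist only for this illustration; the constants are the ones from the probe.

```rust
/// Decodes eight guest bytes the way a big-endian `ld` would.
pub fn ld_be(bytes: [u8; 8]) -> u64 {
    u64::from_be_bytes(bytes)
}

/// Address polled by the spin loop: struct base plus field offset.
pub fn poll_address(struct_base: u32, offset: u32) -> u32 {
    struct_base.wrapping_add(offset)
}

/// True when the loaded doubleword takes neither exit of the loop:
/// it does not match the expected value and it is not zero.
pub fn keeps_spinning(loaded: u64, expected: u64) -> bool {
    loaded != expected && loaded != 0
}
```

With the dumped bytes `01 00 00 00 00 00 00 00`, `ld_be` yields `0x0100000000000000`, and `keeps_spinning` is true against the expected `0x01010000`; a zeroed doubleword would take the return path instead.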

Memory dump at `r31=0x42510edc`:

```
+0x00: 82 00 6c f4 → VTABLE POINTER (code at .rdata 0x82006cf4)
+0x04: 00 00 00 02 → refcount or count
+0x08: 42 51 0e c0 → back-pointer
+0x40: 41 ea 0d 5c → matches XAudio register callback_arg
```

So `r31` is the XAudio voice object Sylpheed allocates and passes as the callback
argument. The vtable at `0x82006cf4` is `ANON_Class_*` per `sylpheed.db`. The
voice owns a linked list (head at +0x164) that tracks audio buffers / voice state.

### Why ours diverges from canary

Our tid=9 sees `[r31+356]` as `0x0100000000000000`; canary's tid=14 sees it as
`0x0000000000000000` (or `0x0000000001010000` matching r10). Both engines run
identical guest code starting from the same .data values. So the divergence
must be a **kernel-call return value** OR a **memory write** that happens
between thread spawn and the spin loop.

Per cross-trace of tid=9 events idx 0..76 vs canary tid=14 events 0..~80,
the **kernel return values match** (KeWaitForSingleObject→0, KeRaiseIrqlToDpcLevel
returns the same sequence 0,2,2,2 etc., KfLowerIrql→0). The setup chain emits
the same events. **But host_ns wall-clock diverges**: canary's KeWaitForSingleObject
blocks for ~85ms (1727→1813 ms); our wait returns in ~7 microseconds
(1603890→1603904 ns).
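
The cross-trace comparison above amounts to a matched-prefix computation over the two event streams, keyed on call name and return value and deliberately ignoring timestamps. A minimal sketch follows; `KernelEvent` and its fields are assumptions made for the example and do not reflect the actual trace schema.

```rust
/// One kernel-call event, reduced to the fields compared here.
/// Timestamps are left out on purpose: wall-clock differs between engines.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct KernelEvent {
    pub name: &'static str,
    pub ret: u32,
}

/// Convenience constructor for building traces by hand.
pub fn ev(name: &'static str, ret: u32) -> KernelEvent {
    KernelEvent { name, ret }
}

/// Number of leading events that are identical in both traces.
pub fn matched_prefix_len(a: &[KernelEvent], b: &[KernelEvent]) -> usize {
    a.iter().zip(b.iter()).take_while(|(x, y)| x == y).count()
}
```

A prefix that covers the whole setup chain, as reported here, says the two engines agree on what the kernel returned and leaves ordering and timing as the remaining candidates.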

### The root cause class

This is **upstream scheduling divergence**, not a missing host bridge:

1. In **canary**: tid=11 (host AudioSystem WorkerThreadMain) starts FIRST and runs
   the callback at 0x824D6640. The callback modifies the XAudio voice struct
   (clearing byte at +0x164). Then tid=14 spawns, hits the spin, sees zero,
   proceeds.
2. In **ours**: tid=9/10 are spawned by main and resumed via `KeResumeThread`.
   They start running BEFORE the audio ticker (period 48,000 instructions) ever
   fires. tid=9 hits the spin loop with the struct in its uninitialized state
   (byte +0x164 = 0x01). Stuck forever.

The audio worker (tid=11) DOES eventually get injected and runs, but by then
tid=9/10 are stuck in the spin loop and the callback blocks on guest dispatchers
that only tid=9/10 can signal, which makes a circular deadlock.
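
The circular dependency can be written down as a wait-for graph and checked for a cycle. The sketch below is a simplification under assumptions: each thread is taken to wait on at most one other thread, and the function name `find_wait_cycle` is hypothetical. The edges encode the description above: tid=9 needs the callback on tid=11 to clear the byte, and tid=11 is blocked on dispatchers that only tid=9/10 can signal.

```rust
use std::collections::HashMap;

/// Follows wait-for edges starting at `start`.
/// Returns the threads forming a cycle if the walk revisits a thread,
/// or `None` if the chain ends at a thread that is not waiting.
pub fn find_wait_cycle(waits_on: &HashMap<u32, u32>, start: u32) -> Option<Vec<u32>> {
    let mut path = vec![start];
    let mut cur = start;
    while let Some(&next) = waits_on.get(&cur) {
        if let Some(pos) = path.iter().position(|&t| t == next) {
            // `next` is already on the path, so path[pos..] is the cycle.
            return Some(path[pos..].to_vec());
        }
        path.push(next);
        cur = next;
    }
    None
}
```

With edges `9 -> 11` and `11 -> 9` the walk from tid=9 returns the cycle `[9, 11]`. Removing either edge, for example by letting the callback complete before tid=9 reaches the spin, breaks it.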

## Why a host-side write is the wrong fix

The session brief hypothesizes a missing host-side write to clear `[r31+356]`.
This is not correct:

- Canary's host audio worker does NOT write to any guest VA in the +356 range.
  It only calls `processor_->Execute(callback)` and waits on its host semaphore.
- The byte at offset 0x164 of the voice struct is touched only by GUEST code
  (the callback or the worker functions). No host code in either engine reaches
  into that field.
- The "missing write" framing came from assuming the host audio worker does
  something analogous to SubmitFrame's buffer-complete bookkeeping. SubmitFrame
  only acks the host SDL driver semaphore (line 199); it does not modify the
  voice struct at +356.

Writing 0 to `[r31+356]` from host code would be a band-aid that crosses
reading-error #23 (matching divergent guest behavior) and risks corrupting
the voice struct's invariants.

## What the correct fix shape would be

To make ours converge to canary's behavior, the audio worker callback at
0x824D6640 needs to RUN AND COMPLETE before tid=9/10 reach the spin loop.

**Option A — Force-fire callback at register time**: Inside
`xaudio_register_render_driver`, after spawning tid=11, synchronously execute
the callback to completion (treat as a synchronous shim). Tricky because the
callback calls KeWaitForMultipleObjects which would block.

**Option B — Defer spawn of tid=9/10 in guest**: Not feasible — guest controls
spawn timing.

**Option C — Inject the callback eagerly + spin tid=11 forward**: Tick the
audio loop hundreds of times immediately at register. But tid=11 blocks on
guest objects that need tid=9/10 to be running already.

**Option D — Match canary's actual concurrency model**: Spawn tid=11 as a
native HOST thread that runs `processor_->Execute(callback)`-equivalent. This
is a significant rework of the threading model.

**Option E — Identify the specific guest write that clears +0x164 in canary**:
Disassemble sub_824D6640 (the callback) and find the store. Then ensure our
execution of the callback reaches that store before tid=9/10 spin. Requires
fixing the deeper scheduling-ordering issue.

None of these is a 30-150 LOC fix. Each requires at least one of:

- Architectural threading-model changes
- Sub-cycle ordering control between guest threads
- Deep guest-code disassembly + emulation of the byte-clear path

## Recommendation

This session declines to land a fix. The session brief's hypothesis (missing
host-side write to clear [r31+356]) is empirically wrong: the byte is owned
by guest code in both engines. The deferred AUDIT-048 cascade C is correctly
deferred — the necessary work is **scheduling-ordering matching**, not
host-bridge wiring.

Next-session recommendation: probe-instrument the byte at `r31+0x164` for the
XAudio voice (around `0x42511040` in ours, a similar address in canary) on the
FIRST guest write. Identify in the canary trace which PC writes the byte and on
which tid. That's the actual fix target.
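
One possible shape for that probe is a first-write watchpoint that records the PC and tid of the first store covering the watched byte. This is a sketch under assumptions: `FirstWriteProbe`, `WriteHit`, and the `on_store` hook are hypothetical names, and where such a hook would be called from in either engine's store path is not established by this investigation.

```rust
/// Identity of the first store that covered the watched byte.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WriteHit {
    pub pc: u32,
    pub tid: u32,
    /// Value written to the watched byte by that store.
    pub value: u8,
}

/// Records only the first guest store that touches `watch_addr`.
pub struct FirstWriteProbe {
    watch_addr: u32,
    first: Option<WriteHit>,
}

impl FirstWriteProbe {
    pub fn new(watch_addr: u32) -> Self {
        Self { watch_addr, first: None }
    }

    /// Intended to be called for every guest store, with the store's
    /// start address and the bytes written in guest memory order.
    pub fn on_store(&mut self, pc: u32, tid: u32, addr: u32, bytes: &[u8]) {
        if self.first.is_some() {
            return;
        }
        // If addr > watch_addr the subtraction wraps to a huge offset,
        // which fails the bounds check below, so no hit is recorded.
        let offset = self.watch_addr.wrapping_sub(addr) as usize;
        if offset < bytes.len() {
            self.first = Some(WriteHit { pc, tid, value: bytes[offset] });
        }
    }

    pub fn first_hit(&self) -> Option<WriteHit> {
        self.first
    }
}
```

Running the same probe in both engines and comparing the recorded PC and tid would show whether the clearing store is ever reached in ours and on which thread it happens in canary.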

## Per-chain delta (no change)

| chain | pre | post | delta |
|------:|----:|-----:|------:|
| main tid=6→1 | 105,046 | 105,046 | 0 |
| sister tid=14→9 | 41 | 41 | 0 |
| sister tid=15→10 | 16 | 16 | 0 |
| sister tid=4→4 | preserved | preserved | 0 |
| sister tid=11→11 | preserved | preserved | 0 |
| sister tid=16→16 | preserved | preserved | 0 |
| **swaps** | **1** | **1** | **0** |
| **draws** | **0** | **0** | **0** |

## Artifacts

- `investigation.md` (this file)
- Cold trace: `/tmp/ours-cold.jsonl` (121k events, halt-on-deadlock after 100M instr cap)
- Memory dumps captured via `--dump-addr 0x42511040` and `--dump-addr 0x42510edc`
- Phase B `image_canonical_sha256 = ea8d160e…` UNCHANGED (no engine modification)