Phase Host-Audio-Bridge — Investigation (2026-05-19)
Outcome: AUDIT-048 cascade C NOT landed this session. Root cause is upstream state divergence in XAudio voice struct initialization, NOT a missing host-side write. Progression metric (swaps=1, draws=0) unchanged. Per-chain matched-prefix unchanged.
Diagnosis
Canary semantics (verified)
- xenia-canary/src/xenia/apu/audio_system.cc:84-159: AudioSystem::WorkerThreadMain is a HOST thread that loops: WaitAny(client_semaphores_) → processor_->Execute(callback). Semaphores are seeded by RegisterClient at line 210: client_semaphore->Release(queued_frames_=8, nullptr).
- After SDL plays a frame, sdl_audio_driver.cc:199 calls semaphore_->Release(1), re-arming the loop. With --mute=true, SDL still consumes frames via SDLCallback and still releases.
- There is NO host-side write to a guest field at offset +356. The XAudio voice struct at r31 is GUEST-allocated and managed entirely by the GUEST callback code at 0x824D6640 and the XAudio worker thread bodies at 0x824D2878 / 0x824D2940.
Ours's current state
- AUDIT-048 Plan B (dedicated guest worker thread, parked on synthetic handle, injected by ticker) is wired in xenia-kernel/src/exports.rs:4048-4168, xenia-kernel/src/xaudio.rs, and xenia-app/src/main.rs:3461-3536.
- Ours's tid=11 (entry=0x824D6640 = the registered callback) DOES execute. Per deadlock dump: pc=0x824d2a94 lr=0x824d2a94 state=Blocked(WaitAny { handles: [0x82928B04, 0x82928AE0] }). The callback called KeWaitForMultipleObjects on two guest dispatchers and is now waiting.
- xaudio.callback.delivered=1: only one injection, because is_in_callback stays true while tid=11 is blocked on the real handles (the saved context is only cleared on LR_HALT_SENTINEL return, which tid=11 never reaches).
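The single-delivery behaviour described above can be illustrated with a small model. This is a sketch only: `CallbackGate`, `on_tick`, `on_sentinel_return`, and `simulate` are hypothetical names, not the identifiers used in xenia-kernel, and the model assumes the ticker injects only when no callback is in flight.

```rust
/// Hypothetical model of the ticker's injection gate (names are illustrative).
#[derive(Debug, Default)]
pub struct CallbackGate {
    /// True from injection until the callback returns to the halt sentinel.
    pub is_in_callback: bool,
    /// Stands in for the xaudio.callback.delivered counter.
    pub delivered: u32,
}

impl CallbackGate {
    /// Called on every audio tick. Injects only when no callback is in flight.
    pub fn on_tick(&mut self) -> bool {
        if self.is_in_callback {
            return false;
        }
        self.is_in_callback = true;
        self.delivered += 1;
        true
    }

    /// Called only when the callback thread returns to LR_HALT_SENTINEL.
    pub fn on_sentinel_return(&mut self) {
        self.is_in_callback = false;
    }
}

/// Runs `ticks` audio ticks. `callback_returns` says whether the callback
/// reaches the sentinel before the next tick.
pub fn simulate(ticks: u32, callback_returns: bool) -> u32 {
    let mut gate = CallbackGate::default();
    for _ in 0..ticks {
        gate.on_tick();
        if callback_returns {
            gate.on_sentinel_return();
        }
    }
    gate.delivered
}
```

With a callback that never returns (the tid=11 case), `simulate(100, false)` delivers exactly once, which is consistent with the observed counter; with a callback that returns every tick, each tick delivers.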
The spin loop in tid=9/10 (the XAudio worker guest threads)
PCs 0x824D1400-0x824D141C (canary tid=14 / ours tid=9):
0x824d1400: beqlr cr6 ; return if cr6.eq (success)
0x824d1404: cmpd cr6, r10, r11 ; compare r10 with r11
0x824d1408: beq cr6, 0x824D1420 ; ok-branch on match
0x824d140c: mr r31, r31 ; yield nop
0x824d1410: ld r11, 0(r4) ; reload [r4]
0x824d1414: cmpdi cr6, r11, 0 ; check r11 == 0
0x824d1418: bne cr6, 0x824D1404 ; loop if nonzero
0x824d141c: blr ; return on r11 == 0
LR=0x824D22B4 (caller does addi r4, r31, 356; bl ...; <0x824D22B4 is next>).
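As a reading aid, the loop body from 0x824D1404 to 0x824D141C can be modelled in Rust. This is a sketch under assumptions: the entry `beqlr` depends on a compare that is not part of the listing, so the model starts at the `cmpd`; `run_spin`, `SpinOutcome`, and the iteration cap are illustrative names, and `load_r4` stands in for the guest `ld r11, 0(r4)`.

```rust
/// How the modelled loop ended.
#[derive(Debug, PartialEq, Eq, Clone, Copy)]
pub enum SpinOutcome {
    /// cmpd r10, r11 compared equal: branch to 0x824D1420.
    Matched,
    /// The reloaded doubleword was zero: blr at 0x824D141C.
    ReturnedOnZero,
    /// Neither exit was taken within the iteration cap.
    StillSpinning,
}

/// Models 0x824D1404..0x824D141C. `load_r4` stands in for `ld r11, 0(r4)`.
pub fn run_spin(
    expected_r10: u64,
    initial_r11: u64,
    mut load_r4: impl FnMut() -> u64,
    max_iters: usize,
) -> SpinOutcome {
    let mut r11 = initial_r11;
    for _ in 0..max_iters {
        // 0x824D1404 / 0x824D1408: compare and take the ok-branch on a match.
        if r11 == expected_r10 {
            return SpinOutcome::Matched;
        }
        // 0x824D1410: reload the polled doubleword.
        r11 = load_r4();
        // 0x824D1414 / 0x824D1418: loop while nonzero, fall through to blr on zero.
        if r11 == 0 {
            return SpinOutcome::ReturnedOnZero;
        }
    }
    SpinOutcome::StillSpinning
}
```

Under this model the loop leaves only when the polled doubleword equals r10 or becomes zero; any other constant value keeps it spinning until the cap.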
Live runtime probe (ours, 100M instr, --dump-addr 0x42511040 and 0x42510edc)
At halt (tid=9 still spinning at pc=0x824d140c):
- r3 = r31 = 0x42510edc (an XAudio voice/driver struct in heap-mapped guest mem).
- r4 = r31 + 0x164 (=356) = 0x42511040.
- r10 = 0x01010000 (expected success value).
- Last-known r11 = 0x00000000 (from the load), but the spin continues, so the value at [r4] keeps changing? Or the snapshot doesn't reflect steady state.
Memory dump at 0x42511040:
+0x00: 01 00 00 00 00 00 00 00 → ld interpretation = 0x0100000000000000
+0x10: 00 00 00 03 42 51 10 54 → linked-list head with 3 entries
+0x20...: list nodes with prev/next pointers in 0x4251xxxx range
This is a GUEST-OWNED linked-list / voice-state struct. Byte 0 = 0x01 is
clearly a "state flag" that distinguishes the poll target. ld reads 8 bytes
BE → 0x0100000000000000. r10 expected is 0x01010000 (zero-extended). r11 read
is 0x0100000000000000. Not equal, not zero → spin.
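The address arithmetic and the big-endian doubleword interpretation above can be checked with a few lines of Rust. The helper names are illustrative; only the constants come from the probe output, and `would_exit_spin` encodes the two exit conditions as read from the disassembly.

```rust
/// Voice struct base observed in r31 (from the runtime probe).
pub const VOICE_BASE: u32 = 0x4251_0edc;
/// Offset of the polled doubleword (356 decimal).
pub const POLL_OFFSET: u32 = 0x164;

/// Address passed in r4 to the spin routine.
pub fn poll_address(base: u32) -> u32 {
    base.wrapping_add(POLL_OFFSET)
}

/// What `ld` yields for eight guest bytes: a big-endian 64-bit load.
pub fn load_doubleword_be(bytes: [u8; 8]) -> u64 {
    u64::from_be_bytes(bytes)
}

/// The spin exits only on a match with r10 or on zero.
pub fn would_exit_spin(loaded: u64, expected_r10: u64) -> bool {
    loaded == expected_r10 || loaded == 0
}
```

Feeding the dumped bytes `01 00 00 00 00 00 00 00` through `load_doubleword_be` gives 0x0100000000000000, which fails both exit conditions against r10 = 0x01010000.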
Memory dump at r31=0x42510edc:
+0x00: 82 00 6c f4 → VTABLE POINTER (code at .rdata 0x82006cf4)
+0x04: 00 00 00 02 → refcount or count
+0x08: 42 51 0e c0 → back-pointer
+0x40: 41 ea 0d 5c → matches XAudio register callback_arg
So r31 is the XAudio voice object Sylpheed allocates and passes as the callback
argument. The vtable at 0x82006cf4 is ANON_Class_* per sylpheed.db. The
voice owns a linked list (head at +0x164) that tracks audio buffers / voice state.
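For anyone re-reading the dump, a minimal decoder for the four fields identified above might look like the following. The struct and field names are hypothetical labels for the observed offsets; the real layout of the object is not known beyond what the dump shows.

```rust
/// Hypothetical labels for the fields seen in the dump at r31.
#[derive(Debug, PartialEq, Eq)]
pub struct VoiceHeader {
    /// +0x00: vtable pointer.
    pub vtable: u32,
    /// +0x04: refcount or count.
    pub count: u32,
    /// +0x08: back-pointer.
    pub back_ptr: u32,
    /// +0x40: value that matched the registered callback_arg.
    pub field_0x40: u32,
}

/// Reads a big-endian u32 at `offset`, or None if the buffer is too short.
fn read_u32_be(buf: &[u8], offset: usize) -> Option<u32> {
    let s = buf.get(offset..offset + 4)?;
    Some(u32::from_be_bytes([s[0], s[1], s[2], s[3]]))
}

/// Decodes the observed fields from a raw guest-memory dump of the object.
pub fn decode_voice_header(buf: &[u8]) -> Option<VoiceHeader> {
    Some(VoiceHeader {
        vtable: read_u32_be(buf, 0x00)?,
        count: read_u32_be(buf, 0x04)?,
        back_ptr: read_u32_be(buf, 0x08)?,
        field_0x40: read_u32_be(buf, 0x40)?,
    })
}
```

The decoder needs at least 0x44 bytes of dump and returns `None` for anything shorter, so a truncated capture is not silently misread.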
Why ours diverges from canary
Ours's tid=9 sees [r31+356] as 0x0100000000000000; canary's tid=14 sees it as
0x0000000000000000 (or 0x0000000001010000 matching r10). Both engines run
identical guest code starting from the same .data values. So the divergence
must be a kernel-call return value OR a memory write that happens
between thread spawn and the spin loop.
Per cross-trace of tid=9 events idx 0..76 vs canary tid=14 events 0..~80, the kernel return values match (KeWaitForSingleObject→0, KeRaiseIrqlToDpcLevel returns the same sequence 0,2,2,2 etc., KfLowerIrql→0). The setup chain emits the same events. But host_ns wall-clock diverges: canary's KeWaitForSingleObject blocks for ~85ms (1727→1813 ms); ours's wait returns in ~7 microseconds (1603890→1603904 ns).
The root cause class
This is upstream scheduling divergence, not a missing host bridge:
- In canary: tid=11 (host AudioSystem WorkerThreadMain) starts FIRST and runs the callback at 0x824D6640. The callback modifies the XAudio voice struct (clearing byte at +0x164). Then tid=14 spawns, hits the spin, sees zero, proceeds.
- In ours: tid=9/10 are spawned by main and resumed via KeResumeThread. They start running BEFORE the audio ticker (period 48,000 instructions) ever fires. tid=9 hits the spin loop with the struct in its uninitialized state (byte +0x164 = 0x01). Stuck forever.
The audio worker (tid=11) DOES eventually get injected and runs, but by then tid=9/10 are stuck in the spin loop and the callback blocks on guest dispatchers that only tid=9/10 can signal — circular deadlock.
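The circular dependency can be written down as a wait-for graph and checked mechanically. This is a sketch under the hypothesis stated above: treating the spin in tid=9/10 as a "waits on tid=11" edge is an assumption (the spin polls memory rather than blocking on a handle), and the function names are illustrative.

```rust
use std::collections::{HashMap, HashSet};

/// Returns true if following wait-for edges from `start` leads back to `start`.
pub fn has_cycle_from(waits_on: &HashMap<u32, Vec<u32>>, start: u32) -> bool {
    let mut stack = waits_on.get(&start).cloned().unwrap_or_default();
    let mut seen = HashSet::new();
    while let Some(tid) = stack.pop() {
        if tid == start {
            return true;
        }
        // Expand each thread only once so the walk terminates.
        if seen.insert(tid) {
            if let Some(next) = waits_on.get(&tid) {
                stack.extend(next.iter().copied());
            }
        }
    }
    false
}

/// Wait-for edges as described for ours (assumed, not measured).
pub fn ours_wait_graph() -> HashMap<u32, Vec<u32>> {
    let mut g = HashMap::new();
    // tid=9/10 spin until the callback on tid=11 clears the polled byte.
    g.insert(9, vec![11]);
    g.insert(10, vec![11]);
    // tid=11 blocks on dispatchers that only tid=9/10 can signal.
    g.insert(11, vec![9, 10]);
    g
}
```

`has_cycle_from(&ours_wait_graph(), 9)` reports a cycle. Removing the 11 → {9, 10} edges, which corresponds to the callback completing before the workers reach the spin, breaks it.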
Why a host-side write is the wrong fix
The session brief hypothesizes a missing host-side write to clear [r31+356].
This is not correct:
- Canary's host audio worker does NOT write to any guest VA in the +356 range. It only calls processor_->Execute(callback) and waits on its host semaphore.
- The byte at offset 0x164 of the voice struct is touched only by GUEST code (the callback or the worker functions). No host code in either engine reaches into that field.
- The "missing write" framing came from assuming the host audio worker does something analogous to SubmitFrame's buffer-complete bookkeeping. SubmitFrame only acks the host SDL driver semaphore (line 199); it does not modify the voice struct at +356.
Writing 0 to [r31+356] from host code would be a band-aid that crosses
reading-error #23 (matching divergent guest behavior) and risks corrupting
the voice struct's invariants.
What the correct fix shape would be
To make ours converge to canary's behavior, the audio worker callback at 0x824D6640 needs to RUN AND COMPLETE before tid=9/10 reach the spin loop.
Option A — Force-fire callback at register time: Inside
xaudio_register_render_driver, after spawning tid=11, synchronously execute
the callback to completion (treat as a synchronous shim). Tricky because the
callback calls KeWaitForMultipleObjects which would block.
Option B — Defer spawn of tid=9/10 in guest: Not feasible — guest controls spawn timing.
Option C — Inject the callback eagerly + spin tid=11 forward: Tick the audio loop hundreds of times immediately at register. But tid=11 blocks on guest objects that need tid=9/10 to be running already.
Option D — Match canary's actual concurrency model: Spawn tid=11 as a
native HOST thread that runs processor_->Execute(callback)-equivalent. This
is a significant rework of the threading model.
Option E — Identify the specific guest write that clears +0x164 in canary: Disassemble sub_824D6640 (the callback) and find the store. Then ensure ours's execution of the callback reaches that store before tid=9/10 spin. Requires fixing the deeper scheduling-ordering issue.
None of these is a 30-150 LOC fix. All require either:
- Architectural threading-model changes
- Sub-cycle ordering control between guest threads
- Deep guest-code disassembly + emulation of the byte-clear path
Recommendation
This session declines to land a fix. The session brief's hypothesis (missing host-side write to clear [r31+356]) is empirically wrong: the byte is owned by guest code in both engines. Deferring AUDIT-048 cascade C is the right call: the necessary work is matching scheduling order, not host-bridge wiring.
Next-session recommendation: probe-instrument the byte at r31+0x164 for the
XAudio voice (around 0x42511040 in ours, a similar address in canary) on the FIRST
guest write. Identify in the canary trace which PC writes the byte and on which tid.
That's the actual fix target.
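One possible shape for such a probe is a first-write watch that records the PC and tid of the first guest store covering the watched address. This is a sketch only: `FirstWriteWatch` and `on_guest_store` are hypothetical, and it assumes the interpreter can call a hook with (pc, tid, addr, len) on every guest store, which may not match the existing instrumentation points.

```rust
/// One recorded guest store.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct WriteHit {
    pub pc: u32,
    pub tid: u32,
    pub addr: u32,
    pub len: u32,
}

/// Remembers the first guest store that covers `target`.
pub struct FirstWriteWatch {
    target: u32,
    first: Option<WriteHit>,
}

impl FirstWriteWatch {
    pub fn new(target: u32) -> Self {
        Self { target, first: None }
    }

    /// Call from the (hypothetical) guest store hook.
    pub fn on_guest_store(&mut self, pc: u32, tid: u32, addr: u32, len: u32) {
        if self.first.is_some() {
            return; // Only the first covering write is of interest.
        }
        // Widen to u64 so addr + len cannot overflow near the top of memory.
        let start = addr as u64;
        let end = start + len as u64;
        let target = self.target as u64;
        if start <= target && target < end {
            self.first = Some(WriteHit { pc, tid, addr, len });
        }
    }

    /// The first covering write, if any has been seen.
    pub fn first_hit(&self) -> Option<WriteHit> {
        self.first
    }
}
```

Running the same watch in both engines, on 0x42511040 in ours and on the corresponding address in canary, would yield the (pc, tid) pair to compare.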
Per-chain delta (no change)
| chain | pre | post | delta |
|---|---|---|---|
| main tid=6→1 | 105,046 | 105,046 | 0 |
| sister tid=14→9 | 41 | 41 | 0 |
| sister tid=15→10 | 16 | 16 | 0 |
| sister tid=4→4 | preserved | preserved | 0 |
| sister tid=11→11 | preserved | preserved | 0 |
| sister tid=16→16 | preserved | preserved | 0 |
| swaps | 1 | 1 | 0 |
| draws | 0 | 0 | 0 |
Artifacts
- investigation.md (this file)
- Cold trace: /tmp/ours-cold.jsonl (121k events, halt-on-deadlock after 100M instr cap)
- Memory dumps captured via --dump-addr 0x42511040 and --dump-addr 0x42510edc
- Phase B image_canonical_sha256 = ea8d160e… UNCHANGED (no engine modification)