Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase Host-Audio-Eager — Investigation (2026-05-19)
Phase 0: Plan
Canary's XHostThread setup (verified from source)
- AudioSystem::AudioSystem (xenia-canary/src/xenia/apu/audio_system.cc:48-69): constructs 8 host semaphores (client_semaphores_[i]) at engine init time. Each is Semaphore::Create(initial=0, max=queued_frames_=8).
- AudioSystem::Setup (line 77-98): spawns XHostThread "Audio Worker" running WorkerThreadMain IMMEDIATELY. This is a HOST OS thread, not a guest thread. It runs continuously throughout engine lifetime.
- WorkerThreadMain (line 100-159): loops on WaitAny(client_semaphores_, ...) → on wake, calls processor_->Execute(thread_state, client_callback, args), which runs the guest callback IN-LINE on the host worker thread.
- RegisterClient (line 202-237): the moment a client registers, it calls client_semaphore->Release(queued_frames_=8) (line 210), seeding 8 semaphore permits. The already-running worker thread then drains these in a tight loop: callback runs, returns, semaphore decremented, repeat. 8 callback invocations happen BEFORE RegisterClient even returns (or shortly after).
- After SDL plays a frame, sdl_audio_driver.cc:199 releases ONE permit, re-arming the loop. Under --mute=true, SDL still drains and releases.
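The canary behaviour described above (worker already running, register seeds 8 permits, worker drains them) can be illustrated with a small host-thread model. This is only a sketch under stated assumptions: the Semaphore type, run_seeded_worker, and the fixed 8-iteration worker loop are hypothetical stand-ins written for this note, not canary's code, and the saturating release is a simplification of real semaphore semantics.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Seed depth, matching queued_frames_ = 8 from the notes above.
const QUEUED_FRAMES: usize = 8;

/// Hypothetical minimal counting semaphore (not canary's implementation).
struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
    max: usize,
}

impl Semaphore {
    fn new(initial: usize, max: usize) -> Self {
        Semaphore { permits: Mutex::new(initial), cv: Condvar::new(), max }
    }

    /// Add `n` permits (saturating at `max`, a simplification) and wake waiters.
    fn release(&self, n: usize) {
        let mut p = self.permits.lock().unwrap();
        *p = (*p + n).min(self.max);
        self.cv.notify_all();
    }

    /// Block until a permit is available, then consume it.
    fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.cv.wait(p).unwrap();
        }
        *p -= 1;
    }
}

/// Start the worker first, then "register" by seeding permits.
/// Returns how many callback invocations the worker performed.
fn run_seeded_worker() -> usize {
    let sem = Arc::new(Semaphore::new(0, QUEUED_FRAMES));
    let calls = Arc::new(AtomicUsize::new(0));

    // The worker exists before any client registers and simply waits.
    let worker = {
        let (sem, calls) = (Arc::clone(&sem), Arc::clone(&calls));
        thread::spawn(move || {
            // The real worker loops for the engine's lifetime; this demo
            // stops after the seeded permits so that it terminates.
            for _ in 0..QUEUED_FRAMES {
                sem.acquire();
                calls.fetch_add(1, Ordering::SeqCst); // stands in for the guest callback
            }
        })
    };

    // Registration seeds all permits at once.
    sem.release(QUEUED_FRAMES);
    worker.join().unwrap();
    calls.load(Ordering::SeqCst)
}
```

Calling run_seeded_worker() returns 8: every seeded permit turns into one callback invocation without any further input from the registering thread.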
Ours's current ticker model (verified from source)
- main.rs:2125-2131 (round prologue): each round, if any client is registered, xaudio.tick_instr(stats.instruction_count) adds the delta of executed instructions to an accumulator; when the accumulator crosses XAUDIO_INSTR_PERIOD=48_000, it enqueues one fire per registered client and decrements the accumulator.
- main.rs:2135-2137: try_inject_audio_callback then pulls one fire off the queue and injects it into the dedicated audio worker thread (parked on a synthetic handle), but only if is_in_callback() is false (mutually exclusive with graphics interrupts).
- Worker thread spawned at register time (exports.rs:4084-4160) with PC=callback_pc, parked Blocked(WaitAny[SYNTHETIC]). Injection flips state to ServicingIrq with pc=callback_pc, runs callback, returns to LR_HALT, restore path re-blocks worker on synthetic.
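The instruction-count ticker described above can be sketched as follows. The Ticker struct and its fields are hypothetical names for illustration; only tick_instr and XAUDIO_INSTR_PERIOD come from the notes. Whether the real code handles multiple period crossings in one call is not stated, so this sketch assumes at most one per call.

```rust
use std::collections::VecDeque;

/// Period from the notes above.
const XAUDIO_INSTR_PERIOD: u64 = 48_000;

/// Hypothetical, simplified stand-in for the ticker state.
struct Ticker {
    last_count: u64,          // instruction count seen on the previous round
    accumulator: u64,         // instructions accumulated since the last fire
    clients: Vec<usize>,      // registered client slots
    pending: VecDeque<usize>, // queued fires, one client slot per entry
}

impl Ticker {
    fn new(clients: Vec<usize>) -> Self {
        Ticker { last_count: 0, accumulator: 0, clients, pending: VecDeque::new() }
    }

    /// Called once per round with the running total of executed instructions.
    fn tick_instr(&mut self, instruction_count: u64) {
        let delta = instruction_count.saturating_sub(self.last_count);
        self.last_count = instruction_count;
        if self.clients.is_empty() {
            return; // nothing registered, nothing to tick
        }
        self.accumulator += delta;
        // ASSUMPTION: at most one period crossing is handled per call.
        if self.accumulator >= XAUDIO_INSTR_PERIOD {
            self.accumulator -= XAUDIO_INSTR_PERIOD;
            for &slot in &self.clients {
                self.pending.push_back(slot); // one fire per registered client
            }
        }
    }
}
```

With one registered client, tick_instr(47_999) leaves the queue empty and a following tick_instr(48_000) enqueues exactly one fire, which is the 48,000-instruction delay between register and first callback that the ordering problem hinges on.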
The ordering problem
Sylpheed boot sequence (verified per prior agent's traces):
- tid=1 main calls XAudioRegisterRenderDriverClient → ours registers client at slot 0, spawns worker (tid=11), enqueues NOTHING.
- tid=1 main continues executing thousands of instructions.
- tid=1 main calls ExCreateThread for XAudio worker threads → tid=9 and tid=10 spawn. They start spinning on the uninitialized voice struct at [r31+356].
- 48,000 instructions after register, the ticker finally fires, enqueueing one buffer-complete callback.
- Audio worker tid=11 wakes, runs callback at 0x824D6640. The callback calls KeWaitForMultipleObjects([0x82928B04, 0x82928AE0]) and blocks. These dispatchers can only be signaled by tid=9/10.
- tid=9/10 are stuck spinning → tid=11 stuck waiting → circular deadlock.
In canary: the worker is HOST-threaded and starts running BEFORE
tid=1 even reaches the register call. Register seeds 8 permits → worker
drains 8 callback invocations. By the time tid=14/15 spawn, the voice
struct's [r31+356] field has been modified by 8 callback runs and is
in a state where tid=14/15 take a different (non-spinning) control-flow
path. Critically, IN CANARY THE CALLBACK DOES NOT BLOCK on those
dispatchers — because the voice state is different.
Implementation steps
- At XAudioRegisterRenderDriverClient, after worker spawn succeeds, eagerly enqueue 8 fires (matching canary queued_frames_=8) into state.xaudio.pending. The ticker's existing per-round drain plus the existing try_inject_audio_callback will then deliver these 8 callbacks across subsequent rounds, but they will fire WITHIN the first few thousand instructions of register-return, well before tid=9/10 spawn.
- Eagerly fire the audio injector once at the END of the register handler. The round prologue normally calls try_inject_audio_callback once per round; this gives us +1 immediate fire to maximize the chance of callback completion before tid=1 continues to spawn tid=9/10.
- Keep enqueue_all_active refusing to enqueue when the queue is at cap (it already does this; we just rely on it).
- Add 2-3 unit tests covering the eager-seed behavior in XAudioState.
- Document the change in the existing register-handler block comment.
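The eager-seed step and the cap behaviour it relies on might look roughly like the sketch below. XAudioStateSketch, enqueue, seed_on_register, and PENDING_CAP are hypothetical names; the real XAudioState layout and the real cap value are not given in these notes, so the cap of 16 is purely illustrative.

```rust
use std::collections::VecDeque;

/// Seed depth, matching canary's queued_frames_ = 8 as stated above.
const QUEUED_FRAMES: usize = 8;
/// ASSUMPTION: the real queue cap is not stated in these notes; 16 is illustrative.
const PENDING_CAP: usize = 16;

/// Hypothetical stand-in for the pending-fire queue inside the audio state.
#[derive(Default)]
struct XAudioStateSketch {
    pending: VecDeque<usize>, // one client slot per queued fire
}

impl XAudioStateSketch {
    /// Enqueue one fire unless the queue is already at cap.
    fn enqueue(&mut self, slot: usize) -> bool {
        if self.pending.len() >= PENDING_CAP {
            return false;
        }
        self.pending.push_back(slot);
        true
    }

    /// Seed up to QUEUED_FRAMES fires at register time.
    /// Returns how many fires were actually queued.
    fn seed_on_register(&mut self, slot: usize) -> usize {
        let mut seeded = 0;
        for _ in 0..QUEUED_FRAMES {
            if self.enqueue(slot) {
                seeded += 1;
            }
        }
        seeded
    }
}
```

The planned unit tests could follow the same shape: seed once and expect 8 pending fires, then keep seeding and check that the queue never grows past the cap.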
Risks
- Determinism shift: cold digest WILL change (8 extra fires re-order the round prologue's audio injection). Capture new digest, validate 3× reproducibility.
- Worker blocks on first callback (per prior agent's diagnosis): if tid=11's first callback blocks immediately on KeWaitForMultipleObjects, then queue depth 8 doesn't matter; fires 2-8 sit unused because is_in_callback() stays true. In that case the progression metric won't move. This is an empirical question, not predictable from static analysis. The brief explicitly says "if the fix lands cleanly but progression doesn't move, that's the answer."
- Phase B image_canonical_sha256: unchanged (no changes to image-load path).
- Sister chains: tid=14→9 / tid=15→10 are the targets. Other chains (tid=11/16/4) may shift due to scheduling re-ordering.
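The second risk above (a blocked first callback starving fires 2-8) can be made concrete with a toy model of the injection gate. Everything here is hypothetical: Injector, try_inject, callback_returned, and simulate are illustration-only names, and the model ignores graphics interrupts, which also share the is_in_callback gate.

```rust
use std::collections::VecDeque;

/// Hypothetical model of the per-round injection gate.
struct Injector {
    pending: VecDeque<u32>, // queued fires
    in_callback: bool,      // models is_in_callback()
    delivered: u32,         // callbacks actually started
}

impl Injector {
    /// Inject one fire if no callback is currently in flight.
    fn try_inject(&mut self) -> bool {
        if self.in_callback {
            return false;
        }
        match self.pending.pop_front() {
            Some(_) => {
                self.in_callback = true;
                self.delivered += 1;
                true
            }
            None => false,
        }
    }

    /// Called when a callback returns to its halt address.
    fn callback_returned(&mut self) {
        self.in_callback = false;
    }
}

/// Run `rounds` round prologues with 8 seeded fires.
/// Returns (callbacks delivered, fires still pending).
fn simulate(rounds: u32, first_callback_blocks: bool) -> (u32, usize) {
    let mut inj = Injector { pending: (0..8).collect(), in_callback: false, delivered: 0 };
    for _ in 0..rounds {
        inj.try_inject();
        // A completing callback returns before the next round;
        // a blocked one never does, so the gate stays closed.
        if inj.in_callback && !first_callback_blocks {
            inj.callback_returned();
        }
    }
    (inj.delivered, inj.pending.len())
}
```

In this model simulate(16, false) yields (8, 0), while simulate(16, true) yields (1, 7): if the first callback blocks, the eager seed changes the queue depth but not the number of callbacks that run.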
Phase 1: Execution log (filled during implementation)
[See fix.diff for the actual code changes]
Phase 2: Validation (filled after cold runs)
[See re-validation.md and digests/]
Phase 3: Outcome (filled after measurement)
[See summary.md]