# Phase Host-Audio-Eager: Investigation (2026-05-19)

## Phase 0: Plan

### Canary's XHostThread setup (verified from source)

- `AudioSystem::AudioSystem` (`xenia-canary/src/xenia/apu/audio_system.cc:48-69`): constructs 8 host semaphores (`client_semaphores_[i]`) at engine init time. Each is `Semaphore::Create(initial=0, max=queued_frames_=8)`.
- `AudioSystem::Setup` (lines 77-98): spawns `XHostThread "Audio Worker"` running `WorkerThreadMain` IMMEDIATELY. This is a HOST OS thread, not a guest thread, and it runs continuously for the lifetime of the engine.
- `WorkerThreadMain` (lines 100-159): loops on `WaitAny(client_semaphores_, ...)`. On wake, it calls `processor_->Execute(thread_state, client_callback, args)`, which runs the guest callback IN-LINE on the host worker thread.
- `RegisterClient` (lines 202-237): the moment a client registers, it calls `client_semaphore->Release(queued_frames_=8)` (line 210), seeding 8 semaphore permits. The already-running worker thread then drains these in a tight loop: callback runs, returns, semaphore decremented, repeat. 8 callback invocations happen BEFORE `RegisterClient` even returns (or shortly after).
- After SDL plays a frame, `sdl_audio_driver.cc:199` releases ONE permit, re-arming the loop. Under `--mute=true`, SDL still drains and releases.

### Our current ticker model (verified from source)

- `main.rs:2125-2131` (round prologue): each round, if any client is registered, `xaudio.tick_instr(stats.instruction_count)` adds the delta of executed instructions to an accumulator. When the accumulator crosses `XAUDIO_INSTR_PERIOD=48_000`, it enqueues one fire per registered client and decrements the accumulator.
- `main.rs:2135-2137`: `try_inject_audio_callback` then pulls one fire off the queue and injects it into the dedicated audio worker thread (parked on a synthetic handle), but only if `is_in_callback()` is false (injection is mutually exclusive with graphics interrupts).
- The worker thread is spawned at register time (`exports.rs:4084-4160`) with PC=callback_pc and parked as Blocked(WaitAny[SYNTHETIC]). Injection flips its state to ServicingIrq with `pc=callback_pc`, the callback runs and returns to LR_HALT, and the restore path re-blocks the worker on the synthetic handle.

### The ordering problem

Sylpheed boot sequence (verified per prior agent's traces):

1. tid=1 main calls `XAudioRegisterRenderDriverClient`. Ours registers the client at slot 0, spawns the worker (tid=11), and enqueues NOTHING.
2. tid=1 main continues executing thousands of instructions.
3. tid=1 main calls `ExCreateThread` for the XAudio worker threads, so tid=9 and tid=10 spawn. They start spinning on the uninitialized voice struct at `[r31+356]`.
4. **48,000 instructions after register**, the ticker finally fires, enqueueing one buffer-complete callback.
5. Audio worker tid=11 wakes and runs the callback at 0x824D6640. The callback calls `KeWaitForMultipleObjects([0x82928B04, 0x82928AE0])` and blocks. These dispatchers can only be signaled by tid=9/10.
6. tid=9/10 are stuck spinning, so tid=11 is stuck waiting: **circular deadlock**.

In canary, the worker is HOST-threaded and starts running BEFORE tid=1 even reaches the register call. Register seeds 8 permits, and the worker drains them as 8 callback invocations. By the time tid=14/15 spawn (canary's counterparts of our tid=9/10), the voice struct's `[r31+356]` field has been modified by 8 callback runs and is in a state where tid=14/15 take a different (non-spinning) control-flow path. Critically, IN CANARY THE CALLBACK DOES NOT BLOCK on those dispatchers, because the voice state is different.

### Implementation steps

1. In `XAudioRegisterRenderDriverClient`, after the worker spawn succeeds, eagerly enqueue 8 fires (matching canary's `queued_frames_=8`) into `state.xaudio.pending`. The ticker's existing per-round drain plus the existing `try_inject_audio_callback` will then deliver these 8 callbacks across subsequent rounds, but they will fire WITHIN the first few thousand instructions of register-return, well before tid=9/10 spawn.
2. Eagerly fire the audio injector once at the END of the register handler. The round prologue normally calls `try_inject_audio_callback` once per round; this gives us +1 immediate fire to maximize the chance that a callback completes before tid=1 goes on to spawn tid=9/10.
3. Rely on `enqueue_all_active` declining to enqueue when the queue is at cap (it already does this, so no code change is needed).
4. Add 2-3 unit tests covering the eager-seed behavior in `XAudioState`.
5. Document the change in the existing register-handler block comment.

### Risks

- **Determinism shift**: the cold digest WILL change (8 extra fires re-order the round prologue's audio injection). Capture the new digest and validate 3× reproducibility.
- **Worker blocks on first callback** (per prior agent's diagnosis): if tid=11's first callback blocks immediately on `KeWaitForMultipleObjects`, then a queue depth of 8 doesn't matter. Fires 2-8 sit unused because `is_in_callback()` stays true, and the progression metric won't move. This is an empirical question, not predictable from static analysis. The brief explicitly says "if the fix lands cleanly but progression doesn't move, that's the answer."
- **Phase B image_canonical_sha256**: unchanged (no changes to the image-load path).
- **Sister chains**: tid=14→9 / tid=15→10 are the targets. Other chains (tid=11/16/4) may shift due to scheduling re-ordering.

## Phase 1: Execution log (filled during implementation)

[See fix.diff for the actual code changes]

## Phase 2: Validation (filled after cold runs)

[See re-validation.md and digests/]

## Phase 3: Outcome (filled after measurement)

[See summary.md]
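
## Appendix: illustrative sketch of eager seeding

The eager-seed plan in Phase 0 (implementation steps 1-3) and the "worker blocks on first callback" risk can be sketched with a toy model. This is a minimal sketch under stated assumptions, not the code in `fix.diff`: the struct below is a simplified stand-in for the real `XAudioState`. The field layout, the `QUEUE_CAP` constant, and the `register_eager`, `try_inject`, and `callback_returned` methods are hypothetical names introduced only for illustration. Only `XAUDIO_INSTR_PERIOD`, `tick_instr`, `enqueue_all_active`, `pending`, and `is_in_callback` are names taken from the notes, and their real signatures may differ.

```rust
use std::collections::VecDeque;

/// Instruction period between ticker fires (value from the notes).
const XAUDIO_INSTR_PERIOD: u64 = 48_000;
/// Assumed queue cap, chosen here to match canary's queued_frames_ = 8.
const QUEUE_CAP: usize = 8;

/// Hypothetical, simplified stand-in for the real `XAudioState`.
#[derive(Default)]
struct XAudioState {
    clients: Vec<usize>,      // registered client slots
    pending: VecDeque<usize>, // queued fires, one slot index per fire
    accumulator: u64,         // instructions counted since the last fire
    last_count: u64,          // last instruction count seen by tick_instr
    in_callback: bool,        // true while an injected callback is running
}

impl XAudioState {
    fn is_in_callback(&self) -> bool {
        self.in_callback
    }

    /// Enqueue one fire per registered client, but never past the cap.
    fn enqueue_all_active(&mut self) {
        for &slot in &self.clients {
            if self.pending.len() >= QUEUE_CAP {
                break;
            }
            self.pending.push_back(slot);
        }
    }

    /// Round prologue: accumulate the instruction delta and fire per period.
    fn tick_instr(&mut self, instruction_count: u64) {
        let delta = instruction_count.saturating_sub(self.last_count);
        self.last_count = instruction_count;
        if self.clients.is_empty() {
            return;
        }
        self.accumulator += delta;
        while self.accumulator >= XAUDIO_INSTR_PERIOD {
            self.accumulator -= XAUDIO_INSTR_PERIOD;
            self.enqueue_all_active();
        }
    }

    /// Step 1: register a client and eagerly seed fires up to the cap.
    fn register_eager(&mut self, slot: usize) {
        self.clients.push(slot);
        while self.pending.len() < QUEUE_CAP {
            self.pending.push_back(slot);
        }
    }

    /// Pull one fire off the queue unless a callback is still in flight.
    fn try_inject(&mut self) -> Option<usize> {
        if self.is_in_callback() {
            return None;
        }
        let slot = self.pending.pop_front()?;
        self.in_callback = true;
        Some(slot)
    }

    /// Called when the injected callback returns to its halt address.
    fn callback_returned(&mut self) {
        self.in_callback = false;
    }
}
```

In this model, `register_eager(0)` leaves 8 fires pending immediately instead of waiting a full period, and a later `tick_instr` cannot push the queue past the cap. It also shows the risk case: if the first injected callback never calls `callback_returned` (because it blocked), every later `try_inject` returns `None` and the remaining 7 fires stay queued.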