xenia-rs/audit-runs/phase-host-audio-eager/investigation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

Phase Host-Audio-Eager — Investigation (2026-05-19)

Phase 0: Plan

Canary's XHostThread setup (verified from source)

  • AudioSystem::AudioSystem (xenia-canary/src/xenia/apu/audio_system.cc:48-69): constructs 8 host semaphores (client_semaphores_[i]) at engine init time. Each is Semaphore::Create(initial=0, max=queued_frames_=8).
  • AudioSystem::Setup (lines 77-98): spawns XHostThread "Audio Worker" running WorkerThreadMain IMMEDIATELY. This is a HOST OS thread, not a guest thread, and it runs continuously for the engine's lifetime.
  • WorkerThreadMain (lines 100-159): loops on WaitAny(client_semaphores_, ...) → on wake, calls processor_->Execute(thread_state, client_callback, args), which runs the guest callback IN-LINE on the host worker thread.
  • RegisterClient (lines 202-237): the moment a client registers, it calls client_semaphore->Release(queued_frames_=8) (line 210), seeding 8 semaphore permits. The already-running worker thread then drains them in a tight loop: each wake consumes one permit, runs the callback, and returns to the wait. The 8 callback invocations therefore happen BEFORE RegisterClient even returns, or shortly after, depending on host scheduling.
  • After SDL plays a frame, sdl_audio_driver.cc:199 releases ONE permit, re-arming the loop. Under --mute=true, SDL still drains and releases.
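As a rough sketch of the canary ordering above (not canary's code), the Rust below models one client semaphore with a Mutex/Condvar pair. ClientSemaphore and run_canary_model are hypothetical names. The shutdown flag is an assumed stand-in for however canary stops the worker, which these notes do not cover, and the semaphore maximum is not modeled. The point it illustrates: because the worker already exists before registration, a single Release(8) is enough to produce 8 callback runs with no further prompting.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical stand-in for one canary client semaphore, plus an assumed
/// shutdown flag so the sketch can terminate.
struct ClientSemaphore {
    state: Mutex<(u32, bool)>, // (permits, shutdown requested)
    cv: Condvar,
}

impl ClientSemaphore {
    fn new() -> Self {
        // initial = 0, as in Semaphore::Create(0, queued_frames_).
        ClientSemaphore { state: Mutex::new((0, false)), cv: Condvar::new() }
    }

    /// Analogue of client_semaphore->Release(n).
    fn release(&self, n: u32) {
        let mut s = self.state.lock().unwrap();
        s.0 += n;
        self.cv.notify_all();
    }

    fn shutdown(&self) {
        let mut s = self.state.lock().unwrap();
        s.1 = true;
        self.cv.notify_all();
    }

    /// Wait for a permit. Returns false only once shutdown is requested and
    /// no permits remain, so already-released permits are always drained.
    fn acquire(&self) -> bool {
        let mut s = self.state.lock().unwrap();
        while s.0 == 0 && !s.1 {
            s = self.cv.wait(s).unwrap();
        }
        if s.0 > 0 {
            s.0 -= 1; // each wake consumes exactly one permit
            true
        } else {
            false
        }
    }
}

/// The worker starts before registration; registration seeds
/// `queued_frames` permits, which the worker turns into callback runs.
fn run_canary_model(queued_frames: u32) -> u32 {
    let sem = Arc::new(ClientSemaphore::new());
    let callbacks = Arc::new(AtomicU32::new(0));

    // AudioSystem::Setup analogue: host worker spawned up front, blocks.
    let worker = {
        let sem = Arc::clone(&sem);
        let callbacks = Arc::clone(&callbacks);
        thread::spawn(move || {
            while sem.acquire() {
                // Stand-in for processor_->Execute(..., client_callback, ...).
                callbacks.fetch_add(1, Ordering::SeqCst);
            }
        })
    };

    // RegisterClient analogue: Release(queued_frames_).
    sem.release(queued_frames);

    sem.shutdown();
    worker.join().unwrap();
    callbacks.load(Ordering::SeqCst)
}
```

run_canary_model(8) returns 8. acquire() prefers any remaining permits over the shutdown request, which keeps the count deterministic regardless of thread timing.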

Our current ticker model (verified from source)

  • main.rs:2125-2131 (round prologue): each round, if any client is registered, xaudio.tick_instr(stats.instruction_count) adds the delta of executed instructions to an accumulator. When the accumulator crosses XAUDIO_INSTR_PERIOD=48_000, it enqueues one fire per registered client and decrements the accumulator.
  • main.rs:2135-2137: try_inject_audio_callback then pulls one fire off the queue and injects it into the dedicated audio worker thread (parked on a synthetic handle). It does this only if is_in_callback() is false, since audio injection is mutually exclusive with graphics interrupts.
  • The worker thread is spawned at register time (exports.rs:4084-4160) with PC=callback_pc and parked as Blocked(WaitAny[SYNTHETIC]). Injection flips its state to ServicingIrq with pc=callback_pc and runs the callback. The callback returns to LR_HALT, and the restore path re-blocks the worker on the synthetic handle.
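A simplified model of the ticker path follows. TickerSketch, its fields, and callback_returned are hypothetical names; only XAUDIO_INSTR_PERIOD comes from main.rs. The sketch makes three assumptions:

- The accumulator is reduced by exactly one period per crossing.
- At most one period is converted per tick.
- last_count is refreshed even while no client is registered.

```rust
use std::collections::VecDeque;

/// Period value taken from main.rs (XAUDIO_INSTR_PERIOD).
const XAUDIO_INSTR_PERIOD: u64 = 48_000;

/// Hypothetical simplified ticker; not the real XAudioState.
struct TickerSketch {
    clients: Vec<u32>,      // registered client slots
    last_count: u64,        // instruction count seen at the previous tick
    accumulator: u64,       // executed instructions not yet turned into fires
    pending: VecDeque<u32>, // queued fires, one client slot per fire
    in_callback: bool,      // mirrors is_in_callback()
}

impl TickerSketch {
    fn new(clients: Vec<u32>) -> Self {
        TickerSketch {
            clients,
            last_count: 0,
            accumulator: 0,
            pending: VecDeque::new(),
            in_callback: false,
        }
    }

    /// Round prologue: add the executed-instruction delta; on crossing the
    /// period, enqueue one fire per registered client and subtract one period.
    fn tick_instr(&mut self, instruction_count: u64) {
        let delta = instruction_count.saturating_sub(self.last_count);
        self.last_count = instruction_count; // assumed: refreshed every round
        if self.clients.is_empty() {
            return; // the ticker only accumulates while a client is registered
        }
        self.accumulator += delta;
        if self.accumulator >= XAUDIO_INSTR_PERIOD {
            self.accumulator -= XAUDIO_INSTR_PERIOD;
            for &slot in &self.clients {
                self.pending.push_back(slot);
            }
        }
    }

    /// try_inject_audio_callback analogue: deliver at most one fire, and only
    /// when nothing (audio or graphics) is currently being serviced.
    fn try_inject(&mut self) -> Option<u32> {
        if self.in_callback {
            return None;
        }
        let slot = self.pending.pop_front()?;
        self.in_callback = true;
        Some(slot)
    }

    /// Restore path after the callback returns to LR_HALT.
    fn callback_returned(&mut self) {
        self.in_callback = false;
    }
}
```

With a single client registered, nothing is queued until the cumulative count reaches 48,000. After that, one fire is delivered and a second injection is refused until callback_returned() runs.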

The ordering problem

Sylpheed boot sequence (verified against the prior agent's traces):

  1. tid=1 main calls XAudioRegisterRenderDriverClient → ours registers the client at slot 0, spawns the worker (tid=11), and enqueues NOTHING.
  2. tid=1 main continues executing thousands of instructions.
  3. tid=1 main calls ExCreateThread for the XAudio worker threads → tid=9 and tid=10 spawn. They start spinning on the uninitialized voice struct at [r31+356].
  4. 48,000 instructions after registration, the ticker finally fires, enqueueing one buffer-complete callback.
  5. Audio worker tid=11 wakes and runs the callback at 0x824D6640. The callback calls KeWaitForMultipleObjects([0x82928B04, 0x82928AE0]) and blocks. These dispatchers can only be signaled by tid=9/10.
  6. tid=9/10 are stuck spinning → tid=11 is stuck waiting → circular deadlock.
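The circular deadlock in step 6 can be stated as a cycle in a wait-for graph. The sketch below runs a plain DFS cycle check over a list of edges; has_cycle and visit are hypothetical helpers, not part of the codebase.

Two kinds of edges are modeled:

- 11→9 and 11→10 come from step 5: the dispatchers can only be signaled by tid=9/10.
- 9→11 and 10→11 encode an assumption: tid=9/10 only stop spinning once callback runs have changed [r31+356].

```rust
use std::collections::{HashMap, HashSet};

/// DFS helper: returns true if a back edge (cycle) is reachable from `n`.
fn visit(
    n: u32,
    adj: &HashMap<u32, Vec<u32>>,
    on_stack: &mut HashSet<u32>,
    done: &mut HashSet<u32>,
) -> bool {
    if on_stack.contains(&n) {
        return true; // back edge: n is already on the current DFS path
    }
    if done.contains(&n) {
        return false; // fully explored earlier, no cycle through n
    }
    on_stack.insert(n);
    if let Some(next) = adj.get(&n) {
        for &m in next {
            if visit(m, adj, on_stack, done) {
                return true;
            }
        }
    }
    on_stack.remove(&n);
    done.insert(n);
    false
}

/// Edge (a, b) means "thread a cannot progress until thread b acts".
fn has_cycle(edges: &[(u32, u32)]) -> bool {
    let mut adj: HashMap<u32, Vec<u32>> = HashMap::new();
    for &(a, b) in edges {
        adj.entry(a).or_default().push(b);
    }
    let mut on_stack = HashSet::new();
    let mut done = HashSet::new();
    adj.keys()
        .any(|&n| visit(n, &adj, &mut on_stack, &mut done))
}
```

Dropping the 11→9/10 edges models canary's non-blocking callback, and the graph becomes acyclic.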

In canary, the worker is HOST-threaded and starts running BEFORE tid=1 even reaches the register call. Registration seeds 8 permits, and the worker drains them as 8 callback invocations. By the time tid=14/15 (canary's counterparts of our tid=9/10) spawn, 8 callback runs have modified the voice struct's [r31+356] field. In that state, tid=14/15 take a different (non-spinning) control-flow path. Critically, IN CANARY THE CALLBACK DOES NOT BLOCK on those dispatchers, because the voice state is different.

Implementation steps

  1. At XAudioRegisterRenderDriverClient, after the worker spawn succeeds, eagerly enqueue 8 fires (matching canary's queued_frames_=8) into state.xaudio.pending. The ticker's existing per-round drain plus the existing try_inject_audio_callback will then deliver these 8 callbacks across subsequent rounds. They should fire WITHIN the first few thousand instructions after the register call returns, well before tid=9/10 spawn.

  2. Eagerly fire the audio injector once at the END of the register handler. The round prologue normally calls try_inject_audio_callback once per round; this adds one immediate fire to maximize the chance that a callback completes before tid=1 goes on to spawn tid=9/10.

  3. Rely on enqueue_all_active NOT enqueuing when the queue is at cap. It already behaves this way, so no code change is needed here.

  4. Add 2-3 unit tests covering the eager-seed behavior in XAudioState.

  5. Document the change in the existing register-handler block comment.
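Steps 1 to 3 can be sketched on a toy state struct. The following names are hypothetical stand-ins:

- XAudioSketch for state.xaudio.
- PENDING_CAP for the queue cap (assumed here to be 8).
- enqueue_all_active and try_inject for enqueue_all_active and try_inject_audio_callback.

The real register handler also spawns the worker, which is omitted. The assertions that follow the block are the kind of eager-seed checks step 4 calls for.

```rust
use std::collections::VecDeque;

/// Mirrors canary's queued_frames_ = 8.
const QUEUED_FRAMES: usize = 8;
/// Hypothetical queue cap; the real cap in XAudioState may differ.
const PENDING_CAP: usize = 8;

/// Hypothetical, simplified stand-in for state.xaudio.
struct XAudioSketch {
    active: Vec<u32>,       // registered client slots
    pending: VecDeque<u32>, // queued fires
    in_callback: bool,      // mirrors is_in_callback()
    injected: Vec<u32>,     // record of delivered fires, for inspection
}

impl XAudioSketch {
    fn new() -> Self {
        XAudioSketch {
            active: Vec::new(),
            pending: VecDeque::new(),
            in_callback: false,
            injected: Vec::new(),
        }
    }

    /// Step 3: one fire per active client, skipped once the queue is at cap.
    fn enqueue_all_active(&mut self) {
        for &slot in &self.active {
            if self.pending.len() >= PENDING_CAP {
                break;
            }
            self.pending.push_back(slot);
        }
    }

    /// try_inject_audio_callback analogue: one fire, gated on in_callback.
    fn try_inject(&mut self) -> bool {
        if self.in_callback {
            return false;
        }
        match self.pending.pop_front() {
            Some(slot) => {
                self.in_callback = true;
                self.injected.push(slot);
                true
            }
            None => false,
        }
    }

    /// Steps 1 and 2: eager seed of QUEUED_FRAMES fires, then one immediate
    /// injection attempt at the end of the register handler.
    fn register_client(&mut self, slot: u32) {
        self.active.push(slot);
        // (Worker spawn would happen here in the real handler.)
        for _ in 0..QUEUED_FRAMES {
            if self.pending.len() >= PENDING_CAP {
                break;
            }
            self.pending.push_back(slot);
        }
        self.try_inject();
    }
}
```

After register_client(0), one fire has been injected and 7 remain queued. A following enqueue_all_active tops the queue up to the cap, and further calls leave it at the cap.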

Risks

  • Determinism shift: the cold digest WILL change (the 8 extra fires re-order the round prologue's audio injection). Capture a new digest and validate 3× reproducibility.
  • Worker blocks on first callback (per the prior agent's diagnosis): if tid=11's first callback blocks immediately on KeWaitForMultipleObjects, queue depth 8 doesn't matter. Fires 2-8 sit unused because is_in_callback() stays true, and in that case the progression metric won't move. This is an empirical question that static analysis cannot settle. The brief explicitly says "if the fix lands cleanly but progression doesn't move, that's the answer."
  • Phase B image_canonical_sha256: expected unchanged (the image-load path is not touched).
  • Sister chains: tid=14→9 / tid=15→10 are the targets. Other chains (tid=11/16/4) may shift due to scheduling re-ordering.
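The "worker blocks on first callback" risk can be checked with a toy round loop. With the is_in_callback() gate in place, a first callback that never returns leaves 7 of the 8 seeded fires stranded. simulate is a hypothetical helper that assumes one injection attempt per round and that a non-blocking callback finishes within its round.

```rust
use std::collections::VecDeque;

/// Returns (fires delivered, fires still queued) after `rounds` round
/// prologues, starting from `seeded` queued fires. If `first_callback_blocks`,
/// the first injected callback never returns, so in_callback stays true.
fn simulate(seeded: u32, rounds: u32, first_callback_blocks: bool) -> (u32, usize) {
    let mut pending: VecDeque<u32> = (0..seeded).collect();
    let mut in_callback = false;
    let mut delivered: u32 = 0;

    for _ in 0..rounds {
        // try_inject_audio_callback analogue, gated on is_in_callback().
        if !in_callback && pending.pop_front().is_some() {
            delivered += 1;
            in_callback = true;
        }
        // A blocking first callback never reaches LR_HALT; any other
        // callback is assumed to complete within the round.
        let blocked = first_callback_blocks && delivered == 1;
        if in_callback && !blocked {
            in_callback = false;
        }
    }
    (delivered, pending.len())
}
```

If the first callback blocks, simulate(8, 100, true) delivers one fire and leaves the other seven queued. If callbacks complete, simulate(8, 100, false) drains all eight.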

Phase 1: Execution log (filled during implementation)

[See fix.diff for the actual code changes]

Phase 2: Validation (filled after cold runs)

[See re-validation.md and digests/]

Phase 3: Outcome (filled after measurement)

[See summary.md]