xenia-rs/audit-runs/phase-host-audio-eager/investigation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Phase Host-Audio-Eager — Investigation (2026-05-19)
## Phase 0: Plan
### Canary's XHostThread setup (verified from source)
- `AudioSystem::AudioSystem` (`xenia-canary/src/xenia/apu/audio_system.cc:48-69`):
constructs 8 host semaphores (`client_semaphores_[i]`) at engine init time.
Each is `Semaphore::Create(initial=0, max=queued_frames_=8)`.
- `AudioSystem::Setup` (lines 77-98): spawns `XHostThread "Audio Worker"`
running `WorkerThreadMain` IMMEDIATELY. This is a HOST OS thread, not a
guest thread, and it runs continuously for the engine's lifetime.
- `WorkerThreadMain` (lines 100-159): loops on `WaitAny(client_semaphores_, ...)`;
on wake, it calls `processor_->Execute(thread_state, client_callback, args)`,
which runs the guest callback IN-LINE on the host worker thread.
- `RegisterClient` (lines 202-237): the moment a client registers, it calls
`client_semaphore->Release(queued_frames_=8)` (line 210), seeding 8
semaphore permits. The already-running worker thread then drains these
in a tight loop: callback runs, returns, semaphore decremented, repeat.
All 8 callback invocations happen BEFORE `RegisterClient` even returns, or
shortly after.
- After SDL plays a frame, `sdl_audio_driver.cc:199` releases ONE permit,
re-arming the loop. Under `--mute=true`, SDL still drains and releases.
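
The canary behaviour described above can be modelled in a few lines. This is a minimal sketch, not canary's code: `Semaphore` and `callbacks_after_register` are hypothetical names, the counting semaphore is hand-rolled because Rust's standard library has none, and the worker loop is bounded to `queued_frames` iterations so the model terminates (the real worker loops for the engine's lifetime).

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Hypothetical counting semaphore standing in for canary's host semaphore.
pub struct Semaphore {
    permits: Mutex<u32>,
    cv: Condvar,
}

impl Semaphore {
    pub fn new(initial: u32) -> Self {
        Semaphore { permits: Mutex::new(initial), cv: Condvar::new() }
    }

    /// Add `n` permits and wake any waiter.
    pub fn release(&self, n: u32) {
        let mut p = self.permits.lock().unwrap();
        *p += n;
        self.cv.notify_all();
    }

    /// Block until a permit is available, then take it.
    pub fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.cv.wait(p).unwrap();
        }
        *p -= 1;
    }
}

/// Start the worker first, then "register" by releasing `queued_frames`
/// permits. Returns how many times the stand-in callback ran.
pub fn callbacks_after_register(queued_frames: u32) -> u32 {
    let sem = Arc::new(Semaphore::new(0));
    let calls = Arc::new(Mutex::new(0u32));

    // The worker exists before registration and is parked on the semaphore.
    let worker = {
        let (sem, calls) = (Arc::clone(&sem), Arc::clone(&calls));
        thread::spawn(move || {
            for _ in 0..queued_frames {
                sem.acquire();
                *calls.lock().unwrap() += 1; // stands in for the guest callback
            }
        })
    };

    // Registration seeds the permits; the worker drains them on its own.
    sem.release(queued_frames);
    worker.join().unwrap();

    let n = *calls.lock().unwrap();
    n
}
```

The point of the model is that the callbacks are driven by permits, not by guest instruction count, so all seeded frames are consumed without the registering thread executing anything further.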
### Our current ticker model (verified from source)
- `main.rs:2125-2131` (round prologue): each round, if any client is
registered, `xaudio.tick_instr(stats.instruction_count)` adds the delta
of executed instructions to an accumulator; when accumulator crosses
`XAUDIO_INSTR_PERIOD=48_000`, it enqueues one fire per registered
client and decrements the accumulator.
- `main.rs:2135-2137`: `try_inject_audio_callback` then pulls one fire
off the queue and injects it into the dedicated audio worker thread
(parked on a synthetic handle), but only if `is_in_callback()` is false
(audio injection is mutually exclusive with graphics interrupts).
- Worker thread spawned at register time (`exports.rs:4084-4160`) with
PC=callback_pc, parked Blocked(WaitAny[SYNTHETIC]). Injection flips
state to ServicingIrq with `pc=callback_pc`, runs callback, returns
to LR_HALT, restore path re-blocks worker on synthetic.
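
A minimal sketch of the accumulator behaviour described above, assuming one fire per period crossing per call. `TickerSketch` and its fields are hypothetical; only `tick_instr`, `XAUDIO_INSTR_PERIOD` and the 48,000 value come from the notes, and the real code in `main.rs` may differ in detail.

```rust
use std::collections::VecDeque;

pub const XAUDIO_INSTR_PERIOD: u64 = 48_000;

/// Hypothetical stand-in for the ticker part of the audio state.
pub struct TickerSketch {
    last_count: u64,
    accumulator: u64,
    clients: Vec<u32>,           // registered client slots
    pub pending: VecDeque<u32>,  // queued fires, one entry per client slot
}

impl TickerSketch {
    pub fn new(clients: Vec<u32>) -> Self {
        TickerSketch {
            last_count: 0,
            accumulator: 0,
            clients,
            pending: VecDeque::new(),
        }
    }

    /// Called once per round with the running instruction count.
    pub fn tick_instr(&mut self, instruction_count: u64) {
        let delta = instruction_count.saturating_sub(self.last_count);
        self.last_count = instruction_count;
        if self.clients.is_empty() {
            return; // nothing registered, nothing to fire
        }
        self.accumulator += delta;
        if self.accumulator >= XAUDIO_INSTR_PERIOD {
            self.accumulator -= XAUDIO_INSTR_PERIOD;
            // One fire per registered client.
            self.pending.extend(self.clients.iter().copied());
        }
    }
}
```

Under this model nothing is enqueued until a full period of guest instructions has elapsed after registration, which is the gap the ordering problem below depends on.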
### The ordering problem
Sylpheed boot sequence (verified per prior agent's traces):
1. tid=1 main calls `XAudioRegisterRenderDriverClient` → ours registers
client at slot 0, spawns worker (tid=11), enqueues NOTHING.
2. tid=1 main continues executing thousands of instructions.
3. tid=1 main calls `ExCreateThread` for XAudio worker threads → tid=9
and tid=10 spawn. They start spinning on the uninitialized voice
struct at `[r31+356]`.
4. **48,000 instructions after register**, the ticker finally fires,
enqueueing one buffer-complete callback.
5. Audio worker tid=11 wakes, runs callback at 0x824D6640. The callback
calls `KeWaitForMultipleObjects([0x82928B04, 0x82928AE0])` and
blocks. These dispatchers can only be signaled by tid=9/10.
6. tid=9/10 are stuck spinning → tid=11 stuck waiting → **circular
deadlock**.
In canary, the worker is HOST-threaded and starts running BEFORE
tid=1 even reaches the register call. Register seeds 8 permits → worker
drains 8 callback invocations. By the time tid=14/15 (canary's
counterparts of our tid=9/10) spawn, the voice struct's `[r31+356]`
field has been modified by 8 callback runs and is in a state where
tid=14/15 take a different (non-spinning) control-flow path. Critically,
IN CANARY THE CALLBACK DOES NOT BLOCK on those dispatchers, because the
voice state is different.
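
The circular dependency in steps 5-6 can be written down as a wait-for graph. The sketch below is illustrative only: the function names are hypothetical, and the edges from tid=9/10 back to tid=11 encode the assumption (drawn from the canary comparison) that the spinning threads only make progress once the callback has updated the voice struct.

```rust
use std::collections::HashMap;

/// Returns true if following "waits on" edges from `start` leads back to it.
pub fn has_wait_cycle(waits_on: &HashMap<u32, Vec<u32>>, start: u32) -> bool {
    let mut stack = vec![start];
    let mut seen: Vec<u32> = Vec::new();
    while let Some(tid) = stack.pop() {
        if let Some(nexts) = waits_on.get(&tid) {
            for &next in nexts {
                if next == start {
                    return true;
                }
                if !seen.contains(&next) {
                    seen.push(next);
                    stack.push(next);
                }
            }
        }
    }
    false
}

/// Wait-for edges for the boot sequence above (our thread IDs).
pub fn sylpheed_wait_graph() -> HashMap<u32, Vec<u32>> {
    let mut g = HashMap::new();
    // tid=11 blocks on dispatchers that only tid=9/10 can signal.
    g.insert(11, vec![9, 10]);
    // Assumption: tid=9/10 spin until the callback on tid=11 updates the
    // voice struct field they are polling.
    g.insert(9, vec![11]);
    g.insert(10, vec![11]);
    g
}
```

Removing the edges out of tid=9/10, which is what the eager callbacks are hoped to achieve, leaves the graph acyclic.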
### Implementation steps
1. At `XAudioRegisterRenderDriverClient` after worker spawn succeeds,
eagerly enqueue 8 fires (matching canary `queued_frames_=8`) into
`state.xaudio.pending`. The ticker's existing per-round drain plus
the existing `try_inject_audio_callback` will then deliver these
8 callbacks across subsequent rounds — but they will fire WITHIN
the first few thousand instructions of register-return, well
before tid=9/10 spawn.
2. Eagerly fire the audio injector once at the END of the register
handler. The round prologue normally calls
`try_inject_audio_callback` once per round; this gives us +1
immediate fire to maximize the chance of callback completion
before tid=1 continues to spawn tid=9/10.
3. Rely on `enqueue_all_active` refusing to enqueue when the queue is
at cap (it already does this, so no change is needed).
4. Add 2-3 unit tests covering the eager-seed behavior in
`XAudioState`.
5. Document the change in the existing register-handler block comment.
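
Steps 1 and 3 could look roughly like the following. This is a sketch under stated assumptions, not the contents of fix.diff: `XAudioStateSketch`, `enqueue`, `seed_on_register` and the cap parameter are hypothetical, and only the count of 8 (matching canary's `queued_frames_`) and the `pending` queue come from the plan.

```rust
use std::collections::VecDeque;

/// Matches canary's queued_frames_ = 8, per the plan above.
pub const QUEUED_FRAMES: usize = 8;

/// Hypothetical stand-in for the pending-fire part of `XAudioState`.
pub struct XAudioStateSketch {
    pub pending: VecDeque<u32>,
    cap: usize, // hypothetical queue cap; the real value is not stated here
}

impl XAudioStateSketch {
    pub fn new(cap: usize) -> Self {
        XAudioStateSketch { pending: VecDeque::new(), cap }
    }

    /// Enqueue one fire for `slot`, refusing when the queue is at cap.
    pub fn enqueue(&mut self, slot: u32) -> bool {
        if self.pending.len() >= self.cap {
            return false;
        }
        self.pending.push_back(slot);
        true
    }

    /// Eager seed at register time. Returns how many fires were enqueued.
    pub fn seed_on_register(&mut self, slot: u32) -> usize {
        let mut enqueued = 0;
        for _ in 0..QUEUED_FRAMES {
            if self.enqueue(slot) {
                enqueued += 1;
            }
        }
        enqueued
    }
}
```

The unit tests in step 4 would then assert on the seeded queue depth and on the cap being respected, for example that seeding into a queue with cap 3 enqueues exactly 3 fires.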
### Risks
- **Determinism shift**: cold digest WILL change (8 extra fires
re-order the round prologue's audio injection). Capture new
digest, validate 3× reproducibility.
- **Worker blocks on first callback** (per prior agent's
diagnosis): if tid=11's first callback blocks immediately on
`KeWaitForMultipleObjects`, then queue depth 8 doesn't matter:
fires 2-8 sit unused because `is_in_callback()` stays true. In
that case the progression metric won't move. This is an empirical
question, not predictable from static analysis. The brief
explicitly says "if the fix lands cleanly but progression
doesn't move, that's the answer."
- **Phase B image_canonical_sha256**: unchanged (no changes to
image-load path).
- **Sister chains**: tid=14→9 / tid=15→10 are the targets. Other
chains (tid=11/16/4) may shift due to scheduling re-ordering.
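
The "worker blocks on first callback" risk can be made concrete with a toy model of the injection gate. Everything here is hypothetical apart from the `is_in_callback()` name: the sketch assumes one injection attempt per round and that a non-blocking callback completes within the round in which it is injected.

```rust
/// Hypothetical model of the round-prologue injector.
pub struct InjectorSketch {
    pub pending: usize,
    in_callback: bool,
}

impl InjectorSketch {
    pub fn new(pending: usize) -> Self {
        InjectorSketch { pending, in_callback: false }
    }

    pub fn is_in_callback(&self) -> bool {
        self.in_callback
    }

    /// One injection attempt. Returns true if a fire was delivered.
    pub fn try_inject(&mut self, callback_blocks: bool) -> bool {
        if self.in_callback || self.pending == 0 {
            return false;
        }
        self.pending -= 1;
        // A blocking callback never returns, so the gate stays closed.
        self.in_callback = callback_blocks;
        true
    }
}

/// Count fires delivered over `rounds` rounds from an eager seed of `seed`.
pub fn delivered_after_rounds(seed: usize, rounds: usize, callback_blocks: bool) -> usize {
    let mut inj = InjectorSketch::new(seed);
    let mut delivered = 0;
    for _ in 0..rounds {
        if inj.try_inject(callback_blocks) {
            delivered += 1;
        }
    }
    delivered
}
```

In this model a seed of 8 delivers all 8 fires when callbacks return, and exactly 1 when the first callback blocks, which is why the eager seed alone cannot be assumed to move the progression metric.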
## Phase 1: Execution log (filled during implementation)
[See fix.diff for the actual code changes]
## Phase 2: Validation (filled after cold runs)
[See re-validation.md and digests/]
## Phase 3: Outcome (filled after measurement)
[See summary.md]