handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
118
audit-runs/phase-host-audio-eager/investigation.md
Normal file
@@ -0,0 +1,118 @@
# Phase Host-Audio-Eager — Investigation (2026-05-19)
## Phase 0: Plan
### Canary's XHostThread setup (verified from source)

- `AudioSystem::AudioSystem` (`xenia-canary/src/xenia/apu/audio_system.cc:48-69`):
  constructs 8 host semaphores (`client_semaphores_[i]`) at engine init time.
  Each is `Semaphore::Create(initial=0, max=queued_frames_=8)`.
- `AudioSystem::Setup` (lines 77-98): spawns `XHostThread "Audio Worker"`
  running `WorkerThreadMain` IMMEDIATELY. This is a HOST OS thread, not a
  guest thread. It runs continuously throughout the engine's lifetime.
- `WorkerThreadMain` (lines 100-159): loops on `WaitAny(client_semaphores_, ...)`
  → on wake, calls `processor_->Execute(thread_state, client_callback, args)`,
  which runs the guest callback IN-LINE on the host worker thread.
- `RegisterClient` (lines 202-237): the moment a client registers, it calls
  `client_semaphore->Release(queued_frames_=8)` (line 210), seeding 8
  semaphore permits. The already-running worker thread then drains these
  in a tight loop: callback runs, returns, semaphore decremented, repeat.
  8 callback invocations happen BEFORE `RegisterClient` even returns (or
  shortly after).
- After SDL plays a frame, `sdl_audio_driver.cc:199` releases ONE permit,
  re-arming the loop. Under `--mute=true`, SDL still drains and releases.
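
The seed-then-drain behavior described above can be modeled with a small std-only sketch. This is not canary's code: the `Semaphore` type, `run_seeded_worker`, and the saturating `release` are assumptions made for illustration, and the guest callback is replaced by a counter increment. It only shows that a worker started before registration consumes all 8 seeded permits as soon as they appear.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Minimal counting semaphore (hypothetical stand-in for canary's host semaphore).
struct Semaphore {
    permits: Mutex<u32>,
    max: u32,
    cv: Condvar,
}

impl Semaphore {
    fn new(initial: u32, max: u32) -> Self {
        Self { permits: Mutex::new(initial), max, cv: Condvar::new() }
    }

    /// Add `n` permits (saturating at `max`, a simplification) and wake waiters.
    fn release(&self, n: u32) {
        let mut p = self.permits.lock().unwrap();
        *p = (*p + n).min(self.max);
        self.cv.notify_all();
    }

    /// Block until a permit is available, then take it.
    fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.cv.wait(p).unwrap();
        }
        *p -= 1;
    }
}

const QUEUED_FRAMES: u32 = 8;

/// Starts the worker first, then "registers"; returns how many callbacks ran.
fn run_seeded_worker() -> usize {
    let sem = Arc::new(Semaphore::new(0, QUEUED_FRAMES));
    let calls = Arc::new(AtomicUsize::new(0));

    // The worker is already running (and parked) before any client registers.
    let worker = {
        let (sem, calls) = (Arc::clone(&sem), Arc::clone(&calls));
        thread::spawn(move || {
            for _ in 0..QUEUED_FRAMES {
                sem.acquire(); // one permit per callback invocation
                calls.fetch_add(1, Ordering::SeqCst); // stand-in for the guest callback
            }
        })
    };

    // Register step: seed all permits at once.
    sem.release(QUEUED_FRAMES);
    worker.join().unwrap();
    calls.load(Ordering::SeqCst)
}
```

Calling `run_seeded_worker()` returns the number of callback invocations the worker completed after the single seeding release.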

### Our current ticker model (verified from source)

- `main.rs:2125-2131` (round prologue): each round, if any client is
  registered, `xaudio.tick_instr(stats.instruction_count)` adds the delta
  of executed instructions to an accumulator; when the accumulator crosses
  `XAUDIO_INSTR_PERIOD=48_000`, it enqueues one fire per registered
  client and decrements the accumulator.
- `main.rs:2135-2137`: `try_inject_audio_callback` then pulls one fire
  off the queue and injects it into the dedicated audio worker thread
  (parked on a synthetic handle), but only if `is_in_callback()` is false
  (mutual exclusion with graphics interrupts).
- The worker thread is spawned at register time (`exports.rs:4084-4160`) with
  PC=callback_pc, parked Blocked(WaitAny[SYNTHETIC]). Injection flips
  state to ServicingIrq with `pc=callback_pc`, runs the callback, returns
  to LR_HALT, and the restore path re-blocks the worker on the synthetic handle.
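
A minimal sketch of the instruction-count ticker described above, under stated assumptions: the `Ticker` struct, its field names, and the choice to subtract exactly one period per call are illustrative and are not taken from `main.rs`. It shows why nothing is enqueued until 48,000 instructions have elapsed.

```rust
use std::collections::VecDeque;

const XAUDIO_INSTR_PERIOD: u64 = 48_000;

/// Hypothetical model of the ticker; names are illustrative only.
struct Ticker {
    last_count: u64,          // instruction total seen on the previous round
    accumulator: u64,         // instructions accumulated toward the next fire
    clients: Vec<usize>,      // registered client slots
    pending: VecDeque<usize>, // queued fires, identified by client slot
}

impl Ticker {
    fn new(clients: Vec<usize>) -> Self {
        Self { last_count: 0, accumulator: 0, clients, pending: VecDeque::new() }
    }

    /// Called once per round with the running instruction total.
    fn tick_instr(&mut self, instruction_count: u64) {
        let delta = instruction_count.saturating_sub(self.last_count);
        self.last_count = instruction_count;
        if self.clients.is_empty() {
            return; // no client registered: the ticker is idle
        }
        self.accumulator += delta;
        if self.accumulator >= XAUDIO_INSTR_PERIOD {
            // Assumption: one period is consumed per call.
            self.accumulator -= XAUDIO_INSTR_PERIOD;
            for &slot in &self.clients {
                self.pending.push_back(slot); // one fire per registered client
            }
        }
    }
}
```

With one client registered, a round ending at 47,999 instructions leaves `pending` empty; the first fire is only enqueued once the total reaches 48,000.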
### The ordering problem

Sylpheed boot sequence (verified per prior agent's traces):

1. tid=1 main calls `XAudioRegisterRenderDriverClient` → ours registers
   client at slot 0, spawns worker (tid=11), enqueues NOTHING.
2. tid=1 main continues executing thousands of instructions.
3. tid=1 main calls `ExCreateThread` for XAudio worker threads → tid=9
   and tid=10 spawn. They start spinning on the uninitialized voice
   struct at `[r31+356]`.
4. **48,000 instructions after register**, the ticker finally fires,
   enqueueing one buffer-complete callback.
5. Audio worker tid=11 wakes, runs callback at 0x824D6640. The callback
   calls `KeWaitForMultipleObjects([0x82928B04, 0x82928AE0])` and
   blocks. These dispatchers can only be signaled by tid=9/10.
6. tid=9/10 are stuck spinning → tid=11 stuck waiting → **circular
   deadlock**.

In canary: the worker is HOST-threaded and starts running BEFORE
tid=1 even reaches the register call. Register seeds 8 permits → worker
drains 8 callback invocations. By the time tid=14/15 (canary's
counterparts of our tid=9/10) spawn, the voice
struct's `[r31+356]` field has been modified by 8 callback runs and is
in a state where tid=14/15 take a different (non-spinning) control-flow
path. Critically, IN CANARY THE CALLBACK DOES NOT BLOCK on those
dispatchers, because the voice state is different.
### Implementation steps

1. At `XAudioRegisterRenderDriverClient`, after worker spawn succeeds,
   eagerly enqueue 8 fires (matching canary `queued_frames_=8`) into
   `state.xaudio.pending`. The ticker's existing per-round drain plus
   the existing `try_inject_audio_callback` will then deliver these
   8 callbacks across subsequent rounds, but they will fire WITHIN
   the first few thousand instructions of register-return, well
   before tid=9/10 spawn.

2. Eagerly fire the audio injector once at the END of the register
   handler. The round prologue normally calls
   `try_inject_audio_callback` once per round; this gives us +1
   immediate fire to maximize the chance of callback completion
   before tid=1 continues to spawn tid=9/10.

3. Rely on `enqueue_all_active` NOT enqueueing when the queue is at cap
   (it already does this; no change is needed).

4. Add 2-3 unit tests covering the eager-seed behavior in
   `XAudioState`.

5. Document the change in the existing register-handler block comment.
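
The eager-seed step and its cap interaction could look roughly like the sketch below. Everything here is hypothetical: this `XAudioState` is a stripped-down model, not the real struct, and `PENDING_CAP`, `enqueue`, and `seed_on_register` are names invented for the example. The real `fix.diff` may differ.

```rust
use std::collections::VecDeque;

const QUEUED_FRAMES: usize = 8; // mirrors canary's queued_frames_
const PENDING_CAP: usize = 8;   // assumed queue cap, for illustration only

/// Hypothetical, reduced model of the audio state; not the real struct.
struct XAudioState {
    pending: VecDeque<usize>, // queued fires, identified by client slot
}

impl XAudioState {
    fn new() -> Self {
        Self { pending: VecDeque::new() }
    }

    /// Enqueue one fire unless the queue is at cap; returns whether it was queued.
    fn enqueue(&mut self, slot: usize) -> bool {
        if self.pending.len() >= PENDING_CAP {
            return false;
        }
        self.pending.push_back(slot);
        true
    }

    /// Eager seed at register time; returns how many fires were actually queued.
    fn seed_on_register(&mut self, slot: usize) -> usize {
        let mut queued = 0;
        for _ in 0..QUEUED_FRAMES {
            if self.enqueue(slot) {
                queued += 1;
            }
        }
        queued
    }
}
```

A unit test in the spirit of step 4 would seed once and expect 8 queued fires, then seed again and expect none, since the cap already holds.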
### Risks

- **Determinism shift**: cold digest WILL change (8 extra fires
  re-order the round prologue's audio injection). Capture new
  digest, validate 3× reproducibility.
- **Worker blocks on first callback** (per prior agent's
  diagnosis): if tid=11's first callback blocks immediately on
  `KeWaitForMultipleObjects`, then queue depth 8 doesn't matter:
  fires 2-8 sit unused because `is_in_callback()` stays true. In
  that case the progression metric won't move. This is an empirical
  question, not predictable from static analysis. The brief
  explicitly says "if the fix lands cleanly but progression
  doesn't move, that's the answer."
- **Phase B image_canonical_sha256**: unchanged (no changes to
  image-load path).
- **Sister chains**: tid=14→9 / tid=15→10 are the targets. Other
  chains (tid=11/16/4) may shift due to scheduling re-ordering.
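
The "worker blocks on first callback" risk can be illustrated with a toy model of the injection gate. The `Injector` type and `run_rounds` helper are assumptions for this sketch and do not reproduce `try_inject_audio_callback`; the model only captures the rule that nothing is injected while a callback is still in flight.

```rust
use std::collections::VecDeque;

/// Hypothetical model of the per-round injection gate.
struct Injector {
    pending: VecDeque<usize>, // queued fires, identified by client slot
    in_callback: bool,        // true while the worker is inside a callback
    delivered: usize,         // fires handed to the worker so far
}

impl Injector {
    /// Inject at most one fire, and only when no callback is in flight.
    fn try_inject(&mut self) -> bool {
        if self.in_callback {
            return false;
        }
        match self.pending.pop_front() {
            Some(_) => {
                self.in_callback = true;
                self.delivered += 1;
                true
            }
            None => false,
        }
    }

    /// The callback returned to its halt address; the worker is idle again.
    fn callback_returned(&mut self) {
        self.in_callback = false;
    }
}

/// Runs `rounds` rounds with 8 seeded fires; returns (delivered, still queued).
fn run_rounds(rounds: usize, callback_returns: bool) -> (usize, usize) {
    let mut inj = Injector {
        pending: std::iter::repeat(0usize).take(8).collect(),
        in_callback: false,
        delivered: 0,
    };
    for _ in 0..rounds {
        inj.try_inject();
        if callback_returns {
            inj.callback_returned();
        }
    }
    (inj.delivered, inj.pending.len())
}
```

If the first callback never returns, `run_rounds(100, false)` delivers one fire and leaves seven queued, which is the "fires 2-8 sit unused" case; if each callback returns within its round, all eight are delivered.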

## Phase 1: Execution log (filled during implementation)

[See fix.diff for the actual code changes]

## Phase 2: Validation (filled after cold runs)

[See re-validation.md and digests/]

## Phase 3: Outcome (filled after measurement)

[See summary.md]