handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions

@@ -0,0 +1,137 @@
# Phase A diff report
**This report is the output of Phase A's diff harness. Divergences
shown here are INPUT for Phase B (first-divergence localization),
not findings of Phase A.** Phase A's job is to make the harness
itself correct, not to analyze what it surfaces.
## Summary
| canary_tid | ours_tid | matched | canary_total | ours_total | first_divergence_at | floating_create (c/o) | floating_wait (c/o) |
|---|---|---|---|---|---|---|---|
| 4 | 11 | 11 | 20000 | 11 | — | 0/0 | 0/0 |
| 6 | 1 | 102424 | 250000 | 108507 | 102424 | 0/0 | 0/0 |
| 7 | 2 | 32 | 32 | 33 | — | 0/0 | 0/0 |
| 12 | 7 | 4 | 20000 | 5 | 4 | 0/0 | 0/0 |
| 14 | 9 | 41 | 20000 | 77 | 41 | 0/0 | 0/0 |
| 15 | 10 | 16 | 20000 | 17 | — | 0/1 | 0/0 |
*`floating_create (c/o)` counts shared-global `handle.create` events absorbed by Phase C+18 cross-tid SID matching. `floating_wait (c/o)` counts `wait.begin` events on shared-global dispatchers absorbed by Phase C+21 (scheduling-jitter window — canary's contention slow path may fire while ours fast-paths or vice versa). See schema-v1.md §"Shared-global SIDs" and §"Wait-begin floating absorb".*
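The `matched` and `first_divergence_at` columns can be read as a longest-common-prefix comparison per thread pair. The following is a minimal sketch under that reading, not the harness's actual code: `Event`, `ChainDiff`, and `diff_chain` are hypothetical names, events are compared by plain equality, and the cross-tid SID matching and floating absorb steps described above are left out entirely.

```rust
/// Hypothetical, simplified event. Real schema-v1 events carry more fields
/// (engine, host_ns, guest_cycle, a structured payload, ...).
#[derive(Debug, Clone, PartialEq)]
pub struct Event {
    pub kind: String,
    pub name: String,
    pub return_value: Option<u64>,
}

/// Outcome of comparing one canary thread stream against one of ours.
#[derive(Debug, PartialEq)]
pub struct ChainDiff {
    /// Number of leading events that compare equal.
    pub matched: usize,
    /// Index of the first unequal event, or None if one stream simply ended.
    pub first_divergence_at: Option<usize>,
}

/// Walk both streams in lockstep and stop at the first unequal event.
/// Assumption: plain equality stands in for the harness's semantic matching.
pub fn diff_chain(canary: &[Event], ours: &[Event]) -> ChainDiff {
    let compared = canary.len().min(ours.len());
    for i in 0..compared {
        if canary[i] != ours[i] {
            return ChainDiff { matched: i, first_divergence_at: Some(i) };
        }
    }
    // Every compared event matched; the shorter stream bounds the prefix.
    ChainDiff { matched: compared, first_divergence_at: None }
}
```

Under this sketch a row such as `4 → 11` (11 matched, no divergence) corresponds to the shorter stream ending cleanly, while `6 → 1` corresponds to an unequal event at index 102424.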
## canary_tid=4 → ours_tid=11
No divergence within the 11 compared events (canary has 20000, ours has 11).
## canary_tid=6 → ours_tid=1
First divergence at `tid_event_idx=102424`: payload.return_value: canary=0 ours=18446744072635809807
**Pre-context (last 5 matching events):**
```
canary: [102419] import.call RtlInitAnsiString
ours: [102419] import.call RtlInitAnsiString
canary: [102420] kernel.call RtlInitAnsiString
ours: [102420] kernel.call RtlInitAnsiString
canary: [102421] kernel.return RtlInitAnsiString
ours: [102421] kernel.return RtlInitAnsiString
canary: [102422] import.call NtQueryFullAttributesFile
ours: [102422] import.call NtQueryFullAttributesFile
canary: [102423] kernel.call NtQueryFullAttributesFile
ours: [102423] kernel.call NtQueryFullAttributesFile
```
**Divergent event:**
```
canary: [102424] kernel.return NtQueryFullAttributesFile
ours: [102424] kernel.return NtQueryFullAttributesFile
```
**Next event after the divergence (if any):**
```
canary: [102425] import.call RtlEnterCriticalSection
ours: [102425] import.call RtlNtStatusToDosError
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1523806700, "kind": "kernel.return", "payload": {"name": "NtQueryFullAttributesFile", "return_value": 0, "side_effects": [], "status": "0x00000000"}, "schema_version": 1, "tid": 6, "tid_event_idx": 102424}
{"deterministic": true, "engine": "ours", "guest_cycle": 5391947, "host_ns": 466907956, "kind": "kernel.return", "payload": {"name": "NtQueryFullAttributesFile", "return_value": 18446744072635809807, "side_effects": [], "status": "0xc000000f"}, "schema_version": 1, "tid": 1, "tid_event_idx": 102424}
```
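The large `return_value` on the ours side is numerically consistent with the 32-bit `status` `0xc000000f` (the NT status code `STATUS_NO_SUCH_FILE`) sign-extended to 64 bits. The sketch below only checks that arithmetic; whether the tracer actually records the value this way is an assumption, and both function names are hypothetical.

```rust
/// Sign-extend a 32-bit NTSTATUS to 64 bits, as a 64-bit return register
/// would hold it. Assumption: this mirrors how `return_value` is recorded.
pub fn status_to_return_value(status: u32) -> u64 {
    status as i32 as i64 as u64
}

/// Recover the 32-bit status from a recorded 64-bit return value by
/// keeping only the low 32 bits.
pub fn return_value_to_status(return_value: u64) -> u32 {
    return_value as u32
}
```

With these helpers, `status_to_return_value(0xC000_000F)` yields `18446744072635809807`, the value in the ours event, and `status_to_return_value(0)` yields `0`, the value in the canary event.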
## canary_tid=7 → ours_tid=2
No divergence within the 32 compared events (canary has 32, ours has 33).
## canary_tid=12 → ours_tid=7
First divergence at `tid_event_idx=4`: payload.return_value: canary=258 ours=0
**Pre-context (last 5 matching events):**
```
canary: [0] import.call KeWaitForSingleObject
ours: [0] import.call KeWaitForSingleObject
canary: [1] kernel.call KeWaitForSingleObject
ours: [1] kernel.call KeWaitForSingleObject
canary: [2] handle.create sid=c49d8f0ab90401ea
ours: [2] handle.create sid=6e3d96c5a52bf429
canary: [3] wait.begin {'handles_semantic_ids': ['c49d8f0ab90401ea'], 'timeout_ns': -30000000, 'alertable': False, 'wait_type': 'any'}
ours: [3] wait.begin {'handles_semantic_ids': ['6e3d96c5a52bf429'], 'timeout_ns': -30000000, 'alertable': False, 'wait_type': 'any'}
```
**Divergent event:**
```
canary: [4] kernel.return KeWaitForSingleObject
ours: [4] kernel.return KeWaitForSingleObject
```
**Next event after the divergence (if any):**
```
canary: [5] import.call RtlEnterCriticalSection
ours: <end of stream>
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1680989100, "kind": "kernel.return", "payload": {"name": "KeWaitForSingleObject", "return_value": 258, "side_effects": [], "status": "0x00000102"}, "schema_version": 1, "tid": 12, "tid_event_idx": 4}
{"deterministic": true, "engine": "ours", "guest_cycle": 30, "host_ns": 493470489, "kind": "kernel.return", "payload": {"name": "KeWaitForSingleObject", "return_value": 0, "side_effects": [], "status": "0x00000000"}, "schema_version": 1, "tid": 7, "tid_event_idx": 4}
```
## canary_tid=14 → ours_tid=9
First divergence at `tid_event_idx=41`: payload.ord: canary=503 ours=293
**Pre-context (last 5 matching events):**
```
canary: [36] kernel.call KeReleaseSpinLockFromRaisedIrql
ours: [36] kernel.call KeReleaseSpinLockFromRaisedIrql
canary: [37] kernel.return KeReleaseSpinLockFromRaisedIrql
ours: [37] kernel.return KeReleaseSpinLockFromRaisedIrql
canary: [38] import.call KfLowerIrql
ours: [38] import.call KfLowerIrql
canary: [39] kernel.call KfLowerIrql
ours: [39] kernel.call KfLowerIrql
canary: [40] kernel.return KfLowerIrql
ours: [40] kernel.return KfLowerIrql
```
**Divergent event:**
```
canary: [41] import.call XAudioGetVoiceCategoryVolumeChangeMask
ours: [41] import.call RtlEnterCriticalSection
```
**Next event after the divergence (if any):**
```
canary: [42] kernel.call XAudioGetVoiceCategoryVolumeChangeMask
ours: [42] kernel.call RtlEnterCriticalSection
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1855379000, "kind": "import.call", "payload": {"module": "xboxkrnl.exe", "name": "XAudioGetVoiceCategoryVolumeChangeMask", "ord": 503}, "schema_version": 1, "tid": 14, "tid_event_idx": 41}
{"deterministic": true, "engine": "ours", "guest_cycle": 417, "host_ns": 1675672891, "kind": "import.call", "payload": {"module": "xboxkrnl.exe", "name": "RtlEnterCriticalSection", "ord": 293}, "schema_version": 1, "tid": 9, "tid_event_idx": 41}
```
## canary_tid=15 → ours_tid=10
No divergence within the 16 compared events (canary has 20000, ours has 17).

@@ -0,0 +1,10 @@
{
"instructions": 50000007,
"imports": 40390,
"unimpl": 0,
"draws": 0,
"swaps": 1,
"unique_render_targets": 0,
"shader_blobs_live": 0,
"texture_cache_entries": 0
}

@@ -0,0 +1,10 @@
{
"instructions": 50000007,
"imports": 40390,
"unimpl": 0,
"draws": 0,
"swaps": 1,
"unique_render_targets": 0,
"shader_blobs_live": 0,
"texture_cache_entries": 0
}

@@ -0,0 +1,10 @@
{
"instructions": 50000007,
"imports": 40390,
"unimpl": 0,
"draws": 0,
"swaps": 1,
"unique_render_targets": 0,
"shader_blobs_live": 0,
"texture_cache_entries": 0
}

@@ -0,0 +1,110 @@
diff --git a/crates/xenia-kernel/src/xaudio.rs b/crates/xenia-kernel/src/xaudio.rs
index c20fe94..cb09261 100644
--- a/crates/xenia-kernel/src/xaudio.rs
+++ b/crates/xenia-kernel/src/xaudio.rs
@@ -58,6 +58,24 @@ pub const XAUDIO_PERIOD: Duration = Duration::from_nanos(5_333_333);
/// queueing unbounded callbacks while injection is starved.
pub const XAUDIO_QUEUE_CAP: usize = 16;
+/// Phase HostAudioEager (2026-05-19): initial seeded fire count at
+/// `XAudioRegisterRenderDriverClient` time. Mirrors xenia-canary
+/// [`audio_system.cc:210`](../../../../xenia-canary/src/xenia/apu/audio_system.cc#L210)
+/// `client_semaphore->Release(queued_frames_=8, nullptr)` — the moment
+/// canary's `RegisterClient` returns, its already-running host worker
+/// thread has 8 buffer-complete fires queued to drain.
+///
+/// In ours, the dedicated guest audio worker (spawned at the same
+/// register call) can't be HOST-threaded; instead we seed the pending
+/// FIFO so the round prologue's `try_inject_audio_callback` injects
+/// the first callback on the very next round — well before tid=1
+/// reaches `ExCreateThread` for the XAudio worker threads (tid=14/15
+/// in canary, tid=9/10 in ours). This fixes the ordering issue where
+/// the 48k-instruction ticker delay let tid=9/10 spawn and enter
+/// their spin loop on the uninitialized voice struct before the
+/// callback could modify it.
+pub const XAUDIO_REGISTER_SEED_FIRES: usize = 8;
+
#[derive(Debug, Clone, Copy)]
pub struct XAudioClient {
pub callback_pc: u32,
@@ -155,6 +173,28 @@ impl XAudioState {
         }
     }
 
+    /// Phase HostAudioEager: enqueue `n` buffer-complete fires for a
+    /// specific client slot. Used by `XAudioRegisterRenderDriverClient`
+    /// to mirror canary's `client_semaphore->Release(queued_frames_)`
+    /// at register time. Capped by [`XAUDIO_QUEUE_CAP`] to avoid
+    /// unbounded growth if the caller seeds aggressively. Returns the
+    /// actual number of fires enqueued.
+    pub fn seed_fires_for(&mut self, index: usize, n: usize) -> usize {
+        if index >= XAUDIO_MAX_CLIENTS || self.clients[index].is_none() {
+            return 0;
+        }
+        let mut queued = 0;
+        for _ in 0..n {
+            if self.pending.len() >= XAUDIO_QUEUE_CAP {
+                self.dropped += 1;
+                break;
+            }
+            self.pending.push_back(index);
+            queued += 1;
+        }
+        queued
+    }
+
     pub fn peek_next(&self) -> Option<usize> {
         self.pending.front().copied()
     }
@@ -320,6 +360,51 @@ mod tests {
         assert!(s.last_instant.is_some());
     }
 
+    #[test]
+    fn seed_fires_for_registered_slot_enqueues_n() {
+        let mut s = XAudioState::default();
+        let i = s.register(dummy_client(1)).unwrap();
+        let queued = s.seed_fires_for(i, XAUDIO_REGISTER_SEED_FIRES);
+        assert_eq!(queued, XAUDIO_REGISTER_SEED_FIRES);
+        assert_eq!(s.pending.len(), XAUDIO_REGISTER_SEED_FIRES);
+        // All enqueued fires reference our slot.
+        for _ in 0..XAUDIO_REGISTER_SEED_FIRES {
+            assert_eq!(s.take_next(), Some(i));
+        }
+        assert!(s.pending.is_empty());
+    }
+
+    #[test]
+    fn seed_fires_for_unregistered_slot_is_noop() {
+        let mut s = XAudioState::default();
+        // Slot 3 is empty.
+        let queued = s.seed_fires_for(3, 8);
+        assert_eq!(queued, 0);
+        assert!(s.pending.is_empty());
+        assert_eq!(s.dropped, 0);
+    }
+
+    #[test]
+    fn seed_fires_for_caps_at_queue_cap_and_counts_drops() {
+        let mut s = XAudioState::default();
+        let i = s.register(dummy_client(1)).unwrap();
+        let queued = s.seed_fires_for(i, XAUDIO_QUEUE_CAP * 4);
+        assert_eq!(queued, XAUDIO_QUEUE_CAP);
+        assert_eq!(s.pending.len(), XAUDIO_QUEUE_CAP);
+        // Excess fires are counted as dropped (per
+        // existing `enqueue_all_active` discipline).
+        assert!(s.dropped >= 1);
+    }
+
+    #[test]
+    fn seed_fires_for_out_of_range_index_is_noop() {
+        let mut s = XAudioState::default();
+        s.register(dummy_client(1)).unwrap();
+        let queued = s.seed_fires_for(XAUDIO_MAX_CLIENTS + 5, 4);
+        assert_eq!(queued, 0);
+        assert!(s.pending.is_empty());
+    }
+
     #[test]
     fn tick_wallclock_fires_after_period() {
         let mut s = XAudioState::default();

@@ -0,0 +1,118 @@
# Phase Host-Audio-Eager — Investigation (2026-05-19)
## Phase 0: Plan
### Canary's XHostThread setup (verified from source)
- `AudioSystem::AudioSystem` (`xenia-canary/src/xenia/apu/audio_system.cc:48-69`):
constructs 8 host semaphores (`client_semaphores_[i]`) at engine init time.
Each is `Semaphore::Create(initial=0, max=queued_frames_=8)`.
- `AudioSystem::Setup` (line 77-98): spawns `XHostThread "Audio Worker"`
running `WorkerThreadMain` IMMEDIATELY. This is a HOST OS thread, not a
guest thread. Runs continuously throughout engine lifetime.
- `WorkerThreadMain` (line 100-159): loops on `WaitAny(client_semaphores_, ...)`
→ on wake, calls `processor_->Execute(thread_state, client_callback, args)`
which runs the guest callback IN-LINE on the host worker thread.
- `RegisterClient` (line 202-237): the moment a client registers, it
`client_semaphore->Release(queued_frames_=8)` (line 210), seeding 8
semaphore permits. The already-running worker thread then drains these
in a tight loop: callback runs, returns, semaphore decremented, repeat.
8 callback invocations happen BEFORE `RegisterClient` even returns (or
shortly after).
- After SDL plays a frame, `sdl_audio_driver.cc:199` releases ONE permit,
re-arming the loop. Under `--mute=true`, SDL still drains and releases.
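The host-threaded model described in this list can be illustrated with a small counting semaphore: the worker is already running and blocked when registration releases 8 permits, and it then drains them one callback at a time. This is a sketch under stated assumptions, not canary's code: `Semaphore` and `run_seeded_worker` are hypothetical names, the semaphore has no `max` cap, and the guest callback is reduced to a counter increment.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;

/// Minimal counting semaphore (hypothetical stand-in for the host semaphore).
pub struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl Semaphore {
    pub fn new(initial: usize) -> Self {
        Self { permits: Mutex::new(initial), cv: Condvar::new() }
    }

    /// Add `n` permits and wake any waiter.
    pub fn release(&self, n: usize) {
        let mut permits = self.permits.lock().unwrap();
        *permits += n;
        self.cv.notify_all();
    }

    /// Block until a permit is available, then consume it.
    pub fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.cv.wait(permits).unwrap();
        }
        *permits -= 1;
    }
}

/// Start the worker first, then "register" by seeding `seed` permits.
/// Returns how many callbacks the worker ran.
pub fn run_seeded_worker(seed: usize) -> usize {
    let sem = Arc::new(Semaphore::new(0));
    let worker_sem = Arc::clone(&sem);
    let worker = thread::spawn(move || {
        let mut callbacks = 0;
        for _ in 0..seed {
            worker_sem.acquire();
            callbacks += 1; // stand-in for running the guest callback
        }
        callbacks
    });
    sem.release(seed); // registration seeds the permits
    worker.join().unwrap()
}
```

Calling `run_seeded_worker(8)` models the register-time seed of `queued_frames_=8`: the worker runs all 8 callbacks without waiting for any further trigger.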
### Ours' current ticker model (verified from source)
- `main.rs:2125-2131` (round prologue): each round, if any client is
registered, `xaudio.tick_instr(stats.instruction_count)` adds the delta
of executed instructions to an accumulator; when accumulator crosses
`XAUDIO_INSTR_PERIOD=48_000`, it enqueues one fire per registered
client and decrements the accumulator.
- `main.rs:2135-2137`: `try_inject_audio_callback` then pulls one fire
off the queue and injects it into the dedicated audio worker thread
(parked on a synthetic handle), but only if `is_in_callback()` is false
(audio injection is mutually exclusive with graphics interrupts).
- Worker thread spawned at register time (`exports.rs:4084-4160`) with
PC=callback_pc, parked Blocked(WaitAny[SYNTHETIC]). Injection flips
state to ServicingIrq with `pc=callback_pc`, runs callback, returns
to LR_HALT, restore path re-blocks worker on synthetic.
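The instruction-count ticker in the first bullet can be sketched as an accumulator that converts executed-instruction deltas into queued fires. This is an illustration only: `TickerSketch` and its fields are hypothetical, the `XAUDIO_INSTR_PERIOD` and `XAUDIO_QUEUE_CAP` values are taken from this note and the diff, and details such as how the real `tick_instr` tracks its previous count are assumptions.

```rust
use std::collections::VecDeque;

pub const XAUDIO_INSTR_PERIOD: u64 = 48_000; // value quoted in this note
pub const XAUDIO_QUEUE_CAP: usize = 16; // value from xaudio.rs in the diff

/// Hypothetical, simplified model of the ticker state.
#[derive(Default)]
pub struct TickerSketch {
    last_count: u64,
    accumulator: u64,
    /// Slots of registered clients.
    pub registered: Vec<usize>,
    /// Pending buffer-complete fires, one client slot per entry.
    pub pending: VecDeque<usize>,
}

impl TickerSketch {
    /// Feed the running instruction count; returns how many fires were queued.
    pub fn tick_instr(&mut self, instruction_count: u64) -> usize {
        let delta = instruction_count.saturating_sub(self.last_count);
        self.last_count = instruction_count;
        if self.registered.is_empty() {
            return 0; // nothing to fire for, so nothing accumulates
        }
        self.accumulator += delta;
        let mut enqueued = 0;
        while self.accumulator >= XAUDIO_INSTR_PERIOD {
            self.accumulator -= XAUDIO_INSTR_PERIOD;
            for &slot in &self.registered {
                if self.pending.len() < XAUDIO_QUEUE_CAP {
                    self.pending.push_back(slot);
                    enqueued += 1;
                }
            }
        }
        enqueued
    }
}
```

With one registered client, nothing is queued until the count has advanced by a full period, which is the delay the ordering problem below depends on.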
### The ordering problem
Sylpheed boot sequence (verified per prior agent's traces):
1. tid=1 main calls `XAudioRegisterRenderDriverClient` → ours registers
client at slot 0, spawns worker (tid=11), enqueues NOTHING.
2. tid=1 main continues executing thousands of instructions.
3. tid=1 main calls `ExCreateThread` for XAudio worker threads → tid=9
and tid=10 spawn. They start spinning on the uninitialized voice
struct at `[r31+356]`.
4. **48,000 instructions after register**, the ticker finally fires,
enqueueing one buffer-complete callback.
5. Audio worker tid=11 wakes, runs callback at 0x824D6640. The callback
calls `KeWaitForMultipleObjects([0x82928B04, 0x82928AE0])` and
blocks. These dispatchers can only be signaled by tid=9/10.
6. tid=9/10 are stuck spinning → tid=11 stuck waiting → **circular
deadlock**.
In canary: the worker is HOST-threaded and starts running BEFORE
tid=1 even reaches the register call. Register seeds 8 permits → worker
drains 8 callback invocations. By the time tid=14/15 spawn, the voice
struct's `[r31+356]` field has been modified by 8 callback runs and is
in a state where tid=14/15 take a different (non-spinning) control-flow
path. Critically, IN CANARY THE CALLBACK DOES NOT BLOCK on those
dispatchers — because the voice state is different.
### Implementation steps
1. At `XAudioRegisterRenderDriverClient` after worker spawn succeeds,
eagerly enqueue 8 fires (matching canary `queued_frames_=8`) into
`state.xaudio.pending`. The ticker's existing per-round drain plus
the existing `try_inject_audio_callback` will then deliver these
8 callbacks across subsequent rounds — but they will fire WITHIN
the first few thousand instructions of register-return, well
before tid=9/10 spawn.
2. Eagerly fire the audio injector once at the END of the register
handler. The round prologue normally calls
`try_inject_audio_callback` once per round; this gives us +1
immediate fire to maximize the chance of callback completion
before tid=1 continues to spawn tid=9/10.
3. Leave `enqueue_all_active` as is: it already refuses to enqueue
when the queue is at cap, and the eager seed relies on that behavior.
4. Add 2-3 unit tests covering the eager-seed behavior in
`XAudioState`.
5. Document the change in the existing register-handler block comment.
### Risks
- **Determinism shift**: cold digest WILL change (8 extra fires
re-order the round prologue's audio injection). Capture new
digest, validate 3× reproducibility.
- **Worker blocks on first callback** (per prior agent's
diagnosis): if tid=11's first callback blocks immediately on
`KeWaitForMultipleObjects`, then queue depth 8 doesn't matter —
fires 2-8 sit unused because `is_in_callback()` stays true. In
that case progression metric won't move. This is an empirical
question, not predictable from static analysis. The brief
explicitly says "if the fix lands cleanly but progression
doesn't move, that's the answer."
- **Phase B image_canonical_sha256**: unchanged (no changes to
image-load path).
- **Sister chains**: tid=14→9 / tid=15→10 are the targets. Other
chains (tid=11/16/4) may shift due to scheduling re-ordering.
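The "worker blocks on first callback" risk can be made concrete with a toy model of the injection gate. It assumes, as described above, that at most one fire is delivered per round and only while no callback is in flight; `InjectorSketch` and its methods are hypothetical names and do not reflect the real `try_inject_audio_callback` signature.

```rust
use std::collections::VecDeque;

/// Hypothetical model of the per-round audio injection gate.
pub struct InjectorSketch {
    pub pending: VecDeque<usize>,
    pub in_callback: bool,
    pub delivered: usize,
}

impl InjectorSketch {
    /// Start with `n` seeded fires for client slot 0.
    pub fn seeded(n: usize) -> Self {
        Self {
            pending: std::iter::repeat(0usize).take(n).collect(),
            in_callback: false,
            delivered: 0,
        }
    }

    /// One round prologue: deliver at most one fire, and only if no
    /// callback is currently in flight.
    pub fn try_inject(&mut self) -> bool {
        if self.in_callback {
            return false;
        }
        match self.pending.pop_front() {
            Some(_slot) => {
                self.in_callback = true;
                self.delivered += 1;
                true
            }
            None => false,
        }
    }

    /// Called when the guest callback returns to the halt address.
    pub fn callback_returned(&mut self) {
        self.in_callback = false;
    }
}
```

If the first callback never returns, `callback_returned` is never called and the remaining seven fires stay queued no matter how many rounds run, so the seed depth has no effect on progression in that case.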
## Phase 1: Execution log (filled during implementation)
[See fix.diff for the actual code changes]
## Phase 2: Validation (filled after cold runs)
[See re-validation.md and digests/]
## Phase 3: Outcome (filled after measurement)
[See summary.md]

@@ -0,0 +1,127 @@
# Phase Host-Audio-Eager — Re-validation (2026-05-19)
## Progression metric (primary gate)
| metric | pre-fix baseline | post-fix | delta |
|-------:|:----------------:|:--------:|:-----:|
| **swaps** | 1 | **1** | **0** |
| **draws** | 0 | **0** | **0** |
**The progression metric did NOT move.** Despite landing the eager-seed
implementation cleanly with 3× reproducibility, neither swaps nor draws
advanced. This matches the prior agent's diagnosis: the audio worker
ordering issue is real, but the deeper root cause is voice-struct state
divergence — the audio callback at `0x824D6640` in ours blocks on
`KeWaitForMultipleObjects([0x82928B04, 0x82928AE0])` immediately
because the voice struct at `[r31+356]` reads `0x01` (ours) vs `0x00`
(canary). Pre-seeding 8 fires lets `try_inject_audio_callback` deliver
the first callback earlier, but the callback still blocks on the same
guest dispatchers — fires 2-8 sit in the queue because
`interrupts.is_in_callback()` stays true.
## 3× determinism
```
73e99d60029128b4d5c3dd98e540457d82a52b8a962e7495132be2be31411aca /tmp/digest_eager_1.json
73e99d60029128b4d5c3dd98e540457d82a52b8a962e7495132be2be31411aca /tmp/digest_eager_2.json
73e99d60029128b4d5c3dd98e540457d82a52b8a962e7495132be2be31411aca /tmp/digest_eager_3.json
```
All three cold runs produce byte-identical digest JSON. The
seed-at-register implementation is fully deterministic in lockstep
mode: the pending FIFO is pre-populated synchronously inside the
register handler, so no host-thread non-determinism is introduced.
## Digest JSON
```json
{
"instructions": 50000007,
"imports": 40390,
"unimpl": 0,
"draws": 0,
"swaps": 1,
"unique_render_targets": 0,
"shader_blobs_live": 0,
"texture_cache_entries": 0
}
```
`imports`/`unimpl` unchanged from the C+22 baseline (40390/0).
## Phase B invariant
```
image_loaded_sha256 = ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18
```
UNCHANGED — Phase B is not affected by audio-runtime changes.
## Per-chain matched-prefix
100M-instruction cold trace vs canary baseline
(`xenia-rs/audit-runs/phase-d-stage1/canary-cvaroff-trunc.jsonl`,
which predates the Phase D D-extension absorber, so the main chain
reads at the C+18 value of 102,424, NOT the post-D-extension 105,046).
| chain | pre-fix | post-fix | delta | first divergence |
|------:|--------:|---------:|------:|:-----------------|
| canary tid=4 → ours tid=11 | 11 | **11** | 0 | (preserved) |
| canary tid=6 → ours tid=1 | 102,424 | **102,424** | **0** | `NtQueryFullAttributesFile` (C+18-era) |
| canary tid=7 → ours tid=2 | 32 | **32** | 0 | (preserved) |
| canary tid=12 → ours tid=7 | 4 | **4** | 0 | C+23 idx=4 |
| canary tid=14 → ours tid=9 | 41 | **41** | 0 | (no advance — primary target) |
| canary tid=15 → ours tid=10 | 16 | **16** | 0 | (no advance — primary target) |
The two primary targets (tid=14→9 and tid=15→10) were the audio worker
guest threads spinning on the uninitialized voice struct. Their
matched-prefix did NOT advance.
## Kernel tests
- Pre: 217 passed
- Post: **221 passed** (+4 new `seed_fires_for_*` tests)
- Failures: 0
- All existing tests pass
## Build
`cargo build --release` clean. One pre-existing dead-code warning
unrelated to this fix.
## Total LOC
| file | added | removed |
|------|------:|--------:|
| `crates/xenia-kernel/src/xaudio.rs` | 86 | 0 |
| `crates/xenia-kernel/src/exports.rs` | 18 | 5 |
| **total** | **104** | **5** |
Net ~100 LOC, of which ~60 LOC are tests + doc comments. Engine logic
delta is ~25 LOC.
## Conclusion
The implementation lands cleanly:
- 3× cold-deterministic
- Phase B unchanged
- All tests pass
- Sister chains preserved
But the progression metric (swaps/draws) did NOT move. This is an
HONEST NEGATIVE RESULT: the eager-seed approach addresses the symptom
(the ticker delays the first callback) but not the root cause. In ours,
the callback at 0x824D6640 still blocks on guest dispatchers that only
tid=9/10 can signal, and tid=9/10 are stuck on a voice-struct field
that the callback would need to clear but doesn't. Canary avoids this
cycle because its callback takes a DIFFERENT control-flow path that
doesn't reach `KeWaitForMultipleObjects` in the first place.
The deeper fix requires either:
- Identifying the guest write that initializes `[r31+356]` to 0 in
canary's boot path and ensuring ours produces the same write.
- A true host-side audio worker thread that can run the callback
in a host context (substantial threading-model rework).
Both are out of scope for this session per the brief's "Don't widen
scope" tripstone.