
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-05 07:19:08 +02:00

9.3 KiB


Phase Host-Audio-Bridge — Investigation (2026-05-19)

Outcome: AUDIT-048 cascade C NOT landed this session. Root cause is upstream state divergence in XAudio voice struct initialization, NOT a missing host-side write. Progression metric (swaps=1, draws=0) unchanged. Per-chain matched-prefix unchanged.

Diagnosis

Canary semantics (verified)

xenia-canary/src/xenia/apu/audio_system.cc:84-159 — AudioSystem::WorkerThreadMain is a HOST thread that loops: WaitAny(client_semaphores_) → processor_->Execute(callback). Semaphores are seeded by RegisterClient line 210: client_semaphore->Release(queued_frames_=8, nullptr).
After SDL plays a frame, sdl_audio_driver.cc:199 calls semaphore_->Release(1) — re-arming the loop. With --mute=true, SDL still consumes frames via SDLCallback and still releases.
There is NO host-side write to a guest field at offset +356. The XAudio voice struct at r31 is GUEST-allocated and managed entirely by the GUEST callback code at 0x824D6640 and the XAudio worker thread bodies at 0x824D2878 / 0x824D2940.
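The semaphore accounting described above can be pictured with a small single-threaded model. This is only a sketch under stated assumptions: `FrameSemaphore` and `drain_callbacks` are hypothetical names, not canary or ours APIs, and real host threading is deliberately left out. It shows only that a seed of 8 permits allows 8 callback runs, and that each later release re-arms exactly one more.

```rust
/// Toy counting semaphore standing in for the host client semaphore.
/// Hypothetical type: it only models the permit count, not real blocking.
struct FrameSemaphore {
    permits: u32,
}

impl FrameSemaphore {
    /// Seed the semaphore the way the note describes RegisterClient doing
    /// (queued_frames_ = 8 in the note).
    fn seeded(queued_frames: u32) -> Self {
        Self { permits: queued_frames }
    }

    /// Non-blocking stand-in for the worker's wait: takes one permit if any.
    fn try_acquire(&mut self) -> bool {
        if self.permits == 0 {
            return false;
        }
        self.permits -= 1;
        true
    }

    /// Stand-in for the driver releasing permits after it plays a frame.
    fn release(&mut self, count: u32) {
        self.permits += count;
    }
}

/// Runs worker passes until no permit is left and returns how many
/// callback executions that allowed.
fn drain_callbacks(sem: &mut FrameSemaphore) -> u32 {
    let mut calls = 0;
    while sem.try_acquire() {
        calls += 1; // one processor_->Execute(callback) equivalent per permit
    }
    calls
}
```

In this model the worker goes idle after the seeded permits are spent and only resumes when the driver side releases again, which is the re-arming behaviour attributed to sdl_audio_driver.cc above.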

Ours's current state

AUDIT-048 Plan B (dedicated guest worker thread, parked on synthetic handle, injected by ticker) is wired in xenia-kernel/src/exports.rs:4048-4168 + xenia-kernel/src/xaudio.rs + xenia-app/src/main.rs:3461-3536.
Ours's tid=11 (entry=0x824D6640 = the registered callback) DOES execute. Per deadlock dump: pc=0x824d2a94 lr=0x824d2a94 state=Blocked(WaitAny { handles: [0x82928B04, 0x82928AE0] }) — the callback called KeWaitForMultipleObjects on two guest dispatchers and is now waiting.
xaudio.callback.delivered=1 — only one injection, because is_in_callback stays true while tid=11 is blocked on the real handles (the saved context is only cleared on LR_HALT_SENTINEL return, which tid=11 never reaches).
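Why the delivery counter stays at 1 can be illustrated with a minimal gating model. This is a sketch, not the code in xenia-kernel: `CallbackInjector`, `on_tick`, and `on_sentinel_return` are hypothetical names, and the only behaviour assumed is the one stated above (injection is skipped while a callback is in flight, and the flag is cleared only on a return to the halt sentinel).

```rust
/// Hypothetical model of the ticker-driven callback injection gate.
#[derive(Default)]
struct CallbackInjector {
    /// Mirrors the is_in_callback flag described in the note.
    in_callback: bool,
    /// Mirrors the xaudio.callback.delivered counter.
    delivered: u32,
}

impl CallbackInjector {
    /// Called on every audio tick. Injects only when no callback is in flight.
    /// Returns true if this tick actually delivered a callback.
    fn on_tick(&mut self) -> bool {
        if self.in_callback {
            return false; // previous callback has not returned yet
        }
        self.in_callback = true;
        self.delivered += 1;
        true
    }

    /// Called when the guest thread returns to the halt sentinel,
    /// which is the only event that clears the in-flight flag here.
    fn on_sentinel_return(&mut self) {
        self.in_callback = false;
    }
}
```

If the callback thread blocks forever before reaching the sentinel, `on_sentinel_return` is never called and every later tick is a no-op, so `delivered` stays at 1 no matter how long the ticker runs.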

The spin loop in tid=9/10 (the XAudio worker guest threads)

PCs 0x824D1400-0x824D141C (canary tid=14 / ours tid=9):

0x824d1400: beqlr cr6                ; return if cr6.eq (success)
0x824d1404: cmpd  cr6, r10, r11      ; compare r10 with r11
0x824d1408: beq   cr6, 0x824D1420    ; ok-branch on match
0x824d140c: mr    r31, r31            ; yield nop
0x824d1410: ld    r11, 0(r4)          ; reload [r4]
0x824d1414: cmpdi cr6, r11, 0        ; check r11 == 0
0x824d1418: bne   cr6, 0x824D1404    ; loop if nonzero
0x824d141c: blr                       ; return on r11 == 0

LR=0x824D22B4 (caller does addi r4, r31, 356; bl ...; <0x824D22B4 is next>).
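The control flow of the listing from 0x824d1404 onward can be restated as a small model, which makes the three possible outcomes explicit. This is a sketch under assumptions: `run_spin` and `SpinExit` are hypothetical names, the `load` closure stands in for `ld r11, 0(r4)`, the iteration cap exists only so the model terminates, and the initial `beqlr` at 0x824d1400 is not modelled.

```rust
/// How the modelled loop ended.
#[derive(Debug, PartialEq)]
enum SpinExit {
    /// cmpd r10, r11 matched: branch to 0x824D1420.
    Matched,
    /// The reloaded doubleword was zero: fall through to blr.
    SawZero,
    /// Neither exit condition was met within the iteration cap.
    StillSpinning,
}

/// Models the loop body starting at the cmpd at 0x824d1404.
/// `r11` is the value already in the register on entry.
fn run_spin(r10: u64, mut r11: u64, mut load: impl FnMut() -> u64, max_iters: usize) -> SpinExit {
    for _ in 0..max_iters {
        if r10 == r11 {
            return SpinExit::Matched; // cmpd cr6, r10, r11 ; beq
        }
        r11 = load(); // ld r11, 0(r4)
        if r11 == 0 {
            return SpinExit::SawZero; // cmpdi cr6, r11, 0 ; bne not taken ; blr
        }
    }
    SpinExit::StillSpinning
}
```

With a load that always yields a nonzero value different from r10, the model never leaves the loop, which is the situation the probe below reports for tid=9.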

Live runtime probe (ours, 100M instr, --dump-addr 0x42511040 and 0x42510edc)

At halt (tid=9 still spinning at pc=0x824d140c):

r3 = r31 = 0x42510edc (an XAudio voice/driver struct in heap-mapped guest mem).
r4 = r31 + 0x164 (=356) = 0x42511040.
r10 = 0x01010000 (expected success value).
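Last-known r11 = 0x00000000 (from the load). A zero doubleword would leave the loop via blr, so this snapshot cannot be the full steady-state value. It is consistent with a register print truncated to the low 32 bits of 0x0100000000000000, though that has not been verified.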

Memory dump at 0x42511040:

+0x00: 01 00 00 00 00 00 00 00   →  ld interpretation = 0x0100000000000000
+0x10: 00 00 00 03 42 51 10 54   →  linked-list head with 3 entries
+0x20...: list nodes with prev/next pointers in 0x4251xxxx range

This is a GUEST-OWNED linked-list / voice-state struct. Byte 0 = 0x01 is clearly a "state flag" that distinguishes the poll target. ld reads 8 bytes BE → 0x0100000000000000. r10 expected is 0x01010000 (zero-extended). r11 read is 0x0100000000000000. Not equal, not zero → spin.
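The address arithmetic and the big-endian doubleword interpretation used above can be checked mechanically. A minimal sketch, assuming only the values quoted in this section; `field_addr` and `guest_ld_be` are hypothetical helper names, not functions from either engine.

```rust
/// Guest address of a field at `offset` from a 32-bit guest base pointer.
fn field_addr(base: u32, offset: u32) -> u32 {
    base.wrapping_add(offset)
}

/// Interprets 8 guest bytes the way a PowerPC `ld` does (big-endian).
fn guest_ld_be(bytes: [u8; 8]) -> u64 {
    u64::from_be_bytes(bytes)
}

/// True when the loaded doubleword satisfies neither exit condition of
/// the poll loop (equal to the expected value, or zero).
fn keeps_spinning(loaded: u64, expected: u64) -> bool {
    loaded != expected && loaded != 0
}
```

For the dump above, `field_addr(0x42510edc, 0x164)` gives 0x42511040, and the eight bytes `01 00 00 00 00 00 00 00` decode to 0x0100000000000000, whose low 32 bits are all zero. Against r10 = 0x01010000 that value makes `keeps_spinning` return true.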

Memory dump at r31=0x42510edc:

+0x00: 82 00 6c f4                →  VTABLE POINTER (code at .rdata 0x82006cf4)
+0x04: 00 00 00 02                →  refcount or count
+0x08: 42 51 0e c0                →  back-pointer
+0x40: 41 ea 0d 5c                →  matches XAudio register callback_arg

So r31 is the XAudio voice object Sylpheed allocates and passes as the callback argument. The vtable at 0x82006cf4 is ANON_Class_* per sylpheed.db. The voice owns a linked list (head at +0x164) that tracks audio buffers / voice state.

Why ours diverges from canary

Ours's tid=9 sees [r31+356] as 0x0100000000000000; canary's tid=14 sees it as 0x0000000000000000 (or 0x0000000001010000 matching r10). Both engines run identical guest code starting from the same .data values. So the divergence must be a kernel-call return value OR a memory write that happens between thread spawn and the spin loop.

Per cross-trace of tid=9 events idx 0..76 vs canary tid=14 events 0..~80, the kernel return values match (KeWaitForSingleObject→0, KeRaiseIrqlToDpcLevel returns the same sequence 0,2,2,2 etc., KfLowerIrql→0). The setup chain emits the same events. But host_ns wall-clock diverges: canary's KeWaitForSingleObject blocks for ~86 ms (1727→1813 ms); ours's wait returns almost immediately (1603890→1603904 ns, i.e. 14 ns by these timestamps, or microseconds at most if the unit is mislabeled).
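The kind of comparison described here (same kernel events and return values, different wait durations) can be sketched as two small helpers. The `KernelEvent` shape below is an assumption for illustration and is not the schema of the .jsonl traces; `matched_prefix_len` and `wait_duration` are hypothetical names.

```rust
/// Hypothetical, simplified trace event: export name plus return value.
#[derive(Debug, Clone, PartialEq)]
struct KernelEvent {
    name: &'static str,
    ret: u32,
}

/// Number of leading events that agree between two per-thread traces.
fn matched_prefix_len(ours: &[KernelEvent], canary: &[KernelEvent]) -> usize {
    ours.iter()
        .zip(canary)
        .take_while(|(a, b)| a == b)
        .count()
}

/// Duration between two timestamps in the same unit; clamps to zero if
/// the timestamps are out of order instead of underflowing.
fn wait_duration(enter: u64, exit: u64) -> u64 {
    exit.saturating_sub(enter)
}
```

Applied to the timestamp pairs quoted above, `wait_duration(1727, 1813)` is 86 and `wait_duration(1603890, 1603904)` is 14, each in the unit of its own trace. The prefix helper only compares names and return values, so it would report a full match in exactly the situation described here, where the divergence is in timing rather than in results.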

The root cause class

This is upstream scheduling divergence, not host-bridge missing:

In canary: tid=11 (host AudioSystem WorkerThreadMain) starts FIRST and runs the callback at 0x824D6640. The callback modifies the XAudio voice struct (clearing byte at +0x164). Then tid=14 spawns, hits the spin, sees zero, proceeds.
In ours: tid=9/10 are spawned by main and resumed via KeResumeThread. They start running BEFORE the audio ticker (period 48,000 instructions) ever fires. tid=9 hits the spin loop with the struct in its uninitialized state (byte +0x164 = 0x01). Stuck forever.

The audio worker (tid=11) DOES eventually get injected and runs, but by then tid=9/10 are stuck in the spin loop and the callback blocks on guest dispatchers that only tid=9/10 can signal — circular deadlock.
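The ordering dependence can be reduced to a toy model with one flag and two possible orderings. This is a deliberately simplified sketch that encodes this note's account, including the still-unconfirmed assumption that the callback is what clears the field; `FirstRunner`, `WorkerFate`, and `worker_fate` are hypothetical names, and the model does not represent the dispatcher waits that complete the deadlock.

```rust
/// Which guest activity reaches the voice struct first.
#[derive(Clone, Copy)]
enum FirstRunner {
    /// Canary-like ordering: the callback runs before the worker polls.
    Callback,
    /// Ours-like ordering: the worker polls before any callback has run.
    Worker,
}

/// What the worker observes on reaching the poll loop.
#[derive(Debug, PartialEq)]
enum WorkerFate {
    Proceeds,
    Spins,
}

/// Evaluates the poll condition for the given ordering.
/// ASSUMPTION: a completed callback leaves the field at zero.
fn worker_fate(first: FirstRunner, initial_flag: u64, expected: u64) -> WorkerFate {
    let mut flag = initial_flag;
    if matches!(first, FirstRunner::Callback) {
        flag = 0; // assumed effect of the callback on the +0x164 field
    }
    if flag == 0 || flag == expected {
        WorkerFate::Proceeds
    } else {
        WorkerFate::Spins
    }
}
```

With the observed initial value 0x0100000000000000 and expected value 0x01010000, the callback-first ordering proceeds and the worker-first ordering spins, even though both orderings execute the same guest code.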

Why a host-side write is the wrong fix

The session brief hypothesizes a missing host-side write to clear [r31+356]. This is not correct:

Canary's host audio worker does NOT write to any guest VA in the +356 range. It only calls processor_->Execute(callback) and waits on its host semaphore.
The byte at offset 0x164 of the voice struct is touched only by GUEST code (the callback or the worker functions). No host code in either engine reaches into that field.
The "missing write" framing came from assuming the host audio worker does something analogous to SubmitFrame's buffer-complete bookkeeping. SubmitFrame only acks the host SDL driver semaphore (line 199); it does not modify the voice struct at +356.

Writing 0 to [r31+356] from host code would be a band-aid that crosses reading-error #23 (matching divergent guest behavior) and risks corrupting the voice struct's invariants.

What the correct fix shape would be

To make ours converge to canary's behavior, the audio worker callback at 0x824D6640 needs to RUN AND COMPLETE before tid=9/10 reach the spin loop.

Option A — Force-fire callback at register time: Inside xaudio_register_render_driver, after spawning tid=11, synchronously execute the callback to completion (treat as a synchronous shim). Tricky because the callback calls KeWaitForMultipleObjects which would block.

Option B — Defer spawn of tid=9/10 in guest: Not feasible — guest controls spawn timing.

Option C — Inject the callback eagerly + spin tid=11 forward: Tick the audio loop hundreds of times immediately at register. But tid=11 blocks on guest objects that need tid=9/10 to be running already.

Option D — Match canary's actual concurrency model: Spawn tid=11 as a native HOST thread that runs processor_->Execute(callback)-equivalent. This is a significant rework of the threading model.

Option E — Identify the specific guest write that clears +0x164 in canary: Disassemble sub_824D6640 (the callback) and find the store. Then ensure ours's execution of the callback reaches that store before tid=9/10 spin. Requires fixing the deeper scheduling-ordering issue.

None of these is a 30-150 LOC fix. Each requires at least one of:

Architectural threading-model changes
Sub-cycle ordering control between guest threads
Deep guest-code disassembly + emulation of the byte-clear path

Recommendation

This session declines to land a fix. The session brief's hypothesis (missing host-side write to clear [r31+356]) is empirically wrong: the byte is owned by guest code in both engines. The deferred AUDIT-048 cascade C is correctly deferred — the necessary work is scheduling-ordering matching, not host-bridge wiring.

Next-session recommendation: probe-instrument the byte at r31+0x164 of the XAudio voice (around 0x42511040 in ours, a similar address in canary) and trap the FIRST guest write to it. Identify in the canary trace which PC writes the byte and on which tid. That is the actual fix target.
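A first-write probe of this kind could look roughly like the following. It is a sketch only: `FirstWriteWatch`, `WriteHit`, and `on_store` are hypothetical names, and the hook that would call `on_store` from the interpreter's store path is assumed rather than taken from either engine.

```rust
/// Record of the first store that touched the watched range.
#[derive(Debug, Clone, Copy, PartialEq)]
struct WriteHit {
    pc: u32,
    tid: u32,
    addr: u32,
    len: u32,
}

/// Watches a guest address range and latches the first overlapping store.
struct FirstWriteWatch {
    start: u32,
    len: u32,
    hit: Option<WriteHit>,
}

impl FirstWriteWatch {
    fn new(start: u32, len: u32) -> Self {
        Self { start, len, hit: None }
    }

    /// To be called for every guest store (hypothetical hook point).
    /// Only the first overlapping store is kept; later ones are ignored.
    fn on_store(&mut self, pc: u32, tid: u32, addr: u32, len: u32) {
        if self.hit.is_some() {
            return;
        }
        // Half-open interval overlap test, widened to u64 to avoid wraparound.
        let (s0, s1) = (addr as u64, addr as u64 + len as u64);
        let (w0, w1) = (self.start as u64, self.start as u64 + self.len as u64);
        if s0 < w1 && w0 < s1 {
            self.hit = Some(WriteHit { pc, tid, addr, len });
        }
    }
}
```

Watching 8 bytes at 0x42511040 would catch byte, word, or doubleword stores to the polled field; the latched `pc` and `tid` are then the values to look up in the canary trace.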

Per-chain delta (no change)

chain	pre	post
main tid=6→1	105,046	105,046
sister tid=14→9	41	41
sister tid=15→10	16	16
sister tid=4→4	preserved	preserved
sister tid=11→11	preserved	preserved
sister tid=16→16	preserved	preserved
swaps	1	1
draws	0	0

Artifacts

investigation.md (this file)
Cold trace: /tmp/ours-cold.jsonl (121k events, halt-on-deadlock after 100M instr cap)
Memory dumps captured via --dump-addr 0x42511040 and --dump-addr 0x42510edc
Phase B image_canonical_sha256 = ea8d160e… UNCHANGED (no engine modification)
