Ours' `vd_swap` wrote its 64-dword XE_SWAP block at the guest's reserved
`buffer_ptr` slot AND then bumped the primary ring `CP_RB_WPTR` out-of-band
via `state.gpu.extend_write_ptr_by(64)`. That bump was a bug: `buffer_ptr`
(~0x4add6efc) is NOT inside the primary ring (base ~0x4adcd000, 8192 dwords)
— it lives ~10k dwords past it, in the renderer indirect-buffer region. The
bogus WPTR bump pushed the GPU read-pointer PAST the guest's real
write-pointer; the drain treated the overshoot as a circular wrap and
re-executed the splash's draw indirect-buffers ~2×, inflating draws to 78
(the real splash geometry is ~28 draws; 12 INDIRECT_BUFFERs vs the real 6).
Canary's `VdSwap_entry` (xenia-canary xboxkrnl_video.cc:518-548) writes the
fetch-constant patch + PM4_XE_SWAP + NOP pad into the reserved slot and
returns — it NEVER touches CP_RB_WPTR. The guest advances the primary ring
write-pointer itself via its own doorbell once it has populated the slot;
swap-complete CP interrupts come only from the game's in-stream PM4_INTERRUPT
packets, never from VdSwap.
This fix removes only the out-of-band `extend_write_ptr_by(64)` call, keeping
the buffer_ptr block write intact and byte-faithful to canary. Effect at
`--gpu-inline -n 50M`: draws 78→28, INDIRECT_BUFFER 12→6 (re-execution
artifact gone), swaps 4→2. The run now halts at ~19.27M instructions (worker
threads exit) instead of spinning to 50M, because removing the corruption
unmasks the real per-present-interrupt deadlock — the title loop needs a
per-present PM4_INTERRUPT that the stalled game never submits. That deadlock
is a SEPARATE, known gate tracked/addressed elsewhere; it is intentionally
NOT papered over here.
Re-baselined golden crates/xenia-app/tests/golden/sylpheed_n50m.json to the
new honest values (regenerated twice, byte-identical). sylpheed_n2m.json is
unaffected (draws=0 at 2M). cargo test --workspace: 675 passed.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sylpheed's title loop re-runs its per-frame manager update sub_821741C8
only when "clock B" ([controller+88], the swap count) changes. Clock B's
sole source is the CP swap-complete callback sub_824CE2B8, which bumps
[gfx+15160] via the TWO-LEVEL deref [[VdGlobalDevice]+0]+15160, where
VdGlobalDevice is the kernel variable export 0x01BE at guest .data
0x82000750.
Ours patched that import slot with literal 0 (the old "passed through to
Vd* shims, write 0" behaviour). Consequences, both confirmed at runtime:
* the guest's graphics init stores its D3D device object via
`stw r31, 0([0x82000750])` (sub_824C6DC0 @0x824C6F18) — with the slot
0, that store lands at address 0;
* the swap callback reads [[0x82000750]] = [0] = 0 and increments
[0+15160] (the null page) instead of the real device's swap counter.
So [gfx+15160] never moved, clock B stayed frozen at 0, sub_821741C8
fired exactly once, and the game submitted one render batch (the 78-draw
splash) then stalled.
Fix mirrors xenia-canary RegisterVideoExports (xboxkrnl_video.cc:557-564)
exactly: allocate a 4-byte cell, point the import slot at it, zero the
cell. The guest then stores its device into the cell, and the callback's
two-level deref resolves correctly. Verified: [0x82000750] now holds a
real cell whose [+0] is the device (gfx state), the swap callback bumps
[gfx+15160] 0->1, clock B advances, and the per-frame chain steps forward
(sub_821741C8 fires 1->2x, GamePart update sub_821C7CB8 0->1x).
Determinism: --gpu-inline digest re-baselined and byte-identical across
runs. The fix shifts the early execution trajectory (clock B unfreezing),
so the n50m golden moves imports 451500->178937 and instructions
50000001->50000014; draws/swaps/RTs/shaders unchanged (78/4/2/3). n2m
golden unchanged (early boot, pre-fix-effect). 675 workspace tests green;
sylpheed_n50m oracle green.
Note: this breaks the FIRST hard blocker (clock B could never advance at
all). Full per-frame sustain (draws past 78) needs a further step: each
GamePart update must submit a per-frame command buffer (with PM4_INTERRUPT)
during the asset-streaming phase to keep generating CP interrupts; ours
currently produces only the single seed interrupt from the initial batch,
so the chain advances once and re-stalls. Tracked for the next iterate.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Make ours' VdSwap present path faithful to xenia-canary `VdSwap_entry`
(xboxkrnl_video.cc:518-548): write the reserved 64-dword ring slot with a
PM4_TYPE0 fetch-constant patch + PM4_TYPE3(PM4_XE_SWAP) + NOP padding, then
let the natural drain consume the swap packet in command-stream order. Remove
the synthetic CP swap-complete interrupt that `notify_xe_swap` raised
out-of-band.
Root found this session (the actual present-path bug): ours' `notify_xe_swap`
pushed an `InterruptSource::Swap` (→ INTERRUPT_SOURCE_CP) interrupt directly
from the VdSwap HLE, decoupled from the GPU command stream. When that interrupt
reached the graphics ISR `sub_824BE9A0` before D3D had armed its swap-callback
slot (`[gfx+10772]+16` still the `0xBADF00D` placeholder), the ISR took its
error path and hit the assert "ERR[D3D]: Unanticipated CPU_INTERRUPT. Sign of a
corrupt command buffer?" (`bl sub_824C5DF0; twi` at 0x824BE9DC) — 2x per run on
master. Canary's VdSwap raises NO interrupt; swap-complete CP interrupts come
only from in-stream PM4_INTERRUPT packets, which are naturally ordered after the
callback-arming Type-0 writes. Routing the swap through the ring packet matches
that ordering and eliminates the trap (2 -> 0).
Canary oracle confirmation (muted, audit_mem_watch + audit_jit_prolog_pc):
canary's early/loading loop is present-driven — swap counter [gfx+15160]
(0xBE56CA38) advances ~per-vblank from vblank 65 onward, reaching 0xD02 (3330)
in ~60s via 6184 CP source=1 interrupts, with VdSwap called only ONCE. So the
present interrupts are entirely in-stream, not from the VdSwap export.
This is a correctness/faithfulness fix; it does NOT cascade. draws stay 78 at
200M and 1B because the upstream gate persists: the game submits one render
batch then stalls (renderer sub_82506xxx 0x; 2nd title thread 0x821748F0 never
spawns). The per-frame loop sub_822F1AA8 runs ~1207 iterations on vsync but
clock B (swap count) only advances ~once, so the manager update sub_821741C8
fires once. That is the iterate-2Q/2F title-pipeline gate, not a present/
interrupt bug. swaps 3 -> 4 (the in-stream PM4_XE_SWAP now drains).
Deterministic in inline mode (n50m --gpu-inline --stable-digest regenerated
byte-identical twice; golden re-baselined: swaps 3 -> 4). cargo test --workspace
675 passing.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-14 15:20:02 +02:00
4 changed files with 96 additions and 48 deletions
// The remaining vd_swap work (UI publish: shader blobs, constants,
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.