fix(gpu): GPUBUG-DRAIN-001 — silence VdSwap PM4 fallback under --parallel

The Phase-C VdSwap PM4 ring path (commit 82f3d61) emits two
"PM4_XE_SWAP not consumed by drain" warnings when running:

    exec sylpheed.iso --ui --quiet --halt-on-deadlock \
         --parallel --reservations-table

Lockstep -n 100M never trips it. Two distinct race windows:

(a) Inline backend (--ui forces it): drain(mem, 4096) hit its fixed
    packet cap before reaching the PM4_XE_SWAP we'd just injected at
    the WPTR tail. With 6 CPU threads, the ring accumulates >4096
    packets between vd_swap callbacks.

(b) Threaded backend (--parallel without --ui): the worker's
    DrainFence handler has a 900 ms deadline and game-batched IBs
    (8-10 M packets observed) keep it from reaching the tail in any
    reasonable budget. If the worker eventually drained past the
    injected packet later, the safety-net direct notify would
    double-count.

Three changes:

* gpu_system.rs: new `drain_until_wptr(target, time_budget)` draining
  by the canary `WorkerThreadMain` predicate (read_offset != target)
  instead of a fixed packet count. 900 ms deadline mirrors the
  threaded DrainFence handler.

* handle.rs: inline `drain_to_current_wptr` switches to
  `drain_until_wptr`. DrainFence handler publishes the digest mirror
  BEFORE replying so the CPU's post-drain `digest_snapshot` sees
  fresh stats.

* exports.rs (vd_swap): skip the PM4 ring injection unconditionally
  and route swap notification through `notify_xe_swap` directly.
  Tail-injection is unreliable under --parallel for both backends.
  The slot-0 fetch-constant patch is deferred
  (GPUBUG-FETCH-PATCH-001); draws=0 today so a stale slot 0 has no
  observable effect.

Verification:

* cargo test --workspace --release: 556 passing (unchanged).
* Lockstep -n 100M --stable-digest: bit-identical to pre-fix master
  HEAD aa3f1d3.
  {instructions:100000002, imports:987685, unimpl:0, draws:0,
   swaps:2, ...}
* check --parallel --reservations-table -n 30M: 0 warnings (was 2).
  swaps=2.
* exec --gpu-inline --parallel --reservations-table -n 30M: 0 warnings
  (was 2 with drained=8M-10M observed). swaps=2.

Audit IDs: GPUBUG-DRAIN-001 (closed), GPUBUG-FETCH-PATCH-001 (filed,
deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
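
Illustrative note: the difference between a fixed packet cap and the
`read_offset != target` predicate with a time budget can be sketched
roughly as below. This is a toy model, not the code in gpu_system.rs.
`ToyRing`, `DrainStop`, and `step` are hypothetical names introduced only
for this sketch, and the real function also takes `mem` and executes PM4
packets rather than just advancing an offset.

```rust
use std::time::{Duration, Instant};

/// Why the drain loop stopped (hypothetical type, for illustration only).
#[derive(Debug, PartialEq, Eq)]
enum DrainStop {
    ReachedTarget,
    DeadlineExpired,
}

/// Toy ring buffer: tracks only a read offset that wraps at `size`.
struct ToyRing {
    read_offset: u32,
    size: u32,
}

impl ToyRing {
    /// Stand-in for executing one packet: advance the read offset by one.
    fn step(&mut self) {
        self.read_offset = (self.read_offset + 1) % self.size;
    }

    /// Drain until `read_offset == target` or the time budget runs out.
    /// There is no packet-count cap, so a backlog larger than 4096
    /// packets does not stop the loop short of the target.
    fn drain_until_wptr(&mut self, target: u32, budget: Duration) -> (u64, DrainStop) {
        let deadline = Instant::now() + budget;
        let mut drained = 0u64;
        while self.read_offset != target {
            if Instant::now() >= deadline {
                return (drained, DrainStop::DeadlineExpired);
            }
            self.step();
            drained += 1;
        }
        (drained, DrainStop::ReachedTarget)
    }
}
```

In this model, a call such as
`ring.drain_until_wptr(5_000, Duration::from_millis(900))` on a ring that
starts at offset 0 consumes all 5000 packets, where a 4096-packet cap
would have stopped early. The deadline check is what bounds the loop if
the target is never reached.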
2026-05-03 17:12:15 +02:00
parent aa3f1d344f
commit 7a1b6b3306
3 changed files with 121 additions and 86 deletions
--- a/crates/xenia-gpu/src/handle.rs
+++ b/crates/xenia-gpu/src/handle.rs
@@ -381,7 +381,16 @@ impl GpuBackend {
        match self {
            GpuBackend::Inline(s) => {
                s.sync_with_mmio();
-                s.drain(mem, 4096)
+                // GPUBUG-DRAIN-001: drain until target WPTR is reached (or
+                // 900 ms deadline), mirroring canary's `WorkerThreadMain` and
+                // the threaded `DrainFence` semantics. The previous fixed
+                // 4096-packet budget hit the cap under `--parallel`, where
+                // CPU runs many more PPC blocks per kernel-callback boundary
+                // and the ring backs up past 4096 packets before vd_swap
+                // fires; the safety-net fallback warning fired twice for
+                // each Sylpheed run.
+                let target = s.mmio.cp_rb_wptr.load(Ordering::Acquire);
+                s.drain_until_wptr(mem, target, Duration::from_millis(900))
            }
            GpuBackend::Threaded(h) => {
                let target_wptr = h.mmio.cp_rb_wptr.load(Ordering::Acquire);
@@ -563,6 +572,23 @@ impl GpuWorker {
                                ExecOutcome::Idle | ExecOutcome::Blocked => break,
                            }
                        }
+                        // GPUBUG-DRAIN-001: publish the digest mirror BEFORE
+                        // replying so the CPU's post-drain `digest_snapshot`
+                        // observes the `swaps_seen` bump from any
+                        // PM4_XE_SWAP we just consumed. Without this the
+                        // outer-loop publish at step 5b races the CPU's
+                        // post_swap_counter check and the kernel-side
+                        // `vd_swap` fires the "PM4_XE_SWAP not consumed"
+                        // safety-net warning even when the swap landed.
+                        let snap = GpuDigestSnapshot {
+                            stats: self.system.stats.clone(),
+                            shader_blobs_live: self.system.shader_blobs.len() as u64,
+                            texture_cache_entries: self.system.texture_cache.len() as u64,
+                            texture_decodes: self.system.texture_cache.decodes_total,
+                        };
+                        if let Ok(mut g) = self.digest.lock() {
+                            *g = snap;
+                        }
                        let _ = reply_tx.send(());
                    }
                    GpuCommand::NotifyXeSwap {