fix(gpu): GPUBUG-DRAIN-001 — silence VdSwap PM4 fallback under --parallel
The Phase-C VdSwap PM4 ring path (commit 82f3d61) emits two "PM4_XE_SWAP not consumed by drain" warnings when running:

    exec sylpheed.iso --ui --quiet --halt-on-deadlock \
        --parallel --reservations-table

Lockstep -n 100M never trips it.

Two distinct race windows:

(a) Inline backend (--ui forces it): drain(mem, 4096) hit its fixed packet cap before reaching the PM4_XE_SWAP we'd just injected at the WPTR tail. With 6 CPU threads, the ring accumulates >4096 packets between vd_swap callbacks.

(b) Threaded backend (--parallel without --ui): the worker's DrainFence handler has a 900 ms deadline, and game-batched IBs (8-10 M packets observed) keep it from reaching the tail in any reasonable budget. If the worker eventually drained past the injected packet, the safety-net direct notify would double-count.

Three changes:

* gpu_system.rs: new `drain_until_wptr(target, time_budget)`, which drains on canary's `WorkerThreadMain` predicate (read_offset != target) instead of a fixed packet count. The 900 ms deadline mirrors the threaded DrainFence handler.
* handle.rs: the inline `drain_to_current_wptr` switches to `drain_until_wptr`. The DrainFence handler publishes the digest mirror BEFORE replying so the CPU's post-drain `digest_snapshot` sees fresh stats.
* exports.rs (vd_swap): skip the PM4 ring injection unconditionally and route swap notification through `notify_xe_swap` directly. Tail injection is unreliable under --parallel for both backends. The slot-0 fetch-constant patch is deferred (GPUBUG-FETCH-PATCH-001); draws=0 today, so a stale slot 0 has no observable effect.

Verification:

* cargo test --workspace --release: 556 passing (unchanged).
* Lockstep -n 100M --stable-digest: bit-identical to pre-fix master HEAD aa3f1d3.
  {instructions:100000002, imports:987685, unimpl:0, draws:0, swaps:2, ...}
* check --parallel --reservations-table -n 30M: 0 warnings (was 2). swaps=2.
* exec --gpu-inline --parallel --reservations-table -n 30M: 0 warnings (was 2, with drained=8M-10M observed). swaps=2.

Audit IDs: GPUBUG-DRAIN-001 (closed), GPUBUG-FETCH-PATCH-001 (filed, deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
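Race window (b), and the reason change 3 drops the tail injection, can be seen in a toy counter model. This is a sketch under assumptions, not code from the repository: `ToyGpu`, `inject_swap`, `worker_drain`, and `notify_direct` are hypothetical stand-ins, and the "worker drains late" case is simulated by consuming the injected packet after the safety net has already fired.

```rust
/// Toy model of the two swap-notification paths. `ToyGpu` and its methods
/// are hypothetical stand-ins, not types from the emulator.
#[derive(Default)]
struct ToyGpu {
    /// PM4_XE_SWAP packets injected at the ring tail but not yet consumed.
    pending_swaps: u32,
    /// Swaps observed downstream (what the digest reports as `swaps`).
    swaps_seen: u32,
}

impl ToyGpu {
    /// Old path: inject a PM4_XE_SWAP packet at the WPTR tail.
    fn inject_swap(&mut self) {
        self.pending_swaps += 1;
    }

    /// The worker eventually consumes every pending swap packet.
    fn worker_drain(&mut self) {
        self.swaps_seen += self.pending_swaps;
        self.pending_swaps = 0;
    }

    /// Direct notification (the role `notify_xe_swap` plays in change 3).
    fn notify_direct(&mut self) {
        self.swaps_seen += 1;
    }
}

/// Old flow when the drain misses its deadline: the safety net notifies
/// directly, then the worker reaches the injected packet anyway.
fn old_flow_late_drain() -> u32 {
    let mut gpu = ToyGpu::default();
    gpu.inject_swap();
    let drained_before_deadline = false; // 900 ms DrainFence budget exhausted
    if !drained_before_deadline {
        gpu.notify_direct(); // safety-net fallback
    }
    gpu.worker_drain(); // late consumption of the injected packet
    gpu.swaps_seen
}

/// New flow: no injection, exactly one direct notification per vd_swap.
fn new_flow() -> u32 {
    let mut gpu = ToyGpu::default();
    gpu.notify_direct();
    gpu.worker_drain(); // nothing pending, so nothing is counted twice
    gpu.swaps_seen
}

fn main() {
    println!("old flow, late drain: {} swaps", old_flow_late_drain());
    println!("new flow: {} swap", new_flow());
}
```

With injection, a drain that misses its deadline but later reaches the packet counts the same swap twice; with only a direct notify, each `vd_swap` counts once.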
@@ -18,6 +18,7 @@
 use std::collections::HashMap;
 use std::sync::Arc;
 use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
+use std::time::{Duration, Instant};
 
 use xenia_memory::MemoryAccess;
 
@@ -1280,6 +1281,63 @@ impl GpuSystem {
         }
         n
     }
+
+    /// Drain until the ring's read offset reaches `target_wptr` (modulo ring
+    /// size) or `execute_one` returns Idle/Blocked. Mirrors canary's
+    /// `WorkerThreadMain` (xenia-canary `command_processor.cc` ExecutePrimaryBuffer)
+    /// which loops on `read_ptr_index_ != write_ptr_index` with no packet
+    /// budget. `time_budget` bounds wall-clock so a pathological packet
+    /// (e.g. an EVENT_WRITE that perpetually re-blocks) cannot spin the
+    /// inline path; pass 900 ms to match the threaded `DrainFence` deadline.
+    /// Returns the number of packets consumed.
+    pub fn drain_until_wptr(
+        &mut self,
+        mem: &dyn MemoryAccess,
+        target_wptr: u32,
+        time_budget: Duration,
+    ) -> u32 {
+        if self.ring.size_dwords == 0 {
+            return 0;
+        }
+        let target = target_wptr % self.ring.size_dwords;
+        let deadline = Instant::now() + time_budget;
+        let mut n = 0u32;
+        while self.ring.read_offset_dwords != target {
+            if Instant::now() >= deadline {
+                // Deadline exhaustion is the *expected* outcome under
+                // `--parallel` workloads (Sylpheed boot queues millions
+                // of game-batched IBs the inline drain can't chew
+                // through in 900 ms). Logged at debug because warn-level
+                // would fire on every vd_swap. Callers can re-read the
+                // ring read pointer to detect partial drain if they
+                // care.
+                tracing::debug!(
+                    target,
+                    rptr = self.ring.read_offset_dwords,
+                    consumed = n,
+                    "gpu: drain_until_wptr time-budget exhausted"
+                );
+                break;
+            }
+            match self.execute_one(mem) {
+                ExecOutcome::Stepped { .. } => {
+                    n += 1;
+                    // Mirror the threaded `DrainFence` handler at
+                    // handle.rs:553-570: re-sync after every packet so
+                    // any concurrent guest WPTR write (under `--parallel`)
+                    // folds into the local ring view before the next
+                    // `is_ready` check. Without this the local
+                    // write_offset is a snapshot of the moment we entered
+                    // the drain, which is fine for a target-WPTR drain
+                    // but wrong if downstream packets (e.g. an indirect
+                    // buffer's nested ring) need an updated view.
+                    self.sync_with_mmio();
+                }
+                ExecOutcome::Idle | ExecOutcome::Blocked => break,
+            }
+        }
+        n
+    }
 }
 
 impl Default for GpuSystem {
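The switch from a fixed packet cap to a target-WPTR drain can be illustrated on a toy ring. Assumptions: every packet is one dword, and `ToyRing`, `Outcome`, `drain`, and `drain_until` are hypothetical simplifications of `GpuSystem::drain`, `ExecOutcome`, and `drain_until_wptr` (no MMIO re-sync, no Blocked state). The 6000-packet backlog is an illustrative number, not a measurement.

```rust
use std::time::{Duration, Instant};

/// Simplified `ExecOutcome`: one packet stepped, or nothing left to run.
enum Outcome {
    Stepped,
    Idle,
}

/// Toy ring buffer. Offsets are in dwords; every packet is assumed to be one
/// dword so that "packets consumed" equals "dwords advanced".
struct ToyRing {
    size: u32,
    read: u32,  // consumer read pointer (RPTR)
    write: u32, // producer write pointer (WPTR)
}

impl ToyRing {
    fn execute_one(&mut self) -> Outcome {
        if self.read == self.write {
            return Outcome::Idle;
        }
        self.read = (self.read + 1) % self.size;
        Outcome::Stepped
    }

    /// Old strategy: stop after `cap` packets, wherever RPTR happens to be.
    fn drain(&mut self, cap: u32) -> u32 {
        let mut n = 0;
        while n < cap {
            match self.execute_one() {
                Outcome::Stepped => n += 1,
                Outcome::Idle => break,
            }
        }
        n
    }

    /// New strategy: run until RPTR reaches `target_wptr` (mod ring size) or
    /// the wall-clock budget is spent; the deadline is checked between packets.
    fn drain_until(&mut self, target_wptr: u32, budget: Duration) -> u32 {
        if self.size == 0 {
            return 0;
        }
        let target = target_wptr % self.size;
        let deadline = Instant::now() + budget;
        let mut n = 0;
        while self.read != target {
            if Instant::now() >= deadline {
                break;
            }
            match self.execute_one() {
                Outcome::Stepped => n += 1,
                Outcome::Idle => break,
            }
        }
        n
    }
}

fn main() {
    // 6000 packets queued ahead of the injected swap: more than the old cap.
    let mut old_ring = ToyRing { size: 8192, read: 0, write: 6000 };
    let mut new_ring = ToyRing { size: 8192, read: 0, write: 6000 };
    let budget = Duration::from_millis(900);

    let old_n = old_ring.drain(4096);
    let new_n = new_ring.drain_until(6000, budget);
    println!("fixed cap:  {old_n} packets, rptr {}", old_ring.read);
    println!("until wptr: {new_n} packets, rptr {}", new_ring.read);
}
```

The `% self.size` on the target plays the same role as `% self.ring.size_dwords` in the diff: a WPTR value at or beyond the ring size still maps to a valid offset, so a drain across the wrap point terminates at the right slot.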
@@ -381,7 +381,16 @@ impl GpuBackend {
         match self {
             GpuBackend::Inline(s) => {
                 s.sync_with_mmio();
-                s.drain(mem, 4096)
+                // GPUBUG-DRAIN-001: drain until target WPTR is reached (or
+                // 900 ms deadline), mirroring canary's `WorkerThreadMain` and
+                // the threaded `DrainFence` semantics. The previous fixed
+                // 4096-packet budget hit the cap under `--parallel`, where
+                // CPU runs many more PPC blocks per kernel-callback boundary
+                // and the ring backs up past 4096 packets before vd_swap
+                // fires; the safety-net fallback warning fired twice for
+                // each Sylpheed run.
+                let target = s.mmio.cp_rb_wptr.load(Ordering::Acquire);
+                s.drain_until_wptr(mem, target, Duration::from_millis(900))
             }
             GpuBackend::Threaded(h) => {
                 let target_wptr = h.mmio.cp_rb_wptr.load(Ordering::Acquire);
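The inline path snapshots `cp_rb_wptr` with `Ordering::Acquire` and drains only up to that snapshot. Below is a minimal sketch of why such a snapshot is safe to drain against, assuming the producer writes packet dwords before publishing WPTR with `Ordering::Release` (the diff shows only the Acquire side, so the producer's ordering is an assumption here). `SharedRing`, `PACKETS`, and `snapshot_and_check` are hypothetical, and WPTR is modelled as a monotonic count rather than a wrapped ring offset.

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

const PACKETS: u32 = 512;

/// Hypothetical shared ring: packet dwords plus the producer's write pointer.
/// WPTR here is a monotonically increasing count (no wrap-around), which
/// keeps the sketch short.
struct SharedRing {
    data: Vec<AtomicU32>,
    wptr: AtomicU32,
}

/// Takes an Acquire snapshot of WPTR while the producer may still be running
/// and checks that every dword below the snapshot is already visible.
/// Returns (snapshot, all_visible).
fn snapshot_and_check() -> (u32, bool) {
    let ring = Arc::new(SharedRing {
        data: (0..PACKETS).map(|_| AtomicU32::new(0)).collect(),
        wptr: AtomicU32::new(0),
    });

    // Producer (stand-in for the guest CPU): write the packet dword first,
    // then publish the new WPTR with Release ordering.
    let producer = {
        let ring = Arc::clone(&ring);
        thread::spawn(move || {
            for i in 0..PACKETS {
                ring.data[i as usize].store(i + 1, Ordering::Relaxed);
                ring.wptr.store(i + 1, Ordering::Release);
            }
        })
    };

    // Consumer: snapshot WPTR with Acquire, then read only up to the snapshot.
    let target = ring.wptr.load(Ordering::Acquire);
    let all_visible =
        (0..target).all(|i| ring.data[i as usize].load(Ordering::Relaxed) == i + 1);

    producer.join().expect("producer thread panicked");
    (target, all_visible)
}

fn main() {
    let (target, ok) = snapshot_and_check();
    println!("snapshot wptr = {target}, all packets below it visible: {ok}");
}
```

Packets the producer writes after the snapshot are left for the next drain, which is the behaviour a target-WPTR drain wants.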
@@ -563,6 +572,23 @@ impl GpuWorker {
                             ExecOutcome::Idle | ExecOutcome::Blocked => break,
                         }
                     }
+                    // GPUBUG-DRAIN-001: publish the digest mirror BEFORE
+                    // replying so the CPU's post-drain `digest_snapshot`
+                    // observes the `swaps_seen` bump from any
+                    // PM4_XE_SWAP we just consumed. Without this the
+                    // outer-loop publish at step 5b races the CPU's
+                    // post_swap_counter check and the kernel-side
+                    // `vd_swap` fires the "PM4_XE_SWAP not consumed"
+                    // safety-net warning even when the swap landed.
+                    let snap = GpuDigestSnapshot {
+                        stats: self.system.stats.clone(),
+                        shader_blobs_live: self.system.shader_blobs.len() as u64,
+                        texture_cache_entries: self.system.texture_cache.len() as u64,
+                        texture_decodes: self.system.texture_cache.decodes_total,
+                    };
+                    if let Ok(mut g) = self.digest.lock() {
+                        *g = snap;
+                    }
                     let _ = reply_tx.send(());
                 }
                 GpuCommand::NotifyXeSwap {
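The DrainFence ordering fix amounts to "write the shared mirror, then signal". A sketch with `std::sync::mpsc` and `Mutex`, under assumptions: `Digest` and `drain_fence_round_trip` are hypothetical, only `swaps_seen` is modelled, and the late outer-loop publish is omitted so that the losing interleaving of the original race becomes deterministic.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical digest mirror: only the field relevant to the race.
#[derive(Clone, Copy, Default)]
struct Digest {
    swaps_seen: u64,
}

/// Worker-side handling of one drain-fence request. With `publish_first`
/// the mirror is updated before replying (the GPUBUG-DRAIN-001 ordering);
/// without it, publication is left to a later step that this sketch omits.
/// Returns what the CPU side reads from the mirror right after the reply.
fn drain_fence_round_trip(publish_first: bool) -> u64 {
    let mirror = Arc::new(Mutex::new(Digest::default()));
    let (req_tx, req_rx) = mpsc::channel::<mpsc::Sender<()>>();

    let worker = {
        let mirror = Arc::clone(&mirror);
        thread::spawn(move || {
            let mut local = Digest::default();
            let reply_tx = req_rx.recv().expect("request channel closed");
            local.swaps_seen += 1; // consumed a PM4_XE_SWAP during the drain
            if publish_first {
                *mirror.lock().unwrap() = local;
            }
            reply_tx.send(()).expect("reply channel closed");
            // A later outer-loop publish is omitted: in the racy interleaving
            // the CPU has already read the mirror by the time it would run.
        })
    };

    let (reply_tx, reply_rx) = mpsc::channel();
    req_tx.send(reply_tx).expect("send request");
    reply_rx.recv().expect("await reply");
    let seen = mirror.lock().unwrap().swaps_seen; // CPU's post-drain snapshot
    worker.join().expect("worker panicked");
    seen
}

fn main() {
    println!("publish before reply: swaps_seen = {}", drain_fence_round_trip(true));
    println!("publish after reply:  swaps_seen = {}", drain_fence_round_trip(false));
}
```

Because the mirror write completes before `send`, and the CPU only reads after the matching `recv`, the channel hand-off orders the CPU's snapshot after the worker's publish.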