fix(gpu): GPUBUG-DRAIN-001 — silence VdSwap PM4 fallback under --parallel

The Phase-C VdSwap PM4 ring path (commit 82f3d61) emits two
"PM4_XE_SWAP not consumed by drain" warnings when running:

  exec sylpheed.iso --ui --quiet --halt-on-deadlock \
    --parallel --reservations-table

Lockstep -n 100M never trips it. Two distinct race windows:

(a) Inline backend (--ui forces it): drain(mem, 4096) hit its
    fixed packet cap before reaching the PM4_XE_SWAP we'd just
    injected at the WPTR tail. With 6 CPU threads, the ring
    accumulates >4096 packets between vd_swap callbacks.

(b) Threaded backend (--parallel without --ui): the worker's
    DrainFence handler has a 900 ms deadline, and game-batched
    IBs (8-10 M packets observed) keep it from reaching the
    tail within any reasonable budget. If the worker later
    drained past the injected packet anyway, the safety-net
    direct notify would double-count the swap.
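Race (a) can be reproduced in a toy model. The Rust sketch below is illustrative only: it assumes one ring slot per packet (the real ring is indexed in dwords and PM4 packets vary in length), and `ToyRing`, `Packet`, `drain_capped`, `drain_until` and `backlogged_ring` are hypothetical names, not the emulator's types. It contrasts a fixed packet cap with a loop that runs until the read index reaches the write index captured right after the swap was injected, bounded by wall-clock time instead of packet count.

```rust
use std::time::{Duration, Instant};

// Hypothetical toy model: one ring slot holds one whole packet. The real
// ring is indexed in dwords and PM4 packets vary in length.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Packet {
    Work,
    XeSwap,
}

struct ToyRing {
    slots: Vec<Packet>,
    read: usize,
    write: usize,
}

impl ToyRing {
    fn new(size: usize) -> Self {
        ToyRing { slots: vec![Packet::Work; size], read: 0, write: 0 }
    }

    // Producer side (the guest CPU): append a packet and advance WPTR.
    fn push(&mut self, p: Packet) {
        self.slots[self.write] = p;
        self.write = (self.write + 1) % self.slots.len();
    }

    // Consumer side: execute one packet, or None when the ring is idle.
    fn step(&mut self) -> Option<Packet> {
        if self.read == self.write {
            return None;
        }
        let p = self.slots[self.read];
        self.read = (self.read + 1) % self.slots.len();
        Some(p)
    }
}

// Old strategy: stop after `cap` packets wherever the read pointer is.
fn drain_capped(ring: &mut ToyRing, cap: usize) -> (usize, bool) {
    let (mut n, mut saw_swap) = (0, false);
    while n < cap {
        match ring.step() {
            Some(p) => {
                n += 1;
                saw_swap |= p == Packet::XeSwap;
            }
            None => break,
        }
    }
    (n, saw_swap)
}

// New strategy: loop until the read index reaches `target` (modulo ring
// size), bounded by a wall-clock budget instead of a packet count.
fn drain_until(ring: &mut ToyRing, target: usize, budget: Duration) -> (usize, bool) {
    let target = target % ring.slots.len();
    let deadline = Instant::now() + budget;
    let (mut n, mut saw_swap) = (0, false);
    while ring.read != target && Instant::now() < deadline {
        match ring.step() {
            Some(p) => {
                n += 1;
                saw_swap |= p == Packet::XeSwap;
            }
            None => break,
        }
    }
    (n, saw_swap)
}

// `backlog` queued work packets, then the injected swap at the tail.
// Returns the ring plus the WPTR captured right after the injection.
fn backlogged_ring(backlog: usize) -> (ToyRing, usize) {
    let mut ring = ToyRing::new(backlog + 16);
    for _ in 0..backlog {
        ring.push(Packet::Work);
    }
    ring.push(Packet::XeSwap);
    let target = ring.write;
    (ring, target)
}

fn main() {
    let (mut a, _) = backlogged_ring(5000);
    let (n, saw) = drain_capped(&mut a, 4096);
    println!("capped: consumed {n}, saw swap: {saw}"); // consumed 4096, saw swap: false

    let (mut b, target) = backlogged_ring(5000);
    let (n, saw) = drain_until(&mut b, target, Duration::from_millis(900));
    println!("until-wptr: consumed {n}, saw swap: {saw}"); // consumed 5001, saw swap: true
}
```

With a 5000-packet backlog the capped drain stops after 4096 packets, leaving 905 (the swap included) queued, while the target-WPTR drain consumes all 5001. The wall-clock budget still bounds the inline path; race (b) shows that even 900 ms can fall short on real workloads, which is why vd_swap no longer depends on the injected packet being reached.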

Three changes:

* gpu_system.rs: new `drain_until_wptr(mem, target_wptr,
  time_budget)` that drains until the ring's read offset reaches
  the target, using canary's `WorkerThreadMain` loop predicate
  (read_offset != target) instead of a fixed packet count.
  A 900 ms wall-clock budget mirrors the threaded DrainFence
  deadline.

* handle.rs: inline `drain_to_current_wptr` switches to
  `drain_until_wptr`. DrainFence handler publishes the digest
  mirror BEFORE replying so the CPU's post-drain
  `digest_snapshot` sees fresh stats.

* exports.rs (vd_swap): skip the PM4 ring injection
  unconditionally and route swap notification through
  `notify_xe_swap` directly. Tail-injection is unreliable
  under --parallel for both backends. The slot-0
  fetch-constant patch is deferred (GPUBUG-FETCH-PATCH-001);
  draws=0 today so a stale slot 0 has no observable effect.
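The handle.rs ordering change is a publish-then-acknowledge protocol. Below is a minimal sketch with std threads and channels, assuming the CPU and the worker communicate over an `mpsc` channel and the digest mirror is a `Mutex`-guarded copy of the worker's stats. `DrainFence` and `digest_snapshot` borrow names from this commit, but their shapes here, like `Digest`, `worker`, `drain_fence` and `run_two_fences`, are hypothetical.

```rust
use std::sync::mpsc;
use std::sync::{Arc, Mutex};
use std::thread;

// Hypothetical stand-in for the digest stats the worker accumulates.
#[derive(Clone, Copy, Debug, Default, PartialEq)]
struct Digest {
    packets: u64,
    fences: u32,
}

// Hypothetical drain request: how much "work" to do, plus a reply channel.
struct DrainFence {
    batch: u64,
    reply: mpsc::Sender<u64>,
}

// GPU worker: owns the live stats; only the mirror is shared.
fn worker(mirror: Arc<Mutex<Digest>>, requests: mpsc::Receiver<DrainFence>) {
    let mut local = Digest::default();
    // Iterating a Receiver ends once every Sender has been dropped.
    for req in requests {
        // Stand-in for draining the ring: update worker-local stats.
        local.packets += req.batch;
        local.fences += 1;
        // 1. Publish the mirror first...
        *mirror.lock().unwrap() = local;
        // 2. ...then reply. The send happens-before the CPU's recv, so a
        // snapshot taken after the reply cannot observe stale stats.
        let _ = req.reply.send(req.batch);
    }
}

// CPU side: block until the worker acknowledges the fence.
fn drain_fence(tx: &mpsc::Sender<DrainFence>, batch: u64) -> u64 {
    let (reply, reply_rx) = mpsc::channel();
    tx.send(DrainFence { batch, reply }).expect("worker is running");
    reply_rx.recv().expect("worker replied")
}

fn digest_snapshot(mirror: &Mutex<Digest>) -> Digest {
    *mirror.lock().unwrap()
}

fn run_two_fences() -> Digest {
    let mirror = Arc::new(Mutex::new(Digest::default()));
    let (tx, rx) = mpsc::channel();
    let worker_mirror = Arc::clone(&mirror);
    let handle = thread::spawn(move || worker(worker_mirror, rx));

    drain_fence(&tx, 100);
    drain_fence(&tx, 50);
    let snap = digest_snapshot(&mirror);

    drop(tx); // closes the request channel so the worker loop exits
    handle.join().unwrap();
    snap
}

fn main() {
    println!("{:?}", run_two_fences()); // Digest { packets: 150, fences: 2 }
}
```

Swapping steps 1 and 2 in `worker` reintroduces the bug: the CPU could wake on the reply and take its snapshot before the worker stores the new digest, reading the previous fence's stats.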

Verification:

* cargo test --workspace --release: 556 passing (unchanged).

* Lockstep -n 100M --stable-digest: bit-identical to
  pre-fix master HEAD aa3f1d3.
  {instructions:100000002, imports:987685, unimpl:0, draws:0,
   swaps:2, ...}

* check --parallel --reservations-table -n 30M: 0 warnings
  (was 2). swaps=2.

* exec --gpu-inline --parallel --reservations-table -n 30M:
  0 warnings (was 2 with drained=8M-10M observed). swaps=2.

Audit IDs: GPUBUG-DRAIN-001 (closed),
GPUBUG-FETCH-PATCH-001 (filed, deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Author: MechaCat02
Date:   2026-05-03 17:12:15 +02:00
Parent: aa3f1d344f
Commit: 7a1b6b3306
3 changed files with 121 additions and 86 deletions


@@ -18,6 +18,7 @@
use std::collections::HashMap;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};
use std::time::{Duration, Instant};
use xenia_memory::MemoryAccess;
@@ -1280,6 +1281,63 @@ impl GpuSystem {
        }
        n
    }

    /// Drain until the ring's read offset reaches `target_wptr` (modulo ring
    /// size) or `execute_one` returns Idle/Blocked. Mirrors canary's
    /// `WorkerThreadMain` (xenia-canary `command_processor.cc` ExecutePrimaryBuffer)
    /// which loops on `read_ptr_index_ != write_ptr_index` with no packet
    /// budget. `time_budget` bounds wall-clock so a pathological packet
    /// (e.g. an EVENT_WRITE that perpetually re-blocks) cannot spin the
    /// inline path; pass 900 ms to match the threaded `DrainFence` deadline.
    /// Returns the number of packets consumed.
    pub fn drain_until_wptr(
        &mut self,
        mem: &dyn MemoryAccess,
        target_wptr: u32,
        time_budget: Duration,
    ) -> u32 {
        if self.ring.size_dwords == 0 {
            return 0;
        }
        let target = target_wptr % self.ring.size_dwords;
        let deadline = Instant::now() + time_budget;
        let mut n = 0u32;
        while self.ring.read_offset_dwords != target {
            if Instant::now() >= deadline {
                // Deadline exhaustion is the *expected* outcome under
                // `--parallel` workloads (Sylpheed boot queues millions
                // of game-batched IBs the inline drain can't chew
                // through in 900 ms). Logged at debug because warn-level
                // would fire on every vd_swap. Callers can re-read the
                // ring read pointer to detect partial drain if they
                // care.
                tracing::debug!(
                    target,
                    rptr = self.ring.read_offset_dwords,
                    consumed = n,
                    "gpu: drain_until_wptr time-budget exhausted"
                );
                break;
            }
            match self.execute_one(mem) {
                ExecOutcome::Stepped { .. } => {
                    n += 1;
                    // Mirror the threaded `DrainFence` handler at
                    // handle.rs:553-570: re-sync after every packet so
                    // any concurrent guest WPTR write (under `--parallel`)
                    // folds into the local ring view before the next
                    // `is_ready` check. Without this the local
                    // write_offset is a snapshot of the moment we entered
                    // the drain, which is fine for a target-WPTR drain
                    // but wrong if downstream packets (e.g. an indirect
                    // buffer's nested ring) need an updated view.
                    self.sync_with_mmio();
                }
                ExecOutcome::Idle | ExecOutcome::Blocked => break,
            }
        }
        n
    }
}

impl Default for GpuSystem {