fix(gpu): GPUBUG-006 — sync_with_mmio Acquire/Release pair the producer
The producer side (`mmio_region.rs:78`, the guest's CP_RB_WPTR MMIO write callback) uses `Ordering::Release` so any ring-memory writes the guest performed before bumping WPTR are visible to a paired `Acquire`-load on the consumer. The consumer here at `sync_with_mmio` was using `Ordering::Relaxed` for both the WPTR load and the RPTR mirror store — leaving the Release/Acquire pairing broken. Under `--parallel`, this broken pairing means the GPU worker can observe a fresh WPTR value while still reading stale ring-memory contents at the corresponding offsets — garbage PM4 packets. The audit's M11 grid run confirmed --parallel is non-deterministic beyond the documented `packets` ±5% noise; this fix is one strand of that. Symmetric fix on the RPTR mirror store: Release pairs with any guest-side Acquire-load of CP_RB_RPTR for ring-writeback bookkeeping. Verification at -n 100M lockstep: swaps: 2 → 2 (unchanged) draws: 0 → 0 (unchanged) packets: ~60M (within noise) Tests: 149 (no count change; this is a memory-ordering correctness fix, not a behavioral change visible at the digest level in lockstep). Closes GPUBUG-006 (P1). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -522,15 +522,27 @@ impl GpuSystem {
|
|||||||
/// per-round GPU hook: the guest may have advanced `CP_RB_WPTR` since
|
/// per-round GPU hook: the guest may have advanced `CP_RB_WPTR` since
|
||||||
/// we last ran, and we in turn reflect our read-pointer back to the
|
/// we last ran, and we in turn reflect our read-pointer back to the
|
||||||
/// mirror register so the guest sees progress.
|
/// mirror register so the guest sees progress.
|
||||||
|
///
|
||||||
|
/// GPUBUG-006: under `--parallel`, the producer (the guest CP_RB_WPTR
|
||||||
|
/// MMIO write) uses `Release` to publish prior ring-memory writes;
|
||||||
|
/// the consumer here must `Acquire`-load to pair correctly. With
|
||||||
|
/// Relaxed-on-load, ring-memory writes that the guest performed
|
||||||
|
/// before bumping WPTR could be reordered past our subsequent reads
|
||||||
|
/// — leading to garbage PM4 packet contents. The producer side at
|
||||||
|
/// `mmio_region.rs:78` already uses Release; the consumer's Relaxed
|
||||||
|
/// was the missing half. Symmetrically, the RPTR mirror store
|
||||||
|
/// publishes our read progress to the guest and benefits from a
|
||||||
|
/// Release.
|
||||||
pub fn sync_with_mmio(&mut self) {
|
pub fn sync_with_mmio(&mut self) {
|
||||||
let wptr_dwords = self.mmio.cp_rb_wptr.load(Ordering::Relaxed);
|
let wptr_dwords = self.mmio.cp_rb_wptr.load(Ordering::Acquire);
|
||||||
if wptr_dwords != self.ring.write_offset_dwords && self.ring.size_dwords != 0 {
|
if wptr_dwords != self.ring.write_offset_dwords && self.ring.size_dwords != 0 {
|
||||||
self.ring.write_offset_dwords = wptr_dwords % self.ring.size_dwords;
|
self.ring.write_offset_dwords = wptr_dwords % self.ring.size_dwords;
|
||||||
}
|
}
|
||||||
// Mirror our read pointer.
|
// Mirror our read pointer (Release pairs with any guest-side
|
||||||
|
// Acquire-load of CP_RB_RPTR for ring writeback bookkeeping).
|
||||||
self.mmio
|
self.mmio
|
||||||
.cp_rb_rptr
|
.cp_rb_rptr
|
||||||
.store(self.ring.read_offset_dwords, Ordering::Relaxed);
|
.store(self.ring.read_offset_dwords, Ordering::Release);
|
||||||
}
|
}
|
||||||
|
|
||||||
/// True iff `execute_one` is expected to make progress without blocking.
|
/// True iff `execute_one` is expected to make progress without blocking.
|
||||||
|
|||||||
Reference in New Issue
Block a user