fix(kernel): KRNBUG-D08 — wall-clock v-sync under --parallel

The synthetic v-sync ticker used a per-instruction proxy
(VSYNC_INSTR_PERIOD = 150 k) tuned for ~10 MIPS lockstep throughput
→ 60 Hz. Audit M11 observed this drifts under `--parallel`: with 6
worker threads sharing the kernel mutex, the dispatcher executes more
PPC instructions per tick callback, so the accumulator never crosses
150 k. Result: ~629 v-syncs/100M lockstep → ~2 v-syncs/100M --parallel.

Hybrid solution preserves lockstep determinism (which the goldens
depend on) while fixing --parallel:

* `tick_vsync_instr(instr_count)`: legacy instruction-count ticker,
  used by lockstep. Bit-stable across runs.
* `tick_vsync_wallclock()`: new Instant-based ticker. Fires
  `floor(elapsed / VSYNC_PERIOD)` v-syncs since the anchor and
  advances the anchor by that many full periods (no lazy backlog).
  Capped at INTERRUPT_QUEUE_CAP per call so a forward-jumping clock
  can't overflow the FIFO.
* `KernelState.parallel_active` flag set at startup from `--parallel` /
  `XENIA_PARALLEL=1`. Read by `coord_pre_round` in main.rs to choose
  between the two tickers.

Verification:

* cargo test --workspace --release: 561 passing (+3 new wall-clock
  tests vs prior 558 baseline).
* lockstep -n 100M --stable-digest: BIT-IDENTICAL to pre-Phase-3
  baseline. interrupts_delivered preserved at ~630 (was ~629 pre-fix).
* --parallel --reservations-table -n 30M: interrupts_delivered rose
  from ~2 to 17. (FIFO INTERRUPT_QUEUE_CAP=4 still caps burst
  delivery; that's a separate bottleneck, to be addressed by raising
  the cap when --parallel queue depth becomes the next blocker.)

Trade-off: --parallel runs are non-deterministic at the v-sync rate by
design (per audit M05 PPCBUG-703 already). Lockstep stays
bit-identical, so the `sylpheed_n*m.json` goldens are untouched.

Audit IDs: KRNBUG-D08 (closed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
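
Illustrative sketch only, not the code in this commit: one way the
wall-clock ticker described above could be structured. The
`WallclockVsync` type, the `tick_at` method (which takes `now` as a
parameter so it can be tested without sleeping), and the concrete
`VSYNC_PERIOD` value are assumptions made for this example. Only the
floor/advance/cap behaviour and INTERRUPT_QUEUE_CAP=4 come from the
message. It also assumes that v-syncs beyond the cap are dropped rather
than queued, which is one reading of "no lazy backlog".

```rust
use std::time::{Duration, Instant};

// Assumed value: roughly one 60 Hz frame. The real constant may differ.
const VSYNC_PERIOD: Duration = Duration::from_nanos(16_666_667);
// Matches the FIFO cap quoted in the commit message.
const INTERRUPT_QUEUE_CAP: u32 = 4;

/// Hypothetical stand-in for the wall-clock half of the v-sync ticker.
pub struct WallclockVsync {
    /// Start of the v-sync period that is currently in progress.
    anchor: Instant,
}

impl WallclockVsync {
    pub fn new(now: Instant) -> Self {
        Self { anchor: now }
    }

    /// Returns how many v-syncs to deliver at `now`.
    pub fn tick_at(&mut self, now: Instant) -> u32 {
        // Saturates to zero if `now` is earlier than the anchor.
        let elapsed = now.saturating_duration_since(self.anchor);
        let period = VSYNC_PERIOD.as_nanos();

        // floor(elapsed / VSYNC_PERIOD)
        let due = elapsed.as_nanos() / period;
        if due == 0 {
            return 0;
        }

        // Advance the anchor by `due` full periods. This is the same as
        // stepping back from `now` by the partial period left over.
        let remainder = (elapsed.as_nanos() % period) as u64;
        self.anchor = now - Duration::from_nanos(remainder);

        // Cap the burst. The anchor has already moved past every due
        // period, so the excess is dropped instead of carried over.
        due.min(u128::from(INTERRUPT_QUEUE_CAP)) as u32
    }
}
```

Passing `now` in keeps the arithmetic deterministic under test. A
production caller would presumably pass `Instant::now()`.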
2026-05-03 17:34:30 +02:00
parent b82919bdd0
commit 27d3608174
3 changed files with 144 additions and 23 deletions
--- a/crates/xenia-app/src/main.rs
+++ b/crates/xenia-app/src/main.rs
@@ -786,6 +786,7 @@ fn cmd_exec_inner(
            v == "1" || v == "true" || v == "yes"
        });
    let parallel_active = parallel || parallel_via_env;
+    kernel.parallel_active = parallel_active;
    if reservations_table || reservations_via_env || parallel_active {
        kernel.reservations.enable();
        if !quiet {
@@ -1517,7 +1518,27 @@ fn coord_pre_round(
        );
    }

-    if kernel.interrupts.tick_vsync(stats.instruction_count) {
+    // KRNBUG-D08: backend-aware v-sync ticker.
+    //
+    // **Lockstep**: instruction-count ticker (deterministic; one tick per
+    // PPC block boundary, predictable cadence). The cadence drifts a bit
+    // from real 60 Hz but is bit-stable across runs, which matters for
+    // the `sylpheed_n*m.json` golden oracles.
+    //
+    // **--parallel**: wall-clock ticker. The instruction-count proxy
+    // dropped from 629 v-syncs/100M lockstep to ~2 under `--parallel`
+    // (audit M11) because the dispatcher executes more PPC instructions
+    // per tick callback when 6 worker threads share the kernel mutex,
+    // so the accumulator never crosses the 150k threshold. Wall-clock
+    // restores the ~60 Hz rate at the cost of bit-exact run reproducibility,
+    // which is acceptable under `--parallel` (M11 already documented
+    // `--parallel` as non-deterministic by design).
+    let fired = if kernel.parallel_active {
+        kernel.interrupts.tick_vsync_wallclock()
+    } else {
+        kernel.interrupts.tick_vsync_instr(stats.instruction_count)
+    };
+    if fired {
        use std::sync::atomic::Ordering;
        let mmio = kernel.gpu.mmio();
        let prev = mmio.d1mode_vblank_vline_status.load(Ordering::Relaxed);