fix(kernel): KRNBUG-D08 — wall-clock v-sync under --parallel

The synthetic v-sync ticker used an instruction-count proxy
(VSYNC_INSTR_PERIOD = 150k instructions per v-sync), tuned so that
~10 MIPS of lockstep throughput yields roughly 60 Hz. Audit M11
observed that this proxy drifts under `--parallel`: with 6 worker
threads sharing the kernel mutex, the dispatcher executes more PPC
instructions per tick callback, so the accumulator never crosses
150k. Result: ~629 v-syncs per 100M instructions in lockstep, but
only ~2 per 100M under `--parallel`.
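
For illustration, here is a minimal sketch of an instruction-count proxy of this kind. It assumes the ticker receives the cumulative instruction count and accumulates the delta since its previous call. Only VSYNC_INSTR_PERIOD = 150k comes from this commit; `InstrTicker` and its fields are hypothetical. The nominal-rate line is plain arithmetic: 10,000,000 / 150,000 ≈ 66.7 ticks per second, or roughly 60 Hz.

```rust
// Hypothetical sketch of an instruction-count v-sync proxy. Only
// VSYNC_INSTR_PERIOD comes from the commit; the rest is illustrative.
const VSYNC_INSTR_PERIOD: u64 = 150_000;

struct InstrTicker {
    last_count: u64, // cumulative instruction count seen on the previous call
    accum: u64,      // instructions accumulated toward the next v-sync
}

impl InstrTicker {
    fn new() -> Self {
        InstrTicker { last_count: 0, accum: 0 }
    }

    /// Feed the cumulative instruction count. Returns true once a full
    /// period has accumulated, then subtracts one period so the remainder
    /// carries into the next cycle (at most one v-sync per call).
    fn tick(&mut self, instr_count: u64) -> bool {
        self.accum += instr_count.saturating_sub(self.last_count);
        self.last_count = instr_count;
        if self.accum >= VSYNC_INSTR_PERIOD {
            self.accum -= VSYNC_INSTR_PERIOD;
            true
        } else {
            false
        }
    }
}

fn main() {
    // Simulate 100M instructions delivered in even 10k-instruction slices.
    let mut ticker = InstrTicker::new();
    let mut fired = 0u32;
    let mut count = 0u64;
    while count < 100_000_000 {
        count += 10_000;
        if ticker.tick(count) {
            fired += 1;
        }
    }
    println!("{fired} v-syncs per 100M instructions");
    // Nominal rate at ~10 MIPS (integer division).
    println!("{} Hz nominal", 10_000_000 / VSYNC_INSTR_PERIOD);
}
```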

The fix is a hybrid that keeps lockstep deterministic (the goldens
depend on this) while fixing --parallel:

* `tick_vsync_instr(instr_count)` — legacy instruction-count
  ticker, used by lockstep. Bit-stable across runs.

* `tick_vsync_wallclock()` — new Instant-based ticker. Fires
  `floor(elapsed / VSYNC_PERIOD)` v-syncs since the anchor
  and advances the anchor by that many full periods (no
  lazy backlog). Capped at INTERRUPT_QUEUE_CAP per call so a
  forward-jumping clock can't overflow the FIFO.

* `KernelState.parallel_active` flag, set at startup from
  `--parallel` or `XENIA_PARALLEL` (`1`, `true` or `yes`). Read by
  `coord_pre_round` in main.rs to choose between the two tickers.
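
Below is a minimal sketch of the wall-clock ticker described above. It rests on several assumptions: a 60 Hz VSYNC_PERIOD of 16,666,667 ns, INTERRUPT_QUEUE_CAP = 4 (the value quoted under Verification), and excess periods beyond the cap being dropped rather than queued. `WallclockTicker` and `tick_at` are hypothetical. `tick_at` takes `now` explicitly so the logic can be exercised without sleeping. It also returns a count, whereas the kernel uses the result of `tick_vsync_wallclock()` as a boolean; `tick_at(now) > 0` plays that role here.

```rust
use std::time::{Duration, Instant};

// Assumed values: a ~60 Hz period and the FIFO cap quoted in the commit message.
const VSYNC_PERIOD: Duration = Duration::from_nanos(16_666_667);
const INTERRUPT_QUEUE_CAP: u32 = 4;

/// Hypothetical wall-clock v-sync ticker (illustrative, not the kernel's type).
struct WallclockTicker {
    anchor: Instant, // start of the current, not-yet-completed period
}

impl WallclockTicker {
    fn new(start: Instant) -> Self {
        WallclockTicker { anchor: start }
    }

    /// Returns how many v-syncs to raise at time `now`: floor(elapsed /
    /// VSYNC_PERIOD), capped at INTERRUPT_QUEUE_CAP. The anchor advances by
    /// whole periods only, so the sub-period remainder carries over.
    fn tick_at(&mut self, now: Instant) -> u32 {
        let elapsed = now.saturating_duration_since(self.anchor);
        let due = elapsed.as_nanos() / VSYNC_PERIOD.as_nanos();
        if due == 0 {
            return 0;
        }
        // Assumption: all elapsed periods are consumed even if more than the
        // cap are due, so no backlog builds up after a clock jump.
        self.anchor += Duration::from_nanos((due * VSYNC_PERIOD.as_nanos()) as u64);
        due.min(INTERRUPT_QUEUE_CAP as u128) as u32
    }
}

fn main() {
    let t0 = Instant::now();
    let mut ticker = WallclockTicker::new(t0);

    // 2.5 periods elapsed: two v-syncs are due; half a period carries over.
    let n = ticker.tick_at(t0 + VSYNC_PERIOD * 5 / 2);
    println!("fired {n}");

    // After a one-second jump, 60 periods are due but only the cap fires.
    let n = ticker.tick_at(t0 + Duration::from_secs(1) + VSYNC_PERIOD * 5 / 2);
    println!("fired {n}");
}
```

Advancing the anchor by whole periods, rather than resetting it to `now`, keeps the sub-period remainder. That keeps the long-run rate at 60 Hz instead of drifting late by a fraction of a period on every call.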

Verification:

* cargo test --workspace --release: 561 passing (+3 new
  wall-clock tests vs prior 558 baseline).
* lockstep -n 100M --stable-digest: BIT-IDENTICAL to
  pre-Phase-3 baseline. interrupts_delivered preserved at
  ~630 (was ~629 pre-fix).
* --parallel --reservations-table -n 30M: interrupts_delivered
  rose from ~2 to 17. (The FIFO's INTERRUPT_QUEUE_CAP=4 still caps
  burst delivery. That is a separate bottleneck, to be addressed by
  raising the cap once --parallel queue depth becomes the next
  blocker.)

Trade-off: under --parallel the v-sync cadence now depends on
wall-clock time and is non-deterministic by design. --parallel was
already documented as non-deterministic (audit M05, PPCBUG-703).
Lockstep stays bit-identical, so the `sylpheed_n*m.json`
goldens are untouched.

Audit IDs: KRNBUG-D08 (closed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

commit 27d3608174
parent b82919bdd0
Author: MechaCat02
Date:   2026-05-03 17:34:30 +02:00

3 changed files with 144 additions and 23 deletions

@@ -786,6 +786,7 @@ fn cmd_exec_inner(
             v == "1" || v == "true" || v == "yes"
         });
     let parallel_active = parallel || parallel_via_env;
+    kernel.parallel_active = parallel_active;
     if reservations_table || reservations_via_env || parallel_active {
         kernel.reservations.enable();
         if !quiet {
@@ -1517,7 +1518,27 @@ fn coord_pre_round(
         );
     }
-    if kernel.interrupts.tick_vsync(stats.instruction_count) {
+    // KRNBUG-D08: backend-aware v-sync ticker.
+    //
+    // **Lockstep**: instruction-count ticker (deterministic; one tick per
+    // PPC block boundary, predictable cadence). The cadence drifts a bit
+    // from real 60 Hz but is bit-stable across runs, which matters for
+    // the `sylpheed_n*m.json` golden oracles.
+    //
+    // **--parallel**: wall-clock ticker. The instruction-count proxy
+    // dropped from 629 v-syncs/100M lockstep to ~2 under `--parallel`
+    // (audit M11) because the dispatcher executes more PPC instructions
+    // per tick callback when 6 worker threads share the kernel mutex,
+    // so the accumulator never crosses the 150k threshold. Wall-clock
+    // restores the ~60 Hz rate at the cost of bit-exact run reproducibility,
+    // which is acceptable under `--parallel` (M11 already documented
+    // `--parallel` as non-deterministic by design).
+    let fired = if kernel.parallel_active {
+        kernel.interrupts.tick_vsync_wallclock()
+    } else {
+        kernel.interrupts.tick_vsync_instr(stats.instruction_count)
+    };
+    if fired {
         use std::sync::atomic::Ordering;
         let mmio = kernel.gpu.mmio();
         let prev = mmio.d1mode_vblank_vline_status.load(Ordering::Relaxed);