[iterate-2E] Extend coherent monotonic clock to lockstep (timebase-desync livelock fix)

Lockstep livelocked the scheduler the same way --parallel did before
0332d19: the kernel deadline-arithmetic (`now_basis_at`) read per-thread
`ctx(hw_id).timebase`, but a parked/poll thread has `running_idx == None`
so `Scheduler::ctx()` returns `idle_ctx` (timebase 0). A poll thread (tid=7,
a `KeWaitForSingleObject` loop with a 30ms relative timeout) computing its
deadline via `parse_timeout` therefore read `now = 0` and registered
`deadline = 0 + 3000 = 3000` — a constant ~7.78M units in the past.
`coord_idle_advance` then re-armed that same constant 3000 deadline forever,
pinning virtual time and starving every other thread's real future deadline.
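
A minimal sketch of the arithmetic behind the bug, assuming a deadline is
simply `basis + relative_timeout`. `deadline_from` and `is_due` are
hypothetical helpers, not the real `parse_timeout` or scheduler code, and the
7,780,000 figure in the usage note is only an illustrative stand-in for
"~7.78M units":

```rust
/// Hypothetical stand-in for a relative-timeout deadline computation:
/// the deadline is the "now" basis plus the relative timeout.
fn deadline_from(now_basis: u64, relative_timeout: u64) -> u64 {
    now_basis.saturating_add(relative_timeout)
}

/// True when `deadline` is already due at virtual time `vtime`.
fn is_due(deadline: u64, vtime: u64) -> bool {
    deadline <= vtime
}
```

With a basis of 0, `deadline_from(0, 3000)` is 3000, which `is_due` reports as
already expired against a clock near 7,780,000. With the real basis,
`deadline_from(7_780_000, 3000)` lands in the future and is not due.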

Render-gate impact: the submitter (tid=6) re-enters a 16ms-timeout
WaitForMultiple after its first jobs. With vtime pinned at 3000, virtual time
never reached that real future deadline, so the timeout never fired.

Fix (Option A — mirror the parallel fix): drive the existing deterministic
`Scheduler::global_clock` in lockstep too (floored up once per outer round
to `stats.instruction_count`, a pure function of retired guest instructions —
no wall-clock), and route `KernelState::now_basis_at` through `global_clock()`
in BOTH modes. New `Scheduler::advance_global_clock_to(now)` floor-up keeps it
monotone alongside `advance_all_timebases_to`. Parallel behavior unchanged
(it already read `global_clock()`).
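
A minimal sketch of the floor-up rule, assuming the clock is a plain `u64`
field. The `Scheduler` struct below is a hypothetical stand-in, not the real
type, which may store the clock differently (for example atomically, for the
parallel path):

```rust
/// Hypothetical stand-in for the scheduler's coherent monotonic clock.
#[derive(Debug, Default)]
struct Scheduler {
    global_clock: u64,
}

impl Scheduler {
    /// Floor-up: raise the clock to `now`, but never move it backwards.
    fn advance_global_clock_to(&mut self, now: u64) {
        self.global_clock = self.global_clock.max(now);
    }

    /// Read the current monotonic "now".
    fn global_clock(&self) -> u64 {
        self.global_clock
    }
}
```

Because the update is a `max`, a call with an older value is a no-op, so
feeding it the retired-instruction count once per round keeps the clock
monotone and reproducible.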

Verified (lockstep, 50M):
- DETERMINISM: two cold `check -n 5M` and two cold `-n 50M` runs byte-identical.
- LIVELOCK GONE: "advanced to deadline" went from 592,679 fires / 2 unique
  values / 562,084 pinned at 3000  ->  18,586 fires / 18,567 unique /
  0 pinned, strictly increasing 5.4M -> 50M. Poll thread tid=7 now ends
  Blocked with a real future deadline Some(60002824) instead of spin-Ready
  on the past 3000.
- imports 1,790,936 -> 92,317 at 50M (the spin no longer burns import calls).

Cascade (lockstep, XENIA_CACHE_PERSIST=1, -n 200M): engine now runs to budget
instead of hard-deadlocking. Hub enqueue (sub_82458068) 4x; submitter dequeue
(sub_82458508) still 3x — the lost 4th-job HANDOFF (count/notify between
sub_82458068's tail and the submitter queue) is a SEPARATE downstream gate,
not the timebase. New gate: tid=5 (hub) Blocked INFINITE on event 0x1080
(job-4 completion); tid=6 (submitter) Ready, parked in WaitForMultiple
(sub_824AB214), loop-top stops at cycle 6.23M. draws still 0, VdSwap 1.

Golden re-baseline (same commit): sylpheed_n50m
  instructions 50000004 -> 50000007, imports 1790936 -> 92317
  (swaps/draws/RTs/shaders/textures unchanged). sylpheed_n2m unchanged
  (the livelock sets in only after 2M). Suite 665/665 + oracle green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-12 21:42:28 +02:00
parent 5aaadfec36
commit 7e2603a9e5
4 changed files with 72 additions and 33 deletions


@@ -2830,6 +2830,19 @@ fn run_execution(
// Both calls are no-ops when `XENIA_SILPH_UI_AUTOSIGNAL_DELAY`
// is unset (the pending queue stays empty).
kernel.set_now_cycle_hint(stats.instruction_count);
// Drive the coherent monotonic "now" the kernel deadline-arithmetic
// reads (`KernelState::now_basis_at` -> `Scheduler::global_clock`)
// from the deterministic retired-instruction count. Floored up (never
// backwards). This is the LOCKSTEP analogue of the parallel writeback's
// `advance_global_clock`: a parked/poll thread computing a relative
// timeout via `parse_timeout` now reads a real, non-zero, monotone
// basis instead of `idle_ctx`'s timebase-0, so its deadline lands in
// the future and `coord_idle_advance` stops re-arming the constant
// past deadline forever (the timebase-desync livelock / render-gate
// root). Pure function of guest instructions -> bit-reproducible.
kernel
.scheduler
.advance_global_clock_to(stats.instruction_count);
kernel.fire_due_silph_autosignals(stats.instruction_count);
dispatch_graphics_interrupts(
kernel,