[iterate-3AJ] Present-anchor vsync so the splash logo fade-in renders

The publisher/dev splash logo's intro fade-IN was skipped: the logo
popped in at full brightness instead of ramping dim->bright like the
canary oracle. Root (measured, iterate-3AF/3AI): ours' guest vsync
counter is fed by a fixed-instruction-quantum proxy (one vsync per
150k retired instructions). During the ~1.1s splash asset-load the
title's frame pump runs ~10M instructions inside a single guest frame,
so the proxy fired ~66 vsyncs in that one frame. The pump's per-frame
delta (counter_now - counter_last) was therefore ~66 on the first tick,
which the anim tick (sub_823CDBF8) divides into the fade counter
[item+72] @ 0x40c0add0 -> the counter JUMPED 0->0x42(66) in one step,
landing past the fade-in region. Canary's wall-clock 60Hz vblank
advances ~1 per heavy load frame, so its counter ramps smoothly 0->66
and the fade-in renders.

Fix: anchor the lockstep vsync ticker to the guest's real present rate
(VdSwap count), mirroring real hardware where the title double-buffers
at vblank, so one heavy guest frame advances the vsync counter by ~1
instead of ~66.

- interrupts.rs: tick_vsync_instr now takes the live present count.
  Two regimes: (1) bootstrap, before the guest's first present, keeps
  the original fixed instruction quantum unchanged -- the iterate-2W
  present-loop bootstrap needs vsyncs delivered BEFORE it can present
  (measured: callback registered ~6M instr, first delivered vsync and
  first present coincide; pure present-driven vsync would deadlock).
  (2) present-anchored, after the first present: one vblank per present,
  plus a small DRY_FALLBACK_CAP=4 instruction-quantum fallback per dry
  window so a non-presenting frame still ticks a few vsyncs (a small
  ramp like canary's 0/5/10/2/1...) without re-spiking to 66.
- handle.rs: cheap GpuBackend::swaps_seen() accessor.
- main.rs: pass the live present count into the lockstep ticker.

Not masking: the fade dt/counter is never clamped or synthesized; the
guest naturally computes a smooth dt once vblank tracks presents.

Verified:
- V1: fade counter 0x40c0add0 now ramps 0,6,8,10,12,13,+1... (was a
  0->0x42 jump; direct baseline-vs-fix mem-watch).
- V2 (--ui readback via per-frame logo vertex-alpha): logo alpha ramps
  102,136,204,221,238,254 (dim->bright fade-IN) vs baseline all 255
  (pop-in). Real artwork (has_real_vertices) still renders; milestone-1
  intact.
- V3: 150M boot progression intact -- texture_decodes=2, RTs=2,
  tex_cache=1 unchanged; draws/swaps higher (tighter present loop),
  1B sanity linear, no stall/collapse.
- V4: 50M --gpu-inline --stable-digest byte-identical 2x; golden
  re-baselined intentionally (pacing-only delta: draws 718->1274,
  swaps 147->259; structural fields unchanged). 688 tests green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This commit is contained in:
MechaCat02
2026-06-19 21:01:33 +02:00
parent c62a355418
commit 9d24dd0eaa
4 changed files with 189 additions and 19 deletions

View File

@@ -2151,7 +2151,13 @@ fn coord_pre_round(
let fired = if kernel.parallel_active { let fired = if kernel.parallel_active {
kernel.interrupts.tick_vsync_wallclock() kernel.interrupts.tick_vsync_wallclock()
} else { } else {
kernel.interrupts.tick_vsync_instr(stats.instruction_count) // iterate-3AJ: present-anchored — pass the guest's live present
// (`VdSwap`) count so vsync tracks the real present rate once the
// guest is presenting (≈1 vblank/present), instead of firing a
// fixed instruction quantum that over-fires ~66× during one heavy
// splash asset-load frame and collapsed the logo fade-in.
let presents = kernel.gpu.swaps_seen();
kernel.interrupts.tick_vsync_instr(stats.instruction_count, presents)
}; };
if fired { if fired {
use std::sync::atomic::Ordering; use std::sync::atomic::Ordering;

View File

@@ -1,9 +1,9 @@
{ {
"instructions": 50000014, "instructions": 50000007,
"imports": 352251, "imports": 333453,
"unimpl": 0, "unimpl": 0,
"draws": 718, "draws": 1274,
"swaps": 147, "swaps": 259,
"unique_render_targets": 2, "unique_render_targets": 2,
"shader_blobs_live": 6, "shader_blobs_live": 6,
"texture_cache_entries": 1 "texture_cache_entries": 1

View File

@@ -444,6 +444,23 @@ impl GpuBackend {
} }
} }
/// Current guest present (`VdSwap`) count. Cheap single-field read used
/// by the present-anchored vsync ticker (iterate-3AJ) every scheduler
/// round. Inline mode reads the live counter directly; threaded mode
/// reads the last-published digest mirror under a brief lock (the
/// `--parallel` path uses the wall-clock vsync ticker anyway, so the
/// exact freshness here is not load-bearing).
pub fn swaps_seen(&self) -> u64 {
match self {
GpuBackend::Inline(s) => s.stats.swaps_seen,
GpuBackend::Threaded(h) => h
.digest
.lock()
.map(|d| d.stats.swaps_seen)
.unwrap_or(0),
}
}
/// Forward [`GpuSystem::has_pending_interrupts`] under inline mode; /// Forward [`GpuSystem::has_pending_interrupts`] under inline mode;
/// under threaded mode peek the `int_rx` channel. /// under threaded mode peek the `int_rx` channel.
pub fn has_pending_interrupts(&self) -> bool { pub fn has_pending_interrupts(&self) -> bool {

View File

@@ -183,6 +183,28 @@ pub struct InterruptState {
/// ticker. `tick_vsync_instr` diffs against this to advance /// ticker. `tick_vsync_instr` diffs against this to advance
/// `vsync_accumulator`. /// `vsync_accumulator`.
pub last_instr_count: u64, pub last_instr_count: u64,
/// **iterate-3AJ — present-anchored vsync.** Set `true` once the guest
/// has presented at least one frame (a `VdSwap`). Before this, the
/// vsync ticker uses the legacy fixed instruction-quantum cadence so
/// the boot present-loop bootstrap (iterate-2W) still gets the vsyncs
/// it needs *before* the first present. After this, vsync is anchored
/// to the guest's real present rate (≈1 vblank per present, as on real
/// hardware where the title double-buffers at vblank), with only a
/// small capped instruction-quantum *fallback* for frames where the
/// guest genuinely stops presenting (heavy asset load). This stops the
/// proxy from firing ~66 vsyncs during one heavy load frame, which
/// collapsed the splash-logo intro fade-in (the guest's vsync counter
/// jumped 0→66 in one frame instead of ramping smoothly).
pub vsync_present_anchored: bool,
/// Last observed guest present (`VdSwap`) count. `tick_vsync_instr`
/// diffs the live count against this each call to emit one vblank per
/// new present once `vsync_present_anchored` is set.
pub last_present_count: u64,
/// How many *fallback* (non-present-driven) vsyncs have fired in the
/// current dry (no-present) window. Reset to 0 whenever a present
/// occurs. Capped at [`DRY_FALLBACK_CAP`] so one heavy non-presenting
/// frame cannot fire a long burst of vsyncs (the fade-in regression).
pub dry_fallback_fired: u32,
/// Wall-clock anchor for the production v-sync ticker. `None` until /// Wall-clock anchor for the production v-sync ticker. `None` until
/// the first `tick_vsync_wallclock` call (lazy init so unit tests /// the first `tick_vsync_wallclock` call (lazy init so unit tests
/// that never invoke that function don't construct an Instant). /// that never invoke that function don't construct an Instant).
@@ -208,6 +230,21 @@ pub struct InterruptState {
/// determinism. /// determinism.
pub const VSYNC_INSTR_PERIOD: u64 = 150_000; pub const VSYNC_INSTR_PERIOD: u64 = 150_000;
/// **iterate-3AJ — present-anchored vsync fallback.**
///
/// Once the guest is in its present loop (`vsync_present_anchored`), each
/// guest present emits exactly one vblank — vsync *is* the present cadence,
/// as on real Xbox 360 hardware where the title double-buffers at vblank.
/// For a frame where the guest stops presenting (e.g. the ~1.1 s splash
/// asset-load), we still need *some* vsyncs to keep timers / the present
/// loop alive, but firing one per [`VSYNC_INSTR_PERIOD`] would reproduce the
/// ~66-vsync spike that collapsed the fade-in. So the fallback fires one
/// vblank per `VSYNC_INSTR_PERIOD` of *non-presenting* instructions, but at
/// most [`DRY_FALLBACK_CAP`] per dry window (the counter resets on each
/// present). A heavy load frame therefore advances the guest vsync counter
/// by ≤ `DRY_FALLBACK_CAP` (a small ramp like canary's 0/5/10/2/1…), not 66.
pub const DRY_FALLBACK_CAP: u32 = 4;
/// Wall-clock period for the **production** v-sync ticker. 16.667 ms /// Wall-clock period for the **production** v-sync ticker. 16.667 ms
/// targets exactly 60 Hz. KRNBUG-D08 — converting from the /// targets exactly 60 Hz. KRNBUG-D08 — converting from the
/// instruction-count proxy fixes the `--parallel` rate drop while /// instruction-count proxy fixes the `--parallel` rate drop while
@@ -254,23 +291,83 @@ impl InterruptState {
self.pending.pop_front().map(|(source, _)| source) self.pending.pop_front().map(|(source, _)| source)
} }
/// **Legacy** — instruction-count v-sync ticker. Kept for unit tests /// **Present-anchored** instruction-paced v-sync ticker (the lockstep
/// that need a deterministic clock source. Production code calls /// production path; also used by unit tests for a deterministic clock).
/// `tick_vsync_wallclock` instead. Returns `true` if at least one ///
/// v-sync was queued. /// `current_instr_count` is the running retired-instruction count.
pub fn tick_vsync_instr(&mut self, current_instr_count: u64) -> bool { /// `present_count` is the guest's running `VdSwap` count (monotonic).
///
/// Two regimes:
///
/// 1. **Bootstrap** (`!vsync_present_anchored`, i.e. before the guest's
/// first present): legacy fixed-quantum cadence — one vsync per
/// [`VSYNC_INSTR_PERIOD`] retired instructions. The boot present loop
/// (iterate-2W) needs vsyncs delivered *before* it can present, so
/// this regime is unchanged from the original ticker. The first
/// observed present flips `vsync_present_anchored`.
///
/// 2. **Present-anchored** (after the first present): one vblank per
/// guest present (vsync *is* the present cadence on real hardware),
/// plus a small capped instruction-quantum fallback ([`DRY_FALLBACK_CAP`]
/// per dry window) so a frame where the guest stops presenting (heavy
/// asset load) still ticks a *few* vsyncs — not ~66, which collapsed
/// the splash fade-in.
///
/// Returns `true` if at least one v-sync was queued.
pub fn tick_vsync_instr(&mut self, current_instr_count: u64, present_count: u64) -> bool {
let delta = current_instr_count.saturating_sub(self.last_instr_count); let delta = current_instr_count.saturating_sub(self.last_instr_count);
self.last_instr_count = current_instr_count; self.last_instr_count = current_instr_count;
self.vsync_accumulator = self.vsync_accumulator.saturating_add(delta); self.vsync_accumulator = self.vsync_accumulator.saturating_add(delta);
if self.vsync_accumulator < VSYNC_INSTR_PERIOD {
return false; let new_presents = present_count.saturating_sub(self.last_present_count);
self.last_present_count = present_count;
if new_presents > 0 {
self.vsync_present_anchored = true;
} }
let periods = self.vsync_accumulator / VSYNC_INSTR_PERIOD;
self.vsync_accumulator %= VSYNC_INSTR_PERIOD; // Regime 1 — bootstrap: legacy fixed instruction quantum. Preserves
for _ in 0..periods { // the iterate-2W present-loop bootstrap exactly (vsyncs must fire
// before the guest can present).
if !self.vsync_present_anchored {
if self.vsync_accumulator < VSYNC_INSTR_PERIOD {
return false;
}
let periods = self.vsync_accumulator / VSYNC_INSTR_PERIOD;
self.vsync_accumulator %= VSYNC_INSTR_PERIOD;
for _ in 0..periods {
self.queue_interrupt(INTERRUPT_SOURCE_VSYNC, VSYNC_TARGET_CPU);
}
return true;
}
// Regime 2 — present-anchored.
let mut queued = false;
if new_presents > 0 {
// One vblank per guest present. `queue_interrupt` caps the FIFO,
// so a burst of presents in one round can't flood. A fresh
// present resets the dry-window state.
for _ in 0..new_presents {
self.queue_interrupt(INTERRUPT_SOURCE_VSYNC, VSYNC_TARGET_CPU);
}
self.vsync_accumulator = 0;
self.dry_fallback_fired = 0;
queued = true;
} else if self.vsync_accumulator >= VSYNC_INSTR_PERIOD
&& self.dry_fallback_fired < DRY_FALLBACK_CAP
{
// Dry frame (no present this tick): the guest stopped presenting
// (heavy load). Tick a *capped* number of fallback vsyncs so
// timers/the present loop stay alive without re-introducing the
// ~66-vsync spike. Consume one period per fired vsync so the
// accumulator paces the few fallbacks.
self.vsync_accumulator -= VSYNC_INSTR_PERIOD;
self.dry_fallback_fired += 1;
self.queue_interrupt(INTERRUPT_SOURCE_VSYNC, VSYNC_TARGET_CPU); self.queue_interrupt(INTERRUPT_SOURCE_VSYNC, VSYNC_TARGET_CPU);
queued = true;
} }
true
queued
} }
/// **Production** — wall-clock v-sync ticker. Fires /// **Production** — wall-clock v-sync ticker. Fires
@@ -364,9 +461,10 @@ mod tests {
let mut s = InterruptState::default(); let mut s = InterruptState::default();
s.set_callback(0x1000, 0xAB); s.set_callback(0x1000, 0xAB);
assert_eq!(VSYNC_INSTR_PERIOD, 150_000); assert_eq!(VSYNC_INSTR_PERIOD, 150_000);
assert!(!s.tick_vsync_instr(VSYNC_INSTR_PERIOD - 1)); // present_count = 0 → bootstrap regime (legacy fixed quantum).
assert!(!s.tick_vsync_instr(VSYNC_INSTR_PERIOD - 1, 0));
assert!(s.pending.is_empty()); assert!(s.pending.is_empty());
assert!(s.tick_vsync_instr(VSYNC_INSTR_PERIOD)); assert!(s.tick_vsync_instr(VSYNC_INSTR_PERIOD, 0));
assert_eq!(s.peek_next(), Some(INTERRUPT_SOURCE_VSYNC)); assert_eq!(s.peek_next(), Some(INTERRUPT_SOURCE_VSYNC));
} }
@@ -376,10 +474,59 @@ mod tests {
// be delivered, not lost. // be delivered, not lost.
let mut s = InterruptState::default(); let mut s = InterruptState::default();
s.set_callback(0x1000, 0xAB); s.set_callback(0x1000, 0xAB);
assert!(s.tick_vsync_instr(VSYNC_INSTR_PERIOD * 3 + 10)); // present_count = 0 → bootstrap regime drains all 3 periods at once.
assert!(s.tick_vsync_instr(VSYNC_INSTR_PERIOD * 3 + 10, 0));
assert_eq!(s.pending.len(), 3); assert_eq!(s.pending.len(), 3);
} }
#[test]
fn tick_vsync_instr_present_anchors_after_first_present() {
// iterate-3AJ: once the guest presents, vsync tracks presents (one
// vblank per present), NOT the fixed instruction quantum.
let mut s = InterruptState::default();
s.set_callback(0x1000, 0xAB);
// Bootstrap: instruction quantum fires (present_count still 0).
assert!(s.tick_vsync_instr(VSYNC_INSTR_PERIOD, 0));
assert_eq!(s.pending.len(), 1);
let _ = s.take_next();
// First present flips to anchored: exactly one vblank for the present.
assert!(s.tick_vsync_instr(VSYNC_INSTR_PERIOD * 2, 1));
assert!(s.vsync_present_anchored);
assert_eq!(s.pending.len(), 1);
let _ = s.take_next();
}
#[test]
fn tick_vsync_instr_heavy_dry_frame_capped_not_spiking() {
// iterate-3AJ: the regression. A heavy non-presenting frame retires
// ~10M instructions; the OLD ticker fired ~66 vsyncs (10M/150k) in
// that single frame, jumping the guest vsync counter 0→66 and
// skipping the fade-in. The present-anchored ticker caps the dry
// window at DRY_FALLBACK_CAP.
let mut s = InterruptState::default();
s.set_callback(0x1000, 0xAB);
// Enter anchored mode via one present.
let mut instr: u64 = VSYNC_INSTR_PERIOD;
assert!(s.tick_vsync_instr(instr, 1));
while s.take_next().is_some() {}
// Simulate a 10M-instruction frame with NO new present, ticked in
// chunks (as coord_pre_round would). Count fallback vsyncs queued.
let mut fallback = 0usize;
for _ in 0..100 {
instr += 100_000; // 100 chunks × 100k = 10M instructions
if s.tick_vsync_instr(instr, 1) {
while s.take_next().is_some() {
fallback += 1;
}
}
}
assert_eq!(
fallback, DRY_FALLBACK_CAP as usize,
"a heavy dry frame must cap fallback vsyncs at DRY_FALLBACK_CAP, \
not fire ~66"
);
}
#[test] #[test]
fn tick_vsync_wallclock_first_call_sets_anchor() { fn tick_vsync_wallclock_first_call_sets_anchor() {
// First call seeds the anchor and never fires. KRNBUG-D08: // First call seeds the anchor and never fires. KRNBUG-D08: