Files
xenia-rs/crates/xenia-app/tests/golden/README.md
MechaCat02 1f416aaa2e test(check): ORACBUG-004 — sylpheed_n50m stable-digest oracle
Adds a regression-catcher golden for Sylpheed boot at -n 50M lockstep,
covering the first VdSwap pair (the n2m oracle is swap-blind because
the first VdSwap fires at ~18M instructions). The new --stable-digest
flag emits/compares only fields that are deterministic in lockstep:

  instructions, imports, unimpl, draws, swaps,
  unique_render_targets, shader_blobs_live, texture_cache_entries

Excluded:

  packets — empirically ±2-8% lockstep variance (GPU thread race per
    audit M11)
  resolves, interrupts_delivered, interrupts_dropped, texture_decodes —
    scheduling-sensitive under --parallel
  path — cwd-dependent

Empirical determinism: 3 consecutive lockstep -n 50M runs produce
byte-identical stable-digest output.

The n4b canonical-invocation golden the audit's recommended next sprint
also called for is deferred. Per audit memory `--parallel
--reservations-table` is pathologically slow (>32 min for -n 100M), so
-n 4B in that mode would be many hours per run, not the 5-15 min the
plan estimated. n4b will be captured one-shot post-renderer-unblock as
a manual artifact under audit-runs/post-fix/, not as a test golden. See
crates/xenia-app/tests/golden/README.md.

Test infrastructure:
- crates/xenia-app/tests/sylpheed_oracles.rs — invokes
  CARGO_BIN_EXE_xenia-rs against the ISO. Path resolved via SYLPHEED_ISO
  env var (skips gracefully if missing).
- #[ignore]-gated; run via:
    cargo test --release -p xenia-app --test sylpheed_oracles \\
      -- --ignored --nocapture

Closes ORACBUG-004 (P0). Partial: ORACBUG-006 (P1 deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 13:46:02 +02:00

2.8 KiB
Raw Permalink Blame History

Sylpheed regression goldens

These JSON files anchor xenia-rs check digest output for Project Sylpheed.

Files

File -n Mode Captures
sylpheed_n2m.json 2_000_000 full digest early boot (swaps=0, no rendering)
sylpheed_n50m.json 50_000_000 stable-digest first VdSwap pair (swaps=2 post-Phase-A)

Stable-digest mode

sylpheed_n50m.json is captured with --stable-digest, which omits timing-sensitive counters: packets (±28% lockstep noise from a GPU thread race), resolves, interrupts_delivered, interrupts_dropped, texture_decodes. The remaining fields are byte-identical across repeated lockstep runs at a fixed -n.

sylpheed_n2m.json predates the stable-digest flag and uses full-digest compare. It still works because at -n 2M the GPU pipeline has not produced any packets yet — packets=0 is trivially deterministic.

Circularity hazard

Per ORACBUG-001/002/003, these goldens were captured by running the same code they validate. They detect regression from a known-good snapshot, not correctness. When a planned fix intentionally moves the digest (e.g. a shader fix landing draws > 0 for the first time), re-baseline the golden as a separate commit and reference the audit ID in the message.

Re-baselining

cargo build --release -p xenia-app
target/release/xenia-rs check \
    "$SYLPHEED_ISO" \
    -n 50000000 \
    --stable-digest \
    --out crates/xenia-app/tests/golden/sylpheed_n50m.json

Running the goldens

cargo test --release -p xenia-app --test sylpheed_oracles -- --ignored --nocapture

The tests are #[ignore]-gated because each run takes a few seconds, which is unacceptable in the default cargo test cycle. The ISO path defaults to the contributor's local ~/RE Project Sylpheed/Project Sylpheed*.iso and can be overridden via SYLPHEED_ISO=/path/to/sylpheed.iso.

n4b canonical-invocation regression anchor (deferred)

The audit's recommended next sprint also called for a sylpheed_n4b.json golden capturing the canonical reference invocation xenia-rs check sylpheed.iso -n 4_000_000_000 --parallel --reservations-table. This is deferred because:

  1. The --parallel --reservations-table combination is empirically pathologically slow at -n 100M (>32 min per run per the audit memory). At -n 4B the run cost is many hours, not the single-session-friendly 515 min the original plan estimated.
  2. Each phase that intentionally moves rendering counters (C, D, E, F) would need a re-baseline of n4b — a significant time cost compounding over the sprint.

Once the renderer-unblock phases (C+D+E) land and draws > 0 is confirmed at -n 100M lockstep, an n4b artifact may be captured one-shot and stored under audit-runs/post-fix/ (not as a test golden) as a manual regression anchor for the canonical invocation.