test(check): ORACBUG-004 — sylpheed_n50m stable-digest oracle

Adds a regression-catcher golden for Sylpheed boot at -n 50M lockstep,
covering the first VdSwap pair (the n2m oracle is swap-blind because the
first VdSwap fires at ~18M instructions).

The new --stable-digest flag emits/compares only fields that are
deterministic in lockstep:
  instructions, imports, unimpl, draws, swaps, unique_render_targets,
  shader_blobs_live, texture_cache_entries

Excluded:
  - packets: empirically ±2-8% lockstep variance (GPU thread race per
    audit M11)
  - resolves, interrupts_delivered, interrupts_dropped, texture_decodes:
    scheduling-sensitive under --parallel
  - path: cwd-dependent

Empirical determinism: 3 consecutive lockstep -n 50M runs produce
byte-identical stable-digest output.

The n4b canonical-invocation golden the audit's recommended next sprint
also called for is deferred. Per audit memory
`--parallel --reservations-table` is pathologically slow (>32 min for
-n 100M), so -n 4B in that mode would be many hours per run, not the
5-15 min the plan estimated. n4b will be captured one-shot
post-renderer-unblock as a manual artifact under audit-runs/post-fix/,
not as a test golden. See crates/xenia-app/tests/golden/README.md.

Test infrastructure:
- crates/xenia-app/tests/sylpheed_oracles.rs: invokes
  CARGO_BIN_EXE_xenia-rs against the ISO. Path resolved via SYLPHEED_ISO
  env var (skips gracefully if missing).
- #[ignore]-gated; run via:
    cargo test --release -p xenia-app --test sylpheed_oracles \
      -- --ignored --nocapture

Closes ORACBUG-004 (P0). Partial: ORACBUG-006 (P1 deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 13:46:02 +02:00
parent 62f673d094
commit 1f416aaa2e
4 changed files with 214 additions and 2 deletions
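
Note on the stable-digest comparison described in the commit message: the idea can be sketched as an allowlist filter applied to both the golden and the fresh digest before comparing. This is a minimal illustration only, assuming the digest can be modelled as a flat map from counter name to integer. `stable_view` and `matches_golden` are hypothetical names, not functions from the xenia-rs source; only the field list is taken from the commit message.

```rust
use std::collections::BTreeMap;

/// Fields the commit message lists as deterministic in lockstep.
const STABLE_FIELDS: [&str; 8] = [
    "instructions",
    "imports",
    "unimpl",
    "draws",
    "swaps",
    "unique_render_targets",
    "shader_blobs_live",
    "texture_cache_entries",
];

/// Hypothetical helper: keep only the allowlisted counters of a digest.
/// Modelling the digest as a name -> u64 map is an assumption.
fn stable_view(digest: &BTreeMap<String, u64>) -> BTreeMap<String, u64> {
    digest
        .iter()
        .filter(|(name, _)| STABLE_FIELDS.contains(&name.as_str()))
        .map(|(name, value)| (name.clone(), *value))
        .collect()
}

/// Hypothetical helper: two digests match if their stable views are equal,
/// so noisy counters such as `packets` cannot cause a mismatch.
fn matches_golden(golden: &BTreeMap<String, u64>, actual: &BTreeMap<String, u64>) -> bool {
    stable_view(golden) == stable_view(actual)
}
```

An allowlist (rather than a list of excluded fields) means a counter added to the digest later is ignored until someone explicitly declares it stable, which seems the safer default for a golden.
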
--- a/crates/xenia-app/tests/golden/README.md
+++ b/crates/xenia-app/tests/golden/README.md
@@ -0,0 +1,72 @@
+# Sylpheed regression goldens
+
+These JSON files anchor `xenia-rs check` digest output for Project Sylpheed.
+
+## Files
+
+| File | -n | Mode | Captures |
+|------|----|------|----------|
+| `sylpheed_n2m.json` | 2_000_000 | full digest | early boot (swaps=0, no rendering) |
+| `sylpheed_n50m.json` | 50_000_000 | stable-digest | first VdSwap pair (swaps=2 post-Phase-A) |
+
+## Stable-digest mode
+
+`sylpheed_n50m.json` is captured with `--stable-digest`, which omits
+timing-sensitive counters: `packets` (±2–8% lockstep noise from a GPU thread
+race), `resolves`, `interrupts_delivered`, `interrupts_dropped`,
+`texture_decodes`. The remaining fields are byte-identical across repeated
+lockstep runs at a fixed -n.
+
+`sylpheed_n2m.json` predates the stable-digest flag and uses full-digest
+compare. It still works because at -n 2M the GPU pipeline has not produced any
+packets yet — `packets=0` is trivially deterministic.
+
+## Circularity hazard
+
+Per ORACBUG-001/002/003, these goldens were captured by running the same code
+they validate. They detect **regression** from a known-good snapshot, not
+**correctness**. When a planned fix intentionally moves the digest (e.g. a
+shader fix landing `draws > 0` for the first time), re-baseline the golden as
+a separate commit and reference the audit ID in the message.
+
+## Re-baselining
+
+```sh
+cargo build --release -p xenia-app
+target/release/xenia-rs check \
+    "$SYLPHEED_ISO" \
+    -n 50000000 \
+    --stable-digest \
+    --out crates/xenia-app/tests/golden/sylpheed_n50m.json
+```
+
+## Running the goldens
+
+```sh
+cargo test --release -p xenia-app --test sylpheed_oracles -- --ignored --nocapture
+```
+
+The tests are `#[ignore]`-gated because each run takes a few seconds, too slow
+for the default `cargo test` cycle. The ISO path defaults to the contributor's
+local `~/RE Project Sylpheed/Project Sylpheed*.iso`; override it with
+`SYLPHEED_ISO=/path/to/sylpheed.iso`. Tests skip gracefully if the ISO is missing.
+
+## n4b canonical-invocation regression anchor (deferred)
+
+The audit's recommended next sprint also called for a `sylpheed_n4b.json`
+golden capturing the canonical reference invocation
+`xenia-rs check sylpheed.iso -n 4_000_000_000 --parallel --reservations-table`.
+This is **deferred** because:
+
+1. The `--parallel --reservations-table` combination is pathologically slow in
+   practice: more than 32 min per run at -n 100M, per the audit memory. At -n 4B
+   a run would take many hours, not the single-session-friendly 5–15 min the
+   original plan estimated.
+2. Each phase that intentionally moves rendering counters (C, D, E, F) would
+   need a re-baseline of n4b — a significant time cost compounding over the
+   sprint.
+
+Once the renderer-unblock phases (C+D+E) land and `draws > 0` is confirmed at
+-n 100M lockstep, an n4b artifact may be captured one-shot and stored under
+`audit-runs/post-fix/` (not as a test golden) as a manual regression anchor for
+the canonical invocation.
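
Note on the cost argument in item 1 of the deferred-n4b section: the "many hours" figure can be made concrete with a back-of-the-envelope extrapolation. The sketch below assumes run time scales linearly with `-n`, which the source does not claim, and treats the reported ">32 min at -n 100M" as a lower bound. `min_hours_at` is a hypothetical helper, not part of xenia-rs.

```rust
/// Lower-bound wall-clock estimate in hours for a run of `target_n`
/// instructions, given one measured run. ASSUMPTION: run time scales
/// linearly with the instruction count.
fn min_hours_at(target_n: u64, measured_n: u64, measured_minutes: f64) -> f64 {
    let scale = target_n as f64 / measured_n as f64;
    scale * measured_minutes / 60.0
}

/// Lower-bound estimate for -n 4B from the reported 32 min at -n 100M.
fn n4b_lower_bound_hours() -> f64 {
    // 4_000_000_000 / 100_000_000 = 40x the measured instruction count.
    min_hours_at(4_000_000_000, 100_000_000, 32.0)
}
```

Under that assumption a single n4b capture costs at least about 21 hours (40 × 32 min = 1280 min), and each re-baseline in phases C through F would pay that cost again.
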