Adds a regression-catcher golden for Sylpheed boot at -n 50M lockstep,
covering the first VdSwap pair (the n2m oracle is swap-blind because
the first VdSwap fires at ~18M instructions). The new --stable-digest
flag emits/compares only fields that are deterministic in lockstep:

  instructions, imports, unimpl, draws, swaps,
  unique_render_targets, shader_blobs_live, texture_cache_entries

Excluded:
  packets: empirically ±2-8% lockstep variance (GPU thread race per
    audit M11)
  resolves, interrupts_delivered, interrupts_dropped, texture_decodes:
    scheduling-sensitive under --parallel
  path: cwd-dependent

Empirical determinism: 3 consecutive lockstep -n 50M runs produce
byte-identical stable-digest output.

The n4b canonical-invocation golden, which the audit's recommended next
sprint also called for, is deferred. Per audit memory, `--parallel
--reservations-table` is pathologically slow (>32 min for -n 100M), so
-n 4B in that mode would take many hours per run, not the 5-15 min the
plan estimated. n4b will be captured one-shot post-renderer-unblock as
a manual artifact under audit-runs/post-fix/, not as a test golden. See
crates/xenia-app/tests/golden/README.md.

Test infrastructure:
- crates/xenia-app/tests/sylpheed_oracles.rs: invokes
  CARGO_BIN_EXE_xenia-rs against the ISO. Path resolved via SYLPHEED_ISO
  env var (skips gracefully if missing).
- #[ignore]-gated; run via:
    cargo test --release -p xenia-app --test sylpheed_oracles \
      -- --ignored --nocapture

Closes ORACBUG-004 (P0). Partial: ORACBUG-006 (P1 deferred).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
# Sylpheed regression goldens

These JSON files anchor `xenia-rs check` digest output for Project Sylpheed.

## Files

| File | -n | Mode | Captures |
|------|----|------|----------|
| `sylpheed_n2m.json` | 2_000_000 | full digest | early boot (swaps=0, no rendering) |
| `sylpheed_n50m.json` | 50_000_000 | stable-digest | first VdSwap pair (swaps=2 post-Phase-A) |

## Stable-digest mode

`sylpheed_n50m.json` is captured with `--stable-digest`, which omits
timing-sensitive counters: `packets` (±2–8% lockstep noise from a GPU thread
race), `resolves`, `interrupts_delivered`, `interrupts_dropped`,
`texture_decodes`. The remaining fields are byte-identical across repeated
lockstep runs at a fixed -n.

`sylpheed_n2m.json` predates the stable-digest flag and uses full-digest
compare. It still works because at -n 2M the GPU pipeline has not produced any
packets yet, so `packets=0` is trivially deterministic.

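As an illustration of what "stable" means here, the following is a minimal sketch of filtering a digest down to the deterministic fields. It is not the actual `--stable-digest` implementation: `STABLE_FIELDS` and `stable_subset` are hypothetical names, and the digest is modelled as plain name/value pairs rather than the real JSON output. The field list itself is the one named in the commit that introduced the flag.

```rust
/// Fields reported as deterministic in lockstep.
/// (The constant name is hypothetical; the list comes from the commit message.)
const STABLE_FIELDS: [&str; 8] = [
    "instructions",
    "imports",
    "unimpl",
    "draws",
    "swaps",
    "unique_render_targets",
    "shader_blobs_live",
    "texture_cache_entries",
];

/// Hypothetical helper: keep only the stable fields of a digest,
/// preserving the order in which they appear in the input.
fn stable_subset<'a>(digest: &[(&'a str, u64)]) -> Vec<(&'a str, u64)> {
    digest
        .iter()
        .copied()
        .filter(|(name, _)| STABLE_FIELDS.iter().any(|f| *f == *name))
        .collect()
}
```

Two runs whose full digests differ only in `packets` or `resolves` would compare equal after `stable_subset`, which is the property the n50m golden relies on.
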
## Circularity hazard

Per ORACBUG-001/002/003, these goldens were captured by running the same code
they validate. They detect **regression** from a known-good snapshot, not
**correctness**. When a planned fix intentionally moves the digest (e.g. a
shader fix landing `draws > 0` for the first time), re-baseline the golden as
a separate commit and reference the audit ID in the message.

## Re-baselining

```sh
cargo build --release -p xenia-app
target/release/xenia-rs check \
  "$SYLPHEED_ISO" \
  -n 50000000 \
  --stable-digest \
  --out crates/xenia-app/tests/golden/sylpheed_n50m.json
```

## Running the goldens

```sh
cargo test --release -p xenia-app --test sylpheed_oracles -- --ignored --nocapture
```

The tests are `#[ignore]`-gated because each run takes a few seconds, which is
unacceptable in the default `cargo test` cycle. The ISO path defaults to the
contributor's local `~/RE Project Sylpheed/Project Sylpheed*.iso` and can be
overridden via `SYLPHEED_ISO=/path/to/sylpheed.iso`.

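The override-then-default lookup described above could be sketched roughly as follows. This is an assumption about the shape of the logic, not a copy of `sylpheed_oracles.rs`: `resolve_iso` is a hypothetical helper, and the local default is passed in already resolved because the standard library has no glob support.

```rust
use std::path::PathBuf;

/// Hypothetical helper: pick the ISO path from an optional override
/// (the value of SYLPHEED_ISO) or, failing that, an optional local default.
/// Returns None when the chosen path is not an existing file, so the
/// calling test can skip instead of failing.
fn resolve_iso(env_override: Option<&str>, default: Option<PathBuf>) -> Option<PathBuf> {
    env_override
        .filter(|s| !s.is_empty()) // treat an empty variable as unset
        .map(PathBuf::from)
        .or(default)
        .filter(|p| p.is_file())
}
```

A test would typically call it as `resolve_iso(std::env::var("SYLPHEED_ISO").ok().as_deref(), default)` and return early with a message on `None`.
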
## n4b canonical-invocation regression anchor (deferred)

The audit's recommended next sprint also called for a `sylpheed_n4b.json`
golden capturing the canonical reference invocation
`xenia-rs check sylpheed.iso -n 4_000_000_000 --parallel --reservations-table`.
This is **deferred** because:

1. The `--parallel --reservations-table` combination is pathologically slow in
   practice: a -n 100M run takes >32 min per the audit memory. At -n 4B the run
   cost is many hours, not the single-session-friendly 5–15 min the original
   plan estimated.
2. Each phase that intentionally moves rendering counters (C, D, E, F) would
   need a re-baseline of n4b, a significant time cost that compounds over the
   sprint.

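The "many hours" figure in item 1 can be sanity-checked with simple arithmetic. The sketch below assumes run time scales linearly with the instruction budget, which is an assumption rather than a measurement; `scaled_minutes` is a hypothetical helper and not part of xenia-rs.

```rust
/// Scale a measured run time to a larger instruction budget.
/// Assumes cost grows linearly with the -n value (not verified).
fn scaled_minutes(measured_min: u64, measured_n: u64, target_n: u64) -> u64 {
    measured_min * target_n / measured_n
}
```

With the audit's lower bound of 32 min at -n 100M, `scaled_minutes(32, 100_000_000, 4_000_000_000)` gives 1280 minutes, i.e. more than 21 hours per run under that assumption.
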
Once the renderer-unblock phases (C+D+E) land and `draws > 0` is confirmed at
-n 100M lockstep, an n4b artifact may be captured one-shot and stored under
`audit-runs/post-fix/` (not as a test golden) as a manual regression anchor for
the canonical invocation.