xenia-rs/audit-runs/phase-b-state-equivalence/validation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Phase B — Validation record
All gates executed on 2026-05-13 against the patched canary
(`build-cross/bin/Windows/Debug/xenia_canary.exe` + renamed
`xenia_canary_phaseB.exe`) and ours (`target/release/xenia-rs` + renamed
`target/release/xenia-rs-phaseB`).
## Gate 1: cvar-OFF determinism
### ours
- Pre-patch digest: `audit-runs/phase-a-diff-harness/digest-post-patch-cvaroff.json` (Phase A baseline; Phase A's gate-1 already proved byte-identity to the genuine pre-patch).
- Post-Phase-B digest: `audit-runs/phase-b-state-equivalence/digest-post-phaseB-cvaroff.json`.
- Both runs: `check --stable-digest -n 50000000` against the same ISO.
- `diff` of the two files produces zero output. Byte-identical. **PASS.**
### canary
- Phase B adds three new CONFIG DUMP lines (`phase_b_snapshot_dir = ""`, `phase_b_snapshot_and_exit = false`, `phase_b_dump_section_content = false`). All other lines either match Phase A's accepted host-pointer/timing jitter or are unchanged.
- Smoke marker (`AUDIT-DEMO-SETUP-BEGIN`) still fires.
- **PASS** by the Phase A gate-1 method.
## Gate 2: Snapshot files well-formed
### ours
```
$ ls audit-runs/phase-b-state-equivalence/snap-001/ours/
config.json cpu_state.json kernel.json manifest.json memory.json vfs.json
```
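All six files parse as JSON, lead with `"schema_version": 1` (or contain it in the manifest), and have alphabetically sorted keys. This was verified by re-serializing: `serde_json::Map` is backed by a `BTreeMap` by default, so keys are emitted in sorted order unless the `preserve_order` feature is enabled. **PASS.**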
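
The sorted-keys check can also be reproduced outside the emulator. The following is a minimal sketch, not the procedure used for this gate: the helper names `keys_sorted` and `check_snapshot_text` are hypothetical, and it assumes ASCII keys, where Python's string ordering agrees with the byte ordering a `BTreeMap` produces.

```python
import json

def keys_sorted(node):
    """Return True if every JSON object under `node` lists its keys in sorted order."""
    if isinstance(node, dict):
        keys = list(node)  # json.loads keeps the key order found in the file
        return keys == sorted(keys) and all(keys_sorted(v) for v in node.values())
    if isinstance(node, list):
        return all(keys_sorted(v) for v in node)
    return True

def check_snapshot_text(text):
    """Parse one snapshot file's text; return (schema_version, keys_are_sorted)."""
    doc = json.loads(text)
    return doc.get("schema_version"), keys_sorted(doc)

# Hypothetical miniature documents, not real snapshot content.
sorted_doc = '{"a": {"x": 1, "y": 2}, "schema_version": 1}'
unsorted_doc = '{"schema_version": 1, "a": {"y": 2, "x": 1}}'

print(check_snapshot_text(sorted_doc))    # (1, True)
print(check_snapshot_text(unsorted_doc))  # (1, False)
```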
### canary
```
$ ls audit-runs/phase-b-state-equivalence/snap-001/canary/
config.json cpu_state.json kernel.json manifest.json memory.json vfs.json
```
Same six files, same shape. Note: canary's `phase_b_snapshot.cc` writes JSON via direct `fmt::format` rather than a JSON map, so keys are emitted in **insertion order, not alphabetical order**. The diff tool parses to dict before comparing, so this asymmetry has no functional impact (verified empirically — `diff_state.py` produces identical reports across multiple runs of either engine). It does mean the canary↔ours manifest hashes differ even when the underlying state is semantically identical; the diff tool falls back to full content comparison in that case. **PASS** with this caveat documented.
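
To illustrate why key order changes the raw file hash but not the parsed comparison, here is a small sketch. It is not taken from `diff_state.py`; the function names and the `gpr` field are hypothetical, chosen only for the example.

```python
import hashlib
import json

def raw_sha256(text):
    """Hash the file text exactly as written (sensitive to key order)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

def canonical_sha256(text):
    """Hash the document after normalizing key order and whitespace."""
    doc = json.loads(text)
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

# Same content, emitted in insertion order (canary-style) and sorted order (ours-style).
insertion_order = '{"pc": "0x824ab748", "gpr": [0, 1]}'
sorted_order = '{"gpr": [0, 1], "pc": "0x824ab748"}'

print(raw_sha256(insertion_order) == raw_sha256(sorted_order))              # False
print(json.loads(insertion_order) == json.loads(sorted_order))              # True
print(canonical_sha256(insertion_order) == canonical_sha256(sorted_order))  # True
```

A canonical hash of this kind would let manifest hashes agree across engines, but that is a design option, not something Phase B implements.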
## Gate 3: Hash-deterministic re-runs (ours)
Two runs of ours with identical args:
```
$ ./target/release/xenia-rs-phaseB exec --quiet \
--phase-b-snapshot-dir <dir> --phase-b-snapshot-and-exit <iso> # run 1
$ mv <dir>/ours <dir>/ours-a
$ ./target/release/xenia-rs-phaseB exec --quiet \
--phase-b-snapshot-dir <dir> --phase-b-snapshot-and-exit <iso> # run 2
$ diff -r <dir>/ours <dir>/ours-a && echo BYTE-IDENTICAL
BYTE-IDENTICAL
```
**PASS.** Re-running ours with the same args produces hash-identical snapshot files.
> The first re-run attempt produced a `config.json` mismatch because the
> two runs were given different `--phase-b-snapshot-dir` values (whose
> path string is embedded in `config.json::cvars.phase_b_snapshot_dir`).
> That field is in the diff tool's `SKIP_BY_FILE["config.json"]` skip
> set; the hash difference confirmed the skip rule is well-placed. With
> identical inputs the snapshots are byte-equal.
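
The effect of the skip rule can be sketched as follows. Only the name `SKIP_BY_FILE` and the skipped field come from this record; the dictionary shape (file name mapped to a set of dotted paths) and the helper `strip_skipped` are assumptions made for illustration.

```python
# Assumed shape: file name -> set of dotted key paths to ignore when diffing.
SKIP_BY_FILE = {"config.json": {"cvars.phase_b_snapshot_dir"}}

def strip_skipped(doc, skip_paths, prefix=""):
    """Return a copy of `doc` without the dotted paths listed in `skip_paths`."""
    if not isinstance(doc, dict):
        return doc
    out = {}
    for key, value in doc.items():
        path = f"{prefix}.{key}" if prefix else key
        if path in skip_paths:
            continue  # drop run-specific fields before comparing
        out[key] = strip_skipped(value, skip_paths, path)
    return out

# Two runs that differ only in the snapshot directory path.
run_a = {"cvars": {"phase_b_snapshot_dir": "/tmp/a", "phase_b_snapshot_and_exit": True}}
run_b = {"cvars": {"phase_b_snapshot_dir": "/tmp/b", "phase_b_snapshot_and_exit": True}}
skip = SKIP_BY_FILE["config.json"]

print(run_a == run_b)                                            # False
print(strip_skipped(run_a, skip) == strip_skipped(run_b, skip))  # True
```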
## Gate 4: Invariants (HARD GATE)
From `report.md`:
| invariant | canary | ours | ok? |
|---|---|---|---|
| xex_entry_point | `0x824ab748` | `0x824ab748` | **PASS** |
| cpu_state.pc == xex_entry_point | `0x824ab748 == 0x824ab748` (canary) | `0x824ab748 == 0x824ab748` (ours) | **PASS** |
| image_loaded_sha256 | `a70993b7…` | `ea8d160e…` | **FAIL → STOP** |
The PC + entry-point invariants prove the snapshot point is **equivalent across engines** — both fired immediately before the first instruction at the same address. This is the principal Phase B equivalence claim.
The `image_loaded_sha256` mismatch is the **expected STOP condition** per the spec. Phase B's contract is to detect and report this; investigation belongs to Phase C/D. `report.md` flags the mismatch explicitly and includes re-run guidance.
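
The hard-gate logic in the table amounts to three equality checks. A minimal sketch, assuming a flat dictionary per engine (the real snapshot layout is spread across several files, and `check_invariants` is a hypothetical name); the hash values are the truncated prefixes shown in the table, not full digests:

```python
def check_invariants(canary, ours):
    """Return the names of the Gate 4 invariants that do not hold."""
    checks = {
        "xex_entry_point": canary["xex_entry_point"] == ours["xex_entry_point"],
        "cpu_state.pc == xex_entry_point": (
            canary["pc"] == canary["xex_entry_point"]
            and ours["pc"] == ours["xex_entry_point"]
        ),
        "image_loaded_sha256": (
            canary["image_loaded_sha256"] == ours["image_loaded_sha256"]
        ),
    }
    return [name for name, ok in checks.items() if not ok]

# Values copied from the table above; hashes are truncated there and stay truncated here.
canary = {"xex_entry_point": "0x824ab748", "pc": "0x824ab748",
          "image_loaded_sha256": "a70993b7…"}
ours = {"xex_entry_point": "0x824ab748", "pc": "0x824ab748",
        "image_loaded_sha256": "ea8d160e…"}

failed = check_invariants(canary, ours)
print(failed)                               # ['image_loaded_sha256']
print("STOP" if failed else "continue")     # STOP
```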
## Gate 5: Diff-tool negative test
```
$ cp audit-runs/phase-b-state-equivalence/snap-001/ours/kernel.json /tmp/kernel-mut.json
$ sed -i 's/"thread_id": 1/"thread_id": 999/' /tmp/kernel-mut.json
$ mkdir -p /tmp/ours-mut && cp -r audit-runs/phase-b-state-equivalence/snap-001/ours/* /tmp/ours-mut/
$ cp /tmp/kernel-mut.json /tmp/ours-mut/kernel.json
$ python3 tools/diff-state/diff_state.py \
--canary audit-runs/phase-b-state-equivalence/snap-001/ours \
--ours /tmp/ours-mut --out /tmp/r.md
$ echo $?
1
```
`report.md` names two divergences:
- `kernel.json <manifest>` `manifest-hash-mismatch` — surfaces that `/tmp/ours-mut/kernel.json`'s SHA does not match what `/tmp/ours-mut/manifest.json` claims.
- `kernel.json objects[handle_semantic_id=…].details.thread_id` value=`canary=1, ours=999` — the actual mutation.
**PASS.**
> Verified 2026-05-13 (Phase A/B verify session). Pre-fix the diff tool
> trusted the manifest-claimed hashes without verifying them; a tampered
> file with an intact manifest copy would silently report "identical"
> (exit 0). The fix in [`diff_state.py`](../../tools/diff-state/diff_state.py)
> (around `diff_directory`) re-hashes each file, surfaces a
> `manifest-hash-mismatch` σ-structural divergence when the on-disk SHA
> does not match the manifest, and falls through to a full content diff.
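
The re-hash step described in the note can be sketched as below. This is not the code in `diff_state.py`: the manifest layout (`{"files": {name: sha256}}`) and the function name `verify_manifest` are assumptions for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path

def verify_manifest(snapshot_dir):
    """Return names of files whose on-disk SHA-256 differs from the manifest's claim."""
    root = Path(snapshot_dir)
    manifest = json.loads((root / "manifest.json").read_text())
    mismatched = []
    for name, claimed in sorted(manifest["files"].items()):
        actual = hashlib.sha256((root / name).read_bytes()).hexdigest()
        if actual != claimed:
            mismatched.append(name)
    return mismatched

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    body = b'{"thread_id": 1}'
    (root / "kernel.json").write_bytes(body)
    manifest = {"files": {"kernel.json": hashlib.sha256(body).hexdigest()}}
    (root / "manifest.json").write_text(json.dumps(manifest))
    before = verify_manifest(root)

    # Tamper with the file but leave the manifest untouched, as in the negative test.
    (root / "kernel.json").write_bytes(b'{"thread_id": 999}')
    after = verify_manifest(root)

print(before)  # []
print(after)   # ['kernel.json']
```

Any name returned here corresponds to what the report calls a `manifest-hash-mismatch`; the tool then continues to a full content diff instead of trusting the manifest.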
## Summary
| Gate | Status |
|---|---|
| 1. Cvar-OFF determinism (both engines) | PASS |
| 2. Snapshots well-formed (both engines) | PASS |
| 3. Hash-deterministic re-runs (ours) | PASS |
| 4. Invariants — pc == entry_point | PASS |
| 4. Invariants — image_loaded_sha256 | **FAIL → STOP** (expected: this is what Phase B catalogs) |
| 5. Diff-tool negative test | PASS |
## Cascade prediction at session close
- A (snapshot tool emits readable state both engines): **achieved**.
- B (section content hashes match): **NOT achieved**; `image_loaded_sha256` differs. The XEX is loaded into different post-decompression states between the two engines. This is the primary finding that Phase C will investigate, *not* a Phase B failure.
- C (divergence catalog produced with classification): **achieved** — 58 divergences across all 5 files, fully classified.
- D (fix lands): **N/A — out of scope for Phase B**.
## Notes on minor implementation choices
- Canary's `PPCContext` doesn't expose a `pc` field (the JIT dispatch loop manages PC). At the snapshot point the about-to-execute PC equals the `address` arg to `processor()->Execute(...)`, which the hook receives as `entry_address`; we emit that value as `cpu_state.pc`.
- Memory snapshots emit a **fixed named-region list** (XEX image, main stack, PCR, TLS) rather than walking the full page table. An earlier blanket-walk approach crashed in Wine because canary's `QueryRegionInfo` reports `COMMIT` for some pages whose host-side backing is reserved-not-committed (physical heap mirrors, low system heap). The named-region list is sufficient for the diff tool's cross-engine comparison.
- The `xex_header_sha256` field has a different format in each engine (canary emits a 64-bit `UserModule::hash()`; ours emits a placeholder zero string). Aligning the two is a known one-line shim; Phase B intentionally leaves the divergence in place to demonstrate the diff tool's δ-content class.