xenia-rs/audit-runs/phase-b-state-equivalence/validation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Phase B — Validation record

All gates were executed on 2026-05-13 against the patched canary (build-cross/bin/Windows/Debug/xenia_canary.exe, plus the renamed xenia_canary_phaseB.exe) and against ours (target/release/xenia-rs, plus the renamed target/release/xenia-rs-phaseB).

Gate 1: cvar-OFF determinism

ours

  • Pre-Phase-B digest: audit-runs/phase-a-diff-harness/digest-post-patch-cvaroff.json (the Phase A baseline; Phase A's gate 1 already proved it byte-identical to the genuine pre-patch digest).
  • Post-Phase-B digest: audit-runs/phase-b-state-equivalence/digest-post-phaseB-cvaroff.json.
  • Both runs: check --stable-digest -n 50000000 against the same ISO.
  • diff of the two files produces zero output. Byte-identical. PASS.

canary

  • Phase B adds three new CONFIG DUMP lines (phase_b_snapshot_dir = "", phase_b_snapshot_and_exit = false, phase_b_dump_section_content = false). All other lines either match Phase A's accepted host-pointer/timing jitter or are unchanged.
  • Smoke marker (AUDIT-DEMO-SETUP-BEGIN) still fires.
  • PASS by the Phase A gate-1 method.

Gate 2: Snapshot files well-formed

ours

$ ls audit-runs/phase-b-state-equivalence/snap-001/ours/
config.json  cpu_state.json  kernel.json  manifest.json  memory.json  vfs.json

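All six files parse as JSON and contain "schema_version": 1 (either directly or via the manifest). Keys are sorted alphabetically at every level. This was verified by re-serializing, and it follows from serde_json::Map being BTreeMap-backed by default (insertion order is preserved only when the preserve_order feature is enabled). PASS.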
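
A minimal Python sketch of the key-order check follows. It assumes each snapshot file is a single JSON document. keys_sorted and check_snapshot_dir are hypothetical helpers and are not part of diff_state.py. Python's json.loads keeps object keys in file order, so comparing each object's key list against its sorted form detects unsorted output. For UTF-8 strings, Python's code-point ordering agrees with the byte ordering a Rust BTreeMap<String, _> uses.

```python
import json
from pathlib import Path


def keys_sorted(value) -> bool:
    """True if every JSON object nested inside `value` lists its keys in sorted order."""
    if isinstance(value, dict):
        keys = list(value)
        return keys == sorted(keys) and all(keys_sorted(v) for v in value.values())
    if isinstance(value, list):
        return all(keys_sorted(v) for v in value)
    return True  # scalars have no keys


def check_snapshot_dir(snap_dir: Path) -> list[str]:
    """Hypothetical gate-2 helper: report files that fail to parse or have unsorted keys."""
    problems = []
    for path in sorted(snap_dir.glob("*.json")):
        try:
            doc = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            problems.append(f"{path.name}: invalid JSON ({exc})")
            continue
        if not keys_sorted(doc):
            problems.append(f"{path.name}: keys not sorted")
    return problems
```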

canary

$ ls audit-runs/phase-b-state-equivalence/snap-001/canary/
config.json  cpu_state.json  kernel.json  manifest.json  memory.json  vfs.json

Same six files, same shape. Note: canary's phase_b_snapshot.cc writes JSON through direct fmt::format calls rather than through a JSON map, so keys appear in the order the code emits them, not alphabetically. The diff tool parses each file into a dict before comparing, so this asymmetry has no functional impact (verified empirically: diff_state.py produces identical reports across multiple runs of either engine). It does mean that canary↔ours manifest hashes differ even when the underlying state is semantically identical; in that case the diff tool falls back to a full content comparison. PASS, with this caveat documented.
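
A minimal Python sketch of that comparison order follows. It assumes whole-file SHA-256 hashes. compare_file and the sample byte strings are illustrative and are not taken from diff_state.py. The two samples encode the same state in sorted order (ours) and in emission order (canary). Their hashes differ, but the parsed dicts compare equal because Python dict equality ignores key order.

```python
import hashlib
import json


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def compare_file(canary: bytes, ours: bytes) -> str:
    """Hash fast path first; on a hash mismatch, fall back to comparing parsed content."""
    if sha256_hex(canary) == sha256_hex(ours):
        return "identical"
    # dict equality ignores key order, so emission-order vs. sorted output compares equal
    if json.loads(canary) == json.loads(ours):
        return "equivalent"
    return "divergent"


# Hypothetical file contents: same state, different key order.
ours_bytes = b'{"pc": "0x824ab748", "schema_version": 1}'
canary_bytes = b'{"schema_version": 1, "pc": "0x824ab748"}'

print(compare_file(canary_bytes, ours_bytes))  # equivalent
print(compare_file(ours_bytes, ours_bytes))    # identical
```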

Gate 3: Hash-deterministic re-runs (ours)

Two runs of ours with identical args:

$ ./target/release/xenia-rs-phaseB exec --quiet \
    --phase-b-snapshot-dir <dir> --phase-b-snapshot-and-exit <iso>   # run 1
$ mv <dir>/ours <dir>/ours-a
$ ./target/release/xenia-rs-phaseB exec --quiet \
    --phase-b-snapshot-dir <dir> --phase-b-snapshot-and-exit <iso>   # run 2
$ diff -r <dir>/ours <dir>/ours-a && echo BYTE-IDENTICAL
BYTE-IDENTICAL

PASS. Re-running ours with the same args produces hash-identical snapshot files.

The first re-run attempt produced a config.json mismatch because the two runs were given different --phase-b-snapshot-dir values (whose path string is embedded in config.json::cvars.phase_b_snapshot_dir). That field is in the diff tool's SKIP_BY_FILE["config.json"] skip set; the hash difference confirmed the skip rule is well-placed. With identical inputs the snapshots are byte-equal.
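
The following minimal Python sketch shows one way such a skip set could work. It is an assumption, not the actual diff_state.py code. Skipped fields are removed from a deep copy of each parsed document before comparison. Two runs that differ only in the snapshot directory therefore compare equal, even though their raw bytes and hashes do not. The dotted-path shape of SKIP_BY_FILE and the helper names are hypothetical.

```python
import copy

# Hypothetical shape: the real SKIP_BY_FILE in diff_state.py may be structured differently.
SKIP_BY_FILE = {
    "config.json": {"cvars.phase_b_snapshot_dir"},
}


def drop_path(doc: dict, dotted: str) -> None:
    """Delete the field at a dotted path such as 'cvars.phase_b_snapshot_dir', if present."""
    *parents, leaf = dotted.split(".")
    node = doc
    for key in parents:
        node = node.get(key)
        if not isinstance(node, dict):
            return  # path absent: nothing to skip
    node.pop(leaf, None)


def apply_skips(filename: str, doc: dict) -> dict:
    """Return a pruned deep copy; the caller's document is left untouched."""
    pruned = copy.deepcopy(doc)
    for dotted in SKIP_BY_FILE.get(filename, ()):
        drop_path(pruned, dotted)
    return pruned


run_a = {"cvars": {"phase_b_snapshot_dir": "/tmp/snap-a", "other": 1}}
run_b = {"cvars": {"phase_b_snapshot_dir": "/tmp/snap-b", "other": 1}}
print(apply_skips("config.json", run_a) == apply_skips("config.json", run_b))  # True
```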

Gate 4: Invariants (HARD GATE)

From report.md:

| invariant | canary | ours | ok? |
|---|---|---|---|
| xex_entry_point | 0x824ab748 | 0x824ab748 | PASS |
| cpu_state.pc == xex_entry_point | 0x824ab748 == 0x824ab748 | 0x824ab748 == 0x824ab748 | PASS |
| image_loaded_sha256 | a70993b7… | ea8d160e… | FAIL → STOP |

The PC and entry-point invariants prove that the snapshot point is equivalent across engines: both snapshots were taken immediately before the first instruction executes, at the same address. This is the principal Phase B equivalence claim.

The image_loaded_sha256 mismatch is the expected STOP condition per the spec. Phase B's contract is to detect and report this; investigation belongs to Phase C/D. The report.md flags it explicitly with re-run guidance.
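
The hard-gate semantics can be sketched in Python as follows. This is a minimal illustration, assuming the three invariant inputs have already been extracted from each engine's snapshot. SnapshotFacts, check_invariants and gate_verdict are hypothetical names. The hash strings in the usage lines are the truncated prefixes from the table, used only as placeholders.

```python
from dataclasses import dataclass


@dataclass
class SnapshotFacts:
    """Hypothetical flattened view of the three invariant inputs from one engine."""
    xex_entry_point: int
    pc: int
    image_loaded_sha256: str


def check_invariants(canary: SnapshotFacts, ours: SnapshotFacts) -> list[tuple[str, bool]]:
    return [
        ("xex_entry_point", canary.xex_entry_point == ours.xex_entry_point),
        ("cpu_state.pc == xex_entry_point",
         canary.pc == canary.xex_entry_point and ours.pc == ours.xex_entry_point),
        ("image_loaded_sha256", canary.image_loaded_sha256 == ours.image_loaded_sha256),
    ]


def gate_verdict(results: list[tuple[str, bool]]) -> str:
    """Hard gate: any failed invariant halts the pipeline."""
    return "PASS" if all(ok for _, ok in results) else "STOP"


# Placeholder hashes: truncated prefixes only, not real digests.
canary = SnapshotFacts(0x824AB748, 0x824AB748, "a70993b7")
ours = SnapshotFacts(0x824AB748, 0x824AB748, "ea8d160e")
print(gate_verdict(check_invariants(canary, ours)))  # STOP
```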

Gate 5: Diff-tool negative test

$ cp audit-runs/phase-b-state-equivalence/snap-001/ours/kernel.json /tmp/kernel-mut.json
$ sed -i 's/"thread_id": 1/"thread_id": 999/' /tmp/kernel-mut.json
$ mkdir -p /tmp/ours-mut && cp -r audit-runs/phase-b-state-equivalence/snap-001/ours/* /tmp/ours-mut/
$ cp /tmp/kernel-mut.json /tmp/ours-mut/kernel.json
$ python3 tools/diff-state/diff_state.py \
    --canary audit-runs/phase-b-state-equivalence/snap-001/ours \
    --ours /tmp/ours-mut --out /tmp/r.md
$ echo $?
1

The generated report (/tmp/r.md) names two divergences:

  • kernel.json <manifest> manifest-hash-mismatch — surfaces that /tmp/ours-mut/kernel.json's SHA does not match what /tmp/ours-mut/manifest.json claims.
  • kernel.json objects[handle_semantic_id=…].details.thread_id value=canary=1, ours=999 — the actual mutation.

PASS.

Verified 2026-05-13 (Phase A/B verify session). Before the fix, the diff tool trusted the manifest-claimed hashes without verifying them, so a tampered file accompanied by an untouched manifest was silently reported as "identical" (exit 0). The fix in diff_state.py (around diff_directory) re-hashes each file and surfaces a manifest-hash-mismatch σ-structural divergence when the on-disk SHA does not match the manifest. It then falls through to a full content diff.
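
The re-hash step can be sketched in Python as follows. This is a minimal version, assuming the manifest maps each filename to a SHA-256 hex digest under a "files" key. That layout and the helper names are hypothetical and not taken from diff_state.py. Under the fixed behaviour described above, any name the check returns is reported as manifest-hash-mismatch and is then compared by content instead of being trusted.

```python
import hashlib
import json
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def manifest_mismatches(claimed: dict[str, str], on_disk: dict[str, bytes]) -> list[str]:
    """Names whose on-disk SHA-256 differs from the manifest's claim, or which are missing."""
    bad = []
    for name, claimed_hex in sorted(claimed.items()):
        data = on_disk.get(name)
        if data is None or sha256_hex(data) != claimed_hex:
            bad.append(name)
    return bad


def read_snapshot(snap_dir: Path) -> tuple[dict[str, str], dict[str, bytes]]:
    """Load the manifest's claims and the raw bytes of every file it lists."""
    manifest = json.loads((snap_dir / "manifest.json").read_text(encoding="utf-8"))
    claimed = manifest["files"]  # hypothetical layout: {filename: sha256 hex}
    on_disk = {n: (snap_dir / n).read_bytes() for n in claimed if (snap_dir / n).is_file()}
    return claimed, on_disk
```

Keeping the check as a pure function over in-memory bytes lets the tamper scenario from gate 5 be exercised without touching the filesystem.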

Summary

| Gate | Status |
|---|---|
| 1. Cvar-OFF determinism (both engines) | PASS |
| 2. Snapshots well-formed (both engines) | PASS |
| 3. Hash-deterministic re-runs (ours) | PASS |
| 4. Invariants: pc == entry_point | PASS |
| 4. Invariants: image_loaded_sha256 | FAIL → STOP (expected: this is what Phase B catalogs) |
| 5. Diff-tool negative test | PASS |

Cascade prediction at session close

  • A (snapshot tool emits readable state both engines): achieved.
  • B (section content hashes match): NOT achieved; image_loaded_sha256 differs. The XEX is loaded into different post-decompression states in the two engines. This is the primary finding for Phase C to investigate, not a Phase B failure.
  • C (divergence catalog produced with classification): achieved — 58 divergences across all 5 files, fully classified.
  • D (fix lands): N/A — out of scope for Phase B.

Notes on minor implementation choices

  • Canary's PPCContext doesn't expose a pc field (the JIT dispatch loop manages PC). At the snapshot point the about-to-execute PC equals the address arg to processor()->Execute(...), which the hook receives as entry_address; we emit that value as cpu_state.pc.
  • Memory snapshots emit a fixed named-region list (XEX image, main stack, PCR, TLS) rather than walking the full page table. An earlier blanket-walk approach crashed in Wine because canary's QueryRegionInfo reports COMMIT for some pages whose host-side backing is reserved-not-committed (physical heap mirrors, low system heap). The named-region list is sufficient for the diff tool's cross-engine comparison.
  • The xex_header_sha256 field uses a different format in each engine (canary emits a 64-bit UserModule::hash(); ours emits a placeholder zero string). Aligning them is a known one-line shim. Phase B intentionally leaves it undone so the divergence demonstrates the diff tool's δ-content class.