Source changes (dormant parity infra, retained from iterate 2.AI/2.AO): - xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring - xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
17 KiB
Phase A + Phase B verification report
Session: 2026-05-13. Reviewer: WRITE-mode verify pass over Phase A
(audit-runs/phase-a-diff-harness/) and Phase B
(audit-runs/phase-b-state-equivalence/) deliverables. Discipline:
no Phase C investigation, no XEX sha256 chase, no anchor-on-divergence.
Outcome
| Phase | Gates | Pre-fix | Post-fix |
|---|---|---|---|
| A | 4 | 4/4 PASS | 4/4 PASS |
| B | 5 | 4/5 PASS — gate 5 produced false-PASS | 5/5 PASS |
| Combined cvar-OFF determinism | 1 | PASS | PASS |
| Diff-tool synthetic edge cases | 5 (each tool) | PASS | PASS |
| Hook-point semantic equivalence | 2 (Phase A + Phase B) | PASS | PASS |
The "false-PASS" pre-fix is HIGH-severity and is detailed in
Issue-1 below. Without the fix, the negative-test gate of Phase B
silently passed when the test was actually broken — meaning a tampered
snapshot file with an intact manifest copy would have been reported as
"identical" by diff_state.py. The Phase B catalog (canary↔ours
divergences) is unaffected by this bug because canary's manifest hashes
legitimately differ from ours's, so the buggy short-circuit never
engaged for any real Phase B comparison.
Issues found and resolutions
Issue 1 — HIGH: diff_state.py manifest-hash short-circuit trusts
manifests without verification
Symptom. Re-running validation.md gate 5 verbatim produces
exit 0 and "0 divergences", not the documented exit 1. The
mutation (changing kernel.json thread_id: 1 → thread_id: 999) is
silently masked because the gate-5 procedure copies the original
manifest.json alongside the mutated file. Both manifests then claim
the same kernel.json hash, so the diff tool's manifest-hash
short-circuit (if ch == oh: file_status[name] = "identical") reports
the file as identical without comparing content.
Reproduction. audit-runs/phase-ab-verify/synthetic-diff-tests/
plus the verbatim gate-5 procedure (see this report's
Re-validation gate 5).
Fix. Patched
tools/diff-state/diff_state.py
diff_directory to re-hash both files when manifests claim equality
and only short-circuit when the on-disk SHAs match the manifest. When
they don't, a manifest-hash-mismatch σ-structural divergence is
emitted and the file is fully content-diffed, ensuring no silent
masking.
Re-validation.
- Verbatim gate-5 procedure now exits
1and names the divergence precisely (kernel.json objects[handle_semantic_id=…].details.thread_id canary=1 ours=999) plus themanifest-hash-mismatchσ row. - Stored Phase B report (
report.md) regenerates byte-identical (58 divergences, exit 2 STOP) — no regression on the legitimate canary↔ours comparison. - Self-diff of
snap-001/oursandsnap-001/canarycontinues to returnvalidate-identical: OKexit 0 — the optimization still applies to truthful manifests. - Inter-run reproducibility tests (
snap-002a/oursvssnap-002b/ours) also passvalidate-identical.
Issue 2 — MEDIUM: validation.md gate 5 documents a procedure that
relies on the buggy short-circuit
The gate-5 procedure as written in validation.md (and the claim that
it produced exit 1) was already inaccurate before this verification.
Either the gate was re-stated from memory rather than re-run at
landing, or the actual run used a different procedure.
Fix. Updated
audit-runs/phase-b-state-equivalence/validation.md
gate-5 entry to (a) keep the verbatim procedure, (b) name both
divergences the fixed diff tool now surfaces (manifest-hash-mismatch
σ + the actual mutation), and (c) include a footnote describing the
pre-fix bug and pointing at the diff_state.py change.
Issue 3 — LOW: validation.md gate 2 mis-claims canary's snapshot
JSON is sort-keys-sorted
Canary's phase_b_snapshot.cc writes JSON via direct fmt::format,
emitting keys in insertion order — schema_version, engine, pc, lr, ctr, …. ours's phase_b_snapshot.rs uses serde_json which emits
keys alphabetically (cr, ctr, deterministic_skip, engine, …). The
diff tool parses both sides into dicts before comparing, so this has
no functional impact on the catalog. It does mean that even
semantically-equivalent snapshots produce mismatching SHAs at the file
level, so the manifest-hash short-circuit in diff_state.py never
short-circuits canary↔ours comparisons (the underlying byte content
trivially differs even where the parsed semantics match).
Fix. Updated validation.md gate-2 entry to describe the actual
behavior accurately.
Issue 4 — LOW: schema kind count and unwired-list inaccuracies
audit-runs/phase-a-diff-harness/README.md claims "Schema v1 declares
11 event kinds" and "wires three" then lists four kinds. Actual count
in schema-v1.md: 13 sections with 16 distinct kind strings
(thread.suspend/thread.resume and vfs.open/vfs.read/vfs.close
share their respective sections).
ours-changes.md lists six unwired kind families but omits
thread.suspend/thread.resume. The Rust emitter API has 9 emit_*
functions, of which 3 are wired (4 if you count the synthetic
schema_version header) and 6 are stubbed. Five additional kinds
have no Rust function yet (thread.suspend, thread.resume,
mem.write, vfs.open, vfs.read, vfs.close).
Fix. Updated
README.md and
ours-changes.md to
distinguish wired / stubbed / not-yet-stubbed precisely and use
accurate counts. Did not add any new emitters or hooks (out of
scope per session brief).
Per-step verification record
Step 2 — Combined Phase A + Phase B cvar-OFF determinism
Ran the current target/release/xenia-rs (built from sources containing
both Phase A and Phase B) with no Phase A or Phase B cvars set:
$ ./target/release/xenia-rs check --stable-digest -n 50000000 \
--out audit-runs/phase-ab-verify/digest-current-cvaroff.json \
"<ISO>"
$ diff audit-runs/phase-a-diff-harness/digest-pre-patch.json \
audit-runs/phase-ab-verify/digest-current-cvaroff.json
# (no output)
PASS. Combined Phase A + Phase B cvar-OFF binary digest is byte-identical to the pre-Phase-A baseline.
Verified by md5sum that target/release/xenia-rs and
target/release/xenia-rs-phaseB are byte-identical (current build);
xenia-rs-phaseA-pre is older (pre-patch baseline).
Step 3 — Phase A four gates re-validated
| Gate | Result | Method |
|---|---|---|
| 1 cvar-OFF byte-identical (ours) | ✅ | Step 2 above |
| 1 cvar-OFF canary smoke marker fires | ✅ | Wine 18-s timed run with --mute=true; AUDIT-DEMO-SETUP-BEGIN and AUDIT-DEMO-SETUP-GRAPHICS-OK both observed in xenia.log. CONFIG DUMP shows the 5 expected new cvars (2 Phase A + 3 Phase B), all default empty/false. |
2 cvar-ON valid JSONL with schema_version first line |
✅ | All 121 363 lines of ours-sanity.jsonl and 1 635 789 lines of canary-sanity.jsonl parse as JSON. Both lead with {"schema_version":1,…,"kind":"schema_version",…}. Kind histogram: ours 3:1:1:1 ratio import.call/kernel.call/kernel.return/header (perfect — 40454 each); canary 1:545271:545270:545247 (24 in-flight calls when wineserver killed, expected). |
| 3 ≥100-event matching prefix on tid=6→tid=1 | ✅ | Re-ran diff_events.py on stored sanity logs; output byte-identical to stored diff-report.md. 113 matched events on canary tid=6 → ours tid=1; first divergence at idx 113 (KeQuerySystemTime return_value differs — Phase B/C input). |
| 4 negative test detects corruption at exact index | ✅ | Took first 100 events of ours-sanity.jsonl to /tmp/ours-short.jsonl; corrupted line 50 (tid_event_idx=48) by changing kind: import.call → kind: kernel.CORRUPT. Self-diff: exit 0 OK. Corrupt diff: exit 1, validate-identical: divergence in canary_tid=1 at tid_event_idx=48 (kind: canary='import.call' ours='kernel.CORRUPT'). |
Step 4 — Phase B five gates re-validated
| Gate | Result | Method |
|---|---|---|
| 1 cvar-OFF byte-identical (ours) | ✅ | Step 2 above |
| 1 cvar-OFF canary CONFIG DUMP shows 5 expected lines | ✅ | Same Wine smoke run; CONFIG DUMP [Audit] section includes phase_a_event_log_path, phase_a_event_log_mem_writes, phase_b_dump_section_content, phase_b_snapshot_and_exit, phase_b_snapshot_dir with default empty/false values. |
| 2 well-formed snapshots both engines | ✅ | Both snap-001 dirs contain 6 files; all parse as JSON; manifest SHA-256s match recomputed file hashes; ours's JSON is sort-keys-sorted, canary's is insertion-order (note Issue 3). |
| 3 hash-deterministic re-runs | ✅ ours | Two ours runs to different --phase-b-snapshot-dirs (snap-002a and snap-002b): validate-identical: OK exit 0. Same-dir re-run (snap-002c/ours vs snap-002c/ours-1): byte-identical via diff -r. |
| 3 hash-deterministic re-runs | ✅ canary | New canary snapshot snap-canary-002/canary vs existing snap-001/canary: validate-identical: OK exit 0. Full diff: 4 of 5 files identical, only config.json "diverged" with 0 reportable divergences (path/timestamp fields are skipped). |
4 invariant pc == entry_point == 0x824ab748 both engines |
✅ | Confirmed by inspecting snap-001/canary/cpu_state.json and snap-001/ours/cpu_state.json — both pc: "0x824ab748"; config.json::xex_entry_point: "0x824ab748" in both. |
4 invariant image_loaded_sha256 matches |
❌ FAIL → STOP | Reproduced canary a70993b77ca9e29218d033fad7c0b45c874676c4e0edd966545d39b266486a9c and ours ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18 across two independent runs each. Reproducible STOP condition; this is the documented Phase C handoff, not a Phase B failure. |
| 5 negative test detects mutation | ❌ → ✅ post-fix | Pre-fix: false PASS (Issue 1). Post-fix: exit 1, names both the manifest-hash-mismatch σ and the actual mutation γ. |
Step 5 — Hook-point semantic equivalence
Phase A boundary. Both engines hook at the kernel-export dispatch
boundary (canary: shim_utils.h::ExportRegistrerHelper::*::Trampoline;
ours: state.rs::call_export). Verified by inspecting the first 113
matched events on the boot thread:
- canary tid=6 [0]:
import.call RtlImageXexHeaderField(ord=299) - ours tid=1 [0]:
import.call RtlImageXexHeaderField(ord=299) - canary tid=6 [1]:
kernel.call RtlImageXexHeaderField - ours tid=1 [1]:
kernel.call RtlImageXexHeaderField - canary tid=6 [2]:
kernel.return RtlImageXexHeaderField - ours tid=1 [2]:
kernel.return RtlImageXexHeaderField
The 113-event matching prefix demonstrates the boundary captures the same kernel-call sequence on the boot thread of each engine through 113 calls.
Asymmetries.
- canary's debug build emits some kernel calls that complete before
shim_utils trampoline (24 in-flight calls when
wineserver -kkills the process — visible askernel.call > kernel.returncount imbalance). ours'scheck -nexit is clean. Not an asymmetry of the hook itself. - ours's
call_exportonly emits when an export isSome(&(name, func))in the dispatch table; unimplemented exports take the early return path and emit nothing. Canary's trampoline is per-shim; if canary has a shim where ours has no export, only canary will emit akernel.callfor it. This is an inherent boundary asymmetry that Phase C should be aware of, but it does NOT invalidate the matching prefix (the first 113 boot-thread calls are all on shared exports).
Phase B boundary. Both engines fire the snapshot hook immediately
before the first guest PPC instruction at entry_point on the boot
thread. PC == 0x824ab748 in both cpu_state.json files; thread_id
records the boot thread (canary 6, ours 1). No "instruction count" /
tbl_tbu field is captured, but the pc == entry_pc invariant is
sufficient: had any instructions executed, PC would have advanced.
Verdict. Both Phase A and Phase B hook points are semantically equivalent across engines for the in-scope event types. Asymmetries (unimplemented exports, kernel-call-count off-by-N at process kill) are inherent to the boundaries themselves, not bugs in the harness.
Step 6 — Diff-tool robustness (5 synthetic edge cases each)
diff_events.py
| Case | Input | Result |
|---|---|---|
| empty file | empty.jsonl |
SystemExit('empty file') exit 1, no crash |
| single event (header only) | single-event.jsonl (just schema_version) |
Auto-mapping finds no shared first kernel.call → exit 2 with clear message; no crash |
| missing schema header | first line is import.call |
SystemExit('first event is not schema_version') exit 1, clear message |
| mismatched thread tids | canary has only tid=2; ours has only tid=1, no shared first-call name | exit 2 with clear "no tid mapping" message |
| field comparison rules honored | self-diff of ours-sanity[0:99] |
exit 0; corruption at idx 48 → exit 1 with exact tid_event_idx=48 named |
diff_state.py
| Case | Input | Result |
|---|---|---|
| empty snapshot dirs | ds-empty/canary and ds-empty/ours (no JSON files) |
exit 2 STOP (invariants fail because config.json missing); 5 missing-file divergences |
| self-diff existing snapshot | snap-001/ours against itself |
validate-identical: OK exit 0 (legitimate manifest match still short-circuits correctly) |
| missing canary dir | /tmp/does-not-exist-xyz as canary |
exit 2 with "both snapshot dirs must exist" message |
| missing config.json | manifests present (empty) but no JSON files | exit 2 STOP (FileNotFoundError caught in check_invariants); 5 missing-file divergences |
| field mutation detection | snap-001/ours vs /tmp/verify-gate5 (kernel.json mutated, manifest copied verbatim) |
exit 1 (post-fix); names manifest-hash-mismatch σ + actual γ-content divergence |
All synthetic cases handled gracefully; no crashes, exit codes distinguish failure modes (1 = data divergence; 2 = STOP / invalid input).
Step 7 — Schema coverage scope
Schema-v1.md declares 13 sections (16 distinct kind strings). Phase A wires:
| status | kinds |
|---|---|
wired (call sites in state.rs::call_export + canary shim_utils.h) |
schema_version, import.call, kernel.call, kernel.return |
stubbed (Rust emit_* exists, no call site) |
thread.create, thread.exit, handle.create, handle.destroy, wait.begin, wait.end |
| not-yet-stubbed (no Rust function) | thread.suspend, thread.resume, mem.write, vfs.open, vfs.read, vfs.close |
Documentation updates (Issue 4) clarify which is which. Per session brief, NOT wiring any of the unwired kinds — that is Phase A+ / Phase C scope.
Confirmed Phase B image_loaded_sha256 mismatch (handed to Phase C)
Reproducible across two independent runs of each engine:
- canary:
a70993b77ca9e29218d033fad7c0b45c874676c4e0edd966545d39b266486a9c - ours:
ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18
xex_entry_point = 0x824ab748 and cpu_state.pc = 0x824ab748 in
both engines (these match — the snapshot point is equivalent). The
in-memory bytes loaded for the XEX image differ. Per Phase B contract,
this is the catalog finding handed to Phase C; verifier did not
investigate cause. Phase B's documented next-step (re-run with
--phase-b-dump-section-content, binary-diff section_contents[])
remains the correct Phase C entry point.
Files in this directory
| File | Purpose |
|---|---|
verification-report.md |
This file. |
re-validation.md |
Per-gate post-fix re-validation evidence (compact). |
digest-current-cvaroff.json |
Step 2 digest from current build. |
regenerated-phase-a-diff-report.md |
diff_events.py output on stored sanity logs (byte-identical to stored diff-report.md). |
regenerated-phase-b-report.md |
diff_state.py output on stored snap-001 (pre-fix; byte-identical to stored report.md). |
regenerated-phase-b-report-postfix.md |
Same, but generated post-fix (also byte-identical). |
snap-002a/ours/, snap-002b/ours/ |
Two independent ours snapshot runs (Phase B gate 3 reproducibility). |
snap-002c/ours/, snap-002c/ours-1/ |
Same-dir ours re-run (byte-equality test). |
snap-canary-002/canary/ |
Independent canary snapshot run (Phase B gate 3 reproducibility). |
coexist/ |
Phase A + Phase B cvars enabled simultaneously, ours brief run; jsonl + 5-file snapshot both emitted cleanly. |
synthetic-diff-tests/ |
Fixtures for Step 6 edge-case tests. |
Cascade prediction
- A re-verify gates with reproduction: achieved — all gates re-run, reproductions match.
- B identify ≥1 instrumentation bug or doc issue: achieved — Issue 1 HIGH (diff tool short-circuit), Issues 2–4 documentation.
- C fixes land + re-pass all gates: achieved — diff_state.py fix + 4 doc fixes; all gates pass post-fix; no regressions.
- D Phase C base is solid going forward: achieved, with the caveat that Issue 3 (canary insertion-order JSON) means inter-engine manifest-hash short-circuit will never fire, but the fall-through full-content-diff path covers this correctly.