xenia-rs/audit-runs/phase-ab-verify/verification-report.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Phase A + Phase B verification report
Session: 2026-05-13. Reviewer: WRITE-mode verify pass over Phase A
(`audit-runs/phase-a-diff-harness/`) and Phase B
(`audit-runs/phase-b-state-equivalence/`) deliverables. Discipline:
no Phase C investigation, no XEX sha256 chase, no anchor-on-divergence.
## Outcome
| Phase | Gates | Pre-fix | Post-fix |
|---|---|---|---|
| A | 4 | 4/4 PASS | 4/4 PASS |
| B | 5 | 4/5 PASS — gate 5 produced false-PASS | 5/5 PASS |
| Combined cvar-OFF determinism | 1 | PASS | PASS |
| Diff-tool synthetic edge cases | 5 (each tool) | PASS | PASS |
| Hook-point semantic equivalence | 2 (Phase A + Phase B) | PASS | PASS |

The pre-fix false PASS is HIGH severity and is detailed in Issue 1
below. Without the fix, the Phase B negative-test gate passed silently
even though the test itself was broken: a tampered snapshot file
accompanied by an intact copy of the original manifest would have been
reported as "identical" by `diff_state.py`. The Phase B catalog
(canary↔ours divergences) is unaffected by this bug, because canary's
manifest hashes legitimately differ from ours's, so the buggy
short-circuit never engaged for any real Phase B comparison.
## Issues found and resolutions
### Issue 1 — HIGH: `diff_state.py` manifest-hash short-circuit trusts manifests without verification
**Symptom.** Re-running `validation.md` gate 5 verbatim produces
`exit 0` and "0 divergences", *not* the documented `exit 1`. The
mutation (changing `kernel.json` `thread_id: 1` → `thread_id: 999`) is
silently masked because the gate-5 procedure copies the original
`manifest.json` alongside the mutated file. Both manifests then claim
the same `kernel.json` hash, so the diff tool's manifest-hash
short-circuit (`if ch == oh: file_status[name] = "identical"`) reports
the file as identical without comparing content.
**Reproduction.** `audit-runs/phase-ab-verify/synthetic-diff-tests/`
plus the verbatim gate-5 procedure (see this report's
`Re-validation gate 5`).
**Fix.** Patched
[`tools/diff-state/diff_state.py`](../../tools/diff-state/diff_state.py)
`diff_directory` to re-hash both files when manifests claim equality
and only short-circuit when the on-disk SHAs match the manifest. When
they don't, a `manifest-hash-mismatch` σ-structural divergence is
emitted *and* the file is fully content-diffed, ensuring no silent
masking.
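
The verify-before-trusting logic described in the fix can be sketched as follows. This is a minimal illustration, not the actual `diff_directory` code: the manifest layout (file name mapped to a hex SHA-256), the function names `sha256_of` and `classify_file`, and the `needs-content-diff` status string are all assumptions made for the example. Only the `identical` status and the `manifest-hash-mismatch` label come from this report. With a tampered `kernel.json` and a copied manifest, the sketch refuses to short-circuit and flags the stale side.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 of the file's on-disk bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def classify_file(name, canary_dir, ours_dir, canary_manifest, ours_manifest):
    """Decide whether `name` may be skipped or must be content-diffed.

    Assumption: each manifest is a dict of file name -> hex SHA-256.
    Returns (status, divergences).
    """
    ch = canary_manifest.get(name)
    oh = ours_manifest.get(name)
    if ch is None or ch != oh:
        # Manifests disagree (or are incomplete): no shortcut is possible.
        return "needs-content-diff", []

    # Manifests claim equality: trust them only if both files re-hash to it.
    stale = [
        side
        for side, directory, claimed in (
            ("canary", canary_dir, ch),
            ("ours", ours_dir, oh),
        )
        if sha256_of(Path(directory) / name) != claimed
    ]
    if not stale:
        return "identical", []

    # A manifest lies about at least one side: report it and fall through
    # to the full content diff so the real mutation is still surfaced.
    mismatch = {"kind": "manifest-hash-mismatch", "file": name, "sides": stale}
    return "needs-content-diff", [mismatch]
```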
**Re-validation.**
- Verbatim gate-5 procedure now exits `1` and names the divergence
precisely (`kernel.json objects[handle_semantic_id=…].details.thread_id
canary=1 ours=999`) plus the `manifest-hash-mismatch` σ row.
- Stored Phase B report (`report.md`) regenerates byte-identical
(58 divergences, exit 2 STOP) — no regression on the legitimate
canary↔ours comparison.
- Self-diff of `snap-001/ours` and `snap-001/canary` continues to
return `validate-identical: OK` exit 0 — the optimization still
applies to truthful manifests.
- Inter-run reproducibility tests (`snap-002a/ours` vs `snap-002b/ours`)
also pass `validate-identical`.
### Issue 2 — MEDIUM: `validation.md` gate 5 documents a procedure that relies on the buggy short-circuit
The gate-5 procedure as written in `validation.md` (and the claim that
it produced `exit 1`) was already inaccurate before this verification.
Either the gate was re-stated from memory rather than re-run at
landing, or the actual run used a different procedure.
**Fix.** Updated
[`audit-runs/phase-b-state-equivalence/validation.md`](../phase-b-state-equivalence/validation.md)
gate-5 entry to (a) keep the verbatim procedure, (b) name *both*
divergences the fixed diff tool now surfaces (`manifest-hash-mismatch`
σ + the actual mutation), and (c) include a footnote describing the
pre-fix bug and pointing at the diff_state.py change.
### Issue 3 — LOW: `validation.md` gate 2 mis-claims canary's snapshot JSON is sort-keys-sorted
Canary's `phase_b_snapshot.cc` writes JSON via direct `fmt::format`,
emitting keys in **insertion order** (`schema_version, engine, pc, lr,
ctr, …`). ours's `phase_b_snapshot.rs` uses `serde_json`, which emits
keys alphabetically (`cr, ctr, deterministic_skip, engine, …`). The
diff tool parses both sides into dicts before comparing, so this has
no functional impact on the catalog. It does mean that even
semantically equivalent snapshots produce mismatching SHAs at the file
level, so the manifest-hash short-circuit in `diff_state.py` never
fires for canary↔ours comparisons (the underlying byte content
trivially differs even where the parsed semantics match).
**Fix.** Updated `validation.md` gate-2 entry to describe the actual
behavior accurately.
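
The effect of key order on file-level hashes can be reproduced in a few lines. The sketch below serializes one illustrative dict (the field values are placeholders, not taken from a real snapshot) in insertion order and in sorted order, then compares both the SHA-256 digests and the parsed results.

```python
import hashlib
import json

# Illustrative fields only; a real snapshot has many more keys.
snapshot = {"schema_version": 1, "engine": "canary", "pc": "0x824ab748"}

insertion_order = json.dumps(snapshot).encode()              # keys as inserted
sorted_keys = json.dumps(snapshot, sort_keys=True).encode()  # keys alphabetical

same_bytes = (
    hashlib.sha256(insertion_order).hexdigest()
    == hashlib.sha256(sorted_keys).hexdigest()
)
same_meaning = json.loads(insertion_order) == json.loads(sorted_keys)

print(same_bytes, same_meaning)  # False True
```

The digests differ while the parsed dicts compare equal, which is why a hash-based shortcut cannot apply across engines and the content diff has to do the work.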
### Issue 4 — LOW: schema kind count and unwired-list inaccuracies
`audit-runs/phase-a-diff-harness/README.md` claims "Schema v1 declares
11 event kinds" and "wires three" then lists four kinds. Actual count
in `schema-v1.md`: **13 sections** with **16 distinct kind strings**
(`thread.suspend`/`thread.resume` and `vfs.open`/`vfs.read`/`vfs.close`
share their respective sections).

`ours-changes.md` lists six unwired kind families but omits
`thread.suspend`/`thread.resume`. The Rust emitter API has 9 `emit_*`
functions, of which 3 are wired (4 if you count the synthetic
`schema_version` header) and 6 are stubbed. Six additional kinds
have no Rust function yet (`thread.suspend`, `thread.resume`,
`mem.write`, `vfs.open`, `vfs.read`, `vfs.close`).
**Fix.** Updated
[`README.md`](../phase-a-diff-harness/README.md) and
[`ours-changes.md`](../phase-a-diff-harness/ours-changes.md) to
distinguish `wired` / `stubbed` / `not-yet-stubbed` precisely and use
accurate counts. Did **not** add any new emitters or hooks (out of
scope per session brief).
## Per-step verification record
### Step 2 — Combined Phase A + Phase B cvar-OFF determinism
Ran the current `target/release/xenia-rs` (built from sources containing
both Phase A and Phase B) with no Phase A or Phase B cvars set:
```
$ ./target/release/xenia-rs check --stable-digest -n 50000000 \
--out audit-runs/phase-ab-verify/digest-current-cvaroff.json \
"<ISO>"
$ diff audit-runs/phase-a-diff-harness/digest-pre-patch.json \
audit-runs/phase-ab-verify/digest-current-cvaroff.json
# (no output)
```
**PASS.** Combined Phase A + Phase B cvar-OFF binary digest is
byte-identical to the pre-Phase-A baseline.
Verified by `md5sum` that `target/release/xenia-rs` and
`target/release/xenia-rs-phaseB` are byte-identical (current build);
`xenia-rs-phaseA-pre` is older (pre-patch baseline).
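
The byte-identity checks in this step were done with `diff` and `md5sum`. An equivalent check can be sketched in Python; the helper names `sha256_hex` and `byte_identical` are hypothetical and not part of the harness.

```python
import filecmp
import hashlib
from pathlib import Path


def sha256_hex(path) -> str:
    """Hex SHA-256 of a file's bytes, for recording alongside a verdict."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def byte_identical(a, b) -> bool:
    """True when both files have exactly the same content."""
    # shallow=False forces a content comparison instead of trusting os.stat().
    return filecmp.cmp(a, b, shallow=False)
```

Called on `audit-runs/phase-a-diff-harness/digest-pre-patch.json` and `audit-runs/phase-ab-verify/digest-current-cvaroff.json`, `byte_identical` should return `True` whenever `diff` prints nothing.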
### Step 3 — Phase A four gates re-validated
| Gate | Result | Method |
|---|---|---|
| 1 cvar-OFF byte-identical (ours) | ✅ | Step 2 above |
| 1 cvar-OFF canary smoke marker fires | ✅ | Wine 18-s timed run with `--mute=true`; `AUDIT-DEMO-SETUP-BEGIN` and `AUDIT-DEMO-SETUP-GRAPHICS-OK` both observed in `xenia.log`. CONFIG DUMP shows the 5 expected new cvars (2 Phase A + 3 Phase B), all default empty/false. |
| 2 cvar-ON valid JSONL with `schema_version` first line | ✅ | All 121 363 lines of `ours-sanity.jsonl` and 1 635 789 lines of `canary-sanity.jsonl` parse as JSON. Both lead with `{"schema_version":1,…,"kind":"schema_version",…}`. Kind histogram: ours has 40454 each of `import.call`/`kernel.call`/`kernel.return` plus 1 header (3 × 40454 + 1 = 121363, perfectly balanced); canary 1:545271:545270:545247 (24 in-flight calls when wineserver killed, expected). |
| 3 ≥100-event matching prefix on tid=6→tid=1 | ✅ | Re-ran `diff_events.py` on stored sanity logs; output **byte-identical** to stored `diff-report.md`. 113 matched events on canary tid=6 → ours tid=1; first divergence at idx 113 (KeQuerySystemTime return_value differs — Phase B/C input). |
| 4 negative test detects corruption at exact index | ✅ | Took first 100 events of `ours-sanity.jsonl` to `/tmp/ours-short.jsonl`; corrupted line 50 (`tid_event_idx=48`) by changing `kind: import.call` → `kind: kernel.CORRUPT`. Self-diff: exit 0 OK. Corrupt diff: exit 1, `validate-identical: divergence in canary_tid=1 at tid_event_idx=48 (kind: canary='import.call' ours='kernel.CORRUPT')`. |
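
The gate-4 negative test can be mimicked on synthetic data. The sketch below is a simplification, assuming a single thread and comparing only the `kind` field; `first_divergence` is a hypothetical helper and does not reproduce the tid mapping or field rules of `diff_events.py`. It shows why corrupting file line 50 surfaces as event index 48: line 1 is the `schema_version` header and event indices are 0-based.

```python
import json


def first_divergence(canary_lines, ours_lines, field="kind"):
    """Index (0-based, header excluded) of the first differing event, else None."""
    canary_events = [json.loads(line) for line in canary_lines[1:]]  # skip header
    ours_events = [json.loads(line) for line in ours_lines[1:]]
    for idx, (c, o) in enumerate(zip(canary_events, ours_events)):
        if c.get(field) != o.get(field):
            return idx
    return None


# Synthetic 100-line log: 1 header + 99 events cycling through three kinds.
header = json.dumps({"schema_version": 1, "kind": "schema_version"})
kinds = ["import.call", "kernel.call", "kernel.return"]
log = [header] + [json.dumps({"kind": kinds[i % 3], "tid": 1}) for i in range(99)]

# Corrupt file line 50 (1-based), i.e. list index 49.
corrupt = list(log)
event = json.loads(corrupt[49])
event["kind"] = "kernel.CORRUPT"
corrupt[49] = json.dumps(event)

print(first_divergence(log, log))      # None
print(first_divergence(log, corrupt))  # 48
```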
### Step 4 — Phase B five gates re-validated
| Gate | Result | Method |
|---|---|---|
| 1 cvar-OFF byte-identical (ours) | ✅ | Step 2 above |
| 1 cvar-OFF canary CONFIG DUMP shows 5 expected lines | ✅ | Same Wine smoke run; CONFIG DUMP `[Audit]` section includes `phase_a_event_log_path`, `phase_a_event_log_mem_writes`, `phase_b_dump_section_content`, `phase_b_snapshot_and_exit`, `phase_b_snapshot_dir` with default empty/false values. |
| 2 well-formed snapshots both engines | ✅ | Both snap-001 dirs contain 6 files; all parse as JSON; manifest SHA-256s match recomputed file hashes; ours's JSON is sort-keys-sorted, canary's is insertion-order (note Issue 3). |
| 3 hash-deterministic re-runs | ✅ ours | Two ours runs to different `--phase-b-snapshot-dir`s (`snap-002a` and `snap-002b`): `validate-identical: OK` exit 0. Same-dir re-run (`snap-002c/ours` vs `snap-002c/ours-1`): byte-identical via `diff -r`. |
| 3 hash-deterministic re-runs | ✅ canary | New canary snapshot `snap-canary-002/canary` vs existing `snap-001/canary`: `validate-identical: OK` exit 0. Full diff: 4 of 5 files identical, only `config.json` "diverged" with 0 reportable divergences (path/timestamp fields are skipped). |
| 4 invariant `pc == entry_point == 0x824ab748` both engines | ✅ | Confirmed by inspecting `snap-001/canary/cpu_state.json` and `snap-001/ours/cpu_state.json` — both `pc: "0x824ab748"`; `config.json::xex_entry_point: "0x824ab748"` in both. |
| 4 invariant `image_loaded_sha256` matches | ❌ FAIL → STOP | Reproduced canary `a70993b77ca9e29218d033fad7c0b45c874676c4e0edd966545d39b266486a9c` and ours `ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18` across **two independent runs each**. Reproducible STOP condition; this is the documented Phase C handoff, not a Phase B failure. |
| 5 negative test detects mutation | ❌ → ✅ post-fix | Pre-fix: false PASS (Issue 1). Post-fix: exit 1, names both the manifest-hash-mismatch σ and the actual mutation γ. |
### Step 5 — Hook-point semantic equivalence
**Phase A boundary.** Both engines hook at the kernel-export dispatch
boundary (canary: `shim_utils.h::ExportRegistrerHelper::*::Trampoline`;
ours: `state.rs::call_export`). Verified by inspecting the first 113
matched events on the boot thread:
- canary tid=6 [0]: `import.call RtlImageXexHeaderField` (ord=299)
- ours tid=1 [0]: `import.call RtlImageXexHeaderField` (ord=299)
- canary tid=6 [1]: `kernel.call RtlImageXexHeaderField`
- ours tid=1 [1]: `kernel.call RtlImageXexHeaderField`
- canary tid=6 [2]: `kernel.return RtlImageXexHeaderField`
- ours tid=1 [2]: `kernel.return RtlImageXexHeaderField`
The 113-event matching prefix demonstrates the boundary captures the
same kernel-call sequence on the boot thread of each engine through
113 calls.
**Asymmetries.**
- canary's debug build emits some kernel calls that complete before
shim_utils trampoline (24 in-flight calls when `wineserver -k` kills
the process — visible as `kernel.call > kernel.return` count
imbalance). ours's `check -n` exit is clean. Not an asymmetry of the
hook itself.
- ours's `call_export` only emits when an export is `Some(&(name,
func))` in the dispatch table; unimplemented exports take the early
return path and emit nothing. Canary's trampoline is per-shim; if
canary has a shim where ours has no export, only canary will emit a
`kernel.call` for it. This is an inherent boundary asymmetry that
Phase C should be aware of, but it does NOT invalidate the matching
prefix (the first 113 boot-thread calls are all on shared exports).
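
The `kernel.call` versus `kernel.return` imbalance mentioned in the first bullet can be measured directly from a JSONL log. A minimal sketch, assuming each line is a JSON object with a `kind` field as in the logs above; `kind_histogram` and `in_flight_calls` are hypothetical helper names.

```python
import json
from collections import Counter


def kind_histogram(lines):
    """Count events per `kind`, ignoring blank lines."""
    return Counter(json.loads(line)["kind"] for line in lines if line.strip())


def in_flight_calls(hist):
    """Calls that were entered but had not returned when the log ended."""
    return hist["kernel.call"] - hist["kernel.return"]


# Tiny illustrative log: two calls entered, only one returned.
sample = [
    '{"kind": "schema_version"}',
    '{"kind": "kernel.call"}',
    '{"kind": "kernel.return"}',
    '{"kind": "kernel.call"}',
]
hist = kind_histogram(sample)
print(in_flight_calls(hist))  # 1
```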
**Phase B boundary.** Both engines fire the snapshot hook immediately
before the first guest PPC instruction at `entry_point` on the boot
thread. PC == `0x824ab748` in both `cpu_state.json` files; `thread_id`
records the boot thread (canary 6, ours 1). No "instruction count" /
`tbl_tbu` field is captured, but the `pc == entry_pc` invariant is
sufficient: had any instructions executed, PC would have advanced.
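
The `pc == entry_point` invariant is simple enough to check mechanically. The sketch below assumes the two field names quoted in this report (`pc` in `cpu_state.json`, `xex_entry_point` in `config.json`) and that both are hex strings such as `"0x824ab748"`; the function name is hypothetical and this is not the `check_invariants` implementation.

```python
import json
from pathlib import Path


def snapshot_at_entry_point(snapshot_dir) -> bool:
    """True when the captured PC equals the XEX entry point."""
    root = Path(snapshot_dir)
    cpu = json.loads((root / "cpu_state.json").read_text())
    cfg = json.loads((root / "config.json").read_text())
    # int(..., 16) accepts the "0x" prefix and normalizes letter case.
    return int(cpu["pc"], 16) == int(cfg["xex_entry_point"], 16)
```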
**Verdict.** Both Phase A and Phase B hook points are semantically
equivalent across engines for the in-scope event types. Asymmetries
(unimplemented exports, kernel-call-count off-by-N at process kill)
are inherent to the boundaries themselves, not bugs in the harness.
### Step 6 — Diff-tool robustness (5 synthetic edge cases each)
#### `diff_events.py`
| Case | Input | Result |
|---|---|---|
| empty file | `empty.jsonl` | `SystemExit('empty file')` exit 1, no crash |
| single event (header only) | `single-event.jsonl` (just `schema_version`) | Auto-mapping finds no shared first kernel.call → exit 2 with clear message; no crash |
| missing schema header | first line is `import.call` | `SystemExit('first event is not schema_version')` exit 1, clear message |
| mismatched thread tids | canary has only tid=2; ours has only tid=1, no shared first-call name | exit 2 with clear "no tid mapping" message |
| field comparison rules honored | self-diff of `ours-sanity[0:99]` | exit 0; corruption at idx 48 → exit 1 with exact `tid_event_idx=48` named |
#### `diff_state.py`
| Case | Input | Result |
|---|---|---|
| empty snapshot dirs | `ds-empty/canary` and `ds-empty/ours` (no JSON files) | exit 2 STOP (invariants fail because `config.json` missing); 5 missing-file divergences |
| self-diff existing snapshot | `snap-001/ours` against itself | `validate-identical: OK` exit 0 (legitimate manifest match still short-circuits correctly) |
| missing canary dir | `/tmp/does-not-exist-xyz` as canary | exit 2 with "both snapshot dirs must exist" message |
| missing config.json | manifests present (empty) but no JSON files | exit 2 STOP (FileNotFoundError caught in `check_invariants`); 5 missing-file divergences |
| field mutation detection | `snap-001/ours` vs `/tmp/verify-gate5` (kernel.json mutated, manifest copied verbatim) | exit 1 (post-fix); names `manifest-hash-mismatch` σ + actual γ-content divergence |

All synthetic cases are handled gracefully: there are no crashes, and
exit codes distinguish failure modes (1 = data divergence; 2 = STOP /
invalid input).
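
A caller scripting these tools might interpret the exit codes along the following lines. This is a sketch that follows the summary above (0 = identical, 1 = data divergence, 2 = STOP / invalid input); the `DiffExit` enum and `describe` function are hypothetical and not part of either tool.

```python
from enum import IntEnum


class DiffExit(IntEnum):
    IDENTICAL = 0   # validate-identical: OK
    DIVERGENCE = 1  # data divergence found
    STOP = 2        # STOP condition or invalid input


def describe(returncode: int) -> str:
    """Map a diff-tool exit code to a short human-readable verdict."""
    try:
        code = DiffExit(returncode)
    except ValueError:
        return f"unexpected exit code {returncode}"
    return {
        DiffExit.IDENTICAL: "identical",
        DiffExit.DIVERGENCE: "data divergence",
        DiffExit.STOP: "STOP / invalid input",
    }[code]
```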
### Step 7 — Schema coverage scope
`schema-v1.md` declares **13 sections** (16 distinct kind strings).
Phase A wires:
| status | kinds |
|---|---|
| wired (call sites in `state.rs::call_export` + canary `shim_utils.h`) | `schema_version`, `import.call`, `kernel.call`, `kernel.return` |
| stubbed (Rust `emit_*` exists, no call site) | `thread.create`, `thread.exit`, `handle.create`, `handle.destroy`, `wait.begin`, `wait.end` |
| not-yet-stubbed (no Rust function) | `thread.suspend`, `thread.resume`, `mem.write`, `vfs.open`, `vfs.read`, `vfs.close` |

Documentation updates (Issue 4) clarify which is which. Per the session
brief, this pass did **not** wire any of the unwired kinds; that is
Phase A+ / Phase C scope.
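
The coverage table can be cross-checked against the stated total of 16 kind strings. The sketch below copies the three rows into a dict (the `COVERAGE` name is an assumption for the example) and verifies that no kind is listed twice and that the counts add up.

```python
# Kind strings copied from the coverage table in this step.
COVERAGE = {
    "wired": ["schema_version", "import.call", "kernel.call", "kernel.return"],
    "stubbed": [
        "thread.create", "thread.exit",
        "handle.create", "handle.destroy",
        "wait.begin", "wait.end",
    ],
    "not-yet-stubbed": [
        "thread.suspend", "thread.resume",
        "mem.write", "vfs.open", "vfs.read", "vfs.close",
    ],
}

all_kinds = [kind for kinds in COVERAGE.values() for kind in kinds]
counts = {status: len(kinds) for status, kinds in COVERAGE.items()}

print(counts)               # {'wired': 4, 'stubbed': 6, 'not-yet-stubbed': 6}
print(len(set(all_kinds)))  # 16
```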
## Confirmed Phase B `image_loaded_sha256` mismatch (handed to Phase C)
Reproducible across two independent runs of each engine:
- canary: `a70993b77ca9e29218d033fad7c0b45c874676c4e0edd966545d39b266486a9c`
- ours: `ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18`

`xex_entry_point` = `0x824ab748` and `cpu_state.pc` = `0x824ab748` in
**both** engines (these match, so the snapshot point is equivalent). The
in-memory bytes loaded for the XEX image differ. Per the Phase B
contract, this is the catalog finding handed to Phase C; the verifier
did not investigate the cause. Phase B's documented next step (re-run
with `--phase-b-dump-section-content`, binary-diff `section_contents[]`)
remains the correct Phase C entry point.
## Files in this directory
| File | Purpose |
|---|---|
| `verification-report.md` | This file. |
| `re-validation.md` | Per-gate post-fix re-validation evidence (compact). |
| `digest-current-cvaroff.json` | Step 2 digest from current build. |
| `regenerated-phase-a-diff-report.md` | `diff_events.py` output on stored sanity logs (byte-identical to stored `diff-report.md`). |
| `regenerated-phase-b-report.md` | `diff_state.py` output on stored snap-001 (pre-fix; byte-identical to stored `report.md`). |
| `regenerated-phase-b-report-postfix.md` | Same, but generated post-fix (also byte-identical). |
| `snap-002a/ours/`, `snap-002b/ours/` | Two independent ours snapshot runs (Phase B gate 3 reproducibility). |
| `snap-002c/ours/`, `snap-002c/ours-1/` | Same-dir ours re-run (byte-equality test). |
| `snap-canary-002/canary/` | Independent canary snapshot run (Phase B gate 3 reproducibility). |
| `coexist/` | Phase A + Phase B cvars enabled simultaneously, ours brief run; jsonl + 5-file snapshot both emitted cleanly. |
| `synthetic-diff-tests/` | Fixtures for Step 6 edge-case tests. |
## Cascade prediction
- A re-verify gates with reproduction: **achieved** — all gates re-run,
reproductions match.
- B identify ≥1 instrumentation bug or doc issue: **achieved**.
Issue 1 is HIGH (diff tool short-circuit); Issues 2–4 are documentation.
- C fixes land + re-pass all gates: **achieved** — diff_state.py fix +
4 doc fixes; all gates pass post-fix; no regressions.
- D Phase C base is solid going forward: **achieved**, with the
caveat that Issue 3 (canary insertion-order JSON) means inter-engine
manifest-hash short-circuit will never fire, but the fall-through
full-content-diff path covers this correctly.