handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
292
audit-runs/phase-ab-verify/verification-report.md
Normal file
@@ -0,0 +1,292 @@
# Phase A + Phase B verification report

Session: 2026-05-13. Reviewer: WRITE-mode verify pass over Phase A
(`audit-runs/phase-a-diff-harness/`) and Phase B
(`audit-runs/phase-b-state-equivalence/`) deliverables. Discipline:
no Phase C investigation, no XEX sha256 chase, no anchor-on-divergence.

## Outcome

| Phase | Gates | Pre-fix | Post-fix |
|---|---|---|---|
| A | 4 | 4/4 PASS | 4/4 PASS |
| B | 5 | 4/5 PASS (gate 5 produced a false PASS) | 5/5 PASS |
| Combined cvar-OFF determinism | 1 | PASS | PASS |
| Diff-tool synthetic edge cases | 5 (each tool) | PASS | PASS |
| Hook-point semantic equivalence | 2 (Phase A + Phase B) | PASS | PASS |

The pre-fix "false-PASS" is HIGH-severity and is detailed in
Issue 1 below. Without the fix, the negative-test gate of Phase B
silently passed when the test was actually broken, meaning a tampered
snapshot file with an intact manifest copy would have been reported as
"identical" by `diff_state.py`. The Phase B catalog (canary↔ours
divergences) is unaffected by this bug because canary's manifest hashes
legitimately differ from ours's, so the buggy short-circuit never
engaged for any real Phase B comparison.

## Issues found and resolutions

### Issue 1 (HIGH): `diff_state.py` manifest-hash short-circuit trusts manifests without verification

**Symptom.** Re-running `validation.md` gate 5 verbatim produces
`exit 0` and "0 divergences", *not* the documented `exit 1`. The
mutation (changing `kernel.json` `thread_id: 1` → `thread_id: 999`) is
silently masked because the gate-5 procedure copies the original
`manifest.json` alongside the mutated file. Both manifests then claim
the same kernel.json hash, so the diff tool's manifest-hash
short-circuit (`if ch == oh: file_status[name] = "identical"`) reports
the file as identical without comparing content.

**Reproduction.** `audit-runs/phase-ab-verify/synthetic-diff-tests/`
plus the verbatim gate-5 procedure (see this report's
`Re-validation gate 5`).

**Fix.** Patched
[`tools/diff-state/diff_state.py`](../../tools/diff-state/diff_state.py)
`diff_directory` to re-hash both files when manifests claim equality
and only short-circuit when the on-disk SHAs match the manifest. When
they don't, a `manifest-hash-mismatch` σ-structural divergence is
emitted *and* the file is fully content-diffed, ensuring no silent
masking.
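
The verified short-circuit described in the Fix can be sketched roughly as follows. This is an illustration only, not the code in `diff_state.py`: the helper names `sha256_of` and `classify_file`, the manifest layout (a dict mapping file name to hex digest), and the `"content-diff"` / `"identical"` return strings are assumptions made for the example.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def classify_file(name, canary_dir, ours_dir, canary_manifest, ours_manifest):
    """Decide whether `name` may skip the content diff.

    Hypothetical helper: manifests are assumed to be dicts of
    file name -> hex digest.
    """
    ch = canary_manifest.get(name)
    oh = ours_manifest.get(name)
    if ch is None or oh is None or ch != oh:
        # Manifests disagree (or are incomplete): always compare content.
        return "content-diff"
    # Manifests claim equality: trust them only if the files on disk agree.
    on_disk_ok = (
        sha256_of(Path(canary_dir) / name) == ch
        and sha256_of(Path(ours_dir) / name) == oh
    )
    if on_disk_ok:
        return "identical"
    # The caller would report this and still run the full content diff.
    return "manifest-hash-mismatch"
```

With this shape, a copied manifest next to a mutated `kernel.json` yields `manifest-hash-mismatch` rather than `identical`, which is the behaviour gate 5 depends on.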

**Re-validation.**
- Verbatim gate-5 procedure now exits `1` and names the divergence
  precisely (`kernel.json objects[handle_semantic_id=…].details.thread_id
  canary=1 ours=999`) plus the `manifest-hash-mismatch` σ row.
- Stored Phase B report (`report.md`) regenerates byte-identical
  (58 divergences, exit 2 STOP): no regression on the legitimate
  canary↔ours comparison.
- Self-diff of `snap-001/ours` and `snap-001/canary` continues to
  return `validate-identical: OK` exit 0; the optimization still
  applies to truthful manifests.
- Inter-run reproducibility tests (`snap-002a/ours` vs `snap-002b/ours`)
  also pass `validate-identical`.

### Issue 2 (MEDIUM): `validation.md` gate 5 documents a procedure that relies on the buggy short-circuit

The gate-5 procedure as written in `validation.md` (and the claim that
it produced `exit 1`) was already inaccurate before this verification.
Either the gate was re-stated from memory rather than re-run at
landing, or the actual run used a different procedure.

**Fix.** Updated
[`audit-runs/phase-b-state-equivalence/validation.md`](../phase-b-state-equivalence/validation.md)
gate-5 entry to (a) keep the verbatim procedure, (b) name *both*
divergences the fixed diff tool now surfaces (`manifest-hash-mismatch`
σ + the actual mutation), and (c) include a footnote describing the
pre-fix bug and pointing at the diff_state.py change.

### Issue 3 (LOW): `validation.md` gate 2 mis-claims canary's snapshot JSON is sort-keys-sorted

Canary's `phase_b_snapshot.cc` writes JSON via direct `fmt::format`,
emitting keys in **insertion order**: `schema_version, engine, pc, lr,
ctr, …`. ours's `phase_b_snapshot.rs` uses `serde_json`, which emits
keys alphabetically (`cr, ctr, deterministic_skip, engine, …`). The
diff tool parses both sides into dicts before comparing, so this has
no functional impact on the catalog. It does mean that even
semantically-equivalent snapshots produce mismatching SHAs at the file
level, so the manifest-hash short-circuit in `diff_state.py` never
short-circuits canary↔ours comparisons (the underlying byte content
trivially differs even where the parsed semantics match).

**Fix.** Updated `validation.md` gate-2 entry to describe the actual
behavior accurately.
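
The file-level effect described in Issue 3 can be reproduced in a few lines. The field values below are illustrative, and Python's `json` module stands in for both emitters; only the key names are taken from the report.

```python
import hashlib
import json

# Illustrative field values only; the key names come from the report.
fields = {"schema_version": 1, "engine": "canary", "pc": "0x824ab748"}

insertion_order = json.dumps(fields)               # keys in insertion order
alphabetical = json.dumps(fields, sort_keys=True)  # keys sorted alphabetically


def digest(text: str) -> str:
    """Hex SHA-256 of the serialized text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# The serialized bytes differ, so file-level hashes differ...
same_bytes = digest(insertion_order) == digest(alphabetical)
# ...but the parsed dicts compare equal, so a content diff sees no divergence.
same_meaning = json.loads(insertion_order) == json.loads(alphabetical)

print(same_bytes, same_meaning)  # False True
```

This is why a hash-based short-circuit cannot fire across engines, while the parsed comparison is unaffected.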

### Issue 4 (LOW): schema kind count and unwired-list inaccuracies

`audit-runs/phase-a-diff-harness/README.md` claims "Schema v1 declares
11 event kinds" and "wires three", then lists four kinds. Actual count
in `schema-v1.md`: **13 sections** with **16 distinct kind strings**
(`thread.suspend`/`thread.resume` and `vfs.open`/`vfs.read`/`vfs.close`
share their respective sections).

`ours-changes.md` lists six unwired kind families but omits
`thread.suspend`/`thread.resume`. The Rust emitter API has 9 `emit_*`
functions, of which 3 are wired (4 if you count the synthetic
`schema_version` header) and 6 are stubbed. Six additional kinds
have no Rust function yet (`thread.suspend`, `thread.resume`,
`mem.write`, `vfs.open`, `vfs.read`, `vfs.close`).

**Fix.** Updated
[`README.md`](../phase-a-diff-harness/README.md) and
[`ours-changes.md`](../phase-a-diff-harness/ours-changes.md) to
distinguish `wired` / `stubbed` / `not-yet-stubbed` precisely and use
accurate counts. Did **not** add any new emitters or hooks (out of
scope per session brief).

## Per-step verification record

### Step 2: Combined Phase A + Phase B cvar-OFF determinism

Ran the current `target/release/xenia-rs` (built from sources containing
both Phase A and Phase B) with no Phase A or Phase B cvars set:

```
$ ./target/release/xenia-rs check --stable-digest -n 50000000 \
    --out audit-runs/phase-ab-verify/digest-current-cvaroff.json \
    "<ISO>"
$ diff audit-runs/phase-a-diff-harness/digest-pre-patch.json \
       audit-runs/phase-ab-verify/digest-current-cvaroff.json
# (no output)
```

**PASS.** Combined Phase A + Phase B cvar-OFF binary digest is
byte-identical to the pre-Phase-A baseline.

Verified by `md5sum` that `target/release/xenia-rs` and
`target/release/xenia-rs-phaseB` are byte-identical (current build);
`xenia-rs-phaseA-pre` is older (pre-patch baseline).

### Step 3: Phase A four gates re-validated

| Gate | Result | Method |
|---|---|---|
| 1 cvar-OFF byte-identical (ours) | ✅ | Step 2 above |
| 1 cvar-OFF canary smoke marker fires | ✅ | Wine 18-s timed run with `--mute=true`; `AUDIT-DEMO-SETUP-BEGIN` and `AUDIT-DEMO-SETUP-GRAPHICS-OK` both observed in `xenia.log`. CONFIG DUMP shows the 5 expected new cvars (2 Phase A + 3 Phase B), all default empty/false. |
| 2 cvar-ON valid JSONL with `schema_version` first line | ✅ | All 121 363 lines of `ours-sanity.jsonl` and 1 635 789 lines of `canary-sanity.jsonl` parse as JSON. Both lead with `{"schema_version":1,…,"kind":"schema_version",…}`. Kind histogram: ours 1 header plus 40454 each of import.call/kernel.call/kernel.return (perfectly balanced); canary 1:545271:545270:545247 (24 in-flight calls when wineserver killed, expected). |
| 3 ≥100-event matching prefix on tid=6→tid=1 | ✅ | Re-ran `diff_events.py` on stored sanity logs; output **byte-identical** to stored `diff-report.md`. 113 matched events on canary tid=6 → ours tid=1; first divergence at idx 113 (KeQuerySystemTime return_value differs, a Phase B/C input). |
| 4 negative test detects corruption at exact index | ✅ | Took first 100 events of `ours-sanity.jsonl` to `/tmp/ours-short.jsonl`; corrupted line 50 (`tid_event_idx=48`) by changing `kind: import.call` → `kind: kernel.CORRUPT`. Self-diff: exit 0 OK. Corrupt diff: exit 1, `validate-identical: divergence in canary_tid=1 at tid_event_idx=48 (kind: canary='import.call' ours='kernel.CORRUPT')`. |
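
The gate-4 negative test boils down to "corrupt one event, then confirm the diff names that exact index". A minimal sketch of that idea on synthetic data follows; `first_divergence` is a hypothetical helper, not the `diff_events.py` implementation, and it compares only the fields it is given.

```python
import json


def first_divergence(canary_lines, ours_lines, fields=("kind",)):
    """Return (index, field, canary_value, ours_value), or None if identical.

    Hypothetical helper: compares events pairwise on `fields` only.
    """
    for idx, (c_raw, o_raw) in enumerate(zip(canary_lines, ours_lines)):
        c, o = json.loads(c_raw), json.loads(o_raw)
        for field in fields:
            if c.get(field) != o.get(field):
                return idx, field, c.get(field), o.get(field)
    if len(canary_lines) != len(ours_lines):
        shorter = min(len(canary_lines), len(ours_lines))
        return shorter, "length", len(canary_lines), len(ours_lines)
    return None


# Synthetic log: 100 events, then corrupt the one at index 48.
good = [json.dumps({"kind": "import.call", "tid_event_idx": i}) for i in range(100)]
bad = list(good)
bad[48] = json.dumps({"kind": "kernel.CORRUPT", "tid_event_idx": 48})

print(first_divergence(good, good))  # None
print(first_divergence(good, bad))   # (48, 'kind', 'import.call', 'kernel.CORRUPT')
```

Running the self-diff first matters: it shows the tool returns "no divergence" on clean input, so the later hit at index 48 is attributable to the corruption alone.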

### Step 4: Phase B five gates re-validated

| Gate | Result | Method |
|---|---|---|
| 1 cvar-OFF byte-identical (ours) | ✅ | Step 2 above |
| 1 cvar-OFF canary CONFIG DUMP shows 5 expected lines | ✅ | Same Wine smoke run; CONFIG DUMP `[Audit]` section includes `phase_a_event_log_path`, `phase_a_event_log_mem_writes`, `phase_b_dump_section_content`, `phase_b_snapshot_and_exit`, `phase_b_snapshot_dir` with default empty/false values. |
| 2 well-formed snapshots both engines | ✅ | Both snap-001 dirs contain 6 files; all parse as JSON; manifest SHA-256s match recomputed file hashes; ours's JSON is sort-keys-sorted, canary's is insertion-order (note Issue 3). |
| 3 hash-deterministic re-runs | ✅ ours | Two ours runs to different `--phase-b-snapshot-dir`s (`snap-002a` and `snap-002b`): `validate-identical: OK` exit 0. Same-dir re-run (`snap-002c/ours` vs `snap-002c/ours-1`): byte-identical via `diff -r`. |
| 3 hash-deterministic re-runs | ✅ canary | New canary snapshot `snap-canary-002/canary` vs existing `snap-001/canary`: `validate-identical: OK` exit 0. Full diff: 4 of 5 files identical, only `config.json` "diverged" with 0 reportable divergences (path/timestamp fields are skipped). |
| 4 invariant `pc == entry_point == 0x824ab748` both engines | ✅ | Confirmed by inspecting `snap-001/canary/cpu_state.json` and `snap-001/ours/cpu_state.json`: both `pc: "0x824ab748"`; `config.json::xex_entry_point: "0x824ab748"` in both. |
| 4 invariant `image_loaded_sha256` matches | ❌ FAIL → STOP | Reproduced canary `a70993b77ca9e29218d033fad7c0b45c874676c4e0edd966545d39b266486a9c` and ours `ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18` across **two independent runs each**. Reproducible STOP condition; this is the documented Phase C handoff, not a Phase B failure. |
| 5 negative test detects mutation | ❌ → ✅ post-fix | Pre-fix: false PASS (Issue 1). Post-fix: exit 1, names both the manifest-hash-mismatch σ and the actual mutation γ. |

### Step 5: Hook-point semantic equivalence

**Phase A boundary.** Both engines hook at the kernel-export dispatch
boundary (canary: `shim_utils.h::ExportRegistrerHelper::*::Trampoline`;
ours: `state.rs::call_export`). Verified by inspecting the first 113
matched events on the boot thread:

- canary tid=6 [0]: `import.call RtlImageXexHeaderField` (ord=299)
- ours tid=1 [0]: `import.call RtlImageXexHeaderField` (ord=299)
- canary tid=6 [1]: `kernel.call RtlImageXexHeaderField`
- ours tid=1 [1]: `kernel.call RtlImageXexHeaderField`
- canary tid=6 [2]: `kernel.return RtlImageXexHeaderField`
- ours tid=1 [2]: `kernel.return RtlImageXexHeaderField`

The 113-event matching prefix demonstrates the boundary captures the
same kernel-call sequence on the boot thread of each engine through
the first 113 events.

**Asymmetries.**
- canary's debug build emits some kernel calls that complete before
  shim_utils trampoline (24 in-flight calls when `wineserver -k` kills
  the process, visible as a `kernel.call > kernel.return` count
  imbalance). ours's `check -n` exit is clean. Not an asymmetry of the
  hook itself.
- ours's `call_export` only emits when an export is `Some(&(name,
  func))` in the dispatch table; unimplemented exports take the early
  return path and emit nothing. Canary's trampoline is per-shim; if
  canary has a shim where ours has no export, only canary will emit a
  `kernel.call` for it. This is an inherent boundary asymmetry that
  Phase C should be aware of, but it does NOT invalidate the matching
  prefix (the first 113 boot-thread events are all on shared exports).
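
The call/return imbalance mentioned in the first asymmetry can be measured directly from a JSONL event log. The sketch below uses hypothetical helper names and a tiny synthetic log; it assumes only that each line is a JSON object with a `kind` field, as the report's header example shows.

```python
import json
from collections import Counter


def kind_histogram(lines):
    """Count events per `kind` in an iterable of JSONL strings."""
    return Counter(json.loads(line)["kind"] for line in lines if line.strip())


def unreturned_calls(hist):
    """Number of kernel.call events with no matching kernel.return."""
    return hist["kernel.call"] - hist["kernel.return"]


# Synthetic log: two calls started, only the first one returned.
sample = [
    '{"kind": "schema_version"}',
    '{"kind": "import.call"}',
    '{"kind": "kernel.call"}',
    '{"kind": "kernel.return"}',
    '{"kind": "import.call"}',
    '{"kind": "kernel.call"}',
]

hist = kind_histogram(sample)
print(unreturned_calls(hist))  # 1
```

A cleanly exited run should report zero here; a killed process is expected to leave a small positive remainder.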

**Phase B boundary.** Both engines fire the snapshot hook immediately
before the first guest PPC instruction at `entry_point` on the boot
thread. PC == `0x824ab748` in both `cpu_state.json` files; `thread_id`
records the boot thread (canary 6, ours 1). No "instruction count" /
`tbl_tbu` field is captured, but the `pc == entry_pc` invariant is
sufficient: had any instructions executed, PC would have advanced.
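
The `pc == entry_pc` invariant is simple enough to express as a one-function check. This is a sketch under assumptions: `check_entry_invariant` is a hypothetical name, and the inputs are the already-parsed `cpu_state.json` and `config.json` dicts with hex-string values in the form quoted in this report.

```python
def check_entry_invariant(cpu_state: dict, config: dict) -> bool:
    """True when the snapshot was taken before any guest instruction ran.

    Assumes `pc` and `xex_entry_point` are hex strings such as "0x824ab748".
    """
    pc = int(cpu_state["pc"], 16)
    entry = int(config["xex_entry_point"], 16)
    return pc == entry


# Values as reported for both engines.
print(check_entry_invariant({"pc": "0x824ab748"},
                            {"xex_entry_point": "0x824ab748"}))  # True
# Hypothetical counter-example: PC advanced by one 4-byte instruction.
print(check_entry_invariant({"pc": "0x824ab74c"},
                            {"xex_entry_point": "0x824ab748"}))  # False
```

Parsing to integers rather than comparing strings avoids false mismatches from letter case or zero padding in the hex text.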

**Verdict.** Both Phase A and Phase B hook points are semantically
equivalent across engines for the in-scope event types. Asymmetries
(unimplemented exports, kernel-call-count off-by-N at process kill)
are inherent to the boundaries themselves, not bugs in the harness.

### Step 6: Diff-tool robustness (5 synthetic edge cases each)

#### `diff_events.py`

| Case | Input | Result |
|---|---|---|
| empty file | `empty.jsonl` | `SystemExit('empty file')` exit 1, no crash |
| single event (header only) | `single-event.jsonl` (just `schema_version`) | Auto-mapping finds no shared first kernel.call → exit 2 with clear message; no crash |
| missing schema header | first line is `import.call` | `SystemExit('first event is not schema_version')` exit 1, clear message |
| mismatched thread tids | canary has only tid=2; ours has only tid=1, no shared first-call name | exit 2 with clear "no tid mapping" message |
| field comparison rules honored | self-diff of `ours-sanity[0:99]` | exit 0; corruption at idx 48 → exit 1 with exact `tid_event_idx=48` named |

#### `diff_state.py`

| Case | Input | Result |
|---|---|---|
| empty snapshot dirs | `ds-empty/canary` and `ds-empty/ours` (no JSON files) | exit 2 STOP (invariants fail because `config.json` missing); 5 missing-file divergences |
| self-diff existing snapshot | `snap-001/ours` against itself | `validate-identical: OK` exit 0 (legitimate manifest match still short-circuits correctly) |
| missing canary dir | `/tmp/does-not-exist-xyz` as canary | exit 2 with "both snapshot dirs must exist" message |
| missing config.json | manifests present (empty) but no JSON files | exit 2 STOP (FileNotFoundError caught in `check_invariants`); 5 missing-file divergences |
| field mutation detection | `snap-001/ours` vs `/tmp/verify-gate5` (kernel.json mutated, manifest copied verbatim) | exit 1 (post-fix); names `manifest-hash-mismatch` σ + actual γ-content divergence |

All synthetic cases handled gracefully; no crashes, exit codes
distinguish failure modes (1 = data divergence; 2 = STOP / invalid
input).
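
The exit-code convention summarized above can be written down as a small decision function. This is a hedged sketch, not either tool's code: `ExitCode` and `choose_exit_code` are hypothetical names, and the precedence (STOP outranks a divergence count) is inferred from the "58 divergences, exit 2 STOP" result quoted earlier in this report.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    IDENTICAL = 0   # no reportable divergences
    DIVERGENCE = 1  # data divergence found
    STOP = 2        # invariant failure or invalid input


def choose_exit_code(input_valid: bool, invariants_ok: bool, divergences: int) -> ExitCode:
    """Pick the process exit code; STOP takes precedence over DIVERGENCE."""
    if not input_valid or not invariants_ok:
        return ExitCode.STOP
    if divergences > 0:
        return ExitCode.DIVERGENCE
    return ExitCode.IDENTICAL


print(int(choose_exit_code(True, False, 58)))  # 2
print(int(choose_exit_code(True, True, 2)))    # 1
print(int(choose_exit_code(True, True, 0)))    # 0
```

Keeping the three outcomes distinct lets a calling script tell "the engines differ" apart from "the comparison itself could not be trusted".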

### Step 7: Schema coverage scope

`schema-v1.md` declares **13 sections** (16 distinct kind strings).
Phase A wires:

| status | kinds |
|---|---|
| wired (call sites in `state.rs::call_export` + canary `shim_utils.h`) | `schema_version`, `import.call`, `kernel.call`, `kernel.return` |
| stubbed (Rust `emit_*` exists, no call site) | `thread.create`, `thread.exit`, `handle.create`, `handle.destroy`, `wait.begin`, `wait.end` |
| not-yet-stubbed (no Rust function) | `thread.suspend`, `thread.resume`, `mem.write`, `vfs.open`, `vfs.read`, `vfs.close` |

Documentation updates (Issue 4) clarify which is which. Per session
brief, **NOT** wiring any of the unwired kinds; that is Phase A+ /
Phase C scope.
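
The counts in the table above can be cross-checked mechanically. The kind names and the two shared-section groups are taken from this report; the variable names are illustrative.

```python
wired = ["schema_version", "import.call", "kernel.call", "kernel.return"]
stubbed = ["thread.create", "thread.exit", "handle.create",
           "handle.destroy", "wait.begin", "wait.end"]
not_yet_stubbed = ["thread.suspend", "thread.resume", "mem.write",
                   "vfs.open", "vfs.read", "vfs.close"]

all_kinds = wired + stubbed + not_yet_stubbed

# Kinds that share a single schema section, per the report.
shared_sections = [
    {"thread.suspend", "thread.resume"},
    {"vfs.open", "vfs.read", "vfs.close"},
]

# Each shared group of n kinds occupies one section instead of n.
sections = len(all_kinds) - sum(len(group) - 1 for group in shared_sections)

print(len(set(all_kinds)), sections)  # 16 13
```

The same lists also confirm the emitter tally from Issue 4: three wired kinds besides the header, six stubbed, and six with no Rust function yet.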

## Confirmed Phase B `image_loaded_sha256` mismatch (handed to Phase C)

Reproducible across two independent runs of each engine:
- canary: `a70993b77ca9e29218d033fad7c0b45c874676c4e0edd966545d39b266486a9c`
- ours: `ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18`

`xex_entry_point` = `0x824ab748` and `cpu_state.pc` = `0x824ab748` in
**both** engines (these match, so the snapshot point is equivalent). The
in-memory bytes loaded for the XEX image differ. Per Phase B contract,
this is the catalog finding handed to Phase C; verifier did not
investigate cause. Phase B's documented next-step (re-run with
`--phase-b-dump-section-content`, binary-diff `section_contents[]`)
remains the correct Phase C entry point.

## Files in this directory

| File | Purpose |
|---|---|
| `verification-report.md` | This file. |
| `re-validation.md` | Per-gate post-fix re-validation evidence (compact). |
| `digest-current-cvaroff.json` | Step 2 digest from current build. |
| `regenerated-phase-a-diff-report.md` | `diff_events.py` output on stored sanity logs (byte-identical to stored `diff-report.md`). |
| `regenerated-phase-b-report.md` | `diff_state.py` output on stored snap-001 (pre-fix; byte-identical to stored `report.md`). |
| `regenerated-phase-b-report-postfix.md` | Same, but generated post-fix (also byte-identical). |
| `snap-002a/ours/`, `snap-002b/ours/` | Two independent ours snapshot runs (Phase B gate 3 reproducibility). |
| `snap-002c/ours/`, `snap-002c/ours-1/` | Same-dir ours re-run (byte-equality test). |
| `snap-canary-002/canary/` | Independent canary snapshot run (Phase B gate 3 reproducibility). |
| `coexist/` | Phase A + Phase B cvars enabled simultaneously, ours brief run; jsonl + 5-file snapshot both emitted cleanly. |
| `synthetic-diff-tests/` | Fixtures for Step 6 edge-case tests. |

## Cascade prediction

- A re-verify gates with reproduction: **achieved**. All gates re-run,
  reproductions match.
- B identify ≥1 instrumentation bug or doc issue: **achieved**.
  Issue 1 HIGH (diff tool short-circuit), Issues 2–4 documentation.
- C fixes land + re-pass all gates: **achieved**. diff_state.py fix +
  4 doc fixes; all gates pass post-fix; no regressions.
- D Phase C base is solid going forward: **achieved**, with the
  caveat that Issue 3 (canary insertion-order JSON) means inter-engine
  manifest-hash short-circuit will never fire, but the fall-through
  full-content-diff path covers this correctly.