Files

MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-05 07:19:08 +02:00

17 KiB

Raw Blame History

Phase A + Phase B verification report

Session: 2026-05-13. Reviewer: WRITE-mode verify pass over Phase A (audit-runs/phase-a-diff-harness/) and Phase B (audit-runs/phase-b-state-equivalence/) deliverables. Discipline: no Phase C investigation, no XEX sha256 chase, no anchor-on-divergence.

Outcome

Phase	Gates	Pre-fix	Post-fix
A	4	4/4 PASS	4/4 PASS
B	5	4/5 PASS — gate 5 produced false-PASS	5/5 PASS
Combined cvar-OFF determinism	1	PASS	PASS
Diff-tool synthetic edge cases	5 (each tool)	PASS	PASS
Hook-point semantic equivalence	2 (Phase A + Phase B)	PASS	PASS

The "false-PASS" pre-fix is HIGH-severity and is detailed in Issue-1 below. Without the fix, the negative-test gate of Phase B silently passed when the test was actually broken — meaning a tampered snapshot file with an intact manifest copy would have been reported as "identical" by diff_state.py. The Phase B catalog (canary↔ours divergences) is unaffected by this bug because canary's manifest hashes legitimately differ from ours's, so the buggy short-circuit never engaged for any real Phase B comparison.

Issues found and resolutions

Issue 1 — HIGH: `diff_state.py` manifest-hash short-circuit trusts

manifests without verification

Symptom. Re-running validation.md gate 5 verbatim produces exit 0 and "0 divergences", not the documented exit 1. The mutation (changing kernel.json thread_id: 1 → thread_id: 999) is silently masked because the gate-5 procedure copies the original manifest.json alongside the mutated file. Both manifests then claim the same kernel.json hash, so the diff tool's manifest-hash short-circuit (if ch == oh: file_status[name] = "identical") reports the file as identical without comparing content.

Reproduction. audit-runs/phase-ab-verify/synthetic-diff-tests/ plus the verbatim gate-5 procedure (see this report's Re-validation gate 5).

Fix. Patched tools/diff-state/diff_state.py diff_directory to re-hash both files when manifests claim equality and only short-circuit when the on-disk SHAs match the manifest. When they don't, a manifest-hash-mismatch σ-structural divergence is emitted and the file is fully content-diffed, ensuring no silent masking.

Re-validation.

Verbatim gate-5 procedure now exits 1 and names the divergence precisely (kernel.json objects[handle_semantic_id=…].details.thread_id canary=1 ours=999) plus the manifest-hash-mismatch σ row.
Stored Phase B report (report.md) regenerates byte-identical (58 divergences, exit 2 STOP) — no regression on the legitimate canary↔ours comparison.
Self-diff of snap-001/ours and snap-001/canary continues to return validate-identical: OK exit 0 — the optimization still applies to truthful manifests.
Inter-run reproducibility tests (snap-002a/ours vs snap-002b/ours) also pass validate-identical.

Issue 2 — MEDIUM: `validation.md` gate 5 documents a procedure that

relies on the buggy short-circuit

The gate-5 procedure as written in validation.md (and the claim that it produced exit 1) was already inaccurate before this verification. Either the gate was re-stated from memory rather than re-run at landing, or the actual run used a different procedure.

Fix. Updated audit-runs/phase-b-state-equivalence/validation.md gate-5 entry to (a) keep the verbatim procedure, (b) name both divergences the fixed diff tool now surfaces (manifest-hash-mismatch σ + the actual mutation), and (c) include a footnote describing the pre-fix bug and pointing at the diff_state.py change.

Issue 3 — LOW: `validation.md` gate 2 mis-claims canary's snapshot

JSON is sort-keys-sorted

Canary's phase_b_snapshot.cc writes JSON via direct fmt::format, emitting keys in insertion order — schema_version, engine, pc, lr, ctr, …. ours's phase_b_snapshot.rs uses serde_json which emits keys alphabetically (cr, ctr, deterministic_skip, engine, …). The diff tool parses both sides into dicts before comparing, so this has no functional impact on the catalog. It does mean that even semantically-equivalent snapshots produce mismatching SHAs at the file level, so the manifest-hash short-circuit in diff_state.py never short-circuits canary↔ours comparisons (the underlying byte content trivially differs even where the parsed semantics match).

Fix. Updated validation.md gate-2 entry to describe the actual behavior accurately.

Issue 4 — LOW: schema kind count and unwired-list inaccuracies

audit-runs/phase-a-diff-harness/README.md claims "Schema v1 declares 11 event kinds" and "wires three" then lists four kinds. Actual count in schema-v1.md: 13 sections with 16 distinct kind strings (thread.suspend/thread.resume and vfs.open/vfs.read/vfs.close share their respective sections).

ours-changes.md lists six unwired kind families but omits thread.suspend/thread.resume. The Rust emitter API has 9 emit_* functions, of which 3 are wired (4 if you count the synthetic schema_version header) and 6 are stubbed. Five additional kinds have no Rust function yet (thread.suspend, thread.resume, mem.write, vfs.open, vfs.read, vfs.close).

Fix. Updated README.md and ours-changes.md to distinguish wired / stubbed / not-yet-stubbed precisely and use accurate counts. Did not add any new emitters or hooks (out of scope per session brief).

Per-step verification record

Step 2 — Combined Phase A + Phase B cvar-OFF determinism

Ran the current target/release/xenia-rs (built from sources containing both Phase A and Phase B) with no Phase A or Phase B cvars set:

$ ./target/release/xenia-rs check --stable-digest -n 50000000 \
    --out audit-runs/phase-ab-verify/digest-current-cvaroff.json \
    "<ISO>"
$ diff audit-runs/phase-a-diff-harness/digest-pre-patch.json \
       audit-runs/phase-ab-verify/digest-current-cvaroff.json
# (no output)

PASS. Combined Phase A + Phase B cvar-OFF binary digest is byte-identical to the pre-Phase-A baseline.

Verified by md5sum that target/release/xenia-rs and target/release/xenia-rs-phaseB are byte-identical (current build); xenia-rs-phaseA-pre is older (pre-patch baseline).

Step 3 — Phase A four gates re-validated

Gate	Result	Method
1 cvar-OFF byte-identical (ours)	✅	Step 2 above
1 cvar-OFF canary smoke marker fires	✅	Wine 18-s timed run with `--mute=true`; `AUDIT-DEMO-SETUP-BEGIN` and `AUDIT-DEMO-SETUP-GRAPHICS-OK` both observed in `xenia.log`. CONFIG DUMP shows the 5 expected new cvars (2 Phase A + 3 Phase B), all default empty/false.
2 cvar-ON valid JSONL with `schema_version` first line	✅	All 121 363 lines of `ours-sanity.jsonl` and 1 635 789 lines of `canary-sanity.jsonl` parse as JSON. Both lead with `{"schema_version":1,…,"kind":"schema_version",…}`. Kind histogram: ours 3:1:1:1 ratio import.call/kernel.call/kernel.return/header (perfect — 40454 each); canary 1:545271:545270:545247 (24 in-flight calls when wineserver killed, expected).
3 ≥100-event matching prefix on tid=6→tid=1	✅	Re-ran `diff_events.py` on stored sanity logs; output byte-identical to stored `diff-report.md`. 113 matched events on canary tid=6 → ours tid=1; first divergence at idx 113 (KeQuerySystemTime return_value differs — Phase B/C input).
4 negative test detects corruption at exact index	✅	Took first 100 events of `ours-sanity.jsonl` to `/tmp/ours-short.jsonl`; corrupted line 50 (`tid_event_idx=48`) by changing `kind: import.call` → `kind: kernel.CORRUPT`. Self-diff: exit 0 OK. Corrupt diff: exit 1, `validate-identical: divergence in canary_tid=1 at tid_event_idx=48 (kind: canary='import.call' ours='kernel.CORRUPT')`.

Step 4 — Phase B five gates re-validated

Gate	Result	Method
1 cvar-OFF byte-identical (ours)	✅	Step 2 above
1 cvar-OFF canary CONFIG DUMP shows 5 expected lines	✅	Same Wine smoke run; CONFIG DUMP `[Audit]` section includes `phase_a_event_log_path`, `phase_a_event_log_mem_writes`, `phase_b_dump_section_content`, `phase_b_snapshot_and_exit`, `phase_b_snapshot_dir` with default empty/false values.
2 well-formed snapshots both engines	✅	Both snap-001 dirs contain 6 files; all parse as JSON; manifest SHA-256s match recomputed file hashes; ours's JSON is sort-keys-sorted, canary's is insertion-order (note Issue 3).
3 hash-deterministic re-runs	✅ ours	Two ours runs to different `--phase-b-snapshot-dir`s (`snap-002a` and `snap-002b`): `validate-identical: OK` exit 0. Same-dir re-run (`snap-002c/ours` vs `snap-002c/ours-1`): byte-identical via `diff -r`.
3 hash-deterministic re-runs	✅ canary	New canary snapshot `snap-canary-002/canary` vs existing `snap-001/canary`: `validate-identical: OK` exit 0. Full diff: 4 of 5 files identical, only `config.json` "diverged" with 0 reportable divergences (path/timestamp fields are skipped).
4 invariant `pc == entry_point == 0x824ab748` both engines	✅	Confirmed by inspecting `snap-001/canary/cpu_state.json` and `snap-001/ours/cpu_state.json` — both `pc: "0x824ab748"`; `config.json::xex_entry_point: "0x824ab748"` in both.
4 invariant `image_loaded_sha256` matches	❌ FAIL → STOP	Reproduced canary `a70993b77ca9e29218d033fad7c0b45c874676c4e0edd966545d39b266486a9c` and ours `ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18` across two independent runs each. Reproducible STOP condition; this is the documented Phase C handoff, not a Phase B failure.
5 negative test detects mutation	❌ → ✅ post-fix	Pre-fix: false PASS (Issue 1). Post-fix: exit 1, names both the manifest-hash-mismatch σ and the actual mutation γ.

Step 5 — Hook-point semantic equivalence

Phase A boundary. Both engines hook at the kernel-export dispatch boundary (canary: shim_utils.h::ExportRegistrerHelper::*::Trampoline; ours: state.rs::call_export). Verified by inspecting the first 113 matched events on the boot thread:

canary tid=6 [0]: import.call RtlImageXexHeaderField (ord=299)
ours tid=1 [0]: import.call RtlImageXexHeaderField (ord=299)
canary tid=6 [1]: kernel.call RtlImageXexHeaderField
ours tid=1 [1]: kernel.call RtlImageXexHeaderField
canary tid=6 [2]: kernel.return RtlImageXexHeaderField
ours tid=1 [2]: kernel.return RtlImageXexHeaderField

The 113-event matching prefix demonstrates the boundary captures the same kernel-call sequence on the boot thread of each engine through 113 calls.

Asymmetries.

canary's debug build emits some kernel calls that complete before shim_utils trampoline (24 in-flight calls when wineserver -k kills the process — visible as kernel.call > kernel.return count imbalance). ours's check -n exit is clean. Not an asymmetry of the hook itself.
ours's call_export only emits when an export is Some(&(name, func)) in the dispatch table; unimplemented exports take the early return path and emit nothing. Canary's trampoline is per-shim; if canary has a shim where ours has no export, only canary will emit a kernel.call for it. This is an inherent boundary asymmetry that Phase C should be aware of, but it does NOT invalidate the matching prefix (the first 113 boot-thread calls are all on shared exports).

Phase B boundary. Both engines fire the snapshot hook immediately before the first guest PPC instruction at entry_point on the boot thread. PC == 0x824ab748 in both cpu_state.json files; thread_id records the boot thread (canary 6, ours 1). No "instruction count" / tbl_tbu field is captured, but the pc == entry_pc invariant is sufficient: had any instructions executed, PC would have advanced.

Verdict. Both Phase A and Phase B hook points are semantically equivalent across engines for the in-scope event types. Asymmetries (unimplemented exports, kernel-call-count off-by-N at process kill) are inherent to the boundaries themselves, not bugs in the harness.

Step 6 — Diff-tool robustness (5 synthetic edge cases each)

`diff_events.py`

Case	Input	Result
empty file	`empty.jsonl`	`SystemExit('empty file')` exit 1, no crash
single event (header only)	`single-event.jsonl` (just `schema_version`)	Auto-mapping finds no shared first kernel.call → exit 2 with clear message; no crash
missing schema header	first line is `import.call`	`SystemExit('first event is not schema_version')` exit 1, clear message
mismatched thread tids	canary has only tid=2; ours has only tid=1, no shared first-call name	exit 2 with clear "no tid mapping" message
field comparison rules honored	self-diff of `ours-sanity[0:99]`	exit 0; corruption at idx 48 → exit 1 with exact `tid_event_idx=48` named

`diff_state.py`

Case	Input	Result
empty snapshot dirs	`ds-empty/canary` and `ds-empty/ours` (no JSON files)	exit 2 STOP (invariants fail because `config.json` missing); 5 missing-file divergences
self-diff existing snapshot	`snap-001/ours` against itself	`validate-identical: OK` exit 0 (legitimate manifest match still short-circuits correctly)
missing canary dir	`/tmp/does-not-exist-xyz` as canary	exit 2 with "both snapshot dirs must exist" message
missing config.json	manifests present (empty) but no JSON files	exit 2 STOP (FileNotFoundError caught in `check_invariants`); 5 missing-file divergences
field mutation detection	`snap-001/ours` vs `/tmp/verify-gate5` (kernel.json mutated, manifest copied verbatim)	exit 1 (post-fix); names `manifest-hash-mismatch` σ + actual γ-content divergence

All synthetic cases handled gracefully; no crashes, exit codes distinguish failure modes (1 = data divergence; 2 = STOP / invalid input).

Step 7 — Schema coverage scope

Schema-v1.md declares 13 sections (16 distinct kind strings). Phase A wires:

status	kinds
wired (call sites in `state.rs::call_export` + canary `shim_utils.h`)	`schema_version`, `import.call`, `kernel.call`, `kernel.return`
stubbed (Rust `emit_*` exists, no call site)	`thread.create`, `thread.exit`, `handle.create`, `handle.destroy`, `wait.begin`, `wait.end`
not-yet-stubbed (no Rust function)	`thread.suspend`, `thread.resume`, `mem.write`, `vfs.open`, `vfs.read`, `vfs.close`

Documentation updates (Issue 4) clarify which is which. Per session brief, NOT wiring any of the unwired kinds — that is Phase A+ / Phase C scope.

Confirmed Phase B `image_loaded_sha256` mismatch (handed to Phase C)

Reproducible across two independent runs of each engine:

canary: a70993b77ca9e29218d033fad7c0b45c874676c4e0edd966545d39b266486a9c
ours: ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18

xex_entry_point = 0x824ab748 and cpu_state.pc = 0x824ab748 in both engines (these match — the snapshot point is equivalent). The in-memory bytes loaded for the XEX image differ. Per Phase B contract, this is the catalog finding handed to Phase C; verifier did not investigate cause. Phase B's documented next-step (re-run with --phase-b-dump-section-content, binary-diff section_contents[]) remains the correct Phase C entry point.

Files in this directory

File	Purpose
`verification-report.md`	This file.
`re-validation.md`	Per-gate post-fix re-validation evidence (compact).
`digest-current-cvaroff.json`	Step 2 digest from current build.
`regenerated-phase-a-diff-report.md`	`diff_events.py` output on stored sanity logs (byte-identical to stored `diff-report.md`).
`regenerated-phase-b-report.md`	`diff_state.py` output on stored snap-001 (pre-fix; byte-identical to stored `report.md`).
`regenerated-phase-b-report-postfix.md`	Same, but generated post-fix (also byte-identical).
`snap-002a/ours/`, `snap-002b/ours/`	Two independent ours snapshot runs (Phase B gate 3 reproducibility).
`snap-002c/ours/`, `snap-002c/ours-1/`	Same-dir ours re-run (byte-equality test).
`snap-canary-002/canary/`	Independent canary snapshot run (Phase B gate 3 reproducibility).
`coexist/`	Phase A + Phase B cvars enabled simultaneously, ours brief run; jsonl + 5-file snapshot both emitted cleanly.
`synthetic-diff-tests/`	Fixtures for Step 6 edge-case tests.

Cascade prediction

A re-verify gates with reproduction: achieved — all gates re-run, reproductions match.
B identify ≥1 instrumentation bug or doc issue: achieved — Issue 1 HIGH (diff tool short-circuit), Issues 2–4 documentation.
C fixes land + re-pass all gates: achieved — diff_state.py fix + 4 doc fixes; all gates pass post-fix; no regressions.
D Phase C base is solid going forward: achieved, with the caveat that Issue 3 (canary insertion-order JSON) means inter-engine manifest-hash short-circuit will never fire, but the fall-through full-content-diff path covers this correctly.

17 KiB Raw Blame History Unescape Escape