xenia-rs/audit-runs/phase-a-diff-harness/validation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Phase A — Validation record

All four acceptance gates from the plan have been executed against the patched canary (build-cross/bin/Windows/Debug/xenia_canary.exe) and ours (target/release/xenia-rs). Results below were captured on 2026-05-13.

Gate 1: cvar-OFF determinism

ours

  • Pre-patch binary digest: audit-runs/phase-a-diff-harness/digest-pre-patch.json (captured from a copy of the binary made before applying the patch).
  • Post-patch binary digest: audit-runs/phase-a-diff-harness/digest-post-patch-cvaroff.json.
  • Both runs: check --stable-digest -n 50000000 against the same ISO.
  • Verification: diff of the two files produces zero output. Byte-identical. PASS.

canary

  • Pre-patch run: 12 s boot under Wine, --mute=true, log size 68 301 bytes.
  • Post-patch run with cvar unset: same conditions, log size 68 407 bytes.
  • Per-line diff of the two logs:
    • Lines 19-20: two new entries in the CONFIG DUMP — phase_a_event_log_path = "" and phase_a_event_log_mem_writes = false. Expected — these are the two cvars we declared, both default to empty/false.
    • Remaining differences: host-pointer values (pid=0x..., graphics_system=0x..., native=0x...) and millisecond timings (Translated 5 shaders in 18 ms vs ... in 13 ms).
  • Cross-check: re-ran the same post-patch binary a second time. Log size identical (68 407 bytes). Diff between two consecutive runs of the same binary shows the same volume and nature of host-pointer/timing changes — i.e. this is normal run-to-run jitter, not a behavioral change introduced by the patch.
  • Smoke markers (AUDIT-DEMO-SETUP-BEGIN / AUDIT-DEMO-SETUP-GRAPHICS-OK) fire in both runs.
  • PASS.
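The run-to-run jitter argument above can be made mechanical by masking the volatile fields before diffing. The following is a minimal sketch under two assumptions: host pointers appear as `pid=0x...`, `graphics_system=0x...` or `native=0x...`, and timings appear as `in <N> ms` (the field names come from the log notes above). The regexes and the helpers `normalize_log_line` / `normalized_diff` are hypothetical and are not part of the harness.

```python
import difflib
import re

# Hypothetical masks for the volatile fields named in the Gate 1 notes:
# host pointers printed as key=0x<hex>, and wall-clock timings "in <N> ms".
_HOST_PTR = re.compile(r"\b(pid|graphics_system|native)=0x[0-9A-Fa-f]+")
_TIMING = re.compile(r"\bin \d+ ms\b")


def normalize_log_line(line: str) -> str:
    """Replace run-to-run jitter with fixed placeholders."""
    line = _HOST_PTR.sub(lambda m: f"{m.group(1)}=0x<ptr>", line)
    return _TIMING.sub("in <N> ms", line)


def normalized_diff(old_lines, new_lines):
    """Return only the added/removed lines that survive normalization.

    An empty list means the two logs differ only in masked jitter.
    """
    old = [normalize_log_line(l) for l in old_lines]
    new = [normalize_log_line(l) for l in new_lines]
    # ndiff prefixes: "- " removed, "+ " added, "  " common, "? " hints.
    return [d for d in difflib.ndiff(old, new) if d.startswith(("- ", "+ "))]
```

With this filter, the pre/post-patch comparison should reduce to the two expected CONFIG DUMP additions. Two runs of the same binary should reduce to nothing.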

Unit tests

cargo test -p xenia-kernel event_log — 2/2 tests pass:

  • fnv1a_known_vector (FNV-1a 64-bit of "foobar" == 0x85944171f73967e8, the standard FNV-1a test vector)
  • semantic_id_stable (identical inputs produce identical output; distinct inputs produce distinct output)
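For reference, FNV-1a 64-bit starts from the standard offset basis. For each byte, it XORs the byte into the state and then multiplies by the FNV prime modulo 2^64. A minimal Python sketch that reproduces the `fnv1a_known_vector` check follows. The Rust implementation in xenia-kernel is not shown in this record, and the function name here is hypothetical.

```python
# Standard 64-bit FNV parameters.
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = (1 << 64) - 1


def fnv1a_64(data: bytes) -> int:
    """FNV-1a, 64-bit: XOR the byte in first, then multiply (mod 2**64)."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & MASK64
    return h


# Same vector the unit test pins.
assert fnv1a_64(b"foobar") == 0x85944171F73967E8
```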

Gate 2: cvar-ON emits well-formed JSONL with schema_version header

ours

```
$ head -1 audit-runs/phase-a-diff-harness/ours-sanity.jsonl
{"schema_version":1,"engine":"ours","kind":"schema_version","tid":0,"tid_event_idx":0,
 "guest_cycle":0,"host_ns":48371,"deterministic":true,
 "payload":{"version":1,"emitter_build":"ours-phaseA"}}
$ wc -l audit-runs/phase-a-diff-harness/ours-sanity.jsonl
121363
```

The 50 M-instruction run produced 121 363 valid JSONL events.

canary

```
$ head -1 audit-runs/phase-a-diff-harness/canary-sanity.jsonl
{"schema_version":1,"engine":"canary","kind":"schema_version","tid":0,"tid_event_idx":0,
 "guest_cycle":0,"host_ns":300,"deterministic":true,
 "payload":{"version":1,"emitter_build":"canary-phaseA"}}
$ wc -l audit-runs/phase-a-diff-harness/canary-sanity.jsonl
1635789
```

The 12 s Wine run produced 1 635 789 valid JSONL events. (The volume difference relative to ours reflects canary's debug build, which has full kernel-call logging at every shim trampoline. Both engines pin schema_version=1.)

Both files lead with a schema_version event. PASS.
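Note that `wc -l` only counts lines. A check that every line parses and that the first event is the schema header looks roughly like the sketch below. It assumes one JSON object per line with the `kind` and `schema_version` fields shown in the headers above. `check_jsonl_header` is a hypothetical helper and is not the harness's own validator.

```python
import json
from typing import Iterable


def check_jsonl_header(lines: Iterable[str], expected_version: int = 1) -> int:
    """Parse every line as a JSON object and require a schema_version event first.

    Returns the number of events; raises ValueError on the first problem.
    """
    count = 0
    for lineno, raw in enumerate(lines, start=1):
        try:
            event = json.loads(raw)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {lineno}: not valid JSON ({exc.msg})") from exc
        if not isinstance(event, dict):
            raise ValueError(f"line {lineno}: not a JSON object")
        if lineno == 1:
            if event.get("kind") != "schema_version":
                raise ValueError("first event is not a schema_version event")
            if event.get("schema_version") != expected_version:
                raise ValueError(f"schema_version != {expected_version}")
        count += 1
    if count == 0:
        raise ValueError("empty file")
    return count
```

Usage: `with open(path) as f: n = check_jsonl_header(f)`. Here `n` should agree with the `wc -l` figure when the file ends in a newline.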

Gate 3: diff tool finds matching prefix on tid=1

Ran tools/diff-events/diff_events.py on the two sanity files with auto-mapping:

| canary_tid | ours_tid | matched | canary_total | ours_total | first_divergence_at |
|-----------:|---------:|--------:|-------------:|-----------:|:-------------------:|
|     6      |    1     |   113   |    313196    |   108492   |         113         |
|     4      |   11     |     5   |     25163    |        9   |           5         |
|     7      |    2     |     2   |        29    |       33   |           2         |
|    12      |    7     |     2   |      2846    |        3   |           2         |
|    14      |    9     |    11   |    587000    |       75   |          11         |
|    15      |   10     |    15   |    355601    |       15   |          —          |

The primary boot-thread pair (canary_tid=6 ↔ ours_tid=1) matched 113 events before the first divergence, above the ≥100-event threshold the gate requires. PASS.

The full per-thread report is at diff-report.md. Per Phase A discipline, those divergences are NOT analyzed in this session; they are input for Phase B.
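In the table, `matched` equals `first_divergence_at` on every row that diverges. The last row, where `matched` equals `ours_total`, has no divergence. That pattern is consistent with a positional walk that stops at the first mismatch. A minimal sketch of such a per-thread comparison follows. It assumes each stream is already sorted by `tid_event_idx` and that events are compared on `kind` only. The real tool's compared field set and its tid auto-mapping are not described here, and `first_divergence` is a hypothetical name.

```python
from typing import Optional, Sequence, Tuple


def first_divergence(canary: Sequence[dict], ours: Sequence[dict],
                     fields: Sequence[str] = ("kind",)) -> Tuple[int, Optional[int]]:
    """Compare two single-thread event streams position by position.

    Returns (matched, first_divergence_at). The second value is None when
    one stream is a prefix of the other (the "—" case in the table).
    """
    matched = 0
    for c_ev, o_ev in zip(canary, ours):
        if any(c_ev.get(f) != o_ev.get(f) for f in fields):
            # 0-based index of the first mismatching event.
            return matched, matched
        matched += 1
    return matched, None
```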

Gate 4: Negative test detects a hand-corrupted event

```
# Self-diff of identical files — clean exit
$ python3 tools/diff-events/diff_events.py \
    --canary /tmp/ours-short.jsonl --ours /tmp/ours-short.jsonl --validate-identical
$ echo $?
0

# Corrupted "kernel.call" -> "kernel.CORRUPT" on a tid=1 kernel.call event
$ python3 tools/diff-events/diff_events.py \
    --canary /tmp/ours-short.jsonl --ours /tmp/ours-corrupt.jsonl --validate-identical
$ echo $?
1
```

The diff report names the divergence at the right index:

```
First divergence at `tid_event_idx=4`: kind: canary='kernel.call' ours='kernel.CORRUPT'
```

A second corruption further down the file (line 51, tid_event_idx=49) was also detected. PASS.
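The corrupted input for the negative test can be produced mechanically, and the helper below also returns the index the diff report should name. This is a sketch that assumes one JSON object per line with `tid`, `tid_event_idx` and `kind` fields, as in the headers shown under Gate 2. `corrupt_nth_kernel_call` is a hypothetical helper and is not part of tools/diff-events.

```python
import json
from typing import List, Tuple


def corrupt_nth_kernel_call(lines: List[str], tid: int = 1, n: int = 0) -> Tuple[List[str], int]:
    """Rewrite kind 'kernel.call' -> 'kernel.CORRUPT' on the n-th such event of `tid`.

    Returns the new lines and the tid_event_idx that was corrupted, so a test
    can assert the diff tool reports exactly that index.
    """
    out = list(lines)
    seen = 0
    for i, raw in enumerate(out):
        event = json.loads(raw)
        if event.get("tid") == tid and event.get("kind") == "kernel.call":
            if seen == n:
                event["kind"] = "kernel.CORRUPT"
                suffix = "\n" if raw.endswith("\n") else ""  # keep line ending
                out[i] = json.dumps(event, separators=(",", ":")) + suffix
                return out, event["tid_event_idx"]
            seen += 1
    raise ValueError(f"tid={tid} has fewer than {n + 1} kernel.call events")
```

Writing the returned lines to a file and running the `--validate-identical` invocation shown above should then exit 1 and name the returned `tid_event_idx`.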

Summary

| Gate | Status |
|------|--------|
| 1. Cvar-OFF determinism (both engines) | PASS |
| 2. Cvar-ON emits valid JSONL with schema_version header (both engines) | PASS |
| 3. Diff tool reports ≥100-event matching prefix on tid=1 → divergence at idx 113 | PASS |
| 4. Negative test (corrupt one event) → exit 1, correct tid_event_idx named | PASS |

Cascade prediction at session close (harness signals only):

  • A (infrastructure builds, cvar-OFF zero overhead): achieved.
  • B (cvar-ON emits valid JSONL both engines): achieved.
  • C (sanity validation: all four gates pass on the first try): achieved on the first complete run, modulo a transient build-time issue. In canary, CMake's xe_platform_sources does not pick up newly added .cc files incrementally, so a cmake --preset cross-win-clangcl reconfigure was needed.
  • D (fix lands): N/A — out of scope for Phase A.