xenia-rs/audit-runs/phase-c3-RtlImageXexHeaderField/re-validation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Phase C+3 — re-validation

Gate 1 — Determinism (cvar-OFF, ours)

3 fresh runs of check -n 50000000 --stable-digest:

| run | digest md5 |
| --- | --- |
| 1 | f7b035298e7e2d09d413c1457c6c6fa1 |
| 2 | f7b035298e7e2d09d413c1457c6c6fa1 |
| 3 | f7b035298e7e2d09d413c1457c6c6fa1 |
| Phase C/C+1/C+2 baseline | 608d8e8d293250698207a7d8fc0c18df |

Result: byte-identical across 3 runs. New baseline f7b03529… diverges from the C+2 baseline 608d8e8d… — expected per Tripstone #4 ("a real return-value fix in ours likely shifts the boot trajectory; the baseline digest WILL change"). The fix is deterministic (only adds a one-shot alloc_zero + mem.write_bulk at startup using bytes from the on-disk XEX header — no entropy source introduced).
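The gate logic above can be sketched in a few lines of Python. This is only an illustration: the helper names `runs_agree` and `baseline_shifted` are hypothetical and not part of the repository; the digests are the ones recorded in this note.

```python
def runs_agree(digests):
    """True when at least one run exists and every run produced the same digest."""
    return len(digests) > 0 and len(set(digests)) == 1


def baseline_shifted(digests, old_baseline):
    """True when the runs agree with each other but differ from the old baseline."""
    return runs_agree(digests) and digests[0] != old_baseline


# Digests copied from the three runs and the C+2 baseline above.
runs = ["f7b035298e7e2d09d413c1457c6c6fa1"] * 3
c2_baseline = "608d8e8d293250698207a7d8fc0c18df"

deterministic = runs_agree(runs)
shifted = baseline_shifted(runs, c2_baseline)
```

Both checks are needed: agreement across runs shows determinism, while the shift from the old baseline is the expected side effect of a real return-value fix.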

Gate 2 — Phase B image_canonical_sha256

Not re-snapshotted. Inferred OK by code review: the fix touches only

  • KernelState::xex_header_guest_ptr (new field, no interaction with image),
  • xenia-app::cmd_exec (post-image-load alloc_zero into a fresh region in 0x4xxxxxxx; doesn't touch mem.write_bulk(base, &image_data) at line 888),
  • the rtl_image_xex_header_field handler (read-only),
  • diff_events.py (python tool; no engine effect).

The PE image region [base..base+image_size] is byte-identical pre- and post-fix.

Gate 3 — Phase A matched-prefix extension (THE KEY METRIC)

Diffed audit-runs/phase-c3-RtlImageXexHeaderField/ours.jsonl against the existing phase-c-first-divergence/phase-a/canary.jsonl.

With allocator canonicalization (default):

| chain | C+2 (pre-C3) | C+3 (post) | Δ |
| --- | --- | --- | --- |
| canary tid=6 → ours tid=1 (main) | 102014 | 102032 | +18 |
| canary tid=4 → ours tid=11 | 5 | 5 | 0 |
| canary tid=7 → ours tid=2 | 2 | 2 | 0 |
| canary tid=12 → ours tid=7 | 2 | 2 | 0 |
| canary tid=14 → ours tid=9 | 11 | 11 | 0 |
| canary tid=15 → ours tid=10 | (no div) | (no div) | 0 |

Main thread matched prefix grew from 102014 to 102032. Gate 3 passes.

The new first-divergence at idx=102032 is XeKeysConsolePrivateKeySign (canary returns 1, ours returns 0) — that's the next Phase C+N target, out of scope here.
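As a rough sketch of what "matched prefix" means here: walk the two per-thread event streams in lockstep and stop at the first event that disagrees. The field names `name` and `ret`, the comparison key, and the toy events are assumptions for illustration; the actual comparison in diff_events.py is not shown in this note.

```python
def matched_prefix(canary, ours, key=lambda e: (e["name"], e["ret"])):
    """Count the leading events on which the two streams agree."""
    matched = 0
    for theirs, mine in zip(canary, ours):
        if key(theirs) != key(mine):
            break
        matched += 1
    return matched


# Toy streams (hypothetical values): the third event differs in its return value.
canary = [
    {"name": "NtCreateEvent", "ret": 0},
    {"name": "RtlImageXexHeaderField", "ret": 0x1234},
    {"name": "XeKeysConsolePrivateKeySign", "ret": 1},
]
ours = [
    {"name": "NtCreateEvent", "ret": 0},
    {"name": "RtlImageXexHeaderField", "ret": 0x1234},
    {"name": "XeKeysConsolePrivateKeySign", "ret": 0},
]

first_divergence_idx = matched_prefix(canary, ours)
```

The returned count doubles as the index of the first divergent event, which is how a prefix of 102032 and a divergence at idx=102032 line up.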

With --no-canonicalize-allocators (backward-compat check): matched=161 — same as Phase C+1, because the MmAllocatePhysicalMemoryEx divergence at idx=161 dominates without canonicalization. With BOTH allocator + xex-header canonicalization, prefix reaches 102032.
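One plausible way to canonicalize allocator results, sketched below, is to replace each raw address with the ordinal of its first appearance, so two streams compare on allocation order rather than on raw VAs. This is an assumption about the technique, not a description of diff_events.py; the function name and the addresses are hypothetical.

```python
def canonicalize(addresses):
    """Map each distinct address to a label based on first-seen order."""
    first_seen = {}
    labels = []
    for addr in addresses:
        if addr not in first_seen:
            first_seen[addr] = len(first_seen)
        labels.append(f"alloc#{first_seen[addr]}")
    return labels


# Hypothetical raw VAs: the engines allocate at different addresses
# but in the same order and with the same reuse pattern.
canary_vas = [0x40000000, 0x40010000, 0x40000000]
ours_vas = [0xA0000000, 0xA0020000, 0xA0000000]

raw_equal = canary_vas == ours_vas
canonical_equal = canonicalize(canary_vas) == canonicalize(ours_vas)
```

Under this scheme the raw streams diverge at the first allocation, while the canonical streams keep matching, which is consistent with the matched=161 versus 102032 contrast reported above.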

Gate 4 — Build

$ cargo build --release -p xenia-app
   Compiling xenia-kernel v0.1.0
   Compiling xenia-app v0.1.0
    Finished `release` profile [optimized] target(s) in 6.17s

One pre-existing dead-code warning (walk_committed_regions); not introduced by this fix. Canary untouched.

Gate 5 — Phase A determinism (emitter)

Two cvar-ON captures of the same engine binary on the same ISO, md5-summing only deterministic fields (excluding host_ns):

ours.jsonl  (run 1, deterministic-fields-only)   714f06373f2f8f0e2f2bb5f1082da862
/tmp/c3_pa_run2.jsonl (run 2, det-fields-only)   714f06373f2f8f0e2f2bb5f1082da862

Byte-identical.
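A minimal sketch of a deterministic-fields-only checksum over a JSONL capture follows. It assumes each line is one JSON object and that `host_ns` is the only volatile field, as stated above; the other field names and the helper `det_md5` are hypothetical, and the real capture schema may differ.

```python
import hashlib
import json


def det_md5(lines, volatile=("host_ns",)):
    """md5 over JSONL events after dropping volatile fields and normalizing key order."""
    digest = hashlib.md5()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        event = json.loads(line)
        for field in volatile:
            event.pop(field, None)
        # sort_keys and fixed separators make the serialization stable.
        normalized = json.dumps(event, sort_keys=True, separators=(",", ":"))
        digest.update(normalized.encode("utf-8"))
        digest.update(b"\n")
    return digest.hexdigest()


# Two toy captures that differ only in host_ns and key order.
run1 = ['{"tid": 1, "name": "NtCreateEvent", "ret": 0, "host_ns": 111}']
run2 = ['{"host_ns": 999, "name": "NtCreateEvent", "tid": 1, "ret": 0}']
run3 = ['{"tid": 1, "name": "NtCreateEvent", "ret": 1, "host_ns": 111}']

same = det_md5(run1) == det_md5(run2)
different = det_md5(run1) != det_md5(run3)
```

Normalizing key order is a design choice in this sketch: it keeps the digest independent of serializer field ordering, so only guest-visible differences change the checksum.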

Gate 6 — --no-canonicalize-allocators backward-compat

Diff with the flag set reproduces the Phase C+1 baseline result of matched=161 (MmAllocatePhysicalMemoryEx divergence at idx=161). This confirms the canonicalization is purely additive at the diff-tool level and the engine fix doesn't disturb the raw-VA stream upstream.

Gate 7 — Kernel unit tests

$ cargo test --release -p xenia-kernel
test result: ok. 129 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out

Passes. Two new tests (validating that rtl_image_xex_header_field returns the right value for each key-class) would be a logical addition, but they were kept out of this session's scope per "minimal fix".

Summary

All 7 gates pass. Phase A main matched prefix grew from 102014 to 102032 (+18 events). The fix is symmetric: canary calls UserModule::GetOptHeader on its in-guest header copy via the XexExecutableModuleHandle → hmodule_ptr → +0x58 → xex_header_base chain; ours now performs the same lookup against its own in-guest header copy, falling back to KernelState::xex_header_guest_ptr when the chain yields NULL. In ours the chain does yield NULL, because the LDR walk goes through *XexExecutableModuleHandle = image_base (see investigation for why fixing the LDR is Phase-A-regressing).
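The pointer chain and its fallback can be modeled as below. This is a toy Python model, not the Rust handler: guest memory is a plain dict of 32-bit words, unmapped reads return 0, and every address and function name is hypothetical. Only the +0x58 offset and the NULL fallback come from the summary above.

```python
def read_u32(mem, addr):
    """Read a 32-bit word from the toy guest memory; unmapped addresses read as 0."""
    return mem.get(addr, 0)


def resolve_xex_header_base(mem, handle_var_addr, fallback_ptr):
    """Follow handle -> hmodule_ptr -> +0x58 -> xex_header_base, else use the fallback."""
    hmodule_ptr = read_u32(mem, handle_var_addr)
    header_base = read_u32(mem, hmodule_ptr + 0x58) if hmodule_ptr else 0
    return header_base if header_base else fallback_ptr


HANDLE_VAR = 0x80001000  # hypothetical address of XexExecutableModuleHandle
FALLBACK = 0x40100000    # hypothetical value of xex_header_guest_ptr

# Chain resolves: the handle points at a module record whose +0x58 slot is set.
mem_chain_ok = {HANDLE_VAR: 0x80002000, 0x80002000 + 0x58: 0x40200000}
# Chain yields NULL: the handle holds an image base with nothing usable at +0x58.
mem_chain_null = {HANDLE_VAR: 0x82000000}

via_chain = resolve_xex_header_base(mem_chain_ok, HANDLE_VAR, FALLBACK)
via_fallback = resolve_xex_header_base(mem_chain_null, HANDLE_VAR, FALLBACK)
```

The second case mirrors the situation described for ours, where the handle variable holds image_base and the fallback pointer supplies the header location instead.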

Next divergence: XeKeysConsolePrivateKeySign @ tid_event_idx=102032 (canary returns 1, ours returns 0). By analogy with this session, the likely class is (A) a missing handler or (B) a stub returning 0. This is the Phase C+4 target.