handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions
--- a/audit-runs/phase-c-first-divergence/classification.md
+++ b/audit-runs/phase-c-first-divergence/classification.md
@@ -0,0 +1,111 @@
+# Phase C — first-divergence classification
+
+## The raw first byte-diff
+
+| Field | Value |
+|---|---|
+| Guest VA | `0x82000600` |
+| File offset | `0x00000600` |
+| Section | `.rdata` (start of section, virtual_address = 0x600) |
+| canary byte | `0xde` (start of `de ad c0 de` poison pattern) |
+| ours byte | `0x00` |
+| .pe byte | `0x00` |
+
+## The diff is the xam.xex variable-import slot table
+
+`xex.json` lists 52 `record_type=0` imports for `xam.xex`, each at a
+sequential 4-byte slot starting at `address = 0x82000600`:
+
+```
+xam.xex ord=652 rt=0 addr=0x82000600
+xam.xex ord=700 rt=0 addr=0x82000604
+xam.xex ord=705 rt=0 addr=0x82000608
+xam.xex ord=725 rt=0 addr=0x8200060c
+...
+```
+
+The next 204−52 = 152 `record_type=0` slots are for `xboxkrnl.exe`,
+continuing at `0x820006D0..0x82000934`.
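
The slot arithmetic above can be checked with a short sketch. It assumes, as the listing suggests, that the `record_type=0` slots form one contiguous table of 4-byte words starting at `0x82000600`; the helper name `slot_addresses` is made up for illustration and is not part of the repo.

```python
SLOT_SIZE = 4            # bytes per record_type=0 variable slot
TABLE_BASE = 0x82000600  # first xam.xex slot, as listed in xex.json
XAM_SLOTS = 52           # record_type=0 imports for xam.xex
TOTAL_VAR_SLOTS = 204    # all record_type=0 imports (xam.xex + xboxkrnl.exe)


def slot_addresses(base: int, count: int) -> list[int]:
    """Return the guest VA of each slot in a contiguous table (assumed layout)."""
    return [base + i * SLOT_SIZE for i in range(count)]


xam = slot_addresses(TABLE_BASE, XAM_SLOTS)
krnl_base = TABLE_BASE + XAM_SLOTS * SLOT_SIZE  # first xboxkrnl.exe slot
krnl_count = TOTAL_VAR_SLOTS - XAM_SLOTS        # slots left for xboxkrnl.exe

print([hex(a) for a in xam[:4]])
print(hex(krnl_base), krnl_count)
```

The first four addresses reproduce the listing, and the `xboxkrnl.exe` block starts right after the 52 `xam.xex` slots (52 × 4 = 0xD0 bytes past the table base).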
+
+## What each engine writes at these slots
+
+| Source | record_type=0 (var slot, 4 bytes) | record_type=1 (thunk, 16 bytes) |
+|---|---|---|
+| canary | `de ad c0 de` (poison sentinel) | host-shim bytes: `44 00 00 42 / 4e 80 00 20 / 60 00 00 00 / 60 00 00 00` (`sc; blr; nop; nop`) |
+| ours | `00 00 00 00` (zero) | leaves .pe bytes in place (`01 00 ord_hi ord_lo / 02 00 ord_hi ord_lo / mtspr ctr,r11 / bctr`) |
+| .pe | XEX import-record tag: `00 00 ord_hi ord_lo` | template thunk: `01 00 ord_hi ord_lo / 02 00 ord_hi ord_lo / mtspr ctr,r11 / bctr` |
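
The 16-byte host-shim in the canary row is four big-endian 32-bit PowerPC words. The check below uses only the byte values from the table; the mnemonics in the comments are the table's own labels, and the constant name `HOST_SHIM_WORDS` is illustrative.

```python
import struct

# Word values copied from the canary record_type=1 cell:
# labelled sc, blr, nop, nop in the table.
HOST_SHIM_WORDS = (0x44000042, 0x4E800020, 0x60000000, 0x60000000)

# Xbox 360 guest memory is big-endian, so pack with ">".
host_shim = struct.pack(">4I", *HOST_SHIM_WORDS)

print(len(host_shim), host_shim.hex())
```

Four words of four bytes each give the 16-byte thunk size used in the column header.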
+
+## Classification: **import-thunk / ε-class allocator drift**
+
+This matches **tripstone #2** of the Phase C brief verbatim:
+
+> Import thunks are legitimately engine-specific. If first byte-diff is
+> in a thunk, canonicalize and re-find first diff.
+
+The two engines implement different HLE dispatch strategies:
+
+- **canary**: in-place thunk patching. Overwrites the guest XEX bytes
+  with host-shim instructions; record_type=0 slots get `0xDEADC0DE`
+  poison (canary panics if a guest dereferences an unimplemented import
+  variable).
+- **ours**: HLE dispatch happens at the JIT translation layer, not by
+  patching the thunk. `record_type=1` thunks keep their original `.pe`
+  bytes; `record_type=0` slots are zeroed (still distinguishable from
+  the .pe ordinal-tag content if guest code reads them).
+
+Both are valid engine implementation choices.
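
When reading a raw dump, the three writers can be told apart from the 4-byte content of a `record_type=0` slot. This is a sketch only: `classify_var_slot` is a hypothetical helper, and the byte patterns are the ones given in the table above.

```python
POISON = bytes.fromhex("deadc0de")  # canary sentinel for record_type=0 slots


def classify_var_slot(word: bytes) -> str:
    """Guess which writer produced a 4-byte record_type=0 slot."""
    if len(word) != 4:
        raise ValueError("record_type=0 slots are 4 bytes")
    if word == POISON:
        return "canary"
    if word == b"\x00\x00\x00\x00":
        return "ours"
    if word[:2] == b"\x00\x00":
        return "pe"  # 00 00 ord_hi ord_lo import-record tag
    return "unknown"


print(classify_var_slot(bytes.fromhex("0000028c")))  # ord=652 is 0x028c
```

An import with ordinal 0 would be indistinguishable from a zeroed slot under this rule, so the classifier is a reading aid, not a proof of origin.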
+
+## After canonicalization — the real check
+
+Mask all import-slot bytes to `0xCD` in canary, ours, AND .pe
+(record_type=0 = 4 bytes per slot, record_type=1 = 16 bytes per slot;
+204×4 + 194×16 = 3920 bytes across 398 slots). Then compare:
+
+```
+canary canonical sha256: 62c51908e2df705583fe81a084f39bd399196f9000cfa7bffd56127b41a4ab96
+ours   canonical sha256: 62c51908e2df705583fe81a084f39bd399196f9000cfa7bffd56127b41a4ab96
+pe     canonical sha256: 62c51908e2df705583fe81a084f39bd399196f9000cfa7bffd56127b41a4ab96
+```
+
+**All three match.** Bytes differing after canonicalization: **0**.
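
The canonicalization step can be sketched as follows. This is not the code in `diff_state.py`: the function name, the image base of `0x82000000` (inferred from guest VA `0x82000600` mapping to file offset `0x600`), and the `{"record_type", "address"}` entry shape are assumptions, and the real `xex.json` schema may differ.

```python
import hashlib

MASK_BYTE = 0xCD
SLOT_BYTES = {0: 4, 1: 16}  # record_type -> bytes per import slot


def canonical_sha256(image: bytes, image_base: int, imports: list[dict]) -> str:
    """Hash an image after overwriting every import slot with MASK_BYTE."""
    buf = bytearray(image)
    for imp in imports:
        size = SLOT_BYTES[imp["record_type"]]
        off = imp["address"] - image_base
        if off < 0 or off + size > len(buf):
            raise ValueError(f"slot at {imp['address']:#x} is outside the image")
        buf[off:off + size] = bytes([MASK_BYTE]) * size
    return hashlib.sha256(buf).hexdigest()


# Toy images that differ only inside one record_type=0 slot.
base = 0x82000000
imports = [{"record_type": 0, "address": 0x82000600}]
canary_img = bytearray(0x700)
canary_img[0x600:0x604] = bytes.fromhex("deadc0de")
ours_img = bytearray(0x700)

raw_equal = hashlib.sha256(canary_img).digest() == hashlib.sha256(ours_img).digest()
canon_equal = (canonical_sha256(bytes(canary_img), base, imports)
               == canonical_sha256(bytes(ours_img), base, imports))
print(raw_equal, canon_equal)
```

In the toy case the raw digests differ while the canonical digests agree, which is the same shape as the result reported above.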
+
+## Conclusion
+
+There is **NO real engine divergence** at the image-load layer.
+
+- Both engines decode the XEX2 file correctly.
+- Both load it into guest memory at the correct virtual addresses.
+- Both produce byte-identical content outside the import-patch region.
+- Even .pe (an independent third-party offline XEX2 decoder) produces
+  the exact same canonical content.
+
+The Phase B `image_loaded_sha256` δ-content-STOP was a **false positive**
+caused by an overly strict invariant: hashing engine-specific runtime
+patches as if they were XEX content.
+
+## What the fix is
+
+The fix is in the **comparison framework**, not the engines:
+
+1. `diff_state.py`: relaxed STOP invariant — when `--xex-json` is
+   provided AND both snapshots contain `image.bin`, compute and check
+   `image_canonical_sha256` (engine-mask agnostic) as the real STOP
+   key. The raw `image_loaded_sha256` is still reported, but only as
+   informational output.
+2. `phase_b_snapshot.{rs,cc}`: when `phase_b_dump_section_content` is
+   set, emit `image.bin` (raw bytes of the XEX image region) so the
+   diff tool can perform canonicalization. Default-off; with the cvar
+   off, the binary digest is byte-identical to the pre-Phase-C baseline.
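
The relaxed invariant in item 1 amounts to choosing which digest gates the STOP. The sketch below only mirrors the rule as described; `should_stop`, the snapshot-as-dict shape, and the fallback to the raw digest when canonicalization is unavailable are assumptions, not the actual `diff_state.py` logic.

```python
def should_stop(snap_a: dict, snap_b: dict, xex_json_given: bool) -> tuple[bool, str]:
    """Return (stop, key): whether the image digests diverge, and which key decided."""
    can_canonicalize = (
        xex_json_given
        and "image_canonical_sha256" in snap_a
        and "image_canonical_sha256" in snap_b
    )
    # Assumed fallback: without canonical digests, the raw digest still gates the STOP.
    key = "image_canonical_sha256" if can_canonicalize else "image_loaded_sha256"
    return snap_a[key] != snap_b[key], key


canary_snap = {"image_loaded_sha256": "aaaa", "image_canonical_sha256": "cccc"}
ours_snap = {"image_loaded_sha256": "bbbb", "image_canonical_sha256": "cccc"}

print(should_stop(canary_snap, ours_snap, xex_json_given=True))
print(should_stop(canary_snap, ours_snap, xex_json_given=False))
```

With placeholder digests that differ raw but match canonically, the first call does not stop and the second does, which is the false positive the relaxed invariant is meant to remove.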
+
+## What this implies for downstream divergences
+
+The Phase B catalog's 57 remaining divergences (post-image-load) are
+still meaningful — they describe real differences in stack/PCR/TLS
+allocation strategy, heap layout, kernel-object population, and
+exports-table state. These are now interpretable on a verified
+canonically-equivalent image baseline.
+
+The Phase A diff's first runtime divergence at `tid_event_idx=113`
+(`KeQuerySystemTime return_value`) is the next Phase C+1 target. It
+is **not** a downstream symptom of the image-load mismatch; it is the
+next genuine engine divergence in the kernel-call sequence.