xenia-rs/audit-runs/phase-c-first-divergence/classification.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Phase C — first-divergence classification
## The raw first byte-diff
| Field | Value |
|---|---|
| Guest VA | `0x82000600` |
| File offset | `0x00000600` |
| Section | `.rdata` (start of section, virtual_address = 0x600) |
| canary byte | `0xde` (start of `de ad c0 de` poison pattern) |
| ours byte | `0x00` |
| .pe byte | `0x00` |
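
The table above can be reproduced with a short first-difference scan. The sketch below is illustrative only: `first_diff` is a hypothetical helper, the two images are toy buffers rather than real dumps, and `IMAGE_BASE = 0x82000000` is inferred from guest VA `0x82000600` mapping to file offset `0x600`.

```python
# Inferred from the table: guest VA 0x82000600 corresponds to offset 0x600.
IMAGE_BASE = 0x82000000


def first_diff(a: bytes, b: bytes):
    """Return the offset of the first differing byte, or None if equal."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    # A length mismatch counts as a difference at the shorter length.
    if len(a) != len(b):
        return min(len(a), len(b))
    return None


# Toy images: identical up to 0x600, where canary holds the poison
# pattern and ours holds zeros.
canary = bytes(0x600) + bytes.fromhex("deadc0de")
ours = bytes(0x600) + bytes(4)

offset = first_diff(canary, ours)
guest_va = IMAGE_BASE + offset
print(f"offset=0x{offset:08x} guest_va=0x{guest_va:08x}")
```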
## The diff is the xam.xex variable-import slot table
`xex.json` lists 52 `record_type=0` imports for `xam.xex`, each at a
sequential 4-byte slot starting at `address = 0x82000600`:
```
xam.xex ord=652 rt=0 addr=0x82000600
xam.xex ord=700 rt=0 addr=0x82000604
xam.xex ord=705 rt=0 addr=0x82000608
xam.xex ord=725 rt=0 addr=0x8200060c
...
```
The remaining 204 - 52 = 152 `record_type=0` slots belong to `xboxkrnl.exe`,
continuing at `0x820006D0..0x82000934`.
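
The slot arithmetic can be checked with a few lines. This is a minimal sketch, taking 204 as the total number of `record_type=0` slots (so 204 - 52 = 152 for `xboxkrnl.exe`) and assuming the slots are packed back to back; `slot_addresses` is a hypothetical helper, not part of any tool mentioned here.

```python
SLOT_SIZE = 4             # bytes per record_type=0 variable slot
XAM_BASE = 0x82000600     # first xam.xex slot
XAM_COUNT = 52            # xam.xex record_type=0 imports
TOTAL_VAR_SLOTS = 204     # all record_type=0 slots (assumed total)


def slot_addresses(base, count, size=SLOT_SIZE):
    """Addresses of `count` back-to-back slots starting at `base`."""
    return [base + i * size for i in range(count)]


xam_slots = slot_addresses(XAM_BASE, XAM_COUNT)

# The xboxkrnl.exe slots start right after the last xam.xex slot.
krnl_base = XAM_BASE + XAM_COUNT * SLOT_SIZE
krnl_count = TOTAL_VAR_SLOTS - XAM_COUNT

print([hex(a) for a in xam_slots[:4]], hex(krnl_base), krnl_count)
```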
## What each engine writes at these slots
| Image source | record_type=0 (var slot, 4 bytes) | record_type=1 (thunk, 16 bytes) |
|---|---|---|
| canary | `de ad c0 de` (poison sentinel) | host-shim bytes: `44 00 00 42 / 4e 80 00 20 / 60 00 00 00 / 60 00 00 00` (`sc; blr; nop; nop`) |
| ours | `00 00 00 00` (zero) | leaves .pe bytes in place (`01 00 ord_hi ord_lo / 02 00 ord_hi ord_lo / mtspr ctr,r11 / bctr`) |
| .pe | XEX import-record tag: `00 00 ord_hi ord_lo` | template thunk: `01 00 ord_hi ord_lo / 02 00 ord_hi ord_lo / mtspr ctr,r11 / bctr` |
## Classification: **import-thunk / ε-class allocator drift**
This matches **tripstone #2** of the Phase C brief verbatim:
> Import thunks are legitimately engine-specific. If first byte-diff is
> in a thunk, canonicalize and re-find first diff.
The two engines implement different HLE dispatch strategies:
- **canary**: in-place thunk patching. Overwrites the guest XEX bytes
with host-shim instructions; record_type=0 slots get `0xDEADC0DE`
poison (canary panics if a guest dereferences an unimplemented import
variable).
- **ours**: HLE dispatch happens at the JIT translation layer, not by
patching the thunk. Record_type=1 thunks keep their original `.pe`
bytes; record_type=0 slots get zeroed (still distinguishable from
the .pe ordinal-tag content if guest code reads them).
Both are valid engine implementation choices.
## After canonicalization — the real check
Mask all import-slot bytes (record_type=0 = 4 bytes per slot,
record_type=1 = 16 bytes per slot, total 3920 bytes across 398 slots)
to `0xCD` in canary, ours, AND .pe. Then compare:
```
canary canonical sha256: 62c51908e2df705583fe81a084f39bd399196f9000cfa7bffd56127b41a4ab96
ours canonical sha256: 62c51908e2df705583fe81a084f39bd399196f9000cfa7bffd56127b41a4ab96
pe canonical sha256: 62c51908e2df705583fe81a084f39bd399196f9000cfa7bffd56127b41a4ab96
```
**All three match.** Bytes differing after canonicalization: **0**.
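
A minimal sketch of the masking step, to show why engine-specific slot contents stop affecting the hash. It is not the actual `diff_state.py` code: the function names, the list-of-dicts import format (keys `record_type` and `address`), and the toy images are all assumptions made for illustration.

```python
import hashlib

IMAGE_BASE = 0x82000000       # assumed image base (VA 0x82000600 = offset 0x600)
MASK_BYTE = 0xCD
SLOT_BYTES = {0: 4, 1: 16}    # record_type -> bytes masked per slot

# Consistency check on the stated totals: 204 variable slots plus
# 398 - 204 = 194 thunks gives 3920 masked bytes.
assert 204 * 4 + (398 - 204) * 16 == 3920


def canonicalize(image: bytes, imports, base=IMAGE_BASE) -> bytes:
    """Return a copy of `image` with every import slot set to MASK_BYTE."""
    out = bytearray(image)
    for rec in imports:
        start = rec["address"] - base
        size = SLOT_BYTES[rec["record_type"]]
        if start < 0 or start + size > len(out):
            raise ValueError(f"slot at {rec['address']:#x} is outside the image")
        out[start:start + size] = bytes([MASK_BYTE]) * size
    return bytes(out)


def canonical_sha256(image: bytes, imports, base=IMAGE_BASE) -> str:
    return hashlib.sha256(canonicalize(image, imports, base)).hexdigest()


# Toy data: one variable slot and one thunk, with illustrative slot bytes.
imports = [
    {"record_type": 0, "address": 0x82000600},
    {"record_type": 1, "address": 0x82000610},
]
body = bytes(range(256)) * 7  # 0x700 bytes of shared, non-import content


def with_slots(var: bytes, thunk: bytes) -> bytes:
    img = bytearray(body)
    img[0x600:0x604] = var
    img[0x610:0x620] = thunk
    return bytes(img)


canary = with_slots(bytes.fromhex("deadc0de"),
                    bytes.fromhex("440000424e8000206000000060000000"))
ours = with_slots(bytes(4),
                  bytes.fromhex("0100028c0200028c7d6903a64e800420"))

print(canonical_sha256(canary, imports) == canonical_sha256(ours, imports))
```

The raw images differ in both slots, yet their canonical hashes agree because only the masked ranges differ.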
## Conclusion
There is **NO real engine divergence** at the image-load layer.
- Both engines decode the XEX2 file correctly.
- Both load it into guest memory at the correct virtual addresses.
- Both produce byte-identical content outside the import-patch region.
- Even .pe (an independent third-party offline XEX2 decoder) produces
the exact same canonical content.
The Phase B `image_loaded_sha256` δ-content-STOP was a **false positive**
caused by an overly strict invariant: hashing engine-specific runtime
patches as if they were XEX content.
## What the fix is
The fix is in the **comparison framework**, not the engines:
1. `diff_state.py`: relaxed STOP invariant — when `--xex-json` is
provided AND both snapshots contain `image.bin`, compute and check
`image_canonical_sha256` (engine-mask agnostic) as the real STOP
key. The raw `image_loaded_sha256` is still reported but is
informational.
2. `phase_b_snapshot.{rs,cc}`: when `phase_b_dump_section_content` is
set, emit `image.bin` (raw bytes of the XEX image region) so the
diff tool can perform canonicalization. Default-off; cvar-OFF
binary digest is byte-identical to pre-Phase-C baseline.
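
The relaxed invariant in item 1 amounts to a choice of which hash gates the STOP check. The sketch below is hypothetical: the snapshot layout (`files`, `hashes`) and both function names are invented for illustration, and falling back to the raw hash when canonicalization is unavailable is an assumption, since that case is not specified above.

```python
def stop_key(snap_a: dict, snap_b: dict, xex_json_provided: bool) -> str:
    """Pick the hash field that gates the STOP check."""
    both_have_image = ("image.bin" in snap_a["files"]
                       and "image.bin" in snap_b["files"])
    if xex_json_provided and both_have_image:
        return "image_canonical_sha256"
    # Assumed fallback: without canonicalization inputs, use the raw hash.
    return "image_loaded_sha256"


def should_stop(snap_a: dict, snap_b: dict, xex_json_provided: bool) -> bool:
    """True when the gating hash differs between the two snapshots."""
    key = stop_key(snap_a, snap_b, xex_json_provided)
    return snap_a["hashes"][key] != snap_b["hashes"][key]


# Toy snapshots: raw hashes differ, canonical hashes agree.
canary_snap = {
    "files": {"image.bin"},
    "hashes": {"image_loaded_sha256": "aaaa", "image_canonical_sha256": "cccc"},
}
ours_snap = {
    "files": {"image.bin"},
    "hashes": {"image_loaded_sha256": "bbbb", "image_canonical_sha256": "cccc"},
}

print(should_stop(canary_snap, ours_snap, xex_json_provided=True))
print(should_stop(canary_snap, ours_snap, xex_json_provided=False))
```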
## What this implies for downstream divergences
The Phase B catalog's 57 remaining divergences (post-image-load) are
still meaningful — they describe real differences in stack/PCR/TLS
allocation strategy, heap layout, kernel-object population, and
exports-table state. These are now interpretable on a verified
canonically-equivalent image baseline.
The Phase A diff's first runtime divergence at `tid_event_idx=113`
(`KeQuerySystemTime return_value`) is the next Phase C+1 target. It
is **not** a downstream symptom of the image-load mismatch; it is the
next genuine engine divergence in the kernel-call sequence.