handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
158
audit-runs/phase-c22-payload-canonicalization/investigation.md
Normal file
@@ -0,0 +1,158 @@
# Phase C+22: Payload-field canonicalization for host-heap-derived guest VAs

**Date:** 2026-05-26
**Mode:** WRITE, diff-tool only. No engine source changes.
**Status:** LANDED. Main matched-prefix 105,128 → 105,138 (+10).

## TL;DR

The pre-C+22 first divergence at canary tid=6 ↔ ours tid=1 idx 105,128 is a
`thread.create.ctx_ptr` mismatch:

```
canary: thread.create {parent_tid=6, entry_pc=0x824cd458, ctx_ptr=0xbe56bb3c, ...}
ours:   thread.create {parent_tid=1, entry_pc=0x824cd458, ctx_ptr=0x42453b3c, ...}
```

- `parent_tid` was ALREADY skipped via `SKIP_PAYLOAD_FIELDS_BY_KIND["thread.create"]`
  (line 245 of `diff_events.py`, in place since C+15-α). The task framing that
  it needed new canonicalization was a misreading; tests now pin the existing
  behavior so it doesn't regress.
- `ctx_ptr` IS the actual divergence at this index. Canary's `0xbe56bb3c`
  is in the BC physical heap; our `0x42453b3c` is in the unified user heap.
  Same AUDIT-043 ε class as C+2's `MmAllocatePhysicalMemoryEx`.

## Why C+2's `ALLOCATOR_RETURN_FNS` doesn't cover this

C+2 canonicalizes `kernel.return.return_value` for a known set of
host-allocator-returning exports. `ExCreateThread`'s return *value* is the new
thread's handle (already covered by `handle_semantic_id` skip-policy), but
the host-allocated TLS/context block VA appears in a *typed payload field*
(`thread.create.ctx_ptr`), a side channel C+2 doesn't see.

## The fix

Add a `HOST_HEAP_PAYLOAD_FIELDS_BY_KIND` map and a
`canonicalize_host_heap_payload_fields` helper, an exact mirror of
`ALLOCATOR_RETURN_FNS` / `canonicalize_allocator_returns` but
restricted to typed payload fields. Initial set:

```python
HOST_HEAP_PAYLOAD_FIELDS_BY_KIND = {
    "thread.create": ("ctx_ptr",),
}
```

Sentinel format: `<HOSTHEAP_<KIND>_<FIELD>_<ORDINAL>>`, a distinct namespace
from `<ALLOC_*_*>` so the two passes don't collide.
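
The pass described above can be sketched roughly as follows. This is a minimal illustration, not the code in `diff_events.py`: the event shape (`tid`, `kind`, `payload` keys), the counter key, and the way `<KIND>` and `<FIELD>` are spelled inside the sentinel (upper-cased, dots turned into underscores) are all assumptions. Only the map contents, the per-tid ordinal assignment in event order, and the skip-non-string behavior come from this note.

```python
from collections import defaultdict

# Event kind -> payload fields holding host-heap-derived guest VAs
# (the initial set quoted in this note).
HOST_HEAP_PAYLOAD_FIELDS_BY_KIND = {
    "thread.create": ("ctx_ptr",),
}


def canonicalize_host_heap_payload_fields(events):
    """Rewrite host-heap VAs to ordinal sentinels, in place.

    Assumed (hypothetical) event shape: {"tid": int, "kind": str, "payload": dict}.
    """
    counters = defaultdict(int)  # (tid, kind, field) -> next ordinal
    for event in events:
        kind = event.get("kind")
        payload = event.get("payload") or {}
        for field in HOST_HEAP_PAYLOAD_FIELDS_BY_KIND.get(kind, ()):
            value = payload.get(field)
            if not isinstance(value, str):
                # None / missing values are left alone and consume no ordinal.
                continue
            key = (event.get("tid"), kind, field)
            # Assumed spelling of <KIND>_<FIELD> inside the sentinel.
            tag = f"{kind}_{field}".upper().replace(".", "_")
            payload[field] = f"<HOSTHEAP_{tag}_{counters[key]}>"
            counters[key] += 1
    return events


# Each engine's log is canonicalized separately, so the first thread.create on
# canary tid=6 and the first on ours tid=1 both receive ordinal 0.
canary = [{"tid": 6, "kind": "thread.create",
           "payload": {"entry_pc": "0x824cd458", "ctx_ptr": "0xbe56bb3c"}}]
ours = [{"tid": 1, "kind": "thread.create",
         "payload": {"entry_pc": "0x824cd458", "ctx_ptr": "0x42453b3c"}}]
canonicalize_host_heap_payload_fields(canary)
canonicalize_host_heap_payload_fields(ours)
assert canary[0]["payload"] == ours[0]["payload"]
```

Because the ordinal counts occurrences rather than distinct addresses, an extra or missing `thread.create` on one side shifts every later sentinel, so a count skew still shows up as a divergence.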

## Strict fields preserved (THE tripstone)

`thread.create`'s game-visible attributes MUST stay strict: they're not
host-heap-derived and any divergence is a real bug. Tests verify each:

| field | canary | ours | strict? |
|---|---|---|---|
| `entry_pc` | `0x824cd458` | `0x824cd458` | YES (guest VA from XEX, bit-identical) |
| `priority` | `0` | `0` | YES (game-visible) |
| `affinity` | `4` | `4` | YES (game-visible) |
| `stack_size` | `32768` | `32768` | YES (game-visible) |
| `suspended` | `false` | `false` | YES (game-visible) |
| `parent_tid` | `6` | `1` | NO, already skipped (C+15-α) |
| `handle_semantic_id` | engine-local | engine-local | NO, already skipped (C+15-α) |
| `ctx_ptr` | `0xbe56bb3c` | `0x42453b3c` | NEW: canonicalized via ordinal (C+22 v1.7) |

Five negative tests in `test_diff_events.py` mutate each strict field one at a
time and confirm the divergence still surfaces, guarding against over-suppression.
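
The shape of those negative tests can be illustrated with a small standalone sketch. The helper `first_strict_mismatch`, the `mutate` function, and the baseline payload below are hypothetical stand-ins rather than the contents of `test_diff_events.py`; the raw `ctx_ptr` is simply excluded here in place of running the canonicalization pass, and only the strict and non-strict field sets are taken from the table.

```python
# Fields that are skipped or canonicalized; every other field compares strictly.
NON_STRICT_FIELDS = {"parent_tid", "handle_semantic_id", "ctx_ptr"}
STRICT_FIELDS = ("entry_pc", "priority", "affinity", "stack_size", "suspended")


def first_strict_mismatch(canary_payload, ours_payload):
    """Return the name of the first strict field that differs, or None."""
    for field in sorted(set(canary_payload) | set(ours_payload)):
        if field in NON_STRICT_FIELDS:
            continue
        if canary_payload.get(field) != ours_payload.get(field):
            return field
    return None


def mutate(value):
    """Produce a value guaranteed to differ from the input."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return not value
    if isinstance(value, int):
        return value + 1
    return value + "_mutated"


CANARY = {
    "entry_pc": "0x824cd458", "priority": 0, "affinity": 4,
    "stack_size": 32768, "suspended": False,
    "parent_tid": 6, "ctx_ptr": "0xbe56bb3c",
}
OURS = dict(CANARY, parent_tid=1, ctx_ptr="0x42453b3c")

# Differences confined to non-strict fields must not be reported.
assert first_strict_mismatch(CANARY, OURS) is None

# Mutating any single strict field must surface exactly that field.
for field in STRICT_FIELDS:
    mutated = dict(OURS, **{field: mutate(OURS[field])})
    assert first_strict_mismatch(CANARY, mutated) == field
```

Looping over every strict field keeps the guard exhaustive: adding a field to the non-strict set by mistake would make the matching iteration fail.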

## Verification matrix

| canary file | pre-C+22 matched | post-C+22 matched | Δ |
|---|---|---|---|
| `canary-jitter-1.jsonl` (4.4 GB, 476,943 events on tid=6) | 105,128 | **105,138** | **+10** |
| `canary-jitter-2.jsonl` (3.5 GB, 441,027 events on tid=6) | 105,128 | **105,138** | **+10** |
| `canary-jitter-3.jsonl` (3.7 GB, 445,578 events on tid=6) | 105,128 | **105,138** | **+10** |

All three jitter runs advance to the SAME new divergence: idx 105,138,
`kernel.return VdQueryVideoFlags`:

```
canary: payload.return_value = 3 (status "0x00000003")
ours:   payload.return_value = 0 (status "0x00000000")
```

This is a genuine Vd subsystem divergence (UNRELATED to canonicalization) and
out of C+22's scope; it surfaces correctly as a real first divergence.
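
For readers unfamiliar with the "matched" columns, the matched-prefix figure is the number of leading events that compare equal, which is also the index of the first divergence. A minimal sketch, assuming events are already canonicalized and comparable with `==` (the function name and the toy events are hypothetical, and the real tool's comparison rules are richer than plain equality):

```python
def matched_prefix_len(canary_events, ours_events, same=lambda a, b: a == b):
    """Count leading event pairs that match; equals the first-divergence index."""
    matched = 0
    for canary_event, ours_event in zip(canary_events, ours_events):
        if not same(canary_event, ours_event):
            break
        matched += 1
    return matched


# Toy streams: three matching events, then a return-value mismatch.
canary = [
    {"kind": "kernel.call", "name": "VdQueryVideoFlags"},
    {"kind": "thread.create", "ctx_ptr": "<HOSTHEAP_THREAD_CREATE_CTX_PTR_0>"},
    {"kind": "kernel.call", "name": "VdQueryVideoFlags"},
    {"kind": "kernel.return", "name": "VdQueryVideoFlags", "return_value": 3},
]
ours = [dict(event) for event in canary]
ours[3]["return_value"] = 0

first_divergence = matched_prefix_len(canary, ours)
assert first_divergence == 3
```

Removing a false-positive class such as the raw `ctx_ptr` mismatch lets the loop run further before it stops, which is why the matched count grows while the next real bug becomes the new stopping point.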

## Tests

8 new tests in `test_diff_events.py`:

1. `test_thread_create_ctx_ptr_in_host_heap_set`: registration sanity.
2. `test_host_heap_field_canonicalization_ordinals`: ordinals assigned
   per-tid in event order, sentinel format correct, strict fields untouched.
3. `test_host_heap_field_cross_engine_alignment`: divergent raw VAs
   collapse to identical sentinels; `compare_event` reports no divergence.
4. `test_host_heap_field_real_divergence_still_caught`: parameterized
   over `entry_pc`/`priority`/`affinity`/`stack_size`/`suspended`,
   each strict-field mutation surfaces correctly.
5. `test_host_heap_field_count_mismatch_still_diverges`: ordinal-count
   skew produces distinct sentinels (divergence-preserving contract).
6. `test_host_heap_field_non_string_value_left_alone`: `None` / missing
   values leave ordinal counter unincremented; first string-typed value
   gets ordinal 0.
7. `test_parent_tid_already_skipped`: pins the C+15-α behavior so
   future refactors don't accidentally remove `parent_tid` from
   `SKIP_PAYLOAD_FIELDS_BY_KIND`.
8. (covered in #2) Strict-field preservation as positive assertion.

Total: previous 33 tests + 8 new = **41 tests, all PASS**.

## Files touched

- `xenia-rs/tools/diff-events/diff_events.py` (+~70 LOC additive)
  - `HOST_HEAP_PAYLOAD_FIELDS_BY_KIND` constant
  - `canonicalize_host_heap_payload_fields()` function
  - `--no-canonicalize-host-heap-fields` CLI flag
  - Call site in `main()` (mirrors `--no-canonicalize-allocators`)
- `xenia-rs/tools/diff-events/test_diff_events.py` (+~290 LOC tests)
- `xenia-rs/audit-runs/phase-a-diff-harness/schema-v1.md` (+~110 LOC)
  - New §"Host-heap payload-field canonicalization (v1.7 …)"
  - Updated `ctx_ptr` row in field-comparison rules table

NO engine source touched. xenia-rs HEAD unchanged. Phase B
`image_loaded_sha256` ε class boundary unchanged.

## Backward compatibility

- Wire format unchanged (`schema_version = 1`).
- Pre-C+22 event logs whose `thread.create.ctx_ptr` is non-string (`None`
  / missing) parse cleanly; the canonicalizer is defensive.
- Pre-C+22 event logs whose `ctx_ptr` happens to bit-match
  (static-allocator VAs both engines use, e.g. `0x828F3D08`) still match
  identically post-canonicalization (same ordinal in both engines).
- `--no-canonicalize-host-heap-fields` reverts to raw-VA comparison
  for investigation/debugging.
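
An opt-out flag of this kind is commonly wired with `argparse`'s `store_false` action, so the pass is on by default and the flag switches it off. The sketch below is only an assumption about how such a flag could look: the flag name comes from this note, while `build_parser` and the `dest` name are hypothetical and the actual wiring in `main()` is not shown here.

```python
import argparse


def build_parser():
    """Build a parser with the opt-out flag; canonicalization defaults to on."""
    parser = argparse.ArgumentParser(prog="diff_events.py")
    parser.add_argument(
        "--no-canonicalize-host-heap-fields",
        dest="canonicalize_host_heap_fields",  # hypothetical attribute name
        action="store_false",  # default is True; passing the flag sets False
        help="compare raw host-heap VAs instead of ordinal sentinels",
    )
    return parser


default_args = build_parser().parse_args([])
assert default_args.canonicalize_host_heap_fields is True

raw_args = build_parser().parse_args(["--no-canonicalize-host-heap-fields"])
assert raw_args.canonicalize_host_heap_fields is False
```

Keeping the default on means existing invocations pick up the new pass without changes, while the raw comparison stays one flag away when a `ctx_ptr` value itself is under investigation.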

## Cascade

- A (design): PASS. Minimal extension of the C+2 pattern, no new
  mechanism class.
- B (implement + test): PASS. 8 new tests, 41 total PASS.
- C (3-jitter verification): PASS. All three jitters advance
  105,128 → 105,138 (+10), same downstream divergence.
- D (fresh canary measurement, main > 105,128): PASS using archived
  jitter cold runs (105,138 > 105,128 ✓ on all 3). A fresh canary
  cold run was NOT initiated this session; the 3-jitter archived
  set is the protocol-honored substitute when canary is wedged or
  the build is slow (per phase-c25-mm-allocator-family precedent).

## Next divergence (C+23 candidate)

`kernel.return VdQueryVideoFlags` at idx 105,138:

- canary returns `3` (status `0x00000003`)
- ours returns `0` (status `0x00000000`)

`VdQueryVideoFlags` is a Vd-subsystem export that returns a bitmask of
video-mode capabilities (HDTV, widescreen, anti-aliasing). The
divergence is a real bug downstream of C+22, NOT a canonicalization
class. C+23+ scope.