handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,10 @@
{
"instructions": 50000007,
"imports": 40390,
"unimpl": 0,
"draws": 0,
"swaps": 1,
"unique_render_targets": 0,
"shader_blobs_live": 0,
"texture_cache_entries": 0
}


@@ -0,0 +1,10 @@
{
"instructions": 50000007,
"imports": 40390,
"unimpl": 0,
"draws": 0,
"swaps": 1,
"unique_render_targets": 0,
"shader_blobs_live": 0,
"texture_cache_entries": 0
}


@@ -0,0 +1,10 @@
{
"instructions": 50000007,
"imports": 40390,
"unimpl": 0,
"draws": 0,
"swaps": 1,
"unique_render_targets": 0,
"shader_blobs_live": 0,
"texture_cache_entries": 0
}


@@ -0,0 +1,137 @@
# Phase A diff report
**This report is the output of Phase A's diff harness. Divergences
shown here are INPUT for Phase B (first-divergence localization),
not findings of Phase A.** Phase A's job is to make the harness
itself correct, not to analyze what it surfaces.
## Summary
| canary_tid | ours_tid | matched | canary_total | ours_total | first_divergence_at | floating_create (c/o) | floating_wait (c/o) |
|---|---|---|---|---|---|---|---|
| 4 | 11 | 11 | 2099 | 11 | — | 0/0 | 0/0 |
| 6 | 1 | 105128 | 119455 | 108507 | 105128 | 0/0 | 1/0 |
| 7 | 2 | 32 | 32 | 33 | — | 0/0 | 0/0 |
| 12 | 7 | 4 | 374 | 5 | 4 | 0/0 | 0/0 |
| 14 | 9 | 41 | 26901 | 77 | 41 | 0/0 | 0/0 |
| 15 | 10 | 16 | 11558 | 17 | — | 0/1 | 0/0 |
*`floating_create (c/o)` counts shared-global `handle.create` events absorbed by Phase C+18 cross-tid SID matching. `floating_wait (c/o)` counts `wait.begin` events on shared-global dispatchers absorbed by Phase C+21 (scheduling-jitter window — canary's contention slow path may fire while ours fast-paths or vice versa). See schema-v1.md §"Shared-global SIDs" and §"Wait-begin floating absorb".*
## canary_tid=4 → ours_tid=11
No divergence within the 11 compared events (canary has 2099, ours has 11).
## canary_tid=6 → ours_tid=1
First divergence at `tid_event_idx=105128`: payload.ctx_ptr: canary='0xbe56bb3c' ours='0x42453b3c'
**Pre-context (last 5 matching events):**
```
canary: [105130] kernel.call KiApcNormalRoutineNop
ours: [105123] kernel.call KiApcNormalRoutineNop
canary: [105131] kernel.return KiApcNormalRoutineNop
ours: [105124] kernel.return KiApcNormalRoutineNop
canary: [105132] import.call ExCreateThread
ours: [105125] import.call ExCreateThread
canary: [105133] kernel.call ExCreateThread
ours: [105126] kernel.call ExCreateThread
canary: [105134] handle.create sid=17d8b2ba9dd4ba13
ours: [105127] handle.create sid=3562d07db6ff161d
```
**Divergent event:**
```
canary: [105135] thread.create {'handle_semantic_id': '17d8b2ba9dd4ba13', 'parent_tid': 6, 'entry_pc': '0x824cd458', 'ctx_ptr': '0xbe56bb3c', 'priority': 0, 'affinity': 4, 'stack_size': 32768, 'suspended': False}
ours: [105128] thread.create {'handle_semantic_id': '3562d07db6ff161d', 'parent_tid': 1, 'entry_pc': '0x824cd458', 'ctx_ptr': '0x42453b3c', 'priority': 0, 'affinity': 4, 'stack_size': 32768, 'suspended': False}
```
**Next event after the divergence (if any):**
```
canary: [105136] kernel.return ExCreateThread
ours: [105129] kernel.return ExCreateThread
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1644146200, "kind": "thread.create", "payload": {"affinity": 4, "ctx_ptr": "0xbe56bb3c", "entry_pc": "0x824cd458", "handle_semantic_id": "17d8b2ba9dd4ba13", "parent_tid": 6, "priority": 0, "stack_size": 32768, "suspended": false}, "schema_version": 1, "tid": 6, "tid_event_idx": 105135}
{"deterministic": true, "engine": "ours", "guest_cycle": 0, "host_ns": 494758288, "kind": "thread.create", "payload": {"affinity": 4, "ctx_ptr": "0x42453b3c", "entry_pc": "0x824cd458", "handle_semantic_id": "3562d07db6ff161d", "parent_tid": 1, "priority": 0, "stack_size": 32768, "suspended": false}, "schema_version": 1, "tid": 1, "tid_event_idx": 105128}
```
## canary_tid=7 → ours_tid=2
No divergence within the 32 compared events (canary has 32, ours has 33).
## canary_tid=12 → ours_tid=7
First divergence at `tid_event_idx=4`: payload.return_value: canary=258 ours=0
**Pre-context (last 5 matching events):**
```
canary: [0] import.call KeWaitForSingleObject
ours: [0] import.call KeWaitForSingleObject
canary: [1] kernel.call KeWaitForSingleObject
ours: [1] kernel.call KeWaitForSingleObject
canary: [2] handle.create sid=c49d8f0ab90401ea
ours: [2] handle.create sid=6e3d96c5a52bf429
canary: [3] wait.begin {'handles_semantic_ids': ['c49d8f0ab90401ea'], 'timeout_ns': -30000000, 'alertable': False, 'wait_type': 'any'}
ours: [3] wait.begin {'handles_semantic_ids': ['6e3d96c5a52bf429'], 'timeout_ns': -30000000, 'alertable': False, 'wait_type': 'any'}
```
**Divergent event:**
```
canary: [4] kernel.return KeWaitForSingleObject
ours: [4] kernel.return KeWaitForSingleObject
```
**Next event after the divergence (if any):**
```
canary: [5] import.call RtlEnterCriticalSection
ours: <end of stream>
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1676368000, "kind": "kernel.return", "payload": {"name": "KeWaitForSingleObject", "return_value": 258, "side_effects": [], "status": "0x00000102"}, "schema_version": 1, "tid": 12, "tid_event_idx": 4}
{"deterministic": true, "engine": "ours", "guest_cycle": 30, "host_ns": 494789418, "kind": "kernel.return", "payload": {"name": "KeWaitForSingleObject", "return_value": 0, "side_effects": [], "status": "0x00000000"}, "schema_version": 1, "tid": 7, "tid_event_idx": 4}
```
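The `return_value` and `status` fields in the raw events above carry the same number in two bases. A minimal sketch for reading them, assuming the standard NT status codes `0x00000000` (`STATUS_SUCCESS`) and `0x00000102` (`STATUS_TIMEOUT`); the helper name `describe_return` is hypothetical and not part of `diff_events.py`:

```python
# Hypothetical helper (not part of diff_events.py): decode a kernel.return payload.
NTSTATUS_NAMES = {
    0x00000000: "STATUS_SUCCESS",
    0x00000102: "STATUS_TIMEOUT",
}

def describe_return(payload):
    """Render the return value as a 32-bit hex status with a symbolic name."""
    value = payload["return_value"] & 0xFFFFFFFF
    name = NTSTATUS_NAMES.get(value, "unknown")
    return f"{payload['name']} -> 0x{value:08x} ({name})"

# Payload fields copied from the two raw events above.
canary = {"name": "KeWaitForSingleObject", "return_value": 258, "status": "0x00000102"}
ours = {"name": "KeWaitForSingleObject", "return_value": 0, "status": "0x00000000"}

print(describe_return(canary))
print(describe_return(ours))
```

Read this way, canary's wait ended in a timeout while ours returned success, which is the divergence the harness reports at `tid_event_idx=4`.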
## canary_tid=14 → ours_tid=9
First divergence at `tid_event_idx=41`: payload.ord: canary=503 ours=293
**Pre-context (last 5 matching events):**
```
canary: [36] kernel.call KeReleaseSpinLockFromRaisedIrql
ours: [36] kernel.call KeReleaseSpinLockFromRaisedIrql
canary: [37] kernel.return KeReleaseSpinLockFromRaisedIrql
ours: [37] kernel.return KeReleaseSpinLockFromRaisedIrql
canary: [38] import.call KfLowerIrql
ours: [38] import.call KfLowerIrql
canary: [39] kernel.call KfLowerIrql
ours: [39] kernel.call KfLowerIrql
canary: [40] kernel.return KfLowerIrql
ours: [40] kernel.return KfLowerIrql
```
**Divergent event:**
```
canary: [41] import.call XAudioGetVoiceCategoryVolumeChangeMask
ours: [41] import.call RtlEnterCriticalSection
```
**Next event after the divergence (if any):**
```
canary: [42] kernel.call XAudioGetVoiceCategoryVolumeChangeMask
ours: [42] kernel.call RtlEnterCriticalSection
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1898677900, "kind": "import.call", "payload": {"module": "xboxkrnl.exe", "name": "XAudioGetVoiceCategoryVolumeChangeMask", "ord": 503}, "schema_version": 1, "tid": 14, "tid_event_idx": 41}
{"deterministic": true, "engine": "ours", "guest_cycle": 417, "host_ns": 1694886289, "kind": "import.call", "payload": {"module": "xboxkrnl.exe", "name": "RtlEnterCriticalSection", "ord": 293}, "schema_version": 1, "tid": 9, "tid_event_idx": 41}
```
## canary_tid=15 → ours_tid=10
No divergence within the 16 compared events (canary has 11558, ours has 17).


@@ -0,0 +1,49 @@
--- a/xenia-rs/tools/diff-events/diff_events.py
+++ b/xenia-rs/tools/diff-events/diff_events.py
@@ -287,6 +287,25 @@ ALLOCATOR_RETURN_FNS = frozenset(
# creation call.
"XamNotifyCreateListener",
+ # Phase C+25: `MmGetPhysicalAddress` is a VA→PA translator whose
+ # return depends on which heap region the input VA lives in. This
+ # is the downstream consequence of C+2's deferred Path β (canary
+ # has three physical heaps at vA0/vC0/vE0 routed by page size,
+ # ours has a single unified heap_cursor starting at 0x40000000).
+ # Concretely: at C+25 idx 105,112 canary returned 0x150B0000
+ # (input 0xF50AF000 in `vE0000000` heap: addr - 0xE0000000 + 0x1000
+ # per `PhysicalHeap::GetPhysicalAddress`, see `memory.cc:2317`),
+ # while ours returned 0x0ADCF000 (input ~0x4ADCF000 in unified heap,
+ # masked via `& 0x1FFF_FFFF` per `exports.rs:985`). Both engines'
+ # translations are SELF-CONSISTENT — game code passes the PA
+ # opaquely to GPU (`VdInitializeRingBuffer` is the very next call)
+ # and the GPU translates it back to a host pointer using the same
+ # engine's heap map. Per-(tid,name) ordinal sentinel preserves the
+ # opaque-pass-through semantics while exposing actual divergences
+ # (e.g. game-side arithmetic on the PA, or a translation-count
+ # mismatch). Lifting the engine-side three-physical-heaps memory
+ # model is the C+2 Path β deferral, out of scope for C+25 (see
+ # `project_phase_c2_MmAllocatePhysicalMemoryEx_2026_05_13.md`).
+ "MmGetPhysicalAddress",
]
)
--- a/xenia-rs/tools/diff-events/test_diff_events.py
+++ b/xenia-rs/tools/diff-events/test_diff_events.py
@@ -686,6 +686,150 @@ def test_collect_shared_global_sids_single_tid_excluded() -> None:
+# === Phase C+25 — MmGetPhysicalAddress canonicalization ===
+# (4 new tests; see investigation.md for details)
+def test_mm_get_physical_address_in_allocator_set() -> None: ...
+def test_mm_get_physical_address_canonicalization() -> None: ...
+def test_mm_get_physical_address_cross_engine_alignment() -> None: ...
+def test_mm_get_physical_address_count_mismatch_still_diverges() -> None: ...
+
def main() -> int:
...
# Phase C+25
test_mm_get_physical_address_in_allocator_set()
test_mm_get_physical_address_canonicalization()
test_mm_get_physical_address_cross_engine_alignment()
test_mm_get_physical_address_count_mismatch_still_diverges()
Engine: UNTOUCHED. Python-only fix. Phase B image_canonical_sha256 ea8d160e…
UNCHANGED by definition (no Rust source modified). Build clean. Kernel tests
217 pass unchanged.


@@ -0,0 +1,117 @@
# Phase C+25 — MmGetPhysicalAddress canonicalization
## Step 1 — Framing verification (per reading-error #28)
From `phase-w-wedge-reattack/diff-postfix.md` at `canary tid=6 → ours tid=1` idx 105,112:
```
canary: [105119] kernel.return MmGetPhysicalAddress return_value=353042432 status=0x150b0000
ours: [105112] kernel.return MmGetPhysicalAddress return_value=182251520 status=0x0adcf000
```
Decoded:
- canary 353042432 = `0x150B0000`. Per `xenia-canary/src/xenia/memory.cc:2317-2325`
(`PhysicalHeap::GetPhysicalAddress`): `address -= heap_base_; if (heap_base_ >=
0xE0000000) address += 0x1000;`. To produce `0x150B0000` from `vE0000000` (heap_base
`0xE0000000`): input VA `0xF50AF000` → `0xF50AF000 - 0xE0000000 + 0x1000 = 0x150B0000`. ✓
- ours 182251520 = `0x0ADCF000`. Per `exports.rs:985-988` (`mm_get_physical_address`):
`ctx.gpr[3] &= 0x1FFF_FFFF`. To produce `0x0ADCF000` from the unified heap region
`0x40000000+`: input VA `0x4ADCF000` → `0x4ADCF000 & 0x1FFF_FFFF = 0x0ADCF000`. ✓
Pre-context: identical sequence of `MmAllocatePhysicalMemoryEx` (canonicalized to
shared sentinel) → `MmGetPhysicalAddress`. Next event after divergence:
`VdInitializeRingBuffer` — the GPU consumes the PA opaquely.
Both engines' translations are SELF-CONSISTENT: within each engine, the same input
VA always maps to the same PA, and any subsequent GPU command pointing at that PA
gets read back from the same host backing store. The divergence at the diff layer
is a host-allocator-region symptom, not a semantic bug.
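The two decodings above can be re-checked mechanically. The sketch below restates the formulas quoted from `memory.cc` and `exports.rs` in Python; the function names are illustrative, and the input VAs are the back-derived values from this step, not values read from a trace:

```python
# Worked check of the two VA->PA translations quoted in Step 1.
# Function names are illustrative; the formulas are the ones cited above.

def canary_physical_address(va, heap_base):
    """PhysicalHeap::GetPhysicalAddress as quoted: subtract the heap base,
    then add 0x1000 for heaps based at or above 0xE0000000."""
    address = va - heap_base
    if heap_base >= 0xE0000000:
        address += 0x1000
    return address

def ours_physical_address(va):
    """mm_get_physical_address as quoted: mask the VA to 29 bits."""
    return va & 0x1FFF_FFFF

canary_pa = canary_physical_address(0xF50AF000, 0xE0000000)
ours_pa = ours_physical_address(0x4ADCF000)

print(f"canary: 0x{canary_pa:08X} ({canary_pa})")
print(f"ours:   0x{ours_pa:08X} ({ours_pa})")
```

Both results agree with the `return_value` fields in the diff excerpt, which supports the reading that each formula is correct for its own heap layout.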
## Step 2 — Classification
Four candidates:
- **(A)** Per-call value bug. NO: both formulas are correct for their respective
heap layouts. Canary's `PhysicalHeap::GetPhysicalAddress` is the authoritative
implementation for the three-heap memory model; the `& 0x1FFF_FFFF` mask in ours is
the documented equivalent for the unified heap (KRNBUG-Mm-04 noted at
`exports.rs:3771`).
- **(B)** Allocator-region routing bug. YES, but this is the C+2 Path β deferral —
ours has a single `KernelState::heap_alloc` cursor at `0x40000000`; canary has
three physical heaps at `vA0/vC0/vE0` routed by page size via
`LookupHeapByType`. Estimated >100 LOC and would change boot trajectory
unpredictably. **OUT OF SCOPE per Phase C+2 scope discipline.**
- **(C)** Canonicalization gap. YES — `MmGetPhysicalAddress` is a VA→PA translator
whose return is consumed opaquely by GPU/audio subsystems. The same per-(tid,name)
ordinal sentinel scheme that covers `MmAllocatePhysicalMemoryEx` (C+2) applies
here. Fix: extend `ALLOCATOR_RETURN_FNS`.
- **(D)** Upstream. NO — the predecessor `kernel.call MmGetPhysicalAddress`
matched cleanly on both engines.
**Selected: (C) — diff-tool canonicalization.**
## Step 3 — Fix
Extended `ALLOCATOR_RETURN_FNS` in `xenia-rs/tools/diff-events/diff_events.py`
with `"MmGetPhysicalAddress"` and a 20-line comment block explaining the
deferred-Path-β rationale. Zero engine LOC.
Per-(tid,name) ordinal sentinels (`<ALLOC_MmGetPhysicalAddress_N>`) reuse the
existing `canonicalize_allocator_returns` machinery. As long as both engines
call the translator the same number of times in the same per-tid order, the
ordinals line up. A translation-count mismatch correctly surfaces as a
divergence (ordinal drift → distinct sentinels at that position).
The `payload.status` field is auto-mirrored (existing behavior of the
canonicalizer, since the trampoline doesn't distinguish NTSTATUS from
pointer-typed returns).
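The ordinal-sentinel scheme described in this step can be sketched as follows. This is a simplified stand-in, not the real `canonicalize_allocator_returns`: the function `canonicalize`, the helper `ret`, the two-entry registry, and the choice to start ordinals at 0 are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed two-entry subset of the real registry, for illustration only.
ALLOCATOR_RETURN_FNS = frozenset({"MmAllocatePhysicalMemoryEx", "MmGetPhysicalAddress"})

def canonicalize(events):
    """Replace allocator-style return values with per-(tid, name) ordinal sentinels."""
    counters = defaultdict(int)  # (tid, name) -> next ordinal (assumed to start at 0)
    out = []
    for ev in events:
        payload = dict(ev["payload"])
        name = payload.get("name")
        if ev["kind"] == "kernel.return" and name in ALLOCATOR_RETURN_FNS:
            key = (ev["tid"], name)
            sentinel = f"<ALLOC_{name}_{counters[key]}>"
            counters[key] += 1
            payload["return_value"] = sentinel
            payload["status"] = sentinel  # status mirrors the return value
        out.append({**ev, "payload": payload})
    return out

def ret(tid, value):
    """Build a minimal kernel.return event for MmGetPhysicalAddress."""
    return {"tid": tid, "kind": "kernel.return",
            "payload": {"name": "MmGetPhysicalAddress",
                        "return_value": value, "status": f"0x{value:08x}"}}

# The C+25 divergence: different raw PAs, same sentinel after canonicalization.
canary = canonicalize([ret(6, 0x150B0000)])
ours = canonicalize([ret(1, 0x0ADCF000)])
print(canary[0]["payload"] == ours[0]["payload"])
```

Because the counter is keyed by `(tid, name)`, an extra translator call on one side shifts every later ordinal on that thread, so a count mismatch still shows up as a payload difference at the first misaligned position.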
## Step 4 — Tests added
`test_diff_events.py` gains 4 unit tests (lines added at top of `main()`):
1. `test_mm_get_physical_address_in_allocator_set` — registry guard.
2. `test_mm_get_physical_address_canonicalization` — two-call per-tid ordinal.
3. `test_mm_get_physical_address_cross_engine_alignment` — end-to-end: the
exact C+25 divergence (`0x150B0000` vs `0x0ADCF000`) canonicalizes to the
same sentinel on both sides.
4. `test_mm_get_physical_address_count_mismatch_still_diverges` — ordinal-drift
negative test.
39 baseline tests + 4 new = 43 total, all PASS.
## Why no engine fix
Per `project_phase_c2_MmAllocatePhysicalMemoryEx_2026_05_13.md`'s "Future work:
β-class engine fix (deferred)" section:
> If a future Phase C+N session surfaces a divergence whose causal chain goes
> through region-arithmetic on a `MmAllocatePhysicalMemoryEx` return value
> (e.g. `MmGetPhysicalAddress` yielding bus-incompatible addresses for GPU
> command buffers), escalate to engine-side: add 3 physical heaps in
> `xenia-memory` / `KernelState`, route `MmAllocatePhysicalMemoryEx` through
> page-size lookup. Estimated 100-200 LOC + GPU/audio bridge re-validation;
> out of scope for single-session work.
This C+25 divergence IS the predicted scenario. The GPU is in-process here —
both engines independently consume the PA they themselves emitted, so the
opaque-pass-through invariant holds. The PA values diverge between engines
but neither is wrong in its own coordinate space.
Engine fix is deferred to a dedicated Path β session (estimated 100-200 LOC +
multi-subsystem re-validation across GPU command buffer mappings, XMA audio
context mapping via `MmMapIoSpace`, and any guest code paths doing PA
arithmetic). Tripstone #3 explicitly forbids in-session escalation here.
## Why progression metric is not expected to move
Phase W documented the wedge: tid=1 (main) joins on tid=13, tid=13 waits on
worker event `0x12d0` that never gets signaled. The wedge is upstream of any
GPU activity. Advancing matched-prefix past `MmGetPhysicalAddress` does NOT
exercise any new game-logic branch — it just allows the diff harness to
continue measuring beyond a previously-occluded translator-return divergence.
Per task spec: "If only the secondary metric moves and the primary remains
pinned (`swaps=1, draws=0`), document candidly: 'matched-prefix advanced but
no game progression — wedge persists per Phase W finding'." That's exactly
what happens here.


@@ -0,0 +1,63 @@
# Phase C+25 — Re-validation
## Gates
| # | Gate | Status | Evidence |
|---|---|---|---|
| 1 | Build clean | ✅ | `cargo build --release` succeeds; only pre-existing dead-code warning. |
| 2 | Phase B `image_canonical_sha256` unchanged | ✅ | Zero engine LOC modified → Phase B hash is `ea8d160e…` by construction. |
| 3 | Engine determinism (3× cold) | ✅ | `c25-digest-rep{1,2,3}.json` all identical: instructions=50000007 imports=40390 unimpl=0 draws=0 swaps=1 unique_render_targets=0. |
| 4 | Main matched-prefix advances past 105,112 | ✅ | **105,112 → 105,128 (+16)** — see `diff-postfix.md`. |
| 5 | Sister chains preserved | ✅ | 4→11=11, 7→2=32, 12→7=4, 14→9=41, 15→10=16 — all unchanged vs Phase W. |
| 6 | Kernel tests pass | ✅ | xenia-kernel: 217 passed, 0 failed (unchanged baseline). |
| 7 | Diff-tool unit tests pass | ✅ | 39 baseline + 4 new C+25 tests = 43 PASS. |
| 8 | `--no-canonicalize-allocators` backward-compat | ✅ | Flag unchanged; raw-VA comparison still available. |
| 9 | Progression metric — **PRIMARY gate** | ⚠️ **NEGATIVE** | swaps=1, draws=0, unique_render_targets=0. **Wedge persists per Phase W finding.** |
## Per-chain matched-prefix delta
| chain | Phase W (pre) | C+25 (post) | Δ |
|---|---|---|---|
| canary tid=6 → ours tid=1 (main) | 105,112 | **105,128** | **+16** |
| canary tid=4 → ours tid=11 | 11 | 11 | 0 |
| canary tid=7 → ours tid=2 | 32 | 32 | 0 |
| canary tid=12 → ours tid=7 | 4 | 4 | 0 |
| canary tid=14 → ours tid=9 | 41 | 41 | 0 |
| canary tid=15 → ours tid=10 | 16 | 16 | 0 |
Zero regressions on any chain.
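The per-chain comparison amounts to a subtraction plus a sign check. A minimal sketch with the matched-prefix lengths copied from the table above; the names `phase_w`, `c25`, and `prefix_deltas` are illustrative and do not come from the diff tool:

```python
# Matched-prefix lengths per (canary_tid, ours_tid) chain, copied from the table above.
phase_w = {(6, 1): 105112, (4, 11): 11, (7, 2): 32, (12, 7): 4, (14, 9): 41, (15, 10): 16}
c25 = {(6, 1): 105128, (4, 11): 11, (7, 2): 32, (12, 7): 4, (14, 9): 41, (15, 10): 16}

def prefix_deltas(pre, post):
    """Return post - pre for every chain present in the pre-fix run."""
    return {chain: post[chain] - pre[chain] for chain in pre}

deltas = prefix_deltas(phase_w, c25)
regressions = {chain: d for chain, d in deltas.items() if d < 0}

for chain, delta in deltas.items():
    print(f"canary tid={chain[0]} -> ours tid={chain[1]}: {delta:+d}")
print("regressions:", regressions)
```

A check like this only compares prefix lengths; it says nothing about the primary progression metric, which is reported separately below.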
## Progression metric (PRIMARY)
3 cold reproducible runs:
| field | rep1 | rep2 | rep3 | vs Phase W |
|---|---|---|---|---|
| instructions | 50000007 | 50000007 | 50000007 | unchanged |
| imports | 40390 | 40390 | 40390 | unchanged |
| unimpl | 0 | 0 | 0 | unchanged |
| **draws** | **0** | **0** | **0** | **unchanged** |
| **swaps** | **1** | **1** | **1** | **unchanged** |
| **unique_render_targets** | **0** | **0** | **0** | **unchanged** |
| shader_blobs_live | 0 | 0 | 0 | unchanged |
| texture_cache_entries | 0 | 0 | 0 | unchanged |
**Verdict: matched-prefix advanced (+16) but the wedge is structurally
unchanged.** This is the expected outcome per cascade-prediction gate E (~10%
probability of movement). The C+25 fix is a diff-tool canonicalization, not
an engine behavior change; the engine's boot trajectory is byte-identical
to Phase W (digest fields prove it).
The next divergence at idx 105,128 (`thread.create.ctx_ptr`: canary
`0xbe56bb3c`, which lies below `0xC0000000` and so falls in the vA0… heap
region, vs ours `0x42453b3c` in the unified heap) is another deferred-Path-β
symptom: a heap VA leaking into a non-return payload field. Canonicalization
could be extended to cover specific payload fields too, but that's a
follow-up scope decision.
## Determinism note
The engine binary is `xrs-c25` (renamed from `xrs-verify-c23`; built from the
same source tree, which is unchanged by C+25 since the C+25 diff is
Python-only). All 3 cold runs used `XENIA_CACHE_WIPE=1` and produced
identical digests, confirming the engine's cold-boot trajectory is the same
as Phase W's `73e99d6002…`-class run.
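The determinism claim reduces to a field-by-field equality check over the three digests. The sketch below inlines the digest contents rather than reading `c25-digest-rep{1,2,3}.json` from disk, so it stays self-contained; `differing_fields` is a hypothetical helper, not part of the repository tooling:

```python
import json

# Digest contents as committed for rep1, rep2, and rep3 (the three files are identical).
DIGEST_TEXT = """
{
  "instructions": 50000007,
  "imports": 40390,
  "unimpl": 0,
  "draws": 0,
  "swaps": 1,
  "unique_render_targets": 0,
  "shader_blobs_live": 0,
  "texture_cache_entries": 0
}
"""

def differing_fields(a, b):
    """Return the sorted field names whose values differ between two digests."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))

reps = [json.loads(DIGEST_TEXT) for _ in range(3)]
baseline = reps[0]
mismatches = {i + 1: differing_fields(baseline, rep) for i, rep in enumerate(reps)}
deterministic = all(not fields for fields in mismatches.values())

print("deterministic:", deterministic)
```

In practice each entry of `reps` would be loaded from its own file; reporting the differing field names, rather than a bare boolean, makes a future non-deterministic run easier to triage.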