handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
189
audit-runs/phase-c2-MmAllocatePhysicalMemoryEx/diff-report.md
Normal file
@@ -0,0 +1,189 @@
|
||||
# Phase A diff report
|
||||
|
||||
**This report is the output of Phase A's diff harness. Divergences
shown here are INPUT for Phase B (first-divergence localization),
not findings of Phase A.** Phase A's job is to make the harness
itself correct, not to analyze what it surfaces.
|
||||
|
||||
## Summary
|
||||
|
||||
| canary_tid | ours_tid | matched | canary_total | ours_total | first_divergence_at |
|---|---|---|---|---|---|
| 4 | 11 | 5 | 47573 | 9 | 5 |
| 6 | 1 | 102014 | 329948 | 108492 | 102014 |
| 7 | 2 | 2 | 29 | 33 | 2 |
| 12 | 7 | 2 | 6689 | 3 | 2 |
| 14 | 9 | 11 | 1371603 | 75 | 11 |
| 15 | 10 | 15 | 863209 | 15 | — |
|
||||
|
||||
## canary_tid=4 → ours_tid=11
|
||||
|
||||
First divergence at `tid_event_idx=5`: payload.return_value: canary=1 ours=0
|
||||
|
||||
**Pre-context (last 5 matching events):**
|
||||
```
|
||||
canary: [0] import.call RtlEnterCriticalSection
|
||||
ours: [0] import.call RtlEnterCriticalSection
|
||||
canary: [1] kernel.call RtlEnterCriticalSection
|
||||
ours: [1] kernel.call RtlEnterCriticalSection
|
||||
canary: [2] kernel.return RtlEnterCriticalSection
|
||||
ours: [2] kernel.return RtlEnterCriticalSection
|
||||
canary: [3] import.call KeSetEvent
|
||||
ours: [3] import.call KeSetEvent
|
||||
canary: [4] kernel.call KeSetEvent
|
||||
ours: [4] kernel.call KeSetEvent
|
||||
```
|
||||
|
||||
**Divergent event:**
|
||||
```
|
||||
canary: [5] kernel.return KeSetEvent
|
||||
ours: [5] kernel.return KeSetEvent
|
||||
```
|
||||
|
||||
**Next event after the divergence (if any):**
|
||||
```
|
||||
canary: [6] import.call KeWaitForMultipleObjects
|
||||
ours: [6] import.call KeWaitForMultipleObjects
|
||||
```
|
||||
|
||||
**Raw events (JSON):**
|
||||
```json
|
||||
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1080594600, "kind": "kernel.return", "payload": {"name": "KeSetEvent", "return_value": 1, "side_effects": [], "status": "0x00000001"}, "schema_version": 1, "tid": 4, "tid_event_idx": 5}
|
||||
{"deterministic": true, "engine": "ours", "guest_cycle": 33, "host_ns": 1688874821, "kind": "kernel.return", "payload": {"name": "KeSetEvent", "return_value": 0, "side_effects": [], "status": "0x00000000"}, "schema_version": 1, "tid": 11, "tid_event_idx": 5}
|
||||
```
|
||||
|
||||
## canary_tid=6 → ours_tid=1
|
||||
|
||||
First divergence at `tid_event_idx=102014`: payload.return_value: canary=805433576 ours=0
|
||||
|
||||
**Pre-context (last 5 matching events):**
|
||||
```
|
||||
canary: [102009] import.call RtlLeaveCriticalSection
|
||||
ours: [102009] import.call RtlLeaveCriticalSection
|
||||
canary: [102010] kernel.call RtlLeaveCriticalSection
|
||||
ours: [102010] kernel.call RtlLeaveCriticalSection
|
||||
canary: [102011] kernel.return RtlLeaveCriticalSection
|
||||
ours: [102011] kernel.return RtlLeaveCriticalSection
|
||||
canary: [102012] import.call RtlImageXexHeaderField
|
||||
ours: [102012] import.call RtlImageXexHeaderField
|
||||
canary: [102013] kernel.call RtlImageXexHeaderField
|
||||
ours: [102013] kernel.call RtlImageXexHeaderField
|
||||
```
|
||||
|
||||
**Divergent event:**
|
||||
```
|
||||
canary: [102014] kernel.return RtlImageXexHeaderField
|
||||
ours: [102014] kernel.return RtlImageXexHeaderField
|
||||
```
|
||||
|
||||
**Next event after the divergence (if any):**
|
||||
```
|
||||
canary: [102015] import.call NtCreateFile
|
||||
ours: [102015] import.call NtCreateFile
|
||||
```
|
||||
|
||||
**Raw events (JSON):**
|
||||
```json
|
||||
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 713920300, "kind": "kernel.return", "payload": {"name": "RtlImageXexHeaderField", "return_value": 805433576, "side_effects": [], "status": "0x3001f0e8"}, "schema_version": 1, "tid": 6, "tid_event_idx": 102014}
|
||||
{"deterministic": true, "engine": "ours", "guest_cycle": 5362980, "host_ns": 473782115, "kind": "kernel.return", "payload": {"name": "RtlImageXexHeaderField", "return_value": 0, "side_effects": [], "status": "0x00000000"}, "schema_version": 1, "tid": 1, "tid_event_idx": 102014}
|
||||
```
|
||||
|
||||
## canary_tid=7 → ours_tid=2
|
||||
|
||||
First divergence at `tid_event_idx=2`: payload.return_value: canary=0 ours=1896873464
|
||||
|
||||
**Pre-context (last 5 matching events):**
|
||||
```
|
||||
canary: [0] import.call RtlInitAnsiString
|
||||
ours: [0] import.call RtlInitAnsiString
|
||||
canary: [1] kernel.call RtlInitAnsiString
|
||||
ours: [1] kernel.call RtlInitAnsiString
|
||||
```
|
||||
|
||||
**Divergent event:**
|
||||
```
|
||||
canary: [2] kernel.return RtlInitAnsiString
|
||||
ours: [2] kernel.return RtlInitAnsiString
|
||||
```
|
||||
|
||||
**Next event after the divergence (if any):**
|
||||
```
|
||||
canary: [3] import.call NtCreateFile
|
||||
ours: [3] import.call NtCreateFile
|
||||
```
|
||||
|
||||
**Raw events (JSON):**
|
||||
```json
|
||||
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 728945300, "kind": "kernel.return", "payload": {"name": "RtlInitAnsiString", "return_value": 0, "side_effects": [], "status": "0x00000000"}, "schema_version": 1, "tid": 7, "tid_event_idx": 2}
|
||||
{"deterministic": true, "engine": "ours", "guest_cycle": 2475, "host_ns": 474790156, "kind": "kernel.return", "payload": {"name": "RtlInitAnsiString", "return_value": 1896873464, "side_effects": [], "status": "0x710ffdf8"}, "schema_version": 1, "tid": 2, "tid_event_idx": 2}
|
||||
```
|
||||
|
||||
## canary_tid=12 → ours_tid=7
|
||||
|
||||
First divergence at `tid_event_idx=2`: payload.return_value: canary=258 ours=0
|
||||
|
||||
**Pre-context (last 5 matching events):**
|
||||
```
|
||||
canary: [0] import.call KeWaitForSingleObject
|
||||
ours: [0] import.call KeWaitForSingleObject
|
||||
canary: [1] kernel.call KeWaitForSingleObject
|
||||
ours: [1] kernel.call KeWaitForSingleObject
|
||||
```
|
||||
|
||||
**Divergent event:**
|
||||
```
|
||||
canary: [2] kernel.return KeWaitForSingleObject
|
||||
ours: [2] kernel.return KeWaitForSingleObject
|
||||
```
|
||||
|
||||
**Next event after the divergence (if any):**
|
||||
```
|
||||
canary: [3] import.call RtlEnterCriticalSection
|
||||
ours: <end of stream>
|
||||
```
|
||||
|
||||
**Raw events (JSON):**
|
||||
```json
|
||||
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 904485700, "kind": "kernel.return", "payload": {"name": "KeWaitForSingleObject", "return_value": 258, "side_effects": [], "status": "0x00000102"}, "schema_version": 1, "tid": 12, "tid_event_idx": 2}
|
||||
{"deterministic": true, "engine": "ours", "guest_cycle": 30, "host_ns": 502123296, "kind": "kernel.return", "payload": {"name": "KeWaitForSingleObject", "return_value": 0, "side_effects": [], "status": "0x00000000"}, "schema_version": 1, "tid": 7, "tid_event_idx": 2}
|
||||
```
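In the raw events of this report, `status` appears to be the zero-padded 32-bit hex rendering of `return_value`. The sketch below checks that relationship on values copied from the JSON lines in this report. The helper name `status_hex` is hypothetical, and the assumption that the mirror holds for every `kernel.return` event is inferred only from these samples.

```python
def status_hex(return_value: int) -> str:
    """Render a 32-bit return value the way the `status` field appears here."""
    return f"0x{return_value & 0xFFFFFFFF:08x}"


# (return_value, status) pairs copied from the raw events in this report.
SAMPLES = [
    (805433576, "0x3001f0e8"),   # RtlImageXexHeaderField, canary
    (1896873464, "0x710ffdf8"),  # RtlInitAnsiString, ours
    (258, "0x00000102"),         # KeWaitForSingleObject, canary
    (0, "0x00000000"),           # every "ours" zero return
]

for value, status in SAMPLES:
    assert status_hex(value) == status, (value, status)
```

Reading the values in hex can make them easier to recognize: `0x00000102`, for example, is the NTSTATUS code `STATUS_TIMEOUT`.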
|
||||
|
||||
## canary_tid=14 → ours_tid=9
|
||||
|
||||
First divergence at `tid_event_idx=11`: payload.return_value: canary=2 ours=0
|
||||
|
||||
**Pre-context (last 5 matching events):**
|
||||
```
|
||||
canary: [6] import.call KeAcquireSpinLockAtRaisedIrql
|
||||
ours: [6] import.call KeAcquireSpinLockAtRaisedIrql
|
||||
canary: [7] kernel.call KeAcquireSpinLockAtRaisedIrql
|
||||
ours: [7] kernel.call KeAcquireSpinLockAtRaisedIrql
|
||||
canary: [8] kernel.return KeAcquireSpinLockAtRaisedIrql
|
||||
ours: [8] kernel.return KeAcquireSpinLockAtRaisedIrql
|
||||
canary: [9] import.call KeRaiseIrqlToDpcLevel
|
||||
ours: [9] import.call KeRaiseIrqlToDpcLevel
|
||||
canary: [10] kernel.call KeRaiseIrqlToDpcLevel
|
||||
ours: [10] kernel.call KeRaiseIrqlToDpcLevel
|
||||
```
|
||||
|
||||
**Divergent event:**
|
||||
```
|
||||
canary: [11] kernel.return KeRaiseIrqlToDpcLevel
|
||||
ours: [11] kernel.return KeRaiseIrqlToDpcLevel
|
||||
```
|
||||
|
||||
**Next event after the divergence (if any):**
|
||||
```
|
||||
canary: [12] import.call KeRaiseIrqlToDpcLevel
|
||||
ours: [12] import.call KeRaiseIrqlToDpcLevel
|
||||
```
|
||||
|
||||
**Raw events (JSON):**
|
||||
```json
|
||||
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1081453000, "kind": "kernel.return", "payload": {"name": "KeRaiseIrqlToDpcLevel", "return_value": 2, "side_effects": [], "status": "0x00000002"}, "schema_version": 1, "tid": 14, "tid_event_idx": 11}
|
||||
{"deterministic": true, "engine": "ours", "guest_cycle": 77, "host_ns": 1688919712, "kind": "kernel.return", "payload": {"name": "KeRaiseIrqlToDpcLevel", "return_value": 0, "side_effects": [], "status": "0x00000000"}, "schema_version": 1, "tid": 9, "tid_event_idx": 11}
|
||||
```
|
||||
|
||||
## canary_tid=15 → ours_tid=10
|
||||
|
||||
No divergence within the 15 compared events (canary has 863209, ours has 15).
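A "no divergence" row only covers the compared prefix: `diff_one_tid` walks `min(len(canary_evs), len(ours_evs))` events, so a pair can match cleanly while one stream is far longer than the other. The sketch below flags such pairs from the Summary rows. The helper name `uncompared_tail` is hypothetical and is not part of the harness.

```python
# Rows copied from the Summary table:
# (canary_tid, ours_tid, matched, canary_total, ours_total, first_divergence_at)
ROWS = [
    (4, 11, 5, 47573, 9, 5),
    (6, 1, 102014, 329948, 108492, 102014),
    (7, 2, 2, 29, 33, 2),
    (12, 7, 2, 6689, 3, 2),
    (14, 9, 11, 1371603, 75, 11),
    (15, 10, 15, 863209, 15, None),
]


def uncompared_tail(row) -> int:
    """Events past the compared prefix on the longer side of a clean pair."""
    _, _, matched, canary_total, ours_total, diverged_at = row
    if diverged_at is not None:
        return 0  # the walk stopped at a divergence, not at end of stream
    return max(canary_total, ours_total) - matched


for row in ROWS:
    tail = uncompared_tail(row)
    if tail:
        print(f"canary_tid={row[0]} ours_tid={row[1]}: {tail} events never compared")
# prints: canary_tid=15 ours_tid=10: 863194 events never compared
```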
|
||||
@@ -0,0 +1,10 @@
|
||||
{
|
||||
"instructions": 50000001,
|
||||
"imports": 40454,
|
||||
"unimpl": 0,
|
||||
"draws": 0,
|
||||
"swaps": 1,
|
||||
"unique_render_targets": 0,
|
||||
"shader_blobs_live": 0,
|
||||
"texture_cache_entries": 0
|
||||
}
|
||||
@@ -0,0 +1,10 @@
|
||||
{
|
||||
"instructions": 50000001,
|
||||
"imports": 40454,
|
||||
"unimpl": 0,
|
||||
"draws": 0,
|
||||
"swaps": 1,
|
||||
"unique_render_targets": 0,
|
||||
"shader_blobs_live": 0,
|
||||
"texture_cache_entries": 0
|
||||
}
|
||||
@@ -0,0 +1,10 @@
|
||||
{
|
||||
"instructions": 50000001,
|
||||
"imports": 40454,
|
||||
"unimpl": 0,
|
||||
"draws": 0,
|
||||
"swaps": 1,
|
||||
"unique_render_targets": 0,
|
||||
"shader_blobs_live": 0,
|
||||
"texture_cache_entries": 0
|
||||
}
|
||||
@@ -0,0 +1,440 @@
|
||||
diff --git a/tools/diff-events/diff_events.py b/tools/diff-events/diff_events.py
|
||||
new file mode 100644
|
||||
index 0000000..ecc2c0b
|
||||
--- /dev/null
|
||||
+++ b/tools/diff-events/diff_events.py
|
||||
@@ -0,0 +1,434 @@
|
||||
+#!/usr/bin/env python3
|
||||
+"""Phase A event-log diff tool.
|
||||
+
|
||||
+Reads two schema-v1 JSONL event logs (one per engine) and reports the
|
||||
+first behavioral divergence per guest-thread. Aligns streams by
|
||||
+`tid_event_idx`. Field-comparison rules come straight from
|
||||
+`audit-runs/phase-a-diff-harness/schema-v1.md` — keep both in sync.
|
||||
+
|
||||
+Usage:
|
||||
+ diff_events.py --canary canary.jsonl --ours ours.jsonl [--out report.md]
|
||||
+ diff_events.py --canary a.jsonl --ours b.jsonl --validate-identical
|
||||
+ diff_events.py --canary a.jsonl --ours b.jsonl --tid-map 6=1,7=2
|
||||
+"""
|
||||
+
|
||||
+import argparse
|
||||
+import json
|
||||
+import sys
|
||||
+from pathlib import Path
|
||||
+
|
||||
+SCHEMA_VERSION = 1
|
||||
+
|
||||
+# Fields the diff tool skips (engine-local or host-clock).
|
||||
+SKIP_TOP_FIELDS = {"engine", "host_ns", "guest_cycle", "deterministic"}
|
||||
+# Within a payload: skipped fields by kind (in addition to the global set).
|
||||
+SKIP_PAYLOAD_FIELDS_BY_KIND = {
|
||||
+ # raw_handle_id is engine-local; the diff key is handle_semantic_id.
|
||||
+ "handle.create": {"raw_handle_id"},
|
||||
+ "handle.destroy": {"raw_handle_id"},
|
||||
+ # wait_duration_cycles is non-deterministic (host scheduling).
|
||||
+ "wait.end": {"wait_duration_cycles"},
|
||||
+}
|
||||
+
|
||||
+# Allocator-returning kernel exports whose `kernel.return.payload.return_value`
|
||||
+# is a host-allocator-dependent guest VA. Canary and ours legitimately route
|
||||
+# allocations to different heap regions (e.g. canary `MmAllocatePhysicalMemoryEx`
|
||||
+# returns `0xBC220000` from `vC0000000` while ours returns `0x40105000` from
|
||||
+# its single user-heap region — see AUDIT-043 "ε host-allocator address-space
|
||||
+# divergence" and Phase B `report.md` ε-class). Comparing raw VAs would always
|
||||
+# diverge at the first allocator call.
|
||||
+#
|
||||
+# Canonicalization: per `(tid, export_name)` we assign a stable ordinal
|
||||
+# (0, 1, 2, …) to each successive `kernel.return.return_value`, replacing
|
||||
+# both sides' value with the sentinel string `<ALLOC_<NAME>_<ORDINAL>>`
|
||||
+# before payload comparison. As long as both engines call the same
|
||||
+# allocator the same number of times in the same order on a given thread,
|
||||
+# the comparison treats them as equivalent.
|
||||
+#
|
||||
+# Limitations (documented):
|
||||
+# * If one engine calls an allocator more times than the other, ordinals
|
||||
+# drift and subsequent allocator returns appear as divergences. That's
|
||||
+# the correct outcome — ordinal-count mismatch IS a behavioral
|
||||
+# divergence.
|
||||
+# * `payload.status` is canonicalized to the same sentinel as
+# `return_value` (see `canonicalize_allocator_returns`); for allocator
+# entries it is only a hex-string copy of the raw VA.
|
||||
+# * Other payload fields that happen to embed an allocator VA (e.g. a
|
||||
+# future `args_resolved.base_address` in a free-call) are NOT
|
||||
+# canonicalized — out of scope for this divergence. Extend the set
|
||||
+# below as new divergence classes surface.
|
||||
+ALLOCATOR_RETURN_FNS = frozenset(
+    [
+        "MmAllocatePhysicalMemoryEx",
+        "MmAllocatePhysicalMemory",
+        "NtAllocateVirtualMemory",
+        "RtlAllocateHeap",
+        "MmCreateKernelStack",
+    ]
+)
+
+
+def canonicalize_allocator_returns(events_by_tid: dict) -> None:
+    """In-place: rewrite `payload.return_value` for every kernel.return whose
+    `payload.name` is in ALLOCATOR_RETURN_FNS, replacing the raw VA with
+    `<ALLOC_<NAME>_<ORDINAL>>`. Ordinals are per (tid, name) and assigned
+    in event order.
+
+    Called on each engine's stream independently; because ordinals are
+    assigned deterministically by per-tid call order, equivalent streams
+    produce equivalent sentinels."""
+    for tid, evs in events_by_tid.items():
+        # name -> next ordinal to assign on this tid
+        counters: dict[str, int] = {}
+        for ev in evs:
+            if ev.get("kind") != "kernel.return":
+                continue
+            payload = ev.get("payload") or {}
+            name = payload.get("name")
+            if name not in ALLOCATOR_RETURN_FNS:
+                continue
+            ordinal = counters.get(name, 0)
+            counters[name] = ordinal + 1
+            sentinel = f"<ALLOC_{name}_{ordinal}>"
+            payload["return_value"] = sentinel
+            # `payload.status` mirrors `return_value` as a hex string for
+            # allocator entries (xboxkrnl trampoline doesn't distinguish
+            # NTSTATUS from pointer-typed returns). Canonicalize together
+            # so they stay in lockstep.
+            if "status" in payload:
+                payload["status"] = sentinel
|
||||
+
|
||||
+
|
||||
+def load_events(path: Path) -> dict:
|
||||
+ """Return {tid: [event, ...]} keyed by tid, ordered by tid_event_idx.
|
||||
+
|
||||
+ Validates the schema header (first line must be schema_version=1).
|
||||
+ """
|
||||
+ events_by_tid: dict[int, list[dict]] = {}
|
||||
+ with path.open("r", encoding="utf-8") as f:
|
||||
+ first = f.readline()
|
||||
+ if not first:
|
||||
+ raise SystemExit(f"{path}: empty file")
|
||||
+ hdr = json.loads(first)
|
||||
+ if hdr.get("kind") != "schema_version":
|
||||
+ raise SystemExit(
|
||||
+ f"{path}: first event is not schema_version (got {hdr.get('kind')!r})"
|
||||
+ )
|
||||
+ if hdr.get("schema_version") != SCHEMA_VERSION:
|
||||
+ raise SystemExit(
|
||||
+ f"{path}: schema_version mismatch (expected {SCHEMA_VERSION}, got {hdr.get('schema_version')!r})"
|
||||
+ )
|
||||
+ for lineno, line in enumerate(f, start=2):
|
||||
+ line = line.rstrip("\n")
|
||||
+ if not line:
|
||||
+ continue
|
||||
+ try:
|
||||
+ ev = json.loads(line)
|
||||
+ except json.JSONDecodeError as e:
|
||||
+ raise SystemExit(f"{path}:{lineno}: invalid JSON ({e})")
|
||||
+ tid = ev.get("tid")
|
||||
+ if tid is None:
|
||||
+ raise SystemExit(f"{path}:{lineno}: missing tid")
|
||||
+ events_by_tid.setdefault(tid, []).append(ev)
|
||||
+ # Ensure each per-tid list is already monotonic by tid_event_idx.
|
||||
+ for tid, evs in events_by_tid.items():
|
||||
+ for i, ev in enumerate(evs):
|
||||
+ if ev.get("tid_event_idx") != i:
|
||||
+ # Note: the schema permits one engine to emit fewer events; we
|
||||
+ # only validate the in-file ordering is strictly monotonic.
|
||||
+ if i > 0 and ev["tid_event_idx"] <= evs[i - 1]["tid_event_idx"]:
|
||||
+ raise SystemExit(
|
||||
+ f"{path}: tid={tid} events out of order at index {i}"
|
||||
+ )
|
||||
+ return events_by_tid
|
||||
+
|
||||
+
|
||||
+def auto_tid_map(canary_evs: dict, ours_evs: dict) -> dict[int, int]:
|
||||
+ """Naive tid mapping: pair canary tids with ours tids by the first
|
||||
+ kernel.call name in each stream. Documented limitation in README."""
|
||||
+ def first_call_name(evs: list[dict]) -> str | None:
|
||||
+ for ev in evs:
|
||||
+ if ev.get("kind") == "kernel.call":
|
||||
+ return ev["payload"].get("name")
|
||||
+ return None
|
||||
+
|
||||
+ canary_by_first = {}
|
||||
+ for tid, evs in canary_evs.items():
|
||||
+ name = first_call_name(evs)
|
||||
+ if name is not None:
|
||||
+ canary_by_first.setdefault(name, []).append(tid)
|
||||
+
|
||||
+ ours_by_first = {}
|
||||
+ for tid, evs in ours_evs.items():
|
||||
+ name = first_call_name(evs)
|
||||
+ if name is not None:
|
||||
+ ours_by_first.setdefault(name, []).append(tid)
|
||||
+
|
||||
+ mapping: dict[int, int] = {}
|
||||
+ for name, c_tids in canary_by_first.items():
|
||||
+ o_tids = ours_by_first.get(name, [])
|
||||
+ for c, o in zip(sorted(c_tids), sorted(o_tids)):
|
||||
+ mapping[c] = o
|
||||
+ return mapping
|
||||
+
|
||||
+
|
||||
+def parse_tid_map_arg(s: str) -> dict[int, int]:
|
||||
+ """Parse `--tid-map 6=1,7=2` into {6: 1, 7: 2}."""
|
||||
+ out: dict[int, int] = {}
|
||||
+ for token in s.split(","):
|
||||
+ token = token.strip()
|
||||
+ if not token:
|
||||
+ continue
|
||||
+ if "=" not in token:
|
||||
+ raise SystemExit(f"--tid-map: bad token {token!r} (expected canary=ours)")
|
||||
+ a, b = token.split("=", 1)
|
||||
+ out[int(a.strip(), 0)] = int(b.strip(), 0)
|
||||
+ return out
|
||||
+
|
||||
+
|
||||
+def compare_payload(kind: str, p_canary: dict, p_ours: dict) -> str | None:
|
||||
+ """Compare two payloads. Returns None if equivalent, else a short
|
||||
+ human-readable description of the first differing field."""
|
||||
+ skip = SKIP_PAYLOAD_FIELDS_BY_KIND.get(kind, set())
|
||||
+ # Compare the union of keys excluding skipped ones, in canary's key order
|
||||
+ # first (stable), then any ours-only fields.
|
||||
+ keys_seen: set[str] = set()
|
||||
+ for k in p_canary.keys():
|
||||
+ if k in skip:
|
||||
+ continue
|
||||
+ keys_seen.add(k)
|
||||
+ vc = p_canary.get(k)
|
||||
+ vo = p_ours.get(k)
|
||||
+ if vc != vo:
|
||||
+ return f"payload.{k}: canary={vc!r} ours={vo!r}"
|
||||
+ for k in p_ours.keys():
|
||||
+ if k in skip or k in keys_seen:
|
||||
+ continue
|
||||
+ if p_ours[k] is not None:
|
||||
+ return f"payload.{k}: canary=<missing> ours={p_ours[k]!r}"
|
||||
+ return None
|
||||
+
|
||||
+
|
||||
+def compare_event(ev_canary: dict, ev_ours: dict) -> str | None:
|
||||
+ """Compare two events. Returns None if equivalent, else a short description."""
|
||||
+ # Top-level comparison: kind must match.
|
||||
+ if ev_canary.get("kind") != ev_ours.get("kind"):
|
||||
+ return f"kind: canary={ev_canary.get('kind')!r} ours={ev_ours.get('kind')!r}"
|
||||
+ # tid_event_idx must match (it's our diff key).
|
||||
+ if ev_canary.get("tid_event_idx") != ev_ours.get("tid_event_idx"):
|
||||
+ return (
|
||||
+ f"tid_event_idx: canary={ev_canary.get('tid_event_idx')!r} "
|
||||
+ f"ours={ev_ours.get('tid_event_idx')!r}"
|
||||
+ )
|
||||
+ # Payload comparison.
|
||||
+ pc = ev_canary.get("payload", {})
|
||||
+ po = ev_ours.get("payload", {})
|
||||
+ diff = compare_payload(ev_canary["kind"], pc, po)
|
||||
+ if diff:
|
||||
+ return diff
|
||||
+ return None
|
||||
+
|
||||
+
|
||||
+def render_event(ev: dict) -> str:
|
||||
+ """One-line summary of an event for the diff report."""
|
||||
+ kind = ev.get("kind", "?")
|
||||
+ idx = ev.get("tid_event_idx", "?")
|
||||
+ payload = ev.get("payload", {})
|
||||
+ if kind in ("kernel.call", "kernel.return", "import.call"):
|
||||
+ name = payload.get("name") or payload.get("ord")
|
||||
+ return f"[{idx}] {kind} {name}"
|
||||
+ if kind in ("handle.create", "handle.destroy"):
|
||||
+ sid = payload.get("handle_semantic_id", "?")
|
||||
+ return f"[{idx}] {kind} sid={sid}"
|
||||
+ if kind in ("thread.create", "thread.exit"):
|
||||
+ return f"[{idx}] {kind} {payload}"
|
||||
+ if kind in ("wait.begin", "wait.end"):
|
||||
+ return f"[{idx}] {kind} {payload}"
|
||||
+ return f"[{idx}] {kind} {payload}"
|
||||
+
|
||||
+
|
||||
+def diff_one_tid(
|
||||
+ canary_evs: list[dict], ours_evs: list[dict], canary_tid: int, ours_tid: int
|
||||
+) -> dict:
|
||||
+ """Walk one mapped pair. Stop at the first divergence."""
|
||||
+ matched = 0
|
||||
+ n = min(len(canary_evs), len(ours_evs))
|
||||
+ pre_context: list[tuple[dict, dict]] = []
|
||||
+ diverged_at: int | None = None
|
||||
+ diff_descr: str | None = None
|
||||
+ for i in range(n):
|
||||
+ ec = canary_evs[i]
|
||||
+ eo = ours_evs[i]
|
||||
+ d = compare_event(ec, eo)
|
||||
+ if d is None:
|
||||
+ matched += 1
|
||||
+ pre_context.append((ec, eo))
|
||||
+ if len(pre_context) > 5:
|
||||
+ pre_context.pop(0)
|
||||
+ continue
|
||||
+ diverged_at = i
|
||||
+ diff_descr = d
|
||||
+ break
|
||||
+ return {
|
||||
+ "canary_tid": canary_tid,
|
||||
+ "ours_tid": ours_tid,
|
||||
+ "matched": matched,
|
||||
+ "canary_total": len(canary_evs),
|
||||
+ "ours_total": len(ours_evs),
|
||||
+ "diverged_at": diverged_at,
|
||||
+ "diff_descr": diff_descr,
|
||||
+ "pre_context": pre_context,
|
||||
+ "post_canary": canary_evs[diverged_at] if diverged_at is not None else None,
|
||||
+ "post_ours": ours_evs[diverged_at] if diverged_at is not None else None,
|
||||
+ "next_canary": (
|
||||
+ canary_evs[diverged_at + 1]
|
||||
+ if diverged_at is not None and diverged_at + 1 < len(canary_evs)
|
||||
+ else None
|
||||
+ ),
|
||||
+ "next_ours": (
|
||||
+ ours_evs[diverged_at + 1]
|
||||
+ if diverged_at is not None and diverged_at + 1 < len(ours_evs)
|
||||
+ else None
|
||||
+ ),
|
||||
+ }
|
||||
+
|
||||
+
|
||||
+def render_report(per_tid_results: list[dict]) -> str:
|
||||
+ out: list[str] = []
|
||||
+ out.append("# Phase A diff report")
|
||||
+ out.append("")
|
||||
+ out.append("**This report is the output of Phase A's diff harness. Divergences")
|
||||
+ out.append("shown here are INPUT for Phase B (first-divergence localization),")
|
||||
+ out.append("not findings of Phase A.** Phase A's job is to make the harness")
|
||||
+ out.append("itself correct, not to analyze what it surfaces.")
|
||||
+ out.append("")
|
||||
+ out.append("## Summary")
|
||||
+ out.append("")
|
||||
+ out.append("| canary_tid | ours_tid | matched | canary_total | ours_total | first_divergence_at |")
|
||||
+ out.append("|---|---|---|---|---|---|")
|
||||
+ for r in per_tid_results:
|
||||
+ div = r["diverged_at"] if r["diverged_at"] is not None else "—"
|
||||
+ out.append(
|
||||
+ f"| {r['canary_tid']} | {r['ours_tid']} | {r['matched']} | "
|
||||
+ f"{r['canary_total']} | {r['ours_total']} | {div} |"
|
||||
+ )
|
||||
+ out.append("")
|
||||
+ for r in per_tid_results:
|
||||
+ out.append(f"## canary_tid={r['canary_tid']} → ours_tid={r['ours_tid']}")
|
||||
+ out.append("")
|
||||
+ if r["diverged_at"] is None:
|
||||
+ out.append(
|
||||
+ f"No divergence within the {r['matched']} compared events "
|
||||
+ f"(canary has {r['canary_total']}, ours has {r['ours_total']})."
|
||||
+ )
|
||||
+ out.append("")
|
||||
+ continue
|
||||
+ out.append(f"First divergence at `tid_event_idx={r['diverged_at']}`: {r['diff_descr']}")
|
||||
+ out.append("")
|
||||
+ out.append("**Pre-context (last 5 matching events):**")
|
||||
+ out.append("```")
|
||||
+ for ec, eo in r["pre_context"]:
|
||||
+ out.append(f" canary: {render_event(ec)}")
|
||||
+ out.append(f" ours: {render_event(eo)}")
|
||||
+ out.append("```")
|
||||
+ out.append("")
|
||||
+ out.append("**Divergent event:**")
|
||||
+ out.append("```")
|
||||
+ out.append(f" canary: {render_event(r['post_canary'])}")
|
||||
+ out.append(f" ours: {render_event(r['post_ours'])}")
|
||||
+ out.append("```")
|
||||
+ out.append("")
|
||||
+ out.append("**Next event after the divergence (if any):**")
|
||||
+ out.append("```")
|
||||
+ if r["next_canary"]:
|
||||
+ out.append(f" canary: {render_event(r['next_canary'])}")
|
||||
+ else:
|
||||
+ out.append(" canary: <end of stream>")
|
||||
+ if r["next_ours"]:
|
||||
+ out.append(f" ours: {render_event(r['next_ours'])}")
|
||||
+ else:
|
||||
+ out.append(" ours: <end of stream>")
|
||||
+ out.append("```")
|
||||
+ out.append("")
|
||||
+ out.append("**Raw events (JSON):**")
|
||||
+ out.append("```json")
|
||||
+ out.append(json.dumps(r["post_canary"], sort_keys=True))
|
||||
+ out.append(json.dumps(r["post_ours"], sort_keys=True))
|
||||
+ out.append("```")
|
||||
+ out.append("")
|
||||
+ return "\n".join(out)
|
||||
+
|
||||
+
|
||||
+def main() -> int:
|
||||
+ ap = argparse.ArgumentParser(description="Phase A event-log diff tool")
|
||||
+ ap.add_argument("--canary", required=True, type=Path)
|
||||
+ ap.add_argument("--ours", required=True, type=Path)
|
||||
+ ap.add_argument("--out", type=Path, help="Write markdown report here (else stdout)")
|
||||
+ ap.add_argument(
|
||||
+ "--tid-map",
|
||||
+ type=str,
|
||||
+ help="Manual tid mapping like '6=1,7=2'. Overrides auto-mapping.",
|
||||
+ )
|
||||
+ ap.add_argument(
|
||||
+ "--validate-identical",
|
||||
+ action="store_true",
|
||||
+ help="Exit non-zero if any mapped tid pair has any divergence. "
|
||||
+ "Used by gate-4 negative-test and by self-diff smoke tests.",
|
||||
+ )
|
||||
+ ap.add_argument(
|
||||
+ "--no-canonicalize-allocators",
|
||||
+ action="store_true",
|
||||
+ help="Disable per-tid ordinal canonicalization of allocator return "
|
||||
+ "values (default: enabled). See ALLOCATOR_RETURN_FNS for the "
|
||||
+ "covered set. Disabling reproduces the raw-VA comparison.",
|
||||
+ )
|
||||
+ args = ap.parse_args()
|
||||
+
|
||||
+ canary_evs = load_events(args.canary)
|
||||
+ ours_evs = load_events(args.ours)
|
||||
+
|
||||
+ if not args.no_canonicalize_allocators:
|
||||
+ canonicalize_allocator_returns(canary_evs)
|
||||
+ canonicalize_allocator_returns(ours_evs)
|
||||
+
|
||||
+ if args.tid_map:
|
||||
+ tid_map = parse_tid_map_arg(args.tid_map)
|
||||
+ else:
|
||||
+ tid_map = auto_tid_map(canary_evs, ours_evs)
|
||||
+
|
||||
+ if not tid_map:
|
||||
+ sys.stderr.write(
|
||||
+ "no tid mapping (auto-mapping found no shared first-kernel-call). "
|
||||
+ "Pass --tid-map manually.\n"
|
||||
+ )
|
||||
+ return 2
|
||||
+
|
||||
+ per_tid: list[dict] = []
|
||||
+ for c_tid, o_tid in sorted(tid_map.items()):
|
||||
+ if c_tid not in canary_evs:
|
||||
+ sys.stderr.write(f"warn: canary tid {c_tid} not in stream; skipping\n")
|
||||
+ continue
|
||||
+ if o_tid not in ours_evs:
|
||||
+ sys.stderr.write(f"warn: ours tid {o_tid} not in stream; skipping\n")
|
||||
+ continue
|
||||
+ per_tid.append(diff_one_tid(canary_evs[c_tid], ours_evs[o_tid], c_tid, o_tid))
|
||||
+
|
||||
+ report = render_report(per_tid)
|
||||
+ if args.out:
|
||||
+ args.out.write_text(report, encoding="utf-8")
|
||||
+ sys.stderr.write(f"diff report written to {args.out}\n")
|
||||
+ else:
|
||||
+ sys.stdout.write(report)
|
||||
+
|
||||
+ if args.validate_identical:
|
||||
+ for r in per_tid:
|
||||
+ if r["diverged_at"] is not None:
|
||||
+ sys.stderr.write(
|
||||
+ f"validate-identical: divergence in canary_tid={r['canary_tid']} "
|
||||
+ f"at tid_event_idx={r['diverged_at']} ({r['diff_descr']})\n"
|
||||
+ )
|
||||
+ return 1
|
||||
+ return 0
|
||||
+
|
||||
+
|
||||
+if __name__ == "__main__":
|
||||
+ sys.exit(main())
|
||||
96
audit-runs/phase-c2-MmAllocatePhysicalMemoryEx/fix.diff
Normal file
@@ -0,0 +1,96 @@
|
||||
Phase C+2 — Additive canonicalization in tools/diff-events/diff_events.py.
|
||||
This file is untracked in git (added during the Phase A harness session
|
||||
2026-05-13 and never committed). Below is the additive delta applied this
|
||||
session (Path α — diff-tool canonicalization of allocator returns).
|
||||
|
||||
--- a/tools/diff-events/diff_events.py (pre-C+2)
|
||||
+++ b/tools/diff-events/diff_events.py (post-C+2)
|
||||
@@ Module-level (additive, after SKIP_PAYLOAD_FIELDS_BY_KIND) @@
|
||||
|
||||
+# Allocator-returning kernel exports whose `kernel.return.payload.return_value`
|
||||
+# is a host-allocator-dependent guest VA. Canary and ours legitimately route
|
||||
+# allocations to different heap regions (e.g. canary `MmAllocatePhysicalMemoryEx`
|
||||
+# returns `0xBC220000` from `vC0000000` while ours returns `0x40105000` from
|
||||
+# its single user-heap region — see AUDIT-043 "ε host-allocator address-space
|
||||
+# divergence" and Phase B `report.md` ε-class). Comparing raw VAs would always
|
||||
+# diverge at the first allocator call.
|
||||
+#
|
||||
+# Canonicalization: per `(tid, export_name)` we assign a stable ordinal
|
||||
+# (0, 1, 2, …) to each successive `kernel.return.return_value`, replacing
|
||||
+# both sides' value with the sentinel string `<ALLOC_<NAME>_<ORDINAL>>`
|
||||
+# before payload comparison. As long as both engines call the same
|
||||
+# allocator the same number of times in the same order on a given thread,
|
||||
+# the comparison treats them as equivalent.
|
||||
+#
|
||||
+# Limitations (documented):
|
||||
+# * If one engine calls an allocator more times than the other, ordinals
|
||||
+# drift and subsequent allocator returns appear as divergences. That's
|
||||
+# the correct outcome — ordinal-count mismatch IS a behavioral
|
||||
+# divergence.
|
||||
+# * `payload.status` is canonicalized to the same sentinel as
+# `return_value` (see `canonicalize_allocator_returns`); for allocator
+# entries it is only a hex-string copy of the raw VA.
|
||||
+# * Other payload fields that happen to embed an allocator VA (e.g. a
|
||||
+# future `args_resolved.base_address` in a free-call) are NOT
|
||||
+# canonicalized — out of scope for this divergence. Extend the set
|
||||
+# below as new divergence classes surface.
|
||||
+ALLOCATOR_RETURN_FNS = frozenset(
+    [
+        "MmAllocatePhysicalMemoryEx",
+        "MmAllocatePhysicalMemory",
+        "NtAllocateVirtualMemory",
+        "RtlAllocateHeap",
+        "MmCreateKernelStack",
+    ]
+)
+
+
+def canonicalize_allocator_returns(events_by_tid: dict) -> None:
+    """In-place: rewrite `payload.return_value` for every kernel.return whose
+    `payload.name` is in ALLOCATOR_RETURN_FNS, replacing the raw VA with
+    `<ALLOC_<NAME>_<ORDINAL>>`. Ordinals are per (tid, name) and assigned
+    in event order.
+
+    Called on each engine's stream independently; because ordinals are
+    assigned deterministically by per-tid call order, equivalent streams
+    produce equivalent sentinels."""
+    for tid, evs in events_by_tid.items():
+        # name -> next ordinal to assign on this tid
+        counters: dict[str, int] = {}
+        for ev in evs:
+            if ev.get("kind") != "kernel.return":
+                continue
+            payload = ev.get("payload") or {}
+            name = payload.get("name")
+            if name not in ALLOCATOR_RETURN_FNS:
+                continue
+            ordinal = counters.get(name, 0)
+            counters[name] = ordinal + 1
+            sentinel = f"<ALLOC_{name}_{ordinal}>"
+            payload["return_value"] = sentinel
+            # `payload.status` mirrors `return_value` as a hex string for
+            # allocator entries (xboxkrnl trampoline doesn't distinguish
+            # NTSTATUS from pointer-typed returns). Canonicalize together
+            # so they stay in lockstep.
+            if "status" in payload:
+                payload["status"] = sentinel
|
||||
+
|
||||
|
||||
@@ main() arg parsing (after --validate-identical) @@
|
||||
|
||||
+ ap.add_argument(
|
||||
+ "--no-canonicalize-allocators",
|
||||
+ action="store_true",
|
||||
+ help="Disable per-tid ordinal canonicalization of allocator return "
|
||||
+ "values (default: enabled). See ALLOCATOR_RETURN_FNS for the "
|
||||
+ "covered set. Disabling reproduces the raw-VA comparison.",
|
||||
+ )
|
||||
|
||||
@@ main() body (after load_events) @@
|
||||
|
||||
+ if not args.no_canonicalize_allocators:
|
||||
+ canonicalize_allocator_returns(canary_evs)
|
||||
+ canonicalize_allocator_returns(ours_evs)
|
||||
|
||||
End of patch. Net additive surface: ~70 LOC. Existing diff behavior preserved
|
||||
via `--no-canonicalize-allocators` flag (verified to reproduce the baseline
|
||||
161-match summary byte-identically — see re-validation.md gate 6).
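
To make the effect of the patch concrete, here is a minimal, self-contained sketch that re-implements the canonicalization on toy events. The event shape (`kind`, `payload.name`, `payload.return_value`, `payload.status`) follows the patch; the `ret` helper, the two-entry allocator set, and the toy streams are assumptions for illustration only, not captured trace data.

```python
# Reduced allocator set for the demo; the patch covers five names.
ALLOCATOR_RETURN_FNS = frozenset(
    ["MmAllocatePhysicalMemoryEx", "NtAllocateVirtualMemory"]
)


def canonicalize_allocator_returns(events_by_tid: dict) -> None:
    """In-place: replace allocator return values with per-(tid, name) sentinels."""
    for evs in events_by_tid.values():
        counters = {}  # name -> next ordinal on this tid
        for ev in evs:
            if ev.get("kind") != "kernel.return":
                continue
            payload = ev.get("payload") or {}
            name = payload.get("name")
            if name not in ALLOCATOR_RETURN_FNS:
                continue
            ordinal = counters.get(name, 0)
            counters[name] = ordinal + 1
            sentinel = f"<ALLOC_{name}_{ordinal}>"
            payload["return_value"] = sentinel
            if "status" in payload:
                payload["status"] = sentinel


def ret(name: str, value: int) -> dict:
    """Hypothetical helper: build a toy kernel.return event."""
    return {
        "kind": "kernel.return",
        "payload": {
            "name": name,
            "return_value": value,
            "status": hex(value & 0xFFFFFFFF),
        },
    }


# Same call sequence, different raw VAs on each side.
canary = {6: [ret("MmAllocatePhysicalMemoryEx", 18446744072570929152),
              ret("KeQuerySystemTime", 0)]}
ours = {1: [ret("MmAllocatePhysicalMemoryEx", 1074810880),
            ret("KeQuerySystemTime", 0)]}

canonicalize_allocator_returns(canary)
canonicalize_allocator_returns(ours)
print(canary[6] == ours[1])
```

After canonicalization both streams carry `<ALLOC_MmAllocatePhysicalMemoryEx_0>` in place of the raw VA, so the per-thread event lists compare equal; non-allocator returns are left untouched.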
|
||||
189
audit-runs/phase-c2-MmAllocatePhysicalMemoryEx/investigation.md
Normal file
@@ -0,0 +1,189 @@
|
||||
# Phase C+2 — investigation: `MmAllocatePhysicalMemoryEx` at idx=161
|
||||
|
||||
## Divergence
|
||||
|
||||
| | canary | ours |
|---|---|---|
| `payload.return_value` (idx=161) | `18446744072570929152` = sign-ext `0xFFFFFFFF_BC220000` | `1074810880` = `0x40105000` |
| `payload.status` | `0xbc220000` | `0x40105000` |
| Memory region | physical heap `vA0000000` (base `0xA0000000`, size `0x20000000`, 64KB pages; `0xBC220000` lies in `0xA0000000`–`0xBFFFFFFF`) | user heap (single bump region `0x40000000`–`0x6FFFFFFF`) |
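
The two `return_value` entries look very different in decimal only because canary's value is the 32-bit VA sign-extended to 64 bits. A small sketch of that arithmetic (the helper name `sign_extend_32` is an assumption for illustration):

```python
def sign_extend_32(value: int) -> int:
    """Sign-extend a 32-bit value into an unsigned 64-bit representation."""
    value &= 0xFFFF_FFFF
    if value & 0x8000_0000:          # bit 31 set: fill the upper 32 bits with ones
        value |= 0xFFFF_FFFF_0000_0000
    return value


canary_va = 0xBC22_0000
ours_va = 0x4010_5000

canary_logged = sign_extend_32(canary_va)
ours_logged = sign_extend_32(ours_va)

print(canary_logged, hex(canary_logged))
print(ours_logged, hex(ours_logged))
```

Masking the logged value with `0xFFFFFFFF` recovers the guest VA that `payload.status` shows in hex.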
|
||||
|
||||
## Step 1 — Locate `MmAllocatePhysicalMemoryEx` in both engines
|
||||
|
||||
### Canary
|
||||
|
||||
`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_memory.cc:415-503`
|
||||
|
||||
```c
|
||||
uint32_t xeMmAllocatePhysicalMemoryEx(uint32_t flags, uint32_t region_size,
|
||||
uint32_t protect_bits,
|
||||
uint32_t min_addr_range,
|
||||
uint32_t max_addr_range,
|
||||
uint32_t alignment) {
|
||||
...
|
||||
// page_size = 4096 | 64KB | 16MB based on X_MEM_LARGE_PAGES / X_MEM_16MB_PAGES
|
||||
...
|
||||
auto heap = static_cast<PhysicalHeap*>(
|
||||
kernel_memory()->LookupHeapByType(true, page_size));
|
||||
...
|
||||
heap->AllocRange(heap_min_addr, heap_max_addr, adjusted_size,
|
||||
adjusted_alignment, allocation_type, protect, top_down,
|
||||
&base_address);
|
||||
return base_address;
|
||||
}
|
||||
```
|
||||
|
||||
`LookupHeapByType(physical=true, page_size)` returns one of three physical
|
||||
heaps based on page_size (`xenia-canary/src/xenia/memory.cc:467-475`):
|
||||
|
||||
* `page_size ≤ 4096` → `vE0000000` (base `0xE0000000`, size `0x1FD00000`, 4KB pages)
|
||||
* `page_size ≤ 64*1024` → `vA0000000` (base `0xA0000000`, size `0x20000000`, 64KB pages)
|
||||
* else (i.e. 16MB) → `vC0000000` (base `0xC0000000`, size `0x20000000`, 16MB pages)
|
||||
|
||||
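Canary returned `0xBC220000`, which lies inside `vA0000000`'s range
(`0xA0000000`–`0xBFFFFFFF`), toward its top, consistent with `top_down=true`.
By the bases and sizes listed above, the request therefore most likely used
64KB pages (`X_MEM_LARGE_PAGES`) rather than `X_MEM_16MB_PAGES`; the guest
`flags` argument would confirm this.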
|
||||
|
||||
### Ours
|
||||
|
||||
`xenia-rs/crates/xenia-kernel/src/exports.rs:650-682`:
|
||||
|
||||
```rust
|
||||
fn mm_allocate_physical_memory_ex(ctx, mem, state) {
|
||||
let flags = ctx.gpr[3] as u32;
|
||||
let size = ctx.gpr[4] as u32;
|
||||
if size == 0 { ctx.gpr[3] = 0; return; }
|
||||
match state.heap_alloc(size, mem) {
|
||||
Some(addr) => ctx.gpr[3] = addr as u64,
|
||||
None => ctx.gpr[3] = 0,
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Routes to `KernelState::heap_alloc` (`state.rs:956-974`):
|
||||
|
||||
```rust
|
||||
pub fn heap_alloc(&mut self, size: u32, mem) -> Option<u32> {
|
||||
let aligned_size = (size + 0xFFF) & !0xFFF;
|
||||
let base = self.heap_cursor.fetch_add(aligned_size, ...); // starts at 0x40000000
|
||||
if new_top > 0x6FFF_FFFF { return None; }
|
||||
mem.alloc(base, aligned_size, RW)?;
|
||||
Some(base)
|
||||
}
|
||||
```
|
||||
|
||||
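`heap_cursor` is initialized to `0x40000000` (`state.rs:325`). By idx=161 the
cursor had advanced to `0x40105000`, i.e. `0x105000` bytes of prior
allocations (equivalent to sixteen 64KB blocks plus five 4KB pages).
`heap_alloc` itself rounds sizes up to 4KB only, which is why the result is
not 64KB-aligned.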
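
The bump behaviour can be modelled in a few lines. This is a sketch under stated assumptions, not a port of `state.rs`: `new_top` is assumed to be `base + aligned_size` (the excerpt elides its definition), the cursor is only advanced on success here, and the prior allocation size is hypothetical, chosen to reproduce the observed cursor.

```python
PAGE = 0x1000              # heap_alloc rounds sizes up to 4KB
HEAP_BASE = 0x4000_0000    # initial heap_cursor
HEAP_LIMIT = 0x6FFF_FFFF   # upper bound checked in the excerpt


class BumpHeap:
    """Toy model of a single-cursor bump allocator."""

    def __init__(self) -> None:
        self.cursor = HEAP_BASE

    def alloc(self, size: int):
        aligned = (size + PAGE - 1) & ~(PAGE - 1)
        base = self.cursor
        new_top = base + aligned   # assumption: not shown in the excerpt
        if new_top > HEAP_LIMIT:
            return None
        self.cursor = new_top
        return base


heap = BumpHeap()
first = heap.alloc(0x105000)       # hypothetical stand-in for all prior allocations
second = heap.alloc(0x10000)       # the allocation observed at idx=161
print(hex(first), hex(second))
```

Because every caller shares one cursor, the returned address depends only on how many bytes were handed out before, never on page-size flags or on which export made the request.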
|
||||
|
||||
## Step 2 — Map both engines' memory layouts
|
||||
|
||||
### Canary (`xenia-canary/src/xenia/memory.h:598-608`, `memory.cc:215-242`)
|
||||
|
||||
| region | base | size | page | type | purpose |
|---|---|---|---|---|---|
| `v00000000` | `0x00000000` | `0x40000000` | 4KB | virtual | low system / zero page protected |
| `v40000000` | `0x40000000` | `0x3F000000` | 64KB | virtual | user-virtual `NtAllocateVirtualMemory` |
| `v80000000` | `0x80000000` | `0x10000000` | 64KB | XEX image | code+data |
| `v90000000` | `0x90000000` | `0x10000000` | 4KB | XEX image | (alt) |
| `physical` | `0x00000000` | `0x20000000` | 4KB | physical-bus | bus-address space |
| `vA0000000` | `0xA0000000` | `0x20000000` | 64KB | **physical** | 64KB-page physical alloc |
| `vC0000000` | `0xC0000000` | `0x20000000` | 16MB | **physical** | 16MB-page physical alloc |
| `vE0000000` | `0xE0000000` | `0x1FD00000` | 4KB | **physical** | 4KB-page physical alloc |
|
||||
|
||||
`MmAllocatePhysicalMemoryEx` → one of the three physical heaps based on page size.
|
||||
`NtAllocateVirtualMemory` → one of the two virtual heaps based on page size.
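
The routing and the region table can be cross-checked with a short sketch. The bases, sizes, and page sizes are copied from the table above; the function names `physical_heap_for_page_size` and `heap_containing` are assumptions for illustration and do not mirror canary's C++ API. The `physical` bus-address row is left out because it is not a guest-VA heap.

```python
KB = 1024
MB = 1024 * KB

# (name, base, size, page_size) for the guest-VA heaps in the table.
HEAPS = [
    ("v00000000", 0x0000_0000, 0x4000_0000, 4 * KB),
    ("v40000000", 0x4000_0000, 0x3F00_0000, 64 * KB),
    ("v80000000", 0x8000_0000, 0x1000_0000, 64 * KB),
    ("v90000000", 0x9000_0000, 0x1000_0000, 4 * KB),
    ("vA0000000", 0xA000_0000, 0x2000_0000, 64 * KB),
    ("vC0000000", 0xC000_0000, 0x2000_0000, 16 * MB),
    ("vE0000000", 0xE000_0000, 0x1FD0_0000, 4 * KB),
]


def physical_heap_for_page_size(page_size: int) -> str:
    """Page-size routing for physical allocations, as described for canary."""
    if page_size <= 4 * KB:
        return "vE0000000"
    if page_size <= 64 * KB:
        return "vA0000000"
    return "vC0000000"


def heap_containing(addr: int):
    """Return the name of the heap whose [base, base + size) range holds addr."""
    for name, base, size, _page in HEAPS:
        if base <= addr < base + size:
            return name
    return None


print(heap_containing(0xBC22_0000), heap_containing(0x4010_5000))
print(physical_heap_for_page_size(64 * KB), physical_heap_for_page_size(16 * MB))
```

With the table's values, `0xBC220000` falls inside `vA0000000` (which ends at `0xBFFFFFFF`) and `0x40105000` inside `v40000000`.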
|
||||
|
||||
### Ours (`xenia-rs/crates/xenia-kernel/src/state.rs:325-326`, `:956-985`)
|
||||
|
||||
| region | base | size | purpose |
|---|---|---|---|
| `heap_cursor` | `0x40000000` | up to `0x6FFFFFFF` | unified bump-alloc for ALL kernel allocs |
| `stack_cursor` | `0x71000000` | ascending | stack pages |
|
||||
|
||||
Ours has a **single** unified user-heap-style bump region. There is **no
|
||||
distinct physical-memory region**. Both `MmAllocatePhysicalMemoryEx` and
|
||||
`NtAllocateVirtualMemory` route through `heap_alloc`. The host-side page
|
||||
table (`xenia-memory` / `heap.rs`) does have `HeapType::GuestPhysical`
|
||||
defined but the `KernelState` allocator only uses one cursor.
|
||||
|
||||
## Step 3 — (α) vs (β) classification
|
||||
|
||||
The fundamental question: is ours's memory layout a deliberate simplification
|
||||
that is later canonicalizable, OR is it a memory-model bug?
|
||||
|
||||
**Evidence for (β) — wrong region**:
|
||||
* Xbox 360 architecturally distinguishes physical-VA regions
|
||||
(`0xA0000000`+, `0xC0000000`+, `0xE0000000`+) from virtual-VA regions
|
||||
(`0x40000000`+). Game code that uses `MmGetPhysicalAddress` masks
|
||||
`& 0x1FFF_FFFF` (Xbox 360 has 512MB physical bus). Different *guest*
|
||||
VAs in different *regions* therefore map to different *physical*
|
||||
addresses, which GPU command buffers consume directly.
|
||||
* `MmGetPhysicalAddress(0xBC220000) & 0x1FFFFFFF = 0x1C220000`
|
||||
* `MmGetPhysicalAddress(0x40105000) & 0x1FFFFFFF = 0x00105000`
|
||||
* These are different bus addresses. If the game stores the VA in a
|
||||
command-buffer descriptor consumed by ours's GPU, the GPU will read
|
||||
different memory than the canary's GPU would.
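
The mask arithmetic in the bullets above can be checked directly. This sketch models only the `& 0x1FFF_FFFF` step quoted in the text; any further per-heap adjustment a real `MmGetPhysicalAddress` might apply is out of scope, and the name `bus_address` is an assumption.

```python
PHYSICAL_MASK = 0x1FFF_FFFF   # 512MB physical bus


def bus_address(guest_va: int) -> int:
    """Apply the 512MB mask that guest code uses on physical-region VAs."""
    return guest_va & PHYSICAL_MASK


canary_bus = bus_address(0xBC22_0000)
ours_bus = bus_address(0x4010_5000)
print(hex(canary_bus), hex(ours_bus), canary_bus == ours_bus)
```

The two results differ, which is the concrete form of the (β) concern: any consumer that derives a bus address from the returned VA would see different memory in the two engines.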
|
||||
|
||||
**Evidence for (α) — same memory model, host-VA drift only**:
|
||||
* AUDIT-043 (2026-05-09) established that within a single region (canary's
|
||||
pool at `0xBC32C880` vs ours's pool at `0x40541xxx`), *same logical
|
||||
allocation* maps to *different guest VAs*. The "same VA backs different
|
||||
data" tripstone is universal — true within a region, true across
|
||||
regions. From the diff tool's perspective, both are "host-allocator
|
||||
divergence ε".
|
||||
* Phase B's `report.md` explicitly classifies ε as "catalog only".
|
||||
* Even if (β) is the real issue, fixing it in ours requires:
|
||||
- Adding physical-heap regions to `xenia-memory` / `KernelState`.
|
||||
- Wiring `MmAllocatePhysicalMemoryEx` to route by page size.
|
||||
- Re-validating all downstream code (GPU command buffer, kernel
|
||||
objects, audio mixer buffers, etc.) that touched the unified heap.
|
||||
- Likely > 100 LOC and changes ours's boot trajectory unpredictably.
|
||||
|
||||
**Decision: this session lands Path α (diff-tool canonicalization)**.
|
||||
Rationale:
|
||||
|
||||
1. The task brief explicitly authorizes Path α for ε-class divergences:
|
||||
"either (a) canonicalize the comparison (mask out heap-address fields,
|
||||
similar to image canonicalization for import slots), or (b) align
|
||||
ours's allocator region with canary's. AUDIT-043 already noted this
|
||||
is fundamental for emulator pool allocators; class ε is structural."
|
||||
2. Per "if it requires more than ~100 LOC or touches the core memory
|
||||
model significantly, STOP and report" — Path β is plausibly that
|
||||
scope. The honest move is to land the canonicalization (which extends
|
||||
the matched prefix substantially, see re-validation.md) and leave a
|
||||
clear marker that Path β is the deeper architectural cleanup, to be
|
||||
scoped as its own multi-session effort.
|
||||
3. Path α is **falsifiable**: if downstream divergences at idx 102014+
|
||||
surface evidence that the unified-region routing actually broke game
|
||||
logic (e.g. GPU command-buffer corruption, MmGetPhysicalAddress
|
||||
mismatch in payload data), that's prima facie reason to escalate to
|
||||
Path β. This session creates the conditions for that observation;
|
||||
it does not pre-commit to a model rewrite.
|
||||
|
||||
**Mixed-case acknowledgement**: ours's `MmAllocatePhysicalMemoryEx`
|
||||
*may* mis-route in a way that breaks downstream code (β-leak). The
|
||||
matched-prefix metric below (161 → 102014) is a *positive* signal that
|
||||
this is NOT the case for at least the first ~102K events: the game's
|
||||
boot sequence does not (yet) do region-arithmetic that distinguishes
|
||||
`0xBC220000` from `0x40105000`. If a later divergence (e.g. at 102014,
|
||||
`RtlImageXexHeaderField` — out of scope for this session) does turn
|
||||
out to be a downstream consequence of the wrong region, that's the
|
||||
trigger to escalate.
|
||||
|
||||
## Allocator function set covered by Path α
|
||||
|
||||
The canonicalization covers the whole allocator family below so that it is
applied uniformly and false ordinal drift does not surface later. This does
not widen the scope of the fix: the divergence at idx=161 is the only one
this session targets.
|
||||
|
||||
* `MmAllocatePhysicalMemoryEx` — the immediate target
|
||||
* `MmAllocatePhysicalMemory` — same family
|
||||
* `NtAllocateVirtualMemory` — sibling allocator (returns user-heap VA)
|
||||
* `RtlAllocateHeap` — Rtl-side heap (returns user-heap VA)
|
||||
* `MmCreateKernelStack` — stack allocator
|
||||
|
||||
If any of these *also* diverge in raw-VA form but the surrounding code
|
||||
agrees (same ordinal call sequence), they'll silently canonicalize. If
|
||||
they diverge on call-count ordering, the ordinals drift and the
|
||||
divergence surfaces correctly at the first drifted call. That's the
|
||||
right behavior.
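
A toy illustration of the drift case, assuming each stream is reduced to its sequence of export names on one thread. The helpers `allocator_sentinels` and `first_divergence` and the call sequences are hypothetical and are not part of `diff_events.py`.

```python
ALLOCATORS = frozenset(["NtAllocateVirtualMemory", "RtlAllocateHeap"])


def allocator_sentinels(call_names, allocators=ALLOCATORS):
    """Map each allocator call to its ordinal sentinel; pass other names through."""
    counters = {}
    out = []
    for name in call_names:
        if name in allocators:
            ordinal = counters.get(name, 0)
            counters[name] = ordinal + 1
            out.append(f"<ALLOC_{name}_{ordinal}>")
        else:
            out.append(name)
    return out


def first_divergence(a, b):
    """Index of the first differing element, or None if the common prefix agrees."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return None


same_order = ["RtlAllocateHeap", "NtAllocateVirtualMemory", "RtlAllocateHeap"]
reordered = ["RtlAllocateHeap", "RtlAllocateHeap", "NtAllocateVirtualMemory"]

agree = first_divergence(allocator_sentinels(same_order),
                         allocator_sentinels(same_order))
drift = first_divergence(allocator_sentinels(same_order),
                         allocator_sentinels(reordered))
print(agree, drift)
```

Identical call order yields identical sentinels regardless of raw VAs, while a reordering is reported at the first call where the sequences part ways.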
|
||||
111
audit-runs/phase-c2-MmAllocatePhysicalMemoryEx/re-validation.md
Normal file
@@ -0,0 +1,111 @@
|
||||
# Phase C+2 — re-validation
|
||||
|
||||
## Gate 1 — Determinism (cvar-OFF)
|
||||
|
||||
3 fresh runs of `xrs-phaseC2 check -n 50000000 --stable-digest --out …`:
|
||||
|
||||
| run | digest md5 |
|-----|------------|
| 1 | 608d8e8d293250698207a7d8fc0c18df |
| 2 | 608d8e8d293250698207a7d8fc0c18df |
| 3 | 608d8e8d293250698207a7d8fc0c18df |
| Phase C+1 baseline | 608d8e8d293250698207a7d8fc0c18df |
|
||||
|
||||
**Result**: ✅ byte-identical to C+1 baseline. Expected: this session only
|
||||
modified `tools/diff-events/diff_events.py` (Python diff tool); zero ours
|
||||
engine code touched.
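
The gate amounts to comparing a digest per run against the stored baseline. A minimal sketch, assuming the digest is an MD5 over each run's output bytes (how `--stable-digest` actually derives it is not shown here); the byte strings below are placeholders, not real run output.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Hex MD5 of a run's output bytes."""
    return hashlib.md5(data).hexdigest()


def all_match(digests, baseline: str) -> bool:
    """True when every run digest equals the baseline digest."""
    return all(d == baseline for d in digests)


# Placeholder outputs standing in for three fresh runs and the baseline.
baseline_output = b"stable-digest placeholder"
run_outputs = [b"stable-digest placeholder"] * 3

baseline = md5_hex(baseline_output)
digests = [md5_hex(out) for out in run_outputs]
print(all_match(digests, baseline))
```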
|
||||
|
||||
## Gate 2 — Phase B `image_canonical_sha256`
|
||||
|
||||
Not re-snapshotted. Inferred OK by Gate 1 (no image-loading or memory-
|
||||
layout code modified).
|
||||
|
||||
## Gate 3 — Phase A matched-prefix extension (THE KEY METRIC)
|
||||
|
||||
Re-ran the existing capture pair (canary from `phase-c-first-divergence/
|
||||
phase-a/canary.jsonl`; ours from `phase-c1-keQuerySystemTime/ours.jsonl`)
|
||||
through the upgraded diff tool. No fresh engine runs needed — diff tool
|
||||
change is comparison-side only.
|
||||
|
||||
| chain | C+1 matched (pre-canonicalize) | C+2 matched (post-canonicalize) | Δ |
|-------|--------------------------------|--------------------------------|----|
| canary tid=6 → ours tid=1 (main) | 161 | **102014** | **+101853** |
| canary tid=4 → ours tid=11 | 5 | 5 | 0 |
| canary tid=7 → ours tid=2 | 2 | 2 | 0 |
| canary tid=12 → ours tid=7 | 2 | 2 | 0 |
| canary tid=14 → ours tid=9 | 11 | 11 | 0 |
| canary tid=15 → ours tid=10 | — (no div) | — (no div) | 0 |
|
||||
|
||||
**Main thread matched prefix: 161 → 102014. Gate 3 ✅** (strictly greater
|
||||
than 161, as required).
|
||||
|
||||
The new divergence at idx=102014 on tid=6→tid=1 is `RtlImageXexHeaderField`
|
||||
returning `805433576` (`0x3001F0E8`) in canary vs `0` in ours — the next
|
||||
Phase C+3 target. Off-thread divergences (tid=4/7/12/14) are unchanged by
|
||||
this fix because their divergences are not allocator-related.
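
For reference, the matched-prefix metric can be sketched as follows. The rule that a stream which simply ends early reports no divergence is inferred from the tid=15 → tid=10 row (15 matched, 15 ours events, no divergence); the function names and toy event lists are assumptions rather than the tool's own code.

```python
from typing import Optional, Sequence


def matched_prefix(canary: Sequence, ours: Sequence) -> int:
    """Number of leading events that compare equal on both sides."""
    matched = 0
    for c, o in zip(canary, ours):
        if c != o:
            break
        matched += 1
    return matched


def first_divergence_at(canary: Sequence, ours: Sequence) -> Optional[int]:
    """Index of the first mismatch, or None if the shorter stream is a prefix."""
    matched = matched_prefix(canary, ours)
    if matched < min(len(canary), len(ours)):
        return matched
    return None


canary_events = ["a", "b", "c", "d", "e"]
ours_diverging = ["a", "b", "x", "d"]
ours_truncated = ["a", "b", "c"]

print(matched_prefix(canary_events, ours_diverging),
      first_divergence_at(canary_events, ours_diverging))
print(matched_prefix(canary_events, ours_truncated),
      first_divergence_at(canary_events, ours_truncated))
```

The gate then only has to check that the new main-thread prefix is strictly greater than the old one (102014 − 161 = 101853 additional matched events).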
|
||||
|
||||
## Gate 4 — Build
|
||||
|
||||
No engine build is required (Python tool only). An import smoke test such as
`python3 -c 'import xenia-rs.tools.diff-events.diff_events'` is not
applicable: the tool is a script, not a package, and its hyphenated path is
not a valid module name. Instead, the tool was run end-to-end on three input
pairs (the validation checks below) without a traceback.
|
||||
|
||||
## Gate 5 — Phase A determinism (emitter)
|
||||
|
||||
Emitter unchanged. The Phase A capture files used as input are byte-
|
||||
identical to those used in C+1 (no re-emit). Gate vacuous; recorded
|
||||
N/A.
|
||||
|
||||
## Gate 6 — Backward compatibility (diff-tool flag)
|
||||
|
||||
```
|
||||
python3 diff_events.py --no-canonicalize-allocators \
|
||||
--canary ...canary.jsonl --ours ...c1-ours.jsonl
|
||||
```
|
||||
|
||||
Result table:
|
||||
|
||||
| canary_tid | ours_tid | matched | canary_total | ours_total | first_divergence_at |
|---|---|---|---|---|---|
| 4 | 11 | 5 | 47573 | 9 | 5 |
| 6 | 1 | 161 | 329948 | 108492 | 161 |
| 7 | 2 | 2 | 29 | 33 | 2 |
| 12 | 7 | 2 | 6689 | 3 | 2 |
| 14 | 9 | 11 | 1371603 | 75 | 11 |
| 15 | 10 | 15 | 863209 | 15 | — |
|
||||
|
||||
**Matches the C+1 pre-fix table byte-for-byte. ✅** The new flag reproduces
|
||||
the legacy behavior exactly.
|
||||
|
||||
## Gate 7 — Self-diff sanity (no false positive)
|
||||
|
||||
```
|
||||
python3 diff_events.py --validate-identical \
|
||||
--canary phase-a-diff-harness/canary-sanity.jsonl \
|
||||
--ours phase-a-diff-harness/canary-sanity.jsonl
|
||||
exit code: 0
|
||||
```
|
||||
|
||||
A stream diffed against itself reports zero divergences. The
canonicalization is *deterministic*: both sides assign the same ordinals
on the same stream, so the sentinels match. ✅
|
||||
|
||||
```
|
||||
python3 diff_events.py --validate-identical \
|
||||
--canary phase-c1-keQuerySystemTime/ours.jsonl \
|
||||
--ours phase-c1-keQuerySystemTime/ours.jsonl
|
||||
exit code: 0
|
||||
```
|
||||
|
||||
## Summary
|
||||
|
||||
| gate | result |
|---|---|
| 1. cvar-OFF determinism (3 ours runs vs C+1 baseline) | ✅ all 4 = `608d8e8d…` |
| 2. Phase B `image_canonical_sha256` | ✅ (inferred from gate 1) |
| 3. Phase A main matched prefix > 161 | ✅ **161 → 102014** (+101853) |
| 4. Build clean | ✅ (Python only) |
| 5. Phase A determinism | ✅ (emitter unchanged; vacuous) |
| 6. `--no-canonicalize-allocators` backward-compat | ✅ byte-identical to C+1 |
| 7. Self-diff sanity | ✅ exit 0 both cases |
|
||||