handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions

@@ -0,0 +1,102 @@
# Phase C+15-α Schema-Wiring Audit (2026-05-14)
## Phase 1 — Wired/unwired matrix (pre-session)
| Kind | Canary emits? | Ours emits? | Status (pre) | Priority |
|---------------------|---------------|-------------|---------------|----------|
| `schema_version` | yes | yes | wired | — |
| `import.call` | yes | yes | wired | — |
| `kernel.call` | yes | yes | wired (+C+10) | — |
| `kernel.return` | yes | yes | wired | — |
| `handle.create` | declared | declared | **stubbed** | HIGH |
| `handle.destroy` | declared | declared | **stubbed** | HIGH |
| `thread.create` | declared | declared | **stubbed** | HIGH |
| `thread.exit` | declared | declared | **stubbed** | HIGH |
| `wait.begin` | declared | declared | **stubbed** | HIGH |
| `wait.end` | declared | declared | **stubbed** | HIGH |
| `thread.suspend` | declared | not in API | unwired | LOW |
| `thread.resume` | declared | not in API | unwired | LOW |
| `vfs.open` | declared | not in API | redundant? | MEDIUM |
| `vfs.read` | declared | not in API | high-vol | LOW |
| `vfs.close` | declared | not in API | redundant? | MEDIUM |
| `mem.write` | declared | not in API | opt-in | LOW |
## Phase 2/3 — Kinds wired this session
Wired symmetrically in both engines (cvar-gated, default off):
- **`handle.create`** — emitted from `KernelState::alloc_handle_for` (ours) /
`ObjectTable::AddHandle` (canary). 39+ call sites covered via centralized hook.
- **`handle.destroy`** — emitted from `nt_close` + `xam_task_close_handle` (ours) /
`ObjectTable::RemoveHandle` (canary).
- **`thread.create`** — emitted from `ex_create_thread` (ours) / `ExCreateThread`
in `xboxkrnl_threading.cc` (canary). After spawn succeeds.
- **`thread.exit`** — emitted from `ex_terminate_thread` (ours) / `XThread::Exit`
(canary). Canary's `XThread::Exit` covers both explicit `ExTerminateThread`
and implicit thread-entry returns.
- **`wait.begin`** — emitted from `nt_wait_for_single_object_ex` +
`ke_wait_for_single_object` (ours) / `xeKeWaitForSingleObject` +
`NtWaitForSingleObjectEx` (canary).
Deferred (v1.2):
- **`wait.end`** — design challenge: a wait can park the guest thread, and the
wake-status path differs between engines. The synchronous outcome status is
already captured in the immediately following `kernel.return`; the
asynchronous wake outcome surfaces in subsequent events.
- **`thread.suspend` / `thread.resume`** — low-frequency; defer until needed.
- **`vfs.*`** — redundant with `kernel.call` for Nt*File. Skip per schema-v1
audit recommendation.
- **`mem.write`** — opt-in only (separate cvar); high-volume.
## Code summary
### Ours (~140 LOC)
- `crates/xenia-kernel/src/event_log.rs` — registry + auto helpers
(`register_handle_semantic_id`, `lookup_handle_semantic_id`,
`forget_handle_semantic_id`, `emit_handle_create_auto`,
`emit_handle_destroy_auto`). +85 LOC.
- `crates/xenia-kernel/src/objects.rs` — `KernelObject::schema_object_type()`.
+14 LOC.
- `crates/xenia-kernel/src/state.rs` — `alloc_handle_for` emit hook. +24 LOC.
- `crates/xenia-kernel/src/exports.rs` — `nt_close` destroy emit,
`ex_create_thread` thread.create emit, `ex_terminate_thread` thread.exit emit,
`nt_wait_for_single_object_ex` + `ke_wait_for_single_object` wait.begin emits,
and a `decode_timeout_ns` helper. +85 LOC.
- `crates/xenia-kernel/src/xam.rs` — `xam_task_close_handle` destroy emit. +14 LOC.
### Canary (~130 LOC)
- `src/xenia/kernel/event_log.h` — registry API (`RegisterHandleSemanticId`,
`LookupHandleSemanticId`, `ForgetHandleSemanticId`, `EmitHandleCreateAuto`,
`EmitHandleDestroyAuto`). +20 LOC.
- `src/xenia/kernel/event_log.cc` — per-tid counter map (was per-host-thread
`thread_local`; produced duplicate `tid_event_idx` for tid=0 across host
threads — a bug in the pre-session implementation), `CurrentTid` non-asserting
via new `XThread::TryGetCurrentThread`, registry helpers, auto-emit wrappers.
+60 LOC net.
- `src/xenia/kernel/xthread.h` + `xthread.cc` — `TryGetCurrentThread` accessor
and `XThread::Exit` thread.exit emit. +12 LOC.
- `src/xenia/kernel/util/object_table.cc` — `AddHandle`/`RemoveHandle` hooks
and `SchemaObjectType` mapping. +35 LOC.
- `src/xenia/kernel/xboxkrnl/xboxkrnl_threading.cc` — `ExCreateThread`
thread.create emit, `xeKeWaitForSingleObject` + `NtWaitForSingleObjectEx`
wait.begin emits. +30 LOC.
### Diff tool
- `tools/diff-events/diff_events.py` — `SKIP_PAYLOAD_FIELDS_BY_KIND` now skips
`handle_semantic_id` (cross-engine `creating_tid` differs, so SIDs are
engine-local), `parent_tid`, `handles_semantic_ids`, `woken_by_semantic_id`.
+6 LOC.
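
The skip rule above can be sketched as follows. This is a minimal illustration, not the code in `diff_events.py`: the flat `SKIPPED_FIELDS` set and the helper names `comparable_payload` and `events_match` are hypothetical, and the real table is keyed by kind and may skip further fields. Only the four field names come from the notes above.

```python
# Hypothetical sketch: a flat set stands in for the per-kind table.
SKIPPED_FIELDS = frozenset({
    "handle_semantic_id",
    "parent_tid",
    "handles_semantic_ids",
    "woken_by_semantic_id",
})


def comparable_payload(payload):
    """Return a copy of the payload without engine-local fields."""
    return {k: v for k, v in payload.items() if k not in SKIPPED_FIELDS}


def events_match(canary_event, ours_event):
    """Events match when the kind and the comparable payload agree."""
    if canary_event["kind"] != ours_event["kind"]:
        return False
    return (comparable_payload(canary_event["payload"])
            == comparable_payload(ours_event["payload"]))


# Two handle.create events whose SIDs differ per engine (payloads are
# trimmed to two fields for illustration).
canary = {"kind": "handle.create",
          "payload": {"handle_semantic_id": "097dca960c32feb2", "object_type": 1}}
ours = {"kind": "handle.create",
        "payload": {"handle_semantic_id": "b53a312c0ac30f49", "object_type": 1}}
```

With these inputs `events_match(canary, ours)` is true, because the only differing field is engine-local; a differing `object_type` would still count as a divergence.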
## Bug found and fixed this session
**Pre-session bug**: canary's `t_tid_event_idx` was a host-thread-local global,
not a tid-keyed counter. When `AddHandle` runs from multiple host threads with
tid==0 (boot init + early XThread bootstrap before guest tid is assigned), each
host thread had its own counter starting at 0, producing duplicate
`tid_event_idx` values within the tid=0 stream. The diff tool rejected the
file with "events out of order at index 8". Fixed by replacing the thread_local
with a tid-keyed `std::unordered_map` + mutex (matches ours's design).
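
The difference between the two counter designs can be modelled in a few lines. This is a Python analogue of the C++ change, written as a sketch under stated assumptions: the class names `PerHostThreadCounter` and `TidKeyedCounter` and the helper `emit_from_two_host_threads` are hypothetical, and two host threads emitting with tid 0 stand in for the boot-time case described above.

```python
import threading
from collections import defaultdict


class PerHostThreadCounter:
    """Models the pre-session design: one counter per host thread."""

    def __init__(self):
        self._local = threading.local()

    def next_idx(self, tid):
        # The guest tid is ignored; every host thread starts from 0.
        idx = getattr(self._local, "idx", 0)
        self._local.idx = idx + 1
        return idx


class TidKeyedCounter:
    """Models the fix: one counter per guest tid, guarded by a mutex."""

    def __init__(self):
        self._lock = threading.Lock()
        self._next = defaultdict(int)

    def next_idx(self, tid):
        with self._lock:
            idx = self._next[tid]
            self._next[tid] = idx + 1
            return idx


def emit_from_two_host_threads(counter, per_thread=3):
    """Emit tid=0 events from two host threads; return the sorted indices."""
    indices = []
    indices_lock = threading.Lock()

    def worker():
        for _ in range(per_thread):
            idx = counter.next_idx(0)
            with indices_lock:
                indices.append(idx)

    threads = [threading.Thread(target=worker) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(indices)
```

Run against the per-host-thread model, the helper returns every index twice, which is the duplicate `tid_event_idx` pattern the diff tool rejected. The tid-keyed model returns each index once.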

@@ -0,0 +1,189 @@
# Phase A diff report
**This report is the output of Phase A's diff harness. Divergences
shown here are INPUT for Phase B (first-divergence localization),
not findings of Phase A.** Phase A's job is to make the harness
itself correct, not to analyze what it surfaces.
## Summary
| canary_tid | ours_tid | matched | canary_total | ours_total | first_divergence_at |
|---|---|---|---|---|---|
| 4 | 11 | 8 | 151690 | 9 | 8 |
| 6 | 1 | 102168 | 432396 | 108490 | 102168 |
| 7 | 2 | 30 | 32 | 32 | 30 |
| 12 | 7 | 2 | 27834 | 4 | 2 |
| 14 | 9 | 2 | 4733192 | 76 | 2 |
| 15 | 10 | 16 | 3610535 | 16 | — |
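
The `matched` and `first_divergence_at` columns can be read as the result of a position-by-position comparison of two per-thread streams. The sketch below illustrates that reading; the function name `first_divergence` and the reduction of each event to a `(kind, name)` tuple are assumptions for illustration, not the harness's actual code.

```python
def first_divergence(canary_events, ours_events):
    """Return (matched, first_divergence_at) for one paired thread.

    Events are compared position by position. The index is None when the
    shorter stream is a prefix of the longer one.
    """
    compared = min(len(canary_events), len(ours_events))
    for idx in range(compared):
        if canary_events[idx] != ours_events[idx]:
            return idx, idx
    return compared, None


# The canary_tid=12 / ours_tid=7 pair from this report, reduced to
# (kind, name) tuples.
canary_12 = [
    ("import.call", "KeWaitForSingleObject"),
    ("kernel.call", "KeWaitForSingleObject"),
    ("handle.create", None),
    ("wait.begin", None),
]
ours_7 = [
    ("import.call", "KeWaitForSingleObject"),
    ("kernel.call", "KeWaitForSingleObject"),
    ("wait.begin", None),
    ("kernel.return", "KeWaitForSingleObject"),
]
result = first_divergence(canary_12, ours_7)  # (2, 2), as in the table row
```

A pair such as canary_tid=15 / ours_tid=10, where ours simply ends after 16 identical events, yields a matched count of 16 and no divergence index.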
## canary_tid=4 → ours_tid=11
First divergence at `tid_event_idx=8`: kind: canary='handle.create' ours='kernel.return'
**Pre-context (last 5 matching events):**
```
canary: [3] import.call KeSetEvent
ours: [3] import.call KeSetEvent
canary: [4] kernel.call KeSetEvent
ours: [4] kernel.call KeSetEvent
canary: [5] kernel.return KeSetEvent
ours: [5] kernel.return KeSetEvent
canary: [6] import.call KeWaitForMultipleObjects
ours: [6] import.call KeWaitForMultipleObjects
canary: [7] kernel.call KeWaitForMultipleObjects
ours: [7] kernel.call KeWaitForMultipleObjects
```
**Divergent event:**
```
canary: [8] handle.create sid=bcaf14d76932b128
ours: [8] kernel.return KeWaitForMultipleObjects
```
**Next event after the divergence (if any):**
```
canary: [9] handle.create sid=0760e947bacff199
ours: <end of stream>
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1896894000, "kind": "handle.create", "payload": {"handle_semantic_id": "bcaf14d76932b128", "object_name": null, "object_type": 1, "raw_handle_id": "0xf800009c"}, "schema_version": 1, "tid": 4, "tid_event_idx": 8}
{"deterministic": true, "engine": "ours", "guest_cycle": 91, "host_ns": 1693823256, "kind": "kernel.return", "payload": {"name": "KeWaitForMultipleObjects", "return_value": 0, "side_effects": [], "status": "0x00000000"}, "schema_version": 1, "tid": 11, "tid_event_idx": 8}
```
## canary_tid=6 → ours_tid=1
First divergence at `tid_event_idx=102168`: kind: canary='kernel.return' ours='handle.destroy'
**Pre-context (last 5 matching events):**
```
canary: [102163] kernel.call XamTaskSchedule
ours: [102163] kernel.call XamTaskSchedule
canary: [102164] handle.create sid=097dca960c32feb2
ours: [102164] handle.create sid=b53a312c0ac30f49
canary: [102165] kernel.return XamTaskSchedule
ours: [102165] kernel.return XamTaskSchedule
canary: [102166] import.call XamTaskCloseHandle
ours: [102166] import.call XamTaskCloseHandle
canary: [102167] kernel.call XamTaskCloseHandle
ours: [102167] kernel.call XamTaskCloseHandle
```
**Divergent event:**
```
canary: [102168] kernel.return XamTaskCloseHandle
ours: [102168] handle.destroy sid=b53a312c0ac30f49
```
**Next event after the divergence (if any):**
```
canary: [102169] import.call KeWaitForSingleObject
ours: [102169] kernel.return XamTaskCloseHandle
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1473555500, "kind": "kernel.return", "payload": {"name": "XamTaskCloseHandle", "return_value": 1, "side_effects": [], "status": "0x00000001"}, "schema_version": 1, "tid": 6, "tid_event_idx": 102168}
{"deterministic": true, "engine": "ours", "guest_cycle": 0, "host_ns": 495859368, "kind": "handle.destroy", "payload": {"handle_semantic_id": "b53a312c0ac30f49", "prior_refcount": 1, "raw_handle_id": "0x00001018"}, "schema_version": 1, "tid": 1, "tid_event_idx": 102168}
```
## canary_tid=7 → ours_tid=2
First divergence at `tid_event_idx=30`: kind: canary='handle.create' ours='wait.begin'
**Pre-context (last 5 matching events):**
```
canary: [25] import.call KeSetEvent
ours: [25] import.call KeSetEvent
canary: [26] kernel.call KeSetEvent
ours: [26] kernel.call KeSetEvent
canary: [27] kernel.return KeSetEvent
ours: [27] kernel.return KeSetEvent
canary: [28] import.call KeWaitForSingleObject
ours: [28] import.call KeWaitForSingleObject
canary: [29] kernel.call KeWaitForSingleObject
ours: [29] kernel.call KeWaitForSingleObject
```
**Divergent event:**
```
canary: [30] handle.create sid=e1f14feb316c28dd
ours: [30] wait.begin {'handles_semantic_ids': ['0000000000000000'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
```
**Next event after the divergence (if any):**
```
canary: [31] wait.begin {'handles_semantic_ids': ['e1f14feb316c28dd'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
ours: [31] kernel.return KeWaitForSingleObject
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1475668700, "kind": "handle.create", "payload": {"handle_semantic_id": "e1f14feb316c28dd", "object_name": null, "object_type": 1, "raw_handle_id": "0xf800001c"}, "schema_version": 1, "tid": 7, "tid_event_idx": 30}
{"deterministic": true, "engine": "ours", "guest_cycle": 0, "host_ns": 496144562, "kind": "wait.begin", "payload": {"alertable": false, "handles_semantic_ids": ["0000000000000000"], "timeout_ns": -1, "wait_type": "any"}, "schema_version": 1, "tid": 2, "tid_event_idx": 30}
```
## canary_tid=12 → ours_tid=7
First divergence at `tid_event_idx=2`: kind: canary='handle.create' ours='wait.begin'
**Pre-context (last 5 matching events):**
```
canary: [0] import.call KeWaitForSingleObject
ours: [0] import.call KeWaitForSingleObject
canary: [1] kernel.call KeWaitForSingleObject
ours: [1] kernel.call KeWaitForSingleObject
```
**Divergent event:**
```
canary: [2] handle.create sid=750aad55e1061f0a
ours: [2] wait.begin {'handles_semantic_ids': ['0000000000000000'], 'timeout_ns': 429466729600, 'alertable': False, 'wait_type': 'any'}
```
**Next event after the divergence (if any):**
```
canary: [3] wait.begin {'handles_semantic_ids': ['750aad55e1061f0a'], 'timeout_ns': -30000000, 'alertable': False, 'wait_type': 'any'}
ours: [3] kernel.return KeWaitForSingleObject
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1660019000, "kind": "handle.create", "payload": {"handle_semantic_id": "750aad55e1061f0a", "object_name": null, "object_type": 1, "raw_handle_id": "0xf8000068"}, "schema_version": 1, "tid": 12, "tid_event_idx": 2}
{"deterministic": true, "engine": "ours", "guest_cycle": 0, "host_ns": 528900173, "kind": "wait.begin", "payload": {"alertable": false, "handles_semantic_ids": ["0000000000000000"], "timeout_ns": 429466729600, "wait_type": "any"}, "schema_version": 1, "tid": 7, "tid_event_idx": 2}
```
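
The two `timeout_ns` values in this pair are related by simple arithmetic, which may be useful input for Phase B. NT timeouts are counted in 100 ns ticks, with negative values meaning a relative wait. The sketch below shows one decoding that would produce both numbers from the same 64-bit pattern; it is a hypothesis consistent with the values, not a statement about how `decode_timeout_ns` or canary actually decode, and both function names are hypothetical.

```python
TICK_NS = 100  # NT timeouts are expressed in 100 ns ticks


def decode_signed_64(raw):
    """Treat the 64-bit pattern as a signed tick count, then scale to ns."""
    ticks = raw - (1 << 64) if raw & (1 << 63) else raw
    return ticks * TICK_NS


def decode_low_32_unsigned(raw):
    """Keep only the low 32 bits, unsigned, then scale to ns."""
    return (raw & 0xFFFF_FFFF) * TICK_NS


# A relative 30 ms timeout is -300,000 ticks as a 64-bit pattern.
raw = (-300_000) & 0xFFFF_FFFF_FFFF_FFFF

signed_ns = decode_signed_64(raw)          # -30000000, canary's value
truncated_ns = decode_low_32_unsigned(raw)  # 429466729600, ours's value
```

If this reading holds, ours would be reporting the low 32 bits of the tick count as an unsigned value, which is worth checking before treating the `timeout_ns` difference as a separate divergence.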
## canary_tid=14 → ours_tid=9
First divergence at `tid_event_idx=2`: kind: canary='handle.create' ours='wait.begin'
**Pre-context (last 5 matching events):**
```
canary: [0] import.call KeWaitForSingleObject
ours: [0] import.call KeWaitForSingleObject
canary: [1] kernel.call KeWaitForSingleObject
ours: [1] kernel.call KeWaitForSingleObject
```
**Divergent event:**
```
canary: [2] handle.create sid=3df8ca649bf76cc8
ours: [2] wait.begin {'handles_semantic_ids': ['0000000000000000'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
```
**Next event after the divergence (if any):**
```
canary: [3] wait.begin {'handles_semantic_ids': ['3df8ca649bf76cc8'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
ours: [3] kernel.return KeWaitForSingleObject
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1788314200, "kind": "handle.create", "payload": {"handle_semantic_id": "3df8ca649bf76cc8", "object_name": null, "object_type": 1, "raw_handle_id": "0xf8000098"}, "schema_version": 1, "tid": 14, "tid_event_idx": 2}
{"deterministic": true, "engine": "ours", "guest_cycle": 0, "host_ns": 1655554743, "kind": "wait.begin", "payload": {"alertable": false, "handles_semantic_ids": ["0000000000000000"], "timeout_ns": -1, "wait_type": "any"}, "schema_version": 1, "tid": 9, "tid_event_idx": 2}
```
## canary_tid=15 → ours_tid=10
No divergence within the 16 compared events (canary has 3610535, ours has 16).

@@ -0,0 +1,10 @@
{
"instructions": 50000007,
"imports": 40390,
"unimpl": 0,
"draws": 0,
"swaps": 1,
"unique_render_targets": 0,
"shader_blobs_live": 0,
"texture_cache_entries": 0
}

@@ -0,0 +1,10 @@
{
"instructions": 50000007,
"imports": 40390,
"unimpl": 0,
"draws": 0,
"swaps": 1,
"unique_render_targets": 0,
"shader_blobs_live": 0,
"texture_cache_entries": 0
}

@@ -0,0 +1,10 @@
{
"instructions": 50000007,
"imports": 40390,
"unimpl": 0,
"draws": 0,
"swaps": 1,
"unique_render_targets": 0,
"shader_blobs_live": 0,
"texture_cache_entries": 0
}

@@ -0,0 +1,121 @@
# Phase C+15-α — New Divergence Catalog (2026-05-14)
Surfaced by the schema-v1.1 wiring of `handle.create/destroy`,
`thread.create/exit`, `wait.begin` in both engines.
## Cold-vs-cold matched-prefix table (post-wiring)
| canary_tid | ours_tid | matched | first_divergence_at | divergence kind |
|------------|----------|---------|---------------------|------------------------|
| 6 | 1 | 102,168 | 102,168 | extra `handle.destroy` in ours (XamTaskCloseHandle refcount mismatch) |
| 15 | 10 | 16 | — | no divergence in 16 evts (canary 3.6M, ours stalls) |
| 7 | 2 | 30 | 30 | KeWaitForSingleObject native-obj handle (class E) |
| 4 | 11 | 8 | 8 | KeWaitForMultipleObjects native-obj handle (class E) |
| 12 | 7 | 2 | 2 | KeWaitForSingleObject native-obj handle (class E) |
| 14 | 9 | 2 | 2 | KeWaitForSingleObject native-obj handle (class E) |
Main matched prefix dropped from **104,574 (C+13/C+14)** to **102,168**, a
regression of 2,406 events. This is the expected outcome: state divergences
that the older schema could not see are now visible.
## Cataloged divergences (priority-ordered for future iterate)
### D-1 (HIGH) — main chain idx=102,168: extra `handle.destroy` on `XamTaskCloseHandle`
- **Chain**: canary tid=6 ↔ ours tid=1.
- **Event**:
- ours: `handle.destroy sid=b53a312c0ac30f49` then `kernel.return XamTaskCloseHandle return=1`
- canary: `kernel.return XamTaskCloseHandle return=1` (no `handle.destroy`)
- **Hypothesis**: Ours's `xam_task_close_handle` (xam.rs:300-344) decrements the
refcount and destroys the handle when it reaches 0. Canary's
`XamTaskCloseHandle_entry` → `NtClose` → `ObjectTable::ReleaseHandle` path
also destroys only when the refcount reaches 0, but canary's spawned thread
keeps an additional ref on the thread handle (`object->Retain()` in
`XThread::Create` line 408 via `RetainHandle()`). Ours's refcount of 1 at this
point is therefore wrong: it should be 2 (user ref + spawned-thread ref), so
ours destroys prematurely.
- **Impact**: leaks downstream divergences; spawned thread now has a dangling
handle reference.
- **Fix scope**: ~20 LOC in `xam_task_schedule` / `ex_create_thread`:
add explicit `state.handle_refcount[handle] += 1` after spawn for the
XThread's own ref. Verify against canary's `RetainHandle()` semantics.
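
The D-1 hypothesis can be checked against a toy refcount model. This is an illustrative sketch only: the `HandleTable` class and its `extra_refs` parameter are hypothetical and stand in for the two engines' handle tables, with the spawned thread's reference modelled as one extra count at creation.

```python
class HandleTable:
    """Toy handle table that logs create/destroy events."""

    def __init__(self):
        self.refcount = {}
        self.events = []

    def create(self, handle, extra_refs=0):
        # One user ref, plus any refs held by other owners.
        self.refcount[handle] = 1 + extra_refs
        self.events.append("handle.create")

    def close(self, handle):
        # Drop one ref; destroy only when the count reaches 0.
        self.refcount[handle] -= 1
        if self.refcount[handle] == 0:
            del self.refcount[handle]
            self.events.append("handle.destroy")


# Ours as hypothesised: only the user ref exists, so close destroys.
ours = HandleTable()
ours.create(0x1018)
ours.close(0x1018)

# Canary as hypothesised: the spawned thread holds a second ref.
canary = HandleTable()
canary.create(0xF800009C, extra_refs=1)
canary.close(0xF800009C)
```

In this model ours logs `handle.create` followed by `handle.destroy`, while canary logs only `handle.create`, which mirrors the event order seen at idx=102,168. Adding the extra ref on ours's side makes the two logs identical.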
### D-2 (HIGH) — chain canary tid=4 ↔ ours tid=11: ours stops at idx=8
- **Chain**: canary tid=4 ↔ ours tid=11.
- **Event**:
- ours: `kernel.return KeWaitForMultipleObjects status=0` at idx=8, then
stream ends (9 total events).
- canary: `handle.create sid=bcaf14d76932b128 (Event)` at idx=8, then
`handle.create sid=0760e947bacff199` at idx=9, then continues for 151,690
events.
- **Hypothesis (class E asymmetry)**: Canary's `KeWaitForMultipleObjects_entry`
iterates the object pointer array and calls
`XObject::GetNativeObject<XObject>(kernel_state, object_ptr, -1, true)`
for each — when the object has not yet been wrapped in an `XObject*`, this
CREATES a new XObject (and thus a new handle). Ours's `do_wait_multiple`
uses `resolve_pseudo_handle` which does NOT create a new XObject — it
looks up the existing handle. The "handle for the native dispatcher object"
is an engine-architectural difference: canary lazily wraps,
ours pre-registers.
- **Impact**: every Ke*Wait* that takes object pointers (not handles) creates
N extra handle.create events on the canary side. Ours emits none.
- **Fix scope**: this is class E (intentional asymmetry). Recommended action:
add `Ke{Wait,Set,Reset,...}*Object*` exports that take object pointers to a
diff-tool **suppress-handle-create-side-effect** list, OR have ours emit
a synthetic `handle.create` when `resolve_pseudo_handle` first encounters
a new pointer. Latter aligns canary's view better. ~30-50 LOC.
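
The class E asymmetry and the second proposed fix can be sketched with three toy engines. All class names here are hypothetical, and the event lists are reduced to kinds only; the sketch assumes, as the hypothesis does, that canary wraps a native object on first use and that ours resolves it without creating anything.

```python
class LazyWrappingEngine:
    """Models canary as hypothesised: wrap a raw pointer on first use."""

    def __init__(self):
        self.known_pointers = set()
        self.events = []

    def wait_on_pointer(self, object_ptr):
        if object_ptr not in self.known_pointers:
            self.known_pointers.add(object_ptr)
            self.events.append("handle.create")
        self.events.append("wait.begin")


class LookupOnlyEngine:
    """Models ours today: the pointer is resolved, nothing is created."""

    def __init__(self):
        self.events = []

    def wait_on_pointer(self, object_ptr):
        self.events.append("wait.begin")


class SyntheticCreateEngine(LookupOnlyEngine):
    """Models the second fix option: emit a synthetic handle.create
    the first time a pointer is encountered."""

    def __init__(self):
        super().__init__()
        self.seen_pointers = set()

    def wait_on_pointer(self, object_ptr):
        if object_ptr not in self.seen_pointers:
            self.seen_pointers.add(object_ptr)
            self.events.append("handle.create")
        super().wait_on_pointer(object_ptr)


def run(engine):
    """Wait twice on the same dispatcher object pointer."""
    engine.wait_on_pointer(0x1000)
    engine.wait_on_pointer(0x1000)
    return engine.events
```

Running the lazy and lookup-only models gives streams that diverge at the first wait, as in chains 7→2, 12→7, and 14→9. The synthetic-create model produces the same stream as the lazy one, which is why that option needs no diff-tool suppression list.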
### D-3 (HIGH) — same class on chains 7→2 (idx=30), 12→7 (idx=2), 14→9 (idx=2)
Same root cause as D-2 — `KeWaitForSingleObject` with raw object pointer.
Canary's `xeKeWaitForSingleObject` calls `GetNativeObject` which creates a
handle for the dispatcher; ours's `resolve_pseudo_handle` does not.
Group all 4 chains under one fix in D-2.
### D-4 (MEDIUM) — wait.begin SID `0000000000000000` on tid=10 of ours
- **Chain**: canary tid=15 ↔ ours tid=10 (the only thread where prefix didn't
regress — but ours stalls at idx=16).
- **Event** at idx=2: both engines emit `wait.begin` but ours's
`handles_semantic_ids = ["0000000000000000"]` while canary's is real.
- **Hypothesis**: SID = 0 means `lookup_handle_semantic_id` returned 0 (handle
not registered). The handle being waited on must have been created before
the event_log SID registry was active (during boot / init), OR it's a
pseudo-handle from `resolve_pseudo_handle`. Pseudo-handles aren't real
handles in our model.
- **Fix scope**: when `lookup_handle_semantic_id(h) == 0`, lazy-emit a
synthetic `handle.create` for `h` (with a default object_type per
`state.objects[h]`'s schema kind). Aligns with D-2 fix. ~10 LOC.
### D-5 (LOW) — chains 7→2, 12→7, 14→9, 15→10: ours streams truncated
- Ours's tid=2/7/9/10 streams are 32/4/76/16 events long; canary's are
32/27,834/4,733,192/3,610,535. Ours's worker threads stall early.
- **Hypothesis**: Downstream of D-2 / D-1 — once the main thread or peer
workers diverge, downstream threads block on signals that never come.
- **Fix scope**: deferred until D-1/D-2 land; likely no separate fix needed.
## Acceptance gate status
- **Gate 1 (default-off digest)**: PASS — 3× reproducible at
`e1dfcb1559f987b35012a7f2dc6d93f5` (unchanged from C+13 baseline).
- **Gate 2 (cvar-on emit)**: PASS — canary and ours produce 14M+ and 121K
events respectively; JSONL parses cleanly; all new kinds present.
- **Gate 3 (diff tool)**: PASS — diff tool consumes new kinds, produces
6-chain divergence report. Cross-engine SID skip-comparison documented in
`SKIP_PAYLOAD_FIELDS_BY_KIND`.
- **Gate 4 (cold-vs-cold)**: PASS (with regression as designed) — main chain
prefix 104,574 → 102,168 (-2,406 events). Divergence catalog produced.
- **Gate 5 (build clean)**: PASS — canary + ours both build.
- **Gate 6 (tests)**: PASS — 181 → 181 passing (no new tests added; existing
unchanged).
## Reading-error class avoided
**Class #29 — per-host-thread tid_event_idx counter for shared synthetic tids**:
canary's pre-session `thread_local uint64_t t_tid_event_idx` was correct for
guest-tid events (1 tid : 1 host_thread) but broken for boot-time emissions
with `tid=0` because boot init runs on multiple host threads. Symptom: the
diff tool rejected the canary log with "events out of order at index 8".
Fixed via tid-keyed global map (matches ours's design).