handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions

@@ -0,0 +1,91 @@
# Phase C+16 cold-vs-cold result (2026-05-14)
## Matched-prefix table
| canary_tid | ours_tid | C+15-α | C+16 | delta | first_divergence_at | kind |
|------------|----------|---------|---------|-------|---------------------|-----------------------------------|
| 6 | 1 | 102,168 | 102,171 | **+3**| 102,171 | `handle.create` (class E) |
| 4 | 11 | 8 | 8 | 0 | 8 | `handle.create` (class E) |
| 7 | 2 | 30 | 30 | 0 | 30 | `handle.create` (class E) |
| 12 | 7 | 2 | 2 | 0 | 2 | `handle.create` (class E) |
| 14 | 9 | 2 | 2 | 0 | 2 | `handle.create` (class E) |
| 15 | 10 | 16 | 16 | — | — | no divergence |
Main matched prefix advanced 102,168 → 102,171 (+3). All 5 sister
chains unchanged.
## New first divergence (idx=102,171)
```
canary: [102169] import.call KeWaitForSingleObject
ours: [102169] import.call KeWaitForSingleObject
canary: [102170] kernel.call KeWaitForSingleObject
ours: [102170] kernel.call KeWaitForSingleObject
canary: [102171] handle.create sid=68fec8909ea5d1f5
ours: [102171] wait.begin {'handles_semantic_ids': ['0000000000000000'], ...}
canary: [102172] wait.begin {'handles_semantic_ids': ['68fec8909ea5d1f5'], ...}
ours: [102172] kernel.return KeWaitForSingleObject
```
This is **class E**, the same root cause as D-2/D-3/D-4 in the C+15-α
catalog. Canary's `xeKeWaitForSingleObject` calls
`XObject::GetNativeObject<XObject>(...)`, which CREATES a new handle for
the native dispatcher object on first encounter; ours's
`resolve_pseudo_handle` does not, so the `handles_semantic_ids` in its
`wait.begin` is `0000000000000000`. This is the target for Phase C+17.
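The class-E difference can be sketched as below. This is a minimal illustration, not canary's or ours's actual code: `NativeHandles`, `get_or_create`, `lookup_only`, the `Event` enum, and the handle numbering are all hypothetical names and values chosen for the example. It assumes only what the paragraph above states: one engine allocates a handle on first encounter of a native object, while the other resolves without allocating.

```rust
use std::collections::HashMap;

/// Simplified trace events (hypothetical; the real schema carries more fields).
#[derive(Debug, PartialEq)]
pub enum Event {
    HandleCreate { handle: u32 },
    WaitBegin { handle: u32 },
}

/// Maps a guest native-object address to a handle.
pub struct NativeHandles {
    by_guest_ptr: HashMap<u32, u32>,
    next_handle: u32,
    pub trace: Vec<Event>,
}

impl NativeHandles {
    pub fn new() -> Self {
        // The starting handle value is an assumption made for the example.
        Self { by_guest_ptr: HashMap::new(), next_handle: 0xF800_0000, trace: Vec::new() }
    }

    /// Canary-like path: the first encounter allocates a handle and emits
    /// `handle.create`; later encounters reuse the existing handle.
    pub fn get_or_create(&mut self, guest_ptr: u32) -> u32 {
        if let Some(&h) = self.by_guest_ptr.get(&guest_ptr) {
            return h;
        }
        self.next_handle += 4;
        let h = self.next_handle;
        self.by_guest_ptr.insert(guest_ptr, h);
        self.trace.push(Event::HandleCreate { handle: h });
        h
    }

    /// Lookup-only path: never allocates, so an unseen object resolves to 0.
    pub fn lookup_only(&self, guest_ptr: u32) -> u32 {
        self.by_guest_ptr.get(&guest_ptr).copied().unwrap_or(0)
    }

    /// Wait that creates the handle on demand (canary-like ordering).
    pub fn wait_canary_like(&mut self, guest_ptr: u32) {
        let h = self.get_or_create(guest_ptr);
        self.trace.push(Event::WaitBegin { handle: h });
    }

    /// Wait that only looks the handle up (the zero-id case in the diff).
    pub fn wait_lookup_only(&mut self, guest_ptr: u32) {
        let h = self.lookup_only(guest_ptr);
        self.trace.push(Event::WaitBegin { handle: h });
    }
}
```

With this model, the create-on-demand wait produces `handle.create` followed by `wait.begin`, while the lookup-only wait produces a single `wait.begin` with a zero handle, which is the same shape as the divergence at idx=102,171.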
## Acceptance gates
- **Gate 1 (default-off digest)**: PASS — 3× reproducible at
`e1dfcb1559f987b35012a7f2dc6d93f5` (unchanged from C+13/C+15-α
baseline). The refcount fix is observation-only at the digest level;
guest behavior is unchanged because no actual code path depends on
the precise destruction timing of the closed-but-still-running thread
handle within the 50M-instruction window.
- **Gate 2 (cvar-on emit)**: PASS — both engines produce JSONL cleanly
(ours 121,537 events; canary 2,512,481 events in 90s).
- **Gate 3 (diff tool)**: PASS — diff tool consumes events, produces
6-chain divergence report; main divergence at 102,171 (was 102,168).
- **Gate 4 (cold-vs-cold)**: PASS — main matched prefix advances +3,
no sister-chain regressions.
- **Gate 5 (build clean)**: PASS — `cargo build --release` clean
(1 pre-existing dead_code warning unrelated).
- **Gate 6 (tests)**: PASS — 181 → 186 (added 5 refcount lifecycle
tests; all pass).
- **Gate 7 (Phase B image hash)**: NOT EXECUTED (no engine change
reaches XEX load); inferred unchanged from invariant cold-stable
digest.
## Sister-chain analysis
No sister chain advanced beyond its C+15-α matched prefix. tid=4→11,
tid=7→2, tid=12→7, tid=14→9 all diverge at the same indexes: the
C+16 refcount fix is on a distinct code path from the class-E
native-object handle creation in the wait calls (`KeWaitForSingleObject`,
or `KeWaitForMultipleObjects` on tid=4→11). C+17 must address class E
to advance those chains.
## Reading-error class
None new. Reading-error #28 discipline (verify framing first) was
followed; canary source was read end-to-end for `XThread::Create`,
`XObject::RetainHandle`/`ReleaseHandle`, `ObjectTable::AddHandle`/
`RetainHandle`/`ReleaseHandle`/`RemoveHandle`, and
`XamTaskSchedule_entry`/`XamTaskCloseHandle_entry` before any code
change.
## Refcount leak risk audit
Three test cases cover the lifecycle balance:
1. `ex_create_then_close_then_exit_balances_refcount` — close first,
then exit. Refcount 2→1→0. Handle destroyed. No leak.
2. `xam_task_schedule_close_then_thread_exit_destroys_handle` — same
ordering via XAM path.
3. `xam_task_thread_exit_then_close_destroys_handle` — exit first,
then close. Refcount 2→1→0. Handle destroyed. No leak.
The remaining case (exit without close) leaves refcount at 1
(creator-only), which is correct: the handle slot remains until the
creator explicitly closes it. This matches canary's behavior, where the
guest is responsible for closing handles it allocated.

@@ -0,0 +1,189 @@
# Phase A diff report
**This report is the output of Phase A's diff harness. Divergences
shown here are INPUT for Phase B (first-divergence localization),
not findings of Phase A.** Phase A's job is to make the harness
itself correct, not to analyze what it surfaces.
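For orientation, the matched-prefix comparison that the summary table reports can be sketched as follows. This is an assumption-laden illustration rather than the harness itself: `TraceEvent`, `ChainDiff`, and `diff_chain` are hypothetical names, and events are compared only on `kind` and an optional name, whereas the real tool may compare more fields.

```rust
/// Minimal per-thread event: the `kind` plus an optional import/kernel name.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TraceEvent {
    pub kind: String,
    pub name: Option<String>,
}

impl TraceEvent {
    pub fn new(kind: &str, name: Option<&str>) -> Self {
        Self { kind: kind.to_string(), name: name.map(str::to_string) }
    }
}

/// Result of comparing one canary thread chain against the paired chain of ours.
#[derive(Debug, PartialEq, Eq)]
pub struct ChainDiff {
    /// Number of leading events that are equal on both sides.
    pub matched: usize,
    /// Index of the first unequal event, or None if the shorter chain is a
    /// prefix of the longer one.
    pub first_divergence_at: Option<usize>,
}

pub fn diff_chain(canary: &[TraceEvent], ours: &[TraceEvent]) -> ChainDiff {
    // Only indexes present in both chains can be compared.
    let compared = canary.len().min(ours.len());
    let matched = canary
        .iter()
        .zip(ours.iter())
        .take_while(|(c, o)| c == o)
        .count();
    let first_divergence_at = if matched < compared { Some(matched) } else { None };
    ChainDiff { matched, first_divergence_at }
}
```

Under this convention a chain where ours simply ends early (as in the tid=15→10 row) reports no divergence, while a chain with an unequal event at index `n` reports `matched = n` and `first_divergence_at = n`.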
## Summary
| canary_tid | ours_tid | matched | canary_total | ours_total | first_divergence_at |
|---|---|---|---|---|---|
| 4 | 11 | 8 | 23935 | 9 | 8 |
| 6 | 1 | 102171 | 347863 | 108489 | 102171 |
| 7 | 2 | 30 | 32 | 32 | 30 |
| 12 | 7 | 2 | 28174 | 4 | 2 |
| 14 | 9 | 2 | 553617 | 76 | 2 |
| 15 | 10 | 16 | 334560 | 16 | — |
## canary_tid=4 → ours_tid=11
First divergence at `tid_event_idx=8`: kind: canary='handle.create' ours='kernel.return'
**Pre-context (last 5 matching events):**
```
canary: [3] import.call KeSetEvent
ours: [3] import.call KeSetEvent
canary: [4] kernel.call KeSetEvent
ours: [4] kernel.call KeSetEvent
canary: [5] kernel.return KeSetEvent
ours: [5] kernel.return KeSetEvent
canary: [6] import.call KeWaitForMultipleObjects
ours: [6] import.call KeWaitForMultipleObjects
canary: [7] kernel.call KeWaitForMultipleObjects
ours: [7] kernel.call KeWaitForMultipleObjects
```
**Divergent event:**
```
canary: [8] handle.create sid=bcaf14d76932b128
ours: [8] kernel.return KeWaitForMultipleObjects
```
**Next event after the divergence (if any):**
```
canary: [9] handle.create sid=0760e947bacff199
ours: <end of stream>
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1925822900, "kind": "handle.create", "payload": {"handle_semantic_id": "bcaf14d76932b128", "object_name": null, "object_type": 1, "raw_handle_id": "0xf8000098"}, "schema_version": 1, "tid": 4, "tid_event_idx": 8}
{"deterministic": true, "engine": "ours", "guest_cycle": 91, "host_ns": 1676162438, "kind": "kernel.return", "payload": {"name": "KeWaitForMultipleObjects", "return_value": 0, "side_effects": [], "status": "0x00000000"}, "schema_version": 1, "tid": 11, "tid_event_idx": 8}
```
## canary_tid=6 → ours_tid=1
First divergence at `tid_event_idx=102171`: kind: canary='handle.create' ours='wait.begin'
**Pre-context (last 5 matching events):**
```
canary: [102166] import.call XamTaskCloseHandle
ours: [102166] import.call XamTaskCloseHandle
canary: [102167] kernel.call XamTaskCloseHandle
ours: [102167] kernel.call XamTaskCloseHandle
canary: [102168] kernel.return XamTaskCloseHandle
ours: [102168] kernel.return XamTaskCloseHandle
canary: [102169] import.call KeWaitForSingleObject
ours: [102169] import.call KeWaitForSingleObject
canary: [102170] kernel.call KeWaitForSingleObject
ours: [102170] kernel.call KeWaitForSingleObject
```
**Divergent event:**
```
canary: [102171] handle.create sid=68fec8909ea5d1f5
ours: [102171] wait.begin {'handles_semantic_ids': ['0000000000000000'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
```
**Next event after the divergence (if any):**
```
canary: [102172] wait.begin {'handles_semantic_ids': ['68fec8909ea5d1f5'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
ours: [102172] kernel.return KeWaitForSingleObject
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1446013900, "kind": "handle.create", "payload": {"handle_semantic_id": "68fec8909ea5d1f5", "object_name": null, "object_type": 1, "raw_handle_id": "0xf8000014"}, "schema_version": 1, "tid": 6, "tid_event_idx": 102171}
{"deterministic": true, "engine": "ours", "guest_cycle": 0, "host_ns": 465155319, "kind": "wait.begin", "payload": {"alertable": false, "handles_semantic_ids": ["0000000000000000"], "timeout_ns": -1, "wait_type": "any"}, "schema_version": 1, "tid": 1, "tid_event_idx": 102171}
```
## canary_tid=7 → ours_tid=2
First divergence at `tid_event_idx=30`: kind: canary='handle.create' ours='wait.begin'
**Pre-context (last 5 matching events):**
```
canary: [25] import.call KeSetEvent
ours: [25] import.call KeSetEvent
canary: [26] kernel.call KeSetEvent
ours: [26] kernel.call KeSetEvent
canary: [27] kernel.return KeSetEvent
ours: [27] kernel.return KeSetEvent
canary: [28] import.call KeWaitForSingleObject
ours: [28] import.call KeWaitForSingleObject
canary: [29] kernel.call KeWaitForSingleObject
ours: [29] kernel.call KeWaitForSingleObject
```
**Divergent event:**
```
canary: [30] handle.create sid=e1f14feb316c28dd
ours: [30] wait.begin {'handles_semantic_ids': ['0000000000000000'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
```
**Next event after the divergence (if any):**
```
canary: [31] wait.begin {'handles_semantic_ids': ['e1f14feb316c28dd'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
ours: [31] kernel.return KeWaitForSingleObject
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1448036300, "kind": "handle.create", "payload": {"handle_semantic_id": "e1f14feb316c28dd", "object_name": null, "object_type": 1, "raw_handle_id": "0xf800001c"}, "schema_version": 1, "tid": 7, "tid_event_idx": 30}
{"deterministic": true, "engine": "ours", "guest_cycle": 0, "host_ns": 465402043, "kind": "wait.begin", "payload": {"alertable": false, "handles_semantic_ids": ["0000000000000000"], "timeout_ns": -1, "wait_type": "any"}, "schema_version": 1, "tid": 2, "tid_event_idx": 30}
```
## canary_tid=12 → ours_tid=7
First divergence at `tid_event_idx=2`: kind: canary='handle.create' ours='wait.begin'
**Pre-context (last 5 matching events):**
```
canary: [0] import.call KeWaitForSingleObject
ours: [0] import.call KeWaitForSingleObject
canary: [1] kernel.call KeWaitForSingleObject
ours: [1] kernel.call KeWaitForSingleObject
```
**Divergent event:**
```
canary: [2] handle.create sid=750aad55e1061f0a
ours: [2] wait.begin {'handles_semantic_ids': ['0000000000000000'], 'timeout_ns': 429466729600, 'alertable': False, 'wait_type': 'any'}
```
**Next event after the divergence (if any):**
```
canary: [3] wait.begin {'handles_semantic_ids': ['750aad55e1061f0a'], 'timeout_ns': -30000000, 'alertable': False, 'wait_type': 'any'}
ours: [3] kernel.return KeWaitForSingleObject
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1639759400, "kind": "handle.create", "payload": {"handle_semantic_id": "750aad55e1061f0a", "object_name": null, "object_type": 1, "raw_handle_id": "0xf8000064"}, "schema_version": 1, "tid": 12, "tid_event_idx": 2}
{"deterministic": true, "engine": "ours", "guest_cycle": 0, "host_ns": 491840482, "kind": "wait.begin", "payload": {"alertable": false, "handles_semantic_ids": ["0000000000000000"], "timeout_ns": 429466729600, "wait_type": "any"}, "schema_version": 1, "tid": 7, "tid_event_idx": 2}
```
## canary_tid=14 → ours_tid=9
First divergence at `tid_event_idx=2`: kind: canary='handle.create' ours='wait.begin'
**Pre-context (last 5 matching events):**
```
canary: [0] import.call KeWaitForSingleObject
ours: [0] import.call KeWaitForSingleObject
canary: [1] kernel.call KeWaitForSingleObject
ours: [1] kernel.call KeWaitForSingleObject
```
**Divergent event:**
```
canary: [2] handle.create sid=3df8ca649bf76cc8
ours: [2] wait.begin {'handles_semantic_ids': ['0000000000000000'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
```
**Next event after the divergence (if any):**
```
canary: [3] wait.begin {'handles_semantic_ids': ['3df8ca649bf76cc8'], 'timeout_ns': -1, 'alertable': False, 'wait_type': 'any'}
ours: [3] kernel.return KeWaitForSingleObject
```
**Raw events (JSON):**
```json
{"deterministic": true, "engine": "canary", "guest_cycle": 0, "host_ns": 1763660400, "kind": "handle.create", "payload": {"handle_semantic_id": "3df8ca649bf76cc8", "object_name": null, "object_type": 1, "raw_handle_id": "0xf8000094"}, "schema_version": 1, "tid": 14, "tid_event_idx": 2}
{"deterministic": true, "engine": "ours", "guest_cycle": 0, "host_ns": 1613114265, "kind": "wait.begin", "payload": {"alertable": false, "handles_semantic_ids": ["0000000000000000"], "timeout_ns": -1, "wait_type": "any"}, "schema_version": 1, "tid": 9, "tid_event_idx": 2}
```
## canary_tid=15 → ours_tid=10
No divergence within the 16 compared events (canary has 334560, ours has 16).

@@ -0,0 +1,10 @@
{
"instructions": 50000007,
"imports": 40390,
"unimpl": 0,
"draws": 0,
"swaps": 1,
"unique_render_targets": 0,
"shader_blobs_live": 0,
"texture_cache_entries": 0
}

@@ -0,0 +1,10 @@
{
"instructions": 50000007,
"imports": 40390,
"unimpl": 0,
"draws": 0,
"swaps": 1,
"unique_render_targets": 0,
"shader_blobs_live": 0,
"texture_cache_entries": 0
}

@@ -0,0 +1,10 @@
{
"instructions": 50000007,
"imports": 40390,
"unimpl": 0,
"draws": 0,
"swaps": 1,
"unique_render_targets": 0,
"shader_blobs_live": 0,
"texture_cache_entries": 0
}

@@ -0,0 +1,124 @@
# Phase C+16 Investigation — XamTaskCloseHandle refcount (2026-05-14)
## Framing verification (reading-error #28 discipline)
C+15-α's catalog D-1 hypothesis was: "canary's spawned thread keeps an
additional ref on the thread handle (`object->Retain()` in `XThread::Create`
line 408 via `RetainHandle()`)". Verified against canary source.
### Canary's refcount model (xobject.cc + util/object_table.cc)
Two separate refcounts on `XObject`:
1. **`pointer_ref_count_`** — the C++ object pointer refcount.
- Bumped by `XObject::Retain()` / dropped by `XObject::Release()`.
- `AddHandle()` calls `object->Retain()` once when inserting into table.
2. **`handle_ref_count`** in `ObjectTableEntry` — the per-handle (per-slot)
refcount that determines when the object is removed from the object
table.
- Initialized to 1 in `AddHandle()` (object_table.cc:164).
- Bumped by `RetainHandle()` (object_table.cc:218-228 → `entry->handle_ref_count++`).
- Decremented by `ReleaseHandle()` (object_table.cc:230-249).
- On reaching 0, calls `RemoveHandle()` which emits `handle.destroy`
and releases the pointer ref (`object->Release()`).
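The two-counter model above can be illustrated with a small Rust sketch. It is not a port of canary's C++: `KernelObject`, `ObjectTable`, and the handle numbering are hypothetical, and `Rc`'s strong count stands in for `pointer_ref_count_` purely for illustration.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Stand-in for an XObject; the Rc strong count plays the pointer refcount.
pub struct KernelObject {
    pub name: &'static str,
}

/// One table slot: the object plus its per-handle refcount.
struct Entry {
    object: Rc<KernelObject>,
    handle_ref_count: u32,
}

pub struct ObjectTable {
    entries: HashMap<u32, Entry>,
    next: u32,
}

impl ObjectTable {
    pub fn new() -> Self {
        // The starting handle value is an assumption made for the example.
        Self { entries: HashMap::new(), next: 0xF800_0000 }
    }

    /// Inserting takes one pointer ref (the Rc clone) and starts the
    /// per-handle refcount at 1.
    pub fn add_handle(&mut self, object: &Rc<KernelObject>) -> u32 {
        self.next += 4;
        let entry = Entry { object: Rc::clone(object), handle_ref_count: 1 };
        self.entries.insert(self.next, entry);
        self.next
    }

    /// Bumps the per-handle refcount; returns false for an unknown handle.
    pub fn retain_handle(&mut self, handle: u32) -> bool {
        match self.entries.get_mut(&handle) {
            Some(entry) => {
                entry.handle_ref_count += 1;
                true
            }
            None => false,
        }
    }

    /// Drops one per-handle ref. Returns true when the slot was removed,
    /// which also drops the table's pointer ref on the object.
    pub fn release_handle(&mut self, handle: u32) -> bool {
        let Some(entry) = self.entries.get_mut(&handle) else {
            return false;
        };
        // Cannot underflow: a slot is removed as soon as it reaches 0.
        entry.handle_ref_count -= 1;
        if entry.handle_ref_count == 0 {
            self.entries.remove(&handle);
            true
        } else {
            false
        }
    }

    pub fn handle_ref_count(&self, handle: u32) -> Option<u32> {
        self.entries.get(&handle).map(|e| e.handle_ref_count)
    }

    pub fn object_name(&self, handle: u32) -> Option<&'static str> {
        self.entries.get(&handle).map(|e| e.object.name)
    }
}
```

The point of the sketch is that the per-handle count alone decides when the slot disappears; the pointer count only changes when the slot is added or removed.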
### Canary's XThread lifecycle
- `XObject` ctor → `AddHandle(this)` → `handle_ref_count = 1`,
emits `handle.create`.
- `XThread::Create()` (xthread.cc:414) → `RetainHandle()` →
`handle_ref_count = 2`. Comment: *"Always retain when starting - the
thread owns itself until exited."*
- User calls `NtClose(handle)` → `ReleaseHandle()` → `handle_ref_count = 1`.
Object SURVIVES; no `handle.destroy` emitted.
- Thread exits via `XThread::Exit()` (xthread.cc:524) → `ReleaseHandle()` →
`handle_ref_count = 0` → `RemoveHandle()` → emits `handle.destroy`
+ drops pointer ref → object destroyed.
### Canary's XAM task lifecycle (xam/xam_task.cc:43-94)
`XamTaskSchedule_entry` creates an `XThread` (which adds it to the
object table) then calls `thread->Create()` (xthread.cc:315) which adds
the self-ref via `RetainHandle()`. The handle written to `handle_ptr`
is `12345` (a stub!), not the real thread handle. The actual thread
handle lives on the `XThread` object.
`XamTaskCloseHandle_entry` calls `xboxkrnl::NtClose(obj_handle)`. If
`obj_handle` were the stub `12345`, `NtClose` would see an invalid handle,
return `X_STATUS_INVALID_HANDLE`, and the function would return false.
Our test data, however, shows it returning 1 (success) on both engines,
indicating that the SHIM-VS-GAME handle plumbing produces a valid handle
in practice on the main chain. (Possibly the game passes the actual
thread handle.)
The crucial behavior is: after `NtClose`, canary's refcount went 2→1,
so no `handle.destroy` event. Ours's refcount went 1→0, emitting the
extra `handle.destroy`. **Hypothesis confirmed.**
## Ours's pre-fix state
- `alloc_handle_for` → `handle_refcount.insert(h, 1)`.
- `ex_create_thread` / `xam_task_schedule` after spawn → no retain.
- `nt_close` / `xam_task_close_handle` → decrement, destroy on 0.
- `ex_terminate_thread` → marks scheduler Exited, wakes joiners, does
NOT release the (missing) self-ref.
- Main thread (`install_initial_thread`) — refcount=1, never closed.
So ours's spawned threads had `handle_refcount = 1` (creator only). Any
guest `NtClose` on a thread handle destroyed it.
## Fix design
Mirror canary precisely:
1. After successful spawn in `ex_create_thread` + `xam_task_schedule`:
call `state.retain_handle(handle)` (refcount 1 → 2).
2. In `ex_terminate_thread` (explicit `ExTerminateThread`) and in the
main-loop LR-sentinel implicit-exit path (`main.rs`): call
`state.release_handle(handle)` after the scheduler `exit_current`
bookkeeping.
3. Main thread (`install_initial_thread`): symmetric retain (canary's
main also goes through `Create()::RetainHandle()`). Released at the
LR-sentinel path on main thread shutdown.
New helpers in `state.rs`:
- `KernelState::retain_handle(handle) -> u32` — saturating increment;
returns new refcount.
- `KernelState::release_handle(handle) -> bool` — saturating decrement;
on hitting zero: removes object, scrubs async_file_handles +
disarm_timer, emits `handle.destroy`, returns true. False if other
refs remain.
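A minimal sketch of helpers with these two signatures is shown below. It is not the code in `state.rs`: `HandleRefs` is a hypothetical stand-in for `KernelState`, the `async_file_handles` and `disarm_timer` scrubbing is omitted, events are recorded as plain strings, and returning 0 from `retain_handle` for an unknown handle is an assumption of the example.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the refcount part of the kernel state.
#[derive(Default)]
pub struct HandleRefs {
    refcount: HashMap<u32, u32>,
    /// Emitted schema events, recorded by kind only.
    pub events: Vec<&'static str>,
}

impl HandleRefs {
    /// Allocation starts at refcount 1 (the creator's reference).
    pub fn alloc(&mut self, handle: u32) {
        self.refcount.insert(handle, 1);
        self.events.push("handle.create");
    }

    /// Saturating increment; returns the new refcount.
    /// Assumption: an unknown handle yields 0 and is left untouched.
    pub fn retain_handle(&mut self, handle: u32) -> u32 {
        match self.refcount.get_mut(&handle) {
            Some(rc) => {
                *rc = rc.saturating_add(1);
                *rc
            }
            None => 0,
        }
    }

    /// Saturating decrement; on reaching zero the slot is removed and
    /// `handle.destroy` is emitted. Returns true only in that case.
    pub fn release_handle(&mut self, handle: u32) -> bool {
        let Some(rc) = self.refcount.get_mut(&handle) else {
            return false;
        };
        *rc = rc.saturating_sub(1);
        if *rc > 0 {
            return false;
        }
        self.refcount.remove(&handle);
        self.events.push("handle.destroy");
        true
    }
}
```

With a retain after spawn, a guest close followed by thread exit (or the reverse order) walks the count 2→1→0 and emits exactly one `handle.destroy`, at the second release.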
The implicit-exit path in `main.rs` also gained the missing
`thread.exit` schema event (previously only `ex_terminate_thread`
emitted it; canary's `XThread::Exit` covers both explicit and implicit
paths, so this is a symmetry fix even though it didn't cause the C+16
divergence directly).
## Code summary
~75 LOC across 4 files, purely additive with no refactor (the rounded
per-file figures below sum to roughly 100):
- `crates/xenia-kernel/src/state.rs`: `retain_handle` + `release_handle`
helpers. +50 LOC.
- `crates/xenia-kernel/src/exports.rs`: retain in `ex_create_thread`,
release in `ex_terminate_thread`. +20 LOC.
- `crates/xenia-kernel/src/xam.rs`: retain in `xam_task_schedule`.
+10 LOC.
- `crates/xenia-app/src/main.rs`: implicit-exit path: emit `thread.exit`,
release self-ref; `install_initial_thread` post-call retain. +20 LOC.
Tests: +5 new (181 → 186 total), plus one existing test extended.
- `xam_task_schedule_close_then_thread_exit_destroys_handle`:
refcount lifecycle balance (close-first).
- `xam_task_thread_exit_then_close_destroys_handle`:
refcount lifecycle balance (exit-first).
- `xam_task_schedule_then_close_round_trip_returns_one`: existing test,
extended with refcount asserts (post-spawn=2, post-close=1).
- `ex_create_thread_installs_self_reference`: verifies refcount=2
after spawn.
- `ex_terminate_thread_releases_self_reference`: verifies refcount=1
after terminate.
- `ex_create_then_close_then_exit_balances_refcount`: end-to-end
three-step lifecycle.