xenia-rs/audit-runs/phase-c16-XamTaskCloseHandle-refcount/investigation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Phase C+16 Investigation — XamTaskCloseHandle refcount (2026-05-14)
## Framing verification (reading-error #28 discipline)
C+15-α's catalog D-1 hypothesis was: "canary's spawned thread keeps an
additional ref on the thread handle (`object->Retain()` in `XThread::Create`
line 408 via `RetainHandle()`)". Verified against canary source.
### Canary's refcount model (xobject.cc + util/object_table.cc)
Two separate refcounts on `XObject`:
1. **`pointer_ref_count_`** — the C++ object pointer refcount.
   - Bumped by `XObject::Retain()` / dropped by `XObject::Release()`.
   - `AddHandle()` calls `object->Retain()` once when inserting into table.
2. **`handle_ref_count`** in `ObjectTableEntry` — the per-handle (per-slot)
   refcount that determines when the object is removed from the object
   table.
   - Initialized to 1 in `AddHandle()` (object_table.cc:164).
   - Bumped by `RetainHandle()` (object_table.cc:218-228 → `entry->handle_ref_count++`).
   - Decremented by `ReleaseHandle()` (object_table.cc:230-249).
   - On reaching 0, calls `RemoveHandle()` which emits `handle.destroy`
     and releases the pointer ref (`object->Release()`).
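
The two counts above can be modeled in a few lines of Rust. This is a minimal sketch, not canary's code: `Object`, `ObjectTable`, and their methods are hypothetical names, `Rc`'s strong count stands in for `pointer_ref_count_`, and event emission is left out.

```rust
use std::collections::HashMap;
use std::rc::Rc;

/// Hypothetical stand-in for canary's `XObject`. The `Rc` strong count
/// plays the role of `pointer_ref_count_`.
pub struct Object {
    pub name: &'static str,
}

/// One table slot: a pointer ref plus the per-handle count.
struct Entry {
    object: Rc<Object>, // taken in `add_handle`, mirroring the `Retain()` on insert
    handle_ref_count: u32,
}

#[derive(Default)]
pub struct ObjectTable {
    entries: HashMap<u32, Entry>,
    next_handle: u32,
}

impl ObjectTable {
    /// Insert an object: takes one pointer ref and starts the handle count at 1.
    pub fn add_handle(&mut self, object: &Rc<Object>) -> u32 {
        self.next_handle += 1;
        let handle = self.next_handle;
        let entry = Entry { object: Rc::clone(object), handle_ref_count: 1 };
        self.entries.insert(handle, entry);
        handle
    }

    /// Bump the per-handle count. Returns false for an unknown handle.
    pub fn retain_handle(&mut self, handle: u32) -> bool {
        match self.entries.get_mut(&handle) {
            Some(entry) => {
                entry.handle_ref_count += 1;
                true
            }
            None => false,
        }
    }

    /// Drop the per-handle count. At zero the slot is removed, which also
    /// drops the table's pointer ref. Returns false for an unknown handle.
    pub fn release_handle(&mut self, handle: u32) -> bool {
        let Some(entry) = self.entries.get_mut(&handle) else {
            return false;
        };
        entry.handle_ref_count -= 1;
        if entry.handle_ref_count == 0 {
            self.entries.remove(&handle);
        }
        true
    }

    /// Look up the object behind a handle, handing out a new pointer ref.
    pub fn get(&self, handle: u32) -> Option<Rc<Object>> {
        self.entries.get(&handle).map(|entry| Rc::clone(&entry.object))
    }
}
```

The point of the sketch is that the slot's lifetime follows `handle_ref_count` alone: a caller holding its own `Rc` keeps the object alive after the slot is gone, but the handle no longer resolves.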
### Canary's XThread lifecycle
- `XObject` ctor → `AddHandle(this)` → `handle_ref_count = 1`,
  emits `handle.create`.
- `XThread::Create()` (xthread.cc:414) → `RetainHandle()` →
  `handle_ref_count = 2`. Comment: *"Always retain when starting - the
  thread owns itself until exited."*
- User calls `NtClose(handle)` → `ReleaseHandle()` → `handle_ref_count = 1`.
  Object SURVIVES; no `handle.destroy` emitted.
- Thread exits via `XThread::Exit()` (xthread.cc:524) → `ReleaseHandle()` →
  `handle_ref_count = 0` → `RemoveHandle()` → emits `handle.destroy`
  + drops pointer ref → object destroyed.
### Canary's XAM task lifecycle (xam/xam_task.cc:43-94)
`XamTaskSchedule_entry` creates an `XThread` (which adds it to the
object table), then calls `thread->Create()` (xthread.cc:315), which adds
the self-ref via `RetainHandle()`. The handle written to `handle_ptr`
is `12345` (a stub!), not the real thread handle. The actual thread
handle lives on the `XThread` object.

`XamTaskCloseHandle_entry` calls `xboxkrnl::NtClose(obj_handle)`. If
`obj_handle` were the stub `12345`, `NtClose` would see an invalid
handle and return `X_STATUS_INVALID_HANDLE`, and the function would
return false. But our test data shows it returns 1 (success) on both
engines, indicating that the shim-vs-game handle plumbing produces a
valid handle in practice on the main chain. (Possibly the game passes
the actual thread handle.)

The crucial behavior is: after `NtClose`, canary's refcount went 2→1,
so no `handle.destroy` event was emitted. Ours went 1→0, emitting the
extra `handle.destroy`. **Hypothesis confirmed.**
## Ours's pre-fix state
- `alloc_handle_for` → `handle_refcount.insert(h, 1)`.
- `ex_create_thread` / `xam_task_schedule` after spawn → no retain.
- `nt_close` / `xam_task_close_handle` → decrement, destroy on 0.
- `ex_terminate_thread` → marks scheduler Exited, wakes joiners, does
NOT release the (missing) self-ref.
- Main thread (`install_initial_thread`) — refcount=1, never closed.
So ours's spawned threads had `handle_refcount = 1` (creator only). Any
guest `NtClose` on a thread handle destroyed it.
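
The divergence can be reproduced with a toy replay of the operations on a single thread handle. This is a sketch under stated assumptions: `Op` and `destroy_index` are hypothetical names, and the only state modeled is the per-handle refcount (creator ref, plus the thread's self-reference when `self_ref` is true).

```rust
/// Operations that touch a thread handle's per-handle refcount.
#[derive(Clone, Copy)]
pub enum Op {
    /// Guest closes its handle (`NtClose` / `XamTaskCloseHandle`).
    Close,
    /// The thread exits, explicitly or by returning.
    Exit,
}

/// Replays `ops` against one thread handle and returns the index of the
/// op that emitted `handle.destroy`, if any.
/// `self_ref = true` models canary; `self_ref = false` models the
/// pre-fix behavior described above.
pub fn destroy_index(self_ref: bool, ops: &[Op]) -> Option<usize> {
    // Creator ref, plus the thread's self-reference when modeled.
    let mut refcount: u32 = if self_ref { 2 } else { 1 };
    for (i, op) in ops.iter().enumerate() {
        let releases = match op {
            Op::Close => true,
            // Without a self-reference, thread exit has nothing to release.
            Op::Exit => self_ref,
        };
        if releases {
            refcount -= 1;
            if refcount == 0 {
                return Some(i);
            }
        }
    }
    None
}
```

With the self-reference, the destroy lands on whichever of close and exit comes last. Without it, the first guest close destroys the handle, which is the extra `handle.destroy` seen in the trace.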
## Fix design
Mirror canary precisely:
1. After successful spawn in `ex_create_thread` + `xam_task_schedule`:
call `state.retain_handle(handle)` (refcount 1 → 2).
2. In `ex_terminate_thread` (explicit `ExTerminateThread`) and in the
main-loop LR-sentinel implicit-exit path (`main.rs`): call
`state.release_handle(handle)` after the scheduler `exit_current`
bookkeeping.
3. Main thread (`install_initial_thread`): symmetric retain (canary's
main also goes through `Create()::RetainHandle()`). Released at the
LR-sentinel path on main thread shutdown.
New helpers in `state.rs`:
- `KernelState::retain_handle(handle) -> u32` — saturating increment;
  returns new refcount.
- `KernelState::release_handle(handle) -> bool` — saturating decrement;
  on hitting zero: removes object, scrubs `async_file_handles` +
  `disarm_timer`, emits `handle.destroy`, returns true. False if other
  refs remain.
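
A minimal sketch of what the two helpers might look like follows. It is not the code in `state.rs`: this `KernelState` is a trimmed-down stand-in, `install` and `refcount` are hypothetical helpers for the example, the `async_file_handles` and timer cleanup is omitted, and the behavior for an unknown handle (return 0 / false) is an assumption.

```rust
use std::collections::HashMap;

/// Hypothetical, trimmed-down stand-in for `KernelState`.
#[derive(Default)]
pub struct KernelState {
    handle_refcount: HashMap<u32, u32>,
    objects: HashMap<u32, &'static str>,
    pub events: Vec<String>,
}

impl KernelState {
    /// Hypothetical allocation helper: install an object with refcount 1.
    pub fn install(&mut self, handle: u32, object: &'static str) {
        self.objects.insert(handle, object);
        self.handle_refcount.insert(handle, 1);
    }

    /// Saturating increment; returns the new refcount.
    /// Assumption: an unknown handle yields 0.
    pub fn retain_handle(&mut self, handle: u32) -> u32 {
        match self.handle_refcount.get_mut(&handle) {
            Some(count) => {
                *count = (*count).saturating_add(1);
                *count
            }
            None => 0,
        }
    }

    /// Saturating decrement; returns true only when this call destroyed
    /// the handle. Assumption: an unknown handle yields false.
    pub fn release_handle(&mut self, handle: u32) -> bool {
        let Some(count) = self.handle_refcount.get_mut(&handle) else {
            return false;
        };
        *count = (*count).saturating_sub(1);
        if *count > 0 {
            return false; // other refs remain
        }
        self.handle_refcount.remove(&handle);
        self.objects.remove(&handle);
        self.events.push(format!("handle.destroy {handle:#x}"));
        true
    }

    /// Current refcount, or 0 if the handle is gone.
    pub fn refcount(&self, handle: u32) -> u32 {
        self.handle_refcount.get(&handle).copied().unwrap_or(0)
    }
}
```

Returning `bool` from `release_handle` lets callers such as the close and thread-exit paths tell whether their release was the one that tore the handle down, without re-querying the table.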
The implicit-exit path in `main.rs` also gained the missing
`thread.exit` schema event (previously only `ex_terminate_thread`
emitted it; canary's `XThread::Exit` covers both explicit and implicit
paths, so this is a symmetry fix even though it didn't cause the C+16
divergence directly).
## Code summary
~75 LOC across 4 files; purely additive, no refactor:
- `crates/xenia-kernel/src/state.rs` — `retain_handle` + `release_handle`
  helpers. +50 LOC.
- `crates/xenia-kernel/src/exports.rs` — retain in `ex_create_thread`,
release in `ex_terminate_thread`. +20 LOC.
- `crates/xenia-kernel/src/xam.rs` — retain in `xam_task_schedule`.
+10 LOC.
- `crates/xenia-app/src/main.rs` — implicit-exit path: emit `thread.exit`,
release self-ref; `install_initial_thread` post-call retain. +20 LOC.
Tests: +5 (181 → 186 total).
- `xam_task_schedule_close_then_thread_exit_destroys_handle` —
  refcount lifecycle balance (close-first).
- `xam_task_thread_exit_then_close_destroys_handle` —
  refcount lifecycle balance (exit-first).
- `xam_task_schedule_then_close_round_trip_returns_one` — extended
with refcount asserts (post-spawn=2, post-close=1).
- `ex_create_thread_installs_self_reference` — verifies refcount=2
after spawn.
- `ex_terminate_thread_releases_self_reference` — verifies refcount=1
after terminate.
- `ex_create_then_close_then_exit_balances_refcount` — end-to-end
three-step lifecycle.