Phase C+19 investigation — NtDuplicateObject handle.create (2026-05-14)
Verified canary semantics (reading-error #28 discipline)
NtDuplicateObject_entry — xboxkrnl_ob.cc:389-412
X_HANDLE new_handle = X_INVALID_HANDLE_VALUE;
X_STATUS result = kernel_state()->object_table()->DuplicateHandle(handle, &new_handle);
if (new_handle_ptr) { *new_handle_ptr = new_handle; }
if (options == 1 /* DUPLICATE_CLOSE_SOURCE */) {
  kernel_state()->object_table()->RemoveHandle(handle);
}
return result;
ObjectTable::DuplicateHandle — object_table.cc:210-223
X_STATUS ObjectTable::DuplicateHandle(X_HANDLE handle, X_HANDLE* out_handle) {
  handle = TranslateHandle(handle);
  XObject* object = LookupObject(handle, false);  // refcount +1
  if (object) {
    result = AddHandle(object, out_handle);  // alloc fresh slot, refcount +1, EMIT handle.create
    object->Release();  // refcount -1 (offset LookupObject)
  }
  return result;
}
ObjectTable::AddHandle — object_table.cc:148-208
- Finds a fresh slot via FindFreeSlot.
- Stores entry.object = object; entry.handle_ref_count = 1;
- Computes handle = (slot << 2) + kHandleBase (or + kHandleHostBase), then calls object->handles().push_back(handle) and object->Retain().
- Emits handle.create via phase_a::EmitHandleCreateAuto (cvar-gated, default-off) using the new handle's tid + tid_event_idx for SID, NOT short-circuited because we're not inside GetNativeObject.
Net effect (source: S, dup: D, underlying XObject: O)
Before dup: O.refcount = 1, slots = {S → O}, handle.create(S) emitted earlier.
After dup:
- O.refcount = 2 (one for each slot).
- entry[S].handle_ref_count = 1.
- entry[D].handle_ref_count = 1.
- handle.create(D) emitted at this dup.
Subsequent NtClose on either S or D:
- ReleaseHandle → entry.handle_ref_count--. If 0 → RemoveHandle → entry.object = nullptr + object->Release() → if O.refcount == 0 → object dtor (emit handle.destroy).
So:
NtClose(S) after dup: entry[S].handle_ref_count: 1→0 → RemoveHandle(S) → O.refcount: 2→1. Object STILL ALIVE through D. NO handle.destroy.
Wait — re-reading object_table.cc:294-295: phase_a::EmitHandleDestroyAuto(handle, ...) is emitted from inside RemoveHandle, which fires whenever a slot's ref_count hits 0. So canary emits handle.destroy on EVERY NtClose of EVERY slot, regardless of whether the underlying object still has other slots.
That means: canary emits handle.create(D) AND on close emits handle.destroy(D), then later handle.destroy(S). Two handle.create events / two handle.destroy events across the dup pair. Symmetric.
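The two-level refcount and the per-slot event emission described above can be modeled in a few lines. This is a minimal sketch, not canary's code: ToyTable, the Event enum, and the 0x1000 handle base are all hypothetical stand-ins, and only the counting behavior is taken from the reading of object_table.cc above.

```rust
use std::collections::HashMap;

/// Hypothetical event record; canary's real trace schema is not reproduced.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    HandleCreate(u32),
    HandleDestroy(u32),
}

/// Toy stand-in for the object table: every slot has its own
/// handle_ref_count, and the shared object has a separate refcount.
#[derive(Default)]
pub struct ToyTable {
    next_slot: u32,
    slots: HashMap<u32, (usize, u32)>, // handle -> (object index, handle_ref_count)
    object_refs: Vec<u32>,             // object index -> object refcount
    pub events: Vec<Event>,
}

impl ToyTable {
    /// Create a new object with a single slot pointing at it.
    pub fn create(&mut self) -> u32 {
        self.object_refs.push(0);
        let obj = self.object_refs.len() - 1;
        self.add_handle(obj)
    }

    /// Allocate a fresh slot, retain the object, emit handle.create.
    fn add_handle(&mut self, obj: usize) -> u32 {
        let handle = 0x1000 + (self.next_slot << 2); // arbitrary base for the sketch
        self.next_slot += 1;
        self.slots.insert(handle, (obj, 1));
        self.object_refs[obj] += 1;
        self.events.push(Event::HandleCreate(handle));
        handle
    }

    /// Duplicate: a second slot bound to the same underlying object.
    pub fn duplicate(&mut self, source: u32) -> Option<u32> {
        let obj = self.slots.get(&source)?.0;
        Some(self.add_handle(obj))
    }

    /// Close one slot. Returns Some(true) if the object is still alive.
    pub fn close(&mut self, handle: u32) -> Option<bool> {
        let entry = self.slots.get_mut(&handle)?;
        entry.1 -= 1;
        let (obj, remaining) = *entry;
        if remaining == 0 {
            // Slot removal emits handle.destroy even if other slots remain.
            self.slots.remove(&handle);
            self.object_refs[obj] -= 1;
            self.events.push(Event::HandleDestroy(handle));
        }
        Some(self.object_refs[obj] > 0)
    }
}
```

Creating S, duplicating to D, then closing both yields two create events and two destroy events in this model, which is the symmetric shape the trace comparison expects.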
Ours's current behavior — exports.rs:5210-5240
fn nt_duplicate_object(...) {
    let source = resolve_pseudo_handle(state, ctx.gpr[3] as u32);
    if !state.objects.contains_key(&source) { return STATUS_INVALID_HANDLE; }
    if out_ptr != 0 { mem.write_u32(out_ptr, source); } // dup_id = source_id
    if options & DUPLICATE_CLOSE_SOURCE == 0 {
        if let Some(c) = state.handle_refcount.get_mut(&source) { *c += 1; }
    }
    ctx.gpr[3] = STATUS_SUCCESS;
}
dup_id is aliased to source_id. Bumps state.handle_refcount[source] so the later NtClose pair (one per logical reference) doesn't destroy mid-flight. No handle.create event because no new id was allocated.
Subsequent nt_close(handle) decrements handle_refcount[handle], emits handle.destroy only when it reaches 0.
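For comparison, the aliasing behavior just described can be reduced to the following sketch. AliasingState and its methods are illustrative names only (the real handler works on ctx, mem, and state); the point is that one id carries one refcount, so a dup followed by two closes produces exactly one create and one destroy event.

```rust
use std::collections::HashMap;

/// Toy model of the current aliasing behavior. Names are illustrative,
/// not the real exports.rs API.
#[derive(Default)]
pub struct AliasingState {
    pub handle_refcount: HashMap<u32, u32>,
    pub events: Vec<String>,
}

impl AliasingState {
    /// Register a freshly created handle id and emit its create event.
    pub fn create(&mut self, id: u32) {
        self.handle_refcount.insert(id, 1);
        self.events.push(format!("handle.create {id:#x}"));
    }

    /// Dup returns the source id itself; no new id, so no handle.create.
    pub fn duplicate(&mut self, source: u32, close_source: bool) -> Option<u32> {
        let count = self.handle_refcount.get_mut(&source)?;
        if !close_source {
            *count += 1; // one extra NtClose is now expected on this id
        }
        Some(source)
    }

    /// Close decrements the shared count; destroy is emitted only at 0.
    pub fn close(&mut self, handle: u32) -> bool {
        let Some(count) = self.handle_refcount.get_mut(&handle) else {
            return false;
        };
        *count -= 1;
        let remaining = *count;
        if remaining == 0 {
            self.handle_refcount.remove(&handle);
            self.events.push(format!("handle.destroy {handle:#x}"));
        }
        true
    }
}
```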
Phase A divergence
At main idx=102553, canary's tid=6 sequence after NtDuplicateObject:
[102551] import.call NtDuplicateObject
[102552] kernel.call NtDuplicateObject
[102553] handle.create sid=df686b147b291902
[102554] kernel.return NtDuplicateObject
Ours's tid=1:
[102551] import.call NtDuplicateObject
[102552] kernel.call NtDuplicateObject
[102553] kernel.return NtDuplicateObject ← canary's [102554]
The visible delta is the missing handle.create between kernel.call and kernel.return.
AUDIT-062 risk assessment (CRITICAL)
What AUDIT-062 verified
ours DOES dup the wedge (kernel-aliasing hypothesis falsified): tid=13 cycle=26711 r3=0x000012ac r4=0x40541E80 (out_ptr). Per ours's exports.rs:4263, NtDup aliases — dup_id = source_id = 0x12AC, refcount++. NOT a kernel bug.
The original AUDIT-062 framing said "NtDup aliasing is correct because the
dup_id resolves to the same KernelObject in state.objects". The wedge bug
was downstream (producer-side NtSetEvent(worker_idle_event) never firing).
What is load-bearing about the aliasing
The wedge case in AUDIT-062 was:
- tid=13 creates event 0x12AC.
- Some descendant calls NtDuplicateObject(0x12AC, &dup) → dup is 0x12AC (aliased).
- tid=13 calls KeWaitForSingleObject(0x12AC) (the source).
- Worker thread (eventually) calls NtSetEvent(dup) on 0x12AC.
The load-bearing invariant is: signal on dup wakes wait on source. Why
this works today: both ids ARE the same id, so state.objects.get(&0x12AC)
finds the same KernelObject::Event with the same waiters Vec.
The trap
If we change nt_duplicate_object to allocate a fresh dup_id and store it
as a NEW state.objects entry (e.g. cloning the Event), then signal-on-dup
sets the CLONED event's signaled flag, NOT the source's. tid=13 waiting on
source will sleep forever. WEDGE REGRESSION.
The fix
Allocate a fresh dup_id, do NOT clone the object. Instead store a
handle alias dup_id → source_id in state.handle_aliases. Whenever
the guest passes dup_id to any Nt*/Ke* call, resolve through the alias to
get source_id. Lookup state.objects[source_id]. The single underlying
KernelObject::Event retains the unified waiters list and signaled
flag. Signal-on-dup still wakes wait-on-source because both ids
canonicalize to the same source.
This mirrors canary's LookupObject which always indexes by slot, but the
underlying XObject* is shared. We achieve the same with the alias map.
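The difference between the trap and the fix can be shown side by side. The sketch below is an assumption-laden toy: State, Event, dup_by_clone, and dup_by_alias are hypothetical names, the dup id 0x12B0 used in the usage note is made up, and waiters are omitted so that only the signaled flag is observed.

```rust
use std::collections::HashMap;

/// Stand-in for KernelObject::Event; only the signaled flag is modeled.
#[derive(Clone, Default)]
pub struct Event {
    pub signaled: bool,
}

#[derive(Default)]
pub struct State {
    pub objects: HashMap<u32, Event>,
    pub handle_aliases: HashMap<u32, u32>, // alias_id -> canonical_id
}

impl State {
    /// Canonicalize a handle id through the alias map.
    pub fn resolve_handle(&self, h: u32) -> u32 {
        self.handle_aliases.get(&h).copied().unwrap_or(h)
    }

    /// The trap: the dup gets its own cloned object.
    pub fn dup_by_clone(&mut self, source: u32, dup: u32) {
        if let Some(copy) = self.objects.get(&source).cloned() {
            self.objects.insert(dup, copy);
        }
    }

    /// The fix: the dup is only an alias for the canonical id.
    pub fn dup_by_alias(&mut self, source: u32, dup: u32) {
        let canonical = self.resolve_handle(source);
        self.handle_aliases.insert(dup, canonical);
    }

    /// Signal the event that the handle resolves to.
    pub fn set_event(&mut self, h: u32) -> bool {
        let id = self.resolve_handle(h);
        match self.objects.get_mut(&id) {
            Some(event) => {
                event.signaled = true;
                true
            }
            None => false,
        }
    }

    pub fn is_signaled(&self, h: u32) -> bool {
        let id = self.resolve_handle(h);
        self.objects.get(&id).map_or(false, |e| e.signaled)
    }
}
```

With dup_by_clone(0x12AC, 0x12B0), set_event(0x12B0) leaves is_signaled(0x12AC) false, which is the wedge. With dup_by_alias, the same call sets the flag the waiter on 0x12AC observes.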
Refcount lifecycle
- Source close after dup: alias entry dup_id → source_id stays; underlying object stays alive because handle_refcount[source_id] was bumped in nt_duplicate_object. No handle.destroy emit (refcount > 0 after decrement).
Actually — to match canary's per-slot handle.destroy emission, we need each NtClose on EITHER source or dup to emit handle.destroy (with the closed slot's SID), and we only drop the underlying object when ALL slots are gone.
Cleanest design: track per-handle-id refcount separately:
- handle_refcount[source_id]: counts the source slot's references.
- handle_refcount[dup_id]: counts the dup slot's references.
Both start at 1 (fresh allocation, fresh dup).
nt_close(source_id): decrement handle_refcount[source_id]. If 0, emit
handle.destroy(source_id), remove the alias entries pointing AT source_id
if applicable, and decrement underlying-object refcount.
Actually that's complex. Let me simplify: mirror canary's two-level refcount exactly via a new struct.
Simplest model that preserves AUDIT-062 + emits handle.create
- state.handle_aliases: HashMap<u32, u32> (alias_id → canonical_id).
- state.handle_refcount[id] continues to mean: how many NtClose calls are needed on THIS id before its slot goes away.
- nt_duplicate_object:
  - Compute canonical = resolve_alias(source) (in case source itself is an alias).
  - Alloc dup_id via state.alloc_handle().
  - Insert alias dup_id → canonical.
  - handle_refcount.insert(dup_id, 1).
  - Emit handle.create(dup_id, object_type) using state.objects[canonical].schema_object_type().
  - If options & DUPLICATE_CLOSE_SOURCE, treat as a NtClose(source) after.
- nt_close(handle):
  - Decrement handle_refcount[handle] as today.
  - If reaches 0: emit handle.destroy(handle). Remove the alias entry for handle (if it's an alias). If there are NO MORE alias slots pointing to canonical, AND handle == canonical, remove state.objects[canonical].
  - To know "any more slots pointing to canonical", maintain canonical_refcount: HashMap<u32, u32> = number of live handle slots bound to canonical. Bumped at alloc/dup, decremented at close-with-rc-0.
- state.resolve_handle(h): returns handle_aliases.get(&h).copied().unwrap_or(h).
- Every Nt*/Ke* handler that looks up state.objects via a guest-provided handle id must call state.resolve_handle(h) first.
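The model in the list above can be sketched end to end as follows. Several details are assumptions made for the sketch: HandleState, create, and the 0x1000 handle base are hypothetical, objects hold a type-name string instead of a KernelObject, and the object is dropped as soon as canonical_refcount reaches 0 regardless of which slot closed last (one reading of the nt_close bullets, chosen so that source-then-dup close order also frees the object).

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct HandleState {
    pub objects: HashMap<u32, &'static str>, // canonical id -> object stand-in
    pub handle_aliases: HashMap<u32, u32>,   // alias_id -> canonical_id
    pub handle_refcount: HashMap<u32, u32>,  // per-slot NtClose budget
    pub canonical_refcount: HashMap<u32, u32>, // live slots bound to canonical
    pub events: Vec<String>,
    next_handle: u32,
}

impl HandleState {
    fn alloc_handle(&mut self) -> u32 {
        self.next_handle += 4;
        0x1000 + self.next_handle // arbitrary base for the sketch
    }

    pub fn resolve_handle(&self, h: u32) -> u32 {
        self.handle_aliases.get(&h).copied().unwrap_or(h)
    }

    /// Create a new object; its first handle id is the canonical id.
    pub fn create(&mut self, kind: &'static str) -> u32 {
        let id = self.alloc_handle();
        self.objects.insert(id, kind);
        self.handle_refcount.insert(id, 1);
        self.canonical_refcount.insert(id, 1);
        self.events.push(format!("handle.create {id:#x}"));
        id
    }

    /// Fresh dup id, aliased to the canonical id; the object is not cloned.
    pub fn duplicate(&mut self, source: u32, close_source: bool) -> Option<u32> {
        if !self.handle_refcount.contains_key(&source) {
            return None; // source slot is not live
        }
        let canonical = self.resolve_handle(source);
        if !self.objects.contains_key(&canonical) {
            return None;
        }
        let dup = self.alloc_handle();
        self.handle_aliases.insert(dup, canonical);
        self.handle_refcount.insert(dup, 1);
        *self.canonical_refcount.entry(canonical).or_insert(0) += 1;
        self.events.push(format!("handle.create {dup:#x}"));
        if close_source {
            self.close(source);
        }
        Some(dup)
    }

    /// Close one slot; emit handle.destroy when that slot's count hits 0.
    pub fn close(&mut self, handle: u32) -> bool {
        let Some(count) = self.handle_refcount.get_mut(&handle) else {
            return false;
        };
        *count -= 1;
        if *count > 0 {
            return true;
        }
        self.handle_refcount.remove(&handle);
        let canonical = self.resolve_handle(handle);
        self.handle_aliases.remove(&handle);
        self.events.push(format!("handle.destroy {handle:#x}"));
        let remaining = match self.canonical_refcount.get_mut(&canonical) {
            Some(live) => {
                *live -= 1;
                *live
            }
            None => return true,
        };
        if remaining == 0 {
            // Last slot gone: drop the underlying object.
            self.canonical_refcount.remove(&canonical);
            self.objects.remove(&canonical);
        }
        true
    }
}
```

One consequence worth noting: after the source slot is closed, state.objects is still keyed by the source id, so handlers should check slot liveness (here via handle_refcount) rather than relying on the objects lookup alone to reject a stale source handle.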
Coverage of state.objects lookups
resolve_pseudo_handle (18 call sites in exports.rs) will be extended to
chain through state.resolve_handle. Direct ctx.gpr[3] as u32 → state.objects.get
sites need explicit resolution. Survey identified the following direct sites
that need state.resolve_handle insertion:
- nt_read_file (1630): let handle = ctx.gpr[3] as u32;
- nt_write_file (similar)
- nt_set_event (4628)
- nt_clear_event (4651)
- nt_query_information_file, nt_set_information_file, nt_query_directory_file, nt_query_volume_information_file, nt_flush_buffers_file (file operations)
- nt_create_io_completion (and friends)
Will sweep these in the patch.
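The two kinds of call site could be handled along these lines. This is a sketch only: the real body of resolve_pseudo_handle is not shown in this note, so the pseudo-handle constant (the NT convention of -2 for the current thread), the current_thread_handle field, and lookup_id_from_gpr are all assumptions for illustration.

```rust
use std::collections::HashMap;

/// Assumed pseudo-handle value for "current thread" (NT convention, -2).
pub const CURRENT_THREAD_PSEUDO: u32 = 0xFFFF_FFFE;

pub struct State {
    pub handle_aliases: HashMap<u32, u32>, // alias_id -> canonical_id
    pub current_thread_handle: u32,        // hypothetical field
}

impl State {
    pub fn resolve_handle(&self, h: u32) -> u32 {
        self.handle_aliases.get(&h).copied().unwrap_or(h)
    }
}

/// Pseudo-handle expansion first, then alias canonicalization, so every
/// existing caller of this helper picks up alias resolution for free.
pub fn resolve_pseudo_handle(state: &State, raw: u32) -> u32 {
    let real = if raw == CURRENT_THREAD_PSEUDO {
        state.current_thread_handle
    } else {
        raw
    };
    state.resolve_handle(real)
}

/// Shape of the fix for direct sites that read the handle straight from r3.
pub fn lookup_id_from_gpr(state: &State, gpr3: u64) -> u32 {
    state.resolve_handle(gpr3 as u32)
}
```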
Tests to add
- nt_duplicate_object_allocates_fresh_handle_id: dup != source.
- nt_duplicate_object_emits_handle_create_event: cvar-on, both handle.create events present.
- nt_duplicate_object_alias_resolves_to_canonical: state.resolve_handle(dup) == source.
- nt_duplicate_object_signal_on_dup_wakes_wait_on_source (AUDIT-062 regression test): create event, dup, simulate NtSetEvent(dup), confirm state.objects[source].signaled == true.
- nt_duplicate_object_signal_on_source_wakes_wait_on_dup (reverse symmetry).
- nt_duplicate_object_then_close_dup_keeps_source_live: refcount and object presence after dup-close.
- nt_duplicate_object_then_close_source_keeps_dup_live: reverse.
- nt_duplicate_object_close_source_then_close_dup_destroys_object: final close destroys underlying.
- nt_duplicate_object_with_close_source_flag: dup + close source in one call.
- nt_duplicate_object_invalid_handle_returns_status_invalid_handle.
- nt_duplicate_object_writes_handle_id_to_out_ptr.
Plan
- Implement state.handle_aliases + state.canonical_refcount + resolve_handle.
- Rewrite nt_duplicate_object per Section "Simplest model".
- Adjust nt_close and resolve_pseudo_handle.
- Sweep direct state.objects.get sites: insert state.resolve_handle().
- Add 11 unit tests.
- Build + test.
- Cold-vs-cold rebaseline.