# Phase C+19 investigation - `NtDuplicateObject` handle.create (2026-05-14)

## Verified canary semantics (reading-error #28 discipline)

### `NtDuplicateObject_entry` - xboxkrnl_ob.cc:389-412

```cpp
X_HANDLE new_handle = X_INVALID_HANDLE_VALUE;
X_STATUS result =
    kernel_state()->object_table()->DuplicateHandle(handle, &new_handle);
if (new_handle_ptr) {
  *new_handle_ptr = new_handle;
}
if (options == 1 /* DUPLICATE_CLOSE_SOURCE */) {
  kernel_state()->object_table()->RemoveHandle(handle);
}
return result;
```

### `ObjectTable::DuplicateHandle` - object_table.cc:210-223

```cpp
X_STATUS ObjectTable::DuplicateHandle(X_HANDLE handle, X_HANDLE* out_handle) {
  handle = TranslateHandle(handle);
  XObject* object = LookupObject(handle, false);  // refcount +1
  if (object) {
    result = AddHandle(object, out_handle);  // alloc fresh slot, refcount +1, EMIT handle.create
    object->Release();                       // refcount -1 (offset LookupObject)
  }
  return result;
}
```

### `ObjectTable::AddHandle` - object_table.cc:148-208

- Finds a fresh slot via `FindFreeSlot`.
- Stores `entry.object = object; entry.handle_ref_count = 1;`
- Bumps `handle = (slot << 2) + kHandleBase` (or `+ kHandleHostBase`).
- `object->handles().push_back(handle)`.
- `object->Retain()`.
- **Emits `handle.create`** via `phase_a::EmitHandleCreateAuto` (cvar-gated, default-off) using the new handle's tid + tid_event_idx for SID, NOT short-circuited because we're not inside `GetNativeObject`.

### Net effect (source: S, dup: D, underlying XObject: O)

Before dup: `O.refcount = 1`, slots = {S → O}, handle.create(S) emitted earlier.

After dup:

- `O.refcount = 2` (one for each slot).
- `entry[S].handle_ref_count = 1`.
- `entry[D].handle_ref_count = 1`.
- handle.create(D) emitted at this dup.

Subsequent NtClose on either S or D:

- ReleaseHandle → `entry.handle_ref_count--`. If 0 → `RemoveHandle` → `entry.object = nullptr` + `object->Release()` → if `O.refcount == 0` → object dtor (emit `handle.destroy`).

So:

- `NtClose(S)` after dup: `entry[S].handle_ref_count: 1→0` → `RemoveHandle(S)` → `O.refcount: 2→1`. Object STILL ALIVE through D. On a first reading this implies NO handle.destroy.

Correction after re-reading object_table.cc:294-295: `phase_a::EmitHandleDestroyAuto(handle, ...)` is emitted from inside `RemoveHandle`, which fires whenever a slot's ref_count hits 0. So canary emits handle.destroy on every NtClose that takes a slot's ref_count to 0, regardless of whether the underlying object still has other slots.

That means: canary emits handle.create(D) at the dup, handle.destroy(D) when D is closed, and later handle.destroy(S) when S is closed. Two handle.create events and two handle.destroy events across the dup pair. Symmetric.

## Current behavior in ours - exports.rs:5210-5240

```rust
// Abridged excerpt: parameter list and some locals are elided.
fn nt_duplicate_object(...) {
    let source = resolve_pseudo_handle(state, ctx.gpr[3] as u32);
    if !state.objects.contains_key(&source) {
        return STATUS_INVALID_HANDLE;
    }
    if out_ptr != 0 {
        mem.write_u32(out_ptr, source); // dup_id = source_id
    }
    if options & DUPLICATE_CLOSE_SOURCE == 0 {
        if let Some(c) = state.handle_refcount.get_mut(&source) {
            *c += 1;
        }
    }
    ctx.gpr[3] = STATUS_SUCCESS;
}
```

`dup_id` is aliased to `source_id`. The function bumps `state.handle_refcount[source]` so that the later `NtClose` pair (one per logical reference) doesn't destroy the object mid-flight. **No `handle.create` event** is emitted because no new id was allocated.

A subsequent `nt_close(handle)` decrements `handle_refcount[handle]` and emits `handle.destroy` only when the count reaches 0.

## Phase A divergence

At main idx=102553, canary's tid=6 sequence after `NtDuplicateObject`:

```
[102551] import.call   NtDuplicateObject
[102552] kernel.call   NtDuplicateObject
[102553] handle.create sid=df686b147b291902
[102554] kernel.return NtDuplicateObject
```

Ours, tid=1:

```
[102551] import.call   NtDuplicateObject
[102552] kernel.call   NtDuplicateObject
[102553] kernel.return NtDuplicateObject   ← canary's [102554]
```

The visible delta is the missing `handle.create` between `kernel.call` and `kernel.return`.

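The delta above can also be located mechanically. The following is a minimal sketch, assuming each trace has already been reduced to a sequence of event-kind strings; `first_divergence` is a hypothetical helper for illustration and not the project's actual trace-diff tooling.

```rust
/// Returns the offset of the first position at which two event-kind
/// sequences differ, or `None` if they are identical.
/// A length mismatch counts as a divergence at the end of the shorter one.
fn first_divergence(reference: &[&str], candidate: &[&str]) -> Option<usize> {
    let common = reference.len().min(candidate.len());
    for i in 0..common {
        if reference[i] != candidate[i] {
            return Some(i);
        }
    }
    if reference.len() != candidate.len() {
        Some(common)
    } else {
        None
    }
}
```

Applied to the two sequences above (starting at idx 102551), the first differing offset is 2, which corresponds to idx 102553, where canary has `handle.create` and ours already has `kernel.return`.
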
## AUDIT-062 risk assessment (CRITICAL)

### What AUDIT-062 verified

> ours DOES dup the wedge (kernel-aliasing hypothesis falsified):
> tid=13 cycle=26711 r3=0x000012ac r4=0x40541E80 (out_ptr).
> Per ours's exports.rs:4263, NtDup aliases — dup_id = source_id = 0x12AC,
> refcount++. NOT a kernel bug.

The original AUDIT-062 framing said "NtDup aliasing is correct because the dup_id resolves to the same KernelObject in `state.objects`". The wedge bug was downstream (producer-side `NtSetEvent(worker_idle_event)` never firing).

### What is load-bearing about the aliasing

The wedge case in AUDIT-062 was:

1. tid=13 creates event `0x12AC`.
2. Some descendant calls `NtDuplicateObject(0x12AC, &dup)` → dup `0x12AC` (aliased).
3. tid=13 calls `KeWaitForSingleObject(0x12AC)` (the source).
4. Worker thread (eventually) calls `NtSetEvent(dup)` on `0x12AC`.

The load-bearing invariant is: **signal on dup wakes wait on source**.

Why this works today: both ids ARE the same id, so `state.objects.get(&0x12AC)` finds the same `KernelObject::Event` with the same `waiters` Vec.

### The trap

If we change `nt_duplicate_object` to allocate a fresh `dup_id` and store it as a NEW `state.objects` entry (e.g. cloning the Event), then signal-on-dup sets the CLONED event's `signaled` flag, NOT the source's. tid=13 waiting on source will sleep forever. **WEDGE REGRESSION.**

### The fix

Allocate a fresh `dup_id`, do NOT clone the object. Instead store a **handle alias** `dup_id → source_id` in `state.handle_aliases`. Whenever the guest passes `dup_id` to any Nt*/Ke* call, resolve through the alias to get `source_id`. Lookup `state.objects[source_id]`. The single underlying `KernelObject::Event` retains the unified `waiters` list and `signaled` flag. **Signal-on-dup still wakes wait-on-source** because both ids canonicalize to the same source.

This mirrors canary's `LookupObject` which always indexes by slot, but the underlying `XObject*` is shared. We achieve the same with the alias map.

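As an illustration of the alias-map idea, here is a minimal sketch. The `Event` and `State` types are simplified stand-ins invented for this example (the real `KernelObject::Event` and kernel state have more fields), the `u32` key types are an assumption based on the `ctx.gpr[3] as u32` handle reads, and `0x12B0` in the usage note is a made-up dup id.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for `KernelObject::Event`.
#[derive(Default)]
struct Event {
    signaled: bool,
    waiters: Vec<u32>, // thread ids blocked on this event
}

/// Hypothetical, heavily reduced kernel state.
#[derive(Default)]
struct State {
    objects: HashMap<u32, Event>,      // canonical_id -> object
    handle_aliases: HashMap<u32, u32>, // alias_id -> canonical_id
}

impl State {
    /// Canonicalize a guest handle: an alias maps to its canonical id,
    /// any other id maps to itself.
    fn resolve_handle(&self, h: u32) -> u32 {
        self.handle_aliases.get(&h).copied().unwrap_or(h)
    }

    /// Signal through any handle (source or dup). Returns the thread ids
    /// that were waiting, or `None` if the handle resolves to no object.
    fn set_event(&mut self, h: u32) -> Option<Vec<u32>> {
        let canonical = self.resolve_handle(h);
        let event = self.objects.get_mut(&canonical)?;
        event.signaled = true;
        Some(std::mem::take(&mut event.waiters))
    }
}
```

With event `0x12AC` holding waiter tid 13 and an alias `0x12B0 → 0x12AC`, calling `set_event(0x12B0)` sets `signaled` on the single shared event and hands back the waiter registered through the source id, which is the AUDIT-062 invariant.
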
### Refcount lifecycle

First pass:

- Source close after dup: alias entry `dup_id → source_id` stays; the underlying object stays alive because `handle_refcount[source_id]` was bumped in `nt_duplicate_object`. No `handle.destroy` emit (refcount > 0 after decrement).

That does not match canary. To match canary's per-slot handle.destroy emission, each NtClose on EITHER source or dup must emit handle.destroy (with the closed slot's SID), and we only drop the underlying object when ALL slots are gone.

Cleanest design: track the per-handle-id refcount separately:

- `handle_refcount[source_id]`: counts the source slot's references.
- `handle_refcount[dup_id]`: counts the dup slot's references.

Both start at 1 (fresh allocation, fresh dup).

`nt_close(source_id)`: decrement `handle_refcount[source_id]`. If 0, emit `handle.destroy(source_id)`, remove the alias entries pointing AT source_id if applicable, and decrement the underlying-object refcount.

That is more complex than it needs to be. Simplification: mirror canary's two-level refcount exactly via a new struct.

### Simplest model that preserves AUDIT-062 + emits handle.create

1. `state.handle_aliases: HashMap` (alias_id → canonical_id).
2. `state.handle_refcount[id]` continues to mean: how many `NtClose` calls are needed on THIS id before its slot goes away.
3. `nt_duplicate_object`:
   - Compute `canonical = resolve_alias(source)` (in case source itself is an alias).
   - Alloc `dup_id` via `state.alloc_handle()`.
   - Insert alias `dup_id → canonical`.
   - `handle_refcount.insert(dup_id, 1)`.
   - Emit `handle.create(dup_id, object_type)` using `state.objects[canonical].schema_object_type()`.
   - If `options & DUPLICATE_CLOSE_SOURCE`, treat as a `NtClose(source)` after.
4. `nt_close(handle)`:
   - Decrement `handle_refcount[handle]` as today.
   - If it reaches 0: emit `handle.destroy(handle)`. Remove the alias entry for `handle` (if it's an alias). If no live slot (canonical or alias) remains bound to canonical, remove `state.objects[canonical]`; closing the canonical id while an alias is still live must NOT remove the object.
   - To know "any more slots pointing to canonical", maintain `canonical_refcount: HashMap` = number of live handle slots bound to canonical. Bumped at alloc/dup, decremented at close-with-rc-0.
5. `state.resolve_handle(h)`: returns `handle_aliases.get(&h).copied().unwrap_or(h)`.
6. Every Nt*/Ke* handler that looks up `state.objects` via a guest-provided handle id must call `state.resolve_handle(h)` first.

### Coverage of state.objects lookups

`resolve_pseudo_handle` (18 call sites in exports.rs) will be extended to chain through `state.resolve_handle`. Direct `ctx.gpr[3] as u32 → state.objects.get` sites need explicit resolution.

Survey identified the following direct sites that need `state.resolve_handle` insertion:

- nt_read_file (1630): `let handle = ctx.gpr[3] as u32;`
- nt_write_file (similar)
- nt_set_event (4628)
- nt_clear_event (4651)
- nt_query_information_file, nt_set_information_file, nt_query_directory_file, nt_query_volume_information_file, nt_flush_buffers_file (file operations)
- nt_create_io_completion (and friends)

Will sweep these in the patch.

### Tests to add

1. `nt_duplicate_object_allocates_fresh_handle_id`: dup != source.
2. `nt_duplicate_object_emits_handle_create_event`: cvar-on, both handle.create events present.
3. `nt_duplicate_object_alias_resolves_to_canonical`: `state.resolve_handle(dup) == source`.
4. `nt_duplicate_object_signal_on_dup_wakes_wait_on_source` (AUDIT-062 regression test): create event, dup, simulate NtSetEvent(dup), confirm `state.objects[source].signaled == true`.
5. `nt_duplicate_object_signal_on_source_wakes_wait_on_dup` (reverse symmetry).
6. `nt_duplicate_object_then_close_dup_keeps_source_live`: refcount and object presence after dup-close.
7. `nt_duplicate_object_then_close_source_keeps_dup_live`: reverse.
8. `nt_duplicate_object_close_source_then_close_dup_destroys_object`: final close destroys underlying.
9. `nt_duplicate_object_with_close_source_flag`: dup + close source in one call.
10. `nt_duplicate_object_invalid_handle_returns_status_invalid_handle`.
11. `nt_duplicate_object_writes_handle_id_to_out_ptr`.

## Plan

1. Implement `state.handle_aliases` + `state.canonical_refcount` + `resolve_handle`.
2. Rewrite `nt_duplicate_object` per Section "Simplest model".
3. Adjust `nt_close` and `resolve_pseudo_handle`.
4. Sweep direct `state.objects.get` sites: insert `state.resolve_handle()`.
5. Add 11 unit tests.
6. Build + test.
7. Cold-vs-cold rebaseline.
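
For plan steps 1 to 3, a rough sketch of the "Simplest model" bookkeeping may help sanity-check the lifecycle before touching exports.rs. Everything here is an assumption for illustration: the `Handles` struct, the `u32` id and counter types, the toy `alloc_handle` numbering, objects represented by a type-name string, and events recorded as strings instead of going through the Phase A emitters.

```rust
use std::collections::HashMap;

/// Hypothetical, reduced model of the handle bookkeeping.
#[derive(Default)]
struct Handles {
    next_id: u32,
    objects: HashMap<u32, &'static str>,   // canonical_id -> object (type name only)
    handle_aliases: HashMap<u32, u32>,     // alias_id -> canonical_id
    handle_refcount: HashMap<u32, u32>,    // NtClose calls left per slot
    canonical_refcount: HashMap<u32, u32>, // live slots bound to canonical
    events: Vec<String>,                   // stand-in for emitted trace events
}

impl Handles {
    /// Toy allocator: hands out 0x1004, 0x1008, ...
    fn alloc_handle(&mut self) -> u32 {
        self.next_id += 4;
        0x1000 + self.next_id
    }

    fn resolve_handle(&self, h: u32) -> u32 {
        self.handle_aliases.get(&h).copied().unwrap_or(h)
    }

    fn create(&mut self, kind: &'static str) -> u32 {
        let id = self.alloc_handle();
        self.objects.insert(id, kind);
        self.handle_refcount.insert(id, 1);
        self.canonical_refcount.insert(id, 1);
        self.events.push(format!("handle.create {id:#x}"));
        id
    }

    /// Returns the fresh dup id, or `None` for an invalid source handle.
    fn duplicate(&mut self, source: u32, close_source: bool) -> Option<u32> {
        // The source slot itself must still be open.
        if !self.handle_refcount.contains_key(&source) {
            return None;
        }
        let canonical = self.resolve_handle(source);
        if !self.objects.contains_key(&canonical) {
            return None;
        }
        let dup = self.alloc_handle();
        self.handle_aliases.insert(dup, canonical);
        self.handle_refcount.insert(dup, 1);
        *self.canonical_refcount.entry(canonical).or_insert(0) += 1;
        self.events.push(format!("handle.create {dup:#x}"));
        if close_source {
            self.close(source);
        }
        Some(dup)
    }

    /// Returns false if the slot is not open.
    fn close(&mut self, handle: u32) -> bool {
        let Some(rc) = self.handle_refcount.get_mut(&handle) else {
            return false;
        };
        *rc -= 1;
        if *rc > 0 {
            return true;
        }
        // Slot is gone: emit per-slot destroy, as canary's RemoveHandle does.
        self.handle_refcount.remove(&handle);
        self.events.push(format!("handle.destroy {handle:#x}"));
        let canonical = self.resolve_handle(handle);
        self.handle_aliases.remove(&handle);
        let live = self
            .canonical_refcount
            .get_mut(&canonical)
            .expect("every open slot is bound to a canonical id");
        *live -= 1;
        if *live == 0 {
            // Last slot closed: only now drop the underlying object.
            self.canonical_refcount.remove(&canonical);
            self.objects.remove(&canonical);
        }
        true
    }
}
```

In this sketch, creating an object, duplicating it, then closing source and dup in either order yields two `handle.create` and two `handle.destroy` records, and the object entry disappears only on the final close. One design point it makes visible: after the source slot is closed, the object is still keyed by the source id, so slot validity has to be checked against `handle_refcount` rather than `objects`.
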