xenia-rs/audit-runs/phase-c17-keWait-native-object/investigation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Phase C+17 Investigation — KeWait native-object handle synthesis (2026-05-14)

Framing verification (reading-error #28 discipline)

C+15-α / C+16 catalog D-2/D-3/D-4 hypothesis: ours's KeWait* does not emit handle.create when passed a raw native dispatcher-object pointer (PKEVENT / PKSEMAPHORE). Canary's xeKeWaitForSingleObject / KeWaitForMultipleObjects_entry instead call XObject::GetNativeObject, which lazily synthesizes an XEvent/XSemaphore/XMutant/XTimer wrapper and inserts it into the object table; ObjectTable::AddHandle then fires phase_a::EmitHandleCreateAuto (object_table.cc:191-198).

Canary's GetNativeObject semantics (xobject.cc:397-483)

Trigger: KeWait* (or a related entry point) is called with a raw kernel-object pointer. The first action of xeKeWaitForSingleObject is to call XObject::GetNativeObject<XObject>(kernel_state, object_ptr) (threading.cc:972, threading.cc:1070).

GetNativeObject(kernel_state, native_ptr, as_type=-1, already_locked=false):

  1. Read X_DISPATCH_HEADER at native_ptr. as_type defaults to header->type (the dispatcher-type byte: 0=manual event, 1=auto event, 2=mutant, 5=semaphore, 8/9=timer).
  2. Check the wait_list.flink_ptr magic: if it equals kXObjSignature ('X','E','N','\0' = 0x58454E00) the dispatcher has already been adopted; read the existing handle from wait_list.blink_ptr and return the existing XObject via LookupObject<XObject>(handle, true).
  3. Otherwise FIRST USE — synthesize:
    • case 0 / 1: new XEvent(kernel_state) → calls XEvent::InitializeNative(native_ptr, header) then assigns to result.
    • case 2: new XMutant + InitializeNative (but body asserts — unsupported).
    • case 5: new XSemaphore + InitializeNative (semaphore->limit / signal_state).
    • case 3/4/6/7/8/9/18..24: assert_always(). Timer not handled here.
  4. After construction, call StashHandle(header, object->handle()) — writes kXObjSignature to wait_list.flink_ptr and the new handle to wait_list.blink_ptr. This guarantees idempotency: next call returns the same handle.
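The four steps can be modeled in a few lines of Rust. This is a minimal sketch under stated assumptions. Guest memory is a big-endian byte slice. The object table is a plain Vec whose index plus one serves as the handle. `NativeTable`, `WrapperKind` and `get_native_object` are hypothetical names, not canary or xenia-rs APIs. Types that canary cannot wrap return `None` where canary would assert.

```rust
// Hypothetical model of canary's XObject::GetNativeObject adopt-or-lookup.
// Guest memory is big-endian, as on the Xbox 360.

/// 'X','E','N','\0' read as a big-endian u32 (0x58454E00).
pub const K_XOBJ_SIGNATURE: u32 = u32::from_be_bytes(*b"XEN\0");

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WrapperKind {
    Event { manual_reset: bool },
    Semaphore,
}

/// Hypothetical stand-in for the object table: index i holds handle i + 1.
#[derive(Default)]
pub struct NativeTable {
    pub objects: Vec<WrapperKind>,
    pub handle_create_emits: usize,
}

fn read_u32(mem: &[u8], off: usize) -> u32 {
    u32::from_be_bytes([mem[off], mem[off + 1], mem[off + 2], mem[off + 3]])
}

fn write_u32(mem: &mut [u8], off: usize, v: u32) {
    mem[off..off + 4].copy_from_slice(&v.to_be_bytes());
}

/// Returns the handle for the dispatcher object at `ptr`, synthesizing a
/// wrapper (and counting one handle.create) only on first use.
pub fn get_native_object(table: &mut NativeTable, mem: &mut [u8], ptr: usize) -> Option<u32> {
    // Step 1: the dispatcher type byte is the first byte of X_DISPATCH_HEADER.
    let header_type = mem[ptr];
    // Step 2: already adopted? wait_list.flink (+0x08) holds the signature and
    // wait_list.blink (+0x0C) holds the stashed handle.
    if read_u32(mem, ptr + 0x08) == K_XOBJ_SIGNATURE {
        return Some(read_u32(mem, ptr + 0x0C));
    }
    // Step 3: first use, so synthesize a wrapper.
    let kind = match header_type {
        0 => WrapperKind::Event { manual_reset: true },
        1 => WrapperKind::Event { manual_reset: false },
        5 => WrapperKind::Semaphore,
        _ => return None, // canary asserts here (mutant body, timers, others)
    };
    table.objects.push(kind);
    table.handle_create_emits += 1; // stands in for AddHandle -> EmitHandleCreateAuto
    let handle = table.objects.len() as u32; // hypothetical handle numbering
    // Step 4: StashHandle makes every later call take the Step 2 branch.
    write_u32(mem, ptr + 0x08, K_XOBJ_SIGNATURE);
    write_u32(mem, ptr + 0x0C, handle);
    Some(handle)
}
```

Calling it twice on the same pointer returns the same handle and bumps the emit counter only once. StashHandle provides exactly this idempotency.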

Crucially, the XObject ctor XObject(KernelState*, Type, host_object) (xobject.cc:35-48) always calls kernel_state->object_table()->AddHandle(this, nullptr), which (C+15-α-wired) emits handle.create via phase_a::EmitHandleCreateAuto (object_table.cc:148-201).

So: first call → 1× handle.create emit; subsequent calls (signature matches) → 0 emits.

Canary KeWaitForSingleObject entry ordering (threading.cc:969-1013)

xeKeWaitForSingleObject(object_ptr, ...):
  auto object = XObject::GetNativeObject<XObject>(kernel_state(), object_ptr);
                // ^^^ emits handle.create on first use (object_type=1 / 3 / etc.)
  if (!object) { return X_STATUS_ABANDONED_WAIT_0; }
  if (phase_a::IsEnabled()) {
    uint64_t sid = 0;
    if (!object->handles().empty()) {
      sid = phase_a::LookupHandleSemanticId(object->handles()[0]);
    }
    phase_a::EmitWaitBegin(&sid, 1, ...);   // wait.begin with real SID
  }
  result = object->Wait(...);

So canary's emit order on first use is handle.create → wait.begin, exactly as observed in the cold log (idx=102171 → 102172).
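A toy event log makes the ordering argument concrete. In this sketch, `EventLog` and `wait_single` are hypothetical, and SIDs are assumed to start at 1 so that 0 means "unregistered". Adoption registers the SID before wait.begin reads it. The first wait therefore logs handle.create then wait.begin, and a second wait on the same object logs only wait.begin.

```rust
use std::collections::HashMap;

// Hypothetical toy event log mirroring canary's first-use emit order.
#[derive(Default)]
pub struct EventLog {
    pub events: Vec<String>,
    pub sids: HashMap<u32, u64>, // raw handle id -> semantic id
    next_sid: u64,
}

impl EventLog {
    fn emit_handle_create(&mut self, raw: u32) {
        self.next_sid += 1; // SIDs start at 1 so that 0 can mean "unknown"
        self.sids.insert(raw, self.next_sid);
        self.events.push(format!("handle.create raw={:#x}", raw));
    }

    fn lookup_sid(&self, raw: u32) -> u64 {
        *self.sids.get(&raw).unwrap_or(&0)
    }
}

/// Models xeKeWaitForSingleObject: adopt first (may emit), then wait.begin.
pub fn wait_single(log: &mut EventLog, object_ptr: u32) -> u64 {
    if !log.sids.contains_key(&object_ptr) {
        log.emit_handle_create(object_ptr); // GetNativeObject -> AddHandle
    }
    let sid = log.lookup_sid(object_ptr);
    log.events.push(format!("wait.begin sid={:016x}", sid));
    sid
}
```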

Lifetime / refcount

The synthesized XObject lives until its handle_ref_count reaches 0. Since AddHandle initializes it to 1, and there's no balancing RemoveHandle elsewhere in the lazy-wrap path, the wrapper survives for the rest of the session (no handle.destroy is emitted by canary either — confirmed by absence in canary's log post-102171). This is structurally consistent with canary's "stash the handle in the dispatcher; reuse forever" pattern.

For ours we mirror this: emit one handle.create on first ensure_dispatcher_object adoption; no handle.destroy thereafter.

Object-type mapping

| dispatcher header.type | canary symbol | ours KernelObject variant | ours object_type code (event_log) |
|---|---|---|---|
| 0 (manual event) | XEvent (notification) | Event { manual_reset=true } | EVENT = 1 |
| 1 (auto event) | XEvent (synchronization) | Event { manual_reset=false } | EVENT = 1 |
| 5 (semaphore) | XSemaphore | Semaphore { .. } | SEMAPHORE = 3 |
| 8 (notif timer) | XTimer (canary asserts) | Timer { manual_reset=true } | TIMER = 4 |
| 9 (sync timer) | XTimer (canary asserts) | Timer { manual_reset=false } | TIMER = 4 |
| 2 (mutant) | XMutant (canary asserts) | (no shadow; return early) | n/a |
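The table reduces to a small match. The function name `object_type_for_dispatcher` and the constant names are hypothetical. The codes themselves come from the table.

```rust
/// event_log object_type codes from the mapping table.
pub const EVENT: u32 = 1;
pub const SEMAPHORE: u32 = 3;
pub const TIMER: u32 = 4;

/// Maps a dispatcher header type byte to ours's event_log object_type code.
/// `None` means no shadow is created (mutant) or the type is unsupported.
pub fn object_type_for_dispatcher(header_type: u8) -> Option<u32> {
    match header_type {
        0 | 1 => Some(EVENT), // notification / synchronization event
        5 => Some(SEMAPHORE),
        8 | 9 => Some(TIMER), // ours-only; canary asserts on timers
        _ => None,            // 2 (mutant) and everything else
    }
}
```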

Note that canary's GetNativeObject hits assert_always() for timer types 8/9, as it does for every unsupported dispatcher type. Sylpheed apparently never reaches these cases in canary: canary keeps running, and our cold log shows no tripped assert. Ours's ensure_dispatcher_object has historically supported timer types 8/9 via the shadow path. We keep that for robustness and emit object_type=TIMER for them. Cross-engine SID matching only matters for codes both engines emit, so ours's extra timer emits would surface as new divergences (acceptable per the catalog).

Ours's pre-fix behavior

  • resolve_pseudo_handle (exports.rs:4321): only translates the magic 0xFFFF_FFFF / 0xFFFF_FFFE self-handles. Any other value passes through unchanged, so native dispatcher pointers and real handles both reach the next step as-is.
  • ensure_dispatcher_object (exports.rs:4363): on first encounter of a guest pointer (ptr >= 0x1_0000 and not already in state.objects), reads the dispatcher header, creates the shadow KernelObject::{Event, Semaphore, Timer}, inserts into state.objects, stamps kXObjSignature at +0x08/+0x0C. Does NOT emit handle.create. Does NOT bump handle_refcount (entry stays absent).
  • ke_wait_for_single_object (exports.rs:4954): calls resolve_pseudo_handle → ensure_dispatcher_object → refresh_pkevent_shadow_from_guest → emits wait.begin with lookup_handle_semantic_id(handle) = 0 (no SID was ever registered) → calls do_wait_single.

Result observed at idx=102171: ours emits wait.begin handles_semantic_ids=['0000000000000000'] and zero handle.create events.
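The pre-fix path can be reproduced with a small model. `Resolved`, `resolve_pseudo_handle` and `pre_fix_wait_sid` here are hypothetical stand-ins. Mapping 0xFFFF_FFFF / 0xFFFF_FFFE to the current process / current thread is an assumption based on the usual NT pseudo-handle convention. The point is that a native pointer passes through and gets a shadow, yet still resolves to SID 0.

```rust
use std::collections::{HashMap, HashSet};

#[derive(Debug, PartialEq, Eq)]
pub enum Resolved {
    CurrentProcess,
    CurrentThread,
    PassThrough(u32),
}

/// Hypothetical mirror of resolve_pseudo_handle: only the two magic values translate.
pub fn resolve_pseudo_handle(h: u32) -> Resolved {
    match h {
        0xFFFF_FFFF => Resolved::CurrentProcess,
        0xFFFF_FFFE => Resolved::CurrentThread,
        other => Resolved::PassThrough(other),
    }
}

/// Pre-fix model: the shadow is created, but no SID is ever registered.
pub fn pre_fix_wait_sid(objects: &mut HashSet<u32>, sids: &HashMap<u32, u64>, h: u32) -> u64 {
    match resolve_pseudo_handle(h) {
        Resolved::PassThrough(ptr) => {
            if ptr >= 0x1_0000 {
                // ensure_dispatcher_object pre-fix: create the shadow, emit nothing.
                objects.insert(ptr);
            }
            // Nothing registered a SID for ptr, so the lookup falls back to 0.
            *sids.get(&ptr).unwrap_or(&0)
        }
        // Pseudo-handles are outside this model.
        _ => 0,
    }
}
```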

Fix shape

Symmetric: extend ensure_dispatcher_object to do the equivalent of the post-construction handle.create emit that canary's XObject constructor triggers via ObjectTable::AddHandle. Specifically:

  1. After inserting the shadow into state.objects (existing line ~4409), and only on a fresh adoption (guaranteed by the already-adopted guard at line 4367), seed handle_refcount.insert(ptr, 1) for lifecycle symmetry. No canary-side handle.destroy is expected, but consistency with alloc_handle_for costs only ~1 LOC.
  2. When event_log::is_enabled(), call event_log::emit_handle_create_auto(tid, cycle, /* pc */ 0, object_type, raw_handle_id=ptr, object_name=None). The chosen object_type matches the variant: Event=1, Semaphore=3, Timer=4. This both emits the event AND registers the SID in the registry so the subsequent wait.begin resolves non-zero.
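Here is a sketch of the fixed adoption path under a simplified state struct. `State`, `Shadow` and the SID numbering are hypothetical, and the header type is passed in rather than read from guest memory. It shows the two additions. The refcount is seeded on every fresh adoption. The emit and SID registration happen only when logging is enabled.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Shadow {
    Event { manual_reset: bool },
    Semaphore,
    Timer { manual_reset: bool },
}

impl Shadow {
    /// event_log object_type code from the mapping table.
    fn object_type(self) -> u32 {
        match self {
            Shadow::Event { .. } => 1,
            Shadow::Semaphore => 3,
            Shadow::Timer { .. } => 4,
        }
    }
}

/// Hypothetical slice of kernel state; field names echo the notes, not real xenia-rs types.
#[derive(Default)]
pub struct State {
    pub objects: HashMap<u32, Shadow>,
    pub handle_refcount: HashMap<u32, u32>,
    pub log_enabled: bool,
    pub emitted: Vec<(u32, u32)>, // (object_type, raw_handle_id)
    pub sids: HashMap<u32, u64>,
}

pub fn ensure_dispatcher_object(state: &mut State, ptr: u32, header_type: u8) {
    if ptr < 0x1_0000 || state.objects.contains_key(&ptr) {
        return; // re-touch (or not a guest pointer): no emit, no refcount change
    }
    let shadow = match header_type {
        0 => Shadow::Event { manual_reset: true },
        1 => Shadow::Event { manual_reset: false },
        5 => Shadow::Semaphore,
        8 => Shadow::Timer { manual_reset: true },
        9 => Shadow::Timer { manual_reset: false },
        _ => return, // mutant and unknown types: no shadow
    };
    state.objects.insert(ptr, shadow);
    // Fix step 1: seed the refcount for lifecycle symmetry.
    state.handle_refcount.insert(ptr, 1);
    // Fix step 2: emit handle.create and register a SID only when logging is on.
    if state.log_enabled {
        state.emitted.push((shadow.object_type(), ptr));
        let sid = state.sids.len() as u64 + 1; // hypothetical SID allocation
        state.sids.insert(ptr, sid);
    }
}
```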

Order in ke_wait_for_single_object already matches canary: synth (now emits handle.create) before wait.begin. No re-ordering needed.

For ke_wait_for_multiple_objects the same applies — the loop already calls ensure_dispatcher_object per pointer (exports.rs:5022). Each first adoption emits one handle.create and the SID array used by wait.begin becomes non-zero per element.
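The per-element effect on ke_wait_for_multiple_objects can be sketched with a hypothetical `Registry` that maps raw pointers to SIDs (the SID numbering is assumed). Each pointer is adopted at most once across calls, and every entry in the resulting SID array is non-zero.

```rust
use std::collections::HashMap;

/// Hypothetical minimal registry: raw pointer -> SID, plus an emit counter.
#[derive(Default)]
pub struct Registry {
    pub sids: HashMap<u32, u64>,
    pub handle_create_emits: usize,
}

impl Registry {
    /// First adoption emits handle.create and registers a SID; later calls are no-ops.
    fn ensure(&mut self, ptr: u32) {
        if !self.sids.contains_key(&ptr) {
            self.handle_create_emits += 1;
            let sid = self.sids.len() as u64 + 1;
            self.sids.insert(ptr, sid);
        }
    }
}

/// Models ke_wait_for_multiple_objects: ensure each pointer, then build the
/// SID array that wait.begin would carry.
pub fn wait_multiple_sids(reg: &mut Registry, ptrs: &[u32]) -> Vec<u64> {
    for &p in ptrs {
        reg.ensure(p);
    }
    ptrs.iter().map(|p| reg.sids[p]).collect()
}
```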

Idempotency / refcount lifecycle

  • First-touch: shadow inserted + handle_refcount[ptr] = 1 + emit handle.create.
  • Re-touch (same pointer): early return at the contains_key guard → no emit, no refcount change. Matches canary's "already-initialized" branch.
  • Destroy: there is no path that destroys these shadows in ours today (parity with canary). If someone later wires handle.destroy on shadow-removal, the refcount will be present and decrement-to-zero will fire the symmetric event. Not in scope here.
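The lifecycle rules can be sketched as follows. `Lifecycle` is hypothetical, and its `release` method models the shadow-removal path that is not wired today, showing how the seeded refcount would let a future handle.destroy fire symmetrically.

```rust
use std::collections::HashMap;

/// Hypothetical refcount table mirroring handle_refcount in the notes.
#[derive(Default)]
pub struct Lifecycle {
    pub refcount: HashMap<u32, u32>,
    pub events: Vec<&'static str>,
}

impl Lifecycle {
    /// First touch seeds the count and emits; a re-touch changes nothing.
    pub fn first_touch(&mut self, ptr: u32) {
        if self.refcount.contains_key(&ptr) {
            return;
        }
        self.refcount.insert(ptr, 1);
        self.events.push("handle.create");
    }

    /// Not wired today: a future shadow-removal path that emits
    /// handle.destroy when the count reaches zero.
    pub fn release(&mut self, ptr: u32) {
        let now_zero = match self.refcount.get_mut(&ptr) {
            Some(n) => {
                *n -= 1;
                *n == 0
            }
            None => false,
        };
        if now_zero {
            self.refcount.remove(&ptr);
            self.events.push("handle.destroy");
        }
    }
}
```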

Scope

C+17 strictly addresses D-2/D-3/D-4. We do not touch:

  • NtWait* (handle-based; already SID-resolves through the registry once the underlying Nt*Create* emit fires handle.create).
  • Ke{Set,Reset,Pulse}Event / KeReleaseSemaphore paths that also call ensure_dispatcher_object. These will now emit handle.create on their first-touch — that's EXPECTED engine-symmetric behavior, and matches canary (every entry into GetNativeObject may emit). The wait-side has pre-context emits in both engines, so observable order is preserved.
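The first-touch behavior is shared by every entry point that calls ensure_dispatcher_object. In this hypothetical sketch, `Adoption` is invented for illustration, and so is the `event.set` log entry. A KeSetEvent on a fresh pointer emits the handle.create, and a later KeWait on the same pointer emits only wait.begin.

```rust
use std::collections::HashSet;

/// Hypothetical per-pointer adoption tracker shared by every Ke* entry point.
#[derive(Default)]
pub struct Adoption {
    adopted: HashSet<u32>,
    pub log: Vec<String>,
}

impl Adoption {
    fn ensure(&mut self, ptr: u32) {
        // HashSet::insert returns true only for a pointer not seen before.
        if self.adopted.insert(ptr) {
            self.log.push(format!("handle.create {:#x}", ptr));
        }
    }

    pub fn ke_set_event(&mut self, ptr: u32) {
        self.ensure(ptr);
        self.log.push("event.set".to_string()); // hypothetical log entry
    }

    pub fn ke_wait_single(&mut self, ptr: u32) {
        self.ensure(ptr);
        self.log.push("wait.begin".to_string());
    }
}
```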

Tripstone register

  • Reading-error #28 (canary semantics first): VERIFIED.
  • Reading-error #23 (widely-used primitive flip): MITIGATED via cold-vs-cold gate and HARD-REVERT-IF-MAIN-REGRESSES discipline.
  • Reading-error #19 (host-side emits): event_log::is_enabled() guard preserved on every new emit — default-off zero cost.
  • Refcount semantics: matches canary's "stash forever" lazy-wrap pattern; not symmetric with alloc_handle_for's NtClose-balanced lifecycle (which is correct — these are different kinds of handles).

Cascade prediction (for the run)

  • A = verify canary's GetNativeObject semantics: DONE.
  • B = land the symmetric ~30-50 LOC fix: PENDING.
  • C = main matched-prefix > 102,171: ~75%.
  • D = sister chains advance (4 chains): ~75%.
  • E = NEW divergences surface downstream: ~80% (intended).