xenia-rs/audit-runs/stage2-tier1-sweep/deferred.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Stage 2 — Deferred work (2026-05-14)

Imports that Stage 2 intentionally did not address. Each entry records why and what would unblock it.

Suspect cases deferred

XMACreateContext (Tier-3 audio)

DIVERGENT: canary writes the allocated context VA to *context_out_ptr (r3), returns NTSTATUS via r5; ours writes the handle to gpr[3] without touching the OUT-ptr. Defer to a dedicated audio/XMA session — fix should land alongside XMAReleaseContext and audio host-pump work (AUDIT-032/048).
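The OUT-pointer convention described above can be sketched as follows. This is a minimal illustration, not the engine's code: `GuestMemory`, `xma_create_context`, and the `allocated` parameter are hypothetical stand-ins, and the failure path (zero written, `STATUS_NO_MEMORY` returned) is an assumption about canary's behavior that should be checked against its source.

```rust
/// Hypothetical stand-in for guest memory; values are stored big-endian.
pub struct GuestMemory {
    bytes: Vec<u8>,
}

impl GuestMemory {
    pub fn new(size: usize) -> Self {
        Self { bytes: vec![0; size] }
    }

    pub fn write_u32(&mut self, addr: u32, val: u32) {
        let a = addr as usize;
        self.bytes[a..a + 4].copy_from_slice(&val.to_be_bytes());
    }

    pub fn read_u32(&self, addr: u32) -> u32 {
        let a = addr as usize;
        let mut b = [0u8; 4];
        b.copy_from_slice(&self.bytes[a..a + 4]);
        u32::from_be_bytes(b)
    }
}

pub const STATUS_SUCCESS: u32 = 0x0000_0000;
pub const STATUS_NO_MEMORY: u32 = 0xC000_0017;

/// Writes the allocated context VA through the OUT-pointer and returns a
/// status code, instead of returning the handle directly.
/// `allocated` models the result of the (not shown) context allocator.
pub fn xma_create_context(
    mem: &mut GuestMemory,
    context_out_ptr: u32,
    allocated: Option<u32>,
) -> u32 {
    match allocated {
        Some(context_va) => {
            mem.write_u32(context_out_ptr, context_va);
            STATUS_SUCCESS
        }
        None => {
            // ASSUMPTION: a failed allocation zeroes the OUT slot.
            mem.write_u32(context_out_ptr, 0);
            STATUS_NO_MEMORY
        }
    }
}
```

The point of the sketch is the shape of the fix: the guest reads the context VA from memory at `context_out_ptr`, so a handler that only sets the return register leaves that slot stale.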

XamUserGetSigninState, XamUserGetXUID (Tier-3 XAM)

DIVERGENT: both rely on canary's XamState / user-profile table; ours returns hardcoded constants. Defer to a dedicated XAM session that threads the profile registry through KernelState.

KeQuerySystemTime (clock infra)

DIVERGENT: ours writes the static fake 132_500_000_000_000_000 constant; canary reads Clock::QueryGuestSystemTime() (a deterministic guest-tick-based clock) and also updates a KeTimestampBundle kernel struct.

Required infra (not in Stage 2 LOW scope):

  • KernelState::guest_filetime() deriving a u64 FILETIME from scheduler.current_tick() (or instruction counter).
  • Allocate KeTimestampBundle in guest memory at startup, update on every call.

Estimated ~60-100 LOC plus reading-error #23 risk: a live clock value flows directly into guest branch decisions (timer expiry, frame pacing). Phase A matched-prefix could regress.
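A minimal sketch of the first infra item, deriving a deterministic FILETIME from a tick counter. `TICKS_PER_SEC` is an assumed tick rate chosen for illustration, and reusing the existing static constant as the base value is also an assumption; `guest_filetime` is written as a free function rather than a `KernelState` method so the block stays self-contained.

```rust
/// FILETIME counts 100 ns intervals, so there are 10^7 units per second.
pub const FILETIME_UNITS_PER_SEC: u64 = 10_000_000;

/// Base value: the static constant the current stub writes.
/// ASSUMPTION: using it as the epoch base keeps early values close to today's output.
pub const BASE_FILETIME: u64 = 132_500_000_000_000_000;

/// ASSUMPTION: illustrative guest tick rate; the real scheduler rate may differ.
pub const TICKS_PER_SEC: u64 = 50_000_000;

/// Derives a FILETIME from the guest tick counter. Same tick in, same value
/// out, so two runs with identical scheduling stay comparable.
pub fn guest_filetime(current_tick: u64) -> u64 {
    // The u128 intermediate avoids overflow in tick * units for large ticks.
    let elapsed =
        (current_tick as u128 * FILETIME_UNITS_PER_SEC as u128) / TICKS_PER_SEC as u128;
    BASE_FILETIME.wrapping_add(elapsed as u64)
}

/// Guest memory is big-endian, so the 64-bit value is stored as BE bytes.
pub fn filetime_guest_bytes(filetime: u64) -> [u8; 8] {
    filetime.to_be_bytes()
}
```

Because the value is a pure function of the tick, any regression it causes in the matched prefix would come from the tick source itself differing between engines, which is where the #23 risk concentrates.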

Tier-1+2 LOW STUBs deferred

MmFreePhysicalMemory

Canary calls heap->Release(base_address). Ours uses a one-way bump allocator (KernelState::heap_alloc in state.rs:968) with no release semantics. Implementing free requires adding a free-list / reclaim algorithm, which is not a LOW change. The current stub_success is a match by equivalence: both engines effectively "leak" the physical allocation (on the physical heaps, canary's heap->Release is bookkeeping-only, not page reclaim, within the matched-prefix horizon).

Recommendation: re-classify as MATCH-by-equivalence (similar to MmGetPhysicalAddress). Revisit when a Stage surfaces a divergence involving allocator state arithmetic.

ExRegisterTitleTerminateNotification

STUB: canary builds a callback list (push/pop). Ours stubs to SUCCESS. Implementing the callback list is ~30-40 LOC, but the fire site (game shutdown) is not yet implemented in ours either — so a registered callback would just sit unused. Defer until a shutdown sequence becomes relevant.

Estimated #23 risk: MED-HIGH. If the game expects the registration to succeed and later relies on the callback firing to gate shutdown, the asymmetry could surface as a divergence.
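For scale, the registration side of the callback list might look like the sketch below. All names are hypothetical, the two fields are an assumed layout for a registration entry, and nothing here fires the callbacks, which matches the current state where no shutdown sequence exists.

```rust
/// Hypothetical registration entry; field names and layout are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TerminateNotification {
    pub routine: u32,  // guest address of the callback
    pub priority: u32, // ordering hint, unused until a fire site exists
}

#[derive(Debug, Default)]
pub struct TerminateNotificationList {
    entries: Vec<TerminateNotification>,
}

impl TerminateNotificationList {
    pub fn new() -> Self {
        Self::default()
    }

    /// Push a registration; duplicates are ignored (an assumed policy).
    pub fn register(&mut self, n: TerminateNotification) {
        if !self.entries.contains(&n) {
            self.entries.push(n);
        }
    }

    /// Remove a registration; returns false if it was never registered.
    pub fn unregister(&mut self, n: TerminateNotification) -> bool {
        match self.entries.iter().position(|e| *e == n) {
            Some(i) => {
                self.entries.remove(i);
                true
            }
            None => false,
        }
    }

    /// Entries a future shutdown path would walk, in registration order.
    pub fn pending(&self) -> &[TerminateNotification] {
        &self.entries
    }
}
```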

ObLookupThreadByThreadId

STUB: canary calls kernel_state()->GetThreadByID(tid) + RetainHandle + write thread->guest_object() to OUT-ptr. Ours has scheduler threads but no find_by_tid registry indexed by guest TID.

Required infra:

  • Add HashMap<u32 guest_tid, ThreadRef> to KernelState (or expose scheduler's find_by_tid).
  • Implement guest-object materialization (PCR base or similar).
  • Write OUT-ptr + STATUS_SUCCESS on hit, STATUS_INVALID_PARAMETER on miss.

Estimated 40-50 LOC. Defer to a session that touches the scheduler abstractions.
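The required pieces can be sketched together as below. This assumes a standalone `ThreadRegistry` rather than a field on `KernelState`, models guest memory as a byte slice, and uses a plain counter in place of RetainHandle; all of those names are hypothetical. Leaving the OUT slot untouched on a miss is also an assumption.

```rust
use std::collections::HashMap;

pub const STATUS_SUCCESS: u32 = 0x0000_0000;
pub const STATUS_INVALID_PARAMETER: u32 = 0xC000_000D;

/// Hypothetical per-thread record; field names are illustrative only.
pub struct ThreadRef {
    pub guest_object: u32, // guest VA of the thread's guest-visible object
    pub handle_refs: u32,  // stand-in for RetainHandle bookkeeping
}

#[derive(Default)]
pub struct ThreadRegistry {
    by_tid: HashMap<u32, ThreadRef>,
}

impl ThreadRegistry {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn insert(&mut self, guest_tid: u32, guest_object: u32) {
        self.by_tid
            .insert(guest_tid, ThreadRef { guest_object, handle_refs: 1 });
    }

    pub fn handle_refs(&self, guest_tid: u32) -> Option<u32> {
        self.by_tid.get(&guest_tid).map(|t| t.handle_refs)
    }

    /// On hit: retain, write the guest object VA (big-endian) to the OUT-ptr,
    /// return STATUS_SUCCESS. On miss: return STATUS_INVALID_PARAMETER.
    pub fn lookup_by_tid(&mut self, mem: &mut [u8], out_ptr: u32, guest_tid: u32) -> u32 {
        match self.by_tid.get_mut(&guest_tid) {
            Some(thread) => {
                thread.handle_refs += 1;
                let at = out_ptr as usize;
                mem[at..at + 4].copy_from_slice(&thread.guest_object.to_be_bytes());
                STATUS_SUCCESS
            }
            None => STATUS_INVALID_PARAMETER,
        }
    }
}
```

The guest-object materialization step is the part this sketch skips: `guest_object` is simply supplied at insert time, whereas the real work is deciding which guest VA (PCR base or similar) stands for a host-side thread.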

ObOpenObjectByPointer

STUB: canary does XObject::GetNativeObject<XObject>(kernel_state(), object_ptr), a guest-VA-to-host-RTTI translation that ours doesn't have. Our kernel objects (events, semaphores, threads) live in host Rust memory; canary embeds them in a guest-accessible object heap with RTTI dispatch.

Required infra: guest-side object heap + RTTI-equivalent dispatch. Architectural change, estimated 80-120 LOC plus refactoring. Defer to a dedicated Object Manager session.

STUB: canary maintains a VFS symbolic-link table (HDD:/MU:/Cache: aliases). Ours has VFS mounts but no symlink resolution layer in the kernel-state path. Defer to a VFS session.

Tier-3 subsystem STUBs (12 items)

All HIGH-effort STUBs per Stage 1 audit. Listed for completeness; each needs its own subsystem session:

  • GPU/Video (Vd*): VdInitializeEngines, VdInitializeScalerCommandBuffer, VdQueryVideoFlags, VdSetDisplayMode, VdPersistDisplay, VdGetCurrentDisplayInformation, VdIsHSIOTrainingSucceeded, VdRetrainEDRAM. None on matched-prefix critical path at idx 102068.
  • Audio (XAudio*/XMA*): XAudioSubmitRenderDriverFrame, XMACreateContext (also a Stage 2 suspect, see above).
  • XAM (Xam*/XMsg*/XNotify*): XamEnumerate, XamContentCreateEnumerator, XamTaskCloseHandle, XMsgInProcessCall, XMsgStartIORequestEx, XNotifyPositionUI, XamResetInactivity, XamEnableInactivityProcessing, XGetGameRegion. Many dormant siblings (33 xam dormants per Stage 1 Appendix A) waiting for boot progress.

Memory model debt (Phase C+2)

Deferred 3-physical-heap memory model: canary's MmAllocatePhysicalMemoryEx routes through LookupHeapByType(physical, page_size) to one of three regions (vA0000000/vC0000000/vE0000000). Ours has a single user-heap bump cursor at 0x40000000. Phase C+2 canonicalized the return values in the diff tool (Path α); the engine fix (Path β) is estimated >100 LOC plus GPU/audio bridge re-validation.

Out of scope for Stage 2. Revisit when a divergence surfaces involving GPU bus arithmetic or audio DMA descriptors derived from the physical VA — at which point the canonicalization mask in diff_events.py would need extension OR the engine-side memory model would need a refactor.
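To make the Path β gap concrete, the routing difference can be sketched as below. The three region bases and the single 0x40000000 cursor come from the description above; the page-size thresholds that select a region are an assumption based on a reading of upstream Xenia's LookupHeapByType and should be verified against canary before anything relies on them. The function names are hypothetical.

```rust
/// Bases of the three physical regions named above.
pub const REGION_A: u32 = 0xA000_0000;
pub const REGION_C: u32 = 0xC000_0000;
pub const REGION_E: u32 = 0xE000_0000;

/// Base of our single user-heap bump cursor.
pub const OURS_BUMP_BASE: u32 = 0x4000_0000;

/// Canary-style routing for physical allocations.
/// ASSUMPTION: thresholds (4 KiB, 64 KiB, larger) are illustrative.
pub fn canary_physical_region_base(page_size: u32) -> u32 {
    if page_size <= 4 * 1024 {
        REGION_E
    } else if page_size <= 64 * 1024 {
        REGION_A
    } else {
        REGION_C
    }
}

/// Ours today: every allocation comes from the same cursor, whatever the page size.
pub fn ours_region_base(_page_size: u32) -> u32 {
    OURS_BUMP_BASE
}
```

Any guest code that does arithmetic on the returned VA, such as deriving a bus address from the region bits, sees different values in the two engines, which is what the diff-tool canonicalization currently hides.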

False-alarm: RtlFillMemoryUlong

Stage 1 listed RtlFillMemoryUlong as MATCH-but-suspect with "endianness — canary byte-swaps the fill pattern". On Stage 2 inspection this is a false alarm: both engines write the same big-endian bytes to memory. Canary does *p = byte_swap(pattern) (host LE write of byte-swapped value); ours does mem.write_u32(addr, pattern) (uses val.to_be_bytes() internally). The output bytes are bit-identical. See phase-2-0-suspects.md §A.

No engine fix required. The Stage 1 suspicion stemmed from naive C++ vs Rust source comparison without tracing through to memory-byte semantics — a reading-error class to remember (§24, see Stage 2 memory entry).
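The byte-level equivalence can be checked directly. In this sketch the canary path is modeled as a byte swap followed by a little-endian store (what a native store does on a little-endian host), and ours as a big-endian store; the function names are hypothetical.

```rust
/// Canary-style: byte-swap the pattern, then store it with the host's
/// little-endian layout.
pub fn canary_style_bytes(pattern: u32) -> [u8; 4] {
    pattern.swap_bytes().to_le_bytes()
}

/// Ours-style: store the pattern's big-endian bytes directly.
pub fn ours_style_bytes(pattern: u32) -> [u8; 4] {
    pattern.to_be_bytes()
}

/// True when both paths put the same bytes in memory for `pattern`.
pub fn fill_bytes_match(pattern: u32) -> bool {
    canary_style_bytes(pattern) == ours_style_bytes(pattern)
}
```

For any pattern, swapping and then storing little-endian yields the same byte sequence as storing big-endian, so the source-level difference never reaches guest memory.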