xenia-rs/audit-runs/phase-c6-call-name-divergence/investigation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Phase C+6 — investigation: call-name divergence at idx=102132

Divergence

| | canary | ours (pre-fix) |
|---|---|---|
| import.call at idx=102132, tid=6→1 | NtClose (ord 207) | IoDismountVolumeByFileHandle (ord 60) |

Phase 0 — Rule out ord→name lookup bug

Both engines map ord 0x3C (60) to IoDismountVolumeByFileHandle and ord 0xCF (207) to NtClose. Cross-check:

  • xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_table.inc:74 XE_EXPORT(xboxkrnl, 0x0000003C, IoDismountVolumeByFileHandle, kFunction)
  • xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_table.inc:221 XE_EXPORT(xboxkrnl, 0x000000CF, NtClose, kFunction)
  • xenia-rs/crates/xenia-kernel/src/exports.rs:31 register_export(Xboxkrnl, 0x3C, "IoDismountVolumeByFileHandle", stub_success)
  • xenia-rs/crates/xenia-kernel/src/exports.rs:91 register_export(Xboxkrnl, 0xCF, "NtClose", nt_close)

For these two ordinals the ord→name mapping is identical in both engines, so Phase 0 rules out an emitter name-lookup bug. This is a real event-stream divergence.
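The cross-check above was done by reading the four lines directly. A minimal sketch of how the same comparison could be automated, assuming the two line shapes quoted in the bullets (the function names `parse_canary`, `parse_ours` and `name_mismatches` are hypothetical and not part of either codebase):

```rust
use std::collections::BTreeMap;

/// Parse a canary table line of the form quoted above, e.g.
/// `XE_EXPORT(xboxkrnl, 0x0000003C, IoDismountVolumeByFileHandle, kFunction)`.
fn parse_canary(line: &str) -> Option<(u32, String)> {
    let inner = line.trim().strip_prefix("XE_EXPORT(")?.strip_suffix(')')?;
    let mut parts = inner.split(',').map(str::trim);
    let _module = parts.next()?;
    let ord = parts.next()?.strip_prefix("0x")?;
    let name = parts.next()?;
    Some((u32::from_str_radix(ord, 16).ok()?, name.to_string()))
}

/// Parse an ours registration line of the form quoted above, e.g.
/// `register_export(Xboxkrnl, 0x3C, "IoDismountVolumeByFileHandle", stub_success)`.
fn parse_ours(line: &str) -> Option<(u32, String)> {
    let inner = line.trim().strip_prefix("register_export(")?.strip_suffix(')')?;
    let mut parts = inner.split(',').map(str::trim);
    let _module = parts.next()?;
    let ord = parts.next()?.strip_prefix("0x")?;
    let name = parts.next()?.trim_matches('"');
    Some((u32::from_str_radix(ord, 16).ok()?, name.to_string()))
}

/// Return (ordinal, canary name, ours name) for every ordinal that appears in
/// both inputs with different names. Lines that do not parse are skipped.
fn name_mismatches(canary: &[&str], ours: &[&str]) -> Vec<(u32, String, String)> {
    let canary_map: BTreeMap<u32, String> =
        canary.iter().filter_map(|l| parse_canary(l)).collect();
    let mut out = Vec::new();
    for (ord, ours_name) in ours.iter().filter_map(|l| parse_ours(l)) {
        if let Some(canary_name) = canary_map.get(&ord) {
            if *canary_name != ours_name {
                out.push((ord, canary_name.clone(), ours_name));
            }
        }
    }
    out
}
```

Fed the four lines from the bullets, `name_mismatches` returns an empty list. Real source lines may carry trailing semicolons or comments, which this sketch does not handle.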

Phase 1 — Capture context around idx=102132 in both streams

audit-runs/phase-c5-NtWriteFile/ours.jsonl lines 102134..102137 (idx 102132..102135) and phase-c-first-divergence/phase-a/canary.jsonl matching tid=6 events:

ours[102132] import.call IoDismountVolumeByFileHandle (ord 60)
ours[102133] kernel.call IoDismountVolumeByFileHandle
ours[102134] kernel.return IoDismountVolumeByFileHandle returns 0
ours[102135] import.call NtClose (ord 207)
ours[102136] kernel.call NtClose
ours[102137] kernel.return NtClose returns 0

canary[102132] import.call NtClose (ord 207)
canary[102133] kernel.call NtClose
canary[102134] kernel.return NtClose returns 0
canary[102135] import.call NtOpenFile (ord 223)

Observation: Ours's events 102135..102137 (NtClose triple) are bit-identical to canary's events 102132..102134. Ours has 3 EXTRA events (IoDismountVolumeByFileHandle triple) injected at idx=102132 that canary's stream does NOT contain. After ours's 3-event surplus, both streams realign: ours[102138] = canary[102135] = NtOpenFile.

Because the streams realign immediately after the surplus, guest control flow is the same in both engines: the game DOES call IoDismountVolumeByFileHandle in BOTH. The difference is purely whether the Phase A emitter fires.
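The "surplus then realign" pattern can be checked mechanically. The sketch below is an illustration, not the tooling used in this investigation: it finds the first mismatch between two event-name streams and tests whether skipping a small number of events on the ours side restores agreement over a short window. The names `Divergence` and `classify` and the `max_skip`/`window` parameters are assumptions made for this example.

```rust
/// Outcome of comparing two event streams from their first mismatch.
#[derive(Debug, PartialEq)]
enum Divergence {
    /// The streams agree over their common length.
    None,
    /// `ours` carries `extra` additional events starting at `idx`; after
    /// skipping them, the next `window` events match canary again.
    Surplus { idx: usize, extra: usize },
    /// No realignment within `max_skip` events (possible control-flow split).
    Unaligned { idx: usize },
}

fn classify(ours: &[&str], canary: &[&str], max_skip: usize, window: usize) -> Divergence {
    // Locate the first position where the two streams disagree.
    let Some(idx) = ours.iter().zip(canary).position(|(a, b)| a != b) else {
        return Divergence::None;
    };
    // Try dropping 1..=max_skip events from ours and compare a fixed window.
    for extra in 1..=max_skip {
        let o = ours.get(idx + extra..idx + extra + window);
        let c = canary.get(idx..idx + window);
        if let (Some(o), Some(c)) = (o, c) {
            if o == c {
                return Divergence::Surplus { idx, extra };
            }
        }
    }
    Divergence::Unaligned { idx }
}
```

Applied to the excerpt above (with the slices starting at idx=102132), this reports a surplus of 3 events on the ours side. A symmetric check for surplus on the canary side is left out for brevity.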

Phase 2 — Source-read both emitter paths

Canary emit logic (path A: declared export)

xenia-canary/src/xenia/kernel/util/shim_utils.h:597-602:

const bool phase_a_on = phase_a_bridge::Enabled();
if (phase_a_on) {
  phase_a_bridge::EmitImportAndCall(
      phase_a_bridge::KernelModuleIdName(MODULE), ORDINAL,
      export_entry->name);
}

This code sits inside Trampoline, which is only reachable when a DECLARE_XBOXKRNL_EXPORT shim wires export_entry->function_data.trampoline = &X::Trampoline.

Canary emit logic (path B: table-entry-only, no DECLARE)

xenia-canary/src/xenia/cpu/xex_module.cc:1310-1335 import-thunk generator: when kernel_export->function_data.trampoline == nullptr (no DECLARE shim), the thunk is rewritten to sc 2; blr — the syscall form. The "extern handler" wired in SetupExtern(handler=nullptr, ...) forwards the call to PPCFrontend::SyscallHandler (ppc_frontend.cc:83-92):

void SyscallHandler(PPCContext* ppc_context, void* arg0, void* arg1) {
  uint64_t syscall_number = ppc_context->r[0];
  switch (syscall_number) {
    default:
      assert_unhandled_case(syscall_number);
      XELOGE("Unhandled syscall {}!", syscall_number);
      break;
  }
}

This path contains no phase_a_bridge::EmitImportAndCall call, so canary emits NO Phase A events for table-entry-only exports. Verified by grepping xenia-canary for DECLARE_XBOXKRNL_EXPORT declarations: only 287 ords have implementations, and IoDismountVolumeByFileHandle is NOT among them. (See /tmp/canary_decl.txt snapshot in investigation.md's work artifacts.)

Ours emit logic

xenia-rs/crates/xenia-kernel/src/state.rs:585-632 (pre-fix):

if let Some(&(name, func)) = self.exports.get(&(module, ordinal)) {
    ...
    let phase_a_on = crate::event_log::is_enabled();
    ...
    if phase_a_on {
        crate::event_log::emit_import_call(...);
        crate::event_log::emit_kernel_call(...);
    }
    func(&mut ctx, mem, self);
    if phase_a_on {
        let return_value = if is_void { 0 } else { ctx.gpr[3] };
        crate::event_log::emit_kernel_return(...);
    }
}

Ours emits import.call/kernel.call/kernel.return for EVERY registered ord, regardless of whether canary has a real shim or a syscall-thunk. IoDismountVolumeByFileHandle is registered as stub_success and therefore generates 3 spurious Phase A events.

Phase 3 — Classification

Class (E) Phase A emitter framing — Phase A coverage gap. Ours's emitter fires for stubs that canary leaves silent (canary uses the syscall thunk for table-entry-only exports, which does not reach Trampoline).

Not class (A): no guest-code branch flip, since events realign at +3. Not class (α): this is not canonicalization but an emitter asymmetry that we can fix at source. Not class (D): no deferred-item interaction, since heap region and clock are not involved.

Sister bugs surfaced (out-of-scope — documented for follow-up)

comm -23 <ours-xboxkrnl> <canary-DECLARE_XBOXKRNL_EXPORT> lists 12 xboxkrnl ords that ours registers but canary has no shim for. Of those, the following two are actually called in the current 50M run and therefore drift Phase A alignment:

  • IoDismountVolumeByFileHandle (ord 0x3C, called 1× tid=1 main — fixed in this session)
  • StfsCreateDevice (ord 0x259, called 1× tid=2 — drives tid=7→tid=2 divergence at idx=15; out of scope per session-scope rule)

The other 10 (among them DbgPrint, RtlCaptureContext, RtlUnwind, sprintf, _vsnprintf, __C_specific_handler, XeKeysConsoleSignatureVerification, StfsControlDevice) are not yet called in the 50M run; they will become relevant only after the game progresses further.
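For reference, the `comm -23` step is a plain set difference over export names. A minimal sketch of the same operation, assuming both name lists have already been extracted (the function name `ours_only` and the sample inputs in the usage note are illustrative only):

```rust
use std::collections::BTreeSet;

/// Names that ours registers but canary has no DECLARE shim for.
/// The result is the same set `comm -23` produces on sorted inputs,
/// returned here in byte-wise sorted order.
fn ours_only<'a>(ours: &[&'a str], canary_declared: &[&'a str]) -> Vec<&'a str> {
    let ours: BTreeSet<&str> = ours.iter().copied().collect();
    let canary: BTreeSet<&str> = canary_declared.iter().copied().collect();
    ours.difference(&canary).copied().collect()
}
```

For example, `ours_only(&["NtClose", "StfsCreateDevice"], &["NtClose"])` yields `["StfsCreateDevice"]`. Unlike `comm`, the sketch does not need pre-sorted input, and its ordering does not depend on the locale.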

Sister bug (different class): ord 0x82 is KeQueryInterruptTime in canary but ours mis-labels it KeQueryIdealProcessor; ord 0x98 is KeSetBackgroundProcessors in canary but ours mis-labels it KeSetIdealProcessor. These are name-lookup bugs (Stage 2 cleanup class) and are NOT addressed here. Fixing them would require renaming the registered handler functions and either dropping their existing semantics or moving them to a different ord.

Reading-error class #26 (additive)

Phase A emitter coverage asymmetry across the "table-entry-only" vs "shim-implemented" axis. Canary's emitter fires only from Trampoline (wired by DECLARE_XBOXKRNL_EXPORT). Ours's emitter fires for every register_export regardless of canary equivalence. When a guest import is in canary's table but has no DECLARE shim, canary routes it through a no-op syscall thunk with no Phase A emission — ours, by registering a stub_success, injects 3 spurious events per call.

Discipline addition: when registering a kernel export as stub_success/stub_return_zero etc., grep canary for DECLARE_XBOXKRNL_EXPORT(<name> first. If absent, use register_unimplemented_export instead of register_export so the Phase A emitter stays silent (matching canary). Reading-error #21 (C+1's gpr[3]-as-return-value for void exports) and #25 (C+5's wrong-register read) were both kernel-export-emitter bugs; this is the third in that family.
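A toy model of the discipline, to make the intended behaviour concrete. This is a sketch under stated assumptions: the real signatures of register_export and register_unimplemented_export in xenia-kernel are not shown in this note, so the `Kernel` type, the `fn() -> u64` handler shape, and the string log below are all hypothetical stand-ins.

```rust
use std::collections::HashMap;

/// How an ordinal is registered in this toy model.
enum Export {
    /// Canary has a DECLARE shim: emit the Phase A triple around the call.
    Implemented { name: &'static str, func: fn() -> u64 },
    /// Canary has only a table entry: run the stub but emit nothing.
    Unimplemented { func: fn() -> u64 },
}

#[derive(Default)]
struct Kernel {
    exports: HashMap<u32, Export>,
    log: Vec<String>, // stand-in for the Phase A event stream
}

impl Kernel {
    fn register_export(&mut self, ord: u32, name: &'static str, func: fn() -> u64) {
        self.exports.insert(ord, Export::Implemented { name, func });
    }

    fn register_unimplemented_export(&mut self, ord: u32, func: fn() -> u64) {
        self.exports.insert(ord, Export::Unimplemented { func });
    }

    /// Dispatch an import call; returns None for an unregistered ordinal.
    fn call(&mut self, ord: u32) -> Option<u64> {
        let (name, func) = match self.exports.get(&ord)? {
            Export::Implemented { name, func } => (Some(*name), *func),
            Export::Unimplemented { func } => (None, *func),
        };
        if let Some(name) = name {
            self.log.push(format!("import.call {name} (ord {ord})"));
            self.log.push(format!("kernel.call {name}"));
        }
        let ret = func();
        if let Some(name) = name {
            self.log.push(format!("kernel.return {name} returns {ret}"));
        }
        Some(ret)
    }
}

/// Stub that reports success, mirroring the role of stub_success.
fn stub_success() -> u64 {
    0
}
```

With ord 0x3C registered through `register_unimplemented_export` and ord 0xCF through `register_export`, calling both leaves only the NtClose triple in `log`, which is the shape canary's stream has at idx=102132.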