xenia-rs/audit-runs/phase-c5-NtWriteFile/investigation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Phase C+5 — investigation: NtWriteFile at idx=102068

Divergence

| | canary | ours (pre-fix) |
|---|---|---|
| payload.return_value at idx=102068, tid=6→1 | 259 = 0x103 (STATUS_PENDING) | 0 (STATUS_SUCCESS) |
| payload.status | 0x00000103 | 0x00000000 |
| Surrounding context (idx 102060..102067) | RtlEnterCriticalSection/RtlLeaveCriticalSection, NtWriteFile | |
| Game thread | tid=6 main | tid=1 main |
| Next 6 events (102069..102074) | three more NtWriteFile calls, ALL return 0x103 in canary | same calls, ours returns 0 |

Step 1 — Event context at idx=102068

Both engines emit identical call sequences leading to this point:

102016 NtCreateFile  (sync; both return 0)
102019 NtReadFile    (both return 0)
102022 NtClose
102025 NtCreateFile  (sync; both return 0)
102034 NtReadFile    (both return 0)
102037 NtWriteFile   (both return 0 - sync file)
102040 NtClose
102052 NtOpenFile    (both return 0)
102055 NtDeviceIoControlFile (GET_DRIVE_GEOMETRY)
102058 NtDeviceIoControlFile (GET_PARTITION_INFO)
102067 NtWriteFile   (canary returns 0x103, ours returns 0) ← divergence

The handle written at 102067 was opened by NtOpenFile at 102052. Per the canary log (audit-065-canary.log, the line for cache:\,...,00000003), the open_options value passed by the game is 0x00000003 = FILE_DIRECTORY_FILE | FILE_WRITE_THROUGH. No FILE_SYNCHRONOUS_IO_* bit is set, so the file is async in canary.
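The decoding above can be checked mechanically. The following is a minimal sketch, not code from either engine: the bit values are the standard NT create-option constants, while `option_names` and `is_synchronous` are hypothetical helper names introduced only for this illustration.

```rust
// NT create/open option bits (standard NT header values).
const FILE_DIRECTORY_FILE: u32 = 0x0000_0001;
const FILE_WRITE_THROUGH: u32 = 0x0000_0002;
const FILE_SYNCHRONOUS_IO_ALERT: u32 = 0x0000_0010;
const FILE_SYNCHRONOUS_IO_NONALERT: u32 = 0x0000_0020;

// Lookup table used only by this sketch.
const KNOWN: [(u32, &str); 4] = [
    (FILE_DIRECTORY_FILE, "FILE_DIRECTORY_FILE"),
    (FILE_WRITE_THROUGH, "FILE_WRITE_THROUGH"),
    (FILE_SYNCHRONOUS_IO_ALERT, "FILE_SYNCHRONOUS_IO_ALERT"),
    (FILE_SYNCHRONOUS_IO_NONALERT, "FILE_SYNCHRONOUS_IO_NONALERT"),
];

/// Hypothetical helper: names of the known option bits set in `options`.
pub fn option_names(options: u32) -> Vec<&'static str> {
    let mut names = Vec::new();
    for &(bit, name) in KNOWN.iter() {
        if (options & bit) != 0 {
            names.push(name);
        }
    }
    names
}

/// Hypothetical helper: true if either FILE_SYNCHRONOUS_IO_* bit is set.
pub fn is_synchronous(options: u32) -> bool {
    (options & (FILE_SYNCHRONOUS_IO_ALERT | FILE_SYNCHRONOUS_IO_NONALERT)) != 0
}
```

Calling `option_names(0x3)` yields the two flag names quoted above, and `is_synchronous(0x3)` is false, which is the async classification canary arrives at.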

Step 2 — Source-read both engines

Canary NtWriteFile_entry (xboxkrnl_io.cc:304-389)

// Write completes synchronously (the `if (true || ...)` short-circuit
// at line 327 always takes the sync path).
if (!file->is_synchronous()) {
  result = X_STATUS_PENDING;  // ← line 351-353
}

is_synchronous_ is the bool stored on XFile, derived at open time from create_options & (FILE_SYNCHRONOUS_IO_ALERT | FILE_SYNCHRONOUS_IO_NONALERT) (xboxkrnl_io.cc:94-97 inside NtCreateFile_entry). NtOpenFile_entry forwards open_options straight into NtCreateFile_entry's create_options slot (xboxkrnl_io.cc:118-122).

So canary's invariant: a file opened without bit 0x10 or 0x20 in its create_options is async, and NtWriteFile on it returns STATUS_PENDING after the synchronous write completes. The IO_STATUS_BLOCK still records STATUS_SUCCESS; only the function-return value flips.
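A small model of this invariant, under stated assumptions: `IoStatusBlock` is a simplified stand-in for X_IO_STATUS_BLOCK, its field names and the use of `information` for the byte count are assumptions, and `write_outcome` is a hypothetical name. It only models the observable result of a write that already succeeded.

```rust
const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_PENDING: u32 = 0x0000_0103;

/// Simplified stand-in for X_IO_STATUS_BLOCK (field names are assumptions).
#[derive(Debug, PartialEq)]
pub struct IoStatusBlock {
    pub status: u32,
    pub information: u32,
}

/// Models the outcome of a write that completed successfully:
/// returns (function return value, IO status block contents).
pub fn write_outcome(file_is_synchronous: bool, bytes_written: u32) -> (u32, IoStatusBlock) {
    // The status block records success regardless of the file's sync mode.
    let iosb = IoStatusBlock {
        status: STATUS_SUCCESS,
        information: bytes_written,
    };
    // Only the function-return value flips for async files.
    let ret = if file_is_synchronous {
        STATUS_SUCCESS
    } else {
        STATUS_PENDING
    };
    (ret, iosb)
}
```

For an async file, `write_outcome(false, n)` returns 0x103 alongside a status block that still holds STATUS_SUCCESS, which is the combination seen in the canary trace.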

Ours nt_write_file (exports.rs:1484-1554)

Pre-fix: returns STATUS_SUCCESS unconditionally after a successful write. The KernelObject::File enum does not track is_synchronous.

Ours nt_open_file (exports.rs:1317-1335)

Pre-fix: reads open_options from ctx.gpr[8] (= r8). This is the wrong register.

Canary's NtOpenFile_entry signature is

dword_result_t NtOpenFile_entry(
    lpdword_t handle_out,         // r3
    dword_t   desired_access,     // r4
    pointer_t<X_OBJECT_ATTRIBUTES> object_attributes,  // r5
    pointer_t<X_IO_STATUS_BLOCK>   io_status_block,    // r6
    dword_t   open_options);      // r7

There are 5 arguments, so per Xenia's shim_utils::Param::LoadValue (util/shim_utils.h:158-167) the 5th dword, at zero-based ordinal 4, arrives in r3 + ordinal_ = r7.
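The register arithmetic can be sketched as follows. This assumes every argument is a dword or pointer passed in a general-purpose register, as in the signature above; `arg_gpr` and `open_options_from` are hypothetical helpers, and the plain 32-entry array stands in for whatever register file the emulator context actually exposes.

```rust
/// GPR index holding the integer argument at zero-based position `ordinal`.
/// PowerPC passes the first eight integer arguments in r3..=r10.
pub fn arg_gpr(ordinal: usize) -> Option<usize> {
    const FIRST_ARG_GPR: usize = 3;
    const MAX_REG_ARGS: usize = 8;
    if ordinal < MAX_REG_ARGS {
        Some(FIRST_ARG_GPR + ordinal)
    } else {
        None // further arguments are passed on the stack
    }
}

/// Hypothetical accessor: open_options is the 5th argument (ordinal 4).
pub fn open_options_from(gpr: &[u64; 32]) -> u32 {
    let index = arg_gpr(4).expect("ordinal 4 is a register argument");
    gpr[index] as u32
}
```

With the first captured register state (r7=0x3, r8=0x800021), `open_options_from` returns 0x3 rather than the r8 residual.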

Live capture (Phase C+5 debug log):

nt_open_file: r7=0x3 r8=0x800021   ← cache:\ probe
nt_open_file: r7=0x3 r8=0x800021
nt_open_file: r7=0x3 r8=0x4021
nt_open_file: r7=0x7 r8=0x4040
nt_open_file: r7=0x7 r8=0x4020

r7=0x3 matches canary's logged value exactly. The r8 values in the capture above (0x800021, 0x4021, 0x4040, 0x4020) are residuals from prior register usage. Some of them (0x800021, 0x4021) happen to have the FILE_DIRECTORY_FILE bit (0x01) set, which is why the AUDIT-053/054 hierarchical-create fix worked at all. But the 0x20 bit (FILE_SYNCHRONOUS_IO_NONALERT) is also set in most of the captured residuals (0x800021, 0x4021, 0x4020), so most NtOpenFile-derived files, including the cache:\ handle, look synchronous to any check that derives the sync state from r8.
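As a cross-check on the capture, the sketch below counts how often the relevant bits appear in the five logged (r7, r8) pairs. The pairs are copied from the debug log above; the function names are hypothetical and exist only for this illustration.

```rust
/// (r7, r8) pairs copied from the Phase C+5 debug log lines above.
pub const CAPTURES: [(u32, u32); 5] = [
    (0x3, 0x800021),
    (0x3, 0x800021),
    (0x3, 0x4021),
    (0x7, 0x4040),
    (0x7, 0x4020),
];

/// Number of values that have at least one bit of `mask` set.
pub fn count_with_mask(values: &[u32], mask: u32) -> usize {
    values.iter().filter(|&&v| (v & mask) != 0).count()
}

/// The r7 column of the capture.
pub fn r7_values() -> Vec<u32> {
    CAPTURES.iter().map(|&(r7, _)| r7).collect()
}

/// The r8 column of the capture.
pub fn r8_values() -> Vec<u32> {
    CAPTURES.iter().map(|&(_, r8)| r8).collect()
}
```

Counting with mask 0x30 (both FILE_SYNCHRONOUS_IO_* bits) gives zero hits in the r7 column and four of five in the r8 column, so only r7 is consistent with an async cache:\ handle.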

Step 3 — Classification

Class (A) — Engine bug, two interlinked defects:

  1. Wrong-register bug in nt_open_file: reads open_options from r8 instead of r7. Confirmed by canary-side ground truth (r7=0x3 matches canary log cache:\,…,00000003).
  2. Missing async/sync tracking on KernelObject::File: even with correct open_options, ours had no machinery to remember the sync/async state and flip NtWriteFile returns.

Both defects must be fixed together to align Phase A's matched prefix past idx=102068. Fixing only #2 (without #1) leaves the file marked sync (because r8 has bit 0x20 from residual register usage), so NtWriteFile returns STATUS_SUCCESS and the divergence persists — which we observed in the first fix iteration (matched-prefix stayed at 102068 after a Path-A fix that relied on r8).
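The interaction between the two defects can be illustrated with a toy model. This is a sketch under assumptions, not the code in exports.rs: `OptionsSource`, `write_return`, and the `tracks_sync` switch are hypothetical names that stand for "which register nt_open_file reads" and "whether the file object remembers its sync state".

```rust
const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_PENDING: u32 = 0x0000_0103;
// FILE_SYNCHRONOUS_IO_ALERT | FILE_SYNCHRONOUS_IO_NONALERT
const SYNC_MASK: u32 = 0x10 | 0x20;

/// Which register the open path reads open_options from.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum OptionsSource {
    R7, // correct: 5th argument
    R8, // defect #1: residual register
}

/// Return value of a successful write for a given combination of fixes.
/// `tracks_sync == false` models defect #2 (no sync/async tracking).
pub fn write_return(r7: u32, r8: u32, source: OptionsSource, tracks_sync: bool) -> u32 {
    let open_options = match source {
        OptionsSource::R7 => r7,
        OptionsSource::R8 => r8,
    };
    let is_synchronous = (open_options & SYNC_MASK) != 0;
    if tracks_sync && !is_synchronous {
        STATUS_PENDING
    } else {
        STATUS_SUCCESS
    }
}
```

With the cache:\ capture (r7=0x3, r8=0x800021), only the combination `OptionsSource::R7` with `tracks_sync = true` yields 0x103; fixing tracking alone while still reading r8 keeps returning 0, matching the stalled first fix iteration.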

Why this is class (A) not (α) canonicalization

Events 102069..102074 in canary are three more NtWriteFile calls on the same handle, all returning STATUS_PENDING. Then at idx=102132 canary calls NtClose; ours diverges at 102132 by calling IoDismountVolumeByFileHandle instead. The game branches on the return value of these writes, so without aligning the return values our downstream code path stays divergent. Canonicalization would mask this, not fix it.

Tripstone #2 (reading-error #23) check

A "fix the upstream cause" change could in principle flip a CRT branch. Empirically, after the fix:

  • imports counter: 40452 → 40470 (game responded to the new return values by issuing additional kernel calls — expected for async-IO semantics).
  • main matched prefix: 102068 → 102132 (+64). No regression.
  • All sub-chains' matched prefixes unchanged.

Reading-error #23 risk DID NOT MATERIALIZE because the new return values match canary's, so the CRT branches identically downstream.
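For readers unfamiliar with the metric, a matched prefix can be thought of as the number of leading trace events on which both engines agree. The sketch below assumes that definition; the audit tooling's actual event schema and comparison rules are not shown here, so `Event`, `ev`, and `matched_prefix` are hypothetical.

```rust
/// Hypothetical, reduced trace event: call name plus return value.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub name: String,
    pub return_value: u32,
}

/// Convenience constructor for the sketch.
pub fn ev(name: &str, return_value: u32) -> Event {
    Event {
        name: name.to_string(),
        return_value,
    }
}

/// Number of leading events that are identical in both traces.
pub fn matched_prefix(a: &[Event], b: &[Event]) -> usize {
    a.iter()
        .zip(b.iter())
        .take_while(|(x, y)| x == y)
        .count()
}
```

Under this definition, a single differing return value (for example 0x103 versus 0 on one NtWriteFile) ends the prefix at that event even when every later call name matches.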