Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Phase C+5 investigation: NtWriteFile at idx=102068

## Divergence
| | canary | ours (pre-fix) |
|---|---|---|
| `payload.return_value` at idx=102068 | 259 = 0x103 (STATUS_PENDING) | 0 (STATUS_SUCCESS) |
| `payload.status` | 0x00000103 | 0x00000000 |
| Surrounding context (idx 102060..102067) | RtlEnterCriticalSection/RtlLeaveCriticalSection → NtWriteFile | same |
| Game thread | tid=6 main | tid=1 main |
| Next 6 events (102069..102074) | three more NtWriteFile calls, ALL return 0x103 | same calls, ours returns 0 |
## Step 1: Event context at idx=102068

Both engines emit identical call sequences leading to this point:

```text
102016 NtCreateFile (sync; both return 0)
102019 NtReadFile (both return 0)
102022 NtClose
102025 NtCreateFile (sync; both return 0)
102034 NtReadFile (both return 0)
102037 NtWriteFile (both return 0 - sync file)
102040 NtClose
102052 NtOpenFile (both return 0)
102055 NtDeviceIoControlFile (GET_DRIVE_GEOMETRY)
102058 NtDeviceIoControlFile (GET_PARTITION_INFO)
102067 NtWriteFile (canary returns 0x103, ours returns 0) ← divergence
```
The handle written at 102067 was opened by NtOpenFile at 102052. Per the
canary log audit-065-canary.log (the line for `cache:\,...,00000003`), the
game passes `open_options` = 0x00000003 =
FILE_DIRECTORY_FILE | FILE_WRITE_THROUGH. Neither FILE_SYNCHRONOUS_IO_*
bit is set, so the file is async in canary.
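To make the bit arithmetic concrete, here is a small Rust sketch that decodes an `open_options` value into flag names. The constants are the standard NT CreateOptions values; the helpers `decode_open_options` and `is_synchronous` are hypothetical names for illustration and are not taken from either engine.

```rust
// Standard NT CreateOptions bits (only the ones relevant here).
const FILE_DIRECTORY_FILE: u32 = 0x0000_0001;
const FILE_WRITE_THROUGH: u32 = 0x0000_0002;
const FILE_SEQUENTIAL_ONLY: u32 = 0x0000_0004;
const FILE_SYNCHRONOUS_IO_ALERT: u32 = 0x0000_0010;
const FILE_SYNCHRONOUS_IO_NONALERT: u32 = 0x0000_0020;
const FILE_NON_DIRECTORY_FILE: u32 = 0x0000_0040;

// (bit, name) pairs in ascending bit order.
const NAMED_BITS: [(u32, &str); 6] = [
    (FILE_DIRECTORY_FILE, "FILE_DIRECTORY_FILE"),
    (FILE_WRITE_THROUGH, "FILE_WRITE_THROUGH"),
    (FILE_SEQUENTIAL_ONLY, "FILE_SEQUENTIAL_ONLY"),
    (FILE_SYNCHRONOUS_IO_ALERT, "FILE_SYNCHRONOUS_IO_ALERT"),
    (FILE_SYNCHRONOUS_IO_NONALERT, "FILE_SYNCHRONOUS_IO_NONALERT"),
    (FILE_NON_DIRECTORY_FILE, "FILE_NON_DIRECTORY_FILE"),
];

/// Names of the known flags set in `options`, lowest bit first.
fn decode_open_options(options: u32) -> Vec<&'static str> {
    NAMED_BITS
        .iter()
        .filter(|(bit, _)| (options & *bit) != 0)
        .map(|(_, name)| *name)
        .collect()
}

/// A file counts as synchronous iff either FILE_SYNCHRONOUS_IO_* bit is set.
fn is_synchronous(options: u32) -> bool {
    (options & (FILE_SYNCHRONOUS_IO_ALERT | FILE_SYNCHRONOUS_IO_NONALERT)) != 0
}

fn main() {
    let open_options = 0x0000_0003; // value logged by canary for the cache:\ open
    println!(
        "{:?} synchronous={}",
        decode_open_options(open_options),
        is_synchronous(open_options)
    );
}
```

For 0x3 this yields only FILE_DIRECTORY_FILE and FILE_WRITE_THROUGH, with no synchronous bit, which is why canary treats the handle as async.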
## Step 2: Source-read both engines
### Canary `NtWriteFile_entry` (xboxkrnl_io.cc:304-389)

```cpp
// Write completes synchronously (the `if (true || ...)` short-circuit
// at line 327 always takes the sync path).
if (!file->is_synchronous()) {
  result = X_STATUS_PENDING;  // ← lines 351-353
}
```
`is_synchronous_` is the bool stored on `XFile`, derived at open time as
`(create_options & (FILE_SYNCHRONOUS_IO_ALERT | FILE_SYNCHRONOUS_IO_NONALERT)) != 0`
(xboxkrnl_io.cc:94-97 inside `NtCreateFile_entry`). `NtOpenFile_entry`
forwards `open_options` straight into `NtCreateFile_entry`'s
`create_options` slot (xboxkrnl_io.cc:118-122).

Canary's invariant is therefore: a file opened without bit 0x10 or 0x20 in
its `create_options` is async, and `NtWriteFile` on it returns
STATUS_PENDING even though the write itself has already completed
synchronously. The IO_STATUS_BLOCK still records STATUS_SUCCESS; only the
function's return value flips.
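The invariant can be modeled in a few lines of Rust. This is a simplified sketch, not canary's code: `ModelFile` and `IoStatusBlock` are hypothetical stand-ins for `XFile` and `X_IO_STATUS_BLOCK`, and completion events/APCs are ignored. It shows the two observable outcomes: the IO_STATUS_BLOCK always reports success, while the return value depends only on the open-time sync flag.

```rust
// Standard NTSTATUS / CreateOptions values used in the trace.
const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_PENDING: u32 = 0x0000_0103;
const FILE_SYNCHRONOUS_IO_ALERT: u32 = 0x0000_0010;
const FILE_SYNCHRONOUS_IO_NONALERT: u32 = 0x0000_0020;

/// Hypothetical stand-in for X_IO_STATUS_BLOCK.
#[derive(Debug, Default)]
struct IoStatusBlock {
    status: u32,
    information: u32,
}

/// Hypothetical stand-in for XFile: only the sync flag and written bytes are modeled.
struct ModelFile {
    is_synchronous: bool,
    contents: Vec<u8>,
}

impl ModelFile {
    /// Open-time derivation: synchronous iff either FILE_SYNCHRONOUS_IO_* bit is set.
    fn open(create_options: u32) -> Self {
        let sync_mask = FILE_SYNCHRONOUS_IO_ALERT | FILE_SYNCHRONOUS_IO_NONALERT;
        ModelFile {
            is_synchronous: (create_options & sync_mask) != 0,
            contents: Vec::new(),
        }
    }

    /// The write always completes immediately; only the return value depends on the flag.
    fn write(&mut self, buf: &[u8], iosb: &mut IoStatusBlock) -> u32 {
        self.contents.extend_from_slice(buf);
        iosb.status = STATUS_SUCCESS;
        iosb.information = buf.len() as u32;
        if self.is_synchronous {
            STATUS_SUCCESS
        } else {
            STATUS_PENDING
        }
    }
}

fn main() {
    let mut iosb = IoStatusBlock::default();

    let mut async_file = ModelFile::open(0x0000_0003); // no sync bit, as in the game's open
    let ret = async_file.write(b"abcd", &mut iosb);
    println!("async: return={ret:#x} iosb={iosb:?}");

    let mut sync_file = ModelFile::open(0x0000_0023); // FILE_SYNCHRONOUS_IO_NONALERT set
    let ret = sync_file.write(b"abcd", &mut iosb);
    println!("sync:  return={ret:#x} iosb={iosb:?}");
}
```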
### Ours `nt_write_file` (exports.rs:1484-1554)

Pre-fix: returns STATUS_SUCCESS unconditionally after a successful
write. The `KernelObject::File` variant does not track `is_synchronous`.
### Ours `nt_open_file` (exports.rs:1317-1335)

Pre-fix: reads `open_options` from `ctx.gpr[8]` (r8), which is the
wrong register.
Canary's `NtOpenFile_entry` signature is:

```cpp
dword_result_t NtOpenFile_entry(
    lpdword_t handle_out,                              // r3
    dword_t desired_access,                            // r4
    pointer_t<X_OBJECT_ATTRIBUTES> object_attributes,  // r5
    pointer_t<X_IO_STATUS_BLOCK> io_status_block,      // r6
    dword_t open_options);                             // r7
```

That is five arguments. Per Xenia's `shim_utils::Param::LoadValue`
(util/shim_utils.h:158-167), a register argument is loaded from
`r[3 + ordinal_]`, so the fifth dword (`open_options`, ordinal 4) arrives in r7.
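The ordinal-to-register rule can be checked with a short Rust sketch. It assumes integer arguments occupy r3..r10 (the PowerPC convention); the stack-spill cutoff is an assumption here, not read from shim_utils.h. `GuestCtx`, `arg_gpr` and `read_dword_arg` are hypothetical names, with `gpr` mimicking the `ctx.gpr` array mentioned above.

```rust
// Assumption: integer arguments live in r3..r10, and the argument with
// 0-based ordinal `n` is read from r(3 + n).
const FIRST_ARG_GPR: usize = 3;
const LAST_ARG_GPR: usize = 10;

/// GPR index carrying argument `ordinal`, or None if it would be passed on the stack.
fn arg_gpr(ordinal: usize) -> Option<usize> {
    let reg = FIRST_ARG_GPR + ordinal;
    if reg <= LAST_ARG_GPR {
        Some(reg)
    } else {
        None
    }
}

/// Hypothetical minimal guest context with 32 general-purpose registers.
struct GuestCtx {
    gpr: [u64; 32],
}

/// Reads a 32-bit argument by ordinal, the way a shim parameter would.
fn read_dword_arg(ctx: &GuestCtx, ordinal: usize) -> u32 {
    let reg = arg_gpr(ordinal).expect("argument is passed on the stack");
    ctx.gpr[reg] as u32
}

fn main() {
    let params = [
        "handle_out",
        "desired_access",
        "object_attributes",
        "io_status_block",
        "open_options",
    ];
    for (ordinal, name) in params.iter().enumerate() {
        println!("{name} -> r{}", arg_gpr(ordinal).unwrap());
    }

    // Register state for the cache:\ probe in the Phase C+5 debug log.
    let mut ctx = GuestCtx { gpr: [0; 32] };
    ctx.gpr[7] = 0x3;
    ctx.gpr[8] = 0x80_0021;
    println!("open_options (ordinal 4) = {:#x}", read_dword_arg(&ctx, 4));
    println!("pre-fix read of gpr[8]   = {:#x}", ctx.gpr[8]);
}
```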
Live capture (Phase C+5 debug log):

```text
nt_open_file: r7=0x3 r8=0x800021   ← cache:\ probe
nt_open_file: r7=0x3 r8=0x800021
nt_open_file: r7=0x3 r8=0x4021
nt_open_file: r7=0x7 r8=0x4040
nt_open_file: r7=0x7 r8=0x4020
```
r7=0x3 matches canary's logged value exactly. The r8 values in ours
(0x800021, 0x4021, 0x4040, 0x4020) are residue from earlier register
usage. Three of the five captures happen to have the FILE_DIRECTORY_FILE
bit (0x01) set in r8, which is why the AUDIT-053/054 hierarchical-create
fix worked at all. But four of the five also have the 0x20 bit
(FILE_SYNCHRONOUS_IO_NONALERT) set, including the cache:\ probe
(r8=0x800021), so most NtOpenFile-derived files, among them the handle
written at idx 102067, appeared synchronous in ours.
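The counts above can be reproduced from the captured register pairs with a small Rust check. The constants are the standard NT CreateOptions bits; `CAPTURES` simply copies the (r7, r8) pairs from the debug log, and the helper names are illustrative.

```rust
const FILE_DIRECTORY_FILE: u32 = 0x01;
const FILE_SYNCHRONOUS_IO_ALERT: u32 = 0x10;
const FILE_SYNCHRONOUS_IO_NONALERT: u32 = 0x20;

// (r7, r8) pairs copied from the Phase C+5 debug log.
const CAPTURES: [(u32, u32); 5] = [
    (0x3, 0x80_0021),
    (0x3, 0x80_0021),
    (0x3, 0x4021),
    (0x7, 0x4040),
    (0x7, 0x4020),
];

fn is_directory(options: u32) -> bool {
    (options & FILE_DIRECTORY_FILE) != 0
}

fn is_synchronous(options: u32) -> bool {
    (options & (FILE_SYNCHRONOUS_IO_ALERT | FILE_SYNCHRONOUS_IO_NONALERT)) != 0
}

fn main() {
    for (r7, r8) in CAPTURES {
        println!(
            "r7={r7:#x} dir={} sync={}  |  r8={r8:#x} dir={} sync={}",
            is_directory(r7),
            is_synchronous(r7),
            is_directory(r8),
            is_synchronous(r8)
        );
    }
    // Decoding from r8 (the pre-fix register) marks most files synchronous.
    let sync_via_r8 = CAPTURES.iter().filter(|(_, r8)| is_synchronous(*r8)).count();
    println!("synchronous when decoded from r8: {sync_via_r8} of {}", CAPTURES.len());
}
```

Decoded from r7, every capture is a directory open with no synchronous bit, matching canary's view.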
## Step 3: Classification

Class (A): engine bug, two interlinked defects:

1. Wrong-register bug in `nt_open_file`: reads `open_options` from r8 instead of r7. Confirmed by canary-side ground truth (r7=0x3 matches the canary log line `cache:\,…,00000003`).
2. Missing async/sync tracking on `KernelObject::File`: even with the correct `open_options`, ours had no machinery to remember the sync/async state and flip `NtWriteFile` returns.
Both defects must be fixed together to advance Phase A's matched prefix
past idx=102068. Fixing only #2 (without #1) leaves the file marked
sync, because r8 carries bit 0x20 from residual register usage, so
NtWriteFile still returns STATUS_SUCCESS and the divergence persists.
We observed exactly this in the first fix iteration: the matched prefix
stayed at 102068 after a Path-A fix that relied on r8.
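Why both fixes are needed can be shown by enumerating the four fix combinations on the cache:\ registers (r7=0x3, r8=0x800021). This is a hypothetical model of the open-then-write path in ours, not the actual exports.rs code; `nt_write_file_return` and its two boolean switches are illustrative.

```rust
const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_PENDING: u32 = 0x0000_0103;
// FILE_SYNCHRONOUS_IO_ALERT | FILE_SYNCHRONOUS_IO_NONALERT
const SYNC_MASK: u32 = 0x10 | 0x20;

/// Hypothetical model of the open-then-write path in ours.
/// `fix1_read_r7`: open_options taken from r7 (defect #1 fixed) instead of r8.
/// `fix2_track_sync`: the file remembers sync/async and NtWriteFile honors it (defect #2 fixed).
fn nt_write_file_return(r7: u32, r8: u32, fix1_read_r7: bool, fix2_track_sync: bool) -> u32 {
    let open_options = if fix1_read_r7 { r7 } else { r8 };
    let is_synchronous = (open_options & SYNC_MASK) != 0;
    if fix2_track_sync && !is_synchronous {
        STATUS_PENDING
    } else {
        // Pre-fix behavior, or a file that (appears to be) synchronous.
        STATUS_SUCCESS
    }
}

fn main() {
    let (r7, r8) = (0x3, 0x80_0021); // the cache:\ open from the debug log
    for fix1 in [false, true] {
        for fix2 in [false, true] {
            let ret = nt_write_file_return(r7, r8, fix1, fix2);
            println!("fix #1={fix1:<5} fix #2={fix2:<5} -> NtWriteFile returns {ret:#x}");
        }
    }
}
```

Only the combination with both fixes yields 0x103, matching canary; fix #2 alone still returns 0 because r8 has bit 0x20 set.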
### Why this is class (A), not (α) canonicalization

Examining events 102069..102074 in canary: three more NtWriteFile
calls on the same handle, all returning STATUS_PENDING. Then at
idx=102132 canary calls NtClose, while ours diverges at 102132 by calling
IoDismountVolumeByFileHandle instead. The game branches on the
return values of these writes; without aligning them, the downstream
code path in ours stays divergent. Canonicalization would mask this,
not fix it.
## Tripstone #2 (reading-error #23) check
A "fix the upstream cause" change could in principle flip a CRT branch. Empirically, after the fix:
- imports counter: 40452 → 40470 (game responded to the new return values by issuing additional kernel calls — expected for async-IO semantics).
- main matched prefix: 102068 → 102132 (+64). No regression.
- All sub-chains' matched prefixes unchanged.
Reading-error #23 risk DID NOT MATERIALIZE because the new return values match canary's, so the CRT branches identically downstream.