handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
140
audit-runs/phase-c5-NtWriteFile/investigation.md
Normal file
@@ -0,0 +1,140 @@
# Phase C+5 — investigation: `NtWriteFile` at idx=102068

## Divergence

| | canary | ours (pre-fix) |
|---|---|---|
| `payload.return_value` at idx=102068, tid=6→1 | `259 = 0x103` (`STATUS_PENDING`) | `0` (`STATUS_SUCCESS`) |
| `payload.status` | `0x00000103` | `0x00000000` |
| Surrounding context (idx 102060..102067): `RtlEnterCriticalSection`/`RtlLeaveCriticalSection` → `NtWriteFile` | | |
| Game thread | tid=6 main | tid=1 main |
| Next 6 events (102069..102074) | three more `NtWriteFile` calls, ALL return `0x103` in canary | same calls, ours returns `0` |

## Step 1 — Event context at idx=102068

Both engines emit identical call sequences leading to this point:

```
102016  NtCreateFile            (sync; both return 0)
102019  NtReadFile              (both return 0)
102022  NtClose
102025  NtCreateFile            (sync; both return 0)
102034  NtReadFile              (both return 0)
102037  NtWriteFile             (both return 0 - sync file)
102040  NtClose
102052  NtOpenFile              (both return 0)
102055  NtDeviceIoControlFile   (GET_DRIVE_GEOMETRY)
102058  NtDeviceIoControlFile   (GET_PARTITION_INFO)
102067  NtWriteFile             (canary returns 0x103, ours returns 0)  ← divergence
```

The handle written at 102067 was opened by `NtOpenFile` at 102052. Per
canary log `audit-065-canary.log` line for `cache:\,...,00000003`, the
`open_options` passed by the game is `0x00000003` =
`FILE_DIRECTORY_FILE | FILE_WRITE_THROUGH`. **No `FILE_SYNCHRONOUS_IO_*`
bit** — file is async in canary.
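
The decoding above can be checked mechanically. The following is a minimal sketch, not code from either engine: the flag values are the standard NT create-option bits, while `is_synchronous` and `flag_names` are hypothetical helper names introduced only for illustration.

```rust
// Standard NT create-option bits (values as used in the text above).
const FILE_DIRECTORY_FILE: u32 = 0x0000_0001;
const FILE_WRITE_THROUGH: u32 = 0x0000_0002;
const FILE_SYNCHRONOUS_IO_ALERT: u32 = 0x0000_0010;
const FILE_SYNCHRONOUS_IO_NONALERT: u32 = 0x0000_0020;

/// Hypothetical helper: true when either synchronous-IO bit is set.
fn is_synchronous(options: u32) -> bool {
    (options & (FILE_SYNCHRONOUS_IO_ALERT | FILE_SYNCHRONOUS_IO_NONALERT)) != 0
}

/// Hypothetical helper: names of the known flags present in `options`,
/// in ascending bit order. Unknown bits are ignored.
fn flag_names(options: u32) -> Vec<&'static str> {
    const KNOWN: [(u32, &str); 4] = [
        (FILE_DIRECTORY_FILE, "FILE_DIRECTORY_FILE"),
        (FILE_WRITE_THROUGH, "FILE_WRITE_THROUGH"),
        (FILE_SYNCHRONOUS_IO_ALERT, "FILE_SYNCHRONOUS_IO_ALERT"),
        (FILE_SYNCHRONOUS_IO_NONALERT, "FILE_SYNCHRONOUS_IO_NONALERT"),
    ];
    KNOWN
        .iter()
        .filter(|(bit, _)| (options & bit) != 0)
        .map(|(_, name)| *name)
        .collect()
}
```

Calling `flag_names(0x3)` yields the two names quoted above, and `is_synchronous(0x3)` is `false`, which is the async classification canary arrives at.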

## Step 2 — Source-read both engines

### Canary `NtWriteFile_entry` (xboxkrnl_io.cc:304-389)

```cpp
// Write completes synchronously (the `if (true || ...)` short-circuit
// at line 327 always takes the sync path).
if (!file->is_synchronous()) {
  result = X_STATUS_PENDING;  // ← line 351-353
}
```

`is_synchronous_` is the bool stored on `XFile`, derived at open time
from `create_options & (FILE_SYNCHRONOUS_IO_ALERT | FILE_SYNCHRONOUS_IO_NONALERT)`
(xboxkrnl_io.cc:94-97 inside `NtCreateFile_entry`). `NtOpenFile_entry`
forwards `open_options` straight into `NtCreateFile_entry`'s
`create_options` slot (xboxkrnl_io.cc:118-122).

So canary's invariant: a file opened **without** bit 0x10 or 0x20 in
its `create_options` is async, and `NtWriteFile` on it returns
`STATUS_PENDING` after the synchronous write completes. The IO_STATUS_BLOCK
still records `STATUS_SUCCESS`; only the function-return value flips.
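
A minimal Rust sketch of that invariant, under stated assumptions: `IoStatusBlock` and `finish_write` are hypothetical names, not taken from exports.rs, and storing the byte count in `information` is an assumption about the block layout rather than something the notes above confirm.

```rust
const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_PENDING: u32 = 0x0000_0103;

/// Hypothetical stand-in for the guest IO_STATUS_BLOCK.
#[derive(Debug, Default, PartialEq)]
struct IoStatusBlock {
    status: u32,
    information: u32, // assumed here to hold the number of bytes written
}

/// Models the tail of a write that has already completed successfully.
/// The status block always records success; only the function's return
/// value depends on whether the file was opened synchronous.
fn finish_write(is_synchronous: bool, bytes_written: u32, iosb: &mut IoStatusBlock) -> u32 {
    iosb.status = STATUS_SUCCESS;
    iosb.information = bytes_written;
    if is_synchronous {
        STATUS_SUCCESS
    } else {
        STATUS_PENDING
    }
}
```

For an async file the caller sees `0x103` as the return value while `iosb.status` stays `0`, which is the split visible in the canary trace.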

### Ours `nt_write_file` (exports.rs:1484-1554)

Pre-fix: returns `STATUS_SUCCESS` unconditionally after a successful
write. The `KernelObject::File` enum does not track `is_synchronous`.

### Ours `nt_open_file` (exports.rs:1317-1335)

Pre-fix: reads `open_options` from `ctx.gpr[8]` (= r8). **This is the
wrong register.**

Canary's `NtOpenFile_entry` signature is

```cpp
dword_result_t NtOpenFile_entry(
    lpdword_t handle_out,                              // r3
    dword_t desired_access,                            // r4
    pointer_t<X_OBJECT_ATTRIBUTES> object_attributes,  // r5
    pointer_t<X_IO_STATUS_BLOCK> io_status_block,      // r6
    dword_t open_options);                             // r7
```

That is **5 args**, so per Xenia's `shim_utils::Param::LoadValue`
(util/shim_utils.h:158-167), the 5th dword (zero-based ordinal 4) arrives in
`r3 + ordinal_` = r7.
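
The register arithmetic can be sketched as follows. This is an illustrative model only: `Ctx`, `arg_register`, and `read_dword_arg` are hypothetical names, the `u64` register width is an assumption, and only the eight register-passed arguments (r3..r10) are modeled.

```rust
/// Hypothetical stand-in for the guest CPU context; only `gpr` is modeled.
struct Ctx {
    gpr: [u64; 32],
}

/// GPR index holding the dword argument at zero-based `ordinal`.
/// Arguments beyond the eighth are not passed in registers and are
/// out of scope for this sketch.
fn arg_register(ordinal: usize) -> Option<usize> {
    if ordinal < 8 {
        Some(3 + ordinal)
    } else {
        None
    }
}

/// Reads a dword argument, keeping the low 32 bits of the register.
fn read_dword_arg(ctx: &Ctx, ordinal: usize) -> Option<u32> {
    arg_register(ordinal).map(|r| ctx.gpr[r] as u32)
}
```

With this mapping, `open_options` (ordinal 4) resolves to `gpr[7]`; reading `gpr[8]` corresponds to a sixth argument that `NtOpenFile` does not have.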

Live capture (Phase C+5 debug log):

```
nt_open_file: r7=0x3 r8=0x800021   ← cache:\ probe
nt_open_file: r7=0x3 r8=0x800021
nt_open_file: r7=0x3 r8=0x4021
nt_open_file: r7=0x7 r8=0x4040
nt_open_file: r7=0x7 r8=0x4020
```

`r7=0x3` matches canary's logged value exactly. Our `r8` values
(`0x4021`, `0x4020`, ...) are residuals from prior register usage that
often happen to have the FILE_DIRECTORY_FILE bit (0x01) set, which is why
the AUDIT-053/054 hierarchical-create fix worked at all. But the
**0x20 bit (FILE_SYNCHRONOUS_IO_NONALERT)** was also frequently set in
r8 residual data, making NtOpenFile-derived files appear synchronous in
ours whenever the residual carried that bit.
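
To make the effect of the residual bits concrete, the five captured pairs can be run through the sync-bit test. A rough sketch: the `(r7, r8)` values are copied from the debug log above, while `looks_synchronous` and `misclassified_as_sync` are hypothetical helpers written for this note.

```rust
/// FILE_SYNCHRONOUS_IO_ALERT | FILE_SYNCHRONOUS_IO_NONALERT.
const SYNC_MASK: u32 = 0x10 | 0x20;

/// (r7, r8) pairs copied from the Phase C+5 debug log.
const CAPTURED: [(u32, u32); 5] = [
    (0x3, 0x800021),
    (0x3, 0x800021),
    (0x3, 0x4021),
    (0x7, 0x4040),
    (0x7, 0x4020),
];

fn looks_synchronous(options: u32) -> bool {
    (options & SYNC_MASK) != 0
}

/// Counts captures where reading r8 would mark the file synchronous
/// even though the real `open_options` in r7 says async.
fn misclassified_as_sync(captures: &[(u32, u32)]) -> usize {
    captures
        .iter()
        .filter(|(r7, r8)| !looks_synchronous(*r7) && looks_synchronous(*r8))
        .count()
}
```

On this capture `misclassified_as_sync(&CAPTURED)` is 4: every pair except `r8=0x4040` carries the 0x20 bit in r8, while none of the r7 values do.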

## Step 3 — Classification

**Class (A) — Engine bug, two interlinked defects:**

1. **Wrong-register bug** in `nt_open_file`: reads `open_options` from
   r8 instead of r7. Confirmed by canary-side ground truth (r7=0x3
   matches canary log `cache:\,…,00000003`).
2. **Missing async/sync tracking** on `KernelObject::File`: even with
   correct `open_options`, ours had no machinery to remember the
   sync/async state and flip `NtWriteFile` returns.

Both defects must be fixed together to align Phase A's matched prefix
past idx=102068. Fixing only #2 (without #1) leaves the file marked
sync (because r8 has bit 0x20 from residual register usage), so
`NtWriteFile` returns `STATUS_SUCCESS` and the divergence persists.
We observed exactly this in the first fix iteration: the matched prefix
stayed at 102068 after a Path-A fix that relied on r8.

### Why this is class (A) not (α) canonicalization

Examining events 102069..102074 in canary: three more `NtWriteFile`
calls on the same handle, all returning `STATUS_PENDING`. Then at
idx=102132 canary calls `NtClose`; ours diverges at 102132 by calling
`IoDismountVolumeByFileHandle` instead. **The game branches on the
return value of these writes**: without aligning the return values,
our downstream code path stays divergent. Canonicalization would
mask this, not fix it.

### Tripstone #2 (reading-error #23) check

A "fix the upstream cause" change could in principle flip a CRT
branch. Empirically, after the fix:

- imports counter: 40452 → 40470 (game responded to the new return
  values by issuing additional kernel calls — expected for async-IO
  semantics).
- main matched prefix: 102068 → **102132 (+64)**. No regression.
- All sub-chains' matched prefixes unchanged.

Reading-error #23 risk DID NOT MATERIALIZE because the new return
values match canary's, so the CRT branches identically downstream.
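
For readers unfamiliar with the metric, a matched prefix can be sketched as the number of leading events on which the two traces agree. This is an assumption about how the audit tooling defines it, not a copy of that tooling; `Event` and `matched_prefix` are hypothetical names.

```rust
/// Hypothetical trace event: export name plus its return value.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Event {
    name: &'static str,
    return_value: u32,
}

/// Number of leading events that are identical in both traces.
/// The first index at which the traces differ equals this count.
fn matched_prefix(canary: &[Event], ours: &[Event]) -> usize {
    canary
        .iter()
        .zip(ours.iter())
        .take_while(|(a, b)| a == b)
        .count()
}
```

Under this definition a fix that only changes events at or after the old divergence point can lengthen the prefix but never shorten it, which is consistent with the unchanged sub-chain prefixes reported above.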