handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):

- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

audit-runs/phase-c3-RtlImageXexHeaderField/investigation.md
# Phase C+3: investigation of `RtlImageXexHeaderField` at idx=102014

## Divergence

| | canary | ours (pre-fix) |
|---|---|---|
| `payload.return_value` (idx=102014, tid=6→1) | `805433576` = `0x3001F0E8` | `0` |
| `payload.status` | `0x3001f0e8` | `0x00000000` |
| Surrounding context (idx 102009..102013) | `RtlLeaveCriticalSection` → `RtlImageXexHeaderField` | same |
| Game thread | tid=6 main | tid=1 main |
| Next event (idx=102015) | `NtCreateFile` | `NtCreateFile` (matches) |

`0x3001F0E8` lies in canary's virtual-heap region (`0x30xxxxxx`, the `Memory::SystemHeapAlloc` band). The value is therefore a guest VA pointing inside canary's in-guest copy of the XEX header, which is allocated at `user_module.cc:224` as `guest_xex_header_`. Ours returned 0 because its stub `rtl_image_xex_header_field` (`exports.rs:2391-2395`) returns 0 unconditionally.

## Step 1: Event context at idx=102014

The existing Phase A capture for canary is `xenia-rs/audit-runs/phase-c-first-divergence/phase-a/canary.jsonl`. In it, canary's tid=6 makes only **two** `RtlImageXexHeaderField` calls in the matched prefix:

| event idx | event kind | payload |
|---|---|---|
| 0 | import.call | `RtlImageXexHeaderField` |
| 1 | kernel.call | `RtlImageXexHeaderField` (`args:{}`; schema-v1 doesn't capture args) |
| 2 | kernel.return | `return_value=0 status=0x00000000` |
| 102012 | import.call | `RtlImageXexHeaderField` |
| 102013 | kernel.call | `RtlImageXexHeaderField` |
| 102014 | kernel.return | `return_value=805433576 status=0x3001f0e8` |

Ours pre-fix makes the same call sequence, as the capture in `phase-c1-keQuerySystemTime/ours.jsonl` shows. Both of its `RtlImageXexHeaderField` calls returned 0.

Schema-v1 records empty `args:{}`, so `field_key` (r4) and `xex_header_ptr` (r3) can't be read directly from the JSONL. A one-shot `eprintln` in our stub showed the arguments of both calls:

* call #1: `xex_header_ptr=0x00000000 field_key=0x00020401` (DEFAULT_HEAP_SIZE). This key is not present in this XEX, so the result would be 0 even with a valid header pointer.
* call #2: `xex_header_ptr=0x00000000 field_key=0x00040006` (EXECUTION_INFO). Its low byte is `0x06`, which puts it in the "else" class, so it returns `header_base + offset` with offset `0x10E8`.

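Both keys can be classified from their low byte alone. The rule follows the `switch (opt_header.key & 0xFF)` in canary's `UserModule::GetOptHeader`, quoted in Step 2. Below is a minimal Rust sketch of that rule. The enum and the two constant names are illustrative only; they are not existing xenia-rs identifiers.

```rust
/// Class of an XEX2 optional-header key, decided by its low byte
/// (mirrors the three arms of canary's `GetOptHeader` switch).
#[derive(Debug, PartialEq, Eq)]
enum OptHeaderClass {
    /// Low byte 0x00: the 32-bit value slot is returned as-is.
    Inline,
    /// Low byte 0x01: the guest VA of the value slot itself is returned.
    ValuePtr,
    /// Any other low byte: the slot is an offset; `header_base + offset` is returned.
    Offset,
}

fn classify_key(key: u32) -> OptHeaderClass {
    match key & 0xFF {
        0x00 => OptHeaderClass::Inline,
        0x01 => OptHeaderClass::ValuePtr,
        _ => OptHeaderClass::Offset,
    }
}

// Hypothetical names for the two keys observed in the eprintln trace.
const XEX_HEADER_DEFAULT_HEAP_SIZE: u32 = 0x0002_0401;
const XEX_HEADER_EXECUTION_INFO: u32 = 0x0004_0006;
```

Call #1's key falls in the value-pointer class. Because the key is absent from this XEX, though, the lookup never reaches the switch and the result stays 0.
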
Running `xenia-rs/target/release/xenia-rs info` against the ISO dumps the in-XEX optional-header table:

* key `0x00040006` is present with value `0x000010E8`;
* key `0x00020401` is absent.

Hence canary's `0x3001F0E8` = `0x3001E000 + 0x10E8`, i.e. canary's `guest_xex_header_` lives at `0x3001E000`. The game queries `EXECUTION_INFO` and uses the returned VA to read media_id / title_id / disc_number / disc_count.

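The arithmetic is easy to double-check in code. In this small sketch, `header_base_from_return` is a hypothetical helper, not something that exists in either codebase. The two constants are the decimal return value from the trace and the offset reported by `xenia-rs info`.

```rust
/// Recovers the guest VA of the in-guest XEX header copy from an
/// `RtlImageXexHeaderField` return value for an offset-class key.
fn header_base_from_return(return_value: u32, field_offset: u32) -> u32 {
    // wrapping_sub so inconsistent inputs don't panic in debug builds
    return_value.wrapping_sub(field_offset)
}

/// `payload.return_value` at idx=102014, as logged (decimal).
const CANARY_RETURN_DEC: u32 = 805_433_576;
/// EXECUTION_INFO (`0x00040006`) offset entry from `xenia-rs info`.
const EXECUTION_INFO_OFFSET: u32 = 0x10E8;
```

`header_base_from_return(CANARY_RETURN_DEC, EXECUTION_INFO_OFFSET)` yields `0x3001E000`, which matches the base derived above.
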
## Step 2: Source-read both engines

### Canary

`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc:501-515`:

```cpp
pointer_result_t RtlImageXexHeaderField_entry(pointer_t<xex2_header> xex_header,
                                              dword_t field_dword) {
  uint32_t field_value = 0;
  uint32_t field = field_dword;
  // A NULL header pointer short-circuits to 0 before any lookup.
  if (!xex_header) {
    return field_value;
  }
  UserModule::GetOptHeader(kernel_memory(), xex_header, xex2_header_keys(field),
                           &field_value);
  return field_value;
}
```

`UserModule::GetOptHeader` (`user_module.cc:335-369`, core loop):

```cpp
for (uint32_t i = 0; i < header->header_count; i++) {
  auto& opt_header = header->headers[i];
  if (opt_header.key != key) continue;
  switch (opt_header.key & 0xFF) {
    // Low byte 0x00: the value slot holds the data itself.
    case 0x00: field_value = opt_header.value; break;
    // Low byte 0x01: return the guest VA of the value slot.
    case 0x01: field_value = memory->HostToGuestVirtual(&opt_header.value); break;
    // Otherwise: the slot holds an offset from the header base.
    default: field_value = memory->HostToGuestVirtual(header) + opt_header.offset; break;
  }
  break;  // first matching key wins
}
*out = field_value;
```

The argument `xex_header` is a guest VA pointing at the in-guest copy of the raw XEX header bytes. That copy is allocated at `user_module.cc:223-227` by `guest_xex_header_ = SystemHeapAlloc(header->header_size); memcpy(...)`. The game reaches it via `*XexExecutableModuleHandle → hmodule_ptr → *(hmodule + 0x58) = xex_header_base` (canary `xmodule.h:49`).

### Ours

`xenia-rs/crates/xenia-kernel/src/exports.rs:2391-2395` (pre-fix):

```rust
fn rtl_image_xex_header_field(ctx, _mem, _state) { // parameter types elided in this excerpt
    // r3 = xex_header_ptr, r4 = field_id
    // Return 0 for all fields
    ctx.gpr[3] = 0;
}
```

This is a pure stub. It ignores both arguments and returns 0 for every field key, so the whole body has to be replaced, not just the handling of one key.

`xenia-rs/crates/xenia-app/src/main.rs:1440-1442` (pre-fix):

```rust
("xboxkrnl.exe", 0x0193) => {
    // XexExecutableModuleHandle -> image base
    mem.write_u32(addr, base);
}
```

This writes `image_base` (e.g. `0x82000000`) into the variable slot. What belongs there is a guest VA pointing to an `X_LDR_DATA_TABLE_ENTRY`.

The failure then unfolds as follows:

1. The game's CRT dereferences `*XexExecutableModuleHandle` and gets `base`.
2. It reads `*(base + 0x58)`, which lands in PE OptionalHeader bytes (`0x61602063` for this ISO).
3. The game treats that value as invalid.
4. It falls back to calling `RtlImageXexHeaderField` with `r3=NULL`, regardless of which key it wants to query.

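The failing dereference chain can be modeled with a toy word-addressed memory. This is a hypothetical sketch, not the xenia-rs memory API. Only the `0x58` offset comes from the chain described above.

```rust
use std::collections::HashMap;

/// Toy word-addressed guest memory (hypothetical, not the xenia-rs API).
/// Reads of unmapped addresses return 0.
struct MockGuestMem {
    words: HashMap<u32, u32>,
}

impl MockGuestMem {
    fn read_u32(&self, va: u32) -> u32 {
        self.words.get(&va).copied().unwrap_or(0)
    }
}

/// Offset of the XEX header pointer inside the module entry (`*(hmodule + 0x58)`).
const HMODULE_XEX_HEADER_OFFSET: u32 = 0x58;

/// Follows `*slot -> hmodule`, then `*(hmodule + 0x58) -> xex_header_base`.
fn xex_header_via_module_handle(mem: &MockGuestMem, slot: u32) -> u32 {
    let hmodule = mem.read_u32(slot);
    mem.read_u32(hmodule.wrapping_add(HMODULE_XEX_HEADER_OFFSET))
}
```

When the slot holds the image base `0x82000000`, the second read returns whatever sits at `0x82000058` in the PE header (`0x61602063` on this ISO). When the slot holds a real LDR entry, the second read returns the header VA.
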
## Step 3: Classification

This is **class (B-extreme)**: not "missing handler for one field key" but "the entire function body is a stub returning 0". The XEX header data IS parsed by our loader (`xenia-xex/src/header.rs` defines `Vec<Xex2OptionalHeader>`), but it is never made available to the kernel import handler.

The upstream LDR chain is also wrong: `XexExecutableModuleHandle` doesn't point to a real `LDR_DATA_TABLE_ENTRY`. But fixing THAT turned out to be Phase-A-regressing; see below.

### Sub-finding: LDR fix shifts boot trajectory

The first fix attempt BROKE the matched-prefix metric. That attempt (the initial commit) replaced `mem.write_u32(addr, base)` with a proper `X_LDR_DATA_TABLE_ENTRY` allocation pointing to a copy of the XEX header.

| approach | matched-prefix events (tid=6→tid=1) |
|---|---|
| pre-fix (C+2 baseline) | 102014 |
| with full LDR setup (first attempt) | **0** (regression) |
| header-bytes-only, KernelState fallback in handler (final) | **102032** (+18 past 102014) |

Reason: under ours, the game's CRT entry path examines `*XexExecutableModuleHandle` and branches on what it finds:

* When the slot holds `0x82000000` (the image base), the CRT takes the "module not yet queryable" path. That path makes an early `RtlImageXexHeaderField(NULL, key)` probe, which returns 0 and matches canary.
* When the slot holds `0x4xxxxxxx` (a real LDR entry allocated by `KernelState::heap_alloc`), the CRT takes the "module queryable" path and skips the early probe entirely.

In the second case the two engines' event sequences diverge from idx=0.

Canary's `hmodule_ptr` is also a real heap allocation, made via `Memory::SystemHeapAlloc`. In canary that lands in the `0x30xxxxxx` virtual heap, whereas ours lands in `0x4xxxxxxx`. Either way both addresses should fall in the same "queryable" class, yet canary's CRT still makes the early probe.

One possible cause is a (possibly cycle-level) timing difference in when `*XexExecutableModuleHandle` receives its final hmodule_ptr value:

* canary writes it during `LaunchModule`, which is called after some PreLaunch initialization;
* ours writes it in xenia-app's Phase 3 variable-import patcher, which runs before any guest code.

This is too deep to chase in this session.

The final approach **preserves** the pre-fix CRT branch by keeping `*XexExecutableModuleHandle = base`, so the game still passes ptr=NULL on most calls. It then routes the handler through a KernelState fallback to recover the correct return value. The handler now returns `xex_header_va + 0x10E8` for the EXECUTION_INFO query at idx=102014.

## Step 4: Pick the fix

Three deltas:

1. **`KernelState::xex_header_guest_ptr: u32`**: record where the guest-memory copy of the raw XEX header lives.
2. **`xenia-app::cmd_exec`** at the `XexExecutableModuleHandle` patcher: keep `*XexExecutableModuleHandle = base` (don't disturb the CRT branch), but additionally allocate `header.header_size` bytes in guest memory and `mem.write_bulk(&data[..header_size])` to copy the raw header in. Record the resulting guest VA in `kernel.xex_header_guest_ptr`.
3. **`rtl_image_xex_header_field`**: implement the lookup mirroring canary's `UserModule::GetOptHeader`. Fall back to `state.xex_header_guest_ptr` when the caller passes NULL.

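Delta 3 can be sketched under several stated assumptions:

* guest memory is modeled as one big-endian byte region (`GuestRegion`, hypothetical);
* `KernelState` is reduced to the new field;
* the result is returned instead of being written to `ctx.gpr[3]`;
* the XEX2 header layout (`header_count` at `+0x14`, 8-byte `{key, value/offset}` entries from `+0x18`) is assumed from canary's `xex2_header` definition.

This is a sketch of the technique, not the committed code.

```rust
/// Hypothetical stand-in for guest memory: one contiguous region starting
/// at `base`, holding big-endian data as the guest sees it.
struct GuestRegion {
    base: u32,
    bytes: Vec<u8>,
}

impl GuestRegion {
    /// Big-endian u32 read; `None` if `va` falls outside the region.
    fn read_u32_be(&self, va: u32) -> Option<u32> {
        let off = va.checked_sub(self.base)? as usize;
        let b = self.bytes.get(off..off.checked_add(4)?)?;
        Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
    }
}

/// Reduced `KernelState`: only the field added by delta 1.
struct KernelState {
    xex_header_guest_ptr: u32,
}

// Assumed XEX2 header layout (canary's `xex2_header`): big-endian
// `header_count` at +0x14, then 8-byte `{key, value/offset}` entries from +0x18.
const HEADER_COUNT_OFFSET: u32 = 0x14;
const OPT_HEADERS_OFFSET: u32 = 0x18;
const OPT_HEADER_SIZE: u32 = 8;

/// Sketch of the fixed handler; returns the value the real handler puts in r3.
fn rtl_image_xex_header_field(
    mem: &GuestRegion,
    state: &KernelState,
    xex_header_ptr: u32, // r3
    field_key: u32,      // r4
) -> u32 {
    // The CRT passes NULL under ours, so fall back to the recorded copy.
    let header = if xex_header_ptr != 0 {
        xex_header_ptr
    } else {
        state.xex_header_guest_ptr
    };
    if header == 0 {
        return 0; // no copy recorded: same result as canary's NULL short-circuit
    }
    let count = match mem.read_u32_be(header.wrapping_add(HEADER_COUNT_OFFSET)) {
        Some(c) => c,
        None => return 0,
    };
    for i in 0..count {
        let entry = header
            .wrapping_add(OPT_HEADERS_OFFSET)
            .wrapping_add(i.wrapping_mul(OPT_HEADER_SIZE));
        let (key, value) = match (mem.read_u32_be(entry), mem.read_u32_be(entry.wrapping_add(4))) {
            (Some(k), Some(v)) => (k, v),
            _ => return 0, // table runs past the mapped copy
        };
        if key != field_key {
            continue;
        }
        // First match wins, as with canary's `break`.
        return match key & 0xFF {
            0x00 => value,                   // inline value
            0x01 => entry.wrapping_add(4),   // guest VA of the value slot
            _ => header.wrapping_add(value), // header base + offset
        };
    }
    0 // key not present
}
```

As in canary, the first matching key wins and a missing key yields 0. The one intentional deviation is the NULL fallback, which lets the pre-fix CRT branch stay intact.
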
Plus a Python-side canonicalization addition:

4. **`diff_events.py`**: add `RtlImageXexHeaderField` to `ALLOCATOR_RETURN_FNS`. For "else"-class keys the return value is a guest VA inside the in-guest XEX header copy, so it depends on the host allocator (`0x30xxxxxx` in canary, `0x4xxxxxxx` in ours). Per-(tid, name) ordinal sentinels mask the VA divergence, the same pattern as Phase C+2's allocator canonicalization.

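The ordinal-sentinel idea can be illustrated in Rust. This is not the code in `diff_events.py`, which is Python, and `Canonicalizer` is a hypothetical name. Each engine's trace gets its own instance. A call to a listed function has its return value replaced by `<name#n>`, where `n` counts earlier calls with the same (tid, name).

```rust
use std::collections::HashMap;

/// Illustrative per-(tid, name) ordinal canonicalizer (hypothetical;
/// not the code in diff_events.py). One instance per engine trace.
struct Canonicalizer {
    allocator_fns: Vec<&'static str>,
    counters: HashMap<(u32, String), u32>,
}

impl Canonicalizer {
    fn new(allocator_fns: &[&'static str]) -> Self {
        Self { allocator_fns: allocator_fns.to_vec(), counters: HashMap::new() }
    }

    /// Returns the string the differ compares for one kernel.return event.
    fn canonical_return(&mut self, tid: u32, name: &str, return_value: u32) -> String {
        if !self.allocator_fns.iter().any(|f| *f == name) {
            // Not allocator-like: compare the raw value.
            return format!("{:#010x}", return_value);
        }
        // Allocator-like: the n-th call on this thread becomes `<name#n>`,
        // whatever VA the engine's heap handed out.
        let n = self.counters.entry((tid, name.to_string())).or_insert(0);
        let sentinel = format!("<{}#{}>", name, n);
        *n += 1;
        sentinel
    }
}
```

The sentinel embeds the name and the ordinal but not the tid. Canary's tid=6 stream and our tid=1 stream therefore produce identical tokens even though their raw VAs differ.
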
Total: ~80 LOC, 4 files.

## Cross-validation

* The pre-fix `eprintln` trace confirms `xex_header_ptr=0` for both of our calls. The field keys are `0x00020401` (not in the XEX → returns 0) and `0x00040006` (in the XEX, "else" class → returns `header_base + 0x10E8`).
* Canary's idx=102014 return value `0x3001F0E8 = 0x3001E000 + 0x10E8` confirms that canary's `guest_xex_header_` is at `0x3001E000` and that key `0x00040006`'s offset entry is `0x10E8`.
* Running our `xenia-rs info` against the ISO confirms that key `0x00040006` is present with value `0x000010E8`.

All three independent evidence sources converge on the same field semantics.
