Files
xenia-rs/audit-runs/phase-c3-RtlImageXexHeaderField/investigation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

215 lines
9.0 KiB
Markdown

# Phase C+3 — investigation: `RtlImageXexHeaderField` at idx=102014
## Divergence
| | canary | ours (pre-fix) |
|---|---|---|
| `payload.return_value` (idx=102014, tid=6→1) | `805433576` = `0x3001F0E8` | `0` |
| `payload.status` | `0x3001f0e8` | `0x00000000` |
| Surrounding context (idx 102009..102013): `RtlLeaveCriticalSection``RtlImageXexHeaderField`. ||
| Game thread | tid=6 main | tid=1 main |
| Next event (idx=102015) | `NtCreateFile` | `NtCreateFile` (matches) |
`0x3001F0E8` is in canary's virtual-heap region (`0x30xxxxxx`) — the
`Memory::SystemHeapAlloc` band — so the value is a guest VA pointing
inside canary's in-guest XEX header copy (allocated in
`user_module.cc:224` as `guest_xex_header_`). Ours returns 0 because
its stub `rtl_image_xex_header_field` (`exports.rs:2391-2395`) returned
0 unconditionally.
## Step 1 — Event context at idx=102014
From canary's existing Phase A capture
(`xenia-rs/audit-runs/phase-c-first-divergence/phase-a/canary.jsonl`),
canary's tid=6 makes only **two** `RtlImageXexHeaderField` calls in the
matched prefix:
| event idx | event kind | payload |
|---|---|---|
| 0 | import.call | `RtlImageXexHeaderField` |
| 1 | kernel.call | `RtlImageXexHeaderField` (args:{} — schema-v1 doesn't capture args) |
| 2 | kernel.return | `return_value=0 status=0x00000000` |
| 102012 | import.call | `RtlImageXexHeaderField` |
| 102013 | kernel.call | `RtlImageXexHeaderField` |
| 102014 | kernel.return | `return_value=805433576 status=0x3001f0e8` |
Ours pre-fix makes the same call sequence (verified by capture in
`phase-c1-keQuerySystemTime/ours.jsonl`) — both `RtlImageXexHeaderField`
calls returned 0.
Schema-v1 records empty `args:{}`, so `field_key` (r4) and `xex_header_ptr`
(r3) aren't directly readable from the JSONL. A one-shot `eprintln` in
ours's stub revealed both calls pass:
* call #1: `xex_header_ptr=0x00000000 field_key=0x00020401` (DEFAULT_HEAP_SIZE — not present in this XEX, so even with a valid header pointer the result would be 0)
* call #2: `xex_header_ptr=0x00000000 field_key=0x00040006` (EXECUTION_INFO — low byte `0x06`, "else" class, returns `header_base + offset(0x10E8)`)
`xenia-rs/target/release/xenia-rs info` against the ISO confirms the
in-XEX optional-header table. Key `0x00040006` is present with value
`0x000010E8`; key `0x00020401` is not present. So canary's `0x3001F0E8`
= `0x3001E000 + 0x10E8` — canary's `guest_xex_header_` lives at
`0x3001E000`. The game queries `EXECUTION_INFO` and uses the
returned VA to read media_id / title_id / disc_number / disc_count.
## Step 2 — Source-read both engines
### Canary
`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_rtl.cc:501-515`:
```c
pointer_result_t RtlImageXexHeaderField_entry(pointer_t<xex2_header> xex_header,
dword_t field_dword) {
uint32_t field_value = 0;
uint32_t field = field_dword;
if (!xex_header) {
return field_value;
}
UserModule::GetOptHeader(kernel_memory(), xex_header, xex2_header_keys(field),
&field_value);
return field_value;
}
```
`UserModule::GetOptHeader` (`user_module.cc:335-369`):
```c
for (uint32_t i = 0; i < header->header_count; i++) {
auto& opt_header = header->headers[i];
if (opt_header.key != key) continue;
switch (opt_header.key & 0xFF) {
case 0x00: field_value = opt_header.value; break;
case 0x01: field_value = memory->HostToGuestVirtual(&opt_header.value); break;
default: field_value = memory->HostToGuestVirtual(header) + opt_header.offset; break;
}
break;
}
*out = field_value;
```
The argument `xex_header` is a guest VA pointing at the in-guest copy of
the raw XEX header bytes (allocated by `user_module.cc:223-227`'s
`guest_xex_header_ = SystemHeapAlloc(header->header_size); memcpy(...)`).
The game reaches it via `*XexExecutableModuleHandle → hmodule_ptr →
*(hmodule + 0x58) = xex_header_base` (canary `xmodule.h:49`).
### Ours
`xenia-rs/crates/xenia-kernel/src/exports.rs:2391-2395` (pre-fix):
```rust
fn rtl_image_xex_header_field(ctx, _mem, _state) {
// r3 = xex_header_ptr, r4 = field_id
// Return 0 for all fields
ctx.gpr[3] = 0;
}
```
A complete stub. The entire function body is wrong.
`xenia-rs/crates/xenia-app/src/main.rs:1440-1442` (pre-fix):
```rust
("xboxkrnl.exe", 0x0193) => {
// XexExecutableModuleHandle -> image base
mem.write_u32(addr, base);
}
```
Writes `image_base` (e.g. `0x82000000`) at the variable slot instead of
a guest VA pointing to an `X_LDR_DATA_TABLE_ENTRY`. The game's CRT
derefs `*XexExecutableModuleHandle = base`, then walks `*(base + 0x58)`
which reads PE OptionalHeader bytes (`0x61602063` for this ISO). Game
treats that as invalid → falls through to call `RtlImageXexHeaderField`
with `r3=NULL` regardless of which key it wants to query.
## Step 3 — Classification
This is **class (B-extreme)**: not "missing handler for one field key"
but "the entire function body is a stub returning 0". The XEX header
data IS parsed by ours's loader (`xenia-xex/src/header.rs` defines
`Vec<Xex2OptionalHeader>`), but never made available to the kernel
import handler.
Additionally, the upstream LDR chain is also wrong: `XexExecutableModule
Handle` doesn't point to a real LDR_DATA_TABLE_ENTRY. But fixing THAT
turned out to be Phase-A-regressing — see below.
### Sub-finding: LDR fix shifts boot trajectory
The first fix attempt (initial commit: replace `mem.write_u32(addr, base)`
with a proper `X_LDR_DATA_TABLE_ENTRY` allocation that pointed to a
copy of the XEX header) BROKE the matched-prefix metric:
| approach | tid=6→tid=1 matched |
|---|---|
| pre-fix (C+2 baseline) | 102014 |
| with full LDR setup (first attempt) | **0** (regression) |
| header-bytes-only, KernelState fallback in handler (final) | **102032** (+18 past 102014) |
Reason: ours's CRT entry path examines `*XexExecutableModuleHandle`.
When it's `0x82000000` (image base), the CRT takes the "module not yet
queryable" path which makes an early `RtlImageXexHeaderField(NULL, key)`
probe (returning 0 — matches canary). When `*XexExecutableModuleHandle`
is `0x4xxxxxxx` (a real LDR allocated by `KernelState::heap_alloc`), the
CRT takes the "module queryable" path and skips the early probe call
entirely. The two engines' event sequences then drift starting at idx=0.
Canary's `hmodule_ptr` lands at `0x4xxxxxxx` too (via
`Memory::SystemHeapAlloc` — actually canary's lookup gives `0x30xxxxxx`
for the virtual heap; ours lands in `0x4xxxxxxx`). Either way it
should be the same "queryable" address class — but canary's CRT still
makes the early probe. Possibly because of cycle-level timing
differences in when `*XexExecutableModuleHandle` gets the final
hmodule_ptr value (canary writes it during `LaunchModule` which is
called after some PreLaunch initialization; ours writes it during the
xenia-app's Phase 3 variable-import patcher, which runs before any
guest code). This is too deep to chase in this session.
Final approach **preserves** the pre-fix CRT branch (game still passes
ptr=NULL on most calls) by keeping `*XexExecutableModuleHandle = base`,
then routes the handler through a KernelState fallback to recover the
correct return value. The handler now returns `xex_header_va + 0x10E8`
for the EXECUTION_INFO query at idx=102014.
## Step 4 — Pick the fix
Three deltas:
1. **`KernelState::xex_header_guest_ptr: u32`** — record where the
guest-memory copy of the raw XEX header lives.
2. **`xenia-app::cmd_exec`** at the `XexExecutableModuleHandle` patcher:
keep `*XexExecutableModuleHandle = base` (don't disturb the CRT
branch), but additionally allocate `header.header_size` bytes in
guest memory and `mem.write_bulk(&data[..header_size])` to copy the
raw header in. Record the resulting guest VA in
`kernel.xex_header_guest_ptr`.
3. **`rtl_image_xex_header_field`** — implement the lookup mirroring
canary's `UserModule::GetOptHeader`. Fall back to
`state.xex_header_guest_ptr` when the caller passes NULL.
Plus a python-side canonicalization addition:
4. **`diff_events.py`** — add `RtlImageXexHeaderField` to
`ALLOCATOR_RETURN_FNS`. The return value for "else"-class keys is a
guest VA inside the in-guest XEX header copy, which is
host-allocator-dependent (`0x30xxxxxx` in canary,
`0x4xxxxxxx` in ours). Per-(tid, name) ordinal sentinels mask the
VA divergence — same pattern as Phase C+2's allocator canonicalization.
Total: ~80 LOC, 4 files.
## Cross-validation
* Pre-fix `eprintln` trace confirms `xex_header_ptr=0` for both ours
calls; field keys are `0x00020401` (not in XEX → returns 0) and
`0x00040006` (in XEX, "else" class → returns `header_base + 0x10E8`).
* Canary's idx=102014 return value `0x3001F0E8 = 0x3001E000 + 0x10E8`
confirms canary's `guest_xex_header_` is at `0x3001E000` and key
`0x00040006`'s offset entry is `0x10E8`.
* ours's `xenia-rs info` against the ISO confirms key `0x00040006`
is present with value `0x000010E8`.
All three independent evidence sources converge on the same field
semantics.