handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions

@@ -0,0 +1,182 @@
# Phase C+6 — investigation: call-name divergence at idx=102132
## Divergence
| | canary | ours (pre-fix) |
|---|---|---|
| `import.call` at idx=102132, tid=6→1 | `NtClose` (ord 207) | `IoDismountVolumeByFileHandle` (ord 60) |
## Phase 0 — Rule out ord→name lookup bug
Both engines map ord 0x3C (60) to `IoDismountVolumeByFileHandle` and
ord 0xCF (207) to `NtClose`. Cross-check:
* `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_table.inc:74`
`XE_EXPORT(xboxkrnl, 0x0000003C, IoDismountVolumeByFileHandle, kFunction)`
* `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_table.inc:221`
`XE_EXPORT(xboxkrnl, 0x000000CF, NtClose, kFunction)`
* `xenia-rs/crates/xenia-kernel/src/exports.rs:31`
`register_export(Xboxkrnl, 0x3C, "IoDismountVolumeByFileHandle", stub_success)`
* `xenia-rs/crates/xenia-kernel/src/exports.rs:91`
`register_export(Xboxkrnl, 0xCF, "NtClose", nt_close)`
The ord→name mapping for both ordinals is identical in the two engines.
**Phase 0 rules out an emitter name-lookup bug.** This is a real
event-stream divergence.
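
The cross-check above can be mechanized along these lines. This is a minimal sketch, not code from either repository: `diff_ordinal_tables` is a hypothetical helper, and the two maps are hand-copied from the four table lines quoted above rather than parsed from `xboxkrnl_table.inc` or `exports.rs`.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical helper: returns every ordinal whose name differs between
/// the two tables, or that is present in only one of them.
fn diff_ordinal_tables<'a>(
    canary: &BTreeMap<u32, &'a str>,
    ours: &BTreeMap<u32, &'a str>,
) -> Vec<(u32, Option<&'a str>, Option<&'a str>)> {
    // Union of all ordinals seen in either table, in ascending order.
    let ords: BTreeSet<u32> = canary.keys().chain(ours.keys()).copied().collect();
    let mut out = Vec::new();
    for ord in ords {
        let c = canary.get(&ord).copied();
        let o = ours.get(&ord).copied();
        if c != o {
            out.push((ord, c, o));
        }
    }
    out
}

fn main() {
    // Assumption: only the two ordinals discussed in Phase 0 are included.
    let canary = BTreeMap::from([
        (0x3Cu32, "IoDismountVolumeByFileHandle"),
        (0xCF, "NtClose"),
    ]);
    let ours = BTreeMap::from([
        (0x3Cu32, "IoDismountVolumeByFileHandle"),
        (0xCF, "NtClose"),
    ]);
    let mismatches = diff_ordinal_tables(&canary, &ours);
    println!("{} mismatching ordinals", mismatches.len());
}
```

An empty result for these two ordinals corresponds to the Phase 0 conclusion; feeding in the full tables would be needed to say anything about other ordinals.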
## Phase 1 — Capture context around idx=102132 in both streams
`audit-runs/phase-c5-NtWriteFile/ours.jsonl` lines 102134..102137
(idx 102132..102135) and `phase-c-first-divergence/phase-a/canary.jsonl`
matching tid=6 events:
```
ours[102132] import.call IoDismountVolumeByFileHandle (ord 60)
ours[102133] kernel.call IoDismountVolumeByFileHandle
ours[102134] kernel.return IoDismountVolumeByFileHandle returns 0
ours[102135] import.call NtClose (ord 207)
ours[102136] kernel.call NtClose
ours[102137] kernel.return NtClose returns 0
canary[102132] import.call NtClose (ord 207)
canary[102133] kernel.call NtClose
canary[102134] kernel.return NtClose returns 0
canary[102135] import.call NtOpenFile (ord 223)
```
**Observation**: Ours's events 102135..102137 (`NtClose` triple) are
bit-identical to canary's events 102132..102134. Ours has 3 EXTRA
events (`IoDismountVolumeByFileHandle` triple) injected at idx=102132
that canary's stream does NOT contain. After ours's 3-event surplus,
both streams realign: `ours[102138] = canary[102135] = NtOpenFile`.
So the game DOES call IoDismountVolumeByFileHandle in BOTH engines; the
difference is purely whether the Phase A emitter fires.
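
The realignment observation can be checked with a small search for the surplus length. The sketch below is illustrative only: `find_surplus` is a hypothetical name, events are reduced to plain strings, and indices are local (0 stands for idx=102132) rather than read from the `.jsonl` traces.

```rust
/// Hypothetical helper: smallest number of events to skip in `ours`,
/// starting at `at`, so that the next `window` events equal `canary`'s
/// events from `at` onward. Returns None if no skip up to `max_skip` works.
fn find_surplus(
    ours: &[&str],
    canary: &[&str],
    at: usize,
    window: usize,
    max_skip: usize,
) -> Option<usize> {
    (0..=max_skip).find(|&skip| {
        // `get` with a range returns None instead of panicking when out of bounds.
        let o = ours.get(at + skip..at + skip + window);
        let c = canary.get(at..at + window);
        matches!((o, c), (Some(o), Some(c)) if o == c)
    })
}

fn main() {
    // Local index 0 corresponds to idx=102132 in the listing above.
    let ours = [
        "import.call IoDismountVolumeByFileHandle",
        "kernel.call IoDismountVolumeByFileHandle",
        "kernel.return IoDismountVolumeByFileHandle",
        "import.call NtClose",
        "kernel.call NtClose",
        "kernel.return NtClose",
        "import.call NtOpenFile",
    ];
    let canary = [
        "import.call NtClose",
        "kernel.call NtClose",
        "kernel.return NtClose",
        "import.call NtOpenFile",
    ];
    println!("{:?}", find_surplus(&ours, &canary, 0, 4, 8)); // prints Some(3)
}
```

A surplus of 3 matches one extra `import.call`/`kernel.call`/`kernel.return` triple on our side.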
## Phase 2 — Source-read both emitter paths
### Canary emit logic (path A: declared export)
`xenia-canary/src/xenia/kernel/util/shim_utils.h:597-602`:
```cpp
const bool phase_a_on = phase_a_bridge::Enabled();
if (phase_a_on) {
  phase_a_bridge::EmitImportAndCall(
      phase_a_bridge::KernelModuleIdName(MODULE), ORDINAL,
      export_entry->name);
}
```
This block sits inside `Trampoline`, which is reachable only when a
`DECLARE_XBOXKRNL_EXPORT` shim wires
`export_entry->function_data.trampoline = &X::Trampoline`.
### Canary emit logic (path B: table-entry-only, no DECLARE)
`xenia-canary/src/xenia/cpu/xex_module.cc:1310-1335` import-thunk
generator: when `kernel_export->function_data.trampoline == nullptr`
(no DECLARE shim), the thunk is rewritten to `sc 2; blr` — the syscall
form. The "extern handler" wired in `SetupExtern(handler=nullptr, ...)`
forwards the call to `PPCFrontend::SyscallHandler`
(`ppc_frontend.cc:83-92`):
```cpp
void SyscallHandler(PPCContext* ppc_context, void* arg0, void* arg1) {
  uint64_t syscall_number = ppc_context->r[0];
  switch (syscall_number) {
    default:
      assert_unhandled_case(syscall_number);
      XELOGE("Unhandled syscall {}!", syscall_number);
      break;
  }
}
```
This path contains no `phase_a_bridge::EmitImportAndCall` call, so
**canary emits NO Phase A events for table-entry-only exports.**
Verified by grepping xenia-canary for `DECLARE_XBOXKRNL_EXPORT`
declarations: only 287 ords have implementations, and
`IoDismountVolumeByFileHandle` is NOT among them. (See the
`/tmp/canary_decl.txt` snapshot in `investigation.md`'s work artifacts.)
### Ours emit logic
`xenia-rs/crates/xenia-kernel/src/state.rs:585-632` (pre-fix):
```rust
if let Some(&(name, func)) = self.exports.get(&(module, ordinal)) {
    ...
    let phase_a_on = crate::event_log::is_enabled();
    ...
    if phase_a_on {
        crate::event_log::emit_import_call(...);
        crate::event_log::emit_kernel_call(...);
    }
    func(&mut ctx, mem, self);
    if phase_a_on {
        let return_value = if is_void { 0 } else { ctx.gpr[3] };
        crate::event_log::emit_kernel_return(...);
    }
}
```
Ours emits `import.call`/`kernel.call`/`kernel.return` for EVERY
registered ord, regardless of whether canary has a real shim or
a syscall-thunk. `IoDismountVolumeByFileHandle` is registered as
`stub_success` and therefore generates 3 spurious Phase A events.
## Phase 3 — Classification
**Class (E) Phase A emitter framing — Phase A coverage gap.** Ours's
emitter fires for stubs that canary leaves silent (canary uses the
syscall thunk for table-entry-only exports, which does not reach
`Trampoline`).
Not class (A) (no guest-code branch flip — events realign at +3),
not class (α) (this is not canonicalization — it's an emitter
asymmetry that we can fix at source), not class (D) (no deferred-
item interaction — heap region / clock not involved).
## Sister bugs surfaced (out-of-scope — documented for follow-up)
`comm -23 <ours-xboxkrnl> <canary-DECLARE_XBOXKRNL_EXPORT>` lists
**12 xboxkrnl ords** that ours registers but canary has no shim for.
Of those, the following two actually fire in the current 50M run and
so also drift Phase A alignment:
* `IoDismountVolumeByFileHandle` (ord 0x3C, called 1× tid=1 main —
**fixed in this session**)
* `StfsCreateDevice` (ord 0x259, called 1× tid=2 — drives tid=7→tid=2
divergence at idx=15; out of scope per session-scope rule)
The other 10 (including DbgPrint, RtlCaptureContext, RtlUnwind, sprintf,
_vsnprintf, __C_specific_handler, XeKeysConsoleSignatureVerification,
StfsControlDevice) are not yet called in the 50M run; they will
become relevant only after the game progresses further.
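
The `comm -23` step amounts to a set difference over export names. A minimal sketch, assuming both name lists have already been extracted into sets: `exports_without_shim` is a hypothetical helper, and the three-name input is a hand-picked subset for illustration, not the real 12-entry result.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper: names that ours registers but for which canary
/// has no DECLARE_XBOXKRNL_EXPORT shim (the same result `comm -23` gives
/// on two sorted name lists).
fn exports_without_shim<'a>(
    ours: &BTreeSet<&'a str>,
    canary_declared: &BTreeSet<&'a str>,
) -> Vec<&'a str> {
    // BTreeSet iterates in sorted order, so the output is sorted too.
    ours.difference(canary_declared).copied().collect()
}

fn main() {
    // Assumption: illustrative subset only, not the full export lists.
    let ours = BTreeSet::from([
        "IoDismountVolumeByFileHandle",
        "NtClose",
        "StfsCreateDevice",
    ]);
    let canary_declared = BTreeSet::from(["NtClose"]);
    for name in exports_without_shim(&ours, &canary_declared) {
        println!("{name}");
    }
}
```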
**Sister bug (different class)**: ord 0x82 is `KeQueryInterruptTime`
in canary but ours mis-labels it `KeQueryIdealProcessor`; ord 0x98 is
`KeSetBackgroundProcessors` in canary but ours mis-labels it
`KeSetIdealProcessor`. These are name-lookup bugs (Stage 2 cleanup
class) and are NOT addressed here; fixing them would require renaming
the handler function and either dropping the existing semantics or
moving them to a different ord.
## Reading-error class #26 (additive)
**Phase A emitter coverage asymmetry across the "table-entry-only"
vs "shim-implemented" axis.** Canary's emitter fires only from
`Trampoline` (wired by `DECLARE_XBOXKRNL_EXPORT`). Ours's emitter
fires for every `register_export` regardless of canary equivalence.
When a guest import is in canary's table but has no DECLARE shim,
canary routes it through a no-op syscall thunk with no Phase A
emission — ours, by registering a `stub_success`, injects 3 spurious
events per call.
Discipline addition: when registering a kernel export as
`stub_success`/`stub_return_zero` etc., grep canary for
`DECLARE_XBOXKRNL_EXPORT(<name>` first. If absent, use
`register_unimplemented_export` instead of `register_export` so the
Phase A emitter stays silent (matching canary). Reading-error #21
(C+1's `gpr[3]`-as-return-value for void exports) and #25 (C+5's
wrong-register read) were both kernel-export-emitter bugs; this is
the third in that family.
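
The discipline can be pictured with a toy registry in which the registration call decides whether the emitter fires. This is a sketch under stated assumptions, not the xenia-rs implementation: `Registry`, `ExportKind`, and the single-argument `call` are hypothetical, and the real signatures of `register_export` and `register_unimplemented_export` are not shown in these notes.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, PartialEq, Debug)]
enum ExportKind {
    /// Canary has a DECLARE_XBOXKRNL_EXPORT shim, so its emitter fires.
    Shimmed,
    /// Canary has only a table entry; its syscall thunk emits nothing.
    TableOnly,
}

/// Toy model of an export registry plus a Phase A event log.
struct Registry {
    exports: HashMap<u32, (&'static str, ExportKind)>,
    log: Vec<String>,
}

impl Registry {
    fn new() -> Self {
        Registry { exports: HashMap::new(), log: Vec::new() }
    }

    fn register_export(&mut self, ord: u32, name: &'static str) {
        self.exports.insert(ord, (name, ExportKind::Shimmed));
    }

    fn register_unimplemented_export(&mut self, ord: u32, name: &'static str) {
        self.exports.insert(ord, (name, ExportKind::TableOnly));
    }

    /// Dispatches a guest import; emits the event triple only for shimmed exports.
    fn call(&mut self, ord: u32) {
        let Some(&(name, kind)) = self.exports.get(&ord) else { return };
        let emit = kind == ExportKind::Shimmed;
        if emit {
            self.log.push(format!("import.call {name}"));
            self.log.push(format!("kernel.call {name}"));
        }
        // The export body (real handler or stub) would run here.
        if emit {
            self.log.push(format!("kernel.return {name}"));
        }
    }
}

fn main() {
    let mut reg = Registry::new();
    reg.register_export(0xCF, "NtClose");
    reg.register_unimplemented_export(0x3C, "IoDismountVolumeByFileHandle");
    reg.call(0x3C); // stays silent, matching canary's syscall thunk
    reg.call(0xCF); // emits the usual triple
    for event in &reg.log {
        println!("{event}");
    }
}
```

In this model the stub still runs for the table-entry-only export; only the event emission is suppressed, which is the property the alignment depends on.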