handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,201 @@
# Phase C+23 — `VdQueryVideoFlags` constant return
**Date:** 2026-05-26
**Mode:** WRITE — engine change (~5 LOC functional + ~2 LOC registration
edit + ~40 LOC tests). Diff tool UNCHANGED.
**Status:** LANDED. Main matched-prefix 105,138 → 105,286 (+148).
## TL;DR
The post-C+22 first divergence at canary tid=6 ↔ ours tid=1 idx 105,138
is `kernel.return VdQueryVideoFlags`:
```
canary: kernel.return VdQueryVideoFlags { return_value: 3, status: "0x00000003" }
ours: kernel.return VdQueryVideoFlags { return_value: 0, status: "0x00000000" }
```
Canary's `VdQueryVideoFlags_entry`
(`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc:231-241`)
computes a bitmask from the queried video mode:
```cpp
dword_result_t VdQueryVideoFlags_entry() {
  X_VIDEO_MODE mode;
  VdQueryVideoMode(&mode, false);
  uint32_t flags = 0;
  flags |= mode.is_widescreen ? 1 : 0;
  flags |= mode.display_width >= 1280 ? 2 : 0;
  flags |= mode.display_width >= 1920 ? 4 : 0;
  return flags;
}
```
Under canary's shipping defaults (`cvars::widescreen=true` from
`xboxkrnl_video.cc:31`; `cvars::internal_display_resolution=8` from
`graphics_system.cc:26` → `{1280, 720}` from
`graphics_system.h:38-54`), the computed value is:
```
is_widescreen=1 → +1
display_width=1280 ≥ 1280 → +2
display_width=1280 ≥ 1920 → +0
= 3
```
Our previous registration mapped the export to `stub_return_zero`
(`exports.rs:215` pre-change), which placed `0` in `r3`. The fix
mirrors canary's semantics 1:1 under the same defaults that our
`vd_query_video_mode` already reports (width=1280,
is_widescreen=1).
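As a minimal Rust sketch of canary's formula (the helper name `video_flags_from_mode` and its parameters are illustrative and do not exist in either codebase), the bitmask can be expressed as:

```rust
/// Hypothetical helper mirroring the formula in canary's
/// `VdQueryVideoFlags_entry`. Not part of xenia-rs; names are illustrative.
fn video_flags_from_mode(is_widescreen: bool, display_width: u32) -> u32 {
    let mut flags = 0u32;
    if is_widescreen {
        flags |= 1; // bit 0: widescreen
    }
    if display_width >= 1280 {
        flags |= 2; // bit 1: width of at least 1280
    }
    if display_width >= 1920 {
        flags |= 4; // bit 2: width of at least 1920
    }
    flags
}
```

With the defaults discussed here (widescreen, width 1280) this yields 3; a widescreen 1920-wide mode would yield 7, which is why the constant only holds while the mode payload stays fixed.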
## Why a constant works (no infrastructure needed)
Our `vd_query_video_mode`
(`exports.rs:3986-3996` pre-change, now :3997-4007 in the new file)
hard-codes `display_width=1280, is_widescreen=1, refresh_rate=60`
and has no cvar plumbing. As long as `vd_query_video_mode`'s
payload is itself fixed, the bitmask is also fixed. Implementing a
cvar-driven flags path would first require introducing
`widescreen` / `internal_display_resolution` cvar machinery, which is
out of scope per the escalation rule.
A unit test (`vd_query_video_flags_matches_vd_query_video_mode_payload`)
ties the return value to the *actual* payload `vd_query_video_mode`
writes, so the two functions stay in sync if the mode payload is
ever updated to actual cvar-driven values.
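The idea behind that drift check can be sketched as follows. This is an assumption-laden illustration, not the test in `exports.rs`: `VideoModePayload`, `fixed_mode_payload`, `VIDEO_FLAGS`, and `flags_match_payload` are hypothetical names, and the real `vd_query_video_mode` writes its payload into guest memory rather than returning a struct.

```rust
/// Hypothetical stand-in for the payload `vd_query_video_mode` writes.
struct VideoModePayload {
    display_width: u32,
    is_widescreen: u32,
}

/// Stand-in for the hard-coded payload (width=1280, is_widescreen=1).
fn fixed_mode_payload() -> VideoModePayload {
    VideoModePayload { display_width: 1280, is_widescreen: 1 }
}

/// The constant that the fix places in r3.
const VIDEO_FLAGS: u32 = 0x3;

/// Recompute canary's bitmask from the payload and compare it with the
/// constant. Returns false as soon as the two drift apart.
fn flags_match_payload(mode: &VideoModePayload) -> bool {
    let mut expected = 0u32;
    expected |= if mode.is_widescreen != 0 { 1 } else { 0 };
    expected |= if mode.display_width >= 1280 { 2 } else { 0 };
    expected |= if mode.display_width >= 1920 { 4 } else { 0 };
    expected == VIDEO_FLAGS
}
```

If the payload were later changed to, say, a 1920-wide mode without touching the constant, `flags_match_payload` would return false, which is the drift the unit test is meant to catch.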
## The fix
```rust
// exports.rs:215 (registration)
state.register_export(Xboxkrnl, 0x01C9, "VdQueryVideoFlags", vd_query_video_flags);

// exports.rs:3998-4023 (new function)
fn vd_query_video_flags(ctx: &mut PpcContext, _mem: &GuestMemory, _state: &mut KernelState) {
    // is_widescreen=1, display_width=1280 → bits 0 + 1 = 3
    ctx.gpr[3] = 0x3;
}
```
Total: 1 export-table line change + ~6 lines of function (with
doc comment) + ~40 lines of unit tests = ~50 LOC.
## Tests
2 new tests in `exports.rs`:
1. `vd_query_video_flags_returns_three` — sentinel-overwrite +
pinned return value `0x3`.
2. `vd_query_video_flags_matches_vd_query_video_mode_payload`:
computes the canary bitmask formula over `vd_query_video_mode`'s
actual written payload and asserts equality with the
`vd_query_video_flags` return. Catches drift if either function
is updated without the other.
Total: previous 224 + 2 new = **226 tests, all PASS**.
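The sentinel-overwrite pattern used by the first test can be sketched like this. `MiniCtx`, `vd_query_video_flags_sketch`, and `call_with_sentinel` are hypothetical stand-ins, and the `u64` register width is an assumption; the real `PpcContext` and test body are not reproduced here.

```rust
/// Hypothetical minimal context; the real `PpcContext` has more fields,
/// and the register width here is assumed.
struct MiniCtx {
    gpr: [u64; 32],
}

/// Stand-in with the same body as the fix, minus the unused parameters.
fn vd_query_video_flags_sketch(ctx: &mut MiniCtx) {
    ctx.gpr[3] = 0x3;
}

/// Pre-load r3 with a sentinel, call the export, and return r3 afterwards.
/// If the export forgot to write r3, the sentinel would survive.
fn call_with_sentinel(sentinel: u64) -> u64 {
    let mut ctx = MiniCtx { gpr: [0; 32] };
    ctx.gpr[3] = sentinel;
    vd_query_video_flags_sketch(&mut ctx);
    ctx.gpr[3]
}
```

The sentinel only proves something if it differs from the expected value, so a value such as `0xDEAD_BEEF` is a natural choice over `0` or `3`.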
## Cold-vs-cold verification (3-jitter table)
ours cold-1 jsonl: `/tmp/ours-c23-vd-cold-1.jsonl` (28.7 MB,
108,507 events on tid=1). Captured under `XENIA_CACHE_WIPE=1` with
the freshly built `xrs-c23` binary.
| jitter | matched | first divergence at | first-divergence kind / payload |
|---|---|---|---|
| 1 | **105,286** | 105,286 | `import.call VdGetCurrentDisplayGamma` (canary) vs `import.call KeAcquireSpinLockAtRaisedIrql` (ours) |
| 2 | **105,286** | 105,286 | (same) |
| 3 | **105,286** | 105,286 | (same) |
Delta vs C+22 baseline (105,138): **+148 events** in main matched-
prefix, on all three jitters. The new first divergence is genuine
and identical across all three jitters.
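For readers unfamiliar with the metric, a simplified sketch of "matched-prefix" follows. It assumes events are directly comparable and ignores the absorbers and tid mapping the actual diff tool applies; `matched_prefix` is a hypothetical name.

```rust
/// Length of the common prefix of two event streams.
/// The result is also the index of the first divergence, or the length of
/// the shorter stream if one is a prefix of the other.
fn matched_prefix<T: PartialEq>(canary: &[T], ours: &[T]) -> usize {
    canary
        .iter()
        .zip(ours.iter())
        .take_while(|(a, b)| a == b)
        .count()
}
```

Under this simplified definition, fixing the event at the old first-divergence index lets the count run on until the next mismatch, which is how a single corrected return value can advance the prefix by 148 events.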
## Absorber counters (sanity)
| jitter | floating_create (c/o) | floating_wait (c/o) |
|---|---|---|
| 1 | 0 / 0 | 1 / 0 |
| 2 | 0 / 0 | 0 / 0 |
| 3 | 1 / 0 | 3 / 0 |
Jitter-to-jitter variance in absorber counts is within the expected
scheduling-jitter window. Matched-prefix stable at 105,286 across
all three.
## Sister chains
No sister chains exercised in the 105,138 → 105,286 window. The 148
absorbed events are all on the main `tid=6 → tid=1` chain. The diff
report lists only the main chain row, confirming no regressions on
any sister chain.
## Determinism (3 cold runs)
| run | md5 | matched-prefix vs jitter-1 |
|---|---|---|
| cold-1 | `4e2e781ff0609f3a0a08f573dee4be4e` | 105,286 |
| cold-2 | `b195d82a1b61e87d6f54a2ac2b3e091b` | 105,286 |
| cold-3 | `e6b94d4dc151007c924b81bbc5c9faf5` | 105,286 |
Byte-level digests differ across the 3 cold runs because of
`host_ns` / `guest_cycle` wall-time jitter (unchanged from
pre-C+23 behavior). Logical semantic state — matched-prefix,
ours_total=108,507, first-divergence index — is bit-stable across
all 3 runs.
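The distinction between byte-level and logical stability can be illustrated with a small sketch. It is not how the digests above were produced (those are md5 sums of the jsonl files); `TraceEvent`, its `kind` and `name` fields, and both digest functions are hypothetical, and `DefaultHasher` is used only as a stand-in hash.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical trace event. Only `host_ns` and `guest_cycle` are names
/// taken from the text; the other fields are illustrative.
struct TraceEvent {
    kind: String,
    name: String,
    host_ns: u64,
    guest_cycle: u64,
}

/// Hash every field, including the timing fields that jitter between runs.
fn raw_digest(events: &[TraceEvent]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for ev in events {
        ev.kind.hash(&mut hasher);
        ev.name.hash(&mut hasher);
        ev.host_ns.hash(&mut hasher);
        ev.guest_cycle.hash(&mut hasher);
    }
    hasher.finish()
}

/// Hash only the logical fields; `host_ns` and `guest_cycle` are skipped.
fn logical_digest(events: &[TraceEvent]) -> u64 {
    let mut hasher = DefaultHasher::new();
    for ev in events {
        ev.kind.hash(&mut hasher);
        ev.name.hash(&mut hasher);
    }
    hasher.finish()
}
```

Two runs that emit the same events with different timestamps would then disagree on `raw_digest` but agree on `logical_digest`, matching the pattern in the table.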
## Files touched
- `xenia-rs/crates/xenia-kernel/src/exports.rs`:
- Line 215: registration `stub_return_zero` → `vd_query_video_flags`.
- Lines ~3998-4023: new `vd_query_video_flags` function with
full doc comment.
- Lines ~9750-9803: 2 new unit tests in the `tests` module.
NO Phase B loader changes. NO diff-tool changes. NO new cvars.
NO refactor of video state model.
## Phase B `image_canonical_sha256`
Pinned hash `ea8d160e…` UNCHANGED — only kernel-export logic
modified; XEX loader path untouched.
## Cascade
- A (verify canary return): **PASS** — canary returns 3 under
shipping defaults; verified by direct source read of
`xboxkrnl_video.cc:231-241` + `graphics_system.cc:26` +
`graphics_system.h:38-54`. Confidence HIGH.
- B (implement + tests): **PASS** — 2 new tests, 226 total PASS,
release build clean (1 pre-existing dead-code warning on
`walk_committed_regions` — unrelated).
- C (3-jitter verification): **PASS** — all three jitters advance
105,138 → 105,286 (+148), same downstream divergence.
- D (determinism + sister chains): **PASS** — 3 cold runs converge
to identical matched-prefix=105,286 against jitter-1. No sister
chain regressions.
- E (canary caches unchanged): **PASS** — archived jitter set used,
no fresh canary run made (per C+22 precedent), `cache/` and
`cache_host/` directories unchanged from session start.
## Next divergence (C+24 candidate)
`import.call VdGetCurrentDisplayGamma` at canary idx 105,293 vs
`import.call KeAcquireSpinLockAtRaisedIrql` at ours idx 105,286.
Both engines just exited a `VdSwap` (5 matching prior events
ending in `kernel.return VdSwap`). The two engines then take
different code paths inside the post-VdSwap return path.
Possible interpretations:
- Different control flow inside a Vd post-swap hook on the canary
side (canary calls `VdGetCurrentDisplayGamma` after `VdSwap`;
ours doesn't).
- Different scheduler interleaving: ours main thread re-enters a
spinlock-protected section that canary's post-VdSwap walk avoids.
Investigation should start by looking at the canary `VdSwap`
post-handler to see if canary unconditionally calls
`VdGetCurrentDisplayGamma` (and if so, whether ours stubs it out)
or if this is a game-code branch driven by guest memory state.
Out of scope for C+23.