xenia-rs/audit-runs/phase-c23-VdQueryVideoFlags/investigation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Phase C+23 — `VdQueryVideoFlags` constant return
**Date:** 2026-05-26
**Mode:** WRITE — engine change (~5 LOC functional + ~2 LOC registration
edit + ~40 LOC tests). Diff tool UNCHANGED.
**Status:** LANDED. Main matched-prefix 105,138 → 105,286 (+148).
## TL;DR
The post-C+22 first divergence at canary tid=6 ↔ ours tid=1 idx 105,138
is `kernel.return VdQueryVideoFlags`:
```
canary: kernel.return VdQueryVideoFlags { return_value: 3, status: "0x00000003" }
ours: kernel.return VdQueryVideoFlags { return_value: 0, status: "0x00000000" }
```
Canary's `VdQueryVideoFlags_entry`
(`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc:231-241`)
computes a bitmask from the queried video mode:
```cpp
dword_result_t VdQueryVideoFlags_entry() {
  X_VIDEO_MODE mode;
  VdQueryVideoMode(&mode, false);
  uint32_t flags = 0;
  flags |= mode.is_widescreen ? 1 : 0;
  flags |= mode.display_width >= 1280 ? 2 : 0;
  flags |= mode.display_width >= 1920 ? 4 : 0;
  return flags;
}
```
```
Under canary's shipping defaults (`cvars::widescreen=true` from
`xboxkrnl_video.cc:31`; `cvars::internal_display_resolution=8` from
`graphics_system.cc:26`, which maps to `{1280, 720}` per
`graphics_system.h:38-54`), the computed value is:
```
is_widescreen=1 → +1
display_width=1280 ≥ 1280 → +2
display_width=1280 ≥ 1920 → +0
= 3
```
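The same bitmask can be reproduced with a short Rust port. This is a minimal sketch in which `VideoMode` is a hypothetical stand-in for the two `X_VIDEO_MODE` fields the formula reads; it is not ours's actual type.

```rust
/// Hypothetical stand-in for the subset of canary's `X_VIDEO_MODE`
/// that the flags formula reads (the real struct has more fields).
#[derive(Debug, Clone, Copy)]
struct VideoMode {
    is_widescreen: bool,
    display_width: u32,
}

/// Port of canary's `VdQueryVideoFlags_entry` bitmask:
/// bit 0 = widescreen, bit 1 = width >= 1280, bit 2 = width >= 1920.
fn video_flags(mode: &VideoMode) -> u32 {
    let mut flags = 0;
    if mode.is_widescreen {
        flags |= 1;
    }
    if mode.display_width >= 1280 {
        flags |= 2;
    }
    if mode.display_width >= 1920 {
        flags |= 4;
    }
    flags
}
```

With the shipping defaults (`is_widescreen = true`, `display_width = 1280`) this yields 3; a 1920-wide widescreen mode would yield 7, since bits 1 and 2 both test thresholds the width exceeds.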
Our previous registration mapped the export to `stub_return_zero`
(`exports.rs:215` pre-change), which placed `0` in `r3`. The fix is a
1:1 mirror of canary's semantics under the same defaults that
our `vd_query_video_mode` already reports (width=1280,
is_widescreen=1).
## Why a constant works (no infrastructure needed)
Our `vd_query_video_mode`
(`exports.rs:3986-3996` pre-change, now :3997-4007 in the new file)
hard-codes `display_width=1280, is_widescreen=1, refresh_rate=60`;
it has no cvar plumbing. As long as `vd_query_video_mode`'s
payload is fixed, the bitmask is fixed too. Implementing a
cvar-driven flags path would first require introducing
`widescreen` / `internal_display_resolution` cvar machinery, which is
out of scope per the escalation rule.
A unit test (`vd_query_video_flags_matches_vd_query_video_mode_payload`)
ties the return value to the *actual* payload `vd_query_video_mode`
writes, so any future change to the mode payload (for example, making
it cvar-driven) that alters the bitmask without a matching change to
`vd_query_video_flags` fails the test.
## The fix
```rust
// exports.rs:215 (registration)
state.register_export(Xboxkrnl, 0x01C9, "VdQueryVideoFlags", vd_query_video_flags);

// exports.rs:3998-4023 (new function)
fn vd_query_video_flags(ctx: &mut PpcContext, _mem: &GuestMemory, _state: &mut KernelState) {
    // is_widescreen=1 sets bit 0, display_width=1280 sets bit 1: 0b01 | 0b10 = 0x3
    ctx.gpr[3] = 0x3;
}
```
```
Total: 1 export-table line change + ~6 lines of function (with
doc comment) + ~40 lines of unit tests = ~50 LOC.
## Tests
2 new tests in `exports.rs`:
1. `vd_query_video_flags_returns_three`: seeds `r3` with a sentinel,
   calls the export, and asserts the sentinel was overwritten with the
   pinned return value `0x3`.
2. `vd_query_video_flags_matches_vd_query_video_mode_payload`:
   computes the canary bitmask formula over the payload
   `vd_query_video_mode` actually writes and asserts equality with the
   `vd_query_video_flags` return. Catches drift if either function
   is updated without the other.
Total: previous 224 + 2 new = **226 tests, all PASS**.
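The two test shapes can be sketched outside the engine. This is a minimal, self-contained sketch: `MockCtx`, `ModePayload`, `mock_query_video_mode`, `mock_query_video_flags`, and `canary_flags` are hypothetical stand-ins for `PpcContext` and the real exports. The real payload-matching test reads back what `vd_query_video_mode` writes to guest memory, which is not modeled here.

```rust
/// Hypothetical stand-in for the PPC register file; only r3 matters here.
struct MockCtx {
    gpr: [u64; 32],
}

/// Hypothetical stand-in for the mode payload `vd_query_video_mode` writes.
struct ModePayload {
    display_width: u32,
    is_widescreen: u32,
}

/// Mirrors the hard-coded payload described for `vd_query_video_mode`.
fn mock_query_video_mode() -> ModePayload {
    ModePayload { display_width: 1280, is_widescreen: 1 }
}

/// Mirrors the constant-return fix: place 0x3 in r3.
fn mock_query_video_flags(ctx: &mut MockCtx) {
    ctx.gpr[3] = 0x3;
}

/// Canary's bitmask formula, evaluated over a mode payload.
fn canary_flags(mode: &ModePayload) -> u64 {
    let mut flags = 0u64;
    if mode.is_widescreen != 0 {
        flags |= 1;
    }
    if mode.display_width >= 1280 {
        flags |= 2;
    }
    if mode.display_width >= 1920 {
        flags |= 4;
    }
    flags
}

#[cfg(test)]
mod tests {
    use super::*;

    const SENTINEL: u64 = 0xDEAD_BEEF;

    #[test]
    fn returns_three() {
        let mut ctx = MockCtx { gpr: [0; 32] };
        ctx.gpr[3] = SENTINEL; // must be overwritten by the export
        mock_query_video_flags(&mut ctx);
        assert_eq!(ctx.gpr[3], 0x3);
    }

    #[test]
    fn matches_mode_payload() {
        let mut ctx = MockCtx { gpr: [SENTINEL; 32] };
        mock_query_video_flags(&mut ctx);
        // Drift check: the constant must equal the formula over the payload.
        assert_eq!(ctx.gpr[3], canary_flags(&mock_query_video_mode()));
    }
}
```

Seeding `r3` with a sentinel rather than zero matters here: the pre-fix `stub_return_zero` would have passed a zero-initialized check only by accident if the expected value were 0, whereas a sentinel proves the export actually wrote the register.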
## Cold-vs-cold verification (3-jitter table)
ours cold-1 jsonl: `/tmp/ours-c23-vd-cold-1.jsonl` (28.7 MB,
108,507 events on tid=1). Captured under `XENIA_CACHE_WIPE=1` with
the freshly built `xrs-c23` binary.
| jitter | matched prefix | first divergence (ours idx) | first-divergence kind / payload |
|---|---|---|---|
| 1 | **105,286** | 105,286 | `import.call VdGetCurrentDisplayGamma` (canary) vs `import.call KeAcquireSpinLockAtRaisedIrql` (ours) |
| 2 | **105,286** | 105,286 | (same) |
| 3 | **105,286** | 105,286 | (same) |
Delta vs the C+22 baseline (105,138): **+148 events** in the main
matched prefix on all three jitters. The new first divergence is
identical across all three jitters, so it reflects a real behavioral
difference rather than scheduling noise.
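One way to read the matched-prefix metric is as the length of the longest common prefix of the two per-thread event streams. A minimal sketch under that assumption: `Event` is a hypothetical logical record keyed on kind and export name, and the diff tool's absorber handling is deliberately not modeled.

```rust
/// Hypothetical logical event: kind + export name, timing fields omitted.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Event {
    kind: &'static str,
    name: &'static str,
}

/// Length of the longest common prefix of two event streams. When both
/// streams continue past it, this is also the index of the first divergence.
fn matched_prefix(canary: &[Event], ours: &[Event]) -> usize {
    canary
        .iter()
        .zip(ours.iter())
        .take_while(|(c, o)| c == o)
        .count()
}
```

Under this reading, the C+23 gain is simply 105,286 − 105,138 = 148 newly matched events.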
## Absorber counters (sanity)
| jitter | floating_create (c/o) | floating_wait (c/o) |
|---|---|---|
| 1 | 0 / 0 | 1 / 0 |
| 2 | 0 / 0 | 0 / 0 |
| 3 | 1 / 0 | 3 / 0 |
Jitter-to-jitter variance in the absorber counts falls within the
expected scheduling-jitter window; the matched prefix is stable at
105,286 across all three.
## Sister chains
No sister chains are exercised in the 105,138–105,286 window. All 148
newly matched events are on the main `tid=6 → tid=1` chain, and the
diff report lists only the main-chain row, confirming no regressions
on any sister chain.
## Determinism (3 cold runs)
| run | md5 | matched-prefix vs jitter-1 |
|---|---|---|
| cold-1 | `4e2e781ff0609f3a0a08f573dee4be4e` | 105,286 |
| cold-2 | `b195d82a1b61e87d6f54a2ac2b3e091b` | 105,286 |
| cold-3 | `e6b94d4dc151007c924b81bbc5c9faf5` | 105,286 |
Byte-level digests differ across the 3 cold runs because of
`host_ns` / `guest_cycle` wall-time jitter (unchanged from
pre-C+23 behavior). The logical state (matched prefix,
ours_total=108,507, first-divergence index) is bit-stable across
all 3 runs.
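The raw-vs-logical distinction can be illustrated by hashing each event twice, once with and once without the timing fields. A minimal sketch with a hypothetical `TraceEvent` record; it uses std's `DefaultHasher`, whose output is not guaranteed stable across Rust releases, so a persisted digest would need a fixed algorithm such as the md5 used in the table above.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical trace record: logical fields plus wall-time jitter fields.
struct TraceEvent {
    tid: u32,
    kind: String,
    name: String,
    host_ns: u64,     // wall-clock timestamp, varies run to run
    guest_cycle: u64, // cycle counter, also jitters between runs
}

/// Digest over every field, timing included (analogous to a raw md5).
fn raw_digest(events: &[TraceEvent]) -> u64 {
    let mut h = DefaultHasher::new();
    for e in events {
        (e.tid, &e.kind, &e.name, e.host_ns, e.guest_cycle).hash(&mut h);
    }
    h.finish()
}

/// Digest over logical fields only; host_ns and guest_cycle are skipped.
fn logical_digest(events: &[TraceEvent]) -> u64 {
    let mut h = DefaultHasher::new();
    for e in events {
        (e.tid, &e.kind, &e.name).hash(&mut h);
    }
    h.finish()
}
```

Two runs that emit the same events at different wall-clock times then disagree on `raw_digest` but agree on `logical_digest`, which is the pattern the three cold runs show.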
## Files touched
- `xenia-rs/crates/xenia-kernel/src/exports.rs`:
  - Line 215: registration `stub_return_zero` → `vd_query_video_flags`.
  - Lines ~3998-4023: new `vd_query_video_flags` function with
    full doc comment.
  - Lines ~9750-9803: 2 new unit tests in the `tests` module.
NO Phase B loader changes. NO diff-tool changes. NO new cvars.
NO refactor of video state model.
## Phase B `image_canonical_sha256`
Pinned hash `ea8d160e…` UNCHANGED — only kernel-export logic
modified; XEX loader path untouched.
## Cascade
- A (verify canary return): **PASS** — canary returns 3 under
shipping defaults; verified by direct source read of
`xboxkrnl_video.cc:231-241` + `graphics_system.cc:26` +
`graphics_system.h:38-54`. Confidence HIGH.
- B (implement + tests): **PASS** — 2 new tests, 226 total PASS,
release build clean (1 pre-existing dead-code warning on
`walk_committed_regions` — unrelated).
- C (3-jitter verification): **PASS** — all three jitters advance
105,138 → 105,286 (+148), same downstream divergence.
- D (determinism + sister chains): **PASS** — 3 cold runs converge
to identical matched-prefix=105,286 against jitter-1. No sister
chain regressions.
- E (canary caches unchanged): **PASS** — archived jitter set used,
no fresh canary run made (per C+22 precedent), `cache/` and
`cache_host/` directories unchanged from session start.
## Next divergence (C+24 candidate)
`import.call VdGetCurrentDisplayGamma` at canary idx 105,293 vs
`import.call KeAcquireSpinLockAtRaisedIrql` at ours idx 105,286.
Both engines have just returned from `VdSwap` (the 5 preceding events
match, ending in `kernel.return VdSwap`), then take different code
paths on the post-`VdSwap` return path.
Possible interpretations:
- Different control flow inside a Vd post-swap hook on the canary
side (canary calls `VdGetCurrentDisplayGamma` after `VdSwap`;
ours doesn't).
- Different scheduler interleaving: ours main thread re-enters a
spinlock-protected section that canary's post-VdSwap walk avoids.
Investigation should start with canary's `VdSwap` post-handler: does
canary unconditionally call `VdGetCurrentDisplayGamma` (and if so,
does ours stub it out), or is this a game-code branch driven by guest
memory state?
Out of scope for C+23.