xenia-rs/audit-runs/phase-c23-VdQueryVideoFlags/investigation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Phase C+23 — VdQueryVideoFlags constant return

Date: 2026-05-26
Mode: WRITE, engine change (~5 LOC functional + ~2 LOC registration edit + ~40 LOC tests). Diff tool UNCHANGED.
Status: LANDED. Main matched-prefix 105,138 → 105,286 (+148).

TL;DR

The post-C+22 first divergence at canary tid=6 ↔ ours tid=1 idx 105,138 is kernel.return VdQueryVideoFlags:

canary: kernel.return VdQueryVideoFlags { return_value: 3, status: "0x00000003" }
ours:   kernel.return VdQueryVideoFlags { return_value: 0, status: "0x00000000" }

Canary's VdQueryVideoFlags_entry (xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_video.cc:231-241) computes a bitmask from the queried video mode:

dword_result_t VdQueryVideoFlags_entry() {
  X_VIDEO_MODE mode;
  VdQueryVideoMode(&mode, false);
  uint32_t flags = 0;
  flags |= mode.is_widescreen ? 1 : 0;
  flags |= mode.display_width >= 1280 ? 2 : 0;
  flags |= mode.display_width >= 1920 ? 4 : 0;
  return flags;
}

Under canary's shipping defaults (cvars::widescreen=true from xboxkrnl_video.cc:31; cvars::internal_display_resolution=8 from graphics_system.cc:26 → {1280, 720} from graphics_system.h:38-54), the computed value is:

is_widescreen=1 → +1
display_width=1280 ≥ 1280 → +2
display_width=1280 < 1920 → +0
= 3
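
The arithmetic above can be checked with a small standalone sketch of canary's formula. The function name `video_flags` and its parameters are illustrative and do not appear in either codebase; only the three bit tests are taken from the canary source quoted above.

```rust
/// Illustrative port of canary's VdQueryVideoFlags bitmask.
/// `video_flags` is a hypothetical name, not a function in xenia-rs.
fn video_flags(is_widescreen: bool, display_width: u32) -> u32 {
    let mut flags = 0;
    if is_widescreen {
        flags |= 1; // bit 0: widescreen
    }
    if display_width >= 1280 {
        flags |= 2; // bit 1: width is at least 1280
    }
    if display_width >= 1920 {
        flags |= 4; // bit 2: width is at least 1920
    }
    flags
}
```

With the shipping defaults, `video_flags(true, 1280)` evaluates to 3, which is the value canary logs; a 1920-wide widescreen mode would set all three bits and give 7.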

Ours's previous registration mapped the export to stub_return_zero (exports.rs:215 pre-change), which placed 0 in r3. The fix is a 1:1 mirror of canary's semantics under the same defaults that ours's vd_query_video_mode already reports (width=1280, is_widescreen=1).

Why a constant works (no infrastructure needed)

Ours's vd_query_video_mode (exports.rs:3986-3996 pre-change, now :3997-4007 in the new file) hard-codes display_width=1280, is_widescreen=1, refresh_rate=60; it has no cvar plumbing. As long as vd_query_video_mode's payload is fixed, the bitmask is fixed too. A cvar-driven flags path would first require introducing cvar machinery for widescreen and internal_display_resolution, which is out of scope per the escalation rule.

A unit test (vd_query_video_flags_matches_vd_query_video_mode_payload) ties the return value to the actual payload vd_query_video_mode writes, so the two functions stay in sync if the mode payload is ever updated to actual cvar-driven values.

The fix

// exports.rs:215 (registration)
state.register_export(Xboxkrnl, 0x01C9, "VdQueryVideoFlags", vd_query_video_flags);

// exports.rs:3998-4023 (new function)
fn vd_query_video_flags(ctx: &mut PpcContext, _mem: &GuestMemory, _state: &mut KernelState) {
    // is_widescreen=1, display_width=1280 → bits 0 + 1 = 3
    ctx.gpr[3] = 0x3;
}

Total: 1 export-table line change + ~6 lines of function (with doc comment) + ~40 lines of unit tests = ~50 LOC.

Tests

2 new tests in exports.rs:

  1. vd_query_video_flags_returns_three — sentinel-overwrite + pinned return value 0x3.
  2. vd_query_video_flags_matches_vd_query_video_mode_payload — computes the canary bitmask formula over vd_query_video_mode's actual written payload and asserts equality with the vd_query_video_flags return. Catches drift if either function is updated without the other.
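
The shape of the two tests can be sketched as follows. Everything here is a stand-in: `Ctx`, `Mode`, `query_mode`, `query_flags` and `flags_for` are hypothetical simplifications, whereas the real code uses PpcContext, GuestMemory and KernelState and writes the mode payload into guest memory. The sketch only illustrates the sentinel-overwrite check and the drift check.

```rust
// Stand-ins for illustration only; the real types in exports.rs differ.
struct Ctx {
    gpr: [u64; 32],
}

struct Mode {
    display_width: u32,
    is_widescreen: u32,
}

// Assumed to mirror the hard-coded payload described in this note.
fn query_mode() -> Mode {
    Mode { display_width: 1280, is_widescreen: 1 }
}

// Assumed to mirror the constant-return export.
fn query_flags(ctx: &mut Ctx) {
    ctx.gpr[3] = 0x3;
}

// Canary's bitmask formula applied to a mode payload.
fn flags_for(mode: &Mode) -> u64 {
    let mut flags = 0;
    if mode.is_widescreen != 0 {
        flags |= 1;
    }
    if mode.display_width >= 1280 {
        flags |= 2;
    }
    if mode.display_width >= 1920 {
        flags |= 4;
    }
    flags
}

// Test 1: a sentinel in r3 must be overwritten with exactly 0x3.
fn returns_three() -> bool {
    let mut ctx = Ctx { gpr: [0; 32] };
    ctx.gpr[3] = 0xDEAD_BEEF;
    query_flags(&mut ctx);
    ctx.gpr[3] == 0x3
}

// Test 2: the constant must equal the formula over the mode payload.
fn matches_mode_payload() -> bool {
    let mut ctx = Ctx { gpr: [0; 32] };
    query_flags(&mut ctx);
    ctx.gpr[3] == flags_for(&query_mode())
}
```

If the mode payload were later changed (say, to a 1920-wide mode) without touching the flags function, the second check would fail while the first would still pass, which is the drift the real test is meant to catch.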

Total: previous 224 + 2 new = 226 tests, all PASS.

Cold-vs-cold verification (3-jitter table)

ours cold-1 jsonl: /tmp/ours-c23-vd-cold-1.jsonl (28.7 MB, 108,507 events on tid=1). Captured under XENIA_CACHE_WIPE=1 with the freshly built xrs-c23 binary.

| jitter | matched | first divergence at | first-divergence kind / payload |
|---|---|---|---|
| 1 | 105,286 | 105,286 | import.call VdGetCurrentDisplayGamma (canary) vs import.call KeAcquireSpinLockAtRaisedIrql (ours) |
| 2 | 105,286 | 105,286 | (same) |
| 3 | 105,286 | 105,286 | (same) |

Delta vs C+22 baseline (105,138): +148 events in main matched-prefix, on all three jitters. The new first divergence is genuine and identical across all three jitters.
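
For readers unfamiliar with the metric, the matched-prefix count can be thought of as the length of the longest common prefix of the two event streams. The sketch below is a simplification under that assumption: `matched_prefix` is a hypothetical helper, and the actual diff tool additionally absorbs certain events (see the absorber counters below), which this sketch does not model.

```rust
/// Length of the longest common prefix of two event streams.
/// Hypothetical helper; the real diff tool also handles absorbed events.
fn matched_prefix<T: PartialEq>(canary: &[T], ours: &[T]) -> usize {
    canary
        .iter()
        .zip(ours.iter())
        .take_while(|(c, o)| c == o)
        .count()
}
```

Under this simplified model, the index returned is also the position of the first divergence, which is why the "matched" and "first divergence at" columns carry the same number.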

Absorber counters (sanity)

| jitter | floating_create (c/o) | floating_wait (c/o) |
|---|---|---|
| 1 | 0 / 0 | 1 / 0 |
| 2 | 0 / 0 | 0 / 0 |
| 3 | 1 / 0 | 3 / 0 |

Jitter-to-jitter variance in absorber counts is the expected scheduling-jitter window. Matched-prefix stable at 105,286 across all three.

Sister chains

No sister chains exercised in the 105,138 → 105,286 window. The 148 absorbed events are all on the main tid=6 → tid=1 chain. The diff report lists only the main chain row, confirming no regressions on any sister chain.

Determinism (3 cold runs)

| run | md5 | matched-prefix vs jitter-1 |
|---|---|---|
| cold-1 | 4e2e781ff0609f3a0a08f573dee4be4e | 105,286 |
| cold-2 | b195d82a1b61e87d6f54a2ac2b3e091b | 105,286 |
| cold-3 | e6b94d4dc151007c924b81bbc5c9faf5 | 105,286 |

Byte-level digests differ across the 3 cold runs because of host_ns / guest_cycle wall-time jitter (unchanged from pre-C+23 behavior). Logical semantic state — matched-prefix, ours_total=108,507, first-divergence index — is bit-stable across all 3 runs.
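
The distinction between byte-level and logical equality can be illustrated with a minimal sketch. The `Event` struct and its `kind` / `name` fields are assumptions made for illustration; only the `host_ns` and `guest_cycle` field names come from this note, and the real jsonl schema may carry more fields.

```rust
// Hypothetical event shape; only host_ns and guest_cycle are named in the note.
#[derive(Debug, PartialEq)]
struct Event {
    kind: String,
    name: String,
    host_ns: u64,
    guest_cycle: u64,
}

/// Project an event onto its timing-independent fields.
fn logical_key(e: &Event) -> (&str, &str) {
    (e.kind.as_str(), e.name.as_str())
}

/// Two runs are logically equal when they agree on every event
/// once the wall-time fields are ignored.
fn logically_equal(a: &[Event], b: &[Event]) -> bool {
    a.len() == b.len()
        && a.iter()
            .zip(b)
            .all(|(x, y)| logical_key(x) == logical_key(y))
}
```

Two runs whose events differ only in `host_ns` or `guest_cycle` would hash to different md5 digests yet compare as logically equal here, which matches the pattern in the table above.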

Files touched

  • xenia-rs/crates/xenia-kernel/src/exports.rs:
    • Line 215: registration stub_return_zero → vd_query_video_flags.
    • Lines ~3998-4023: new vd_query_video_flags function with full doc comment.
    • Lines ~9750-9803: 2 new unit tests in the tests module.

NO Phase B loader changes. NO diff-tool changes. NO new cvars. NO refactor of video state model.

Phase B image_canonical_sha256

Pinned hash ea8d160e… UNCHANGED — only kernel-export logic modified; XEX loader path untouched.

Cascade

  • A (verify canary return): PASS — canary returns 3 under shipping defaults; verified by direct source read of xboxkrnl_video.cc:231-241 + graphics_system.cc:26 + graphics_system.h:38-54. Confidence HIGH.
  • B (implement + tests): PASS — 2 new tests, 226 total PASS, release build clean (1 pre-existing dead-code warning on walk_committed_regions — unrelated).
  • C (3-jitter verification): PASS — all three jitters advance 105,138 → 105,286 (+148), same downstream divergence.
  • D (determinism + sister chains): PASS — 3 cold runs converge to identical matched-prefix=105,286 against jitter-1. No sister chain regressions.
  • E (canary caches unchanged): PASS — archived jitter set used, no fresh canary run made (per C+22 precedent), cache/ and cache_host/ directories unchanged from session start.

Next divergence (C+24 candidate)

import.call VdGetCurrentDisplayGamma at canary idx 105,293 vs import.call KeAcquireSpinLockAtRaisedIrql at ours idx 105,286. Both engines just exited a VdSwap (5 matching prior events ending in kernel.return VdSwap). The two engines then take different code paths inside the post-VdSwap return path.

Possible interpretations:

  • Different control flow inside a Vd post-swap hook on the canary side (canary calls VdGetCurrentDisplayGamma after VdSwap; ours doesn't).
  • Different scheduler interleaving: ours main thread re-enters a spinlock-protected section that canary's post-VdSwap walk avoids.

Investigation should start by looking at the canary VdSwap post-handler to see if canary unconditionally calls VdGetCurrentDisplayGamma (and if so, whether ours stubs it out) or if this is a game-code branch driven by guest memory state.

Out of scope for C+23.