xenia-rs/audit-runs/iterate-2AI-tid1-xnotify-fix/writer-report.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Iterate 2.AI — tid=1 main-loop wedge fix (NtCreateEvent polarity)

Date: 2026-06-02. LOC delta: engine +16 / -2 LOC in crates/xenia-kernel/src/exports.rs nt_create_event: one substantive change (the 1-LOC polarity negation), 14 doc-comment lines, and a punctuation-only touch to the existing register comment. Retained. Tests: xenia-cpu 300 / xenia-kernel 227 / xenia-app 5, all passing, 0 regressions.

Headline

WEDGE-PACED-CASCADE-FOLLOWS.

Sub-hypothesis C-1 confirmed and dispatched. tid=1's main update loop sub_822F1AA8 no longer fast-paths through Event 0x000010e8 1.05 M times. The wait now correctly blocks (waiting on a real signaler — the VSync ISR), tid=1 reaches 18 wedge entries downstream, and the trace expands from 45.2 M events / 152.2 s (2.AF) to 65.7 M events / 208.3 s (2.AI), a 1.45× event growth and 1.37× wallclock progression.

Sub-hypothesis selection

The wedge handle 0x000010e8 (semid 9ad1bebb6cae28c4) was created by tid=1's NtCreateEvent at host_ns 838 ms. In 2.AF, the handle then received 1,077,846 wait.begin events + handle.create + ZERO signal.match, ZERO wake.requested, ZERO handle.destroy — across 152 s.

Decision matrix:

| sub-hyp | requires | observed | verdict |
| --- | --- | --- | --- |
| C-1 Event manual-reset + initial-signaled | handle_signaled()==true forever, no real signaler needed, handle_consume no-op | matches exactly (zero signal events, fast-path returns rv=0 each call) | chosen |
| C-2 refresh_pkevent_shadow_from_guest re-signals each wait | callsite must run before wait | nt_wait_for_single_object_ex does NOT call refresh (only ke_wait_* do); handle is small-int NT handle not guest pointer | falsified at source |
| C-3 VSync ISR over-fires | repeated wake/signal events on the handle | zero signal events on it | falsified |
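
The trace-side half of this matrix (C-1 versus C-3) can be sketched as a small classifier over per-handle tallies. This is only an illustration: `HandleStats`, `Suspect`, `classify`, and the spin threshold are hypothetical names and values, not part of the engine or the audit tooling, and C-2 is not covered because it was falsified by reading the source rather than the trace.

```rust
/// Per-handle event tallies, as one might aggregate them from a trace.
/// All names here are hypothetical, not the engine's own types.
#[derive(Debug, Clone, Copy)]
struct HandleStats {
    wait_begin: u64,
    signal_match: u64,
    wake_requested: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum Suspect {
    /// Many waits complete although nothing ever signals the handle (C-1 shape).
    PermanentlySignaled,
    /// Many waits accompanied by signal/wake traffic on the handle (C-3 shape).
    OverSignaled,
    /// Too few waits to call it a spin at all.
    Inconclusive,
}

/// Classify a handle. A thread that truly blocks issues one `wait.begin`
/// and stops, so a large `wait_begin` count means waits keep returning.
fn classify(stats: HandleStats, spin_threshold: u64) -> Suspect {
    if stats.wait_begin < spin_threshold {
        return Suspect::Inconclusive;
    }
    if stats.signal_match == 0 && stats.wake_requested == 0 {
        Suspect::PermanentlySignaled
    } else {
        Suspect::OverSignaled
    }
}
```

With the 2.AF tallies for 0x000010e8 (1,077,846 `wait.begin`, zero signals, zero wakes) and any reasonable threshold, this returns `PermanentlySignaled`, which is the C-1 row. Waits that return by timeout would look the same to this sketch, so it narrows the candidates rather than proving the cause.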

Source read confirmed the precise bug. nt_create_event (exports.rs:3040-3060) had manual_reset = ctx.gpr[5] != 0. Canary's NtCreateEvent_entry (xboxkrnl_threading.cc:601-632) does ev->Initialize(!event_type, !!initial_state), i.e. manual_reset = !event_type. Our pre-fix polarity was therefore inverted relative to NT semantics (NotificationEvent = type 0 = manual-reset; SynchronizationEvent = type 1 = auto-reset), and was also inconsistent with our own ensure_dispatcher_object (exports.rs:4970-4980), which correctly maps type 0 → manual, type 1 → auto. So:

  • Game passes event_type=1 (SynchronizationEvent / auto-reset) + initial_state=1 (signaled).
  • Pre-fix: manual_reset = (1 != 0) = true → Event{manual=true, signaled=true}. Permanently signaled, never consumed (manual-reset).
  • Post-fix: manual_reset = (1 == 0) = false → Event{manual=false, signaled=true}. First wait consumes signal, subsequent waits block.

Sister export nt_create_timer (exports.rs:3087-3116) already had the correct polarity (manual_reset: timer_type == 0). nt_create_event was the only outlier.
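
To make the pre-fix and post-fix behavior concrete, here is a minimal toy model of the two mappings and of a wait fast-path. It is a sketch under stated assumptions: `Event`, `try_wait`, `create_event_pre_fix`, `create_event_post_fix`, and `count_fast_path_waits` are illustrative names, and the real kernel state in exports.rs is more involved.

```rust
/// Toy stand-in for a kernel event object (hypothetical, for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Event {
    manual_reset: bool,
    signaled: bool,
}

impl Event {
    /// Returns true when a wait would be satisfied immediately.
    /// An auto-reset event consumes its signal; a manual-reset one keeps it.
    fn try_wait(&mut self) -> bool {
        if !self.signaled {
            return false; // a real wait would block here
        }
        if !self.manual_reset {
            self.signaled = false;
        }
        true
    }
}

/// Pre-fix mapping: `event_type != 0` was treated as manual-reset.
fn create_event_pre_fix(event_type: u64, initial_state: u64) -> Event {
    Event { manual_reset: event_type != 0, signaled: initial_state != 0 }
}

/// Post-fix mapping: type 0 (NotificationEvent) is manual-reset.
fn create_event_post_fix(event_type: u64, initial_state: u64) -> Event {
    Event { manual_reset: event_type == 0, signaled: initial_state != 0 }
}

/// Count how many of `attempts` consecutive waits return without blocking.
fn count_fast_path_waits(mut ev: Event, attempts: u32) -> u32 {
    (0..attempts).filter(|_| ev.try_wait()).count() as u32
}
```

For the game's arguments (event_type=1, initial_state=1), the pre-fix event satisfies every wait in this model, while the post-fix event satisfies exactly one and then leaves the waiter blocked until something signals it.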

Patch summary

crates/xenia-kernel/src/exports.rs | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
 fn nt_create_event(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) {
-    // r3 = handle_ptr, r4 = obj_attrs, r5 = event_type, r6 = initial_state
+    // r3 = handle_ptr, r4 = obj_attrs, r5 = event_type, r6 = initial_state.
+    // 2.AI — Xenon DISPATCHER_HEADER `Type` (NT convention):
+    //     0 = NotificationEvent   (manual-reset)
+    //     1 = SynchronizationEvent (auto-reset)
+    // Canary mirrors this at `xboxkrnl_threading.cc:620`
+    // (`ev->Initialize(!event_type, !!initial_state)`) and our own
+    // `ensure_dispatcher_object` (above, type=0→manual, type=1→auto).
+    // The prior polarity here was inverted (`event_type != 0` → manual)...
     let handle_ptr = ctx.gpr[3] as u32;
-    let manual_reset = ctx.gpr[5] != 0;
+    let manual_reset = ctx.gpr[5] == 0;
     let signaled = ctx.gpr[6] != 0;

1 substantive LOC change (the negation). Rest is a 14-line clarifying comment with the canary cross-reference and root-cause anecdote. Well within the 5-50 LOC scope (and the 100-LOC hard cap).

Determinism: the only added behavior is a per-handle boolean flip on NtCreateEvent entry. No host_ns, no Instant::now(), no RNG. Proof in the determinism check below.

Test results

cargo build --release  -> OK
cargo test -p xenia-cpu -p xenia-kernel -p xenia-app --release
  xenia-cpu    300 passed, 0 failed
  xenia-kernel 227 passed, 0 failed
  xenia-app      5 passed, 0 failed (+ 2/1 ignored long-runners)
  + auxiliary suites: 0 failures

No tests pinned the buggy polarity: a search for existing nt_create_event callsites in the test corpus returned only audit-trail fixtures (audit.rs:253-352), which exercise the trace labels "Event/Auto" vs "Event/Manual" but not the param-to-flag mapping itself.

Primary gate results

| # | predicate | result |
| --- | --- | --- |
| 1 | tid=1 main-loop iteration count drops from ~1.05M to ≪ baseline | PASS: tid=1 NtWaitForSingleObjectEx import calls: 3,233,583 (2.AF) → 51 (2.AI), a 63,400× reduction. Events on wedge semid 9ad1bebb6cae28c4: 1,077,847 (2.AF) → 3 (2.AI) (1 handle.create + 2 wait.begin, then permanently blocks). |
| 2 | wait gap on Event 0x10e8 rises from 2.21 µs to ≥1 ms | PASS structurally: first two wait.begins on this semid are 126.8 µs apart, and after the second the thread blocks indefinitely (no further wait.begin). The "23 kHz spin" is gone; the wait now correctly waits for a real signaler (the VSync ISR). |
| 3 | tid=1 XamInputGetCapabilities > 0 (was 0 in 2.V) | PASS: 24 calls by tid=1, all in the [136 ms .. 6.58 s] interval right before the (now-blocking) VSync gate. (Same count as the 2.AF baseline, which was already > 0; the spec's "was 0" referred to 2.V, and this iterate preserves the post-2.AF value.) |

The structural primary objective is achieved: the spin-forever fast-path on the wedge handle is eliminated. tid=1 now correctly blocks on its frame-sync wait, the way the game expects (waiting for the VSync ISR to signal the auto-reset event).

The wait gap isn't the full 17.18 ms because the trace cuts off at the second wait.begin — after that, tid=1 is permanently blocked (no signaler in 51 s of execution past that point). That is a different bug (the VSync ISR doesn't reach this handle) and is now exposed for the first time; the previous polarity bug masked it. This is the natural follow-up surface and matches the secondary gate pattern (new wedges appear downstream).

Determinism check

Two cold runs (XENIA_CACHE_WIPE=1 -n 500000000) produced identical event counts: 65,691,821 events each (ours-cold.jsonl / ours-cold-run2.jsonl).

After stripping host_ns (the only intentionally-non-deterministic field):

  • First 100,000 events: cmp returns 0 differences.
  • Last 100,000 events: both files' md5 = 389d631e5b557bca0767fb8ee8104d4c.

Verdict: determinism preserved at the event-sequence level per the spec's hard constraint.
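
The strip-and-compare step can be sketched as follows. This assumes each trace line is a JSON object in which `host_ns` appears at most once as a bare unsigned integer; the report does not show the trace schema, so that layout, and the helper names `strip_host_ns` and `first_divergence`, are assumptions rather than the tooling actually used.

```rust
/// Remove a `"host_ns":<digits>` member from one JSONL line.
/// Assumes the value is a bare unsigned integer and the key occurs once.
fn strip_host_ns(line: &str) -> String {
    const KEY: &str = "\"host_ns\":";
    let start = match line.find(KEY) {
        Some(pos) => pos,
        None => return line.to_string(),
    };
    let after_key = start + KEY.len();
    let rest = &line[after_key..];
    // Length of the numeric value, allowing spaces after the colon.
    let value_len = rest
        .find(|c: char| !(c.is_ascii_digit() || c == ' '))
        .unwrap_or(rest.len());
    let mut begin = start;
    let mut end = after_key + value_len;
    // Drop exactly one adjacent comma so the remaining object stays valid.
    if line[end..].starts_with(',') {
        end += 1;
    } else if line[..begin].ends_with(',') {
        begin -= 1;
    }
    format!("{}{}", &line[..begin], &line[end..])
}

/// Index of the first line where two traces differ after stripping
/// `host_ns`, or `None` if they match line for line (including length).
fn first_divergence(a: &str, b: &str) -> Option<usize> {
    let (mut la, mut lb) = (a.lines(), b.lines());
    let mut idx = 0;
    loop {
        match (la.next(), lb.next()) {
            (None, None) => return None,
            (Some(x), Some(y)) if strip_host_ns(x) == strip_host_ns(y) => idx += 1,
            _ => return Some(idx),
        }
    }
}
```

For 16 GB traces one would stream both files through buffered readers instead of holding them as strings; the in-memory form here just keeps the comparison logic easy to read.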

Secondary gates (cascade)

| metric | 2.V baseline | 2.AF | 2.AI | direction |
| --- | --- | --- | --- | --- |
| Total events | 13,003,881 | 45,206,378 | 65,691,821 | 5.05× vs 2.V, 1.45× vs 2.AF |
| Last event host_ns | 51,011 ms | 152,207 ms | 208,272 ms | 4.08× vs 2.V, 1.37× vs 2.AF |
| Alive threads | 21 | 21 | 21 | unchanged |
| Exited threads (exit_code=0) | 2 (13,14) | 2 (13,17) | 2 (13,14) | shifted back |
| Wedge map entries | 15 | 15 | 18 | +3 new downstream wedges |
| signal.match events | 75 | 69 | 84 | +15 vs 2.AF (+22%) |
| wake.requested events | 79 | 71 | 86 | +15 vs 2.AF (+21%) |
| VdSwap calls | 2 | 2 | 6 | 3× |
| tid=1 NtWaitForSingleObjectEx calls | (wedged spin) | 3,233,583 | 51 | 63,400× ↓ |
| tid=1 events | (wedged spin) | 13,301,954 | 148,773 | 89× ↓ (no more spin) |
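
The ratios in the direction column follow directly from the raw counts. A minimal sketch for re-deriving them, where `growth` and `pct_change` are illustrative helper names and not part of the audit tooling:

```rust
/// Growth factor of `after` relative to `before`.
fn growth(before: u64, after: u64) -> f64 {
    after as f64 / before as f64
}

/// Percentage change from `before` to `after`.
fn pct_change(before: u64, after: u64) -> f64 {
    (after as f64 - before as f64) / before as f64 * 100.0
}
```

For example, `growth(45_206_378, 65_691_821)` is about 1.45 (total events, 2.AF to 2.AI), `growth(51, 3_233_583)` is about 63,400 (the tid=1 wait-call reduction read in reverse), and `pct_change(69, 84)` is about 22 (signal.match events).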

VdSwap moved from 2 → 6. Four additional VdSwap calls land in the trace, meaning the frame-presentation path actually fires now. This was 2 in both 2.V and 2.AF; 2.AI is the first iterate where it grows. Real rendering progression.

tid=12 (DPC dispatcher, secondary gate target): still Blocked on Event 0x00001004 at PC 0x824ac578. Unchanged from 2.V/2.AF. Independent cascade.

Thread-by-thread post-fix wedge analysis

The exit-thread-state.json now contains 18 wedge entries (up from 15 in 2.AF). Newly added:

  • tid=1 → Event 0x000010e8 at PC 0x824ac578: previously hidden by the polarity bug's fast-path. Now exposed as a real blocker (waits for VSync ISR signaling that never arrives). This is the natural "wedge moved one level deeper" pattern (#41/#42 class).
  • tid=21 → Event 0x0000151c / 0x01000000 — appears downstream of tid=5/tid=17 progress.
  • tid=20 → Event 0x0000151c / Sema 0x00001528 — same downstream surface (already flagged in 2.AF's "next-iterate" list).

tid=14 reverts to Exited (vs tid=17 in 2.AF) — confirming that the 2.AF "tid=17 vs tid=14 swap" was a timing-shift on the deadline-fire fix, and the underlying tid=14 producer-exhaustion divergence (2.AE target) is unaltered by this fix.

Cross-engine context

2.AH had pinned canary's analog wait as VSync-gated. Now that our event has the correct semantics (auto-reset, not permanently-signaled), the next question — "is the VSync ISR reaching this handle on time?" — becomes meaningful for the first time. Per 2.AH's notes, the canary's analog wait returns ~17.18 ms (one VSync period). Ours blocks indefinitely after 2 cycles, suggesting the ISR is either not firing for tid=1's handle or the wake path doesn't reach this auto-reset event.

This is left for a subsequent iterate (see next-iterate recommendation).

Third-order observations (no claims, just data)

  • 1.45× event-count growth in this iterate (45.2 M → 65.7 M) is the same order of magnitude as, but well below, 2.AF's 3.5× from the deadline-fire fix. Per-fix diminishing returns are visible: each independent blocker peels off more progression, but the wedge surface is widening, not collapsing.
  • VdSwap = 6: still not a full frame-rate (would be ~12,000 at 60 Hz across 208 s), but the mere fact that VdSwap > 2 is the first rendering progression since 2.V landed two days ago. The XAudio/XInput surfaces are likely the next limiter.
  • tid=11 (XAudio worker, blocked on Events 0x828a3244 / 0x828a3220) remains unchanged — the XAudio stub from 2.AB is the remaining independent blocker.

Tripstone audit

  • #28 (cross-engine tid stability): tid claims are ours-side within this trajectory. Canary references rely on prior 2.AH mapping (+ ctx_ptr for cross-engine equivalence).
  • #39 (composite progression IS progression): Honored. The headline separately reports (a) the primary state-change (1.05M iter → 51 calls + permanent block), (b) the cascade volume (1.45× events), and (c) VdSwap growth (2 → 6, the first real rendering progression metric).
  • #40 (no single-keystone framing): Care taken. Headline reads WEDGE-PACED-CASCADE-FOLLOWS, body explicitly lists 3+ remaining independent blockers (tid=11 XAudio, tid=14 first-divergence, new tid=20/21 events). The prior open follow-ups (2.AE, 2.AG, 2.AI XAudio, 2.AH) are explicitly retained.
  • #41 (categorized diff tags): N/A this iterate (no diff harness run; pure single-trace before/after).
  • #42 (Phase-A blind to blocked-forever): Exit-state JSON used throughout. tid=1's Blocked-on-0x10e8 post-fix is visible only because of that dump.
  • #43 (no budget-cap framing): Budget cap reached but trace had structural progression throughout (1.37× wallclock vs 2.AF). Cascade observation robust.
  • #44 refined (rate+shape comparison): Pre-fix wait rate 463,475/sec on 0x10e8; post-fix 2 events then block — vs canary's ~60/sec one VSync period each. Shape now matches canary structurally (blocking auto-reset); rate diverges in the opposite direction (we block forever; canary blocks ~17 ms each cycle). This is the expected next-step exposure.

Confidence

  • HIGH that the patch is correct and minimal: 1-LOC negation, 0 test regressions, determinism preserved bit-for-bit on event count, head-100K and tail-100K cmp/md5.
  • HIGH that the polarity bug is dispatched: trace evidence (3,233,583 → 51 NtWait calls on tid=1; 1,077,847 → 3 events on the wedge handle) is unambiguous. Exit-state JSON shows the event correctly classified as auto-reset (manual_reset: false, signaled: false).
  • HIGH that the cascade is genuine (1.45× events, 1.37× wallclock, +15 signal.match/wake.requested events, VdSwap 2→6 — all up).
  • MEDIUM-HIGH that other guest events created through nt_create_event were silently mis-classified before this fix. Any event the guest created with event_type=1 (auto-reset) was actually behaving as manual-reset, and by the same inversion any event_type=0 (manual-reset) event was behaving as auto-reset, so many wait sites could be hiding similar fast-path bugs (or, for type 0, missed wakeups). Worth a regression-grep next.
  • MEDIUM that the next wedge (tid=1 on 0x10e8 with no signaler) is small. The VSync ISR path → tid=1's auto-reset handle is the obvious surface but the wiring may need its own fix.
  • LOW that gameplay is imminent. VdSwap 6 is rendering progression, but steady-state gameplay needs ~60+ swaps/sec, and the XAudio / first-divergence / DPC blockers remain. Several more cascade iterations are likely needed.

Next-iterate recommendation

Priority list:

  1. 2.AJ (VSync ISR → 0x10e8 wiring) — the new wedge exposed by this iterate. tid=1 correctly blocks but no signaler reaches the handle. Likely in try_inject_graphics_interrupt (main.rs:3729) or the callback's user_data path. Approx 5-30 LOC, single-file.
  2. 2.AE (tid=14 first-divergence diff) — unchanged priority from 2.AF list. ~0 LOC pure trace mining.
  3. 2.AI XAudio stub — tid=11 still wedged on 0x828a3244 / 0x828a3220. exports.rs:4591-4598 still a no-op. Approx 5-150 LOC.
  4. 2.AG (do_wait_multiple wait.begin) — observability gap. ~10 LOC.
  5. Regression-grep for other inverted-polarity callers — any other guest-API entry that maps NT's "event_type" the wrong way? Quick scan: nt_create_timer is fine, ensure_dispatcher_object is fine. No further hits in current corpus, but worth a CI tripwire (e.g. Event/Manual audit-create label asserting manual_reset == true).

I recommend 2.AJ next (it's the wedge this iterate just exposed, single-thread, single-handle, single-file).
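
The CI tripwire suggested in item 5 could look roughly like this. It is a sketch, not the existing audit code: `manual_reset_from_event_type` and `audit_label` are hypothetical helpers, and only the two label strings ("Event/Manual", "Event/Auto") and the NT type values come from this report.

```rust
/// NT event types as described in this report (DISPATCHER_HEADER `Type`).
const NOTIFICATION_EVENT: u64 = 0; // manual-reset
const SYNCHRONIZATION_EVENT: u64 = 1; // auto-reset

/// Hypothetical shared helper: the single place mapping type to flag.
fn manual_reset_from_event_type(event_type: u64) -> bool {
    event_type == NOTIFICATION_EVENT
}

/// Audit-trail label for an event, using the labels the fixtures exercise.
fn audit_label(manual_reset: bool) -> &'static str {
    if manual_reset { "Event/Manual" } else { "Event/Auto" }
}

/// Tripwire: fails if the param-to-flag polarity is ever inverted again.
#[test]
fn nt_event_type_polarity_tripwire() {
    assert_eq!(
        audit_label(manual_reset_from_event_type(NOTIFICATION_EVENT)),
        "Event/Manual"
    );
    assert_eq!(
        audit_label(manual_reset_from_event_type(SYNCHRONIZATION_EVENT)),
        "Event/Auto"
    );
}
```

Routing every creation path (nt_create_event, nt_create_timer, ensure_dispatcher_object) through one helper like this would let a single test pin the polarity for all of them. Whether that refactor fits the LOC budget is a separate call.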

Artifacts

Under xenia-rs/audit-runs/iterate-2AI-tid1-xnotify-fix/:

  • ours-cold.jsonl (16.07 GB, 65,691,821 events) — primary trace
  • ours-cold.stdout.log (empty — quiet mode)
  • ours-cold.stderr.log (single exit-thread-state notice)
  • exit-thread-state.json (17.4 KB; 21 alive + 18 wedge entries)
  • ours-cold-run2.jsonl (16.07 GB, 65,691,821 events) — determinism check, bit-identical event count, head & tail strip-host_ns matches
  • ours-cold-run2.{stdout,stderr}.log
  • writer-report.md (this file)

xenia-canary UNCHANGED.

Engine state: head + 2.AF patch (+18 in xenia-app/src/main.rs) + 2.AI patch (+16/-2 in xenia-kernel/src/exports.rs). Both patches retained in working tree, uncommitted (per the cumulative-LOC policy noted in 2.W's report).