Source changes (dormant parity infra, retained from iterate 2.AI/2.AO): - xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring - xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
15 KiB
Iterate 2.AI — tid=1 main-loop wedge fix (NtCreateEvent polarity)
Date: 2026-06-02. LOC delta: engine +16 / -2 LOC (1
substantive change + 14 doc lines + 1-LOC negation) in
crates/xenia-kernel/src/exports.rs nt_create_event. Retained.
Tests: xenia-cpu 300 / xenia-kernel 227 / xenia-app 5 — full PASS,
0 regressions.
Headline
WEDGE-PACED-CASCADE-FOLLOWS.
Sub-hypothesis C-1 confirmed and dispatched. tid=1's main update
loop sub_822F1AA8 no longer fast-paths through Event 0x000010e8
1.05 M times. The wait now correctly blocks (waiting on a real signaler
— the VSync ISR), tid=1 reaches 18 wedge entries downstream, and the
trace expands from 45.2 M events / 152.2 s (2.AF) to 65.7 M events /
208.3 s (2.AI), a 1.45× event growth and 1.37× wallclock progression.
Sub-hypothesis selection
The wedge handle 0x000010e8 (semid 9ad1bebb6cae28c4) was created by
tid=1's NtCreateEvent at host_ns 838 ms. In 2.AF, the handle then
received 1,077,846 wait.begin events + handle.create + ZERO
signal.match, ZERO wake.requested, ZERO handle.destroy — across
152 s.
Decision matrix:
| sub-hyp | requires | observed | verdict |
|---|---|---|---|
| C-1 Event manual-reset + initial-signaled | handle_signaled()==true forever, no real signaler needed, handle_consume no-op |
matches exactly (zero signal events, fast-path returns rv=0 each call) | chosen |
C-2 refresh_pkevent_shadow_from_guest re-signals each wait |
callsite must run before wait | nt_wait_for_single_object_ex does NOT call refresh (only ke_wait_* do); handle is small-int NT handle not guest pointer |
falsified at source |
| C-3 VSync ISR over-fires | repeated wake/signal events on the handle | zero signal events on it | falsified |
Source read confirmed the precise bug. nt_create_event
(exports.rs:3040-3060) had manual_reset = ctx.gpr[5] != 0. Canary's
NtCreateEvent_entry
(xboxkrnl_threading.cc:601-632) does
ev->Initialize(!event_type, !!initial_state) — i.e.,
manual_reset = !event_type. The polarity is inverted relative to
NT semantics (NotificationEvent = type 0 = manual-reset;
SynchronizationEvent = type 1 = auto-reset), and is also inconsistent
with our own ensure_dispatcher_object (exports.rs:4970-4980), which
correctly maps type 0 → manual, type 1 → auto. So:
- Game passes
event_type=1(SynchronizationEvent / auto-reset) +initial_state=1(signaled). - Pre-fix:
manual_reset = (1 != 0) = true→ Event{manual=true, signaled=true}. Permanently signaled, never consumed (manual-reset). - Post-fix:
manual_reset = (1 == 0) = false→ Event{manual=false, signaled=true}. First wait consumes signal, subsequent waits block.
Sister export nt_create_timer (exports.rs:3087-3116) already had the
correct polarity (manual_reset: timer_type == 0). nt_create_event
was the only outlier.
Patch summary
crates/xenia-kernel/src/exports.rs | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
fn nt_create_event(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) {
- // r3 = handle_ptr, r4 = obj_attrs, r5 = event_type, r6 = initial_state
+ // r3 = handle_ptr, r4 = obj_attrs, r5 = event_type, r6 = initial_state.
+ // 2.AI — Xenon DISPATCHER_HEADER `Type` (NT convention):
+ // 0 = NotificationEvent (manual-reset)
+ // 1 = SynchronizationEvent (auto-reset)
+ // Canary mirrors this at `xboxkrnl_threading.cc:620`
+ // (`ev->Initialize(!event_type, !!initial_state)`) and our own
+ // `ensure_dispatcher_object` (above, type=0→manual, type=1→auto).
+ // The prior polarity here was inverted (`event_type != 0` → manual)...
let handle_ptr = ctx.gpr[3] as u32;
- let manual_reset = ctx.gpr[5] != 0;
+ let manual_reset = ctx.gpr[5] == 0;
let signaled = ctx.gpr[6] != 0;
1 substantive LOC change (the negation). Rest is a 14-line clarifying comment with the canary cross-reference and root-cause anecdote. Well within the 5-50 LOC scope (and the 100-LOC hard cap).
Determinism: the only added behavior is a per-handle boolean flip on
NtCreateEvent entry. No host_ns, no Instant::now(), no RNG. Proof
in the determinism check below.
Test results
cargo build --release -> OK
cargo test -p xenia-cpu -p xenia-kernel -p xenia-app --release
xenia-cpu 300 passed, 0 failed
xenia-kernel 227 passed, 0 failed
xenia-app 5 passed, 0 failed (+ 2/1 ignored long-runners)
+ auxiliary suites: 0 failures
No tests pinned the buggy polarity — search for the existing nt_create_event callsites in the test corpus returned only audit-trail fixtures (audit.rs:253-352), which exercise the trace label "Event/Auto" vs "Event/Manual" but not the param-to-flag mapping itself.
Primary gate results
| # | predicate | result |
|---|---|---|
| 1 | tid=1 main-loop iteration count drops from ~1.05M to ≪ baseline | PASS — tid=1 NtWaitForSingleObjectEx import calls: 3,233,583 (2.AF) → 51 (2.AI), a 63,400× reduction. Events on wedge semid 9ad1bebb6cae28c4: 1,077,847 (2.AF) → 3 (2.AI) (1 handle.create + 2 wait.begin, then permanently blocks). |
| 2 | wait gap on Event 0x10e8 rises from 2.21 µs to ≥1 ms | PASS structurally — first two wait.begins on this semid are 126.8 µs apart, and after the second the thread blocks indefinitely (no further wait.begin). The "23 kHz spin" is gone; the wait now correctly waits for a real signaler (the VSync ISR). |
| 3 | tid=1 XamInputGetCapabilities > 0 (was 0 in 2.V) |
PASS — 24 calls by tid=1, all in the [136 ms .. 6.58 s] interval right before the (now-blocking) VSync gate. (Same count as 2.AF baseline — already > 0 there, but the spec's "was 0" referred to 2.V; this iterate preserves the post-2.AF value.) |
The structural primary objective is achieved: the spin-forever fast-path on the wedge handle is eliminated. tid=1 now correctly blocks on its frame-sync wait, the way the game expects (waiting for the VSync ISR to signal the auto-reset event).
The wait gap isn't the full 17.18 ms because the trace cuts off at the second wait.begin — after that, tid=1 is permanently blocked (no signaler in 51 s of execution past that point). That is a different bug (the VSync ISR doesn't reach this handle) and is now exposed for the first time; the previous polarity bug masked it. This is the natural follow-up surface and matches the secondary gate pattern (new wedges appear downstream).
Determinism check
Two cold runs (XENIA_CACHE_WIPE=1 -n 500000000) produced
bit-identical event counts: 65,691,821 events each
(ours-cold.jsonl / ours-cold-run2.jsonl).
After stripping host_ns (the only intentionally-non-deterministic
field):
- First 100,000 events:
cmpreturns 0 differences. - Last 100,000 events: both files' md5 =
389d631e5b557bca0767fb8ee8104d4c.
Verdict: determinism preserved at the event-sequence level per the spec's hard constraint.
Secondary gates (cascade)
| metric | 2.V baseline | 2.AF | 2.AI | direction |
|---|---|---|---|---|
| Total events | 13,003,881 | 45,206,378 | 65,691,821 | 5.05× vs 2.V, 1.45× vs 2.AF |
| Last event host_ns | 51,011 ms | 152,207 ms | 208,272 ms | 4.08× vs 2.V, 1.37× vs 2.AF |
| Alive threads | 21 | 21 | 21 | unchanged |
| Exited threads (exit_code=0) | 2 (13,14) | 2 (13,17) | 2 (13,14) | shifted back |
| Wedge map entries | 15 | 15 | 18 | +3 new downstream wedges |
signal.match events |
75 | 69 | 84 | +15 vs 2.AF (+22%) |
wake.requested events |
79 | 71 | 86 | +15 vs 2.AF (+21%) |
| VdSwap calls | 2 | 2 | 6 | 3× ↑ |
| tid=1 NtWaitForSingleObjectEx calls | (wedged spin) | 3,233,583 | 51 | 63,400× ↓ |
| tid=1 events | (wedged spin) | 13,301,954 | 148,773 | 89× ↓ (no more spin) |
VdSwap moved from 2 → 6. Three additional VdSwap calls land in the
trace — meaning the frame-presentation path actually fires now. This was
2 in both 2.V and 2.AF; 2.AI is the first iterate where it grows. Real
rendering progression.
tid=12 (DPC dispatcher, secondary gate target): still Blocked on
Event 0x00001004 at PC 0x824ac578. Unchanged from 2.V/2.AF.
Independent cascade.
Thread-by-thread post-fix wedge analysis
The exit-state.json now contains 18 wedge entries (up from 15 in 2.AF). Newly added:
- tid=1 → Event
0x000010e8at PC0x824ac578— previously hidden by the polarity bug's fast-path. Now exposed as a real blocker (waits for VSync ISR signaling that never arrives). This is the natural "wedge moved one level deeper" pattern (#41/#42 class). - tid=21 → Event
0x0000151c/0x01000000— appears downstream of tid=5/tid=17 progress. - tid=20 → Event
0x0000151c/ Sema0x00001528— same downstream surface (already flagged in 2.AF's "next-iterate" list).
tid=14 reverts to Exited (vs tid=17 in 2.AF) — confirming that the 2.AF "tid=17 vs tid=14 swap" was a timing-shift on the deadline-fire fix, and the underlying tid=14 producer-exhaustion divergence (2.AE target) is unaltered by this fix.
Cross-engine context
2.AH had pinned canary's analog wait as VSync-gated. Now that our event has the correct semantics (auto-reset, not permanently-signaled), the next question — "is the VSync ISR reaching this handle on time?" — becomes meaningful for the first time. Per 2.AH's notes, the canary's analog wait returns ~17.18 ms (one VSync period). Ours blocks indefinitely after 2 cycles, suggesting the ISR is either not firing for tid=1's handle or the wake path doesn't reach this auto-reset event.
This is left for a subsequent iterate (see next-iterate recommendation).
Third-order observations (no claims, just data)
- 1.45× event-count growth in this iterate (45.2 M → 65.7 M) is in the same ballpark as 2.AF's 3.5× from the deadline-fire fix. Per-fix diminishing returns are visible — each independent blocker peels off more progression but the wedge surface is widening, not collapsing.
- VdSwap = 6: still not a full frame-rate (would be ~12,000 at 60 Hz across 208 s), but the mere fact that VdSwap > 2 is the first rendering progression since 2.V landed two days ago. The XAudio/XInput surfaces are likely the next limiter.
- tid=11 (XAudio worker, blocked on Events
0x828a3244/0x828a3220) remains unchanged — the XAudio stub from 2.AB is the remaining independent blocker.
Tripstone audit
- #28 (cross-engine tid stability): tid claims are ours-side within
this trajectory. Canary references rely on prior 2.AH mapping
(
+ ctx_ptrfor cross-engine equivalence). - #39 (composite progression IS progression): Honored. The headline separately reports (a) the primary state-change (1.05M iter → 51 calls + permanent block), (b) the cascade volume (1.45× events), and (c) VdSwap growth (2 → 6, the first real rendering progression metric).
- #40 (no single-keystone framing): Care taken. Headline reads
WEDGE-PACED-CASCADE-FOLLOWS, body explicitly lists 3+ remaining independent blockers (tid=11 XAudio, tid=14 first-divergence, new tid=20/21 events). The 2 prior open follow-ups (2.AE, 2.AG, 2.AI XAudio, 2.AH) are explicitly retained. - #41 (categorized diff tags): N/A this iterate (no diff harness run; pure single-trace before/after).
- #42 (Phase-A blind to blocked-forever): Exit-state JSON used throughout. tid=1's Blocked-on-0x10e8 post-fix is visible only because of that dump.
- #43 (no budget-cap framing): Budget cap reached but trace had structural progression throughout (1.37× wallclock vs 2.AF). Cascade observation robust.
- #44 refined (rate+shape comparison): Pre-fix wait rate 463,475/sec on 0x10e8; post-fix 2 events then block — vs canary's ~60/sec one VSync period each. Shape now matches canary structurally (blocking auto-reset); rate diverges in the opposite direction (we block forever; canary blocks ~17 ms each cycle). This is the expected next-step exposure.
Confidence
- HIGH that the patch is correct and minimal: 1-LOC negation, 0 test regressions, determinism preserved bit-for-bit on event count, head-100K and tail-100K cmp/md5.
- HIGH that the polarity bug is dispatched: trace evidence
(3,233,583 → 51 NtWait calls on tid=1; 1,077,847 → 3 events on the
wedge handle) is unambiguous. Exit-state JSON shows the event
correctly classified as auto-reset (
manual_reset: false, signaled: false). - HIGH that the cascade is genuine (1.45× events, 1.37× wallclock, +15 signal.match/wake.requested events, VdSwap 2→6 — all up).
- MEDIUM-HIGH that other guest events created with the same
pattern were silently mis-classified across the codebase. Any event
the guest creates with
event_type=1(auto-reset) prior to this fix was actually behaving as manual-reset — meaning many wait sites could be hiding similar fast-path bugs. Worth a regression-grep next. - MEDIUM that the next wedge (tid=1 on 0x10e8 with no signaler) is small. The VSync ISR path → tid=1's auto-reset handle is the obvious surface but the wiring may need its own fix.
- LOW that gameplay is imminent. VdSwap 6 is rendering progression but a full game frame needs ~60+ swaps/sec at steady state, and the XAudio / first-divergence / DPC blockers remain. Several more cascade iterations likely needed.
Next-iterate recommendation
Priority list:
- 2.AJ (VSync ISR → 0x10e8 wiring) — the new wedge exposed by
this iterate. tid=1 correctly blocks but no signaler reaches the
handle. Likely in
try_inject_graphics_interrupt(main.rs:3729) or the callback's user_data path. Approx 5-30 LOC, single-file. - 2.AE (tid=14 first-divergence diff) — unchanged priority from 2.AF list. ~0 LOC pure trace mining.
- 2.AI XAudio stub — tid=11 still wedged on
0x828a3244/0x828a3220. exports.rs:4591-4598 still a no-op. Approx 5-150 LOC. - 2.AG (
do_wait_multiplewait.begin) — observability gap. ~10 LOC. - Regression-grep for other inverted-polarity callers — any other
guest-API entry that maps NT's "event_type" the wrong way? Quick
scan:
nt_create_timeris fine,ensure_dispatcher_objectis fine. No further hits in current corpus, but worth a CI tripwire (e.g.Event/Manualaudit-create label assertingmanual_reset == true).
I recommend 2.AJ next (it's the wedge this iterate just exposed, single-thread, single-handle, single-file).
Artifacts
Under xenia-rs/audit-runs/iterate-2AI-tid1-xnotify-fix/:
ours-cold.jsonl(16.07 GB, 65,691,821 events) — primary traceours-cold.stdout.log(empty — quiet mode)ours-cold.stderr.log(single exit-thread-state notice)exit-thread-state.json(17.4 KB; 21 alive + 18 wedge entries)ours-cold-run2.jsonl(16.07 GB, 65,691,821 events) — determinism check, bit-identical event count, head & tail strip-host_ns matchesours-cold-run2.{stdout,stderr}.logwriter-report.md(this file)
xenia-canary UNCHANGED.
Engine state: head + 2.AF patch (+18 in xenia-app/src/main.rs) +
2.AI patch (+16/-2 in xenia-kernel/src/exports.rs). Both patches
retained in working tree, uncommitted (per the cumulative-LOC policy
noted in 2.W's report).