xenia-rs/audit-runs/iterate-2AI-tid1-xnotify-fix/writer-report.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Iterate 2.AI — tid=1 main-loop wedge fix (NtCreateEvent polarity)
**Date:** 2026-06-02. **LOC delta:** engine **+16 / -2 LOC** (1
substantive change + 14 doc lines + 1-LOC negation) in
`crates/xenia-kernel/src/exports.rs` `nt_create_event`. Retained.
**Tests:** xenia-cpu 300 / xenia-kernel 227 / xenia-app 5 — full PASS,
0 regressions.
## Headline
**WEDGE-PACED-CASCADE-FOLLOWS.**
Sub-hypothesis **C-1 confirmed and dispatched.** tid=1's main update
loop `sub_822F1AA8` no longer fast-paths through Event `0x000010e8`
~1.05 M times. The wait now blocks correctly on a real signaler (the
VSync ISR), the wedge map grows from 15 to 18 entries as downstream
wedges surface, and the trace expands from 45.2 M events / 152.2 s
(2.AF) to **65.7 M events / 208.3 s** (2.AI), a 1.45× growth in events
and a 1.37× growth in wallclock.
## Sub-hypothesis selection
The wedge handle `0x000010e8` (semid `9ad1bebb6cae28c4`) was created by
tid=1's `NtCreateEvent` at host_ns 838 ms. In 2.AF, the handle then
received **1,077,846 `wait.begin` events** + handle.create + **ZERO
`signal.match`, ZERO `wake.requested`, ZERO `handle.destroy`** — across
152 s.
Decision matrix:
| sub-hyp | requires | observed | verdict |
|---|---|---|---|
| **C-1** Event manual-reset + initial-signaled | `handle_signaled()==true` forever, no real signaler needed, `handle_consume` no-op | matches exactly (zero signal events, fast-path returns rv=0 each call) | **chosen** |
| C-2 `refresh_pkevent_shadow_from_guest` re-signals each wait | callsite must run before wait | `nt_wait_for_single_object_ex` does NOT call refresh (only `ke_wait_*` do); handle is small-int NT handle not guest pointer | **falsified at source** |
| C-3 VSync ISR over-fires | repeated wake/signal events on the handle | zero signal events on it | **falsified** |

Reading the source confirmed the precise bug. `nt_create_event`
(exports.rs:3040-3060) had `manual_reset = ctx.gpr[5] != 0`. Canary's
`NtCreateEvent_entry` (xboxkrnl_threading.cc:601-632) does
`ev->Initialize(!event_type, !!initial_state)`, i.e.
`manual_reset = !event_type`. Our polarity was therefore **inverted**
relative to NT semantics (NotificationEvent = type 0 = manual-reset;
SynchronizationEvent = type 1 = auto-reset), and was also inconsistent
with our own `ensure_dispatcher_object` (exports.rs:4970-4980), which
correctly maps `type 0 → manual, type 1 → auto`. So:
- Game passes `event_type=1` (SynchronizationEvent / auto-reset) +
  `initial_state=1` (signaled).
- Pre-fix: `manual_reset = (1 != 0) = true` →
  `Event{manual=true, signaled=true}`. Permanently signaled, never
  consumed (manual-reset).
- Post-fix: `manual_reset = (1 == 0) = false` →
  `Event{manual=false, signaled=true}`. First wait consumes signal,
  subsequent waits block.
Sister export `nt_create_timer` (exports.rs:3087-3116) already had the
correct polarity (`manual_reset: timer_type == 0`). `nt_create_event`
was the only outlier.
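
The effect of the polarity flip can be reproduced in isolation. The following is a minimal sketch, not the engine's code: `Event`, `try_wait`, and the two `create_event_*` helpers are hypothetical stand-ins that only model the two flags discussed above and the consume-on-wait rule for auto-reset events.

```rust
/// Minimal stand-in for a kernel event object (hypothetical type, not the
/// engine's real one).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Event {
    pub manual_reset: bool,
    pub signaled: bool,
}

impl Event {
    /// Non-blocking wait attempt: returns true if the wait is satisfied
    /// immediately. Auto-reset events consume the signal; manual-reset
    /// events stay signaled until explicitly reset.
    pub fn try_wait(&mut self) -> bool {
        if !self.signaled {
            return false; // a real wait would block here
        }
        if !self.manual_reset {
            self.signaled = false;
        }
        true
    }
}

/// Pre-fix mapping: any non-zero event_type was treated as manual-reset.
pub fn create_event_pre_fix(event_type: u64, initial_state: u64) -> Event {
    Event { manual_reset: event_type != 0, signaled: initial_state != 0 }
}

/// Post-fix mapping (NT convention): type 0 = NotificationEvent (manual),
/// type 1 = SynchronizationEvent (auto).
pub fn create_event_post_fix(event_type: u64, initial_state: u64) -> Event {
    Event { manual_reset: event_type == 0, signaled: initial_state != 0 }
}

/// Counts how many of `attempts` consecutive waits succeed with no signaler.
pub fn immediate_wait_successes(mut ev: Event, attempts: u32) -> u32 {
    (0..attempts).filter(|_| ev.try_wait()).count() as u32
}
```

With `event_type=1, initial_state=1`, the pre-fix event satisfies every wait attempt (the spin), while the post-fix event satisfies exactly one and would block on the next.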
## Patch summary
```text
crates/xenia-kernel/src/exports.rs | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
```
```diff
fn nt_create_event(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) {
- // r3 = handle_ptr, r4 = obj_attrs, r5 = event_type, r6 = initial_state
+ // r3 = handle_ptr, r4 = obj_attrs, r5 = event_type, r6 = initial_state.
+ // 2.AI — Xenon DISPATCHER_HEADER `Type` (NT convention):
+ // 0 = NotificationEvent (manual-reset)
+ // 1 = SynchronizationEvent (auto-reset)
+ // Canary mirrors this at `xboxkrnl_threading.cc:620`
+ // (`ev->Initialize(!event_type, !!initial_state)`) and our own
+ // `ensure_dispatcher_object` (above, type=0→manual, type=1→auto).
+ // The prior polarity here was inverted (`event_type != 0` → manual)...
let handle_ptr = ctx.gpr[3] as u32;
- let manual_reset = ctx.gpr[5] != 0;
+ let manual_reset = ctx.gpr[5] == 0;
let signaled = ctx.gpr[6] != 0;
```
1 substantive LOC change (the negation). Rest is a 14-line clarifying
comment with the canary cross-reference and root-cause anecdote. Well
within the 5-50 LOC scope (and the 100-LOC hard cap).
Determinism: the only added behavior is a per-handle boolean flip on
`NtCreateEvent` entry. No `host_ns`, no `Instant::now()`, no RNG. Proof
in the determinism check below.
## Test results
```text
cargo build --release -> OK
cargo test -p xenia-cpu -p xenia-kernel -p xenia-app --release
xenia-cpu 300 passed, 0 failed
xenia-kernel 227 passed, 0 failed
xenia-app 5 passed, 0 failed (+ 2/1 ignored long-runners)
+ auxiliary suites: 0 failures
```
No tests pinned the buggy polarity. A search for existing
`nt_create_event` callsites in the test corpus returned only audit-trail
fixtures (audit.rs:253-352), which exercise the trace labels "Event/Auto"
and "Event/Manual" but not the param-to-flag mapping itself.
## Primary gate results
| # | predicate | result |
|---|---|---|
| 1 | tid=1 main-loop iteration count drops from ~1.05M to ≪ baseline | **PASS** — tid=1 `NtWaitForSingleObjectEx` import calls: **3,233,583 (2.AF) → 51 (2.AI)**, a 63,400× reduction. Events on wedge semid `9ad1bebb6cae28c4`: **1,077,847 (2.AF) → 3 (2.AI)** (1 handle.create + 2 wait.begin, then permanently blocks). |
| 2 | wait gap on Event 0x10e8 rises from 2.21 µs to ≥1 ms | **PASS structurally** — first two wait.begins on this semid are 126.8 µs apart, and after the second the thread blocks indefinitely (no further wait.begin). The "23 kHz spin" is gone; the wait now correctly waits for a real signaler (the VSync ISR). |
| 3 | tid=1 `XamInputGetCapabilities` > 0 (was 0 in 2.V) | **PASS**: **24 calls** by tid=1, all in the [136 ms .. 6.58 s] interval right before the (now-blocking) VSync gate. (Same count as the 2.AF baseline, where it was already > 0; the spec's "was 0" referred to 2.V, and this iterate preserves the post-2.AF value.) |

The structural primary objective is achieved: the spin-forever fast-path
on the wedge handle is eliminated. tid=1 now correctly blocks on its
frame-sync wait, the way the game expects (waiting for the VSync ISR to
signal the auto-reset event).
The wait gap isn't the full 17.18 ms because the trace cuts off at the
second wait.begin — after that, tid=1 is **permanently blocked** (no
signaler in 51 s of execution past that point). That is a *different*
bug (the VSync ISR doesn't reach this handle) and is now exposed for the
first time; the previous polarity bug masked it. This is the natural
follow-up surface and matches the secondary gate pattern (new wedges
appear downstream).
## Determinism check
Two cold runs (`XENIA_CACHE_WIPE=1 -n 500000000`) produced
**bit-identical event counts: 65,691,821 events each**
(`ours-cold.jsonl` / `ours-cold-run2.jsonl`).
After stripping `host_ns` (the only intentionally-non-deterministic
field):
- First 100,000 events: `cmp` returns 0 differences.
- Last 100,000 events: both files' md5 = `389d631e5b557bca0767fb8ee8104d4c`.
Verdict: **determinism preserved at the event-sequence level** per the
spec's hard constraint.
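
The strip-and-compare step can be sketched as below. This is an illustration under stated assumptions, not the tooling used for the run: it assumes compact JSONL in which `host_ns` is a plain unsigned integer member, and the other field names in the usage (`tid`, `kind`) are hypothetical. `strip_host_ns` and `first_divergence` are names introduced here.

```rust
/// Removes a `"host_ns":<digits>` member (and one adjacent comma) from a
/// single JSONL line. Assumes compact JSON with an unsigned integer value.
pub fn strip_host_ns(line: &str) -> String {
    const KEY: &str = "\"host_ns\":";
    let Some(start) = line.find(KEY) else {
        return line.to_string(); // no host_ns member: nothing to strip
    };
    let rest = &line[start + KEY.len()..];
    // Length of the digit run that forms the value.
    let value_len = rest
        .char_indices()
        .find(|(_, c)| !c.is_ascii_digit())
        .map(|(i, _)| i)
        .unwrap_or(rest.len());
    let mut begin = start;
    let mut end = start + KEY.len() + value_len;
    if line[end..].starts_with(',') {
        end += 1; // member was followed by a comma
    } else if line[..begin].ends_with(',') {
        begin -= 1; // member was last: drop the preceding comma
    }
    format!("{}{}", &line[..begin], &line[end..])
}

/// Index of the first event that differs once host_ns is stripped, or None
/// if both sequences are identical (a length mismatch counts as a difference).
pub fn first_divergence(a: &[&str], b: &[&str]) -> Option<usize> {
    let mismatch = a
        .iter()
        .zip(b.iter())
        .position(|(x, y)| strip_host_ns(x) != strip_host_ns(y));
    if mismatch.is_some() {
        return mismatch;
    }
    if a.len() != b.len() {
        Some(a.len().min(b.len()))
    } else {
        None
    }
}
```

Returning the index of the first differing event, rather than a boolean, keeps the check useful as a starting point for a first-divergence investigation when a run is not deterministic.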
## Secondary gates (cascade)
| metric | 2.V baseline | 2.AF | 2.AI | direction |
|---|---:|---:|---:|---|
| Total events | 13,003,881 | 45,206,378 | **65,691,821** | **5.05× vs 2.V, 1.45× vs 2.AF** |
| Last event host_ns | 51,011 ms | 152,207 ms | **208,272 ms** | **4.08× vs 2.V, 1.37× vs 2.AF** |
| Alive threads | 21 | 21 | 21 | unchanged |
| Exited threads (exit_code=0) | 2 (13,14) | 2 (13,17) | 2 (13,14) | shifted back |
| Wedge map entries | 15 | 15 | **18** | +3 new downstream wedges |
| `signal.match` events | 75 | 69 | **84** | **+15 vs 2.AF (+22%)** |
| `wake.requested` events | 79 | 71 | **86** | **+15 vs 2.AF (+21%)** |
| VdSwap calls | 2 | 2 | **6** | **3×** |
| tid=1 NtWaitForSingleObjectEx calls | (wedged spin) | 3,233,583 | **51** | **63,400×** |
| tid=1 events | (wedged spin) | 13,301,954 | **148,773** | **89× ↓ (no more spin)** |

**VdSwap moved from 2 → 6.** Four additional `VdSwap` calls land in the
trace, meaning the frame-presentation path actually fires now. The count
was 2 in both 2.V and 2.AF; 2.AI is the first iterate where it grows.
Real rendering progression.
tid=12 (DPC dispatcher, secondary gate target): still **Blocked on
Event `0x00001004`** at PC `0x824ac578`. Unchanged from 2.V/2.AF.
Independent cascade.
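
The growth factors quoted in the secondary-gate table follow directly from the raw counts. A small sketch for re-deriving them (the helper names `growth` and `round2` are introduced here for illustration):

```rust
/// Ratio of `after` to `before` as a floating-point growth factor.
pub fn growth(before: u64, after: u64) -> f64 {
    after as f64 / before as f64
}

/// Rounds to two decimals, the precision used in the table.
pub fn round2(x: f64) -> f64 {
    (x * 100.0).round() / 100.0
}
```

For example, `round2(growth(45_206_378, 65_691_821))` reproduces the 1.45× event growth against 2.AF, and `round2(growth(152_207, 208_272))` reproduces the 1.37× wallclock figure.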
## Thread-by-thread post-fix wedge analysis
The exit-state.json now contains **18 wedge entries** (up from 15 in
2.AF). Newly added:
- **tid=1 → Event `0x000010e8`** at PC `0x824ac578` — *previously
hidden* by the polarity bug's fast-path. Now exposed as a real
blocker (waits for VSync ISR signaling that never arrives). This is
the natural "wedge moved one level deeper" pattern (#41/#42 class).
- tid=21 → Event `0x0000151c` / `0x01000000` — appears downstream of
tid=5/tid=17 progress.
- tid=20 → Event `0x0000151c` / Sema `0x00001528` — same downstream
surface (already flagged in 2.AF's "next-iterate" list).
tid=14 reverts to Exited (vs tid=17 in 2.AF) — confirming that the
2.AF "tid=17 vs tid=14 swap" was a timing-shift on the deadline-fire
fix, and the underlying tid=14 producer-exhaustion divergence (2.AE
target) is unaltered by this fix.
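
The "newly added" list is a set difference between two wedge maps. The sketch below assumes each wedge entry can be reduced to a `(tid, handle)` pair; the real `exit-thread-state.json` schema is not shown in this report, so `Wedge` and `newly_wedged` are hypothetical names.

```rust
use std::collections::BTreeSet;

/// (thread id, handle) pair identifying one wedge entry.
pub type Wedge = (u32, u32);

/// Entries present in `after` but not in `before`, sorted and de-duplicated.
pub fn newly_wedged(before: &[Wedge], after: &[Wedge]) -> Vec<Wedge> {
    let seen: BTreeSet<Wedge> = before.iter().copied().collect();
    let mut added: Vec<Wedge> = after
        .iter()
        .copied()
        .filter(|w| !seen.contains(w))
        .collect();
    added.sort();
    added.dedup();
    added
}
```

Fed a partial map containing only entries named in this report (tid=12 on `0x1004` and tid=11 on `0x828a3244` / `0x828a3220` in both runs, plus tid=1 on `0x10e8` and tid=20/21 on `0x151c` in the later one), it returns just the three new pairs.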
## Cross-engine context
2.AH had pinned canary's analog wait as VSync-gated. Now that our event
has the correct semantics (auto-reset, not permanently-signaled), the
*next* question — "is the VSync ISR reaching this handle on time?" —
becomes meaningful for the first time. Per 2.AH's notes, the canary's
analog wait returns ~17.18 ms (one VSync period). Ours blocks
indefinitely after 2 cycles, suggesting the ISR is either not firing
for tid=1's handle or the wake path doesn't reach this auto-reset
event.
This is left for a subsequent iterate (see next-iterate recommendation).
## Third-order observations (no claims, just data)
- The 1.45× event-count growth in this iterate (45.2 M → 65.7 M) is
  well below 2.AF's ~3.5× (13.0 M → 45.2 M) from the deadline-fire fix.
  Per-fix diminishing returns are visible: each independent blocker
  peels off more progression, but the wedge surface is widening, not
  collapsing.
- VdSwap = 6: still nowhere near a full frame rate (which would be
  ~12,500 swaps at 60 Hz across 208 s), but the **mere fact** that
  VdSwap > 2 is the first rendering progression since 2.V landed two
  days ago. The XAudio/XInput surfaces are likely the next limiter.
- tid=11 (XAudio worker, blocked on Events `0x828a3244` / `0x828a3220`)
remains unchanged — the XAudio stub from 2.AB is the remaining
independent blocker.
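
For scale, the expected swap count at a steady frame rate is a one-line calculation. A sketch, assuming a fixed 60 Hz presentation rate over the whole trace (the helper names are introduced here):

```rust
/// Whole frames a display would present at `rate_hz` over `duration_ms`.
pub fn expected_swaps(rate_hz: u64, duration_ms: u64) -> u64 {
    rate_hz * duration_ms / 1000
}

/// Fraction of the expected swaps that were actually observed.
pub fn swap_coverage(observed: u64, rate_hz: u64, duration_ms: u64) -> f64 {
    observed as f64 / expected_swaps(rate_hz, duration_ms) as f64
}
```

With the 2.AI trace length of 208,272 ms, `expected_swaps(60, 208_272)` gives 12,496, so the 6 observed swaps cover less than 0.05% of a steady 60 Hz run.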
## Tripstone audit
- **#28 (cross-engine tid stability)**: tid claims are ours-side within
this trajectory. Canary references rely on prior 2.AH mapping
(`+ ctx_ptr` for cross-engine equivalence).
- **#39 (composite progression IS progression)**: Honored. The headline
separately reports (a) the primary state-change (1.05M iter → 51
calls + permanent block), (b) the cascade volume (1.45× events), and
(c) VdSwap growth (2 → 6, the first real rendering progression
metric).
- **#40 (no single-keystone framing)**: Care taken. Headline reads
`WEDGE-PACED-CASCADE-FOLLOWS`, body explicitly lists 3+ remaining
independent blockers (tid=11 XAudio, tid=14 first-divergence, new
tid=20/21 events). The four prior open follow-ups (2.AE, 2.AG, 2.AI
XAudio, 2.AH) are explicitly retained.
- **#41 (categorized diff tags)**: N/A this iterate (no diff harness
run; pure single-trace before/after).
- **#42 (Phase-A blind to blocked-forever)**: Exit-state JSON used
throughout. tid=1's Blocked-on-0x10e8 post-fix is visible only
because of that dump.
- **#43 (no budget-cap framing)**: Budget cap reached but trace had
structural progression throughout (1.37× wallclock vs 2.AF). Cascade
observation robust.
- **#44 refined (rate+shape comparison)**: Pre-fix wait rate
463,475/sec on 0x10e8; post-fix 2 events then block — vs canary's
~60/sec one VSync period each. Shape now matches canary structurally
(blocking auto-reset); rate diverges in the *opposite* direction (we
block forever; canary blocks ~17 ms each cycle). This is the
expected next-step exposure.
## Confidence
- **HIGH** that the patch is correct and minimal: 1-LOC negation,
0 test regressions, determinism preserved bit-for-bit on event count,
head-100K and tail-100K cmp/md5.
- **HIGH** that the polarity bug is dispatched: trace evidence
(3,233,583 → 51 NtWait calls on tid=1; 1,077,847 → 3 events on the
wedge handle) is unambiguous. Exit-state JSON shows the event
correctly classified as auto-reset (`manual_reset: false,
signaled: false`).
- **HIGH** that the cascade is genuine (1.45× events, 1.37× wallclock,
+15 signal.match/wake.requested events, VdSwap 2→6 — all up).
- **MEDIUM-HIGH** that other guest events created with the same
  pattern were silently mis-classified across the codebase. Any event
  the guest created with `event_type=1` (auto-reset) prior to this
  fix was actually behaving as manual-reset (and, symmetrically, any
  `event_type=0` event behaved as auto-reset), meaning many wait sites
  could be hiding similar fast-path bugs. Worth a regression-grep next.
- **MEDIUM** that the next wedge (tid=1 on 0x10e8 with no signaler) is
small. The VSync ISR path → tid=1's auto-reset handle is the
obvious surface but the wiring may need its own fix.
- **LOW** that gameplay is imminent. VdSwap 6 is rendering progression,
  but steady-state gameplay needs ~60 swaps/sec, and the XAudio /
  first-divergence / DPC blockers remain. Several more cascade
  iterations are likely needed.
## Next-iterate recommendation
Priority list:
1. **2.AJ (VSync ISR → 0x10e8 wiring)** — the new wedge exposed by
this iterate. tid=1 correctly blocks but no signaler reaches the
handle. Likely in `try_inject_graphics_interrupt` (main.rs:3729) or
the callback's user_data path. Approx **5-30 LOC**, single-file.
2. **2.AE (tid=14 first-divergence diff)** — unchanged priority from
2.AF list. ~0 LOC pure trace mining.
3. **2.AI XAudio stub** — tid=11 still wedged on `0x828a3244` /
`0x828a3220`. exports.rs:4591-4598 still a no-op. Approx 5-150 LOC.
4. **2.AG (`do_wait_multiple` `wait.begin`)** — observability gap.
~10 LOC.
5. **Regression-grep for other inverted-polarity callers** — any other
guest-API entry that maps NT's "event_type" the wrong way? Quick
scan: `nt_create_timer` is fine, `ensure_dispatcher_object` is fine.
No further hits in current corpus, but worth a CI tripwire (e.g.
`Event/Manual` audit-create label asserting `manual_reset == true`).
I recommend **2.AJ next** (it's the wedge this iterate just exposed,
single-thread, single-handle, single-file).
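
The CI tripwire suggested in item 5 could look roughly like this. It is a sketch only: `manual_reset_from_type`, `audit_label`, and `check_event_polarity` are hypothetical helpers that mirror the post-fix expression and the "Event/Manual" / "Event/Auto" labels mentioned above, not functions from the engine.

```rust
/// NT event types as passed in r5 to NtCreateEvent.
pub const NOTIFICATION_EVENT: u64 = 0; // manual-reset
pub const SYNCHRONIZATION_EVENT: u64 = 1; // auto-reset

/// Param-to-flag mapping under test (mirrors the post-fix expression).
pub fn manual_reset_from_type(event_type: u64) -> bool {
    event_type == 0
}

/// Audit label derived from the flag; label strings follow the report.
pub fn audit_label(manual_reset: bool) -> &'static str {
    if manual_reset { "Event/Manual" } else { "Event/Auto" }
}

/// Tripwire: returns Err with a description if the mapping ever drifts.
pub fn check_event_polarity() -> Result<(), String> {
    let cases = [
        (NOTIFICATION_EVENT, true, "Event/Manual"),
        (SYNCHRONIZATION_EVENT, false, "Event/Auto"),
    ];
    for (event_type, want_manual, want_label) in cases {
        let got = manual_reset_from_type(event_type);
        if got != want_manual || audit_label(got) != want_label {
            return Err(format!(
                "event_type={event_type}: manual_reset={got}, expected {want_manual}"
            ));
        }
    }
    Ok(())
}
```

Pinning both the flag and the label in one table means a future polarity regression fails the check even if only one of the two drifts.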
## Artifacts
Under `xenia-rs/audit-runs/iterate-2AI-tid1-xnotify-fix/`:
- `ours-cold.jsonl` (16.07 GB, 65,691,821 events) — primary trace
- `ours-cold.stdout.log` (empty — quiet mode)
- `ours-cold.stderr.log` (single exit-thread-state notice)
- `exit-thread-state.json` (17.4 KB; 21 alive + 18 wedge entries)
- `ours-cold-run2.jsonl` (16.07 GB, 65,691,821 events) — determinism
check, bit-identical event count, head & tail strip-host_ns matches
- `ours-cold-run2.{stdout,stderr}.log`
- `writer-report.md` (this file)
xenia-canary UNCHANGED.
Engine state: head + 2.AF patch (`+18` in `xenia-app/src/main.rs`) +
2.AI patch (`+16/-2` in `xenia-kernel/src/exports.rs`). Both patches
retained in working tree, uncommitted (per the cumulative-LOC policy
noted in 2.W's report).