handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


# Iterate 2.AI — tid=1 main-loop wedge fix (NtCreateEvent polarity)
**Date:** 2026-06-02. **LOC delta:** engine **+16 / -2 LOC** (the
1-LOC negation, one touched-up existing comment line, and 14 new doc
lines) in `crates/xenia-kernel/src/exports.rs` `nt_create_event`. Retained.
**Tests:** xenia-cpu 300 / xenia-kernel 227 / xenia-app 5 — full PASS,
0 regressions.
## Headline
**WEDGE-PACED-CASCADE-FOLLOWS.**
Sub-hypothesis **C-1 confirmed and dispatched.** tid=1's main update
loop `sub_822F1AA8` no longer fast-paths through Event `0x000010e8`
1.05 M times. The wait now correctly blocks on a real signaler (the
VSync ISR), the exit-state wedge map grows to 18 entries as new
downstream wedges surface, and the trace expands from 45.2 M events /
152.2 s (2.AF) to **65.7 M events / 208.3 s** (2.AI): 1.45× more events
and 1.37× more wallclock progression.
## Sub-hypothesis selection
The wedge handle `0x000010e8` (semid `9ad1bebb6cae28c4`) was created by
tid=1's `NtCreateEvent` at host_ns 838 ms. In 2.AF, the handle then
received **1,077,846 `wait.begin` events** + handle.create + **ZERO
`signal.match`, ZERO `wake.requested`, ZERO `handle.destroy`** — across
152 s.
Decision matrix:
| sub-hyp | requires | observed | verdict |
|---|---|---|---|
| **C-1** Event manual-reset + initial-signaled | `handle_signaled()==true` forever, no real signaler needed, `handle_consume` no-op | matches exactly (zero signal events, fast-path returns rv=0 each call) | **chosen** |
| C-2 `refresh_pkevent_shadow_from_guest` re-signals each wait | callsite must run before wait | `nt_wait_for_single_object_ex` does NOT call refresh (only `ke_wait_*` do); handle is small-int NT handle not guest pointer | **falsified at source** |
| C-3 VSync ISR over-fires | repeated wake/signal events on the handle | zero signal events on it | **falsified** |
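To make C-1 concrete, here is a minimal model of the wait fast-path. It
assumes a simplified `Event` with only `manual_reset`/`signaled` flags;
`handle_signaled` and `handle_consume` borrow the names from the matrix,
but these signatures are hypothetical, not the engine's. With no
signaler, a manual-reset event that starts signaled satisfies every
wait, while an auto-reset one satisfies exactly one.

```rust
/// Hypothetical, simplified dispatcher event (not the engine's real type).
#[derive(Debug, Clone, Copy)]
struct Event {
    manual_reset: bool,
    signaled: bool,
}

impl Event {
    /// Fast-path check: can the wait return immediately (rv=0)?
    fn handle_signaled(&self) -> bool {
        self.signaled
    }

    /// Called after a satisfied wait. Auto-reset events clear their
    /// signal; manual-reset events stay signaled (a no-op here).
    fn handle_consume(&mut self) {
        if !self.manual_reset {
            self.signaled = false;
        }
    }
}

/// Issue up to `n` waits with no signaler present and count how many
/// take the fast path before the first one would block.
fn fast_path_hits(mut ev: Event, n: usize) -> usize {
    let mut hits = 0;
    for _ in 0..n {
        if !ev.handle_signaled() {
            break; // would block: nothing ever signals in this model
        }
        ev.handle_consume();
        hits += 1;
    }
    hits
}

fn main() {
    // C-1: manual-reset + initially signaled never blocks.
    let wedged = Event { manual_reset: true, signaled: true };
    println!("manual-reset: {} of 1000 waits fast-pathed", fast_path_hits(wedged, 1000));
    // Auto-reset + initially signaled: the first wait consumes the signal.
    let fixed = Event { manual_reset: false, signaled: true };
    println!("auto-reset: {} of 1000 waits fast-pathed", fast_path_hits(fixed, 1000));
}
```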
A source read confirmed the precise bug. `nt_create_event`
(exports.rs:3040-3060) had `manual_reset = ctx.gpr[5] != 0`. Canary's
`NtCreateEvent_entry`
(xboxkrnl_threading.cc:601-632) does
`ev->Initialize(!event_type, !!initial_state)`, i.e.,
`manual_reset = !event_type`. Our polarity was therefore **inverted**
relative to NT semantics (NotificationEvent = type 0 = manual-reset;
SynchronizationEvent = type 1 = auto-reset), and also inconsistent
with our own `ensure_dispatcher_object` (exports.rs:4970-4980), which
correctly maps `type 0 → manual, type 1 → auto`. So:
- Game passes `event_type=1` (SynchronizationEvent / auto-reset) +
`initial_state=1` (signaled).
- Pre-fix: `manual_reset = (1 != 0) = true` →
Event{manual=true, signaled=true}. Permanently signaled, never
consumed (manual-reset).
- Post-fix: `manual_reset = (1 == 0) = false` →
Event{manual=false, signaled=true}. The first wait consumes the
signal; subsequent waits block.
Sister export `nt_create_timer` (exports.rs:3087-3116) already had the
correct polarity (`manual_reset: timer_type == 0`). `nt_create_event`
was the only outlier.
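The mapping fits in a pure function. `manual_reset_from_event_type` and
`manual_reset_pre_fix` are hypothetical helpers for illustration; the
actual patch is the single inline expression in `nt_create_event`. The
`u64` register width is an assumption based on the `ctx.gpr[3] as u32`
cast in the patch.

```rust
/// NT DISPATCHER_HEADER event types (NT convention, per this report).
const NOTIFICATION_EVENT: u64 = 0; // manual-reset
const SYNCHRONIZATION_EVENT: u64 = 1; // auto-reset

/// Post-fix mapping, equivalent to canary's `Initialize(!event_type, ...)`.
fn manual_reset_from_event_type(event_type: u64) -> bool {
    event_type == 0
}

/// Pre-fix (inverted) mapping, kept only for comparison.
fn manual_reset_pre_fix(event_type: u64) -> bool {
    event_type != 0
}

fn main() {
    // The game's call: event_type = 1, initial_state = 1.
    let (event_type, initial_state) = (SYNCHRONIZATION_EVENT, 1u64);
    let signaled = initial_state != 0;
    println!(
        "pre-fix:  Event {{ manual: {}, signaled: {} }}",
        manual_reset_pre_fix(event_type),
        signaled
    );
    println!(
        "post-fix: Event {{ manual: {}, signaled: {} }}",
        manual_reset_from_event_type(event_type),
        signaled
    );
    // NotificationEvent must still map to manual-reset after the fix.
    assert!(manual_reset_from_event_type(NOTIFICATION_EVENT));
}
```

Note that the inversion was symmetric: pre-fix, type 0 (NotificationEvent)
also became auto-reset, not only type 1 becoming manual-reset.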
## Patch summary
```text
crates/xenia-kernel/src/exports.rs | 18 ++++++++++++++++--
1 file changed, 16 insertions(+), 2 deletions(-)
```
```diff
 fn nt_create_event(ctx: &mut PpcContext, mem: &GuestMemory, state: &mut KernelState) {
-    // r3 = handle_ptr, r4 = obj_attrs, r5 = event_type, r6 = initial_state
+    // r3 = handle_ptr, r4 = obj_attrs, r5 = event_type, r6 = initial_state.
+    // 2.AI — Xenon DISPATCHER_HEADER `Type` (NT convention):
+    //   0 = NotificationEvent   (manual-reset)
+    //   1 = SynchronizationEvent (auto-reset)
+    // Canary mirrors this at `xboxkrnl_threading.cc:620`
+    // (`ev->Initialize(!event_type, !!initial_state)`) and our own
+    // `ensure_dispatcher_object` (above, type=0→manual, type=1→auto).
+    // The prior polarity here was inverted (`event_type != 0` → manual)...
     let handle_ptr = ctx.gpr[3] as u32;
-    let manual_reset = ctx.gpr[5] != 0;
+    let manual_reset = ctx.gpr[5] == 0;
     let signaled = ctx.gpr[6] != 0;
```
That is 1 substantive LOC change (the negation); the rest is a 14-line
clarifying comment with the canary cross-reference and the root-cause
anecdote. Well within the 5-50 LOC scope (and the 100-LOC hard cap).
Determinism: the only added behavior is a per-handle boolean flip on
`NtCreateEvent` entry. No `host_ns`, no `Instant::now()`, no RNG. The
determinism check below confirms this.
## Test results
```text
cargo build --release -> OK
cargo test -p xenia-cpu -p xenia-kernel -p xenia-app --release
xenia-cpu 300 passed, 0 failed
xenia-kernel 227 passed, 0 failed
xenia-app 5 passed, 0 failed (+ 2/1 ignored long-runners)
+ auxiliary suites: 0 failures
```
No test pinned the buggy polarity: searching the test corpus for
`nt_create_event` callsites returned only audit-trail fixtures
(audit.rs:253-352), which exercise the trace labels "Event/Auto" vs
"Event/Manual" but not the param-to-flag mapping itself.
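Since nothing pinned the mapping, a table-driven regression test along
these lines could close the gap. It assumes the mapping were factored
into a helper (hypothetically `event_type_to_manual_reset`); today it is
inline in `nt_create_event`, so this is a sketch, not an existing test.

```rust
/// Hypothetical helper: NT event type -> manual-reset flag.
fn event_type_to_manual_reset(event_type: u64) -> bool {
    event_type == 0
}

/// (event_type, expected manual_reset, NT name)
const CASES: [(u64, bool, &str); 2] = [
    (0, true, "NotificationEvent"),
    (1, false, "SynchronizationEvent"),
];

/// Returns an error describing the first case whose polarity is wrong.
fn check_polarity() -> Result<(), String> {
    for (ty, want, name) in CASES {
        let got = event_type_to_manual_reset(ty);
        if got != want {
            return Err(format!("{name} (type {ty}): got manual_reset={got}, want {want}"));
        }
    }
    Ok(())
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn nt_event_type_polarity_is_pinned() {
        assert_eq!(check_polarity(), Ok(()));
    }
}

fn main() {
    println!("{:?}", check_polarity());
}
```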
## Primary gate results
| # | predicate | result |
|---|---|---|
| 1 | tid=1 main-loop iteration count drops from ~1.05M to ≪ baseline | **PASS** — tid=1 `NtWaitForSingleObjectEx` import calls: **3,233,583 (2.AF) → 51 (2.AI)**, a 63,400× reduction. Events on wedge semid `9ad1bebb6cae28c4`: **1,077,847 (2.AF) → 3 (2.AI)** (1 handle.create + 2 wait.begin, then permanently blocks). |
| 2 | wait gap on Event 0x10e8 rises from 2.21 µs to ≥1 ms | **PASS structurally** — first two wait.begins on this semid are 126.8 µs apart, and after the second the thread blocks indefinitely (no further wait.begin). The "23 kHz spin" is gone; the wait now correctly waits for a real signaler (the VSync ISR). |
| 3 | tid=1 `XamInputGetCapabilities` > 0 (was 0 in 2.V) | **PASS**: **24 calls** by tid=1, all in the [136 ms .. 6.58 s] interval right before the (now-blocking) VSync gate. (Same count as the 2.AF baseline, which was already > 0; the spec's "was 0" referred to 2.V. This iterate preserves the post-2.AF value.) |
The structural primary objective is achieved: the spin-forever fast-path
on the wedge handle is eliminated. tid=1 now correctly blocks on its
frame-sync wait, the way the game expects (waiting for the VSync ISR to
signal the auto-reset event).
The wait gap isn't the full 17.18 ms because no third wait.begin
follows: after the second one, tid=1 is **permanently blocked** (no
signaler in 51 s of execution past that point). That is a *different*
bug (the VSync ISR doesn't reach this handle), exposed now for the
first time because the polarity bug previously masked it. This is the
natural follow-up surface and matches the secondary-gate pattern (new
wedges appear downstream).
## Determinism check
Two cold runs (`XENIA_CACHE_WIPE=1 -n 500000000`) produced
**identical event counts: 65,691,821 events each**
(`ours-cold.jsonl` / `ours-cold-run2.jsonl`).
After stripping `host_ns` (the only intentionally-non-deterministic
field):
- First 100,000 events: `cmp` returns 0 differences.
- Last 100,000 events: both files' md5 = `389d631e5b557bca0767fb8ee8104d4c`.
Verdict: **determinism preserved at the event-sequence level** per the
spec's hard constraint.
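The host_ns-stripping comparison can be sketched with the standard
library alone. This assumes each JSONL line is a single-line JSON object
whose `host_ns` field is written as `"host_ns":<digits>` with no
whitespace and never appears inside a string value; the real trace
schema may differ, and a JSON parser would be more robust. The sample
lines and the `kind` field are made up, and the md5 step is omitted in
favor of a direct sequence comparison.

```rust
/// Remove a `"host_ns":<digits>` field (and one adjacent comma) from a
/// single-line JSON object. Lines without the field are returned as-is.
fn strip_host_ns(line: &str) -> String {
    const KEY: &str = "\"host_ns\":";
    let Some(start) = line.find(KEY) else {
        return line.to_string();
    };
    let value_start = start + KEY.len();
    let digits = line[value_start..]
        .bytes()
        .take_while(|b| b.is_ascii_digit())
        .count();
    let mut begin = start;
    let mut end = value_start + digits;
    if line[end..].starts_with(',') {
        end += 1; // field was not last: drop its trailing comma
    } else if line[..begin].ends_with(',') {
        begin -= 1; // field was last: drop the preceding comma
    }
    format!("{}{}", &line[..begin], &line[end..])
}

/// Compare two event streams after stripping host_ns. Returns the index
/// of the first differing event, or None if the streams match exactly.
fn first_divergence<'a>(
    a: impl IntoIterator<Item = &'a str>,
    b: impl IntoIterator<Item = &'a str>,
) -> Option<usize> {
    let (mut a, mut b) = (a.into_iter(), b.into_iter());
    let mut i = 0;
    loop {
        match (a.next(), b.next()) {
            (None, None) => return None,
            (Some(x), Some(y)) if strip_host_ns(x) == strip_host_ns(y) => i += 1,
            _ => return Some(i), // content mismatch or length mismatch
        }
    }
}

fn main() {
    // Made-up sample lines: only host_ns differs between the two runs.
    let run1 = [r#"{"host_ns":100,"kind":"handle.create"}"#, r#"{"kind":"wait.begin","host_ns":200}"#];
    let run2 = [r#"{"host_ns":105,"kind":"handle.create"}"#, r#"{"kind":"wait.begin","host_ns":207}"#];
    match first_divergence(run1, run2) {
        None => println!("identical after stripping host_ns"),
        Some(i) => println!("first divergence at event {i}"),
    }
}
```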
## Secondary gates (cascade)
| metric | 2.V baseline | 2.AF | 2.AI | direction |
|---|---:|---:|---:|---|
| Total events | 13,003,881 | 45,206,378 | **65,691,821** | **5.05× vs 2.V, 1.45× vs 2.AF** |
| Last event host_ns | 51,011 ms | 152,207 ms | **208,272 ms** | **4.08× vs 2.V, 1.37× vs 2.AF** |
| Alive threads | 21 | 21 | 21 | unchanged |
| Exited threads (exit_code=0) | 2 (13,14) | 2 (13,17) | 2 (13,14) | shifted back |
| Wedge map entries | 15 | 15 | **18** | +3 new downstream wedges |
| `signal.match` events | 75 | 69 | **84** | **+15 vs 2.AF (+22%)** |
| `wake.requested` events | 79 | 71 | **86** | **+15 vs 2.AF (+21%)** |
| VdSwap calls | 2 | 2 | **6** | **3×** |
| tid=1 NtWaitForSingleObjectEx calls | (wedged spin) | 3,233,583 | **51** | **63,400×** |
| tid=1 events | (wedged spin) | 13,301,954 | **148,773** | **89× ↓ (no more spin)** |
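The ratios in the direction column can be reproduced from the raw
counts. This is a throwaway arithmetic check: the figures are copied
from the table above, and `ratio`/`pct_change` are illustrative helpers.

```rust
/// Growth (or reduction) factor `numerator / denominator`.
fn ratio(numerator: f64, denominator: f64) -> f64 {
    numerator / denominator
}

/// Percentage change from `before` to `after`.
fn pct_change(after: f64, before: f64) -> f64 {
    (after - before) / before * 100.0
}

fn main() {
    // (label, baseline, 2.AI value), copied from the secondary-gates table.
    let rows = [
        ("total events vs 2.V", 13_003_881.0, 65_691_821.0),
        ("total events vs 2.AF", 45_206_378.0, 65_691_821.0),
        ("last host_ns (ms) vs 2.V", 51_011.0, 208_272.0),
        ("last host_ns (ms) vs 2.AF", 152_207.0, 208_272.0),
    ];
    for (label, before, after) in rows {
        println!("{label}: {:.2}x", ratio(after, before));
    }
    // Reductions are reported as before / after.
    println!("tid=1 NtWait calls: {:.0}x fewer", ratio(3_233_583.0, 51.0));
    println!("tid=1 events: {:.1}x fewer", ratio(13_301_954.0, 148_773.0));
    println!("signal.match: {:+.0}%", pct_change(84.0, 69.0));
    println!("wake.requested: {:+.0}%", pct_change(86.0, 71.0));
}
```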
**VdSwap moved from 2 → 6.** Four additional `VdSwap` calls land in the
trace, meaning the frame-presentation path now actually fires. The
count was 2 in both 2.V and 2.AF; 2.AI is the first iterate where it
grows. This is real rendering progression.
tid=12 (DPC dispatcher, secondary gate target): still **Blocked on
Event `0x00001004`** at PC `0x824ac578`. Unchanged from 2.V/2.AF.
Independent cascade.
## Thread-by-thread post-fix wedge analysis
The exit-state.json now contains **18 wedge entries** (up from 15 in
2.AF). Newly added:
- **tid=1 → Event `0x000010e8`** at PC `0x824ac578` — *previously
hidden* by the polarity bug's fast-path. Now exposed as a real
blocker (waits for VSync ISR signaling that never arrives). This is
the natural "wedge moved one level deeper" pattern (#41/#42 class).
- tid=21 → Event `0x0000151c` / `0x01000000` — appears downstream of
tid=5/tid=17 progress.
- tid=20 → Event `0x0000151c` / Sema `0x00001528` — same downstream
surface (already flagged in 2.AF's "next-iterate" list).
tid=14 reverts to Exited (vs tid=17 in 2.AF) — confirming that the
2.AF "tid=17 vs tid=14 swap" was a timing-shift on the deadline-fire
fix, and the underlying tid=14 producer-exhaustion divergence (2.AE
target) is unaltered by this fix.
## Cross-engine context
2.AH had pinned canary's analog wait as VSync-gated. Now that our event
has the correct semantics (auto-reset, not permanently signaled), the
*next* question, "is the VSync ISR reaching this handle on time?",
becomes meaningful for the first time. Per 2.AH's notes, canary's
analog wait returns after ~17.18 ms (one VSync period). Ours blocks
indefinitely after 2 cycles, suggesting that either the ISR is not
firing for tid=1's handle or the wake path doesn't reach this
auto-reset event.
This is left for a subsequent iterate (see next-iterate recommendation).
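The missing link can be pictured with a toy model. It assumes,
hypothetically, that the VSync path signals a list of registered
auto-reset handles once per period; `Events`, `VsyncSource`, and `wakes`
are illustrative names only and say nothing about how
`try_inject_graphics_interrupt` is actually wired. With the handle
registered, every tick releases one wait; without it, the waiter never
wakes, which is the shape observed for tid=1 after the fix.

```rust
use std::collections::HashMap;

/// Toy auto-reset event table keyed by guest handle (hypothetical model).
#[derive(Default)]
struct Events {
    signaled: HashMap<u32, bool>,
}

impl Events {
    fn create(&mut self, handle: u32) {
        self.signaled.insert(handle, false);
    }

    /// Signaling an unknown handle is silently ignored.
    fn signal(&mut self, handle: u32) {
        if let Some(s) = self.signaled.get_mut(&handle) {
            *s = true;
        }
    }

    /// Auto-reset wait: succeeds once per signal, then clears it.
    fn try_wait(&mut self, handle: u32) -> bool {
        match self.signaled.get_mut(&handle) {
            Some(s) if *s => {
                *s = false;
                true
            }
            _ => false,
        }
    }
}

/// Hypothetical VSync source that signals its registered handles per tick.
struct VsyncSource {
    registered: Vec<u32>,
}

impl VsyncSource {
    fn tick(&self, events: &mut Events) {
        for &h in &self.registered {
            events.signal(h);
        }
    }
}

/// Run `ticks` VSync periods with one waiter on `handle`; count its wakes.
fn wakes(registered: Vec<u32>, handle: u32, ticks: usize) -> usize {
    let mut events = Events::default();
    events.create(handle);
    let vsync = VsyncSource { registered };
    let mut woke = 0;
    for _ in 0..ticks {
        vsync.tick(&mut events);
        if events.try_wait(handle) {
            woke += 1;
        }
    }
    woke
}

fn main() {
    const H: u32 = 0x10e8;
    println!("wired:   {} wakes in 60 ticks", wakes(vec![H], H, 60));
    println!("unwired: {} wakes in 60 ticks", wakes(vec![], H, 60));
}
```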
## Third-order observations (no claims, just data)
- 1.45× event-count growth in this iterate (45.2 M → 65.7 M) is well
below 2.AF's 3.5× from the deadline-fire fix. Per-fix diminishing
returns are visible: each independent blocker peels off more
progression, but the wedge surface is widening, not collapsing.
- VdSwap = 6: still not a full frame rate (that would be ~12,500 at
60 Hz across 208 s), but the **mere fact** that VdSwap > 2 is the first
rendering progression since 2.V landed two days ago. The
XAudio/XInput surfaces are likely the next limiter.
- tid=11 (XAudio worker, blocked on Events `0x828a3244` / `0x828a3220`)
remains unchanged — the XAudio stub from 2.AB is the remaining
independent blocker.
## Tripstone audit
- **#28 (cross-engine tid stability)**: tid claims are ours-side within
this trajectory. Canary references rely on prior 2.AH mapping
(`+ ctx_ptr` for cross-engine equivalence).
- **#39 (composite progression IS progression)**: Honored. The headline
separately reports (a) the primary state-change (1.05M iter → 51
calls + permanent block), (b) the cascade volume (1.45× events), and
(c) VdSwap growth (2 → 6, the first real rendering progression
metric).
- **#40 (no single-keystone framing)**: Care taken. The headline reads
`WEDGE-PACED-CASCADE-FOLLOWS`, and the body explicitly lists 3+
remaining independent blockers (tid=11 XAudio, tid=14 first-divergence,
new tid=20/21 events). The prior open follow-ups (2.AE, 2.AG, 2.AI
XAudio, 2.AH) are explicitly retained.
- **#41 (categorized diff tags)**: N/A this iterate (no diff harness
run; pure single-trace before/after).
- **#42 (Phase-A blind to blocked-forever)**: Exit-state JSON used
throughout. tid=1's Blocked-on-0x10e8 post-fix is visible only
because of that dump.
- **#43 (no budget-cap framing)**: Budget cap reached but trace had
structural progression throughout (1.37× wallclock vs 2.AF). Cascade
observation robust.
- **#44 refined (rate+shape comparison)**: The pre-fix wait rate was
463,475/sec on 0x10e8; post-fix there are 2 events, then a block.
Canary runs at ~60/sec, one wait per VSync period. The shape now
matches canary structurally (a blocking auto-reset event); the rate
diverges in the *opposite* direction (we block forever; canary blocks
~17 ms each cycle). This is the expected next-step exposure.
## Confidence
- **HIGH** that the patch is correct and minimal: 1-LOC negation,
0 test regressions, determinism preserved bit-for-bit on event count,
head-100K and tail-100K cmp/md5.
- **HIGH** that the polarity bug is dispatched: trace evidence
(3,233,583 → 51 NtWait calls on tid=1; 1,077,847 → 3 events on the
wedge handle) is unambiguous. Exit-state JSON shows the event
correctly classified as auto-reset (`manual_reset: false,
signaled: false`).
- **HIGH** that the cascade is genuine (1.45× events, 1.37× wallclock,
+15 signal.match/wake.requested events, VdSwap 2→6 — all up).
- **MEDIUM-HIGH** that other guest events created with the same
pattern were silently mis-classified across the codebase. Any event
the guest created with `event_type=1` (auto-reset) before this fix
actually behaved as manual-reset (and, symmetrically, any
`event_type=0` event behaved as auto-reset), so many wait sites could
be hiding similar fast-path bugs. Worth a regression-grep next.
- **MEDIUM** that the next wedge (tid=1 on 0x10e8 with no signaler) is
small. The VSync ISR path → tid=1's auto-reset handle is the
obvious surface but the wiring may need its own fix.
- **LOW** that gameplay is imminent. VdSwap 6 is rendering progression,
but a full frame rate needs ~60+ swaps/sec at steady state, and the
XAudio / first-divergence / DPC blockers remain. Several more cascade
iterations are likely needed.
## Next-iterate recommendation
Priority list:
1. **2.AJ (VSync ISR → 0x10e8 wiring)** — the new wedge exposed by
this iterate. tid=1 correctly blocks but no signaler reaches the
handle. Likely in `try_inject_graphics_interrupt` (main.rs:3729) or
the callback's user_data path. Approx **5-30 LOC**, single-file.
2. **2.AE (tid=14 first-divergence diff)** — unchanged priority from
2.AF list. ~0 LOC pure trace mining.
3. **2.AI XAudio stub** — tid=11 still wedged on `0x828a3244` /
`0x828a3220`. exports.rs:4591-4598 still a no-op. Approx 5-150 LOC.
4. **2.AG (`do_wait_multiple` `wait.begin`)** — observability gap.
~10 LOC.
5. **Regression-grep for other inverted-polarity callers** — any other
guest-API entry that maps NT's "event_type" the wrong way? Quick
scan: `nt_create_timer` is fine, `ensure_dispatcher_object` is fine.
No further hits in current corpus, but worth a CI tripwire (e.g.
`Event/Manual` audit-create label asserting `manual_reset == true`).
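A tripwire like the one in item 5 could be sketched as a check over
audit-create records. This assumes, hypothetically, that the label is
derived from the guest's raw `event_type` while `manual_reset` is the
flag the kernel actually stored; `AuditCreate`, `audit_label`, and
`check_audit` are illustrative names, and only the `Event/Manual` /
`Event/Auto` label strings come from this report. Under that
assumption, the pre-fix bug shows up as an `Event/Auto` record with
`manual_reset: true`.

```rust
/// Hypothetical audit record: the guest's raw event_type plus the flag
/// the kernel stored for the new event.
struct AuditCreate {
    event_type: u64,
    manual_reset: bool,
}

/// Label from the guest-visible NT type (0 = Notification, 1 = Synchronization).
fn audit_label(event_type: u64) -> &'static str {
    if event_type == 0 {
        "Event/Manual"
    } else {
        "Event/Auto"
    }
}

/// Tripwire: `Event/Manual` must imply manual_reset == true, and vice versa.
fn check_audit(records: &[AuditCreate]) -> Result<(), String> {
    for (i, r) in records.iter().enumerate() {
        let label = audit_label(r.event_type);
        let want_manual = label == "Event/Manual";
        if r.manual_reset != want_manual {
            return Err(format!("record {i}: label {label} but manual_reset={}", r.manual_reset));
        }
    }
    Ok(())
}

fn main() {
    let post_fix = [
        AuditCreate { event_type: 0, manual_reset: true },
        AuditCreate { event_type: 1, manual_reset: false },
    ];
    let pre_fix = [AuditCreate { event_type: 1, manual_reset: true }];
    println!("post-fix records: {:?}", check_audit(&post_fix));
    println!("pre-fix record:   {:?}", check_audit(&pre_fix));
}
```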
I recommend **2.AJ next** (it's the wedge this iterate just exposed,
single-thread, single-handle, single-file).
## Artifacts
Under `xenia-rs/audit-runs/iterate-2AI-tid1-xnotify-fix/`:
- `ours-cold.jsonl` (16.07 GB, 65,691,821 events) — primary trace
- `ours-cold.stdout.log` (empty — quiet mode)
- `ours-cold.stderr.log` (single exit-thread-state notice)
- `exit-thread-state.json` (17.4 KB; 21 alive + 18 wedge entries)
- `ours-cold-run2.jsonl` (16.07 GB, 65,691,821 events): determinism
check; identical event count, and head & tail match after stripping host_ns
- `ours-cold-run2.{stdout,stderr}.log`
- `writer-report.md` (this file)
xenia-canary UNCHANGED.
Engine state: head + 2.AF patch (`+18` in `xenia-app/src/main.rs`) +
2.AI patch (`+16/-2` in `xenia-kernel/src/exports.rs`). Both patches
retained in working tree, uncommitted (per the cumulative-LOC policy
noted in 2.W's report).