handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO): - xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring - xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions
--- a/audit-runs/iterate-2T-wake-requested/writer-report.md
+++ b/audit-runs/iterate-2T-wake-requested/writer-report.md
@@ -0,0 +1,319 @@
+# Iterate 2.T — `wake.requested` instrumentation in `wake_eligible_waiters` (writer report)
+
+**Date:** 2026-05-28. **LOC delta:** engine **~95** (event_log.rs +55,
+exports.rs ~40 across helpers + 2 in-loop snapshot/emit blocks), tooling
+**+1** (diff_events.py `ENGINE_LOCAL_KINDS` extension). All retained,
+cvar-gated default-off via existing `event_log::is_enabled()`.
+**Tests:** 227 kernel + 19 path + 149 cpu + 300 main + ~30 smaller =
+full suite PASS, 0 regressions.
+**Cascade:** N/A — observability class, no semantics changed.
+
+## Headline
+
+**C-2B-SCHEDULER-PICK-BUG-IDENTIFIED.** The kernel-side wake primitive
+(`wake_eligible_waiters`) **correctly transitions tid=6 from
+Blocked → Ready EVERY TIME** a signal targets it. 5 `wake.requested`
+events fire for `target_tid=6` on handle `0x000010b4` over [485, 764]ms,
+**all with `transitioned=true, new_state="Ready", target_cpu=5`.** After
+the 5th wake at 764.29 ms, tid=6 is Ready on CPU5 but never executes
+again — last tid=6 event remains 663.60 ms — even though the trace
+continues until 766.86 ms (tids 4, 5, 13 keep emitting events). The
+exit-thread-state confirms tid=6 finishes in `Ready` on hw_id=5,
+priority=0 — sharing CPU5 with **tid=10 at priority=15** (and tid=12
+at priority=0). The `pick_runnable` rule in
+`xenia-cpu/src/scheduler.rs:211-218` is strict-priority within a slot:
+`max_by_key(|(i, t)| (t.priority, -(*i as i64)))`. tid=10's priority=15
+deterministically beats tid=6's priority=0. **C-2a (kernel wake-call
+bug) is FALSIFIED**; **C-2b (scheduler-pick-skipping Ready tid on CPU)
+is CONFIRMED**, with the specific mechanism being strict-priority
+starvation by a same-CPU peer (tid=10, priority=15).
+
+## Mode
+
+Pure observability emitter. No semantic change to `wake_eligible_waiters`
+or any caller. Cvar-gated via existing `event_log::is_enabled()` and
+implicitly skipped when zero waiters are eligible (the wake loop's
+existing early-returns happen before the prior-state snapshot).
+ENGINE_LOCAL in the diff tool — does not affect matched-prefix.
+
+Invocation (identical to 2.Q):
+
+```
+XENIA_CACHE_WIPE=1 timeout 600 ./target/release/xenia-rs exec \
+  -n 50000000 --quiet \
+  --phase-a-event-log audit-runs/iterate-2T-wake-requested/ours-cold.jsonl \
+  "../Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso"
+```
+
+Exit code 0. Output: `ours-cold.jsonl` (28.7 MB, 121,641 events —
+121,569 baseline + 36 `signal.match` + 36 `wake.requested`),
+`exit-thread-state.json` (9651 bytes, bit-identical to 2.M/2.N/2.Q/2.S),
+`ours-cold.stderr.log` (single 2.M emission-notice line),
+`ours-cold.stdout.log` (empty — quiet mode).
+
+## Patch summary
+
+| file | purpose | LOC added |
+|---|---|---:|
+| `crates/xenia-kernel/src/event_log.rs` | new `emit_wake_requested(signaling_tid, cycle, target_tid, target_handle, wait_kind, transitioned, new_state, target_cpu)` — emits schema-v1 `wake.requested` event mirroring the `signal.match` shape from 2.Q | ~55 |
+| `crates/xenia-kernel/src/exports.rs` | `wake_classify_state(HwState) -> (wait_kind, state_name)` + `wake_signaling_ctx(state) -> (tid, cycle)` + `emit_wake_requested_for(state, signaler, cycle, target, handle, prior_kind, prior_name)` shim (~40 LOC), plus 2 in-loop snapshot blocks inside `wake_eligible_waiters` (~14 LOC: manual-fan-out path + auto-winner path) capturing prior state BEFORE `set_wake_status_for_waitany` + `wake_ref` and emitting AFTER | ~40 |
+| `crates/xenia-kernel/src/exports.rs` | `let (signaling_tid, signaling_cycle) = wake_signaling_ctx(state);` once at entry to `wake_eligible_waiters` (captures the signal-call caller's identity for the entire fan-out attribution) | 1 |
+| `tools/diff-events/diff_events.py` | `ENGINE_LOCAL_KINDS += {"wake.requested"}` so the new kind consumes a per-tid idx slot on the emitter side without alignment cost (analog to 2.Q's `signal.match` addition) | 1 |
+
+Total engine ~95 LOC, tooling +1. Within the 80-150 target / 200 hard
+cap.
+
+Schema-v1 event emitted:
+```json
+{"schema_version":1,"engine":"ours","kind":"wake.requested",
+ "tid":<signaling tid>,"tid_event_idx":<idx>,"guest_cycle":<cycle>,
+ "host_ns":<ns>,"deterministic":true,
+ "payload":{"target_tid":<woken tid>,"target_handle":"0x<8hex>",
+            "wait_kind":"WaitAny"|"WaitAll"|"WaitSingle"|"Other",
+            "transitioned":true|false,
+            "new_state":"Ready"|"StillBlocked"|"AlreadyReady"|"Exited"|"ServicingIrq"|"Idle"|"Other",
+            "target_cpu":<hw_id>|null}}
+```
+
+Emit policy: exactly ONE event per waiter the kernel touches (the wake
+loop's early-returns naturally skip zero-eligible-waiter cases per
+spec). Snapshot of prior state captured immediately BEFORE
+`set_wake_status_for_waitany` + `wake_ref`; post-state read AFTER.
+`transitioned` is the strict `prior=Blocked && post=Ready` predicate.
+
+## Test results
+
+```
+cargo build --release  -> OK (1 pre-existing dead_code warning unrelated)
+cargo test --release   -> all suites PASS:
+  xenia-kernel    227 passed, 0 failed
+  xenia-cpu       149 passed, 0 failed
+  xenia-app       300 passed, 0 failed
+  xenia-path       19 passed, 0 failed
+  + 30+ smaller suites, 0 failures total
+```
+
+Baseline event count UNCHANGED at 121,569 (vs 2.J/2.K/2.Q/2.S);
+`signal.match` count UNCHANGED at 36 (vs 2.Q/2.S); 36 NEW
+`wake.requested` events emitted. Total 121,641 = 121,569 + 36 + 36.
+
+## `wake.requested` summary (all 36 events)
+
+| field | distribution |
+|---|---|
+| `transitioned` | true: 36 — **every wake successfully transitioned its target from Blocked → Ready** |
+| `new_state` | "Ready": 36 — no `StillBlocked` / `AlreadyReady` / etc. |
+| `wait_kind` | "WaitAny": 36 — all single-WaitAny parks (consistent with NtWaitForSingleObjectEx routing through `WaitAny{handles:[h]}` per `do_wait_single`) |
+| signaler tids | tid=5:19, tid=1:9, tid=6:3, tid=2:1, tid=11:1, tid=9:1, tid=8:1, tid=13:1 |
+| target tids | tid=5:11, tid=1:9, tid=4:7, **tid=6:5**, tid=8:2, tid=9:1, tid=10:1 |
+
+**Every signal that targeted a parked waiter produced a successful
+Blocked→Ready transition.** The kernel wake primitive plumbing is sound
+for this trajectory.
+
+## tid=6 deep-dive — the C-2 case
+
+Five `wake.requested` events for `target_tid=6`, paired 1:1 with the
+5 `signal.match` events on handle `0x000010b4` (same timestamps to ms
+precision):
+
+| ns (ms) | signaler | handle | transitioned | new_state | target_cpu | tid=6 next event |
+|---:|---:|---|---|---|---:|---|
+| 485.03 | tid=5 | 0x10b4 | true | Ready | 5 | (tid=6 runs through ~520 ms) |
+| 485.14 | tid=5 | 0x10b4 | true | Ready | 5 | (back-to-back wake; coalesces) |
+| 510.64 | tid=5 | 0x10b4 | true | Ready | 5 | (tid=6 runs through ~660 ms) |
+| 659.00 | tid=5 | 0x10b4 | true | Ready | 5 | tid=6 last event @ 663.60 ms |
+| **764.29** | **tid=5** | **0x10b4** | **true** | **Ready** | **5** | **NEVER — tid=6 emits no further events; trace continues to 766.86 ms** |
+
+**The first 4 wakes are followed by tid=6 execution.** The 5th wake
+(764.29 ms) transitions tid=6 Blocked→Ready on CPU5 — exactly as
+expected — but tid=6 never gets picked by the scheduler before the
+50M-instruction budget cuts off ~2.6 ms later. Other tids on the
+trace continue emitting events past 764.29 ms:
+
+| tid | last event ns (ms) | vs tid=6 quiescence |
+|---:|---:|---|
+| 1 | 760.12 | predecessor |
+| 4 | 766.84 | **+103 ms after tid=6 last event; +2.55 ms after tid=6's final wake** |
+| 5 | 766.86 | +103 ms after; the signaler still alive |
+| 13 | 763.04 | +99 ms after |
+
+So the trace is NOT truncated; tid=6 specifically is starved post-wake.
+
+## CPU5 contention — the C-2b mechanism
+
+Exit-thread-state CPU5 (hw_id=5) Ready set:
+
+| tid | state | hw_id | affinity | priority | pc |
+|---:|---|---:|---|---:|---|
+| **6** | **Ready** | **5** | **0x20** | **0** | 0x824ab214 |
+| **10** | **Ready** | **5** | **0x20** | **15** | 0x824d1404 |
+| **12** | **Ready** | **5** | **0x20** | 0 | 0x824aa6a4 |
+| 3 | Blocked | 5 | 0x20 | 0 | — |
+
+Per `xenia-cpu/src/scheduler.rs:211-218`:
+
+```rust
+pub fn pick_runnable(&self) -> Option<usize> {
+    self.runqueue.iter().enumerate()
+        .filter(|(_, t)| matches!(t.state, HwState::Ready | HwState::ServicingIrq(_)))
+        .max_by_key(|(i, t)| (t.priority, -(*i as i64)))
+        .map(|(i, _)| i)
+}
+```
+
+Strict-priority within a slot. tid=10 (`priority=15`) deterministically
+beats tid=6 (`priority=0`) whenever tid=10 is also Ready. tid=10's
+last trace event is 727.92 ms, but tid=10 is still Ready in the exit
+state — meaning **tid=10 is executing past 727.92 ms in a path that
+doesn't cross kernel boundaries** (no further `kernel.call`, `import.call`,
+etc.), almost certainly a CPU-bound loop. That loop starves
+tid=6 on CPU5 indefinitely.
+
+This is the C-2b mechanism with concrete identification:
+
+- **NOT a wake-call bug** (`wake.requested` fires 5× with
+  `transitioned=true, new_state=Ready` on tid=6) — falsifies C-2a.
+- **IS a scheduler-pick-skip** but specifically a strict-priority
+  starvation: when a same-slot peer is at priority ≥1, lower-priority
+  Ready threads on that slot never run.
+
+## Disambiguation result vs goal-spec
+
+| outcome | gate predicate | result | conclusion |
+|---|---|---|---|
+| C-2A-KERNEL-WAKE-BUG | `wake.requested` for tid=6 absent OR `transitioned=false` | NO (5 events, all transitioned) | **FALSIFIED** |
+| C-2B-SCHEDULER-PICK-BUG | `wake.requested` fires for tid=6 with `transitioned=true, new_state=Ready` AND tid=6 still doesn't execute | YES (5 events, last at 764.29 ms; tid=6 last event 663.60 ms; never runs after the 5th wake) | **CONFIRMED** |
+| NEITHER-CLEAN-NEW-HYPOTHESIS | neither sub-hypothesis matches | NO | NOT TRIGGERED |
+| RUN-FAILED | non-zero exit / crash / hang / no `wake.requested` events | NO (EXIT=0, 36 events) | NOT TRIGGERED |
+
+**Result: C-2b CONFIRMED with concrete mechanism — strict-priority
+starvation on CPU5 by tid=10 (priority=15) blocking tid=6 (priority=0).**
+
+## Tripstone audit
+
+- **#28** (cross-engine tid stability): No cross-engine tid claims. All
+  reported tids are ours-side scheduler tids; stable within this
+  trajectory (verified per-tid counts intact vs 2.Q/2.S baseline).
+- **#39** (composite progression IS progression): NO progression claim.
+  VdSwap=1, draws=0, render_targets=0 — bit-identical to
+  2.J/2.K/2.Q/2.N/2.S. The C-2b confirmation is a *root-cause
+  identification*, not a progression event.
+- **#40** (single-keystone framing): Care taken. The headline names
+  *strict-priority starvation* as the SPECIFIC mechanism behind C-2b
+  (not just "the scheduler is buggy"). The data isolates the priority
+  asymmetry (15 vs 0) and same-slot pinning (0x20=CPU5 only). Open
+  follow-ups are not collapsed: *why* is tid=10 priority=15 (canary
+  parity? game intent?), and *why* does the strict-priority scheduler
+  not implement starvation-avoidance (intentional simplification or
+  oversight)?
+- **#41** (categorized diff tags): `wake.requested` is ENGINE_LOCAL in
+  the diff harness — doesn't affect matched-prefix.
+- **#42** (Phase-A blind to blocked-forever waits): Used the always-on
+  `exit-thread-state.json` to confirm tid=6's final Ready state
+  post-trace-end (Phase-A alone would have shown only the 5 wakes
+  without the final unscheduled outcome).
+- **#43** (no budget-cap framing): 2.S already established the 50M
+  budget reproduces the phenomenon; this iterate uses 50M and confirms
+  the C-2b mechanism is structural and reproducible in the smaller
+  budget. The 2.6 ms gap between the 5th wake (764.29) and tid=5/4's
+  final events (~766.86) is sufficient to show the scheduler had ample
+  opportunity to pick tid=6 and didn't.
+
+## Comparison: 2.K → 2.Q → 2.S → 2.T
+
+| gate | 2.K (500M, baseline) | 2.Q (50M + signal.match) | 2.S (500M + signal.match) | 2.T (50M + signal.match + wake.requested) |
+|------|---:|---:|---:|---:|
+| total events | 121,569 | 121,605 | 121,605 | **121,641** |
+| baseline events | 121,569 | 121,569 | 121,569 | **121,569** |
+| `signal.match` events | n/a | 36 | 36 | **36** |
+| `wake.requested` events | n/a | n/a | n/a | **36** |
+| `wake.requested` for tid=6 (transitioned=true) | n/a | n/a | n/a | **5** |
+| exit-state size | 9651 | 9651 | 9651 | **9651** |
+| tid=6 final state | Ready | Ready | Ready | **Ready** |
+| Termination | budget hit | (50M) | budget hit (500M) | budget hit (50M) |
+
+Baseline + signal.match counts bit-identical to 2.Q/2.S. New 36
+wake.requested events 1:1 with the 36 signal.match events
+(every successful signal that found a waiter resulted in a wake call
+that transitioned the waiter — sanity check of the kernel wake plumbing
+across all signaler tids and target handles).
+
+## Confidence
+
+- **HIGH** that the patch is correct and observability-only: 0 test
+  regressions; `wake_eligible_waiters` body unchanged except for the
+  snapshot reads and post-call emits.
+- **HIGH** that `wake.requested` events fire as designed: 36 events
+  emitted, all with `transitioned=true && new_state=Ready` (which
+  matches the expected branch in the wake primitive — every waker
+  pulled off a non-empty waiter queue should transition Blocked→Ready).
+- **HIGH** that tid=6 is woken 5× by tid=5 on handle 0x10b4 with
+  successful transition — grepped exhaustively by exact handle string
+  + target_tid filter.
+- **HIGH** that tid=6 emits no events after 663.60 ms despite the 5th
+  wake at 764.29 ms (per-tid scan).
+- **HIGH** that exit-state thread geometry matches 2.M/2.N/2.Q/2.S
+  bit-identically (9651 bytes; tid=6 final Ready/hw_id=5/affinity=0x20).
+- **HIGH** that C-2a (kernel wake-call bug) is FALSIFIED: the
+  wake-call IS issued AND IS transitioning tid=6 Blocked→Ready
+  correctly.
+- **HIGH** that C-2b (scheduler-pick-skip) is CONFIRMED with strict-
+  priority starvation as the specific mechanism, given CPU5's exit-
+  state Ready set has tid=10 at priority=15 (vs tid=6 at priority=0)
+  pinned to affinity 0x20.
+- **MEDIUM-HIGH** that tid=10 is in a CPU-bound non-kernel-crossing
+  loop after 727.92 ms (inferred from "still Ready at exit" + "no
+  trace events after 727.92 ms"). Could also be a tight kernel.call
+  loop where the trace happens to not have a corresponding emit — but
+  on CPU5 with priority advantage, tid=10 keeps the slot.
+
+## Next-iterate recommendation
+
+Three complementary directions, priority order:
+
+**(1) 2.U — investigate why tid=10 is priority=15 / what tid=10 is
+doing post-727.92ms (~0 LOC observability OR ~50 LOC PC-trace +
+KeSetBasePriorityThread hook).** Pull from existing audit DB
+(`sylpheed.db`) the basic block containing PC `0x824d1404` (tid=10's
+final PC) — find the loop body. Compare with canary's analog tid to
+see whether canary's same-PC thread also runs at priority=15. If canary
+runs the same code at a different priority OR has a yield/sleep in the
+loop, that's the divergence point. If canary is identical, then the
+problem is **ours's scheduler lacks fair preemption** that canary's
+host-OS-thread scheduler provides naturally (canary multiplexes guest
+threads onto OS threads, getting OS-level fairness for free; ours's
+cooperative-pick-runnable model needs explicit anti-starvation).
+
+**(2) 2.V — add starvation-avoidance to `pick_runnable`** (~30-80 LOC
+engine, semantic change). E.g., dynamic priority aging or a
+quantum-expiration override that occasionally picks the highest-priority
+*not-recently-run* thread. Risk: changes scheduling semantics —
+matched-prefix and Phase-A trace counts will shift; needs careful
+parity check vs canary. **NOT recommended as next** without first
+establishing canary baseline (2.U).
+
+**(3) 2.W — canary-side `wake.requested` mirror (~30-60 LOC C++)** for
+cross-engine parity diff. Lower urgency than 2.U because the ours-side
+data alone is sufficient to identify the scheduler-pick mechanism;
+canary parity becomes relevant only if 2.U finds tid=10 is intentionally
+high-priority and canary handles it via OS-level fairness.
+
+**Recommended:** 2.U first (~30 min DB lookup + cross-engine check),
+then 2.V or 2.W depending on what 2.U reveals.
+
+## Artifacts
+
+Under `xenia-rs/audit-runs/iterate-2T-wake-requested/`:
+
+- `ours-cold.jsonl` (28.7 MB, 121,641 events = 121,569 baseline + 36
+  `signal.match` + 36 `wake.requested`)
+- `ours-cold.stdout.log` (empty — quiet mode)
+- `ours-cold.stderr.log` (single 2.M emission-notice line)
+- `exit-thread-state.json` (9651 bytes; bit-identical to
+  2.M/2.N/2.Q/2.S — 13 threads + 10 wedge entries)
+- `writer-report.md` (this file)
+
+Engine HEAD `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` + uncommitted
+2.Q signal.match patch + this iterate's 2.T wake.requested patch.
+xenia-canary UNCHANGED.