xenia-rs/audit-runs/phase-c7-keSetEvent/broad-impact.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Phase C+7 — broad-impact verification

The user explicitly asked for thorough side-effect analysis beyond standard gates. This document covers (1) new-divergence enumeration, (2) KeSetEvent call-site sampling, (3) wake-cascade check, (4) spawn/exit pattern check, (5) determinism stability over a longer horizon.

1. New-divergence enumeration

Comparing pre-fix (audit-runs/phase-c6half-xam-audit/diff-report.md) to post-fix (audit-runs/phase-c7-keSetEvent/diff-report.md):

| chain | pre-fix first divergence | post-fix first divergence | category |
|---|---|---|---|
| tid=6→1 (main) | idx 102158 XamTaskCloseHandle ret 1/0 | idx 102158 XamTaskCloseHandle ret 1/0 | persisted (unrelated to KeSetEvent fix) |
| tid=4→11 | idx 5 KeSetEvent ret 1/0 | none (full match in 9-event ours window) | resolved |
| tid=7→2 | idx 26 KeSetEvent ret 1/0 | none (full match in 29-event canary window) | resolved |
| tid=12→7 | idx 2 KeWaitForSingleObject ret 258/0 | idx 2 KeWaitForSingleObject ret 258/0 | persisted (different bug) |
| tid=14→9 | idx 39 XAudioGetVoiceCategoryVolumeChangeMask vs RtlEnterCS | idx 39 same | persisted (different bug) |
| tid=15→10 | none | none | unchanged |
  • Resolved: 2 (both sister chains where KeSetEvent was the first divergence)
  • Advanced: 0
  • Persisted: 3 (XamTaskCloseHandle, KeWaitForSingleObject=258, XAudio call-name divergence — all on different functions, none related to KeSetEvent)
  • NEW: 0 — no new divergence surfaced. The fix neither unblocked a new code path that then re-diverged nor introduced any regression.
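The bucketing used in the tally above can be sketched as a small function. This is an illustration only: the `Category` enum, the `classify` helper, and the exact rule for telling "advanced" from "new" are assumptions made for this note, not code from the audit tooling.

```rust
/// First divergence on a chain: event index plus a short description.
/// `None` means the two streams matched over the whole compared window.
type FirstDivergence = Option<(usize, &'static str)>;

/// Hypothetical buckets mirroring the tally in this section.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Category {
    Unchanged, // no divergence before or after the fix
    Resolved,  // diverged before, full match now
    Persisted, // same divergence at the same index
    Advanced,  // still diverges, but later in the stream
    New,       // a divergence that was not there before
}

/// Assumed rule: a later index counts as Advanced, the same index with the
/// same description counts as Persisted, anything else counts as New.
fn classify(pre: FirstDivergence, post: FirstDivergence) -> Category {
    match (pre, post) {
        (None, None) => Category::Unchanged,
        (Some(_), None) => Category::Resolved,
        (None, Some(_)) => Category::New,
        (Some((pre_idx, pre_what)), Some((post_idx, post_what))) => {
            if post_idx > pre_idx {
                Category::Advanced
            } else if post_idx == pre_idx && post_what == pre_what {
                Category::Persisted
            } else {
                Category::New
            }
        }
    }
}
```

Under these assumed rules, the tid=4→11 row (`Some((5, ..))` before, `None` after) lands in `Resolved`, and the main-chain row (idx 102158 both times) lands in `Persisted`.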

This is the clean-fix outcome. The task description anticipated otherwise ("NEW divergences are EXPECTED for a widely-used fix"), so a clean zero is itself a positive finding: within the current 50M horizon, the boot path was not hiding any downstream divergence behind the wrong KeSetEvent return.

Per-tid event totals are identical pre- and post-fix ((0,1),(1,108486),(2,30),(3,36),(4,2022),(5,9945),(6,315),(7,3),(8,36),(9,75),(10,15),(11,9),(12,6),(13,426)), confirming no secondary boot-trajectory shift from the return-value change. Same boot, same paths, same imports; the only delta is the value in the return_value field on KeSetEvent / NtSetEvent emits.
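A per-tid comparison of this kind can be sketched as follows. The helper names `per_tid_totals` and `changed_tids` are hypothetical, and the input is reduced to a plain slice of thread ids (one entry per logged event) rather than the real Phase A record format.

```rust
use std::collections::BTreeMap;

/// Count events per thread id from a stream of tids (one entry per event).
fn per_tid_totals(tids: &[u32]) -> BTreeMap<u32, u64> {
    let mut totals = BTreeMap::new();
    for &tid in tids {
        *totals.entry(tid).or_insert(0u64) += 1;
    }
    totals
}

/// Tids whose totals differ between two runs, including tids that appear
/// in only one of them. An empty result means the totals are identical.
fn changed_tids(pre: &BTreeMap<u32, u64>, post: &BTreeMap<u32, u64>) -> Vec<u32> {
    let mut changed: Vec<u32> = pre
        .keys()
        .chain(post.keys())
        .copied()
        .filter(|tid| pre.get(tid) != post.get(tid))
        .collect();
    changed.sort_unstable();
    changed.dedup();
    changed
}
```

An empty `changed_tids` result corresponds to the finding reported here. A tid that spawned or disappeared would show up because `get` returns `None` on one side only.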

2. KeSetEvent call-site sampling

Within the 50M Phase A window, ours emits 2 KeSetEvent kernel.return events (one on each of tid=2 and tid=11). Canary emits 7,495 KeSetEvent returns (spread across many threads that ours doesn't reach in this window). Below: every call-site where both engines have data, plus 3 canary-only samples to characterize the unreached space:

| # | canary tid → ours tid | idx | canary ret | pre-fix ours ret | post-fix ours ret | match? |
|---|---|---|---|---|---|---|
| 1 | 4 → 11 | 5 | 1 | 0 | 1 | YES |
| 2 | 7 → 2 | 26 | 1 | 0 | 1 | YES |
| 3 | 4 → 11 | 20 | 1 | (ours stream ended at idx 9) | (same) | n/a, ours blocked upstream |
| 4 | 14 → 9 | 107 | 1 | (tid=9 diverges at idx 39 on XAudio) | (same) | n/a, ours blocked upstream |
| 5 | 14 → 9 | 215 | 1 | (same) | (same) | n/a, ours blocked upstream |

Both call-sites with comparable data are now in bit-identical return-value alignment with canary. Sites 3-5 are downstream of unrelated divergences; the KeSetEvent return on each (canary always returns 1) will trivially match the moment our boot reaches them.
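The comparison behind "match" and "n/a" can be sketched as a first-divergence search restricted to the window both streams cover. The `Ret` struct and `first_divergence` helper are assumptions for illustration; the real diff tooling and its record layout are not shown in this document.

```rust
/// One `kernel.return` record reduced to the fields compared here
/// (hypothetical shape, not the Phase A schema).
#[derive(Debug, Clone, PartialEq, Eq)]
struct Ret {
    name: &'static str,
    value: u32,
}

/// Index of the first record that differs, looking only at the window that
/// both streams cover. `None` means a full match over that shared window;
/// it says nothing about records past the end of the shorter stream.
fn first_divergence(canary: &[Ret], ours: &[Ret]) -> Option<usize> {
    canary.iter().zip(ours.iter()).position(|(c, o)| c != o)
}
```

Because `zip` stops at the shorter stream, a site that only canary reaches (rows 3 to 5) is never compared, which is why those rows are reported as n/a rather than as matches.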

3. Wake-cascade check

Phase A's wake-cascade event kinds (wait.end, handle.*, etc.) are not wired in ours's emitter at the time of writing (per MEMORY.md Phase A index: "4 of 13 schema kinds wired"). Therefore we cannot observe wake events directly. Indirect signal: per-tid event counts are identical pre/post fix, suggesting no new threads progress past prior parking points — i.e. the KeSetEvent return-value flip did not visibly change wake-cascade behavior within 50M.

This is consistent with internal-state inspection: ours's ke_set_event already mutated signaled = true and called wake_eligible_waiters correctly pre-fix; only the return-value emission was wrong. Wake semantics never depended on the return.
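The claim that wake semantics never depended on the return can be illustrated with a toy model. Everything below is hypothetical: the `Event` struct, the waiter bookkeeping, and the constant return values are stand-ins chosen to mirror the description above, not the actual xenia-rs `ke_set_event` implementation, which may compute its return differently.

```rust
/// Hypothetical, simplified event object; not the xenia-rs type.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Event {
    signaled: bool,
    waiters: Vec<u32>, // tids parked on this event
    woken: Vec<u32>,   // tids released by a set
}

impl Event {
    /// Stand-in for the wake step: release every parked waiter.
    fn wake_eligible_waiters(&mut self) {
        let released: Vec<u32> = self.waiters.drain(..).collect();
        self.woken.extend(released);
    }
}

/// Pre-fix shape (assumed): correct state change, reported value 0.
fn ke_set_event_pre_fix(ev: &mut Event) -> i32 {
    ev.signaled = true;
    ev.wake_eligible_waiters();
    0
}

/// Post-fix shape (assumed): identical state change, reported value 1,
/// matching what canary returns at the sampled sites.
fn ke_set_event_post_fix(ev: &mut Event) -> i32 {
    ev.signaled = true;
    ev.wake_eligible_waiters();
    1
}
```

Running both variants on clones of the same event leaves the two objects equal (same `signaled`, same `woken` list) while the returned values differ, which is the shape of the pre/post difference described in this section.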

4. Spawn/exit pattern check

thread.create and thread.exit events are also not wired in ours's emitter (same 4-of-13 reason). Phase A logs 0 thread.create / 0 thread.exit events in both pre and post fix. We cannot independently verify thread count from Phase A.

From the per-tid breakdown (tids present in the log), ours has the same 14 distinct tids pre and post fix (0,1,2,3,4,5,6,7,8,9,10,11,12,13) with identical event counts. No new tid spawned and no tid disappeared.

5. Determinism stability over time

50M --stable-digest: 3× identical (c6d89582…). Matches C+6½ baseline byte-for-byte. Sample fields:

{
  "instructions": 50000000,
  "imports": 40470,
  "unimpl": 0,
  "draws": 0,
  "swaps": 1,
  ...
}

200M --stable-digest: 2× identical (8186841b…). New baseline. Field values at 200M: imports=40470 still (no new imports between 50M and 200M — boot still plateaus on the same wait), draws=0, swaps=1. Same as 50M. The boot is still parked on the same upstream gate (XamTaskCloseHandle / KeWaitForSingleObject in the main thread); the KeSetEvent fix alone is not sufficient to unblock the next phase.
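The repeat-run check can be sketched as hashing a fixed set of counters and confirming that every run agrees. This is an assumption-heavy illustration: the `Counters` struct covers only the fields visible in the sample, and FNV-1a is a swapped-in hash chosen because it is easy to write out in full. How `--stable-digest` actually derives its value is not described in this document.

```rust
/// Hypothetical subset of the counters shown in the sample above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Counters {
    instructions: u64,
    imports: u64,
    unimpl: u64,
    draws: u64,
    swaps: u64,
}

/// 64-bit FNV-1a over the little-endian bytes of each counter, in field order.
/// Stand-in for the real digest, which is not specified here.
fn digest(c: &Counters) -> u64 {
    const OFFSET: u64 = 0xcbf29ce484222325;
    const PRIME: u64 = 0x100000001b3;
    let mut h = OFFSET;
    for v in [c.instructions, c.imports, c.unimpl, c.draws, c.swaps] {
        for b in v.to_le_bytes() {
            h ^= u64::from(b);
            h = h.wrapping_mul(PRIME);
        }
    }
    h
}

/// True when there is at least one run and every run produced the same digest.
fn all_identical(digests: &[u64]) -> bool {
    match digests.split_first() {
        Some((first, rest)) => rest.iter().all(|d| d == first),
        None => false,
    }
}
```

Treating an empty list as a failure is a deliberate choice in this sketch: a determinism claim with zero runs behind it should not pass.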

Conclusion

The fix is clean-positive: resolved exactly the 2 sister-chain divergences it was scoped to (idx 5 / idx 26), preserved main chain (no #23 redux), preserved all 6 unit tests, added 6 new tests, and introduced zero new divergences. Per-tid event totals are byte-identical pre/post fix — the fix is observation-only (changes what the emitter reports, not what the kernel does). The return-value flip from 0 to 1 propagates through Phase A's kernel.return payloads and nothing else, exactly matching canary's behavior.

Next session's target: main-chain divergence at idx 102158 (XamTaskCloseHandle), per C+6½ XAM-audit memory note. tid=4→11 and tid=7→2 fully aligned; if those chains develop new divergences past their current canary-stream ends, that's a future-boot horizon problem, not this session's.