handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,134 @@
# Phase C+7 — broad-impact verification
The user explicitly asked for thorough side-effect analysis beyond
standard gates. This document covers (1) new-divergence enumeration,
(2) KeSetEvent call-site sampling, (3) wake-cascade check,
(4) spawn/exit pattern check, (5) determinism stability over a
longer horizon.
## 1. New-divergence enumeration
Comparing pre-fix (`audit-runs/phase-c6half-xam-audit/diff-report.md`)
to post-fix (`audit-runs/phase-c7-keSetEvent/diff-report.md`):
| chain | pre-fix first-divergence | post-fix first-divergence | category |
|---|---|---|---|
| tid=6→1 (main) | idx 102158 `XamTaskCloseHandle ret 1/0` | idx 102158 `XamTaskCloseHandle ret 1/0` | **persisted** (unrelated to KeSetEvent fix) |
| tid=4→11 | idx 5 `KeSetEvent ret 1/0` | none (full match in 9-event ours window) | **resolved** |
| tid=7→2 | idx 26 `KeSetEvent ret 1/0` | none (full match in 29-event canary window) | **resolved** |
| tid=12→7 | idx 2 `KeWaitForSingleObject ret 258/0` | idx 2 `KeWaitForSingleObject ret 258/0` | **persisted** (different bug) |
| tid=14→9 | idx 39 `XAudioGetVoiceCategoryVolumeChangeMask vs RtlEnterCS` | idx 39 same | **persisted** (different bug) |
| tid=15→10 | none | none | unchanged |
* **Resolved: 2** (both sister chains where KeSetEvent was the first
  divergence)
* **Advanced: 0**
* **Persisted: 3** (XamTaskCloseHandle, KeWaitForSingleObject=258,
  XAudio call-name divergence; all on different functions, none
  related to KeSetEvent)
* **Unchanged: 1** (tid=15→10, no divergence pre or post fix)
* **NEW: 0**: no new divergence surfaced. Within the 50M window, the
  fix neither unblocked a new code path that then re-diverged nor
  introduced any regression.
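
A minimal sketch of the categorization rules follows. It assumes a chain's first divergence can be modeled as an (index, event) pair. `FirstDivergence`, `Category`, `div`, and `classify` are hypothetical names. The rule that a later-index divergence counts as **Advanced** and any other changed divergence as **NEW** is one reading of the categories, not something taken from the audit tooling.

```rust
// Hypothetical sketch: names and rules are illustrative, not the audit tool's.

#[derive(Debug, Clone, PartialEq)]
struct FirstDivergence {
    idx: usize,    // event index within the chain
    event: String, // e.g. "KeSetEvent ret 1/0"
}

#[derive(Debug, PartialEq)]
enum Category {
    Resolved,  // diverged before the fix, full match after
    Advanced,  // still diverges, but later in the chain
    Persisted, // identical first divergence before and after
    New,       // a divergence that was not there before
    Unchanged, // no divergence before or after
}

fn div(idx: usize, event: &str) -> FirstDivergence {
    FirstDivergence { idx, event: event.to_string() }
}

fn classify(pre: Option<&FirstDivergence>, post: Option<&FirstDivergence>) -> Category {
    match (pre, post) {
        (None, None) => Category::Unchanged,
        (Some(_), None) => Category::Resolved,
        (None, Some(_)) => Category::New,
        (Some(a), Some(b)) if a == b => Category::Persisted,
        (Some(a), Some(b)) if b.idx > a.idx => Category::Advanced,
        // Same or earlier index with a different event: count it as new.
        (Some(_), Some(_)) => Category::New,
    }
}

fn main() {
    // Rows from the table above (the XAudio event text is abbreviated).
    let rows = [
        ("6->1", Some(div(102158, "XamTaskCloseHandle ret 1/0")), Some(div(102158, "XamTaskCloseHandle ret 1/0"))),
        ("4->11", Some(div(5, "KeSetEvent ret 1/0")), None),
        ("7->2", Some(div(26, "KeSetEvent ret 1/0")), None),
        ("12->7", Some(div(2, "KeWaitForSingleObject ret 258/0")), Some(div(2, "KeWaitForSingleObject ret 258/0"))),
        ("14->9", Some(div(39, "XAudio call-name")), Some(div(39, "XAudio call-name"))),
        ("15->10", None, None),
    ];
    for (chain, pre, post) in &rows {
        println!("tid={chain}: {:?}", classify(pre.as_ref(), post.as_ref()));
    }
}
```

Running `main` reproduces the table's category column: two Resolved, three Persisted, one Unchanged.
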

The task description anticipated that "NEW divergences are EXPECTED
for a widely-used fix", so the clean-zero outcome here is itself a
positive finding. Within the current 50M-instruction horizon, the
boot path was not hiding any downstream divergence behind the wrong
KeSetEvent return.

Per-tid event totals are identical pre/post fix
(`(0,1),(1,108486),(2,30),(3,36),(4,2022),(5,9945),(6,315),(7,3),
(8,36),(9,75),(10,15),(11,9),(12,6),(13,426)`). This indicates no
secondary boot-trajectory shift from the return-value change. Same
boot, same paths, same imports: the only delta is the value of the
`return_value` field in emitted KeSetEvent / NtSetEvent events.
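
The per-tid comparison can be sketched as a map diff. This assumes totals are available as (tid, count) pairs like the list above. `PHASE_A_TOTALS`, `totals`, and `shifted_tids` are hypothetical names. A tid present on only one side also shows up as shifted, so the same check covers the spawn/exit question in section 4.

```rust
use std::collections::BTreeMap;

// Per-tid event totals from Phase A (identical pre and post fix).
const PHASE_A_TOTALS: &[(u32, u64)] = &[
    (0, 1), (1, 108486), (2, 30), (3, 36), (4, 2022), (5, 9945), (6, 315),
    (7, 3), (8, 36), (9, 75), (10, 15), (11, 9), (12, 6), (13, 426),
];

/// Build a tid -> event-count map from (tid, count) pairs.
fn totals(pairs: &[(u32, u64)]) -> BTreeMap<u32, u64> {
    pairs.iter().copied().collect()
}

/// Tids whose totals differ, including tids present on only one side
/// (a spawned or vanished thread lands here too).
fn shifted_tids(pre: &BTreeMap<u32, u64>, post: &BTreeMap<u32, u64>) -> Vec<u32> {
    let mut out: Vec<u32> = pre
        .keys()
        .chain(post.keys())
        .copied()
        .filter(|tid| pre.get(tid) != post.get(tid))
        .collect();
    out.sort_unstable();
    out.dedup();
    out
}

fn main() {
    let pre = totals(PHASE_A_TOTALS);
    let post = totals(PHASE_A_TOTALS);
    let total: u64 = post.values().sum();
    println!("{} tids, {} events, shifted: {:?}", post.len(), total, shifted_tids(&pre, &post));
}
```
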
## 2. KeSetEvent call-site sampling
Within the 50M Phase A window, ours emits **2** KeSetEvent
kernel.return events (one each on tid=2 and tid=11). Canary emits
**7,495** KeSetEvent returns, spread across many threads that ours
does not reach in this window. The table below lists every call site
where both engines have data, plus 3 canary-only samples that
characterize the unreached space:

| # | canary tid → ours tid | idx | canary ret | pre-fix ours ret | post-fix ours ret | match? |
|---|---|---|---|---|---|---|
| 1 | 4 → 11 | 5 | 1 | 0 | 1 | YES |
| 2 | 7 → 2 | 26 | 1 | 0 | 1 | YES |
| 3 | 4 → 11 | 20 | 1 | (ours stream ended at idx 9) | (same) | n/a — ours blocked upstream |
| 4 | 14 → 9 | 107 | 1 | (tid=9 diverges at idx 39 on XAudio) | (same) | n/a — ours blocked upstream |
| 5 | 14 → 9 | 215 | 1 | (same) | (same) | n/a — ours blocked upstream |

Both call sites with comparable data now show **bit-identical
return values** with canary. Sites 3-5 lie downstream of unrelated
divergences. Canary returns 1 at each of them, so ours is expected to
match once its boot reaches them, provided it returns 1 there as it
now does at sites 1 and 2.
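
The `match?` column follows a simple three-way rule. The sketch below assumes that a site where ours emitted no event is recorded as missing. `CallSite`, `Status`, and `status` are illustrative names, not emitter types.

```rust
// Hypothetical model of the call-site table; names are illustrative only.

#[derive(Debug, PartialEq)]
enum Status {
    Match,      // ours reached the site and returned what canary returned
    Mismatch,   // ours reached the site with a different return value
    NotReached, // ours is blocked upstream, nothing to compare
}

struct CallSite {
    canary_ret: u32,
    ours_ret: Option<u32>, // None when ours never emitted this event
}

fn status(site: &CallSite) -> Status {
    match site.ours_ret {
        None => Status::NotReached,
        Some(r) if r == site.canary_ret => Status::Match,
        Some(_) => Status::Mismatch,
    }
}

fn main() {
    // Sites 1-2 before and after the fix, and site 3 (ours stream ended at idx 9).
    let pre_fix = CallSite { canary_ret: 1, ours_ret: Some(0) };
    let post_fix = CallSite { canary_ret: 1, ours_ret: Some(1) };
    let blocked = CallSite { canary_ret: 1, ours_ret: None };
    for site in [&pre_fix, &post_fix, &blocked] {
        println!("{:?}", status(site));
    }
}
```
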
## 3. Wake-cascade check
Phase A's wake-cascade event kinds (`wait.end`, `handle.*`, etc.) are
not yet wired into our emitter (per the MEMORY.md Phase A index: "4 of
13 schema kinds wired"), so wake events cannot be observed directly.
The indirect signal is that per-tid event counts are identical pre/post
fix. That suggests no thread progressed past its prior parking point,
i.e. the KeSetEvent return-value flip did not visibly change
wake-cascade behavior within 50M.

This is consistent with internal-state inspection. Even before the fix,
our `ke_set_event` set `signaled = true` and called
`wake_eligible_waiters` correctly; only the emitted return value was
wrong. Wake semantics never depended on the return value.
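
The hypothetical model below shows why the return-value flip cannot affect waking. In it, `ke_set_event` sets `signaled`, calls `wake_eligible_waiters`, and only then chooses what to report. `KEvent`, its fields, and the tuple return are assumptions for illustration; this document does not show the real crate's signatures. The auto-reset branch follows the usual synchronization-event rule, in which one waiter consumes the signal.

```rust
// Hypothetical model only: `KEvent`, its fields, and the return tuple
// are illustrative assumptions, not the xenia-kernel types.

#[derive(Debug, Default)]
struct KEvent {
    signaled: bool,
    manual_reset: bool,
    waiters: Vec<u32>, // tids parked on this event, in wait order
}

impl KEvent {
    /// Release the waiters the current state allows; returns their tids.
    fn wake_eligible_waiters(&mut self) -> Vec<u32> {
        if !self.signaled || self.waiters.is_empty() {
            return Vec::new();
        }
        if self.manual_reset {
            // Notification event: stays signaled and releases everyone.
            std::mem::take(&mut self.waiters)
        } else {
            // Synchronization event: the first waiter consumes the signal.
            self.signaled = false;
            vec![self.waiters.remove(0)]
        }
    }
}

/// Returns (value reported in kernel.return, tids woken).
fn ke_set_event(ev: &mut KEvent) -> (u32, Vec<u32>) {
    ev.signaled = true;
    let woken = ev.wake_eligible_waiters();
    // The wake path above never reads this value. Pre-fix, ours reported 0
    // here; canary reported 1 at every sampled site.
    let reported = 1;
    (reported, woken)
}

fn main() {
    let mut ev = KEvent { waiters: vec![11], ..Default::default() };
    let (ret, woken) = ke_set_event(&mut ev);
    println!("ret={ret} woken={woken:?} signaled={}", ev.signaled);
}
```
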
## 4. Spawn/exit pattern check
`thread.create` and `thread.exit` events are not wired into our emitter
either (same 4-of-13 limitation). Phase A logs 0 thread.create and 0
thread.exit events both pre- and post-fix, so thread count cannot be
verified independently from Phase A.

From the per-tid breakdown (the tids present in the log), ours has the
same 14 distinct tids pre and post fix (0,1,2,3,4,5,6,7,8,9,10,11,
12,13) with identical event counts. No new tid spawned and no tid
disappeared.
## 5. Determinism stability over time
50M `--stable-digest`: 3× identical (`c6d89582…`). Matches C+6½
baseline byte-for-byte. Sample fields:
```
{
"instructions": 50000000,
"imports": 40470,
"unimpl": 0,
"draws": 0,
"swaps": 1,
...
}
```
200M `--stable-digest`: 2× identical (`8186841b…`). New baseline.
Field values at 200M are the same as at 50M: imports=40470 (no new
imports between 50M and 200M; boot still plateaus on the same wait),
draws=0, swaps=1. The boot is still parked on the same upstream gate
(XamTaskCloseHandle / KeWaitForSingleObject in the main thread), so
the KeSetEvent fix alone is not sufficient to unblock the next phase.
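
The sketch below covers this section's two stability checks. The first is that repeat runs produce the same digest. The second is that imports, draws, and swaps stay flat from 50M to 200M. `Digest` mirrors only the fields shown above. FNV-1a stands in for whatever hash `--stable-digest` actually uses. The 200M record assumes `unimpl` stays at 0, which the text does not state.

```rust
// Hypothetical sketch: `Digest` holds only the fields shown above, and
// FNV-1a is a stand-in for the real --stable-digest hash.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Digest {
    instructions: u64,
    imports: u64,
    unimpl: u64,
    draws: u64,
    swaps: u64,
}

/// 64-bit FNV-1a over the fields' little-endian bytes.
fn fingerprint(d: &Digest) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // FNV offset basis
    for field in [d.instructions, d.imports, d.unimpl, d.draws, d.swaps] {
        for byte in field.to_le_bytes() {
            h ^= u64::from(byte);
            h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        }
    }
    h
}

/// True when every repeated run at one horizon hashes identically.
fn stable(runs: &[Digest]) -> bool {
    runs.windows(2).all(|w| fingerprint(&w[0]) == fingerprint(&w[1]))
}

/// True when more instructions ran but imports/draws/swaps did not move.
fn plateaued(early: &Digest, late: &Digest) -> bool {
    late.instructions > early.instructions
        && late.imports == early.imports
        && late.draws == early.draws
        && late.swaps == early.swaps
}

fn main() {
    let d50 = Digest { instructions: 50_000_000, imports: 40_470, unimpl: 0, draws: 0, swaps: 1 };
    // Assumption: unimpl stays 0 at 200M (not stated in the text).
    let d200 = Digest { instructions: 200_000_000, ..d50 };
    println!("50M stable over 3 runs: {}", stable(&[d50, d50, d50]));
    println!("200M stable over 2 runs: {}", stable(&[d200, d200]));
    println!("plateau 50M -> 200M: {}", plateaued(&d50, &d200));
}
```
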
## Conclusion
The fix is **clean-positive**. It resolved exactly the 2 sister-chain
divergences it was scoped to (idx 5 / idx 26), preserved the main chain
(no #23 redux), kept all 6 existing unit tests passing, added 6 new
tests, and introduced zero new divergences. Per-tid event totals are
identical pre/post fix: the fix is observation-only, changing what the
emitter reports rather than what the kernel does. The return-value
flip from 0 to 1 propagates through Phase A's kernel.return payloads
and nothing else, matching canary's behavior at every comparable call
site.

Next session's target is the main-chain divergence at idx 102158
(XamTaskCloseHandle), per the C+6½ XAM-audit memory note. tid=4→11 and
tid=7→2 are fully aligned within their compared windows. If those
chains develop new divergences past their current stream ends, that
is a future-boot-horizon problem, not this session's.