handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
134
audit-runs/phase-c7-keSetEvent/broad-impact.md
Normal file
@@ -0,0 +1,134 @@
# Phase C+7: broad-impact verification

The user explicitly asked for thorough side-effect analysis beyond
standard gates. This document covers (1) new-divergence enumeration,
(2) KeSetEvent call-site sampling, (3) wake-cascade check,
(4) spawn/exit pattern check, and (5) determinism stability over a
longer horizon.

## 1. New-divergence enumeration

Comparing pre-fix (`audit-runs/phase-c6half-xam-audit/diff-report.md`)
to post-fix (`audit-runs/phase-c7-keSetEvent/diff-report.md`):

| chain | pre-fix first-divergence | post-fix first-divergence | category |
|---|---|---|---|
| tid=6→1 (main) | idx 102158 `XamTaskCloseHandle ret 1/0` | idx 102158 `XamTaskCloseHandle ret 1/0` | **persisted** (unrelated to KeSetEvent fix) |
| tid=4→11 | idx 5 `KeSetEvent ret 1/0` | none (full match in 9-event ours window) | **resolved** |
| tid=7→2 | idx 26 `KeSetEvent ret 1/0` | none (full match in 29-event canary window) | **resolved** |
| tid=12→7 | idx 2 `KeWaitForSingleObject ret 258/0` | idx 2 `KeWaitForSingleObject ret 258/0` | **persisted** (different bug) |
| tid=14→9 | idx 39 `XAudioGetVoiceCategoryVolumeChangeMask vs RtlEnterCS` | idx 39 same | **persisted** (different bug) |
| tid=15→10 | none | none | unchanged |

* **Resolved: 2** (both sister chains where KeSetEvent was the first
  divergence)
* **Advanced: 0**
* **Persisted: 3** (XamTaskCloseHandle, KeWaitForSingleObject=258,
  XAudio call-name divergence; all on different functions, none
  related to KeSetEvent)
* **NEW: 0** (no new divergence surfaced). The fix neither
  unblocked a new code path that then re-diverged nor introduced any
  regression.

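The bucketing used in the table and tally above can be sketched as a small function over each chain's pre-fix and post-fix first divergence. This is an illustrative reconstruction, not the audit tooling itself: the `classify` helper, the tuple layout, and the exact rule for "advanced" (same chain diverging at a later index) are assumptions made for the example.

```python
from typing import Optional, Tuple

# A first divergence is (event index, description); None means the chain matched.
Divergence = Optional[Tuple[int, str]]


def classify(pre: Divergence, post: Divergence) -> str:
    """Bucket one thread chain by comparing its first divergence pre- and post-fix."""
    if pre is None and post is None:
        return "unchanged"
    if pre is None:
        return "new"        # clean before, diverges now
    if post is None:
        return "resolved"   # diverged before, fully matches now
    if post == pre:
        return "persisted"  # same index, same mismatch
    if post[0] > pre[0]:
        return "advanced"   # the chain got further before diverging (assumed rule)
    return "new"            # diverges earlier, or differently at the same index


xam = (102158, "XamTaskCloseHandle ret 1/0")
wait = (2, "KeWaitForSingleObject ret 258/0")
xaudio = (39, "XAudioGetVoiceCategoryVolumeChangeMask vs RtlEnterCS")

# (pre-fix, post-fix) first divergence per chain, copied from the table.
chains = {
    "6->1": (xam, xam),
    "4->11": ((5, "KeSetEvent ret 1/0"), None),
    "7->2": ((26, "KeSetEvent ret 1/0"), None),
    "12->7": (wait, wait),
    "14->9": (xaudio, xaudio),
    "15->10": (None, None),
}

tally = {}
for name, (pre, post) in chains.items():
    bucket = classify(pre, post)
    tally[bucket] = tally.get(bucket, 0) + 1
print(tally)
```

Treating "same index, different mismatch" as new rather than persisted is a conservative choice: it keeps a changed failure mode from hiding behind an unchanged index.
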
This is the clean-fix outcome. The task description notes that "NEW
divergences are EXPECTED for a widely-used fix", so the clean-zero
result here is itself a positive finding: within the current
50M-instruction horizon, the boot path was not hiding any downstream
divergence behind the wrong KeSetEvent return.

Per-tid event totals are byte-identical pre/post fix
(`(0,1),(1,108486),(2,30),(3,36),(4,2022),(5,9945),(6,315),(7,3),
(8,36),(9,75),(10,15),(11,9),(12,6),(13,426)`), confirming no
secondary boot-trajectory shift from the return-value change. Same
boot, same paths, same imports; the only delta is the value in
the `return_value` field on KeSetEvent / NtSetEvent emits.

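A per-tid comparison of this kind can be sketched as follows. The `per_tid_delta` helper is hypothetical, and the two counters are filled from the totals quoted above rather than read from the Phase A logs. A tid that appears or disappears shows up in the delta with a count of 0 on one side.

```python
from collections import Counter

# Per-tid event totals as listed above (tid, event count).
TOTALS = [(0, 1), (1, 108486), (2, 30), (3, 36), (4, 2022), (5, 9945), (6, 315),
          (7, 3), (8, 36), (9, 75), (10, 15), (11, 9), (12, 6), (13, 426)]


def per_tid_delta(pre: Counter, post: Counter) -> dict:
    """Return {tid: (pre_count, post_count)} for every tid whose total changed."""
    return {tid: (pre[tid], post[tid])
            for tid in sorted(set(pre) | set(post))
            if pre[tid] != post[tid]}


pre_fix = Counter(dict(TOTALS))
post_fix = Counter(dict(TOTALS))  # identical in this run, per the totals above

delta = per_tid_delta(pre_fix, post_fix)
print(delta or "no per-tid shift")
print(len(post_fix), "tids,", sum(post_fix.values()), "events")
```
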
## 2. KeSetEvent call-site sampling

Within the 50M Phase A window, ours emits **2** KeSetEvent
`kernel.return` events (one on each of tid=2 and tid=11). Canary emits
**7,495** KeSetEvent returns (spread across many threads that ours
doesn't reach in this window). The table below lists every call-site
where both engines have data (sites 1-2), plus 3 canary-only samples
(sites 3-5) to characterize the unreached space:

| # | canary tid → ours tid | idx | canary ret | pre-fix ours ret | post-fix ours ret | match? |
|---|---|---|---|---|---|---|
| 1 | 4 → 11 | 5 | 1 | 0 | 1 | YES |
| 2 | 7 → 2 | 26 | 1 | 0 | 1 | YES |
| 3 | 4 → 11 | 20 | 1 | (ours stream ended at idx 9) | (same) | n/a (ours blocked upstream) |
| 4 | 14 → 9 | 107 | 1 | (tid=9 diverges at idx 39 on XAudio) | (same) | n/a (ours blocked upstream) |
| 5 | 14 → 9 | 215 | 1 | (same) | (same) | n/a (ours blocked upstream) |

Both call-sites with comparable data are now in **bit-identical
return-value alignment with canary**. Sites 3-5 are downstream of
unrelated divergences; the KeSetEvent return on each (canary always
returns 1) will trivially match the moment our boot reaches them.

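The match column can be reproduced mechanically from the sampled rows. In this sketch `None` stands for "ours never reached this call-site", and the `site_status` helper and tuple layout are illustrative assumptions, not part of the audit scripts.

```python
from typing import Optional


def site_status(canary_ret: int, ours_ret: Optional[int]) -> str:
    """Compare one call-site; None means ours produced no event there."""
    if ours_ret is None:
        return "n/a"
    return "YES" if ours_ret == canary_ret else "NO"


# (canary tid, ours tid, idx, canary ret, pre-fix ours ret, post-fix ours ret)
sites = [
    (4, 11, 5, 1, 0, 1),
    (7, 2, 26, 1, 0, 1),
    (4, 11, 20, 1, None, None),
    (14, 9, 107, 1, None, None),
    (14, 9, 215, 1, None, None),
]

pre_status = [site_status(c_ret, pre) for _, _, _, c_ret, pre, _ in sites]
post_status = [site_status(c_ret, post) for _, _, _, c_ret, _, post in sites]

for (c_tid, o_tid, idx, *_), before, after in zip(sites, pre_status, post_status):
    print(f"{c_tid}->{o_tid} idx {idx}: pre-fix {before}, post-fix {after}")
```
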
## 3. Wake-cascade check

Phase A's wake-cascade event kinds (`wait.end`, `handle.*`, etc.) are
not wired in our emitter at the time of writing (per MEMORY.md
Phase A index: "4 of 13 schema kinds wired"). Therefore we cannot
observe wake events directly. Indirect signal: per-tid event counts
are identical pre/post fix, suggesting no new threads progress past
prior parking points, i.e. the KeSetEvent return-value flip did
not visibly change wake-cascade behavior within 50M.

This is consistent with internal-state inspection: our
`ke_set_event` already mutated `signaled = true` and called
`wake_eligible_waiters` correctly pre-fix; only the return-value
emission was wrong. Wake semantics never depended on the return.

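A toy model makes the argument concrete. It is written in Python purely for illustration and is not the Rust code in `xenia-kernel`: the `Event` class, the simplified wake rule (every parked tid is released), and the `reported_ret` parameter are all assumptions. The point it demonstrates is narrow: if the state mutation and the wake call do not read the return value, changing that value cannot change which threads wake.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Event:
    """Toy stand-in for a kernel event object (hypothetical, not the Rust type)."""
    signaled: bool = False
    waiters: List[int] = field(default_factory=list)  # tids parked on this event


def wake_eligible_waiters(ev: Event) -> List[int]:
    """Release every parked tid; a simplification of the real eligibility rules."""
    woken, ev.waiters = ev.waiters, []
    return woken


def ke_set_event(ev: Event, reported_ret: int) -> Tuple[int, List[int]]:
    """Signal the event and wake waiters; reported_ret is only what gets emitted."""
    ev.signaled = True
    woken = wake_eligible_waiters(ev)
    return reported_ret, woken


pre = Event(waiters=[2, 11])
post = Event(waiters=[2, 11])
ret_pre, woken_pre = ke_set_event(pre, reported_ret=0)    # pre-fix emission
ret_post, woken_post = ke_set_event(post, reported_ret=1)  # post-fix emission

print(ret_pre, ret_post)                     # the only observable difference
print(woken_pre == woken_post, pre == post)  # same wakes, same final state
```
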
## 4. Spawn/exit pattern check

`thread.create` and `thread.exit` events are also not wired in our
emitter (same 4-of-13 reason). Phase A logs 0 `thread.create` / 0
`thread.exit` events both pre- and post-fix, so we cannot
independently verify thread count from Phase A.

From the per-tid breakdown (tids present in the log), ours has the
same 14 distinct tids pre and post fix (0,1,2,3,4,5,6,7,8,9,10,11,
12,13) with identical event counts. No new tid spawned and no tid
disappeared.

## 5. Determinism stability over time

50M `--stable-digest`: 3× identical (`c6d89582…`). Matches C+6½
baseline byte-for-byte. Sample fields:

```
{
  "instructions": 50000000,
  "imports": 40470,
  "unimpl": 0,
  "draws": 0,
  "swaps": 1,
  ...
}
```

200M `--stable-digest`: 2× identical (`8186841b…`). New baseline.
Field values at 200M: imports=40470 still (no new imports between
50M and 200M; boot still plateaus on the same wait), draws=0,
swaps=1. Same as 50M. The boot is still parked on the same upstream
gate (XamTaskCloseHandle / KeWaitForSingleObject in the main
thread); the KeSetEvent fix alone is not sufficient to unblock the
next phase.

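The repeat-and-compare check behind these numbers can be sketched as below. How `--stable-digest` actually canonicalizes and hashes its fields is not described in this note, so the canonical-JSON encoding and SHA-256 here are assumptions and will not reproduce `c6d89582…` or `8186841b…`. Only the five fields shown in the sample are used; the elided ones are left out.

```python
import hashlib
import json


def stable_digest(fields: dict) -> str:
    """Hash a canonical JSON encoding so key order and whitespace cannot matter."""
    canonical = json.dumps(fields, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def is_deterministic(runs: list) -> bool:
    """True when every run produced the same digest."""
    return len({stable_digest(r) for r in runs}) == 1


sample = {"instructions": 50000000, "imports": 40470, "unimpl": 0,
          "draws": 0, "swaps": 1}

# Three runs with identical values; the second has its keys in reverse order.
runs = [dict(sample), dict(reversed(list(sample.items()))), dict(sample)]
print(is_deterministic(runs))

# A fourth run that differs in one field breaks the check.
print(is_deterministic(runs + [{**sample, "swaps": 2}]))
```

Comparing digests rather than raw output keeps the check cheap at longer horizons, at the cost of not saying which field moved when a run does differ.
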
## Conclusion

The fix is **clean-positive**: it resolved exactly the 2 sister-chain
divergences it was scoped to (idx 5 / idx 26), preserved the main
chain (no #23 redux), preserved all 6 existing unit tests, added 6 new
tests, and introduced zero new divergences. Per-tid event totals are
byte-identical pre/post fix: the fix is observation-only (it changes
what the emitter reports, not what the kernel does). The return-value
flip from 0 to 1 propagates through Phase A's `kernel.return` payloads
and nothing else, exactly matching canary's behavior.

Next session's target: main-chain divergence at idx 102158
(XamTaskCloseHandle), per C+6½ XAM-audit memory note. tid=4→11 and
tid=7→2 are fully aligned; if those chains develop new divergences
past their current canary-stream ends, that's a future-boot horizon
problem, not this session's.