handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,132 @@
# Cross-reference — absorbed events vs Phase W wedge ground truth
Baseline diff: Phase W cold canary `canary-wedge.head250k.jsonl` (56 MB,
head-truncated to ~250k canary tid=6 events) vs ours
`ours-postfix.jsonl` (28 MB). Default tid-map `6=1,4=11,7=2,12=7,14=9,15=10`.
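The tid-map string pairs each canary thread id with the ours thread id it is diffed against (main is `6=1`, the sister chain is `15=10`). A minimal sketch of parsing it, assuming the `canary=ours` orientation implied by the `tid=6→1` notation used later in this note; `parse_tid_map` is a hypothetical helper, not the diff tool's actual API:

```python
def parse_tid_map(spec: str) -> dict[int, int]:
    """Parse 'canary=ours' pairs such as '6=1,4=11' into {canary_tid: ours_tid}."""
    mapping: dict[int, int] = {}
    for pair in spec.split(","):
        canary, ours = pair.split("=")
        mapping[int(canary)] = int(ours)
    return mapping


# The default map quoted above.
DEFAULT_TID_MAP = parse_tid_map("6=1,4=11,7=2,12=7,14=9,15=10")
```

With this orientation, `DEFAULT_TID_MAP[6]` is the ours tid compared against canary's main thread.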
## Total absorbed events: 8
| absorber | side | events |
|---|---|---|
| shared-global | ours | 1 |
| wait-begin | canary | 1 |
| nested-cs | canary | 6 (1 [E,L] pair = 6 events) |
## Per-absorbed-event analysis
### Nested-cs (6 events, canary tid=6 idx 104607-104612)
Pure `RtlEnterCriticalSection`+`RtlLeaveCriticalSection` import/kernel.call/
kernel.return triples on canary's side. Return values all `0x00000000`
(uncontended fast-path). **No handles referenced.** Not wedge-relevant by
construction — these are CS API calls, not signal-flow events.
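The shape of an absorbable nested-cs window can be sketched as a six-event pattern match: one Enter triple followed by one Leave triple, each `kernel.return` carrying 0. The event fields `kind`, `name`, and `ret` are assumptions about the trace schema, and `is_uncontended_cs_pair` is a hypothetical name, not the absorber's real implementation:

```python
# Hypothetical event shape: {"kind": ..., "name": ..., "ret": ...}.
TRIPLE = ("import.call", "kernel.call", "kernel.return")
CS_PAIR = ("RtlEnterCriticalSection", "RtlLeaveCriticalSection")


def is_uncontended_cs_pair(window):
    """True if `window` is one Enter triple followed by one Leave triple,
    with both kernel.return values equal to 0 (uncontended fast path)."""
    expected = [(name, kind) for name in CS_PAIR for kind in TRIPLE]
    if len(window) != len(expected):
        return False
    for ev, (name, kind) in zip(window, expected):
        if (ev["name"], ev["kind"]) != (name, kind):
            return False
        if kind == "kernel.return" and ev.get("ret") != 0:
            return False
    return True


# Synthetic stand-in for the six events at canary idx 104607-104612.
window = [
    {"kind": kind, "name": name, "ret": 0 if kind == "kernel.return" else None}
    for name in CS_PAIR
    for kind in TRIPLE
]
```

No handle field is consulted anywhere in the match, which is the point made above: the pattern is purely about API call shape.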
### Wait-begin (1 event, canary tid=6 idx 104622)
```
SID: a25a16a4f6f547aa (object_type=1 EVENT)
raw_handle_id: 0xf8000044 (canary kernel-table slot)
created at: canary tid=10 idx=843 (worker thread)
used by: 108 wait.begin events across canary tids 6, 9, 10, 17, 18
```
**Embedded inside an `RtlEnterCriticalSection` block** (idx 104620
import.call → 104621 kernel.call → 104622 wait.begin → 104623 kernel.return).
This is canary's CS slow path: the CS was contended, so the wait.begin
fired on the **CS dispatcher Event**. Here `object_type=1` (EVENT) is the
Xbox kernel's representation of the CS's owned-by-other-thread dispatcher,
NOT a user-mode `NtCreateEvent`-created Event.
The Event is created on worker tid=10 because in canary the worker did
run and contend on this CS. In ours the workers don't run so the CS is
never contended; ours fast-paths through (uncontended kernel.return at
idx 104616 with status 0).
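The fast-path versus slow-path distinction comes down to whether a `wait.begin` sits between `kernel.call` and `kernel.return`. A rough sketch, assuming each event exposes a `kind` field (an assumption about the trace schema); `classify_cs_enter` is a hypothetical helper:

```python
FAST = ["import.call", "kernel.call", "kernel.return"]
SLOW = ["import.call", "kernel.call", "wait.begin", "kernel.return"]


def classify_cs_enter(block):
    """Classify one RtlEnterCriticalSection block by its event-kind sequence."""
    kinds = [ev["kind"] for ev in block]
    if kinds == FAST:
        return "fast-path"
    if kinds == SLOW:
        return "slow-path"
    return "unknown"


# Canary's contended block, using the indices quoted above.
canary_block = [
    {"idx": 104620, "kind": "import.call"},
    {"idx": 104621, "kind": "kernel.call"},
    {"idx": 104622, "kind": "wait.begin"},
    {"idx": 104623, "kind": "kernel.return"},
]
# Ours takes the uncontended shape (only the kinds are modelled here).
ours_block = [{"kind": k} for k in FAST]
```

Under these assumptions the canary block classifies as `slow-path` and the ours block as `fast-path`, so the only structural difference is the single absorbed `wait.begin`.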
**Wedge handles in ours** (per `halt-on-deadlock-dump.txt`) are:
`0x12d0`, `0x1020`, `0x1040`, `0x10b0`, `0x10ec`, `0x12e4` — all
`object_type=1 EVENT` but all created via `NtCreateEvent`
at `LR=0x824a9f6c` from worker tid=13. They're worker-LOCAL Events
(SIDs `d5e23609d3948568` etc., computed from ours's per-tid recipe),
NOT the shared CS dispatcher Event `a25a16a4f6f547aa`.
**Verdict**: absorber is correctly suppressing CS-contention scheduling
jitter, not wedge signal flow. The Event canary waits on is the
CS dispatcher proxy, never the user-mode worker-private Events.
### Shared-global (1 event, ours tid=10 idx 2)
```
SID: ac8315b371bcf7cb (object_type=3 SEMAPHORE)
raw_handle_id: 0x828a3230 (guest VA — well-known XAudio voice-volume
semaphore, documented in C+18)
```
ours emits `handle.create` for this Semaphore at idx 2 because
`ensure_dispatcher_object` synthesizes the shadow KernelObject at
first touch (Phase C+17). Canary doesn't emit a corresponding
`handle.create` on the same tid pair because the canary first toucher
was a different host thread — classic process-global first-toucher
race that C+18 was specifically designed for.
**Wedge handles are all EVENT (object_type=1).** This is a SEMAPHORE
(type=3). Different object class, different code path (XAudio voice
volume). Not wedge-relevant.
**Verdict**: absorber is correctly suppressing first-toucher race for
a shared XAudio dispatcher, not wedge signal flow.
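As an illustration only, a first-toucher absorber can be thought of as a filter that sets aside `handle.create` events for known process-global objects before the per-tid comparison. The field names `kind` and `raw_handle_id`, the `SHARED_GLOBALS` set, and the function name are all assumptions; the real absorber may key on SIDs or other state:

```python
# Guest VA of the XAudio semaphore named above; the set itself is hypothetical.
SHARED_GLOBALS = {0x828A3230}


def absorb_shared_global_creates(events):
    """Split events into (kept, absorbed), absorbing handle.create events
    whose raw handle id is a known process-global guest VA."""
    kept, absorbed = [], []
    for ev in events:
        if ev["kind"] == "handle.create" and ev["raw_handle_id"] in SHARED_GLOBALS:
            absorbed.append(ev)
        else:
            kept.append(ev)
    return kept, absorbed


events = [
    {"kind": "handle.create", "raw_handle_id": 0x828A3230},  # shared global
    {"kind": "handle.create", "raw_handle_id": 0x12D0},      # ordinary handle
    {"kind": "kernel.call", "raw_handle_id": 0x828A3230},    # not a create
]
kept, absorbed = absorb_shared_global_creates(events)
```

Only the create event for the shared global is set aside; later uses of the same object still take part in the comparison.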
## Selective-disable matched-prefix deltas
Baseline (all absorbers ON): main tid=6→1 matched=**105,128**.
| disabled absorber | main matched | delta | sister 15→10 matched |
|---|---|---|---|
| (none — baseline) | 105,128 | 0 | 16 |
| shared-global | 105,128 | 0 | 2 (delta 14) |
| wait-begin | 104,616 | 512 | 16 |
| nested-cs | 104,607 | 521 | 16 |
The delta pattern matches the absorbed events exactly:
- nested-cs's 6 absorbed events at idx 104,607-104,612 enabled the
104,607 → 105,128 advance (combined with subsequent wait-begin).
- wait-begin's single absorb at idx 104,622 enabled the 104,616 →
105,128 advance (without it, the absorber-chain stops there).
- shared-global's single absorb on tid=15→10 enabled that sister
chain's 2 → 16 advance.
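The deltas in the table are plain differences against the baseline matched-prefix length, which can be rechecked from the quoted numbers alone (variable names are illustrative):

```python
BASELINE_MAIN = 105_128   # main tid=6→1 matched, all absorbers ON
BASELINE_SISTER = 16      # sister 15→10 matched, all absorbers ON

# Main-chain matched prefix with one absorber disabled.
main_matched = {
    "shared-global": 105_128,
    "wait-begin": 104_616,
    "nested-cs": 104_607,
}
main_delta = {name: BASELINE_MAIN - m for name, m in main_matched.items()}

# Sister chain only moves when shared-global is disabled.
sister_delta = BASELINE_SISTER - 2
```

The two non-zero main-chain prefixes, 104,616 and 104,607, are the indices where the uncontended return and the first nested-cs event sit, which is why each delta lines up with exactly one absorber.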
## Cross-reference verdict
**None of the absorbed events reference a wedge-relevant handle.**
Specifically:
1. Nested-cs absorbs RtlEnter/Leave **API events** — no handles involved.
2. Wait-begin absorbs a CS-dispatcher Event used in CS contention.
The wedge Events are user-mode `NtCreateEvent` outputs from worker
tid=13: the same `object_type=1`, but a DIFFERENT origin than CS dispatchers.
3. Shared-global absorbs an XAudio SEMAPHORE — wedge handles are all
EVENT type.
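The verdict can be restated as a set-disjointness check over the identifiers quoted in this note. This is a coarse sanity sketch: raw ids from canary's kernel table are not directly comparable to ours-side handles, so the SID comparison is the stronger test, and only the one wedge SID listed above is included. The dict layout is an assumption:

```python
# Ours-side wedge handles from halt-on-deadlock-dump.txt (all EVENT).
WEDGE_HANDLES = {0x12D0, 0x1020, 0x1040, 0x10B0, 0x10EC, 0x12E4}
# Only one wedge SID is spelled out in the note; the rest are elided there.
KNOWN_WEDGE_SIDS = {"d5e23609d3948568"}

# Absorbed events that carry a handle (nested-cs events reference none).
ABSORBED = [
    {"absorber": "wait-begin", "raw_handle_id": 0xF8000044,
     "sid": "a25a16a4f6f547aa", "object_type": 1},
    {"absorber": "shared-global", "raw_handle_id": 0x828A3230,
     "sid": "ac8315b371bcf7cb", "object_type": 3},
]

overlap = [
    a for a in ABSORBED
    if a["raw_handle_id"] in WEDGE_HANDLES or a["sid"] in KNOWN_WEDGE_SIDS
]
```

An empty `overlap` is consistent with the verdict above, within the limits of the identifiers the note actually lists.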
## What the absorbers DO reveal indirectly
The wait-begin and nested-cs absorbers fire because canary's main
thread (tid=6) waits on a CS that ours never contends on. **Ours never
contends on it because the worker cluster (canary
tid=9/10/14/15/17/18) never runs** — they emit 17 and 77 events in
ours (vs 995k and 1.9M in canary) per Phase W ground truth.
The absorbers are therefore CORRECTLY treating the contention pattern
as scheduling jitter at the diff layer. The underlying root cause —
workers don't bootstrap — is what Phase W identified and is unchanged
by absorber behavior.
Even if we disabled all three absorbers, the surfaced divergences
would be:
- canary's main waiting on a CS dispatcher that ours doesn't create
(because the contending worker is absent), AND
- canary's main entering CS nested-cleanup branches because the
CS-protected registry has more entries (because workers inserted them).
Both are downstream effects of the same upstream "workers don't run"
root cause that Phase D's contention-replay (Stage 3/4) and quantum
(Stage 0) experiments already failed to unblock. No new signal-flow
gap is exposed.