handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
132
audit-runs/phase-absorber-review/cross-reference.md
Normal file
@@ -0,0 +1,132 @@
# Cross-reference: absorbed events vs Phase W wedge ground truth

Baseline diff: Phase W cold canary `canary-wedge.head250k.jsonl` (56 MB,
head-truncated to ~250k canary tid=6 events) vs ours
`ours-postfix.jsonl` (28 MB). Default tid-map (canary=ours):
`6=1,4=11,7=2,12=7,14=9,15=10`.
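
A minimal sketch of how the tid-map string can be read, assuming each `a=b` pair maps a canary tid to the matching ours tid (consistent with the `6→1` main-thread pairing used later in these notes). The helper name `parse_tid_map` is hypothetical and is not taken from the diff tooling.

```python
def parse_tid_map(spec: str) -> dict[int, int]:
    """Parse 'canary=ours' pairs such as '6=1,4=11' into {canary_tid: ours_tid}."""
    mapping: dict[int, int] = {}
    for pair in spec.split(","):
        canary_tid, ours_tid = pair.split("=")
        mapping[int(canary_tid)] = int(ours_tid)
    return mapping


# Default map quoted in the baseline description (assumed canary -> ours).
DEFAULT_TID_MAP = parse_tid_map("6=1,4=11,7=2,12=7,14=9,15=10")
```

With this reading, `DEFAULT_TID_MAP[6]` is the ours tid paired with canary's main thread, and `DEFAULT_TID_MAP[15]` is the sister pairing.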

## Total absorbed events: 8

| absorber | side | events |
|---|---|---|
| shared-global | ours | 1 |
| wait-begin | canary | 1 |
| nested-cs | canary | 6 (1 [E,L] pair: Enter + Leave, 3 events each) |

## Per-absorbed-event analysis

### Nested-cs (6 events, canary tid=6 idx 104607–104612)

Pure `RtlEnterCriticalSection`+`RtlLeaveCriticalSection`
import.call/kernel.call/kernel.return triples on canary's side. Return
values are all `0x00000000` (uncontended fast path). **No handles
referenced.** Not wedge-relevant by construction: these are CS API calls,
not signal-flow events.
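
The count of six can be checked with a short sketch, assuming each call contributes exactly the three event kinds named `import.call`, `kernel.call`, and `kernel.return` and that the index range is inclusive. The constant names are illustrative only.

```python
# Event kinds assumed to be emitted per kernel import call.
EVENTS_PER_CALL = ("import.call", "kernel.call", "kernel.return")
# One [E,L] pair: an Enter call followed by a Leave call.
CALLS = ("RtlEnterCriticalSection", "RtlLeaveCriticalSection")

first_idx, last_idx = 104607, 104612  # inclusive canary tid=6 index range
events_in_range = last_idx - first_idx + 1
events_expected = len(CALLS) * len(EVENTS_PER_CALL)
```

Both quantities come out equal, which is why one Enter/Leave pair accounts for all six absorbed nested-cs events.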

### Wait-begin (1 event, canary tid=6 idx 104622)

```
SID: a25a16a4f6f547aa (object_type=1 EVENT)
raw_handle_id: 0xf8000044 (canary kernel-table slot)
created at: canary tid=10 idx=843 (worker thread)
used by: 108 wait.begin events across canary tids 6, 9, 10, 17, 18
```

**Embedded inside an `RtlEnterCriticalSection` block** (idx 104620
import.call → 104621 kernel.call → 104622 wait.begin → 104623 kernel.return).
This is canary's CS slow path: the CS was contended, so the wait.begin
fired on the **CS dispatcher Event**. Object_type=1 (EVENT) is the Xbox
kernel's representation of the CS's owned-by-other-thread dispatcher;
it is NOT a user-mode `NtCreateEvent`-created Event.

The Event is created on worker tid=10 because in canary the worker did
run and contend on this CS. In ours the workers don't run, so the CS is
never contended; ours takes the fast path (uncontended kernel.return at
idx 104616 with status 0).

**Wedge handles in ours** (per `halt-on-deadlock-dump.txt`) are
`0x12d0`, `0x1020`, `0x1040`, `0x10b0`, `0x10ec`, `0x12e4`. All are
`object_type=1 EVENT`, but all were created via `NtCreateEvent`
at `LR=0x824a9f6c` from worker tid=13. They are worker-LOCAL Events
(SIDs `d5e23609d3948568` etc., computed from ours' per-tid recipe),
NOT the shared CS dispatcher Event `a25a16a4f6f547aa`.

**Verdict**: the absorber is correctly suppressing CS-contention scheduling
jitter, not wedge signal flow. The Event canary waits on is the
CS dispatcher proxy, never the user-mode worker-private Events.

### Shared-global (1 event, ours tid=10 idx 2)

```
SID: ac8315b371bcf7cb (object_type=3 SEMAPHORE)
raw_handle_id: 0x828a3230 (guest VA, well-known XAudio voice-volume
               semaphore, documented in C+18)
```

Ours emits `handle.create` for this Semaphore at idx 2 because
`ensure_dispatcher_object` synthesizes the shadow KernelObject at
first touch (Phase C+17). Canary doesn't emit a corresponding
`handle.create` on the same tid pair because the canary first toucher
was a different host thread. This is the classic process-global
first-toucher race that C+18 was specifically designed for.

**Wedge handles are all EVENT (object_type=1).** This is a SEMAPHORE
(type=3): a different object class on a different code path (XAudio voice
volume). Not wedge-relevant.

**Verdict**: the absorber is correctly suppressing a first-toucher race for
a shared XAudio dispatcher, not wedge signal flow.

## Selective-disable matched-prefix deltas

Baseline (all absorbers ON): main tid=6→1 matched=**105,128**.

| disabled absorber | main 6→1 matched | delta vs baseline | sister 15→10 matched |
|---|---|---|---|
| (none, baseline) | 105,128 | 0 | 16 |
| shared-global | 105,128 | 0 | 2 (−14) |
| wait-begin | 104,616 | −512 | 16 |
| nested-cs | 104,607 | −521 | 16 |

The delta pattern matches the absorbed events exactly:
- nested-cs's 6 absorbed events at idx 104,607–104,612 enabled the
  104,607 → 105,128 advance (combined with the subsequent wait-begin).
- wait-begin's single absorb at idx 104,622 enabled the 104,616 →
  105,128 advance (without it, the absorber chain stops there).
- shared-global's single absorb on tid=15→10 enabled that sister
  chain's 2 → 16 advance.
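
The deltas can be reproduced from the matched-prefix counts with a few lines of arithmetic. This is only a sketch: the numbers are copied from the table, and the variable names are illustrative rather than anything used by the diff tool.

```python
BASELINE_MAIN = 105_128   # main tid 6→1 matched prefix, all absorbers on
BASELINE_SISTER = 16      # sister tid 15→10 matched prefix, all absorbers on

# disabled absorber -> (main matched, sister matched), copied from the table
runs = {
    "shared-global": (105_128, 2),
    "wait-begin": (104_616, 16),
    "nested-cs": (104_607, 16),
}

# Delta is the run's matched prefix minus the all-absorbers-on baseline.
deltas = {
    name: (main - BASELINE_MAIN, sister - BASELINE_SISTER)
    for name, (main, sister) in runs.items()
}

for name, (d_main, d_sister) in deltas.items():
    print(f"{name:14s} main {d_main:+5d}  sister {d_sister:+4d}")
```

Each absorber moves exactly one chain: shared-global only affects the sister pair, while wait-begin and nested-cs only affect the main pair.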

## Cross-reference verdict

**None of the absorbed events reference a wedge-relevant handle.**

Specifically:
1. Nested-cs absorbs RtlEnter/Leave **API events**; no handles are involved.
2. Wait-begin absorbs a CS-dispatcher Event used in CS contention.
   The wedge Events are user-mode `NtCreateEvent` outputs from worker
   tid=13. Both report object_type=1, but they have a DIFFERENT origin
   from CS dispatchers.
3. Shared-global absorbs an XAudio SEMAPHORE; wedge handles are all
   EVENT type.
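
The three checks above amount to one predicate applied to each absorbed event. The sketch below restates it under stated assumptions: the `origin` labels are hypothetical tags for how each object came to exist, not fields present in the trace, and `is_wedge_relevant` is an illustrative name.

```python
from dataclasses import dataclass
from typing import Optional

EVENT, SEMAPHORE = 1, 3  # object_type codes as reported in these notes


@dataclass(frozen=True)
class Absorbed:
    absorber: str
    object_type: Optional[int]  # None when the event references no handle
    origin: Optional[str]       # hypothetical label for how the object was created


def is_wedge_relevant(ev: Absorbed) -> bool:
    # Wedge handles are EVENT objects created via NtCreateEvent.
    return ev.object_type == EVENT and ev.origin == "NtCreateEvent"


absorbed = [
    Absorbed("nested-cs", None, None),
    Absorbed("wait-begin", EVENT, "cs-dispatcher"),
    Absorbed("shared-global", SEMAPHORE, "ensure_dispatcher_object"),
]

# Names of absorbers whose absorbed event would count as wedge-relevant.
relevant = [ev.absorber for ev in absorbed if is_wedge_relevant(ev)]
```

Wait-begin is the only case that shares the EVENT type with the wedge handles, so it is the origin check that excludes it.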

## What the absorbers DO reveal indirectly

The wait-begin and nested-cs absorbers fire because canary's main
thread (tid=6) waits on a CS that ours never contends on. **The reason
ours never contends on it is that the worker cluster (canary
tid=9/10/14/15/17/18) effectively never runs**: they emit 17 and 77
events in ours (vs 995k and 1.9M in canary) per Phase W ground truth.

The absorbers are therefore CORRECTLY treating the contention pattern
as scheduling jitter at the diff layer. The underlying root cause,
that workers don't bootstrap, is what Phase W identified and is
unchanged by absorber behavior.

Even if we disabled all three absorbers, the surfaced divergences
would be:
- canary's main waiting on a CS dispatcher that ours doesn't create
  (because the contending worker is absent), AND
- canary's main entering CS nested-cleanup branches because the
  CS-protected registry has more entries (because workers inserted them).

Both are downstream effects of the same upstream "workers don't run"
root cause that Phase D's contention-replay (Stage 3/4) and quantum
(Stage 0) experiments already failed to unblock. No new signal-flow
gap is exposed.