xenia-rs/audit-runs/phase-absorber-review/cross-reference.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Cross-reference — absorbed events vs Phase W wedge ground truth
Baseline diff: Phase W cold canary `canary-wedge.head250k.jsonl` (56 MB,
head-truncated to ~250k canary tid=6 events) vs ours
`ours-postfix.jsonl` (28 MB). Default tid-map `6=1,4=11,7=2,12=7,14=9,15=10`.
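
The tid-map string above pairs each canary thread id with its counterpart in ours. A minimal sketch of how such a string could be parsed, assuming the `canary=ours` direction implied by "main tid=6→1" later in this note; `parse_tid_map` is a hypothetical helper, not a function from the diff tool:

```python
def parse_tid_map(spec: str) -> dict[int, int]:
    """Parse 'canary=ours' pairs such as '6=1,4=11' into {canary_tid: ours_tid}."""
    mapping: dict[int, int] = {}
    for pair in spec.split(","):
        canary_text, ours_text = pair.strip().split("=")
        canary_tid = int(canary_text)
        if canary_tid in mapping:
            # A canary thread can only be paired with one thread in ours.
            raise ValueError(f"duplicate canary tid {canary_tid}")
        mapping[canary_tid] = int(ours_text)
    return mapping


DEFAULT_TID_MAP = parse_tid_map("6=1,4=11,7=2,12=7,14=9,15=10")
print(DEFAULT_TID_MAP[6])  # 1 (canary main tid=6 pairs with ours tid=1)
```
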
## Total absorbed events: 8
| absorber | side | events |
|---|---|---|
| shared-global | ours | 1 |
| wait-begin | canary | 1 |
| nested-cs | canary | 6 (1 [E,L] pair, 3 events per call = 6 events) |
## Per-absorbed-event analysis
### Nested-cs (6 events, canary tid=6 idx 104607–104612)
Pure `RtlEnterCriticalSection`+`RtlLeaveCriticalSection` import/kernel.call/
kernel.return triples on canary's side. Return values all `0x00000000`
(uncontended fast-path). **No handles referenced.** Not wedge-relevant by
construction — these are CS API calls, not signal-flow events.
### Wait-begin (1 event, canary tid=6 idx 104622)
```
SID: a25a16a4f6f547aa (object_type=1 EVENT)
raw_handle_id: 0xf8000044 (canary kernel-table slot)
created at: canary tid=10 idx=843 (worker thread)
used by: 108 wait.begin events across canary tids 6, 9, 10, 17, 18
```
**Embedded inside an `RtlEnterCriticalSection` block** (idx 104620
import.call → 104621 kernel.call → 104622 wait.begin → 104623 kernel.return).
This is canary's CS slow path: the CS was contended, so the wait.begin
fired on the **CS dispatcher Event**. Object_type=1 (EVENT) is the Xbox
kernel's representation of the CS's owned-by-other-thread dispatcher;
it is NOT a user-mode `NtCreateEvent`-created Event.

The Event is created on worker tid=10 because in canary the worker did
run and contend on this CS. In ours the workers don't run, so the CS is
never contended; ours takes the fast path (uncontended kernel.return at
idx 104616 with status 0).
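
The slow-path/fast-path distinction above comes down to whether a `wait.begin` sits between the call and its return. A minimal sketch of that classification, assuming each trace event is a dict with a `kind` field (the field name and the `classify_cs_call` helper are hypothetical, not taken from the trace schema):

```python
def classify_cs_call(block):
    """Classify one import.call .. kernel.return block of a CS API call."""
    kinds = [event["kind"] for event in block]
    if len(kinds) < 3 or kinds[0] != "import.call" or kinds[-1] != "kernel.return":
        raise ValueError("not a complete import.call .. kernel.return block")
    # A wait.begin between call and return means the CS was contended.
    return "slow-path" if "wait.begin" in kinds[1:-1] else "fast-path"


# Canary's contended block, using the idx values quoted above.
canary_block = [
    {"idx": 104620, "kind": "import.call"},
    {"idx": 104621, "kind": "kernel.call"},
    {"idx": 104622, "kind": "wait.begin"},
    {"idx": 104623, "kind": "kernel.return"},
]
# An uncontended triple (idx values omitted; only the shape matters here).
uncontended_block = [
    {"kind": "import.call"},
    {"kind": "kernel.call"},
    {"kind": "kernel.return"},
]

print(classify_cs_call(canary_block))       # slow-path
print(classify_cs_call(uncontended_block))  # fast-path
```
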

**Wedge handles in ours** (per `halt-on-deadlock-dump.txt`) are:
`0x12d0`, `0x1020`, `0x1040`, `0x10b0`, `0x10ec`, `0x12e4`. All are
`object_type=1 EVENT`, but all were created via `NtCreateEvent`
at `LR=0x824a9f6c` from worker tid=13. They're worker-LOCAL Events
(SIDs `d5e23609d3948568` etc., computed from ours's per-tid recipe),
NOT the shared CS dispatcher Event `a25a16a4f6f547aa`.

**Verdict**: absorber is correctly suppressing CS-contention scheduling
jitter, not wedge signal flow. The Event canary waits on is the
CS dispatcher proxy, never the user-mode worker-private Events.
### Shared-global (1 event, ours tid=10 idx 2)
```
SID: ac8315b371bcf7cb (object_type=3 SEMAPHORE)
raw_handle_id: 0x828a3230 (guest VA — well-known XAudio voice-volume
semaphore, documented in C+18)
```
Ours emits `handle.create` for this Semaphore at idx 2 because
`ensure_dispatcher_object` synthesizes the shadow KernelObject at
first touch (Phase C+17). Canary doesn't emit a corresponding
`handle.create` on the same tid pair because the canary first toucher
was a different host thread: the classic process-global first-toucher
race that C+18 was specifically designed for.

**Wedge handles are all EVENT (object_type=1).** This is a SEMAPHORE
(type=3). Different object class, different code path (XAudio voice
volume). Not wedge-relevant.

**Verdict**: absorber is correctly suppressing first-toucher race for
a shared XAudio dispatcher, not wedge signal flow.
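
The first-toucher race described above can be modelled in a few lines. This is a toy sketch, not the real `ensure_dispatcher_object`: the `ShadowObjectTable` class, its fields, and tid 3 are assumptions made for illustration. It only shows why the `handle.create` event lands on whichever thread touches the guest VA first:

```python
XAUDIO_SEMAPHORE_VA = 0x828A3230  # guest VA quoted above
SEMAPHORE = 3                     # object_type quoted above


class ShadowObjectTable:
    """Toy model: a shadow object is created lazily on first touch."""

    def __init__(self):
        self.objects = {}
        self.trace = []

    def ensure_dispatcher_object(self, tid, guest_va, object_type):
        if guest_va not in self.objects:
            # Only the first toucher emits handle.create.
            self.objects[guest_va] = {"object_type": object_type}
            self.trace.append({"tid": tid, "kind": "handle.create", "handle": guest_va})
        return self.objects[guest_va]


def creators(touch_order):
    """Return the tids that emitted handle.create for a given touch order."""
    table = ShadowObjectTable()
    for tid in touch_order:
        table.ensure_dispatcher_object(tid, XAUDIO_SEMAPHORE_VA, SEMAPHORE)
    return [event["tid"] for event in table.trace]


# Same touches, different scheduling: the create event moves between threads.
print(creators([10, 3]))  # [10]
print(creators([3, 10]))  # [3]
```

Because the event moves with scheduling order rather than with program logic, a per-thread diff sees it as a spurious mismatch on one side.
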
## Selective-disable matched-prefix deltas
Baseline (all absorbers ON): main tid=6→1 matched=**105,128**.

| disabled absorber | main matched | delta | sister 15→10 matched (delta) |
|---|---|---|---|
| (none, baseline) | 105,128 | 0 | 16 |
| shared-global | 105,128 | 0 | 2 (-14) |
| wait-begin | 104,616 | -512 | 16 |
| nested-cs | 104,607 | -521 | 16 |

The delta pattern matches the absorbed events exactly:

- nested-cs's 6 absorbed events at idx 104,607–104,612 enabled the
  104,607 → 105,128 advance (combined with subsequent wait-begin).
- wait-begin's single absorb at idx 104,622 enabled the 104,616 →
  105,128 advance (without it, the absorber-chain stops there).
- shared-global's single absorb on tid=15→10 enabled that sister
  chain's 2 → 16 advance.
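
The deltas can be rechecked directly from the matched-prefix counts in the table. A small sketch, with dictionary names chosen here for illustration:

```python
BASELINE = {"main": 105_128, "sister": 16}

# Matched-prefix lengths with one absorber disabled, copied from the table.
DISABLED_RUNS = {
    "shared-global": {"main": 105_128, "sister": 2},
    "wait-begin": {"main": 104_616, "sister": 16},
    "nested-cs": {"main": 104_607, "sister": 16},
}


def deltas(run):
    """Change in matched-prefix length relative to the all-absorbers-ON baseline."""
    return {chain: run[chain] - BASELINE[chain] for chain in BASELINE}


for name, run in DISABLED_RUNS.items():
    print(name, deltas(run))
# shared-global {'main': 0, 'sister': -14}
# wait-begin {'main': -512, 'sister': 0}
# nested-cs {'main': -521, 'sister': 0}
```
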
## Cross-reference verdict
**None of the absorbed events reference a wedge-relevant handle.**
Specifically:
1. Nested-cs absorbs RtlEnter/Leave **API events** — no handles involved.
2. Wait-begin absorbs a CS-dispatcher Event used in CS contention.
   The wedge Events are user-mode `NtCreateEvent` outputs from worker
   tid=13: the same `object_type=1`, but a DIFFERENT origin than CS dispatchers.
3. Shared-global absorbs an XAudio SEMAPHORE — wedge handles are all
EVENT type.
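
The three-point verdict above can be restated as a mechanical check. In this sketch the record layout and the `origin` labels are hypothetical; the SIDs and object types are the ones quoted in this note, and only the one wedge SID the note spells out is listed (the rest are elided in the source):

```python
EVENT, SEMAPHORE = 1, 3

# The one wedge SID spelled out in this note; the others are elided there.
KNOWN_WEDGE_SIDS = {"d5e23609d3948568"}

# One record per absorber; nested-cs references no handle at all.
absorbed = [
    {"absorber": "nested-cs", "sid": None, "object_type": None, "origin": None},
    {"absorber": "wait-begin", "sid": "a25a16a4f6f547aa",
     "object_type": EVENT, "origin": "cs-dispatcher"},
    {"absorber": "shared-global", "sid": "ac8315b371bcf7cb",
     "object_type": SEMAPHORE, "origin": "shadow-first-touch"},
]


def is_wedge_relevant(record):
    """Wedge handles are EVENTs created by user-mode NtCreateEvent."""
    if record["sid"] in KNOWN_WEDGE_SIDS:
        return True
    return record["object_type"] == EVENT and record["origin"] == "NtCreateEvent"


relevant = [r["absorber"] for r in absorbed if is_wedge_relevant(r)]
print(relevant)  # []
```
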
## What the absorbers DO reveal indirectly
The wait-begin and nested-cs absorbers fire because canary's main
thread (tid=6) waits on a CS that ours never contends on. **The reason
ours never contends on it is that the worker cluster (canary
tid=9/10/14/15/17/18) never runs**: they emit only 17 and 77 events in
ours (vs 995k and 1.9M in canary) per Phase W ground truth.

The absorbers are therefore CORRECTLY treating the contention pattern
as scheduling jitter at the diff layer. The underlying root cause,
workers that don't bootstrap, is what Phase W identified and is unchanged
by absorber behavior.

Even if we disabled all three absorbers, the surfaced divergences
would be:

- canary's main waiting on a CS dispatcher that ours doesn't create
  (because the contending worker is absent), AND
- canary's main entering CS nested-cleanup branches because the
  CS-protected registry has more entries (because workers inserted them).

Both are downstream effects of the same upstream "workers don't run"
root cause that Phase D's contention-replay (Stage 3/4) and quantum
(Stage 0) experiments already failed to unblock. No new signal-flow
gap is exposed.