handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
101
audit-runs/phase-absorber-review/absorber-inventory.md
Normal file
@@ -0,0 +1,101 @@
# Absorber inventory (Phase absorber-review, 2026-05-19)

The diff tool currently carries three absorbers that cross reading-error #23
(matching genuinely different guest behavior at the diff layer). Each is
documented below: trigger, match heuristic, rationale, what is silenced.

The investigation goal is to determine whether any of them is hiding
signal flow that would explain the AUDIT-049 wedge (tid=13 blocked on
Event/Auto handle `0x12d0`, sister wedges `0x1020/0x1040/0x10A8/0x10E4/0x12B8`,
`sub_825070F0` worker spawner fires 0×).

## A) Shared-global `handle.create` floating absorb (Phase C+18)

* **File**: `diff_events.py::diff_one_tid`, branch guarded by
  `is_shared_global_handle_create` + `cross_tid_floating_sids`.
* **Trigger condition**: at a kind mismatch, exactly one side has
  `handle.create` whose SID is in the cross-tid floating set.
* **Match heuristic**: the SID equals the deterministic
  `shared_global_sid(pointer, object_type)` recipe (FNV-1a over marker
  `0xC01AB005`, pointer, object_type) OR appears across ≥2 distinct tids
  in either engine's stream (cross-tid usage heuristic).
* **Rationale**: process-global dispatcher objects (XAudio voice-volume
  semaphores, shared CSes, shared KEVENTs) get lazy-wrapped by whichever
  guest thread is the first toucher, and which thread that is differs
  between cold runs. The SID recipe is scheduling-invariant, so the diff
  can absorb the `handle.create` on the "wrong" tid.
* **What's silenced**: `handle.create` events for process-global
  dispatchers. Per-thread (`alloc_handle_for`/`AddHandle`) `handle.create`
  events are NOT silenced because their SID uses the per-(tid, idx)
  recipe.
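
The match heuristic above can be sketched roughly as follows. This is a minimal illustration, not the code in `diff_events.py`: the 64-bit hash width, the little-endian field packing, the UTF-8 encoding of `object_type`, the event field names (`tid`, `semantic_id`), and the `*_sketch` function names are all assumptions. Only the marker value, the use of FNV-1a, and the "≥2 distinct tids" rule come from the notes.

```python
import struct
from collections import defaultdict

FNV64_OFFSET = 0xCBF29CE484222325  # standard 64-bit FNV-1a offset basis
FNV64_PRIME = 0x100000001B3        # standard 64-bit FNV prime
SHARED_GLOBAL_MARKER = 0xC01AB005  # marker value quoted in the notes


def fnv1a_64(data: bytes) -> int:
    """Plain 64-bit FNV-1a: xor each byte in, then multiply by the prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def shared_global_sid_sketch(pointer: int, object_type: str) -> int:
    """Hash (marker, pointer, object_type) into a scheduling-invariant SID.

    Assumed packing: marker and pointer as little-endian u64, followed by
    the object type name as UTF-8. The real recipe may pack differently.
    """
    payload = struct.pack("<QQ", SHARED_GLOBAL_MARKER, pointer)
    payload += object_type.encode("utf-8")
    return fnv1a_64(payload)


def cross_tid_floating_sids_sketch(events) -> set:
    """Collect SIDs that appear on two or more distinct tids.

    `events` is assumed to be an iterable of dicts with hypothetical
    `tid` and `semantic_id` keys.
    """
    tids_by_sid = defaultdict(set)
    for ev in events:
        sid = ev.get("semantic_id")
        if sid is not None:
            tids_by_sid[sid].add(ev["tid"])
    return {sid for sid, tids in tids_by_sid.items() if len(tids) >= 2}
```

Because the SID depends only on the guest pointer and object type, it comes out the same regardless of which tid happens to wrap the object first, which is the property the absorber relies on.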

## B) Shared-global `wait.begin` floating absorb (Phase C+21)

* **File**: `diff_events.py::diff_one_tid`, branch guarded by
  `is_shared_global_wait_begin`.
* **Trigger condition**: at a kind mismatch, exactly one side has
  `wait.begin` whose `handles_semantic_ids` list includes at least one
  SID in the shared-global set.
* **Match heuristic**: any of the wait's handles matches the
  shared-global SID criterion above. For `wait_type=all`, ANY single
  shared-global handle is enough to classify the whole wait as
  floating (heuristic risk: a wait on one shared + multiple per-thread
  handles is fully absorbed).
* **Rationale**: contention on shared dispatchers is host-scheduler-driven.
  One cold run may emit `wait.begin` (slow path) while another
  fast-paths past it without ever blocking (reading-error #32).
* **What's silenced**: `wait.begin` events that touch shared-global
  dispatchers. The associated `wait.end` (which has its own field
  skips per `SKIP_PAYLOAD_FIELDS_BY_KIND`) still aligns positionally.
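
To make the heuristic risk in the list above concrete, here is a small sketch of the "any shared-global handle makes the whole wait floating" rule, plus a helper that reports which per-thread handles get silenced along with it. The event layout (`kind`, `payload`) and all function names are hypothetical; only the `handles_semantic_ids` field name is taken from the notes.

```python
from typing import Iterable, List, Set, Tuple


def split_wait_handles(
    handle_sids: Iterable[int], shared_global_sids: Set[int]
) -> Tuple[List[int], List[int]]:
    """Partition a wait's handle SIDs into (shared_global, per_thread)."""
    shared: List[int] = []
    per_thread: List[int] = []
    for sid in handle_sids:
        (shared if sid in shared_global_sids else per_thread).append(sid)
    return shared, per_thread


def is_floating_wait_begin_sketch(event: dict, shared_global_sids: Set[int]) -> bool:
    """True if a wait.begin touches at least one shared-global SID.

    Assumes events are dicts shaped like
    {"kind": ..., "payload": {"handles_semantic_ids": [...]}}.
    """
    if event.get("kind") != "wait.begin":
        return False
    sids = event.get("payload", {}).get("handles_semantic_ids", [])
    shared, _ = split_wait_handles(sids, shared_global_sids)
    return bool(shared)


def swept_along_per_thread_sids(event: dict, shared_global_sids: Set[int]) -> List[int]:
    """Per-thread SIDs that would be silenced together with the shared one."""
    if not is_floating_wait_begin_sketch(event, shared_global_sids):
        return []
    sids = event["payload"]["handles_semantic_ids"]
    return split_wait_handles(sids, shared_global_sids)[1]
```

A non-empty result from `swept_along_per_thread_sids` marks exactly the waits where the absorber may be hiding per-thread signal flow, so logging it during a diff run would be one cheap way to audit this absorber.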

## C) Nested-CS-cleanup absorber, Phase D D-extension (v1.5)

* **File**: `diff_events.py::_try_absorb_nested_cs_cleanup`, invoked from
  `diff_one_tid`.
* **Trigger condition**: kind mismatch where canary has
  `import.call RtlEnterCriticalSection` while ours has
  `import.call RtlLeaveCriticalSection`. The pattern is exact: NO other
  kind-mismatch shape engages this absorber.
* **Match heuristic**: walks canary forward consuming balanced
  `[Enter-block(3), Leave-block(3)]` pairs (each pair = 6 events: import.call,
  kernel.call, kernel.return for Enter; same triple for Leave), capped at
  `_NESTED_CS_PAIR_CAP = 32` pairs. After each pair, checks whether
  canary's next event has the SAME kind AND payload `name` as the
  current event in ours; the first convergence wins (greedy).
* **Rationale**: the 104,607 cap is a producer-throughput divergence:
  canary's preemptive host-OS scheduling lets a peer tid insert more
  work items into a CS-protected registry/tree during a notification-event
  wait window than our cooperative scheduler does. Canary then
  iterates `[E L]` cleanups over those entries; ours has fewer entries
  and fast-Leaves. Per Phase D forensics, this is a real guest-behavior
  divergence, not jitter.
* **What's silenced**: contiguous `[E L]` blocks on canary's side at
  the specific Enter-vs-Leave mismatch site (~+439 events at the
  104,607→105,046 advance per the D-extension memory).
* **Stated caveat**: this explicitly crosses reading-error #23. The
  band-aid was approved because fixing the underlying root cause would
  require preempting the cooperative scheduler (which invalidates 23
  phases of digest stability and is out of scope per the H' plan).
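
The forward walk described in the match heuristic above might look something like the sketch below. It is an illustration under assumptions, not a copy of `_try_absorb_nested_cs_cleanup`: events are assumed to be dicts with `kind` and `name` keys, only the leading `import.call` of each triple is name-checked, and the return convention (number of canary events to skip, or `None`) is invented for the example.

```python
from typing import List, Optional

NESTED_CS_PAIR_CAP = 32  # mirrors _NESTED_CS_PAIR_CAP from the notes
BLOCK_KINDS = ("import.call", "kernel.call", "kernel.return")
ENTER = "RtlEnterCriticalSection"
LEAVE = "RtlLeaveCriticalSection"


def _is_import(ev: dict, name: str) -> bool:
    return ev.get("kind") == "import.call" and ev.get("name") == name


def _is_block(events: List[dict], start: int, name: str) -> bool:
    """True if events[start:start+3] is an import/call/return triple for `name`."""
    block = events[start:start + 3]
    if len(block) < 3:
        return False
    kinds_ok = all(ev.get("kind") == k for ev, k in zip(block, BLOCK_KINDS))
    return kinds_ok and block[0].get("name") == name


def try_absorb_nested_cs_cleanup_sketch(
    canary: List[dict], ci: int, ours: List[dict], oi: int
) -> Optional[int]:
    """Return how many canary events to skip, or None if the absorber does not apply."""
    if ci >= len(canary) or oi >= len(ours):
        return None
    # Exact trigger: canary is entering a CS while ours is leaving one.
    if not (_is_import(canary[ci], ENTER) and _is_import(ours[oi], LEAVE)):
        return None
    pos = ci
    for _ in range(NESTED_CS_PAIR_CAP):
        # Consume one balanced [Enter-block, Leave-block] pair (6 events).
        if not (_is_block(canary, pos, ENTER) and _is_block(canary, pos + 3, LEAVE)):
            return None
        pos += 6
        # Greedy: stop at the first point where canary lines up with ours again.
        if pos < len(canary):
            nxt, cur = canary[pos], ours[oi]
            if nxt.get("kind") == cur.get("kind") and nxt.get("name") == cur.get("name"):
                return pos - ci
    return None
```

Since the first convergence wins, the walk can stop earlier than a human reader would, and the 32-pair cap bounds how much canary-side activity a single absorb can hide (at most 192 events per invocation in this sketch).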

## Cross-references for wedge hunt

Per Phase W ground truth, the unsignaled handles at deadlock are:

```
0x00001020  Event/Manual  waiters=1  signals=0  waits=1   wakes=0
0x00001040  Event/Auto    waiters=0  signals=0  waits=32  wakes=0
0x000010b0  Event/Auto    waiters=0  signals=0  waits=7   wakes=0
0x000010ec  Event/Manual  waiters=1  signals=0  waits=2   wakes=0
0x000012d0  Event/Auto    waiters=1  signals=0  waits=1   wakes=0   ← THE WEDGE
0x000012e4  Event/Auto    waiters=1  signals=0  waits=1   wakes=0
```

Per the dossier caveat (AUDIT-049 era ID `0x1288` → Phase W ID `0x12d0`),
handle IDs are allocator-ordinal-dependent and do NOT match across
engines. So we look up **canary's analog handles** via the canary
event stream, i.e. any Event/Auto whose tid+site equals canary's
analog of the tid=13 `sub_821CB030+0x1B0` worker create call in ours. Per
Phase W's table, canary tid=14/15 are the worker cluster (1.9M / 995K
events). If an absorbed event on canary is a worker-cluster
`handle.create`/`wait.begin` for an event-like object, that's
wedge-relevant.
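
The relevance test in the last paragraph can be expressed as a simple filter over whatever the absorbers swallowed on the canary side. This is a sketch only: the `tid`, `kind`, and `object_type` keys are assumed event fields, and treating any type string starting with `Event` as "event-like" is a guess at what that phrase covers.

```python
from typing import Iterable, List

CANARY_WORKER_TIDS = {14, 15}                  # worker cluster per Phase W's table
RELEVANT_KINDS = {"handle.create", "wait.begin"}
EVENT_LIKE_PREFIX = "Event"                    # assumed to cover Event/Auto and Event/Manual


def wedge_relevant_absorbed(absorbed_canary_events: Iterable[dict]) -> List[dict]:
    """Keep absorbed canary events that could bear on the wedge.

    An event qualifies if it ran on a worker-cluster tid, is a
    handle.create or wait.begin, and targets an event-like object.
    """
    hits = []
    for ev in absorbed_canary_events:
        if ev.get("tid") not in CANARY_WORKER_TIDS:
            continue
        if ev.get("kind") not in RELEVANT_KINDS:
            continue
        if not str(ev.get("object_type", "")).startswith(EVENT_LIKE_PREFIX):
            continue
        hits.append(ev)
    return hits
```

An empty result would suggest that the three absorbers are not hiding worker-cluster event traffic; any hit is a candidate to compare against the tid=13 create site by hand.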