Source changes (dormant parity infra, retained from iterate 2.AI/2.AO): - xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring - xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
102 lines
5.3 KiB
Markdown
# Absorber inventory (Phase absorber-review, 2026-05-19)

The diff tool currently lands three absorbers that cross reading-error #23
(matching genuinely different guest behavior at the diff layer). Each is
documented below: trigger, match heuristic, rationale, what is silenced.

The investigation goal is to determine whether any of them is hiding
signal flow that would explain the AUDIT-049 wedge (tid=13 blocked on
Event/Auto handle `0x12d0`, sister wedges `0x1020/0x1040/0x10A8/0x10E4/0x12B8`,
`sub_825070F0` worker spawner fires 0×).

## A) Shared-global `handle.create` floating absorb (Phase C+18)

* **File**: `diff_events.py::diff_one_tid`, branch guarded by
  `is_shared_global_handle_create` + `cross_tid_floating_sids`.
* **Trigger condition**: at a kind mismatch, exactly one side has
  `handle.create` whose SID is in the cross-tid floating set.
* **Match heuristic**: the SID equals the deterministic
  `shared_global_sid(pointer, object_type)` recipe (FNV-1a over marker
  `0xC01AB005`, pointer, object_type) OR appears across ≥2 distinct tids
  in either engine's stream (cross-tid usage heuristic).
* **Rationale**: process-global dispatcher objects (XAudio voice-volume
  semaphores, shared CSes, shared KEVENTs) get lazy-wrapped by whichever
  guest thread is the first toucher; that thread differs between cold
  runs. The SID recipe is scheduling-invariant so the diff can absorb
  the `handle.create` on the "wrong" tid.
* **What's silenced**: `handle.create` events for process-global
  dispatchers. Per-thread (`alloc_handle_for`/`AddHandle`) handle.create
  events are NOT silenced because their SID uses the per-(tid, idx)
  recipe.

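The match heuristic in A can be sketched as follows. This is a minimal illustration, not the code in `diff_events.py`: the 64-bit hash width, the little-endian `u32` packing of the three fields, the integer encoding of `object_type`, and the helper name `is_floating_sid` are all assumptions. Only the marker value and the "recipe OR ≥2 tids" rule come from the inventory.

```python
import struct

FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
SHARED_GLOBAL_MARKER = 0xC01AB005  # marker value quoted in the inventory


def fnv1a_64(data: bytes) -> int:
    """Plain 64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV64_OFFSET
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def shared_global_sid(pointer: int, object_type: int) -> int:
    # ASSUMPTION: three little-endian u32 fields hashed in this order.
    # The real recipe may pack the fields differently or use another width.
    packed = struct.pack("<III", SHARED_GLOBAL_MARKER, pointer, object_type)
    return fnv1a_64(packed)


def is_floating_sid(sid, recipe_sids, tids_by_sid):
    """Hypothetical helper: either branch of the heuristic is sufficient."""
    seen_on = tids_by_sid.get(sid, ())
    return sid in recipe_sids or len(seen_on) >= 2
```

Because the hash depends only on the guest pointer and the object type, the same dispatcher yields the same SID no matter which tid wraps it first, which is the scheduling-invariance property the rationale relies on.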
## B) Shared-global `wait.begin` floating absorb (Phase C+21)

* **File**: `diff_events.py::diff_one_tid`, branch guarded by
  `is_shared_global_wait_begin`.
* **Trigger condition**: at a kind mismatch, exactly one side has
  `wait.begin` whose `handles_semantic_ids` list includes at least one
  SID in the shared-global set.
* **Match heuristic**: any of the wait's handles matches the
  shared-global SID criterion above. For `wait_type=all`, ANY single
  shared-global handle is enough to classify the whole wait as
  floating (heuristic risk: a wait on one shared + multiple per-thread
  handles is fully absorbed).
* **Rationale**: contention on shared dispatchers is host-scheduler
  driven. One cold run may emit `wait.begin` (slow path) while another
  fast-paths past it without ever blocking. This is reading-error #32.
* **What's silenced**: `wait.begin` events that touch shared-global
  dispatchers. The associated `wait.end` (which has its own field
  skips per `SKIP_PAYLOAD_FIELDS_BY_KIND`) still aligns positionally.

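A rough sketch of the B classification, with a second helper that surfaces the stated heuristic risk. The event shape (a dict with `kind` and a `payload` carrying `handles_semantic_ids` and `wait_type`) is inferred from the field names above and is an assumption. Both function names are hypothetical, and `is_mixed_wait_all` is an audit aid that the inventory does not describe.

```python
def _wait_handles(event):
    """Return the wait's SID list, or an empty list if the field is absent."""
    return event.get("payload", {}).get("handles_semantic_ids", [])


def wait_begin_is_floating(event, shared_sids):
    """Documented rule: ANY shared-global handle floats the whole wait."""
    if event.get("kind") != "wait.begin":
        return False
    return any(sid in shared_sids for sid in _wait_handles(event))


def is_mixed_wait_all(event, shared_sids):
    """Hypothetical audit check: a wait_type=all wait that mixes shared and
    per-thread handles, i.e. the case the heuristic absorbs wholesale."""
    if event.get("kind") != "wait.begin":
        return False
    if event.get("payload", {}).get("wait_type") != "all":
        return False
    flags = [sid in shared_sids for sid in _wait_handles(event)]
    return any(flags) and not all(flags)
```

Running the second check over the absorbed `wait.begin` events would list the waits where per-thread handles were silenced along with a shared one, which are the candidates worth inspecting by hand.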
## C) Nested-CS-cleanup absorber, Phase D D-extension (v1.5)

* **File**: `diff_events.py::_try_absorb_nested_cs_cleanup`, invoked from
  `diff_one_tid`.
* **Trigger condition**: kind mismatch where canary has
  `import.call RtlEnterCriticalSection` while ours has
  `import.call RtlLeaveCriticalSection`. The pattern is exact: NO other
  kind-mismatch shape engages this absorber.
* **Match heuristic**: walks canary forward consuming balanced
  `[Enter-block(3), Leave-block(3)]` pairs (each pair = 6 events: import.call,
  kernel.call, kernel.return for Enter; same triple for Leave). Cap
  `_NESTED_CS_PAIR_CAP = 32`. After each pair, checks whether
  canary's next event has the SAME kind AND payload `name` as our
  current event; first convergence wins (greedy).
* **Rationale**: the 104,607 cap is a producer-throughput divergence:
  canary's preemptive host-OS scheduling lets a peer tid insert more
  work items into a CS-protected registry/tree during a notification-event
  wait window than our cooperative scheduler does. Canary then
  iterates `[E L]` cleanups over those entries; ours has fewer entries
  and fast-Leaves. Per Phase D forensics, this is a real guest-behavior
  divergence, not jitter.
* **What's silenced**: contiguous `[E L]` blocks on canary's side at
  the specific Enter-vs-Leave mismatch site (~+439 events at the
  104,607→105,046 advance per the D-extension memory).
* **Stated caveat**: this explicitly crosses reading-error #23. The
  band-aid was approved because fixing the underlying root cause requires
  preempting the cooperative scheduler (invalidates 23 phases of digest
  stability; out of scope per H' plan).

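The walk described under C can be sketched roughly as below. It is an illustration under assumptions, not the body of `_try_absorb_nested_cs_cleanup`: the event shape (`kind` plus `payload["name"]` on all three events of a block), the return convention (number of canary events to skip, or `None`), and the function name are all hypothetical. The trigger shape, the 3+3 block layout, the cap of 32, and the greedy first-convergence rule are taken from the bullets above.

```python
NESTED_CS_PAIR_CAP = 32
ENTER = "RtlEnterCriticalSection"
LEAVE = "RtlLeaveCriticalSection"
BLOCK_KINDS = ("import.call", "kernel.call", "kernel.return")


def _name(event):
    return event.get("payload", {}).get("name")


def _is_block(events, start, name):
    """True if events[start:start+3] is the import/kernel.call/return triple."""
    block = events[start:start + 3]
    return len(block) == 3 and all(
        ev["kind"] == kind and _name(ev) == name
        for ev, kind in zip(block, BLOCK_KINDS)
    )


def try_absorb_nested_cs(canary, ci, ours, oi):
    """Return how many canary events to skip, or None if the absorber declines."""
    target = ours[oi]
    head = canary[ci]
    # Exact trigger: canary at import.call Enter, ours at import.call Leave.
    if not (head["kind"] == "import.call" and _name(head) == ENTER
            and target["kind"] == "import.call" and _name(target) == LEAVE):
        return None
    pos = ci
    for _ in range(NESTED_CS_PAIR_CAP):
        # Each pair must be a complete Enter block followed by a Leave block.
        if not (_is_block(canary, pos, ENTER) and _is_block(canary, pos + 3, LEAVE)):
            return None
        pos += 6
        # Greedy: stop at the first point where canary matches our event.
        if (pos < len(canary) and canary[pos]["kind"] == target["kind"]
                and _name(canary[pos]) == _name(target)):
            return pos - ci
    return None
```

Under these assumptions, an absorb always skips a multiple of 6 canary events, so any reported skip count that is not a multiple of 6 would point to a block layout different from the one assumed here.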
## Cross-references for wedge hunt

Per Phase W ground truth, the unsignaled handles at deadlock are:

```
0x00001020 Event/Manual waiters=1 signals=0 waits=1 wakes=0
0x00001040 Event/Auto waiters=0 signals=0 waits=32 wakes=0
0x000010b0 Event/Auto waiters=0 signals=0 waits=7 wakes=0
0x000010ec Event/Manual waiters=1 signals=0 waits=2 wakes=0
0x000012d0 Event/Auto waiters=1 signals=0 waits=1 wakes=0 ← THE WEDGE
0x000012e4 Event/Auto waiters=1 signals=0 waits=1 wakes=0
```

Per the dossier caveat (AUDIT-049 era ID `0x1288` → Phase W ID `0x12d0`),
handle ID is allocator-ordinal-dependent and does NOT match across
engines. So we look up by **canary's analog handles** via the canary
event stream, i.e. any Event/Auto whose tid+site equals canary's
analog of our tid=13 `sub_821CB030+0x1B0` worker create call. Per
Phase W's table, canary tid=14/15 are the worker cluster (1.9M / 995K
events). If an absorbed event on canary is a worker-cluster
`handle.create`/`wait.begin` for an event-like object, that's
wedge-relevant.

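The final check could be expressed as a filter over whatever list of absorbed canary events the diff tool can emit. This is a sketch only: the `tid` field, the `payload["object_type"]` field, and its `Event/Auto` / `Event/Manual` string values are assumptions borrowed from the handle dump format above, and `wedge_relevant` is a hypothetical name.

```python
WORKER_TIDS = {14, 15}  # canary worker cluster per Phase W's table
WEDGE_KINDS = {"handle.create", "wait.begin"}
EVENT_LIKE = {"Event/Auto", "Event/Manual"}  # assumed object_type spellings


def wedge_relevant(absorbed_canary_events):
    """Keep absorbed canary events that could carry wedge signal flow."""
    hits = []
    for ev in absorbed_canary_events:
        if ev.get("tid") not in WORKER_TIDS:
            continue
        if ev.get("kind") not in WEDGE_KINDS:
            continue
        # ASSUMPTION: the payload names the dispatcher type directly.
        if ev.get("payload", {}).get("object_type") in EVENT_LIKE:
            hits.append(ev)
    return hits
```

An empty result would suggest that none of the three absorbers is hiding worker-cluster event traffic, while any hit names a concrete event to trace back to its absorber.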