xenia-rs/audit-runs/phase-absorber-review/absorber-inventory.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Absorber inventory (Phase absorber-review, 2026-05-19)
The diff tool currently lands three absorbers that cross reading-error #23
(matching genuinely different guest behavior at the diff layer). Each is
documented below — trigger, match heuristic, rationale, what is silenced.
The investigation goal is to determine whether any of them is hiding
signal flow that would explain the AUDIT-049 wedge (tid=13 blocked on
Event/Auto handle `0x12d0`, sister wedges `0x1020/0x1040/0x10A8/0x10E4/0x12B8`,
`sub_825070F0` worker spawner fires 0×).
## A) Shared-global `handle.create` floating absorb (Phase C+18)
* **File**: `diff_events.py::diff_one_tid`, branch guarded by
`is_shared_global_handle_create` + `cross_tid_floating_sids`.
* **Trigger condition**: at a kind mismatch, exactly one side has
`handle.create` whose SID is in the cross-tid floating set.
* **Match heuristic**: the SID equals the deterministic
`shared_global_sid(pointer, object_type)` recipe (FNV-1a over marker
`0xC01AB005`, pointer, object_type) OR appears across ≥2 distinct tids
in either engine's stream (cross-tid usage heuristic).
* **Rationale**: process-global dispatcher objects (XAudio voice-volume
semaphores, shared CSes, shared KEVENTs) get lazy-wrapped by whichever
guest thread is the first toucher; that thread differs between cold
runs. The SID recipe is scheduling-invariant so the diff can absorb
the `handle.create` on the "wrong" tid.
* **What's silenced**: `handle.create` events for process-global
dispatchers. Per-thread (`alloc_handle_for`/`AddHandle`) handle.create
events are NOT silenced because their SID uses the per-(tid, idx)
recipe.
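
The two match criteria above can be sketched as follows. This is a minimal illustration, not the code in `diff_events.py`: the 64-bit FNV-1a width, the little-endian `u32` packing of marker/pointer/object_type, and the event dict shape (`tid`, `sid` keys) are all assumptions made for the example; only the marker value and the "≥2 distinct tids" rule come from the text.

```python
import struct

FNV64_OFFSET = 0xCBF29CE484222325  # standard FNV-1a 64-bit offset basis
FNV64_PRIME = 0x100000001B3        # standard FNV-1a 64-bit prime
SHARED_GLOBAL_MARKER = 0xC01AB005  # marker quoted in the inventory


def fnv1a_64(data: bytes) -> int:
    """Plain FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV64_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def shared_global_sid(pointer: int, object_type: int) -> int:
    """Scheduling-invariant SID: depends only on the guest pointer and type.

    ASSUMPTION: three little-endian u32 fields; the real byte layout and
    hash width are not stated in the inventory.
    """
    packed = struct.pack("<III", SHARED_GLOBAL_MARKER, pointer, object_type)
    return fnv1a_64(packed)


def cross_tid_floating_sids(events) -> set:
    """SIDs seen on two or more distinct tids in one engine's stream.

    ASSUMPTION: each event is a dict with 'tid' and an optional 'sid'.
    """
    tids_by_sid = {}
    for ev in events:
        sid = ev.get("sid")
        if sid is not None:
            tids_by_sid.setdefault(sid, set()).add(ev["tid"])
    return {sid for sid, tids in tids_by_sid.items() if len(tids) >= 2}
```

Because the SID is derived from the pointer and type alone, whichever tid happens to lazy-wrap the object first produces the same SID, which is what lets the diff treat the `handle.create` as floating.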
## B) Shared-global `wait.begin` floating absorb (Phase C+21)
* **File**: `diff_events.py::diff_one_tid`, branch guarded by
`is_shared_global_wait_begin`.
* **Trigger condition**: at a kind mismatch, exactly one side has
`wait.begin` whose `handles_semantic_ids` list includes at least one
SID in the shared-global set.
* **Match heuristic**: any of the wait's handles matches the
shared-global SID criterion above. For `wait_type=all`, ANY single
shared-global handle is enough to classify the whole wait as
floating (heuristic risk: a wait on one shared + multiple per-thread
handles is fully absorbed).
* **Rationale**: contention on shared dispatchers is host-scheduler
driven. One cold run may emit `wait.begin` (slow path) while another
fast-paths past it without ever blocking. Reading-error #32.
* **What's silenced**: `wait.begin` events that touch shared-global
dispatchers. The associated `wait.end` (which has its own field
skips per `SKIP_PAYLOAD_FIELDS_BY_KIND`) still aligns positionally.
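
A minimal sketch of the classification described above, including a helper that surfaces the stated heuristic risk. The event shape (`kind`, `payload.handles_semantic_ids`) and the helper name `per_thread_handles_swept` are hypothetical; the real guard in `diff_events.py` may differ.

```python
def is_shared_global_wait_begin(event, shared_sids) -> bool:
    """True if a wait.begin touches at least one shared-global SID.

    ASSUMPTION: events are dicts with 'kind' and a 'payload' dict that
    carries 'handles_semantic_ids'. Note that wait_type is deliberately
    ignored: one shared handle floats the whole wait, as in the text.
    """
    if event.get("kind") != "wait.begin":
        return False
    handles = event.get("payload", {}).get("handles_semantic_ids", [])
    return any(sid in shared_sids for sid in handles)


def per_thread_handles_swept(event, shared_sids) -> list:
    """Hypothetical audit helper: per-thread SIDs silenced along with the wait.

    A non-empty result marks exactly the risky case called out above
    (one shared handle plus per-thread handles, fully absorbed).
    """
    if not is_shared_global_wait_begin(event, shared_sids):
        return []
    handles = event["payload"]["handles_semantic_ids"]
    return [sid for sid in handles if sid not in shared_sids]
```

Logging the second helper's output for every absorbed wait would be one way to check whether absorber B is swallowing per-thread handles relevant to the wedge.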
## C) Nested-CS-cleanup absorber, Phase D D-extension (v1.5)
* **File**: `diff_events.py::_try_absorb_nested_cs_cleanup`, invoked from
`diff_one_tid`.
* **Trigger condition**: kind mismatch where canary has
`import.call RtlEnterCriticalSection` while ours has
`import.call RtlLeaveCriticalSection`. Pattern is exact — NO other
kind-mismatch shape engages this absorber.
* **Match heuristic**: walks canary forward consuming balanced
`[Enter-block(3), Leave-block(3)]` pairs (each pair = 6 events: import.call,
kernel.call, kernel.return for Enter; same triple for Leave). Cap
`_NESTED_CS_PAIR_CAP = 32`. After each pair, checks whether
canary's next event has the SAME kind AND payload `name` as ours's
current event — first convergence wins (greedy).
* **Rationale**: the 104,607 cap is a producer-throughput divergence:
canary's preemptive host-OS scheduling lets a peer tid insert more
work items into a CS-protected registry/tree during a notification-event
wait window than ours's cooperative scheduler does. Canary then
iterates `[E L]` cleanups over those entries; ours has fewer entries
and fast-Leaves. Per Phase D forensics, this is a real guest-behavior
divergence, not jitter.
* **What's silenced**: contiguous `[E L]` blocks on canary's side at
the specific Enter-vs-Leave mismatch site (~+439 events at the
104,607→105,046 advance per the D-extension memory).
* **Stated caveat**: this explicitly crosses reading-error #23. The
band-aid was approved because the underlying root cause requires
preempting the cooperative scheduler (invalidates 23 phases of digest
stability; out of scope per H' plan).
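
The forward walk described in the match heuristic can be sketched like this. It is an illustration under assumptions, not the body of `_try_absorb_nested_cs_cleanup`: the event dict shape, the idea that all three events of a block carry the same payload `name`, and the return convention (number of canary events to skip, or `None`) are made up for the example.

```python
_NESTED_CS_PAIR_CAP = 32
_BLOCK_KINDS = ("import.call", "kernel.call", "kernel.return")
_ENTER = "RtlEnterCriticalSection"
_LEAVE = "RtlLeaveCriticalSection"


def _name(ev):
    # ASSUMPTION: the import name lives at payload['name'].
    return ev.get("payload", {}).get("name")


def _is_block(events, start, name) -> bool:
    """True if events[start:start+3] is the 3-event triple for `name`."""
    block = events[start:start + 3]
    if len(block) < 3:
        return False
    return all(ev["kind"] == kind and _name(ev) == name
               for ev, kind in zip(block, _BLOCK_KINDS))


def try_absorb_nested_cs_cleanup(canary, ci, ours, oi):
    """Return how many canary events to skip, or None if the shape is wrong."""
    c_ev, o_ev = canary[ci], ours[oi]
    # Exact trigger: canary at import.call Enter, ours at import.call Leave.
    if not (c_ev["kind"] == "import.call" and _name(c_ev) == _ENTER
            and o_ev["kind"] == "import.call" and _name(o_ev) == _LEAVE):
        return None
    pos = ci
    for _ in range(_NESTED_CS_PAIR_CAP):
        # Consume one balanced [Enter-block(3), Leave-block(3)] pair.
        if not (_is_block(canary, pos, _ENTER)
                and _is_block(canary, pos + 3, _LEAVE)):
            return None
        pos += 6
        # Greedy: the first point where canary converges on ours wins.
        if (pos < len(canary) and canary[pos]["kind"] == o_ev["kind"]
                and _name(canary[pos]) == _name(o_ev)):
            return pos - ci
    return None  # cap reached without convergence
```

For example, a canary stream of two `[E L]` pairs followed by a Leave block, diffed against ours sitting at that Leave, would skip 12 canary events under this sketch.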
## Cross-references for wedge hunt
Per Phase W ground truth, the unsignaled handles at deadlock are:
```
0x00001020 Event/Manual waiters=1 signals=0 waits=1 wakes=0
0x00001040 Event/Auto waiters=0 signals=0 waits=32 wakes=0
0x000010b0 Event/Auto waiters=0 signals=0 waits=7 wakes=0
0x000010ec Event/Manual waiters=1 signals=0 waits=2 wakes=0
0x000012d0 Event/Auto waiters=1 signals=0 waits=1 wakes=0 ← THE WEDGE
0x000012e4 Event/Auto waiters=1 signals=0 waits=1 wakes=0
```
Per the dossier caveat (AUDIT-049 era ID `0x1288` → Phase W ID `0x12d0`),
handle ID is allocator-ordinal-dependent and does NOT match across
engines. So we look up by **canary's analog handles** via the canary
event stream — i.e. any Event/Auto whose tid+site equals canary's
analog of ours's tid=13 `sub_821CB030+0x1B0` worker create call. Per
Phase W's table, canary tid=14/15 are the worker cluster (1.9M / 995K
events). If an absorbed event on canary is a worker-cluster
`handle.create`/`wait.begin` for an event-like object, that's
wedge-relevant.
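
One way to apply that filter to a list of absorbed canary events is sketched below. The worker tids 14/15 come from the Phase W table cited above; everything else (the event dict shape, an `object_type` payload string starting with `Event`, and the choice to keep events whose type is unknown) is an assumption for illustration.

```python
WORKER_TIDS = frozenset({14, 15})  # canary worker cluster per Phase W
ABSORBED_KINDS = frozenset({"handle.create", "wait.begin"})


def is_event_like(ev) -> bool:
    """ASSUMPTION: payload['object_type'] is a string such as 'Event/Auto'.

    Events without a recorded type are kept, so that a wait.begin whose
    payload only lists SIDs is not dropped before a human looks at it.
    """
    obj = ev.get("payload", {}).get("object_type")
    return obj is None or str(obj).startswith("Event")


def wedge_relevant(absorbed) -> list:
    """Absorbed canary events worth inspecting for the wedge hunt."""
    return [
        ev for ev in absorbed
        if ev.get("tid") in WORKER_TIDS
        and ev.get("kind") in ABSORBED_KINDS
        and is_event_like(ev)
    ]
```

The filter is intentionally permissive: it is meant to shortlist candidates for manual review, not to prove that an absorbed event explains the wedge.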