xenia-rs/audit-runs/phase-absorber-review/absorber-inventory.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Absorber inventory (Phase absorber-review, 2026-05-19)

The diff tool currently lands three absorbers that cross reading-error #23 (matching genuinely different guest behavior at the diff layer). Each is documented below — trigger, match heuristic, rationale, what is silenced.

The goal of the investigation is to determine whether any of them hides signal flow that would explain the AUDIT-049 wedge (tid=13 blocked on Event/Auto handle 0x12d0; sister wedges 0x1020/0x1040/0x10A8/0x10E4/0x12B8; the sub_825070F0 worker spawner fires 0 times).

A) Shared-global handle.create floating absorb (Phase C+18)

  • File: diff_events.py::diff_one_tid, branch guarded by is_shared_global_handle_create + cross_tid_floating_sids.
  • Trigger condition: at a kind mismatch, exactly one side has handle.create whose SID is in the cross-tid floating set.
  • Match heuristic: the SID equals the deterministic shared_global_sid(pointer, object_type) recipe (FNV-1a over marker 0xC01AB005, pointer, object_type) OR appears across ≥2 distinct tids in either engine's stream (cross-tid usage heuristic).
  • Rationale: process-global dispatcher objects (XAudio voice-volume semaphores, shared CSes, shared KEVENTs) get lazy-wrapped by whichever guest thread is the first toucher; that thread differs between cold runs. The SID recipe is scheduling-invariant so the diff can absorb the handle.create on the "wrong" tid.
  • What's silenced: handle.create events for process-global dispatchers. Per-thread (alloc_handle_for/AddHandle) handle.create events are NOT silenced because their SID uses the per-(tid, idx) recipe.
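A minimal sketch of the two match criteria in A, for orientation only. The marker value and the FNV-1a choice come from the description above; the 64-bit hash width, the little-endian u32 field packing, and the event shape (`tid` plus a `sids` list) are assumptions for illustration and may not match diff_events.py.

```python
import struct
from collections import defaultdict

FNV64_OFFSET = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
SHARED_GLOBAL_MARKER = 0xC01AB005  # marker named in the inventory


def fnv1a_64(data: bytes) -> int:
    """Standard 64-bit FNV-1a: XOR the byte in, then multiply by the prime."""
    h = FNV64_OFFSET
    for b in data:
        h ^= b
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def shared_global_sid(pointer: int, object_type: int) -> int:
    # ASSUMPTION: three little-endian u32 fields; the real recipe may pack
    # or widen these differently.
    packed = struct.pack("<III", SHARED_GLOBAL_MARKER, pointer, object_type)
    return fnv1a_64(packed)


def cross_tid_sids(events) -> set:
    """SIDs referenced by >= 2 distinct tids within one engine's stream.

    ASSUMPTION: each event is a dict with 'tid' and an optional 'sids' list.
    """
    tids_by_sid = defaultdict(set)
    for ev in events:
        for sid in ev.get("sids", ()):
            tids_by_sid[sid].add(ev["tid"])
    return {sid for sid, tids in tids_by_sid.items() if len(tids) >= 2}


def floating_sids(canary_events, ours_events) -> set:
    # "in either engine's stream": union of the per-engine sets.
    return cross_tid_sids(canary_events) | cross_tid_sids(ours_events)
```

Because the recipe depends only on (pointer, object_type), the same dispatcher hashes to the same SID regardless of which tid wrapped it first, which is what makes the absorb scheduling-invariant.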

B) Shared-global wait.begin floating absorb (Phase C+21)

  • File: diff_events.py::diff_one_tid, branch guarded by is_shared_global_wait_begin.
  • Trigger condition: at a kind mismatch, exactly one side has wait.begin whose handles_semantic_ids list includes at least one SID in the shared-global set.
  • Match heuristic: any of the wait's handles matches the shared-global SID criterion above. For wait_type=all, ANY single shared-global handle is enough to classify the whole wait as floating (heuristic risk: a wait on one shared + multiple per-thread handles is fully absorbed).
  • Rationale: contention on shared dispatchers is host-scheduler driven. One cold run may emit wait.begin (slow path) while another fast-paths past it without ever blocking. Reading-error #32.
  • What's silenced: wait.begin events that touch shared-global dispatchers. The associated wait.end (which has its own field skips per SKIP_PAYLOAD_FIELDS_BY_KIND) still aligns positionally.
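The wait.begin criterion in B can be sketched as follows. The field name `handles_semantic_ids` is from the description above; the `payload` nesting and both helper names other than `is_shared_global_wait_begin` are hypothetical. The last helper is not part of the absorber as described; it is one possible way to surface the heuristic risk noted for mixed waits.

```python
def is_shared_global_wait_begin(event, shared_global_sids) -> bool:
    """True if the event is a wait.begin touching >= 1 shared-global SID."""
    if event.get("kind") != "wait.begin":
        return False
    sids = event.get("payload", {}).get("handles_semantic_ids", [])
    return any(sid in shared_global_sids for sid in sids)


def should_absorb_wait_begin(canary_ev, ours_ev, shared_global_sids) -> bool:
    """Trigger: kinds differ and exactly one side is a floating wait.begin."""
    if canary_ev.get("kind") == ours_ev.get("kind"):
        return False
    c = is_shared_global_wait_begin(canary_ev, shared_global_sids)
    o = is_shared_global_wait_begin(ours_ev, shared_global_sids)
    return c != o


def mixes_shared_and_per_thread(event, shared_global_sids) -> bool:
    """Hypothetical audit helper: flags waits that would be fully absorbed
    even though some of their handles are per-thread."""
    sids = event.get("payload", {}).get("handles_semantic_ids", [])
    shared = [s for s in sids if s in shared_global_sids]
    return bool(shared) and len(shared) < len(sids)
```

Logging every absorbed wait for which `mixes_shared_and_per_thread` is true would give a list of candidates where per-thread signal may be hidden.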

C) Nested-CS-cleanup absorber, Phase D D-extension (v1.5)

  • File: diff_events.py::_try_absorb_nested_cs_cleanup, invoked from diff_one_tid.
  • Trigger condition: kind mismatch where canary has import.call RtlEnterCriticalSection while ours has import.call RtlLeaveCriticalSection. Pattern is exact — NO other kind-mismatch shape engages this absorber.
  • Match heuristic: walks canary forward consuming balanced [Enter-block(3), Leave-block(3)] pairs (each pair = 6 events: import.call, kernel.call, kernel.return for Enter; same triple for Leave). Cap _NESTED_CS_PAIR_CAP = 32. After each pair, checks whether canary's next event has the SAME kind AND payload name as ours's current event — first convergence wins (greedy).
  • Rationale: the 104,607 cap is a producer-throughput divergence: canary's preemptive host-OS scheduling lets a peer tid insert more work items into a CS-protected registry/tree during a notification-event wait window than ours's cooperative scheduler does. Canary then iterates [E L] cleanups over those entries; ours has fewer entries and fast-Leaves. Per Phase D forensics, this is a real guest-behavior divergence, not jitter.
  • What's silenced: contiguous [E L] blocks on canary's side at the specific Enter-vs-Leave mismatch site (~+439 events at the 104,607→105,046 advance per the D-extension memory).
  • Stated caveat: this explicitly crosses reading-error #23. The band-aid was approved because the underlying root cause requires preempting the cooperative scheduler (invalidates 23 phases of digest stability; out of scope per H' plan).
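The forward walk in C can be sketched like this. The cap value, the Enter/Leave names, the three-event block shape and the greedy first-convergence rule are taken from the description above; the event shape (dicts with `kind` and `name`), the function signature and the return convention are assumptions, not the actual `_try_absorb_nested_cs_cleanup` code.

```python
_NESTED_CS_PAIR_CAP = 32
ENTER = "RtlEnterCriticalSection"
LEAVE = "RtlLeaveCriticalSection"
_TRIPLE = ("import.call", "kernel.call", "kernel.return")


def _is_block(events, start, name) -> bool:
    """True if events[start:start+3] is the import/kernel.call/kernel.return
    triple for the given import name."""
    block = events[start:start + 3]
    return len(block) == 3 and all(
        ev["kind"] == kind and ev["name"] == name
        for ev, kind in zip(block, _TRIPLE)
    )


def try_absorb_nested_cs_cleanup(canary, ci, ours, oi):
    """Return the canary index to resume at, or None if the absorber
    does not engage or never converges within the cap."""
    if ci >= len(canary) or oi >= len(ours):
        return None
    c, o = canary[ci], ours[oi]
    # Exact trigger shape: canary Enter vs. ours Leave, nothing else.
    if (c["kind"], c["name"]) != ("import.call", ENTER):
        return None
    if (o["kind"], o["name"]) != ("import.call", LEAVE):
        return None
    pos = ci
    for _ in range(_NESTED_CS_PAIR_CAP):
        # Consume one balanced [Enter-block(3), Leave-block(3)] pair.
        if not (_is_block(canary, pos, ENTER) and _is_block(canary, pos + 3, LEAVE)):
            return None
        pos += 6
        # Greedy: the first pair boundary that matches ours wins.
        if pos < len(canary):
            nxt = canary[pos]
            if (nxt["kind"], nxt["name"]) == (o["kind"], o["name"]):
                return pos
    return None
```

In this sketch the number of silenced canary events at one site is `result - ci`, always a multiple of 6 and at most 6 × 32 = 192 per invocation.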

Cross-references for wedge hunt

Per Phase W ground truth, the unsignaled handles at deadlock are:

0x00001020 Event/Manual waiters=1 signals=0 waits=1 wakes=0
0x00001040 Event/Auto   waiters=0 signals=0 waits=32 wakes=0
0x000010b0 Event/Auto   waiters=0 signals=0 waits=7 wakes=0
0x000010ec Event/Manual waiters=1 signals=0 waits=2 wakes=0
0x000012d0 Event/Auto   waiters=1 signals=0 waits=1 wakes=0  ← THE WEDGE
0x000012e4 Event/Auto   waiters=1 signals=0 waits=1 wakes=0
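
To use the table above mechanically, a small parser is enough. This is a sketch that assumes the row layout is exactly as printed here (handle, type, then four `key=value` counters); the helper names are hypothetical.

```python
import re

PHASE_W_TABLE = """\
0x00001020 Event/Manual waiters=1 signals=0 waits=1 wakes=0
0x00001040 Event/Auto   waiters=0 signals=0 waits=32 wakes=0
0x000010b0 Event/Auto   waiters=0 signals=0 waits=7 wakes=0
0x000010ec Event/Manual waiters=1 signals=0 waits=2 wakes=0
0x000012d0 Event/Auto   waiters=1 signals=0 waits=1 wakes=0  <- THE WEDGE
0x000012e4 Event/Auto   waiters=1 signals=0 waits=1 wakes=0
"""

_ROW = re.compile(
    r"^(0x[0-9a-fA-F]+)\s+(\S+)\s+waiters=(\d+)\s+signals=(\d+)"
    r"\s+waits=(\d+)\s+wakes=(\d+)"
)


def parse_handle_table(text):
    """Parse rows into dicts; trailing annotations after 'wakes=' are ignored."""
    rows = []
    for line in text.splitlines():
        m = _ROW.match(line.strip())
        if not m:
            continue
        handle, obj_type, waiters, signals, waits, wakes = m.groups()
        rows.append({
            "handle": int(handle, 16),
            "type": obj_type,
            "waiters": int(waiters),
            "signals": int(signals),
            "waits": int(waits),
            "wakes": int(wakes),
        })
    return rows


def still_blocked(rows):
    """Handles with a waiter parked at deadlock and no signal ever delivered."""
    return [r["handle"] for r in rows if r["waiters"] > 0 and r["signals"] == 0]
```

On the six rows above, `still_blocked` keeps 0x1020, 0x10ec, 0x12d0 and 0x12e4; the two handles with waiters=0 were waited on but have nobody parked at the deadlock snapshot.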

Per the dossier caveat (AUDIT-049 era ID 0x1288 → Phase W ID 0x12d0), handle IDs are allocator-ordinal-dependent and do NOT match across engines. Handles are therefore looked up by their canary analogs in the canary event stream, i.e. any Event/Auto whose tid+site equals canary's analog of ours's tid=13 sub_821CB030+0x1B0 worker create call. Per Phase W's table, canary tid=14/15 are the worker cluster (1.9M / 995K events). If an absorbed event on canary is a worker-cluster handle.create/wait.begin for an event-like object, that's wedge-relevant.
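
The wedge-relevance test described above could be applied to a log of absorbed canary events roughly as follows. The tids and event kinds are from the text; the event shape, the `object_type` field and the "starts with Event" test for event-like objects are assumptions for illustration.

```python
WORKER_CLUSTER_TIDS = {14, 15}          # canary worker cluster per Phase W
WEDGE_KINDS = {"handle.create", "wait.begin"}


def is_wedge_relevant(absorbed_event) -> bool:
    """True for absorbed worker-cluster create/wait events on event-like objects.

    ASSUMPTION: absorbed events are dicts with 'tid', 'kind' and an
    'object_type' string such as 'Event/Auto'.
    """
    return (
        absorbed_event.get("tid") in WORKER_CLUSTER_TIDS
        and absorbed_event.get("kind") in WEDGE_KINDS
        and str(absorbed_event.get("object_type", "")).startswith("Event")
    )


def wedge_candidates(absorbed_events):
    return [ev for ev in absorbed_events if is_wedge_relevant(ev)]
```

An empty result for all three absorbers would suggest they are not hiding the wedge's signal flow; a non-empty result gives the sites to inspect first.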