handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,101 @@
# Absorber inventory (Phase absorber-review, 2026-05-19)
The diff tool currently lands three absorbers that cross reading-error #23
(matching genuinely different guest behavior at the diff layer). Each is
documented below by trigger, match heuristic, rationale, and what it
silences. The goal of the investigation is to determine whether any of them
hides signal flow that would explain the AUDIT-049 wedge (tid=13 blocked on
Event/Auto handle `0x12d0`, sister wedges `0x1020/0x1040/0x10A8/0x10E4/0x12B8`,
`sub_825070F0` worker spawner fires 0 times).
## A) Shared-global `handle.create` floating absorb (Phase C+18)
* **File**: `diff_events.py::diff_one_tid`, branch guarded by
`is_shared_global_handle_create` + `cross_tid_floating_sids`.
* **Trigger condition**: at a kind mismatch, exactly one side has
`handle.create` whose SID is in the cross-tid floating set.
* **Match heuristic**: the SID equals the deterministic
`shared_global_sid(pointer, object_type)` recipe (FNV-1a over marker
`0xC01AB005`, pointer, object_type) OR appears across ≥2 distinct tids
in either engine's stream (cross-tid usage heuristic).
* **Rationale**: process-global dispatcher objects (XAudio voice-volume
semaphores, shared CSes, shared KEVENTs) get lazy-wrapped by whichever
guest thread is the first toucher; that thread differs between cold
runs. The SID recipe is scheduling-invariant so the diff can absorb
the `handle.create` on the "wrong" tid.
* **What's silenced**: `handle.create` events for process-global
dispatchers. Per-thread (`alloc_handle_for`/`AddHandle`) handle.create
events are NOT silenced because their SID uses the per-(tid, idx)
recipe.
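
The two match criteria above can be sketched as follows. This is a minimal illustration, not the code in `diff_events.py`: the notes only say the recipe is FNV-1a over the marker, pointer, and object_type, so the 64-bit variant, the little-endian u32 field layout, the integer `object_type`, and the `tid`/`sid` event keys are all assumptions, and the `_sketch` names are hypothetical.

```python
FNV64_OFFSET_BASIS = 0xCBF29CE484222325  # standard 64-bit FNV offset basis
FNV64_PRIME = 0x100000001B3              # standard 64-bit FNV prime
SHARED_GLOBAL_MARKER = 0xC01AB005        # marker value quoted in the notes


def fnv1a_64(data: bytes) -> int:
    """Plain 64-bit FNV-1a: XOR each byte in, then multiply by the prime."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h ^= byte
        h = (h * FNV64_PRIME) & 0xFFFFFFFFFFFFFFFF
    return h


def shared_global_sid_sketch(pointer: int, object_type: int) -> int:
    """Hash (marker, pointer, object_type); u32 little-endian fields are ASSUMED."""
    fields = (SHARED_GLOBAL_MARKER, pointer, object_type)
    payload = b"".join((f & 0xFFFFFFFF).to_bytes(4, "little") for f in fields)
    return fnv1a_64(payload)


def cross_tid_sids_sketch(events):
    """Return SIDs seen on two or more distinct tids (event keys are ASSUMED)."""
    tids_by_sid = {}
    for ev in events:
        tids_by_sid.setdefault(ev["sid"], set()).add(ev["tid"])
    return {sid for sid, tids in tids_by_sid.items() if len(tids) >= 2}
```

Because the hash input contains no tid or handle index, the same dispatcher object yields the same SID regardless of which thread wraps it first, which is the scheduling-invariance property the absorber relies on.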
## B) Shared-global `wait.begin` floating absorb (Phase C+21)
* **File**: `diff_events.py::diff_one_tid`, branch guarded by
`is_shared_global_wait_begin`.
* **Trigger condition**: at a kind mismatch, exactly one side has
`wait.begin` whose `handles_semantic_ids` list includes at least one
SID in the shared-global set.
* **Match heuristic**: any of the wait's handles matches the
shared-global SID criterion above. For `wait_type=all`, ANY single
shared-global handle is enough to classify the whole wait as
floating (heuristic risk: a wait on one shared + multiple per-thread
handles is fully absorbed).
* **Rationale**: contention on shared dispatchers is host-scheduler
driven. One cold run may emit `wait.begin` (slow path) while another
fast-paths past it without ever blocking (reading-error #32).
* **What's silenced**: `wait.begin` events that touch shared-global
dispatchers. The associated `wait.end` (which has its own field
skips per `SKIP_PAYLOAD_FIELDS_BY_KIND`) still aligns positionally.
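
The classification rule and its stated risk can be sketched like this. The flat event dictionaries and both function names are hypothetical; only the `wait.begin` kind, the `handles_semantic_ids` field, and the `wait_type=all` value come from the notes.

```python
def is_floating_wait_begin(event, shared_global_sids):
    """A wait.begin is floating if ANY of its handles is shared-global."""
    if event["kind"] != "wait.begin":
        return False
    return any(sid in shared_global_sids for sid in event["handles_semantic_ids"])


def is_mixed_wait_all(event, shared_global_sids):
    """Flag the heuristic risk: a wait_type=all wait that mixes one or more
    shared-global handles with per-thread handles, yet is absorbed whole."""
    if not is_floating_wait_begin(event, shared_global_sids):
        return False
    sids = event["handles_semantic_ids"]
    has_per_thread = any(sid not in shared_global_sids for sid in sids)
    return event.get("wait_type") == "all" and has_per_thread
```

Running a check like `is_mixed_wait_all` over the absorbed events would be one way to list the waits where the absorber may be hiding per-thread handles.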
## C) Nested-CS-cleanup absorber (Phase D D-extension, v1.5)
* **File**: `diff_events.py::_try_absorb_nested_cs_cleanup`, invoked from
`diff_one_tid`.
* **Trigger condition**: kind mismatch where canary has
`import.call RtlEnterCriticalSection` while ours has
`import.call RtlLeaveCriticalSection`. Pattern is exact — NO other
kind-mismatch shape engages this absorber.
* **Match heuristic**: walks canary forward consuming balanced
`[Enter-block(3), Leave-block(3)]` pairs (each pair = 6 events: import.call,
kernel.call, kernel.return for Enter; same triple for Leave). Cap
`_NESTED_CS_PAIR_CAP = 32`. After each pair, checks whether
canary's next event has the SAME kind AND payload `name` as our
current event; the first convergence wins (greedy).
* **Rationale**: the cap at 104,607 events is a producer-throughput divergence:
canary's preemptive host-OS scheduling lets a peer tid insert more
work items into a CS-protected registry/tree during a notification-event
wait window than our cooperative scheduler does. Canary then
iterates `[E L]` cleanups over those entries; ours has fewer entries
and fast-Leaves. Per Phase D forensics, this is a real guest-behavior
divergence, not jitter.
* **What's silenced**: contiguous `[E L]` blocks on canary's side at
the specific Enter-vs-Leave mismatch site (about 439 events, the
104,607→105,046 advance per the D-extension memory).
* **Stated caveat**: this explicitly crosses reading-error #23. The
band-aid was approved because fixing the underlying root cause requires
preempting the cooperative scheduler, which invalidates 23 phases of
digest stability and is out of scope per the H' plan.
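
The greedy walk described under the match heuristic can be sketched as below. This is an illustration under assumptions, not the body of `_try_absorb_nested_cs_cleanup`: events are modelled as flat dictionaries with `kind` and `name` keys, all three events of a block are assumed to carry the same `name`, and the function and helper names are hypothetical.

```python
NESTED_CS_PAIR_CAP = 32  # mirrors _NESTED_CS_PAIR_CAP from the notes
BLOCK_KINDS = ("import.call", "kernel.call", "kernel.return")
ENTER = "RtlEnterCriticalSection"
LEAVE = "RtlLeaveCriticalSection"


def _is_block(events, start, name):
    """True if events[start:start+3] is the call triple for `name`."""
    window = events[start:start + 3]
    return len(window) == 3 and all(
        ev["kind"] == kind and ev["name"] == name
        for ev, kind in zip(window, BLOCK_KINDS)
    )


def absorb_nested_cs_cleanup_sketch(canary, ci, ours, oi):
    """Return how many canary events to skip, or None if the shape differs."""
    head_c, head_o = canary[ci], ours[oi]
    if not (head_c["kind"] == head_o["kind"] == "import.call"
            and head_c["name"] == ENTER and head_o["name"] == LEAVE):
        return None  # only the exact Enter-vs-Leave mismatch engages
    pos = ci
    for _ in range(NESTED_CS_PAIR_CAP):
        # Consume one balanced [Enter-block(3), Leave-block(3)] pair.
        if not (_is_block(canary, pos, ENTER) and _is_block(canary, pos + 3, LEAVE)):
            return None
        pos += 6
        # Greedy: stop at the first point where canary matches our event.
        if (pos < len(canary)
                and canary[pos]["kind"] == head_o["kind"]
                and canary[pos]["name"] == head_o["name"]):
            return pos - ci
    return None  # cap reached without convergence
```

Returning the skipped-event count keeps the caller in charge of advancing the canary cursor, so a failed attempt leaves both cursors untouched.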
## Cross-references for wedge hunt
Per Phase W ground truth, the unsignaled handles at deadlock are:
```
0x00001020 Event/Manual waiters=1 signals=0 waits=1 wakes=0
0x00001040 Event/Auto waiters=0 signals=0 waits=32 wakes=0
0x000010b0 Event/Auto waiters=0 signals=0 waits=7 wakes=0
0x000010ec Event/Manual waiters=1 signals=0 waits=2 wakes=0
0x000012d0 Event/Auto waiters=1 signals=0 waits=1 wakes=0 ← THE WEDGE
0x000012e4 Event/Auto waiters=1 signals=0 waits=1 wakes=0
```
Per the dossier caveat (AUDIT-049 era ID `0x1288` → Phase W ID `0x12d0`),
a handle ID is allocator-ordinal-dependent and does NOT match across
engines. So we look up **canary's analog handles** via the canary
event stream instead: any Event/Auto whose tid+site equals canary's
analog of our tid=13 `sub_821CB030+0x1B0` worker create call. Per
Phase W's table, canary tid=14/15 are the worker cluster (1.9M / 995K
events). If an absorbed event on canary is a worker-cluster
`handle.create`/`wait.begin` for an event-like object, it is
wedge-relevant.
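
A filter for that check might look like the sketch below. It assumes the diff tool can export its absorbed canary events as dictionaries with `tid`, `kind`, and `object_type` keys, and that "event-like" means an `object_type` starting with `Event/` as in the Phase W table; neither assumption is confirmed by the notes, and the function name is hypothetical.

```python
WORKER_TIDS = {14, 15}                        # canary worker cluster per Phase W
RELEVANT_KINDS = {"handle.create", "wait.begin"}
EVENT_LIKE_PREFIX = "Event/"                  # ASSUMED label format


def wedge_relevant_absorbed(absorbed_events):
    """Keep absorbed canary events that could bear on the wedge."""
    return [
        ev for ev in absorbed_events
        if ev["tid"] in WORKER_TIDS
        and ev["kind"] in RELEVANT_KINDS
        and ev.get("object_type", "").startswith(EVENT_LIKE_PREFIX)
    ]
```

An empty result for all three absorbers would suggest they are not hiding the missing signal; any hit is a candidate site to compare against our tid=13 create call.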