Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Phase W escalation — wedge unbroken by accumulated tooling
Outcome category: (C) — escalating cleanly.
The Phase W mini-fix landed (VdInitializeEngines returns 1 vs old
0; matches canary xboxkrnl_video.cc:271-279). This is a real
correctness fix that advances Phase A matched-prefix
105,046 → 105,112 (+66 events). But on the brief's actual gate —
swaps > 1 / draws > 0 / texture_cache_entries > 0 — the
check --stable-digest -n 500000000 run is byte-identical to
baseline: draws=0, swaps=1, render_targets=0. The fix does not
unblock progression.
What we verified afresh
- The wedge is structurally identical to AUDIT-049/058/059/062/065:
  - tid=1 join-waits tid=13 at sub_82173990+0x2D0 (handle 0x12c8).
  - tid=13 wedges at sub_821CB030+0x1B0 on Event 0x12d0 (<NO_SIGNALS_DESPITE_WAITS>).
  - sub_825070F0 (vtable[1] worker-spawner) fires 0×.
  - 4 of 5 canary worker tids (canary's tid=14/15/4/+ several more) emit hundreds of thousands of events; ours's equivalents emit ≤80. AUDIT-057 thread-gap PERSISTS.
- New tooling (handle.create/destroy, thread.create/exit, wait.begin, shared-global SID absorbers) was applied. It surfaces normal cold-vs-cold divergences past 105K but does NOT illuminate a new signal-flow gap on the wedge handle itself.
- The wedge handle's SID d5e23609d3948568 has zero matches in any canary cold trace. The per-tid-PC SID recipe yields different SIDs for what is logically the same Event across engines, because create-site PC + tid + tid_event_idx all participate in the hash. This is by design (it's NOT a process-global dispatcher), but it means the new wait.begin events cannot directly identify "which canary NtSetEvent call should signal this".
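The SID mismatch described in the last bullet can be illustrated with a minimal sketch. It assumes only what the note states: that create-site PC, tid, and tid_event_idx all feed the hash. The `SidInputs` struct, the FNV-1a hash, and the PC value are hypothetical stand-ins, not the tracer's actual recipe.

```rust
/// Hypothetical inputs to a per-tid-PC SID recipe (field names are assumed).
#[derive(Clone, Copy, Debug)]
pub struct SidInputs {
    pub create_site_pc: u32,
    pub tid: u32,
    pub tid_event_idx: u32,
}

/// FNV-1a over the little-endian bytes of each field.
/// This is a stand-in hash chosen for determinism, not the real SID function.
pub fn sid(inputs: SidInputs) -> u64 {
    const OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    let mut h = OFFSET;
    for field in [inputs.create_site_pc, inputs.tid, inputs.tid_event_idx] {
        for byte in field.to_le_bytes() {
            h ^= u64::from(byte);
            h = h.wrapping_mul(PRIME);
        }
    }
    h
}

/// Same logical Event created at the same (placeholder) guest PC, but on
/// tid=13 in ours and tid=9 in canary. Returns (ours_sid, canary_sid).
pub fn same_event_two_engines() -> (u64, u64) {
    let ours = SidInputs { create_site_pc: 0x8200_0000, tid: 13, tid_event_idx: 0 };
    let canary = SidInputs { tid: 9, ..ours };
    (sid(ours), sid(canary))
}
```

Because tid participates in the hash, the two SIDs differ even though every other input is identical. This is consistent with the wedge handle's SID finding no match in a canary trace.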
Why this is hard — the structural impasse
The matched-prefix metric and the progression metric measure
different things. Matched-prefix tracks the tid=1-only event
sequence in lockstep up to the first kind-mismatch. The wedge is on
tid=13 waiting for a signal that would come from a
worker-cluster thread that never spawns. The two threads barely
overlap in the matched-prefix view (tid=1 is fine for 105K events
because it hasn't reached the join-wait yet from Phase A's
perspective — sub_82173990+0x2D0 is past idx 105,112 in canary's
tid=6 stream).
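To make the blind spot concrete, here is a minimal sketch of a matched-prefix computation restricted to one tid per engine. The `Event` and `Kind` types are hypothetical, and the real metric may differ in detail; the sketch assumes only the description above (lockstep comparison by kind, up to the first mismatch).

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind {
    HandleCreate,
    ThreadCreate,
    WaitBegin,
    Other,
}

#[derive(Clone, Copy, Debug)]
pub struct Event {
    pub tid: u32,
    pub kind: Kind,
}

/// Number of leading events whose kinds agree, after filtering each trace
/// down to a single tid. Events on every other tid are never inspected.
pub fn matched_prefix(ours: &[Event], ours_tid: u32, canary: &[Event], canary_tid: u32) -> usize {
    let a = ours.iter().filter(|e| e.tid == ours_tid).map(|e| e.kind);
    let b = canary.iter().filter(|e| e.tid == canary_tid).map(|e| e.kind);
    a.zip(b).take_while(|(x, y)| x == y).count()
}
```

Under this definition, a thread that stalls in ours while its canary counterpart keeps emitting events leaves the result unchanged, so the metric can keep advancing while progression does not.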
Every Phase C fix has correctly advanced matched-prefix while
leaving the wedge untouched, because the wedge needs the worker
cluster to bootstrap, and the worker cluster's activation chain
(sub_822F1AA8 → sub_82173990 → sub_821746B0 → sub_821748F0 → sub_821C4EB0 → sub_821CC3F8 → sub_821CBA08 → sub_821CB030 and
in parallel → sub_82172BA0 → sub_821B55D8 → sub_824F8398 → sub_824F7CD0 → sub_824F7800 → sub_825070F0 → 4 worker spawns) is
gated on the tid=13 wait completing, which is gated on a worker
signal, which is gated on the worker cluster bootstrapping. This
is the same self-referential lock AUDIT-063 documented.
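The self-referential lock can be written down as a tiny gating graph. The sketch below is illustrative only: the node names are informal labels for the three conditions in the paragraph above, and each condition is assumed to be gated on exactly one other.

```rust
use std::collections::HashMap;

/// Follow `gated_on` edges from `start`. Returns the cycle if a node repeats,
/// or None if the chain terminates at a node with no prerequisite.
pub fn find_cycle<'a>(gated_on: &HashMap<&'a str, &'a str>, start: &'a str) -> Option<Vec<&'a str>> {
    let mut path: Vec<&'a str> = vec![start];
    let mut cur = start;
    while let Some(&next) = gated_on.get(cur) {
        if let Some(pos) = path.iter().position(|&n| n == next) {
            return Some(path[pos..].to_vec());
        }
        path.push(next);
        cur = next;
    }
    None
}

/// The three gates described above, as "X is gated on Y" edges.
pub fn wedge_gates() -> HashMap<&'static str, &'static str> {
    HashMap::from([
        ("worker_cluster_bootstraps", "tid13_wait_completes"),
        ("tid13_wait_completes", "worker_signal"),
        ("worker_signal", "worker_cluster_bootstraps"),
    ])
}
```

Starting from any of the three nodes, `find_cycle` returns all three, which is the sense in which no guest-visible event inside the loop can break it.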
What new information Phase W produced
- VdInitializeEngines stub fix (the landing). Trivially correct, advances matched-prefix +66, does not move progression. Worth keeping in canon for cold-vs-cold parity. New stable digest 73e99d60029128b4d5c3dd98e540457d82a52b8a962e7495132be2be31411aca × 3 byte-identical.
- Confirmed via the new wait.begin events: canary's tid=9 (= ours's tid=13 logical role) calls wait.begin on shared-global dispatcher Event 0xf800004c (SID c9f426cc34f55865) at idx 321 immediately after RtlEnterCriticalSection issues, proving that CS contention on canary's side awakens via the shared-global path while ours's per-tid Event takes the explicit NtCreateEvent + NtWaitForSingleObjectEx path. These are two different objects, not one waiting for the same signal. The tooling correctly says so.
- The brief's hypothesis is correct: matched-prefix is no longer the right metric. Progression has not moved across 25 phases.
Recommended next steps (ranked)
Path 1 (recommended) — accept C+25 fallback and continue normal iteration
Dispatch C+25 = MmAllocatePhysicalMemoryEx / MmGetPhysicalAddress
deterministic allocator (the new first divergence at idx 105,112 is
in this family). Normal Phase C cadence; advances matched-prefix
without claiming wedge unblocking. Be honest in memory notes that
matched-prefix is the only metric moving.
Path 2 — re-examine the absorbers
The C+18/C+21/D-extension absorbers all explicitly fold "scheduling jitter" classes. Per the brief's Path B suggestion: is any absorber HIDING a signal that would resolve the wedge? Specifically:
- C+18 shared-global SID absorber folds canary's aafae4c71fd42890 work-queue semaphore creation into ours's emission window even when ours never creates the equivalent. If ours's worker fails to enqueue something canary's worker awaits, we'd never see the gap because the matched-prefix isn't on the worker tid in the first place.
- The D-extension absorber folds nested-CS cleanup blocks. If canary's Enter/Leave block contains the NtSetEvent that signals the wedge handle (via descendant xeKeSetEvent), the absorber hides that.
Concrete: un-absorb, re-diff, look for the first FOLDED canary block
that contains an NtSetEvent whose SID resolves to the wedge handle.
~3-5 hours of analysis, no LOC change.
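The scan could look roughly like the sketch below. `FoldedBlock`, `Call`, and `first_block_signalling` are hypothetical names, and the un-absorbed diff's real data model is not shown in this note. The caller is assumed to supply the canary-side SID that corresponds to the wedge handle, since the ours-side SID has no direct match in canary traces.

```rust
#[derive(Clone, Debug, PartialEq, Eq)]
pub enum Call {
    Enter,
    Leave,
    NtSetEvent { sid: u64 },
    Other,
}

/// A canary block that an absorber would have folded (hypothetical shape).
#[derive(Clone, Debug)]
pub struct FoldedBlock {
    pub start_idx: usize,
    pub calls: Vec<Call>,
}

/// Trace index of the first folded block containing an NtSetEvent on `target_sid`.
pub fn first_block_signalling(blocks: &[FoldedBlock], target_sid: u64) -> Option<usize> {
    blocks
        .iter()
        .find(|b| {
            b.calls
                .iter()
                .any(|c| matches!(c, Call::NtSetEvent { sid } if *sid == target_sid))
        })
        .map(|b| b.start_idx)
}
```

A `None` result across all absorbers would suggest the absorbers are not hiding the signal, which would itself help rule out Path 2.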
Path 3 — install host-side mem-watch + diff on wedge handle's guest memory
AUDIT-067 established that vtable installs go through host-side
writes invisible to guest-PC traces. By the same logic, the wedge
handle's kernel object header may be mutated by host code (the
canary scheduler / dispatcher) in ways ours doesn't replicate. Hook
Memory::write* in canary on the wedge handle's address; compare
against ours.
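Once both engines can log writes, the comparison step might be sketched as follows. The `Write` record layout and the address window are assumptions for illustration; how the canary hook and the ours-side equivalent capture the logs is outside this sketch.

```rust
/// One logged host-side write into guest memory (hypothetical record layout).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Write {
    pub addr: u32,
    pub value: u32,
}

/// Keep only writes that land inside [base, base + len).
pub fn watched(log: &[Write], base: u32, len: u32) -> Vec<Write> {
    log.iter()
        .copied()
        .filter(|w| w.addr >= base && w.addr - base < len)
        .collect()
}

/// Index, within the filtered logs, of the first write that differs between
/// engines. Returns None when the filtered logs are identical.
pub fn first_divergence(canary: &[Write], ours: &[Write], base: u32, len: u32) -> Option<usize> {
    let c = watched(canary, base, len);
    let o = watched(ours, base, len);
    let common = c.iter().zip(&o).take_while(|(a, b)| a == b).count();
    if common == c.len() && common == o.len() {
        None
    } else {
        Some(common)
    }
}
```

A divergence index that points past the end of ours's filtered log would indicate a canary write that ours never performs, which is the case Path 3 is looking for.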
Path 4 — scheduler determinism investment
This is the unfunded scheduler_determinism_plan artifact (per memory). Stage
0 was a null result; the contention manifest stages landed but didn't
move the cap. The PLAN doc explicitly notes the wedge is upstream of
contention, so this is unlikely to help WITHOUT additional work.
Honesty note
19 prior audits attacked this same wedge and failed. Phase W is the 20th. We landed a correct mini-fix, but the wedge itself is unchanged. The user's instinct to call this an honest fallback is the correct posture.