Source changes (dormant parity infra, retained from iterate 2.AI/2.AO): - xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring - xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
# Phase W escalation — wedge unbroken by accumulated tooling

**Outcome category: (C), escalating cleanly.**

The Phase W mini-fix landed: `VdInitializeEngines` now returns 1
instead of the old 0, matching canary `xboxkrnl_video.cc:271-279`.
This is a real correctness fix that advances the Phase A
matched-prefix **105,046 → 105,112 (+66 events)**. But on the brief's
actual gate (`swaps > 1` / `draws > 0` / `texture_cache_entries > 0`)
the `check --stable-digest -n 500000000` run is **byte-identical to
baseline**: `draws=0, swaps=1, render_targets=0`. The fix does not
unblock progression.

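As a rough illustration of the gate named above, the sketch below evaluates each of the three conditions separately instead of combining them, since the brief's exact AND/OR rule is not quoted here. `RunCounters` and `gate_report` are hypothetical names, not the project's API.

```rust
/// Counters taken from a run summary. The field names mirror the gate in
/// the text; the struct itself is hypothetical.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RunCounters {
    pub draws: u64,
    pub swaps: u64,
    pub texture_cache_entries: u64,
}

/// Evaluates each gate condition on its own and reports it by name.
/// How the three results are combined is left to the caller.
pub fn gate_report(c: &RunCounters) -> [(&'static str, bool); 3] {
    [
        ("swaps > 1", c.swaps > 1),
        ("draws > 0", c.draws > 0),
        ("texture_cache_entries > 0", c.texture_cache_entries > 0),
    ]
}
```

With the baseline counters from this run (`draws=0, swaps=1`), every entry comes back `false`, which is the "no progression" result described above.
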
## What we verified afresh

1. The wedge is structurally identical to AUDIT-049/058/059/062/065:
   * tid=1 join-waits tid=13 at `sub_82173990+0x2D0` (handle `0x12c8`).
   * tid=13 wedges at `sub_821CB030+0x1B0` on Event `0x12d0`
     (`<NO_SIGNALS_DESPITE_WAITS>`).
   * `sub_825070F0` (vtable[1] worker-spawner) fires 0×.
   * 4 of 5 canary worker tids (canary's tid=14/15/4 + several more)
     emit hundreds of thousands of events; ours's equivalents emit
     ≤80. AUDIT-057 thread-gap PERSISTS.

2. New tooling (handle.create/destroy, thread.create/exit,
   wait.begin, shared-global SID absorbers) was applied. It surfaces
   normal cold-vs-cold divergences past 105K but does NOT illuminate
   a new signal-flow gap on the wedge handle itself.

3. The wedge handle's SID `d5e23609d3948568` has zero matches in any
   canary cold trace. The per-tid-PC SID recipe yields different SIDs
   for what is *logically* the same Event across engines, because
   create-site PC + tid + tid_event_idx all participate in the hash.
   This is by design (it's NOT a process-global dispatcher), but it
   means the new wait.begin events cannot directly identify "which
   canary NtSetEvent call should signal this".

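To make item 3 concrete, here is a minimal sketch of a SID that hashes create-site PC, tid, and tid_event_idx together. The hash function (64-bit FNV-1a), the field widths, and the field order are all assumptions for illustration; the real recipe is not reproduced in this note.

```rust
/// The three inputs the text says participate in the SID hash.
/// Field widths are assumptions.
pub struct SidInputs {
    pub create_site_pc: u32,
    pub tid: u32,
    pub tid_event_idx: u64,
}

/// One FNV-1a pass over `bytes`, continuing from state `h`.
fn fnv1a(mut h: u64, bytes: &[u8]) -> u64 {
    const PRIME: u64 = 0x0000_0100_0000_01b3;
    for &b in bytes {
        h ^= u64::from(b);
        h = h.wrapping_mul(PRIME);
    }
    h
}

/// Stand-in SID: FNV-1a over the little-endian bytes of each field.
/// This is NOT the project's hash; it only shows the shape of the problem.
pub fn sid(inputs: &SidInputs) -> u64 {
    const OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
    let h = fnv1a(OFFSET_BASIS, &inputs.create_site_pc.to_le_bytes());
    let h = fnv1a(h, &inputs.tid.to_le_bytes());
    fnv1a(h, &inputs.tid_event_idx.to_le_bytes())
}
```

Because tid is an input, the same logical Event created on ours's tid=13 and on canary's tid=9 gets two unrelated SIDs even when the create-site PC and event index agree, so a SID lookup across engines finds nothing.
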
## Why this is hard — the structural impasse

The matched-prefix metric and the progression metric measure
different things. Matched-prefix tracks the **tid=1-only** event
sequence in lockstep up to the first kind-mismatch. The wedge is on
**tid=13** waiting for a signal that would come from a
**worker-cluster thread that never spawns**. The two threads barely
overlap in the matched-prefix view (tid=1 is fine for 105K events
*because* it hasn't reached the join-wait yet from Phase A's
perspective: `sub_82173990+0x2D0` is past idx 105,112 in canary's
tid=6 stream).

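The blind spot can be shown with a small sketch of a matched-prefix computation. `Event` and `matched_prefix` are hypothetical, and pairing ours's tid=1 with canary's tid=6 is an assumption read off the paragraph above.

```rust
/// A trace event reduced to the two fields the metric looks at.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    pub tid: u32,
    pub kind: &'static str,
}

/// Number of leading events whose kinds agree, looking only at one tid
/// per engine. Events on every other tid are ignored entirely.
pub fn matched_prefix(
    ours: &[Event],
    canary: &[Event],
    ours_tid: u32,
    canary_tid: u32,
) -> usize {
    let a = ours.iter().filter(|e| e.tid == ours_tid);
    let b = canary.iter().filter(|e| e.tid == canary_tid);
    a.zip(b).take_while(|(x, y)| x.kind == y.kind).count()
}
```

Anything that happens (or fails to happen) on tid=13 or on the worker tids never enters the comparison, so the count can keep rising while the wedge stays put.
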
Every Phase C fix has correctly advanced matched-prefix while
leaving the wedge untouched, because the wedge needs the worker
cluster to bootstrap, and the worker cluster's activation chain
(`sub_822F1AA8 → sub_82173990 → sub_821746B0 → sub_821748F0 →
sub_821C4EB0 → sub_821CC3F8 → sub_821CBA08 → sub_821CB030` and
in parallel `→ sub_82172BA0 → sub_821B55D8 → sub_824F8398 →
sub_824F7CD0 → sub_824F7800 → sub_825070F0 → 4 worker spawns`) is
gated on the tid=13 wait completing, which is gated on a worker
signal, which is gated on the worker cluster bootstrapping. This
is the **same self-referential lock** AUDIT-063 documented.

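The circular dependency can be written down as a tiny "gated on" map and checked mechanically. This is only a sketch: the node names are labels invented here for the four stages in the paragraph above, and `find_cycle` is a hypothetical helper.

```rust
use std::collections::HashMap;

/// Follows single "X is gated on Y" edges from `start`. Returns the nodes
/// forming a cycle if the walk revisits a node, or None if it terminates.
pub fn find_cycle<'a>(
    start: &'a str,
    gated_on: &HashMap<&'a str, &'a str>,
) -> Option<Vec<&'a str>> {
    let mut path: Vec<&'a str> = vec![start];
    let mut cur = start;
    while let Some(&next) = gated_on.get(cur) {
        if let Some(pos) = path.iter().position(|&n| n == next) {
            // `next` was already visited: everything from there on is the loop.
            return Some(path[pos..].to_vec());
        }
        path.push(next);
        cur = next;
    }
    None
}
```

Feeding it the four edges from the text (bootstrap → activation chain → tid=13 wait → worker signal → bootstrap) returns all four nodes as one cycle, with no entry point that is not itself gated.
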
## What new information Phase W produced

1. **VdInitializeEngines stub fix** (the landing). Trivially
   correct, advances matched-prefix +66, does not move progression.
   Worth keeping in canon for cold-vs-cold parity. New stable digest
   `73e99d60029128b4d5c3dd98e540457d82a52b8a962e7495132be2be31411aca`,
   byte-identical across 3 runs.
2. **Confirmed via the new wait.begin events**: canary's tid=9
   (= ours's tid=13 logical role) calls `wait.begin` on shared-global
   dispatcher Event `0xf800004c` (SID `c9f426cc34f55865`) at idx 321
   *immediately* after `RtlEnterCriticalSection` issues, proving
   that CS contention on canary's side awakens via the shared-global
   path while ours's per-tid Event takes the explicit
   `NtCreateEvent+NtWaitForSingleObjectEx` path. **These are two
   different objects, not one waiting for the same signal.** The
   tooling correctly says so.
3. **The brief's hypothesis is correct**: matched-prefix is no
   longer the right metric. Progression has not moved across 25
   phases.

## Recommended next steps (ranked)
### Path 1 (recommended) — accept C+25 fallback and continue normal iteration

Dispatch C+25 = `MmAllocatePhysicalMemoryEx` / `MmGetPhysicalAddress`
deterministic allocator (the new first divergence at idx 105,112 is
in this family). Normal Phase C cadence; advances matched-prefix
without claiming wedge unblocking. **Be honest in memory notes that
matched-prefix is the only metric moving.**

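For orientation, a deterministic allocator in the loosest sense is one where the same request sequence always yields the same addresses. The bump allocator below is a minimal sketch of that property only; it does not model the real `MmAllocatePhysicalMemoryEx` signature, protection flags, or address-range arguments, and `BumpPhysAllocator` is a hypothetical name.

```rust
/// Bump allocator over a fixed range. Addresses depend only on the order
/// and parameters of the requests, never on host state.
pub struct BumpPhysAllocator {
    next: u64,
    limit: u64,
}

impl BumpPhysAllocator {
    /// `base` and `size` describe the guest range to hand out (assumed values).
    pub fn new(base: u32, size: u32) -> Self {
        let limit = (u64::from(base) + u64::from(size)).min(1u64 << 32);
        BumpPhysAllocator { next: u64::from(base), limit }
    }

    /// Returns the aligned start address, or None if the request is
    /// malformed or the range is exhausted. `align` must be a power of two.
    pub fn alloc(&mut self, size: u32, align: u32) -> Option<u32> {
        if size == 0 || !align.is_power_of_two() {
            return None;
        }
        let mask = u64::from(align) - 1;
        let start = (self.next + mask) & !mask;
        let end = start + u64::from(size);
        if end > self.limit {
            return None;
        }
        self.next = end;
        Some(start as u32) // start < end <= 2^32, so this cannot truncate
    }
}
```

Two instances fed the same requests return the same addresses, which is the property a cold-vs-cold diff needs from this family of exports.
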
### Path 2 — re-examine the absorbers

The C+18/C+21/D-extension absorbers all explicitly fold "scheduling
jitter" classes. Per the brief's Path B suggestion: is any absorber
HIDING a signal that would resolve the wedge? Specifically:
* C+18 shared-global SID absorber folds canary's
  `aafae4c71fd42890` work-queue semaphore creation into ours's
  emission window even when ours never creates the equivalent. If
  ours's worker fails to *enqueue* something canary's worker awaits,
  we'd never see the gap because the matched-prefix isn't on the
  worker tid in the first place.
* The D-extension absorber folds nested-CS cleanup blocks. If
  canary's `Enter/Leave` block contains the NtSetEvent that signals
  the wedge handle (via descendant `xeKeSetEvent`), the absorber
  hides that.

Concrete: un-absorb, re-diff, look for the first FOLDED canary block
that contains an `NtSetEvent` whose SID resolves to the wedge handle.
~3-5 hours of analysis, no LOC change.

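The "look for the first FOLDED canary block" step could be scripted roughly as follows. The block and event types are hypothetical, and the SID-to-wedge-handle resolution is passed in as a caller-supplied predicate because this note does not say how that mapping would be built.

```rust
/// One event inside a diffed block (hypothetical shape).
pub struct TraceEvent {
    pub kind: &'static str,
    pub sid: u64,
}

/// A run of canary events, flagged if an absorber folded it away.
pub struct Block {
    pub folded: bool,
    pub events: Vec<TraceEvent>,
}

/// Index of the first folded block containing an `NtSetEvent` whose SID
/// the caller's predicate resolves to the wedge handle.
pub fn first_hidden_signal<F>(blocks: &[Block], resolves_to_wedge: F) -> Option<usize>
where
    F: Fn(u64) -> bool,
{
    blocks.iter().position(|b| {
        b.folded
            && b.events
                .iter()
                .any(|e| e.kind == "NtSetEvent" && resolves_to_wedge(e.sid))
    })
}
```

A `None` result across the full un-absorbed diff would be evidence that the absorbers are not hiding the signal, which is itself worth recording.
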
### Path 3 — install host-side mem-watch + diff on wedge handle's guest memory

AUDIT-067 established that vtable installs go through host-side
writes invisible to guest-PC traces. By the same logic, the wedge
handle's kernel object header may be mutated by host code (the
canary scheduler / dispatcher) in ways ours doesn't replicate. Hook
`Memory::write*` in canary on the wedge handle's address; compare
against ours.

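On ours's side, the watch and the comparison might look like the sketch below. It assumes some write hook exists that can call `on_write` with a guest address, a length, and a host/guest flag; that hook, and every name here, is hypothetical. The canary half would be the C++ `Memory::write*` hook named above, dumping records in the same shape.

```rust
/// One observed write that touched the watched range.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct WriteRecord {
    pub addr: u32,
    pub len: u32,
    pub host_side: bool,
}

/// Records writes overlapping [start, start + len).
pub struct MemWatch {
    start: u64,
    end: u64,
    pub hits: Vec<WriteRecord>,
}

impl MemWatch {
    pub fn new(start: u32, len: u32) -> Self {
        let start = u64::from(start);
        MemWatch { start, end: start + u64::from(len), hits: Vec::new() }
    }

    /// Call from the (assumed) write hook for every guest-memory write.
    pub fn on_write(&mut self, addr: u32, len: u32, host_side: bool) {
        let w_start = u64::from(addr);
        let w_end = w_start + u64::from(len);
        if w_start < self.end && w_end > self.start {
            self.hits.push(WriteRecord { addr, len, host_side });
        }
    }
}

/// Index of the first record where two logs disagree, or None if identical.
pub fn first_divergence(a: &[WriteRecord], b: &[WriteRecord]) -> Option<usize> {
    let common = a.iter().zip(b).take_while(|(x, y)| x == y).count();
    if common == a.len() && common == b.len() { None } else { Some(common) }
}
```

A host-side write present in canary's log and absent from ours would show up as a divergence at that record's index.
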
### Path 4 — scheduler determinism investment

This is the unfunded `scheduler_determinism_plan` artifact (per
memory). Stage 0 was a null result; the contention manifest stages
landed but didn't move the cap. The PLAN doc explicitly notes the
wedge is upstream of contention, so this is unlikely to help WITHOUT
additional work.

## Honesty note

19 prior audits attacked this same wedge and failed. Phase W is the
20th. We landed a correct mini-fix, but the wedge itself is
unchanged. The user's instinct to call this an honest fallback is the
correct posture.