handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions
--- a/audit-runs/review-a-boot-state/methodology-assessment.md
+++ b/audit-runs/review-a-boot-state/methodology-assessment.md
@@ -0,0 +1,193 @@
+# Methodology assessment
+
+## The matched-prefix metric: load-bearing or load-shedding?
+
+Across 25+ iterates (audits 049 through 069; Phase C+1 through C+25;
+Phase D Stages 0-4 plus D-extension; Phase W; Phase host-audio-*),
+matched-prefix on the main thread (canary tid=6 ⇄ ours tid=1)
+advanced:
+
+| Phase | Matched-prefix | Δ |
+|---|---:|---:|
+| Phase B baseline (pre-C+1) | ~102,168 | — |
+| Phase D D-extension landing | 104,607 → 105,046 | +439 |
+| Phase W (VdInitializeEngines fix) | 105,046 → 105,112 | +66 |
+| Phase C+25 (MmGetPhysicalAddress canon) | 105,112 → 105,128 | +16 |
+
+Over the same phases, the progression metrics did not move:
+
+| Phase | `swaps` | `draws` | `unique_render_targets` |
+|---|---:|---:|---:|
+| Phase B baseline | 1 | 0 | 0 |
+| Phase W | 1 | 0 | 0 |
+| Phase C+25 | 1 | 0 | 0 |
+
+**The two metrics are decoupled.**  Matched-prefix advances along
+ENGINE-internal divergences (kernel-call return values, thread IDs,
+heap arena base addresses).  The progression metrics (`swaps`,
+`draws`, `unique_render_targets`) are gated by boot-state
+activation, which lives one or more layers above the diff points.
+
+## Why the decoupling happened
+
+Three reading-errors compound:
+
+1. **#23 (cooperative-vs-preemptive scheduling jitter)**: canary's
+   default scheduling produces a different *intra-thread* event
+   ordering than ours's coroutine scheduler does.  Diff-tool absorbers
+   (C+18, C+21, D-extension) correctly hide this jitter, but they also
+   hide *real bootstrap-time divergences*.  Phase W explicitly noted: "If
+   ours's worker fails to enqueue something canary's worker awaits,
+   we'd never see the gap because the matched-prefix isn't on the
+   worker tid in the first place."
+2. **#30 (per-tid PC SID drift)**: shared-global SIDs work for
+   process-global dispatchers (e.g., the work-queue semaphore at
+   handle `0xF800003C` in canary).  But the wedge handle `0x12d0`
+   uses a per-tid create-site SID that does NOT match across engines.
+   So even when the same logical event exists in both engines, the
+   diff harness reports a SID mismatch and then either absorbs it or
+   diverges incorrectly.
+3. **#38 (cross-spawn producer paths)**: static reachability (the
+   sylpheed.db `xrefs` table) misses producer paths that cross
+   thread-spawn boundaries.  The result.md from Phase Non-match shows
+   canary's tid=14 (XAudio voice-mask poll) communicates with
+   downstream code via a path that has no static `bl` edge — it
+   crosses via guest kernel APIs.
+
+## Alternative metric proposals
+
+### Option 1 — `draws ≥ 1` (sharp gate)
+
+**Pros**: directly measures the target.  Boolean.  Reproducible.
+**Cons**: gives no signal during iteration — every iterate before the
+breakthrough is `draws = 0`.  Loss function is non-smooth.
+
+### Option 2 — `swaps ≥ 2` (relaxed first-frame gate)
+
+**Pros**: still sharp; one bit looser than draws.  Distinguishes
+boot-init-only swap (`swaps=1`) from at-least-one-rendered-frame
+(`swaps≥2`).
+**Cons**: same non-smooth loss.  Achievable in principle by a crowbar
+without solving the underlying bug.
+
+### Option 3 — Renderer-thread liveness: `events_emitted_by_renderer_thread ≥ N`
+
+Compute: events emitted on the thread spawned at entry `0x822F1EE0`
+in any 90-s wallclock window.  Canary: 594,000.  Ours: ~0.
+
+**Pros**: smooth-ish (event count can move slowly).  Directly measures
+"is the renderer running."  Bypasses the diff-tool jitter problem
+because it's a per-engine internal count.
+**Cons**: requires a non-trivial 90-s wallclock run (not a run capped
+at the 50M-instruction ceiling).  Could be gamed by a crowbar that
+resumes the renderer
+without unblocking the wedge.
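
A minimal sketch of how the Option 3 count might be computed, assuming the renderer thread's events have already been filtered out of the trace and each carries a wallclock timestamp in seconds. The function names and the reading of "any 90-s window" as "the busiest 90-s window" are assumptions for illustration, not the harness's actual interface.

```python
from bisect import bisect_right


def max_events_in_window(timestamps, window_s=90.0):
    """Return the largest event count in any window (t - window_s, t]."""
    ts = sorted(timestamps)
    best = 0
    for hi, t in enumerate(ts):
        # Index of the first timestamp strictly later than t - window_s.
        lo = bisect_right(ts, t - window_s)
        best = max(best, hi - lo + 1)
    return best


def renderer_is_live(timestamps, min_events, window_s=90.0):
    """Option 3 gate: at least min_events inside some window_s-second window."""
    return max_events_in_window(timestamps, window_s) >= min_events
```

Because the count is taken per engine, it does not depend on cross-engine event alignment, which is why the diff-tool jitter does not affect it.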
+
+### Option 4 — Worker-thread census: `count(threads_with_events ≥ 10k) ≥ 6`
+
+Compute: how many tids in ours emit ≥10k events over 90 s wallclock.
+Canary at 90 s: 12 tids meet this (tids 1/2/4/6/9/10/11/12/13/14/15/16),
+plus the post-10s workers 21/27/28/29.  Ours at 50M instr: 5 tids.
+
+**Pros**: directly measures the AUDIT-057 thread-gap.  Smooth metric:
+each unwedged thread adds 1 to the count.
+**Cons**: requires 90-s wallclock runs — ours can't reach this
+without solving the wedge first, so it's pre-requisite-equivalent to
+Option 3.
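
The census in Option 4 can be sketched as a per-tid tally followed by a threshold check. This assumes the trace has been reduced to a flat sequence of tids, one entry per event; the function names are hypothetical.

```python
from collections import Counter


def thread_census(event_tids, min_events=10_000):
    """Return the sorted tids that emitted at least min_events events."""
    counts = Counter(event_tids)
    return sorted(tid for tid, n in counts.items() if n >= min_events)


def census_gate(event_tids, min_events=10_000, min_threads=6):
    """Option 4 gate: enough threads cleared the per-thread event threshold."""
    return len(thread_census(event_tids, min_events)) >= min_threads
```

Returning the tid list rather than only the count makes it easy to see which threads are still missing relative to canary.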
+
+### Option 5 — `worker_semaphore_release_count` (AUDIT-069 S5)
+
+Compute: how many `NtReleaseSemaphore` calls on the work semaphore
+(handle `0xF800003C` in canary, equivalent in ours) over 90 s
+wallclock.  Canary: 414.  Ours: 99 (24%).
+
+**Pros**: pinpoints the under-production directly.  Mechanically
+measurable.  Already instrumented in canary (audit_70_semaphore_release_watch).
+**Cons**: same wallclock requirement; same gameability.
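
As a rough sketch, the Option 5 count is a filter over kernel-call events. The event shape here (dicts with `call` and `handle` keys) is an assumed schema for illustration; the real jsonl field names are not given in this note.

```python
def semaphore_release_count(events, handle):
    """Count NtReleaseSemaphore calls that target the given handle."""
    return sum(
        1
        for e in events
        if e.get("call") == "NtReleaseSemaphore" and e.get("handle") == handle
    )


def production_ratio(ours, canary):
    """Fraction of canary's release count that ours reaches (0.0 if canary is 0)."""
    return ours / canary if canary else 0.0


# Figures quoted above: canary 414 releases, ours 99.
print(round(production_ratio(99, 414), 2))  # 0.24
```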
+
+### Option 6 — composite: `progression_score`
+
+Define:
+
+```
+progression_score = 1 * swaps + 10 * draws + 100 * unique_render_targets
+                  + 0.001 * matched_prefix
+```
+
+This recovers signal during iteration (matched-prefix moves)
+without pretending it's progression.  The 1000:1 ratio between the
+`swaps` weight and the `matched_prefix` weight matches the bug-class
+severity.
+
+**Pros**: continuous gradient over both wedge-solving and
+canonicalization work.  Honest about which is more important.
+**Cons**: arbitrary weights.  Composite metrics drift in meaning.
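
A worked example of the Option 6 formula, using the figures from the two tables at the top of this assessment. The dict layout is an assumption about what `digest.json` exposes, and the Phase B matched-prefix value is the approximate ~102,168 quoted there.

```python
# Weights copied from the progression_score definition.
WEIGHTS = {
    "swaps": 1.0,
    "draws": 10.0,
    "unique_render_targets": 100.0,
    "matched_prefix": 0.001,
}


def progression_score(digest):
    """Weighted sum of the digest fields; missing fields count as 0."""
    return sum(weight * digest.get(key, 0) for key, weight in WEIGHTS.items())


phase_b = {"swaps": 1, "draws": 0, "unique_render_targets": 0,
           "matched_prefix": 102_168}
phase_c25 = {"swaps": 1, "draws": 0, "unique_render_targets": 0,
             "matched_prefix": 105_128}

delta = progression_score(phase_c25) - progression_score(phase_b)
print(round(delta, 3))  # 2.96
```

Under these weights a single draw adds 10 to the score, more than three times the 2.96 gained from the whole Phase B to C+25 matched-prefix advance. The absolute score is dominated by the matched-prefix term (about 105 here), so it is the per-iterate delta, not the absolute value, that carries the signal.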
+
+## Recommendation
+
+**Adopt Option 6 (composite progression_score) as the primary
+methodology metric**, with Option 2 (`swaps ≥ 2`) as a hard secondary
+gate: `swaps ≥ 2` is what matters; everything else is fitness.
+
+Concrete proposal:
+
+1. The `digest.json` output gains a `progression_score` field
+   computed from the existing fields (zero new instrumentation).
+2. Every iterate must report Δprogression_score in its
+   re-validation.md.
+3. Iterates that only move `matched_prefix` (i.e., Δprogression_score
+   = 0.001 × Δmatched_prefix) MUST be tagged in their memory entry
+   as "**canonicalization only — no progression**" and counted
+   against a *budget*: max 5 consecutive iterates in this class
+   before mandatory pivot to wedge-attack work.
+4. Audits that move `swaps` or `draws` (the high-weight terms) are
+   tagged "**progression**" and given priority for resource
+   allocation.
+
+This methodology change costs ~10 LOC in the digest output and
+imposes a discipline cap of 5 canonicalization-only audits between
+progression attempts.
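
The tagging rule and the budget cap above could be enforced mechanically along these lines. This is a sketch under assumptions: the tag strings are shortened stand-ins for the memory-entry tags, and the function names are hypothetical.

```python
CANON = "canonicalization only"
PROGRESSION = "progression"
MAX_CONSECUTIVE_CANON = 5  # budget from item 3 of the proposal


def classify_iterate(d_swaps, d_draws):
    """Tag an iterate by whether it moved swaps or draws."""
    if d_swaps > 0 or d_draws > 0:
        return PROGRESSION
    return CANON


def pivot_required(tags, budget=MAX_CONSECUTIVE_CANON):
    """True once the trailing run of canonicalization-only tags hits the budget."""
    run = 0
    for tag in reversed(tags):
        if tag != CANON:
            break
        run += 1
    return run >= budget
```

Only the trailing run is counted, so a single progression iterate resets the budget.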
+
+## Falsification of the matched-prefix-as-proxy belief
+
+Phase C through C+25 explicitly assumed that matched-prefix is a
+**proxy** for progression.  This assumption is now empirically
+falsified:
+
+> +2,960 events of matched-prefix advancement produced exactly
+> ZERO units of progression.
+
+Reading-error #39 (newly registered by this review):
+
+> **#39 (matched-prefix as progression proxy)**: matched-prefix
+> measures *engine-to-engine divergence point*, not *game-to-game
+> functional gap*.  When the wedge is on a different thread than the
+> matched-prefix anchor thread, advancing matched-prefix is orthogonal
+> to unwedging.  Future audits MUST distinguish "ours's tid-X main
+> thread diverges from canary's tid-Y" from "ours's tid-X main thread
+> is *blocked because tid-Z is wedged*", and target the wedge directly
+> when present.
+
+## What "progression discipline" looks like in practice
+
+For the next 3 iterates:
+
+- Iterate N+1: **Step 1 of shortest-path-roadmap** (crowbar).  No
+  diff-tool work.  Target: `swaps ≥ 2`.
+- Iterate N+2: **Step 2 of roadmap** (trigger ID via canary jsonl
+  analysis).  No engine LOC.  Target: identification of the missing
+  kernel call(s).
+- Iterate N+3: **Step 3 of roadmap** (mirror the trigger).  Target:
+  ours unblocks without the crowbar.
+
+Each iterate must produce a `progression_score` delta report.  If
+3 iterates in a row produce Δprogression_score ≤ ε (where
+ε = 0.001 × 500 = 0.5, i.e. the score gained from 500 events of
+matched-prefix advancement), the methodology should be reviewed
+again before continuing: this would mean even the crowbar approach
+failed and a deeper rethink is needed.
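
A small sketch of the stall check described above, assuming the per-iterate score deltas are kept in a chronological list. The function name is hypothetical.

```python
EPSILON = 0.5  # 0.001 * 500, the threshold defined above


def needs_methodology_review(deltas, epsilon=EPSILON, run_length=3):
    """True if the last run_length score deltas are all at or below epsilon."""
    recent = deltas[-run_length:]
    return len(recent) == run_length and all(d <= epsilon for d in recent)
```

For scale, the three matched-prefix advances in the table at the top (+439, +66, +16) correspond to score deltas of 0.439, 0.066 and 0.016, all below ε, whereas a crowbar that reaches `swaps ≥ 2` moves the score by at least 1.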
+
+## Closing note
+
+The user's instinct in calling for this strategic pause and review was
+correct.  The matched-prefix-only chain was producing real
+canonicalization work but had ceased producing progression.  The
+roadmap above is one principled attempt at breaking the cycle; if it
+fails, the next-level fallback is to formally accept Sylpheed's
+boot-state as currently unreachable in ours and pivot to a different
+title for the methodology demonstration.