# Methodology assessment

## The matched-prefix metric: load-bearing or load-shedding?

Across 25+ iterates (audits 049 through 069; Phase C+1 through C+25; Phase D Stages 0-4 plus D-extension; Phase W; Phase host-audio-*), the matched-prefix on the main thread (canary tid=6 ⇄ ours tid=1) advanced:

| Phase | Matched-prefix | Δ |
|---|---:|---:|
| Phase B baseline (pre-C+1) | ~102,168 | — |
| Phase D D-extension landing | 104,607 → 105,046 | +439 |
| Phase W (VdInitializeEngines fix) | 105,046 → 105,112 | +66 |
| Phase C+25 (MmGetPhysicalAddress canon) | 105,112 → 105,128 | +16 |

Over the same iterates, the progression metrics did not move:

| Phase | `swaps` | `draws` | `unique_render_targets` |
|---|---:|---:|---:|
| Phase B baseline | 1 | 0 | 0 |
| Phase W | 1 | 0 | 0 |
| Phase C+25 | 1 | 0 | 0 |

**The two metrics are decoupled.** Matched-prefix is advancing through ENGINE-internal divergences (kernel-call return values, thread IDs, heap arena base addresses). The progression metric is gated by boot-state activation, which lives one or more layers above the diff points.

## Why the decoupling happened

Three reading-errors compound:

1. **#23 (cooperative-vs-preemptive scheduling jitter)**: canary's default (preemptive) scheduling produces different *intra-thread* event ordering than ours's cooperative coroutine scheduler. The diff-tool absorbers (C+18, C+21, D-extension) correctly hide this jitter, but they also hide *real bootstrap-time divergences*. Phase W explicitly noted: "If ours's worker fails to enqueue something canary's worker awaits, we'd never see the gap because the matched-prefix isn't on the worker tid in the first place."
2. **#30 (per-tid PC SID drift)**: shared-global SIDs work for process-global dispatchers (e.g., the work-queue semaphore at handle `0xF800003C` in canary). But the wedge handle `0x12d0` uses a per-tid create-site SID that does NOT match across engines. So even when the same logical event exists in both engines, the diff harness reports a SID mismatch and either absorbs it or diverges incorrectly.
3. **#38 (cross-spawn producer paths)**: static reachability (the sylpheed.db `xrefs` table) misses producer paths that cross thread-spawn boundaries. The result.md from Phase Non-match shows that canary's tid=14 (XAudio voice-mask poll) communicates with downstream code via a path that has no static `bl` edge; it crosses via guest kernel APIs.

## Alternative metric proposals

### Option 1 — `draws ≥ 1` (sharp gate)

**Pros**: directly measures the target. Boolean. Reproducible.

**Cons**: gives no signal during iteration; every iterate before the breakthrough is `draws = 0`. The loss function is non-smooth.

### Option 2 — `swaps ≥ 2` (relaxed first-frame gate)

**Pros**: still sharp; one bit looser than draws. Distinguishes the boot-init-only swap (`swaps=1`) from at least one rendered frame (`swaps≥2`).

**Cons**: same non-smooth loss. Achievable in principle by a crowbar without solving the underlying bug.

### Option 3 — Renderer-thread liveness: `events_emitted_by_renderer_thread ≥ N`

Compute: events emitted on the thread spawned at entry `0x822F1EE0` in any 90-s wallclock window. Canary: 594,000. Ours: ~0.

**Pros**: smooth-ish (the event count can move gradually). Directly measures "is the renderer running." Bypasses the diff-tool jitter problem because it is a per-engine internal count.

**Cons**: requires a non-trivial 90-s wallclock run (not the 50M-instruction ceiling). Could be gamed by a crowbar that resumes the renderer without unblocking the wedge.

### Option 4 — Worker-thread census: `count(threads_with_events ≥ 10k) ≥ 6`

Compute: how many tids in ours emit ≥10k events over 90 s wallclock. Canary at 90 s: 12 tids meet this (tids 1/2/4/6/9/10/11/12/13/14/15/16 plus the post-10s workers 21/27/28/29). Ours at 50M instructions: 5 tids.

**Pros**: directly measures the AUDIT-057 thread gap. Smooth metric: each unwedged thread adds 1 to the count.

**Cons**: requires 90-s wallclock runs; ours can't reach this without solving the wedge first, so it is prerequisite-equivalent to Option 3.
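Options 3 and 4 both reduce to per-thread event counts over one engine's trace. A minimal Python sketch, assuming (hypothetically) a JSONL trace already clipped to the 90-s wallclock window, where every event line carries an integer `tid` field. Resolving which tid was spawned at entry `0x822F1EE0` is left to the caller, and the function names are illustrative, not existing tooling.

```python
import json
from collections import Counter
from typing import Iterable


def events_per_tid(jsonl_lines: Iterable[str]) -> Counter:
    """Count trace events per thread id.

    Assumes a hypothetical schema: each non-empty line is a JSON object
    with an integer "tid" field.
    """
    counts: Counter = Counter()
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines (e.g. a trailing newline)
        counts[json.loads(line)["tid"]] += 1
    return counts


def renderer_liveness(counts: Counter, renderer_tid: int) -> int:
    """Option 3: events emitted by the renderer thread (0 if it never ran)."""
    return counts.get(renderer_tid, 0)


def thread_census(counts: Counter, min_events: int = 10_000) -> int:
    """Option 4: number of tids that emitted at least `min_events` events."""
    return sum(1 for n in counts.values() if n >= min_events)
```

Both numbers are computed inside a single engine's trace, so no cross-engine SID matching or jitter absorption is involved, which is the property Option 3 relies on.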
### Option 5 — `worker_semaphore_release_count` (AUDIT-069 S5)

Compute: the number of `NtReleaseSemaphore` calls on the work semaphore (handle `0xF800003C` in canary, its equivalent in ours) over 90 s wallclock. Canary: 414. Ours: 99 (24%).

**Pros**: pinpoints the under-production directly. Mechanically measurable. Already instrumented in canary (audit_70_semaphore_release_watch).

**Cons**: same wallclock requirement; same gameability.

### Option 6 — composite: `progression_score`

Define:

```
progression_score = 1 * swaps + 10 * draws + 100 * unique_render_targets + 0.001 * matched_prefix
```

This recovers signal during iteration (matched-prefix moves) without pretending that it is progression. The 1000:1 weight ratio between `swaps` and `matched_prefix` matches the bug-class severity.

**Pros**: continuous gradient over both wedge-solving and canonicalization work. Honest about which is more important.

**Cons**: arbitrary weights. Composite metrics drift in meaning.

## Recommendation

**Adopt Option 6 (composite progression_score) as the primary methodology metric**, with a hard secondary gate: Option 2 (`swaps ≥ 2`) is what matters; everything else is fitness.

Concrete proposal:

1. The `digest.json` output gains a `progression_score` field computed from the existing fields (zero new instrumentation).
2. Every iterate must report Δprogression_score in its re-validation.md.
3. Iterates that only move `matched_prefix` (i.e., Δprogression_score = 0.001 × Δmatched_prefix) MUST be tagged in their memory entry as "**canonicalization only — no progression**" and counted against a *budget*: at most 5 consecutive iterates in this class before a mandatory pivot to wedge-attack work.
4. Audits that move `swaps` or `draws` (the high-weight terms) are tagged "**progression**" and given priority for resource allocation.

This methodology change costs ~10 LOC in the digest output and imposes a discipline cap of 5 canonicalization-only audits between progression attempts.
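The digest change in step 1 and the tagging and budget rules in steps 3 and 4 are small enough to sketch. The Python below is illustrative only: it assumes `digest.json` exposes `swaps`, `draws`, `unique_render_targets`, and `matched_prefix` as top-level keys, the helper names are hypothetical, and treating `unique_render_targets` as a progression term (alongside `swaps` and `draws`) is a judgment call of this sketch.

```python
from typing import Mapping, Sequence

# Weights from the Option 6 definition (1000:1 between swaps and matched_prefix).
WEIGHTS = {
    "swaps": 1.0,
    "draws": 10.0,
    "unique_render_targets": 100.0,
    "matched_prefix": 0.001,
}
# Terms whose movement counts as progression (everything except matched_prefix).
HIGH_WEIGHT = ("swaps", "draws", "unique_render_targets")
PROGRESSION = "progression"
CANON_ONLY = "canonicalization-only"
NO_CHANGE = "no-change"


def progression_score(digest: Mapping[str, float]) -> float:
    """Weighted sum of existing digest fields; needs no new instrumentation."""
    return sum(weight * digest[key] for key, weight in WEIGHTS.items())


def classify(before: Mapping[str, float], after: Mapping[str, float]) -> str:
    """Tag an iterate by which digest terms it moved."""
    if any(after[k] != before[k] for k in HIGH_WEIGHT):
        return PROGRESSION
    if after["matched_prefix"] != before["matched_prefix"]:
        return CANON_ONLY
    return NO_CHANGE


def must_pivot(tags: Sequence[str], budget: int = 5) -> bool:
    """True once the most recent unbroken run of canonicalization-only tags reaches the budget."""
    run = 0
    for tag in reversed(tags):
        if tag != CANON_ONLY:
            break
        run += 1
    return run >= budget
```

For example, the Phase C+25 step (105,112 → 105,128 with `swaps` still at 1) is tagged canonicalization-only and adds 0.016 to the score.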
## Falsification of the matched-prefix-as-proxy belief

Phases C through C+25 explicitly assumed that matched-prefix is a **proxy** for progression. This assumption is now empirically falsified:

> +2,960 events of matched-prefix advancement (~102,168 → 105,128) produced exactly
> ZERO units of progression.

Reading-error #39 (newly registered by this review):

> **#39 (matched-prefix as progression proxy)**: matched-prefix
> measures *engine-to-engine divergence point*, not *game-to-game
> functional gap*. When the wedge is on a different thread than the
> matched-prefix anchor thread, advancing matched-prefix is orthogonal
> to unwedging. Future audits MUST distinguish "ours's tid-X main
> thread diverges from canary's tid-Y" from "ours's tid-X main thread
> is *blocked because tid-Z is wedged*", and target the wedge directly
> when present.

## What "progression discipline" looks like in practice

For the next 3 iterates:

- Iterate N+1: **Step 1 of shortest-path-roadmap** (crowbar). No diff-tool work. Target: `swaps ≥ 2`.
- Iterate N+2: **Step 2 of the roadmap** (trigger ID via canary jsonl analysis). No engine LOC. Target: identification of the missing kernel call(s).
- Iterate N+3: **Step 3 of the roadmap** (mirror the trigger). Target: ours unblocks without the crowbar.

Each iterate must produce a `progression_score` delta report. If 3 iterates in a row produce Δprogression_score ≤ ε (where ε = 0.001 × 500 = 0.5, i.e., the score contribution of 500 matched-prefix events), the methodology should be re-reviewed before continuing; that outcome would mean even the crowbar approach failed and a deeper rethink is needed.

## Closing note

The user's instinct in calling this strategic pause and review was correct. The matched-prefix-only chain was producing real canonicalization work but had ceased producing progression.
The roadmap above is one principled attempt at breaking the cycle; if it fails, the next-level fallback is to formally accept Sylpheed's boot-state as currently unreachable in ours and pivot to a different title for the methodology demonstration.