# Methodology assessment
## The matched-prefix metric: load-bearing or load-shedding?

Across 25+ iterates (audits 049 through 069; Phase C+1 through C+25;
Phase D Stages 0-4 plus D-extension; Phase W; Phase host-audio-*),
matched-prefix on the main thread (canary tid=6 ⇄ ours tid=1)
advanced:

| Phase | Matched-prefix | Δ |
|---|---:|---:|
| Phase B baseline (pre-C+1) | ~102,168 | n/a |
| Phase D D-extension landing | 104,607 → 105,046 | +439 |
| Phase W (VdInitializeEngines fix) | 105,046 → 105,112 | +66 |
| Phase C+25 (MmGetPhysicalAddress canon) | 105,112 → 105,128 | +16 |

Over the same span, the progression counters did not move:

| Phase | `swaps` | `draws` | `unique_render_targets` |
|---|---:|---:|---:|
| Phase B baseline | 1 | 0 | 0 |
| Phase W | 1 | 0 | 0 |
| Phase C+25 | 1 | 0 | 0 |

**The two metrics are decoupled.** Matched-prefix advances by resolving
*engine-internal* divergences (kernel-call return values, thread IDs,
heap arena base addresses). The progression metric is gated by
boot-state activation, which lives one or more layers above the diff
points.

## Why the decoupling happened

Three reading-errors compound:

1. **#23 (cooperative-vs-preemptive scheduling jitter)**: canary's
   default-scheduling produces different *intra-thread* event ordering
   than ours's coroutine scheduler. Diff-tool absorbers (C+18, C+21,
   D-extension) correctly hide this jitter, but they hide *real
   bootstrap-time divergences too*. Phase W explicitly noted: "If
   ours's worker fails to enqueue something canary's worker awaits,
   we'd never see the gap because the matched-prefix isn't on the
   worker tid in the first place."
2. **#30 (per-tid PC SID drift)**: shared-global SIDs work for
   process-global dispatchers (e.g., the work-queue semaphore at
   handle `0xF800003C` in canary). But the wedge handle `0x12d0`
   uses a per-tid create-site SID that does NOT match across engines.
   So even when the same logical event exists in both engines, the
   diff harness reports a SID mismatch and either absorbs the event
   or diverges incorrectly.
3. **#38 (cross-spawn producer paths)**: static reachability (the
   sylpheed.db `xrefs` table) misses producer paths that cross
   thread-spawn boundaries. The `result.md` from Phase Non-match shows
   that canary's tid=14 (XAudio voice-mask poll) communicates with
   downstream code via a path that has no static `bl` edge: it
   crosses via guest kernel APIs.

## Alternative metric proposals
### Option 1 — `draws ≥ 1` (sharp gate)

**Pros**: directly measures the target. Boolean. Reproducible.

**Cons**: gives no signal during iteration, because every iterate before
the breakthrough is `draws = 0`. Loss function is non-smooth.

### Option 2 — `swaps ≥ 2` (relaxed first-frame gate)

**Pros**: still sharp; one bit looser than draws. Distinguishes
boot-init-only swap (`swaps=1`) from at-least-one-rendered-frame
(`swaps≥2`).

**Cons**: same non-smooth loss. Achievable in principle by a crowbar
without solving the underlying bug.

### Option 3 — Renderer-thread liveness: `events_emitted_by_renderer_thread ≥ N`

Compute: events emitted on the thread spawned at entry `0x822F1EE0`
in any 90-s wallclock window. Canary: 594,000. Ours: ~0.

**Pros**: smooth-ish (event count can move slowly). Directly measures
"is the renderer running." Bypasses the diff-tool jitter problem
because it's a per-engine internal count.

**Cons**: requires a non-trivial 90-s wallclock run (not the
50M-instruction ceiling). Could be gamed by a crowbar that resumes the
renderer without unblocking the wedge.

### Option 4 — Worker-thread census: `count(threads_with_events ≥ 10k) ≥ 6`

Compute: how many tids in ours emit ≥10k events over 90 s wallclock.
Canary at 90 s: 12 tids meet this (tids 1/2/4/6/9/10/11/12/13/14/15/16
plus the post-10s workers 21/27/28/29). Ours at 50M instr: 5 tids.

**Pros**: directly measures the AUDIT-057 thread-gap. Smooth metric:
each unwedged thread adds 1 to the count.

**Cons**: requires 90-s wallclock runs, which ours can't reach
without solving the wedge first, so it's prerequisite-equivalent to
Option 3.

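
As a rough illustration of the census in Option 4, the sketch below counts threads at or above the event threshold. The input shape (a mapping from tid to event count) and the function names are assumptions made for this example, and the sample counts are invented, not measured data.

```python
from typing import Mapping


def census(events_per_tid: Mapping[int, int], min_events: int = 10_000) -> int:
    """Count threads whose event total meets the liveness threshold."""
    return sum(1 for n in events_per_tid.values() if n >= min_events)


def passes_gate(events_per_tid: Mapping[int, int],
                min_events: int = 10_000,
                min_threads: int = 6) -> bool:
    """Option 4 gate: at least `min_threads` threads emit `min_events` events."""
    return census(events_per_tid, min_events) >= min_threads


# Invented per-tid counts, for illustration only.
sample = {1: 250_000, 2: 12_000, 4: 9_999, 6: 40_000, 9: 10_000}
print(census(sample), passes_gate(sample))
```

Because each unwedged thread moves the census by one, the count itself is worth reporting alongside the boolean gate.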
### Option 5 — `worker_semaphore_release_count` (AUDIT-069 S5)

Compute: how many `NtReleaseSemaphore` calls on the work semaphore
(handle `0xF800003C` in canary, equivalent in ours) over 90 s
wallclock. Canary: 414. Ours: 99 (24%).

**Pros**: pinpoints the under-production directly. Mechanically
measurable. Already instrumented in canary (`audit_70_semaphore_release_watch`).

**Cons**: same wallclock requirement; same gameability.

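
One way the Option 5 count could be derived from a trace is sketched below. The JSONL schema here (a `call` name and a hex-string `handle` per line) is purely an assumption for illustration; the real trace format may differ, and the three sample lines are invented.

```python
import json
from typing import Iterable


def count_releases(lines: Iterable[str], handle: int) -> int:
    """Count NtReleaseSemaphore events on one handle in a JSONL trace.

    Assumes (hypothetically) that each line is a JSON object with a
    "call" string and a "handle" hex string such as "0xF800003C".
    """
    count = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        if event.get("call") != "NtReleaseSemaphore":
            continue
        if int(event.get("handle", "0x0"), 16) == handle:
            count += 1
    return count


def release_ratio(ours: int, canary: int) -> float:
    """Fraction of canary's release count that ours reaches."""
    return ours / canary if canary else 0.0


# Invented trace lines, for illustration only.
trace = [
    '{"call": "NtReleaseSemaphore", "handle": "0xF800003C"}',
    '{"call": "NtReleaseSemaphore", "handle": "0x12d0"}',
    '{"call": "NtReleaseSemaphore", "handle": "0xf800003c"}',
]
print(count_releases(trace, 0xF800003C))
print(round(release_ratio(99, 414), 2))
```

With the counts quoted above (99 against 414), `release_ratio` gives roughly 0.24, matching the 24% figure.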
### Option 6 — composite: `progression_score`

Define:

```
progression_score = 1 * swaps + 10 * draws + 100 * unique_render_targets
                  + 0.001 * matched_prefix
```

This recovers signal during iteration (matched-prefix moves)
without pretending it's progression. The 1000:1 weight ratio
between `swaps` and `matched_prefix` matches the bug-class severity.

**Pros**: continuous gradient over both wedge-solving and
canonicalization work. Honest about which is more important.

**Cons**: arbitrary weights. Composite metrics drift in meaning.

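
The composite can be sketched directly from the formula above. The dictionary keys are assumed to mirror the digest field names used in this document; the two sample digests reuse the Phase B baseline and Phase C+25 rows from the tables (the baseline matched-prefix is approximate there).

```python
def progression_score(digest: dict) -> float:
    """Option 6 composite, using the weights from the formula above."""
    return (1 * digest["swaps"]
            + 10 * digest["draws"]
            + 100 * digest["unique_render_targets"]
            + 0.001 * digest["matched_prefix"])


# Values taken from the tables earlier in this document.
baseline = {"swaps": 1, "draws": 0, "unique_render_targets": 0,
            "matched_prefix": 102_168}
c25 = {"swaps": 1, "draws": 0, "unique_render_targets": 0,
       "matched_prefix": 105_128}

delta = progression_score(c25) - progression_score(baseline)
one_draw = progression_score({**c25, "draws": 1}) - progression_score(c25)
print(f"whole campaign: {delta:+.3f}, a single draw: {one_draw:+.3f}")
```

Under these weights the whole matched-prefix campaign is worth about +2.96, while a single draw is worth +10, which is the intended ordering.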
## Recommendation

**Adopt Option 6 (composite progression_score) as the primary
methodology metric**, with a hard secondary gate of "Option 2
(`swaps ≥ 2`) is what matters; everything else is fitness."

Concrete proposal:

1. The `digest.json` output gains a `progression_score` field
   computed from the existing fields (zero new instrumentation).
2. Every iterate must report Δprogression_score in its
   `re-validation.md`.
3. Iterates that only move `matched_prefix` (i.e., Δprogression_score
   = 0.001 × Δmatched_prefix) MUST be tagged in their memory entry
as "**canonicalization only — no progression**" and counted
   against a *budget*: max 5 consecutive iterates in this class
   before mandatory pivot to wedge-attack work.
4. Audits that move `swaps` or `draws` (the high-weight terms) are
   tagged "**progression**" and given priority for resource
   allocation.

This methodology change costs ~10 LOC in the digest output and
imposes a discipline cap of 5 canonicalization-only audits between
progression attempts.

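
The tagging and budget rules above could be mechanized roughly as follows. This is a minimal sketch: the tag constants and function names are hypothetical, and treating any increase in `unique_render_targets` as progression (alongside `swaps` and `draws`) is an assumption based on its weight in the score.

```python
PROGRESSION = "progression"
CANON_ONLY = "canonicalization-only"  # hypothetical short form of the tag
NO_CHANGE = "no-change"

HIGH_WEIGHT = ("swaps", "draws", "unique_render_targets")


def classify(prev: dict, cur: dict) -> str:
    """Tag one iterate by comparing its digest with the previous one."""
    if any(cur[k] > prev[k] for k in HIGH_WEIGHT):
        return PROGRESSION
    if cur["matched_prefix"] != prev["matched_prefix"]:
        return CANON_ONLY
    return NO_CHANGE


def must_pivot(tags: list, budget: int = 5) -> bool:
    """True once the trailing run of canonicalization-only tags hits the budget."""
    streak = 0
    for tag in reversed(tags):
        if tag != CANON_ONLY:
            break
        streak += 1
    return streak >= budget


phase_w = {"swaps": 1, "draws": 0, "unique_render_targets": 0,
           "matched_prefix": 105_112}
phase_c25 = {"swaps": 1, "draws": 0, "unique_render_targets": 0,
             "matched_prefix": 105_128}
print(classify(phase_w, phase_c25))
```

Only the trailing streak is counted, so a single progression iterate resets the budget.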
## Falsification of the matched-prefix-as-proxy belief

Phase C through C+25 explicitly assumed that matched-prefix is a
**proxy** for progression. This assumption is now empirically
falsified by the tables above (matched-prefix ~102,168 → 105,128;
`swaps`, `draws`, and `unique_render_targets` unchanged):

> +2,960 events of matched-prefix advancement produced exactly
> ZERO units of progression.

Reading-error #39 (newly registered by this review):

> **#39 (matched-prefix as progression proxy)**: matched-prefix
> measures *engine-to-engine divergence point*, not *game-to-game
> functional gap*. When the wedge is on a different thread than the
> matched-prefix anchor thread, advancing matched-prefix is orthogonal
> to unwedging. Future audits MUST distinguish "ours's tid-X main
> thread diverges from canary's tid-Y" from "ours's tid-X main thread
> is *blocked because tid-Z is wedged*", and target the wedge directly
> when present.

## What "progression discipline" looks like in practice

For the next 3 iterates:

- Iterate N+1: **Step 1 of shortest-path-roadmap** (crowbar). No
  diff-tool work. Target: `swaps ≥ 2`.
- Iterate N+2: **Step 2 of roadmap** (trigger ID via canary jsonl
  analysis). No engine LOC. Target: identification of the missing
  kernel call(s).
- Iterate N+3: **Step 3 of roadmap** (mirror the trigger). Target:
  ours unblocks without the crowbar.

Each iterate must produce a `progression_score` delta report. If
3 iterates in a row produce Δprogression_score ≤ ε (where
ε = 0.001 × 500 = 0.5, i.e. the score contribution of 500
matched-prefix events), the methodology should be re-reviewed
before continuing, since this would mean even the crowbar approach
failed and a deeper rethink is needed.

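
The re-review trigger described above might be checked like this. The function name and the list-of-deltas input are assumptions for illustration; the first sample reuses the three matched-prefix deltas from the earlier table (+439, +66, +16) scaled by the 0.001 weight.

```python
EPSILON = 0.001 * 500  # score contribution of 500 matched-prefix events


def needs_re_review(deltas: list, window: int = 3,
                    epsilon: float = EPSILON) -> bool:
    """True if the last `window` iterates each moved the score by <= epsilon."""
    if len(deltas) < window:
        return False
    return all(d <= epsilon for d in deltas[-window:])


# Canonicalization-only history: +439, +66, +16 matched-prefix events.
stalled = [0.439, 0.066, 0.016]
# Hypothetical history where the latest iterate gains one swap (+1).
unblocked = [0.439, 0.066, 1.0]

print(needs_re_review(stalled), needs_re_review(unblocked))
```

By this rule the three tabled phases would already have triggered a re-review, while any single high-weight gain clears it.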
## Closing note

The user's instinct in calling for this strategic pause and review was
correct. The matched-prefix-only chain was producing real
canonicalization work but had ceased producing progression. The
roadmap above is one principled attempt at breaking the cycle; if it
fails, the next-level fallback is to formally accept Sylpheed's
boot-state as currently unreachable in ours and pivot to a different
title for the methodology demonstration.