handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
193
audit-runs/review-a-boot-state/methodology-assessment.md
Normal file
@@ -0,0 +1,193 @@
# Methodology assessment

## The matched-prefix metric: load-bearing or load-shedding?

Across 25+ iterates (audits 049 through 069; Phase C+1 through C+25;
Phase D Stages 0-4 plus D-extension; Phase W; Phase host-audio-*),
matched-prefix on the main thread (canary tid=6 ⇄ ours tid=1)
advanced:

| Phase | Matched-prefix | Δ |
|---|---:|---:|
| Phase B baseline (pre-C+1) | ~102,168 | n/a |
| Phase D D-extension landing | 104,607 → 105,046 | +439 |
| Phase W (VdInitializeEngines fix) | 105,046 → 105,112 | +66 |
| Phase C+25 (MmGetPhysicalAddress canon) | 105,112 → 105,128 | +16 |

Over the same span, the progression counters did not move:

| Phase | `swaps` | `draws` | `unique_render_targets` |
|---|---:|---:|---:|
| Phase B baseline | 1 | 0 | 0 |
| Phase W | 1 | 0 | 0 |
| Phase C+25 | 1 | 0 | 0 |

**The two metrics are decoupled.** Matched-prefix advances as
engine-internal divergences (kernel-call return values, thread IDs,
heap arena base addresses) are canonicalized. The progression
counters (`swaps`, `draws`, `unique_render_targets`) are gated by
boot-state activation, which lives one or more layers above the diff
points.

## Why the decoupling happened

Three reading-errors compound:

1. **#23 (cooperative-vs-preemptive scheduling jitter)**: canary's
   default-scheduling produces different *intra-thread* event ordering
   than ours's coroutine scheduler. Diff-tool absorbers (C+18, C+21,
   D-extension) correctly hide this jitter, but they also hide *real
   bootstrap-time divergences*. Phase W explicitly noted: "If
   ours's worker fails to enqueue something canary's worker awaits,
   we'd never see the gap because the matched-prefix isn't on the
   worker tid in the first place."
2. **#30 (per-tid PC SID drift)**: shared-global SIDs work for
   process-global dispatchers (e.g., the work-queue semaphore at
   handle `0xF800003C` in canary). But the wedge handle `0x12d0`
   uses a per-tid create-site SID that does NOT match across engines.
   So even when the same logical event exists in both engines, the
   diff harness reports a SID mismatch and then either absorbs the
   event or reports a divergence, incorrectly in both cases.
3. **#38 (cross-spawn producer paths)**: static reachability (the
   sylpheed.db `xrefs` table) misses producer paths that cross
   thread-spawn boundaries. The result.md from Phase Non-match shows
   canary's tid=14 (XAudio voice-mask poll) communicates with
   downstream code via a path that has no static `bl` edge; it
   crosses via guest kernel APIs instead.

## Alternative metric proposals

### Option 1: `draws ≥ 1` (sharp gate)

**Pros**: directly measures the target. Boolean. Reproducible.
**Cons**: gives no signal during iteration: every iterate before the
breakthrough is `draws = 0`. Loss function is non-smooth.

### Option 2: `swaps ≥ 2` (relaxed first-frame gate)

**Pros**: still sharp; one bit looser than draws. Distinguishes
boot-init-only swap (`swaps=1`) from at-least-one-rendered-frame
(`swaps≥2`).
**Cons**: same non-smooth loss. Achievable in principle by a crowbar
without solving the underlying bug.

### Option 3: Renderer-thread liveness: `events_emitted_by_renderer_thread ≥ N`

Compute: the number of events emitted on the thread spawned at entry
`0x822F1EE0` in any 90-s wallclock window. Canary: 594,000. Ours: ~0.

**Pros**: smooth-ish (event count can move slowly). Directly measures
"is the renderer running." Bypasses the diff-tool jitter problem
because it's a per-engine internal count.
**Cons**: requires a non-trivial 90-s wallclock run (not the
50M-instruction ceiling). Could be gamed by a crowbar that resumes the
renderer without unblocking the wedge.
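
The "any 90-s wallclock window" count in Option 3 can be sketched as a sliding-window maximum over one thread's event timestamps. This is a minimal sketch, not the project's tooling: the event fields `tid` and `t` (seconds) are hypothetical, and resolving the thread spawned at entry `0x822F1EE0` to a tid is assumed to happen before the call.

```python
def max_events_in_window(timestamps_s, window_s=90.0):
    """Largest number of events inside any half-open window of length window_s."""
    ts = sorted(timestamps_s)
    best = 0
    start = 0
    for end, t in enumerate(ts):
        # Drop events that are window_s or more older than the current one.
        while t - ts[start] >= window_s:
            start += 1
        best = max(best, end - start + 1)
    return best


def renderer_liveness(events, renderer_tid, window_s=90.0):
    """Peak per-window event count for one thread.

    `events` is an iterable of dicts with hypothetical keys `tid` and `t`.
    """
    stamps = [e["t"] for e in events if e["tid"] == renderer_tid]
    return max_events_in_window(stamps, window_s)
```

The gate is then `renderer_liveness(events, tid) >= N` for whatever `N` is chosen; a thread that never runs yields 0.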

### Option 4: Worker-thread census: `count(threads_with_events ≥ 10k) ≥ 6`

Compute: how many tids in ours emit ≥10k events over 90 s wallclock.
Canary at 90 s: 12 tids meet this (tids 1/2/4/6/9/10/11/12/13/14/15/16),
plus the post-10s workers 21/27/28/29. Ours at 50M instructions: 5 tids.

**Pros**: directly measures the AUDIT-057 thread-gap. Smooth metric:
each unwedged thread adds 1 to the count.
**Cons**: requires 90-s wallclock runs; ours can't reach this
without solving the wedge first, so it's pre-requisite-equivalent to
Option 3.
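
A minimal sketch of the census in Option 4, assuming each trace event is a dict with a hypothetical `tid` key. The defaults mirror the 10k-event and 6-thread thresholds in the heading; nothing here reflects the actual digest code.

```python
from collections import Counter


def thread_census(events, min_events=10_000):
    """Return the sorted tids that emitted at least min_events events."""
    per_tid = Counter(e["tid"] for e in events)
    return sorted(tid for tid, n in per_tid.items() if n >= min_events)


def census_gate(events, min_events=10_000, min_threads=6):
    """True when enough threads are busy to pass the Option 4 gate."""
    return len(thread_census(events, min_events)) >= min_threads
```

Returning the tid list rather than only the count makes it easy to see which threads are still wedged when the gate fails.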

### Option 5: `worker_semaphore_release_count` (AUDIT-069 S5)

Compute: how many `NtReleaseSemaphore` calls land on the work semaphore
(handle `0xF800003C` in canary, equivalent in ours) over 90 s
wallclock. Canary: 414. Ours: 99 (24% of canary).

**Pros**: pinpoints the under-production directly. Mechanically
measurable. Already instrumented in canary (audit_70_semaphore_release_watch).
**Cons**: same wallclock requirement; same gameability.
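
The count and the ours-to-canary ratio in Option 5 can be sketched as below. The event keys `call` and `handle` are hypothetical, and the caller is assumed to pass each engine's own handle for the work semaphore, since the handle value differs between engines.

```python
def semaphore_release_count(events, handle):
    """Count NtReleaseSemaphore calls that target the given handle."""
    return sum(
        1
        for e in events
        if e.get("call") == "NtReleaseSemaphore" and e.get("handle") == handle
    )


def production_ratio(ours_count, canary_count):
    """Fraction of canary's releases that ours produces."""
    return ours_count / canary_count


# The figures quoted above: 99 releases in ours against 414 in canary.
ratio = production_ratio(99, 414)
```

With those figures the ratio rounds to 0.24, matching the 24% quoted in the text.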

### Option 6: composite `progression_score`

Define:

```
progression_score = 1 * swaps + 10 * draws + 100 * unique_render_targets
                  + 0.001 * matched_prefix
```

This recovers signal during iteration (matched-prefix moves)
without pretending it's progression. The 1000:1 weight ratio between
`swaps` and `matched_prefix` matches the bug-class severity: it takes
1,000 events of matched-prefix advancement to equal one additional
swap.

**Pros**: continuous gradient over both wedge-solving and
canonicalization work. Honest about which is more important.
**Cons**: arbitrary weights. Composite metrics drift in meaning.
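
As a worked example of the Option 6 formula, the sketch below evaluates the score for the Phase B baseline and Phase C+25 rows from the tables earlier in this assessment. It assumes the digest exposes the four counters under these key names, and it treats the approximate baseline (~102,168) as exact.

```python
# Weights taken from the progression_score definition.
WEIGHTS = {
    "swaps": 1.0,
    "draws": 10.0,
    "unique_render_targets": 100.0,
    "matched_prefix": 0.001,
}


def progression_score(digest):
    """Weighted sum of the four counters in a digest-like dict."""
    return sum(weight * digest[key] for key, weight in WEIGHTS.items())


baseline = {"swaps": 1, "draws": 0, "unique_render_targets": 0,
            "matched_prefix": 102_168}
phase_c25 = {"swaps": 1, "draws": 0, "unique_render_targets": 0,
             "matched_prefix": 105_128}

# 2,960 matched-prefix events at weight 0.001.
delta = progression_score(phase_c25) - progression_score(baseline)
```

Under these assumptions the whole C+1 through C+25 chain moves the score by about 2.96, less than the 10.0 that a single draw would add.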

## Recommendation

**Adopt Option 6 (composite progression_score) as the primary
methodology metric**, with a hard secondary gate of "Option 2
(`swaps ≥ 2`) is what matters; everything else is fitness."

Concrete proposal:

1. The `digest.json` output gains a `progression_score` field
   computed from the existing fields (zero new instrumentation).
2. Every iterate must report Δprogression_score in its
   re-validation.md.
3. Iterates that only move `matched_prefix` (i.e., Δprogression_score
   = 0.001 × Δmatched_prefix) MUST be tagged in their memory entry
   as "**canonicalization only: no progression**" and counted
   against a *budget*: max 5 consecutive iterates in this class
   before mandatory pivot to wedge-attack work.
4. Audits that move `swaps` or `draws` (the high-weight terms) are
   tagged "**progression**" and given priority for resource
   allocation.

This methodology change costs ~10 LOC in the digest output and
imposes a discipline cap of 5 canonicalization-only audits between
progression attempts.
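
The tagging rule and the 5-iterate budget from the proposal can be sketched as follows. The tag strings and digest key names are illustrative, and counting `unique_render_targets` as a progression field alongside `swaps` and `draws` is an assumption of this sketch rather than something the proposal states.

```python
# Assumption: any increase in these counters counts as progression.
PROGRESSION_FIELDS = ("swaps", "draws", "unique_render_targets")


def tag_iterate(before, after):
    """Classify one iterate from its before/after digest counters."""
    moved = any(after[k] > before[k] for k in PROGRESSION_FIELDS)
    return "progression" if moved else "canonicalization only"


def must_pivot(tags, budget=5):
    """True once the most recent `budget` iterates are all canonicalization only."""
    run = 0
    for tag in reversed(tags):
        if tag != "canonicalization only":
            break
        run += 1
    return run >= budget
```

`must_pivot` only looks at the trailing run, so a single progression iterate resets the budget.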

## Falsification of the matched-prefix-as-proxy belief

Phase C through C+25 explicitly assumed that matched-prefix is a
**proxy** for progression. This assumption is now empirically
falsified:

> +2,960 events of matched-prefix advancement produced exactly
> ZERO units of progression.

Reading-error #39 (newly registered by this review):

> **#39 (matched-prefix as progression proxy)**: matched-prefix
> measures *engine-to-engine divergence point*, not *game-to-game
> functional gap*. When the wedge is on a different thread than the
> matched-prefix anchor thread, advancing matched-prefix is orthogonal
> to unwedging. Future audits MUST distinguish "ours's tid-X main
> thread diverges from canary's tid-Y" from "ours's tid-X main thread
> is *blocked because tid-Z is wedged*", and target the wedge directly
> when present.

## What "progression discipline" looks like in practice

For the next 3 iterates:

- Iterate N+1: **Step 1 of shortest-path-roadmap** (crowbar). No
  diff-tool work. Target: `swaps ≥ 2`.
- Iterate N+2: **Step 2 of roadmap** (trigger ID via canary jsonl
  analysis). No engine LOC. Target: identification of the missing
  kernel call(s).
- Iterate N+3: **Step 3 of roadmap** (mirror the trigger). Target:
  ours unblocks without the crowbar.

Each iterate must produce a `progression_score` delta report. If
3 iterates in a row produce Δprogression_score ≤ ε (where
ε = 0.001 × 500 = 0.5, i.e. the score gain from 500 events of
matched-prefix advancement), the methodology should be re-reviewed
before continuing. Such a run would mean even the crowbar approach
failed and a deeper rethink is needed.
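
The stall rule above can be sketched as a check over the reported score deltas. The function name and the list-of-deltas input are illustrative; only the ε value and the run length of 3 come from the text.

```python
# Epsilon from the text: 500 matched-prefix events at weight 0.001.
EPSILON = 0.001 * 500


def needs_re_review(deltas, run_length=3, epsilon=EPSILON):
    """True when the last `run_length` score deltas are all at or below epsilon."""
    recent = deltas[-run_length:]
    return len(recent) == run_length and all(d <= epsilon for d in recent)


# Score deltas implied by the matched-prefix steps of +439, +66 and +16.
historical = [0.439, 0.066, 0.016]
stalled = needs_re_review(historical)
```

Applied retroactively, the D-extension, Phase W and C+25 steps would already have tripped the rule, since each contributes less than 0.5 to the score.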

## Closing note

The user's instinct in calling for this strategic pause and review was
correct. The matched-prefix-only chain was producing real
canonicalization work but had ceased producing progression. The
roadmap above is one principled attempt at breaking the cycle; if it
fails, the next-level fallback is to formally accept Sylpheed's
boot-state as currently unreachable in ours and pivot to a different
title for the methodology demonstration.