
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-05 07:19:08 +02:00



Methodology assessment

The matched-prefix metric: load-bearing or load-shedding?

Across 25+ iterates (audits 049 through 069; Phase C+1 through C+25; Phase D Stages 0-4 plus D-extension; Phase W; Phase host-audio-*), matched-prefix on the main thread (canary tid=6 ⇄ ours tid=1) advanced:

Phase	Matched-prefix	Δ
Phase B baseline (pre-C+1)	~102,168	—
Phase D D-extension landing	104,607 → 105,046	+439
Phase W (VdInitializeEngines fix)	105,046 → 105,112	+66
Phase C+25 (MmGetPhysicalAddress canon)	105,112 → 105,128	+16

Over the same phases, the progression metrics did not move:

Phase	`swaps`	`draws`	`unique_render_targets`
Phase B baseline	1	0	0
Phase W	1	0	0
Phase C+25	1	0	0

The two metrics are decoupled. Matched-prefix advances along ENGINE-internal divergences (kernel-call return values, thread IDs, heap arena base addresses). The progression metrics (`swaps`, `draws`, `unique_render_targets`) are gated by boot-state activation, which lives one or more layers above the diff points.
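To make the metric concrete, here is a minimal sketch of a matched-prefix count. It assumes each engine's trace for the anchor thread has already been reduced to a list of comparable event tuples. The function name, the tuple layout, and the call names are illustrative assumptions, not the diff tool's actual interface, and the diff-tool absorbers are omitted.

```python
from typing import Hashable, Sequence


def matched_prefix_len(canary: Sequence[Hashable], ours: Sequence[Hashable]) -> int:
    """Count the leading events that are identical in both traces."""
    n = 0
    for a, b in zip(canary, ours):
        if a != b:
            break  # the first divergence ends the matched prefix
        n += 1
    return n


# Hypothetical (call name, return value) tuples for the anchor threads
# (canary tid=6, ours tid=1). The names are placeholders, not real exports.
canary_main = [("call_a", 0), ("call_b", 0), ("call_c", 0)]
ours_main = [("call_a", 0), ("call_b", 1), ("call_c", 0)]

print(matched_prefix_len(canary_main, ours_main))  # 1
```

Nothing in this count looks at any thread other than the anchor pair, which is why it can keep advancing while a different thread stays wedged.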

Why the decoupling happened

Three reading-errors compound:

#23 (cooperative-vs-preemptive scheduling jitter): canary's default-scheduling produces different intra-thread event ordering than ours's coroutine scheduler. Diff-tool absorbers (C+18, C+21, D-extension) correctly hide this jitter — but they hide real bootstrap-time divergences too. Phase W explicitly noted: "If ours's worker fails to enqueue something canary's worker awaits, we'd never see the gap because the matched-prefix isn't on the worker tid in the first place."
#30 (per-tid PC SID drift): shared-global SIDs work for process-global dispatchers (e.g., the work-queue semaphore at handle 0xF800003C in canary). But the wedge handle 0x12d0 uses a per-tid create-site SID that does NOT match across engines. So even when the same logical event exists in both engines, the diff harness reports an SID mismatch and then either absorbs it or diverges, incorrectly in both cases.
#38 (cross-spawn producer paths): static reachability (the sylpheed.db xrefs table) misses producer paths that cross thread-spawn boundaries. The result.md from Phase Non-match shows canary's tid=14 (XAudio voice-mask poll) communicates with downstream code via a path that has no static bl edge — it crosses via guest kernel APIs.

Alternative metric proposals

Option 1 — `draws ≥ 1` (sharp gate)

Pros: directly measures the target. Boolean. Reproducible. Cons: gives no signal during iteration — every iterate before the breakthrough is draws = 0. Loss function is non-smooth.

Option 2 — `swaps ≥ 2` (relaxed first-frame gate)

Pros: still sharp; one bit looser than draws. Distinguishes boot-init-only swap (swaps=1) from at-least-one-rendered-frame (swaps≥2). Cons: same non-smooth loss. Achievable in principle by a crowbar without solving the underlying bug.

Option 3 — Renderer-thread liveness: `events_emitted_by_renderer_thread ≥ N`

Compute: events emitted on the thread spawned at entry 0x822F1EE0 in any 90-s wallclock window. Canary: 594,000. Ours: ~0.

Pros: smooth-ish (event count can move slowly). Directly measures "is the renderer running." Bypasses the diff-tool jitter problem because it's a per-engine internal count. Cons: requires a non-trivial 90-s wallclock run (not the 50M-instruction ceiling). Could be gamed by a crowbar that resumes the renderer without unblocking the wedge.

Option 4 — Worker-thread census: `count(threads_with_events ≥ 10k) ≥ 6`

Compute: how many tids in ours emit ≥10k events over 90 s wallclock. Canary at 90 s: 12 tids meet this (tids 1/2/4/6/9/10/11/12/13/14/15/16), plus the post-10s workers 21/27/28/29. Ours at 50M instr: 5 tids.

Pros: directly measures the AUDIT-057 thread-gap. Smooth metric: each unwedged thread adds 1 to the count. Cons: requires 90-s wallclock runs — ours can't reach this without solving the wedge first, so it's pre-requisite-equivalent to Option 3.
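The census can be sketched as a per-tid tally over one engine's event stream. This is a minimal sketch that assumes each event is a mapping with a `tid` key; that record shape and both function names are hypothetical, not the actual trace schema. The same tally also yields the Option 3 count by looking up the renderer thread's tid.

```python
from collections import Counter
from typing import Iterable, Mapping


def events_per_tid(events: Iterable[Mapping]) -> Counter:
    """Tally events by the thread that emitted them."""
    return Counter(ev["tid"] for ev in events)


def thread_census(events: Iterable[Mapping], min_events: int = 10_000) -> int:
    """Option 4: number of tids that emitted at least `min_events` events."""
    return sum(1 for n in events_per_tid(events).values() if n >= min_events)


# Tiny synthetic stream; a real run would read the trace for a 90-s window.
sample = [{"tid": 1}] * 3 + [{"tid": 2}] * 1 + [{"tid": 4}] * 2

print(thread_census(sample, min_events=2))  # 2 (tids 1 and 4)
print(events_per_tid(sample)[1])            # 3
```

Because both functions only read one engine's own events, the result does not depend on cross-engine SID matching or on the diff-tool absorbers.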

Option 5 — `worker_semaphore_release_count` (AUDIT-069 S5)

Compute: how many NtReleaseSemaphore calls on the work semaphore (handle 0xF800003C in canary, equivalent in ours) over 90 s wallclock. Canary: 414. Ours: 99 (24%).

Pros: pinpoints the under-production directly. Mechanically measurable. Already instrumented in canary (audit_70_semaphore_release_watch). Cons: same wallclock requirement; same gameability.

Option 6 — composite: `progression_score`

Define:

progression_score = 1 * swaps + 10 * draws + 100 * unique_render_targets
                  + 0.001 * matched_prefix

This recovers signal during iteration (matched-prefix moves) without pretending it's progression. The 1000:1 ratio between the swaps weight (1) and the matched_prefix weight (0.001) matches the bug-class severity.

Pros: continuous gradient over both wedge-solving and canonicalization work. Honest about which is more important. Cons: arbitrary weights. Composite metrics drift in meaning.
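A minimal sketch of the composite, using the weights from the definition above. The function signature is an assumption; the inputs are the Phase W and Phase C+25 rows from the tables at the top of this review.

```python
def progression_score(swaps: int, draws: int, unique_render_targets: int,
                      matched_prefix: int) -> float:
    """Composite score with the weights proposed in Option 6."""
    return (1 * swaps
            + 10 * draws
            + 100 * unique_render_targets
            + 0.001 * matched_prefix)


phase_w = progression_score(1, 0, 0, 105_112)
phase_c25 = progression_score(1, 0, 0, 105_128)

# Matched-prefix moved by 16 events, so the score moves by about 0.016.
delta_canon = phase_c25 - phase_w

# One additional swap at the same matched-prefix moves the score by about 1.
delta_swap = progression_score(2, 0, 0, 105_128) - phase_c25

print(round(delta_canon, 3), round(delta_swap, 3))  # 0.016 1.0
```

The absolute value is dominated by the matched_prefix term (about 105 at current values), so only the per-iterate delta is meaningful.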

Recommendation

Adopt Option 6 (composite progression_score) as the primary methodology metric, with Option 2 (swaps ≥ 2) as a hard secondary gate: swaps ≥ 2 is what matters; everything else is fitness.

Concrete proposal:

The digest.json output gains a progression_score field computed from the existing fields (zero new instrumentation).
Every iterate must report Δprogression_score in its re-validation.md.
Iterates that only move matched_prefix (i.e., Δprogression_score = (small) × Δmatched_prefix) MUST be tagged in their memory entry as "canonicalization only — no progression" and counted against a budget: max 5 consecutive iterates in this class before mandatory pivot to wedge-attack work.
Audits that move swaps or draws (the high-weight terms) are tagged "progression" and given priority for resource allocation.

This methodology change costs ~10 LOC in the digest output and imposes a discipline cap of 5 canonicalization-only audits between progression attempts.
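The tagging rule and the budget of 5 can be sketched as follows. This is a minimal sketch, not the digest tooling: the function names and the short tag strings are placeholders for the tags described above, and treating a move in `unique_render_targets` as progression alongside swaps and draws is an assumption.

```python
from typing import Sequence

PROGRESSION = "progression"
CANON_ONLY = "canonicalization-only"  # placeholder for the memory-entry tag
MAX_CONSECUTIVE_CANON = 5


def tag_iterate(d_swaps: int, d_draws: int, d_unique_render_targets: int) -> str:
    """Tag an iterate by whether any high-weight term increased."""
    moved = d_swaps > 0 or d_draws > 0 or d_unique_render_targets > 0
    return PROGRESSION if moved else CANON_ONLY


def must_pivot(tags: Sequence[str], budget: int = MAX_CONSECUTIVE_CANON) -> bool:
    """True once the most recent `budget` iterates were all canonicalization-only."""
    run = 0
    for tag in reversed(tags):
        if tag != CANON_ONLY:
            break
        run += 1
    return run >= budget


history = [tag_iterate(0, 0, 0) for _ in range(5)]
print(must_pivot(history))                   # True
print(must_pivot(history + [PROGRESSION]))   # False
```

An iterate tagged canonicalization-only has Δprogression_score = 0.001 × Δmatched_prefix, since every other term is unchanged.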

Falsification of the matched-prefix-as-proxy belief

Phase C through C+25 explicitly assumed that matched-prefix is a proxy for progression. This assumption is now empirically falsified:

+2,960 events of matched-prefix advancement produced exactly ZERO units of progression.

Reading-error #39 (newly registered by this review):

#39 (matched-prefix as progression proxy): matched-prefix measures engine-to-engine divergence point, not game-to-game functional gap. When the wedge is on a different thread than the matched-prefix anchor thread, advancing matched-prefix is orthogonal to unwedging. Future audits MUST distinguish "ours's tid-X main thread diverges from canary's tid-Y" from "ours's tid-X main thread is blocked because tid-Z is wedged", and target the wedge directly when present.

What "progression discipline" looks like in practice

For the next 3 iterates:

Iterate N+1: Step 1 of shortest-path-roadmap (crowbar). No diff-tool work. Target: swaps ≥ 2.
Iterate N+2: Step 2 of roadmap (trigger ID via canary jsonl analysis). No engine LOC. Target: identification of the missing kernel call(s).
Iterate N+3: Step 3 of roadmap (mirror the trigger). Target: ours unblocks without the crowbar.

Each iterate must produce a progression_score delta report. If 3 iterates in a row produce Δprogression_score ≤ ε (where ε = 0.001 × 500 = 0.5, the score change from 500 events of matched-prefix advancement alone), the methodology should be re-reviewed before continuing: this would mean even the crowbar approach failed and a deeper rethink is needed.
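The stall rule can be checked mechanically from the per-iterate deltas. This is a minimal sketch under the thresholds stated above (window of 3, ε = 0.5); the function name and the list-of-deltas input are assumptions.

```python
from typing import Sequence

EPSILON = 0.5  # 0.001 * 500: score change from 500 matched-prefix events alone
WINDOW = 3     # consecutive iterates that must all stall


def needs_methodology_review(deltas: Sequence[float],
                             window: int = WINDOW,
                             epsilon: float = EPSILON) -> bool:
    """True if the last `window` iterates each moved the score by at most epsilon."""
    recent = list(deltas)[-window:]
    return len(recent) == window and all(d <= epsilon for d in recent)


# Three iterates that only moved matched-prefix (by 439, 66 and 16 events).
print(needs_methodology_review([0.439, 0.066, 0.016]))  # True

# A single added swap (+1.0) inside the window clears the stall.
print(needs_methodology_review([0.439, 1.0, 0.016]))    # False
```

With fewer than three reported deltas the check returns False, so the rule cannot trigger before iterate N+3 has reported.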

Closing note

The user's instinct in calling this strategic pause and review was correct. The matched-prefix-only chain was producing real canonicalization work but had ceased producing progression. The roadmap above is one principled attempt at breaking the cycle; if it fails, the next-level fallback is to formally accept Sylpheed's boot-state as currently unreachable in ours and pivot to a different title for the methodology demonstration.
