handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions
--- a/audit-runs/review-a-step1-force-spawn/off-1.json
+++ b/audit-runs/review-a-step1-force-spawn/off-1.json
@@ -0,0 +1,10 @@
+{
+  "instructions": 25000000,
+  "imports": 39290,
+  "unimpl": 0,
+  "draws": 0,
+  "swaps": 1,
+  "unique_render_targets": 0,
+  "shader_blobs_live": 0,
+  "texture_cache_entries": 0
+}
--- a/audit-runs/review-a-step1-force-spawn/off-2.json
+++ b/audit-runs/review-a-step1-force-spawn/off-2.json
@@ -0,0 +1,10 @@
+{
+  "instructions": 25000000,
+  "imports": 39290,
+  "unimpl": 0,
+  "draws": 0,
+  "swaps": 1,
+  "unique_render_targets": 0,
+  "shader_blobs_live": 0,
+  "texture_cache_entries": 0
+}
--- a/audit-runs/review-a-step1-force-spawn/off-3.json
+++ b/audit-runs/review-a-step1-force-spawn/off-3.json
@@ -0,0 +1,10 @@
+{
+  "instructions": 25000000,
+  "imports": 39290,
+  "unimpl": 0,
+  "draws": 0,
+  "swaps": 1,
+  "unique_render_targets": 0,
+  "shader_blobs_live": 0,
+  "texture_cache_entries": 0
+}
--- a/audit-runs/review-a-step1-force-spawn/on-1.json
+++ b/audit-runs/review-a-step1-force-spawn/on-1.json
@@ -0,0 +1,10 @@
+{
+  "instructions": 20000159,
+  "imports": 39290,
+  "unimpl": 0,
+  "draws": 0,
+  "swaps": 1,
+  "unique_render_targets": 0,
+  "shader_blobs_live": 0,
+  "texture_cache_entries": 0
+}
--- a/audit-runs/review-a-step1-force-spawn/on-2.json
+++ b/audit-runs/review-a-step1-force-spawn/on-2.json
@@ -0,0 +1,10 @@
+{
+  "instructions": 20000159,
+  "imports": 39290,
+  "unimpl": 0,
+  "draws": 0,
+  "swaps": 1,
+  "unique_render_targets": 0,
+  "shader_blobs_live": 0,
+  "texture_cache_entries": 0
+}
--- a/audit-runs/review-a-step1-force-spawn/on-3.json
+++ b/audit-runs/review-a-step1-force-spawn/on-3.json
@@ -0,0 +1,10 @@
+{
+  "instructions": 20000159,
+  "imports": 39290,
+  "unimpl": 0,
+  "draws": 0,
+  "swaps": 1,
+  "unique_render_targets": 0,
+  "shader_blobs_live": 0,
+  "texture_cache_entries": 0
+}
--- a/audit-runs/review-a-step1-force-spawn/progression-result.md
+++ b/audit-runs/review-a-step1-force-spawn/progression-result.md
@@ -0,0 +1,70 @@
+# Progression-metric result — Review A Step 1
+
+**Date**: 2026-05-27
+**PRIMARY gate**: `swaps > 1 OR draws > 0 OR unique_render_targets > 0`.
+
+## Composite progression score (per Review A Q5)
+
+```
+score = 1*swaps + 10*draws + 100*unique_render_targets
+```
+
+| Run | swaps | draws | unique_RT | score |
+|----:|------:|------:|----------:|------:|
+| OFF-1 | 1 | 0 | 0 | **1** |
+| OFF-2 | 1 | 0 | 0 | **1** |
+| OFF-3 | 1 | 0 | 0 | **1** |
+| ON-1  | 1 | 0 | 0 | **1** |
+| ON-2  | 1 | 0 | 0 | **1** |
+| ON-3  | 1 | 0 | 0 | **1** |
+
+- **OFF mean**: 1.0
+- **ON mean**: 1.0
+- **Δ (ON - OFF)**: 0
+
+## PRIMARY gate verdict
+
+**FAIL.** No swap beyond the boot-init swap; no draws; no render
+targets.  The crowbar fires successfully (4/4 workers spawned and
+resumed) but the workers fault ~159 instructions in on the unmapped
+canary VA `0xBCE25640`, before they can advance the wedge or emit
+PM4 draw commands.
+
+## What "winning" would have required
+
+Per `shortest-path-roadmap.md` §"What 'winning' looks like":
+
+```text
+{
+  "draws": >= 1,
+  "swaps": >= 2,
+  "unique_render_targets": >= 1
+}
+```
+
+reproducible across 3 cold runs.  Observed: draws/swaps/unique_RT =
+0/1/0 in all 6 runs (3 OFF + 3 ON).  Matches v3's 2026-05-21 outcome
+bit-for-bit at the progression-metric level (Δ = 0).
+
+## Why the crowbar didn't unblock
+
+Per v3 `investigation.md` §"The fault (v3)" and re-validated this
+session: the worker entry stubs at `0x82506528/58/88/B8` dispatch
+through `vtable[35..38]` to fns like `sub_82506E08`, `sub_82508520`,
+etc.  Those fns immediately load `[ctx+44]` into r3 expecting a
+secondary-object pointer (per canary's runtime ctx state).  In v3 the
+secondary-object pointer was captured as `0xBCE25640` and installed
+verbatim per Option γ.  In ours's address space, `0xBCE25640` is
+not allocated (ours's allocator namespace is `0x4000_0000..0x6FFF_FFFF`).
+Reading `[0xBCE25640]` returns 0 → CTR=0 → `bctrl` faults at PC=0.
+
+The fault is bit-stable across 3× cold ON runs (deterministic
+scheduling under `--gpu-thread`).
+
+## Matched-prefix shift under crowbar (informational only — NOT a gate)
+
+Matched-prefix vs canary was NOT computed in this session because the
+crowbar fundamentally alters guest control flow (introduces 4 host-spawned
+threads with synthesised ctx state).  Per reading-error #23, matched-prefix
+regression under crowbar-on is EXPECTED and not a failure indicator —
+the PRIMARY gate is progression metric, not matched-prefix.
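The composite score and PRIMARY gate defined in `progression-result.md` above can be sketched in Rust as follows. This is a minimal illustration, not code from the repository: the `Progression` struct and `mean_score` helper are hypothetical names, and only the formula and the gate condition are taken from the notes.

```rust
/// Hypothetical per-run counters; field names mirror the digest JSON keys.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Progression {
    pub swaps: u64,
    pub draws: u64,
    pub unique_render_targets: u64,
}

impl Progression {
    /// Composite score from the notes: 1*swaps + 10*draws + 100*unique_render_targets.
    pub fn score(&self) -> u64 {
        self.swaps + 10 * self.draws + 100 * self.unique_render_targets
    }

    /// PRIMARY gate from the notes: swaps > 1 OR draws > 0 OR unique_render_targets > 0.
    pub fn primary_gate_passes(&self) -> bool {
        self.swaps > 1 || self.draws > 0 || self.unique_render_targets > 0
    }
}

/// Mean score over a set of runs (assumed helper); `None` for an empty slice.
pub fn mean_score(runs: &[Progression]) -> Option<f64> {
    if runs.is_empty() {
        return None;
    }
    let total: u64 = runs.iter().map(Progression::score).sum();
    Some(total as f64 / runs.len() as f64)
}
```

With the observed counters (`swaps = 1`, `draws = 0`, `unique_render_targets = 0`) every run scores 1 and the gate evaluates to false, which is consistent with the OFF and ON means of 1.0 and the FAIL verdict reported in the file.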
--- a/audit-runs/review-a-step1-force-spawn/re-validation.md
+++ b/audit-runs/review-a-step1-force-spawn/re-validation.md
@@ -0,0 +1,115 @@
+# Re-validation — Review A Step 1 (force-spawn crowbar)
+
+**Date**: 2026-05-27
+**Binary**: `xenia-rs/target/release/xrs-crowbar` (cargo build --release
+of HEAD = chore/portable-snapshot working tree with the v3 crowbar
+implementation; SHA = build timestamp `May 27 07:28`).
+**ISO**: `Project Sylpheed - Arc of Deception (USA, Europe) (En,Ja).iso`.
+**Cmdline**: `xrs-crowbar check ISO -n 25000000 --gpu-thread --stable-digest`.
+
+## Gate 1 — Default-OFF determinism (sacred)
+
+| Run | instructions | imports | draws | swaps | unique_RT | bit-identical? |
+|----:|------------:|---:|---:|---:|---:|:--:|
+| OFF-1 | 25,000,000 | 39,290 | 0 | 1 | 0 | yes |
+| OFF-2 | 25,000,000 | 39,290 | 0 | 1 | 0 | yes |
+| OFF-3 | 25,000,000 | 39,290 | 0 | 1 | 0 | yes |
+
+**3× cold runs bit-identical.** Default-OFF determinism PRESERVED.
+
+The OFF baseline matches the canonical (swaps=1, draws=0, RT=0) baseline
+from Phase Non-match Investigation and prior Phase C+N audits.
+
+> Note: the canonical cold digest `e1dfcb1559f987b35012a7f2dc6d93f5`
+> cited in the brief is a hash over the full digest fields; the
+> instruction-stable subset (`instructions, imports, draws, swaps,
+> unique_render_targets, shader_blobs_live, texture_cache_entries`)
+> is verified identical above.  3× bit-identical runs are sufficient
+> to attest determinism preservation under this opt-in cvar.
+
+## Gate 2 — Crowbar-on builds and runs cleanly
+
+`cargo build --release --bin xenia-rs` succeeded (only pre-existing
+dead-code warning for `walk_committed_regions`).  226/226 kernel
+tests PASS.
+
+`XENIA_CROWBAR_WORKERS=1 XENIA_CROWBAR_CTX_BIN=ctx-canary.bin xrs-crowbar
+check …` runs without panic/segfault until the expected guest-PPC fault
+on the unmapped canary VA (see Gate 3).
+
+## Gate 3 — PRIMARY progression gate (THE WIN CONDITION)
+
+| Run | instructions | imports | draws | swaps | unique_RT | terminus |
+|----:|------------:|---:|---:|---:|---:|:--|
+| ON-1 | 20,000,159 | 39,290 | 0 | 1 | 0 | FAULT pc=0 r3=0xbce25640 |
+| ON-2 | 20,000,159 | 39,290 | 0 | 1 | 0 | FAULT pc=0 r3=0xbce25640 |
+| ON-3 | 20,000,159 | 39,290 | 0 | 1 | 0 | FAULT pc=0 r3=0xbce25640 |
+
+**3× cold runs bit-identical** (instruction-count, fault PC, LR, CTR,
+r3, r4, r29, r30, r31, tid).
+
+PRIMARY gate **FAIL** (swaps unchanged at 1, draws=0, RT=0).
+
+## Gate 4 — Phase B image_canonical_sha256
+
+Not measured this session; the crowbar code in the working tree was
+written and tested across v1/v2/v3 on 2026-05-21.  No new engine LOC
+were landed this session, only re-validation and additional artifact
+capture.  Phase B `ea8d160e…` is therefore taken as unchanged (no new
+image data; the crowbar adds only opt-in behaviour).
+
+## Gate 5 — Kernel tests
+
+`cargo test --release -p xenia-kernel --lib`: **226 passed; 0 failed**.
+
+## Gate 6 — Diff-tool tests
+
+Not re-run this session (no diff-tool changes; the crowbar lives
+entirely inside the engine).  Phase D D-extension status from
+2026-05-18 remains LANDED with no impact from this work.
+
+## Fault analysis (cross-validation with v3)
+
+Crowbar fires at instr=20,000,000, allocates ctx at `0x4d1d9000`,
+installs the canary 64-byte ctx blob, spawns 4 workers at canary
+entries (`0x82506528/58/88/B8`), resumes all 4.  ~159 instructions
+later, worker tid=16 faults at:
+
+```
+PC=0 (CTR=0 bctrl)
+LR=0x82508588   <- inside one of the worker stub fns
+r3=0xBCE25640   <- canary's secondary-object VA (UNMAPPED in ours)
+r31=0x4d1d9000  <- our ctx_ptr (correctly threaded through)
+tid=16
+```
+
+`lwz r11, 0(r3)` at the dispatch site loads from `[0xBCE25640]`
+(canary's VA, not in ours's allocator namespace `0x4000_0000..0x6FFF_FFFF`),
+returns 0, CTR becomes 0, `bctrl` jumps to 0, fault.
+
+This is the **same class** of fault as v3's (PC=0, r3=0xBCE25640, same
+ctx state); only the LR differs (v3: `0x82506e38`, this run: `0x82508588`).
+The differing LR reflects which worker entry stub reached the dispatch
+first; the root cause is identical: ours's allocator cannot reproduce
+canary's `0xBCxxxxxx` VAs.
+
+## Verdict
+
+- Gate 1 (default-OFF determinism): **PASS**.
+- Gate 2 (build + clean run): **PASS**.
+- Gate 3 (PRIMARY progression): **FAIL** (Δ = 0).
+- Gate 4 (Phase B unchanged): **PASS** by inference (not re-measured;
+  no engine LOC delta this session).
+- Gate 5 (kernel tests): **PASS** (226/226).
+- Gate 6 (diff-tool tests): not re-run; out of scope.
+
+**Crowbar approach as Step 1 of Review A roadmap is FALSIFIED.**
+
+Confirms the v3 verdict from 2026-05-21: the wedge cannot be unblocked
+by forcing the 4 worker spawns alone.  Getting past the secondary-object
+recursion requires one of: (a) a guest-VA translation table to map
+canary's `0xBCxxxxxx` VAs to ours's allocator outputs, (b) recursive
+ctx-state capture for the full reachable closure from `ctx_ptr`, or
+(c) abandoning the crowbar approach in favour of the
+natural-activation investigation (Review A Step 2's branch-probe
+inside the `sub_821CB030` chain).
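The fault chain described in `re-validation.md` above (unmapped VA, load returns 0, CTR = 0, `bctrl` to PC = 0) can be modelled with a small Rust sketch. Everything here is a simplified assumption for illustration: guest memory is a `HashMap`, the dispatch is collapsed to a single load, and the names `in_ours_namespace`, `read_u32_or_zero`, and `dispatch` are hypothetical. Only the address range and the observed register values come from the notes.

```rust
use std::collections::HashMap;

/// Allocator namespace quoted in the notes: 0x4000_0000..0x6FFF_FFFF (inclusive here).
pub const OURS_ALLOC_START: u32 = 0x4000_0000;
pub const OURS_ALLOC_END: u32 = 0x6FFF_FFFF;

/// True when a guest VA falls inside the allocator namespace from the notes.
pub fn in_ours_namespace(va: u32) -> bool {
    (OURS_ALLOC_START..=OURS_ALLOC_END).contains(&va)
}

/// Stand-in for a guest load: an unmapped address reads as 0, as the notes report.
pub fn read_u32_or_zero(mapped: &HashMap<u32, u32>, va: u32) -> u32 {
    mapped.get(&va).copied().unwrap_or(0)
}

/// Outcome of the simplified `lwz r11, 0(r3)` followed by `bctrl`.
#[derive(Debug, PartialEq, Eq)]
pub enum Dispatch {
    /// CTR held a non-zero target; control transfers there.
    Call(u32),
    /// CTR was 0, so `bctrl` lands on PC = 0 and the guest faults.
    FaultAtZero,
}

/// Model the dispatch: load the target through r3, then branch via CTR.
pub fn dispatch(mapped: &HashMap<u32, u32>, r3: u32) -> Dispatch {
    let ctr = read_u32_or_zero(mapped, r3);
    if ctr == 0 {
        Dispatch::FaultAtZero
    } else {
        Dispatch::Call(ctr)
    }
}
```

Under these assumptions, `0xBCE25640` is outside the namespace while the ctx page at `0x4d1d9000` is inside it, and dispatching through an unmapped `r3` yields `FaultAtZero`. This matches the shape of the recorded fault but says nothing about how the real engine handles unmapped reads beyond what the notes state.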
--- a/audit-runs/review-a-step1-force-spawn/spec.md
+++ b/audit-runs/review-a-step1-force-spawn/spec.md
@@ -0,0 +1,85 @@
+# Review A Step 1 — `--force-spawn-workers` crowbar spec
+
+**Date**: 2026-05-27
+**Status**: LANDED; PRIMARY gate FAIL (progression metric unmoved).
+
+This run re-validates and documents the pre-existing v1/v2/v3 crowbar
+implementation under the canonical "Review A Step 1" framing.  The
+implementation already lives in the working tree (committed-like, not
+yet `git add`-ed).  This session re-runs the gates and lands the
+default-OFF determinism + PRIMARY-gate verdict on the present HEAD.
+
+## Implementation surface (already in working tree)
+
+- `crates/xenia-kernel/src/exports.rs`
+  - `CROWBAR_WORKER_ENTRIES = [0x82506528, 0x82506558, 0x82506588, 0x825065B8]`
+  - `CROWBAR_VTABLE_BASE = 0x8200_A1E8` (reading-error #37 honoured: this is the vtable BASE, not slot-N)
+  - `CROWBAR_STACK_SIZE = 65_536`
+  - `crowbar_spawn_one_worker()`: allocates thread image, allocates
+    handle, spawns via `state.scheduler.spawn(SpawnParams { ... })` with
+    `create_suspended=true, affinity=0, priority=0`, retains self-ref.
+  - `crowbar_dump_vtable_region()`: read-only diag dumping 128 vtable
+    u32 slots so we see slots 35-38 (offsets 140/144/148/152) used by
+    the worker entry stubs.
+  - `crowbar_maybe_install_vtable_from_file()`: v2 opt-in via env var
+    `XENIA_CROWBAR_VTABLE_BIN`; no-op if unset (this run leaves unset
+    because ours already has the vtable populated — see results).
+  - `crowbar_maybe_install_ctx_from_file()`: v3 opt-in via env var
+    `XENIA_CROWBAR_CTX_BIN`; installs canary-captured 64-byte ctx
+    blob (vptr / self / self / refcount / sentinels /
+    secondary-obj-ptr / float).
+  - `crowbar_force_spawn_workers()`: orchestrator.  Allocates a 0x1000
+    ctx page, installs `{vptr, self, self, refcount=1}` POD-copy
+    head, optionally installs vtable + ctx blobs, spawns 4 workers,
+    resumes 4 workers.  Returns count resumed.
+
+- `crates/xenia-kernel/src/state.rs`
+  - `KernelState::crowbar_workers_enabled` (bool, default false)
+  - `KernelState::crowbar_workers_trigger_instr` (u64, default
+    `20_000_000`)
+  - `KernelState::crowbar_workers_fired` (bool latch)
+  - `KernelState::try_fire_crowbar_workers(&mut self, &GuestMemory,
+    instruction_count)`: at-most-once helper; no-op when disabled, when
+    already fired, or before threshold.
+
+- `crates/xenia-app/src/main.rs`
+  - `--force-spawn-workers` CLI flag on the `Exec` subcommand
+    (line ~278) → sets `XENIA_CROWBAR_WORKERS=1` for downstream wire-up
+    (line ~455).
+  - Env-var wire-up in `cmd_exec_inner` (~line 1212): reads
+    `XENIA_CROWBAR_WORKERS=1` and `XENIA_CROWBAR_TRIGGER_INSTR=N`.
+  - **The trigger call** is at `coord_pre_round` (~line 2479) inside
+    the per-round prologue, gated on
+    `kernel.crowbar_workers_enabled && !kernel.crowbar_workers_fired`.
+  - `check` subcommand has NO `--force-spawn-workers` flag; activation
+    via env var works for both `exec` and `check`.
+
+## Crowbar firing-moment choice
+
+**Option β = fixed cycle threshold of 20M instructions.**
+
+20M ≈ 3 s wallclock at lockstep cadence, well past:
+- the 10-thread initial spawn burst that peaks around boot-init
+  VdSwap, and
+- the AUDIT-049 wedge crystallisation at host_ns ≈ 1.728 s
+  (~12-15M instr).
+
+The trigger fires once and latches `crowbar_workers_fired = true` so
+the helper is at-most-once per process lifetime.
+
+## Default-off invariant
+
+- `crowbar_workers_enabled` defaults to `false` in `KernelState::with_gpu()`.
+- The trigger condition `kernel.crowbar_workers_enabled && !kernel.crowbar_workers_fired`
+  short-circuits the helper when disabled.
+- Env-var read returns `false` when `XENIA_CROWBAR_WORKERS` is unset.
+- Therefore: zero behaviour change in normal runs.  3× OFF cold runs
+  are bit-identical (see `re-validation.md`).
+
+## Determinism under crowbar-on
+
+3× ON cold runs are bit-identical (`instructions=20000159`, identical
+fault PC/LR/CTR/r3/r4/r29/r30/r31, identical tid=16).  The crowbar
+fires deterministically at the threshold instruction count, the 4
+spawned tids are bit-stable across runs, and the fault site is
+bit-stable.
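The at-most-once trigger and the vtable slot offsets described in `spec.md` above can be sketched as follows. This is a hedged illustration rather than the repository's code: `CrowbarLatch`, `try_fire`, and `vtable_slot_offset` are hypothetical names, and the sketch omits the `&GuestMemory` argument that the real `try_fire_crowbar_workers` is described as taking. The three no-op conditions, the default threshold of 20,000,000, and the slot offsets are taken from the spec.

```rust
/// Default trigger threshold quoted in the spec.
pub const DEFAULT_TRIGGER_INSTR: u64 = 20_000_000;

/// Hypothetical stand-in for the three crowbar fields on `KernelState`.
pub struct CrowbarLatch {
    enabled: bool,
    trigger_instr: u64,
    fired: bool,
}

impl CrowbarLatch {
    /// Build a latch; `enabled = false` mirrors the default-off invariant.
    pub fn new(enabled: bool, trigger_instr: u64) -> Self {
        Self { enabled, trigger_instr, fired: false }
    }

    /// Returns true at most once: no-op when disabled, when already
    /// fired, or before the instruction-count threshold.
    pub fn try_fire(&mut self, instruction_count: u64) -> bool {
        if !self.enabled || self.fired || instruction_count < self.trigger_instr {
            return false;
        }
        self.fired = true;
        true
    }
}

/// Byte offset of a u32 vtable slot (slot index times 4).
pub const fn vtable_slot_offset(slot: u32) -> u32 {
    slot * 4
}
```

A disabled latch never fires, which is the property the default-OFF determinism gate relies on. An enabled latch fires on the first call at or past the threshold and stays latched afterwards. Slots 35 to 38 map to offsets 140, 144, 148, and 152, in agreement with the offsets listed for `crowbar_dump_vtable_region()`.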