
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-05 07:19:08 +02:00


Crowbar v3 — ctx-state install verbatim

Date: 2026-05-21
Predecessor: v2 at audit-runs/review-a-step1b-crowbar-v2/.
Status: LANDED. Hypothesis FALSIFIED: the wedge is NOT crowbar-soluble at the ctx-state-only level. Case (D) is needed (recursive secondary-object install). v3 produces the same composite progression score as the OFF baseline.

TL;DR

v2 found case (C): [ctx+44] is a secondary-object pointer. vtable[36] reads it and dispatches through it.
v3 captured canary's actual [ctx+44] value = 0xBCE25640 (via the audit_68_host_mem_read_probe cvar) along with the rest of the 64-byte ctx head, then installed that state verbatim in ours.
Worker tid=15 now passes the [ctx+44] load (loads 0xBCE25640 into r3) but 0xBCE25640 is unmapped in ours's address space (ours's allocator returns 0x4D1Dxxxx VAs; canary's xenon-arena VAs in the 0xBCExxxxx range have no equivalent in ours).
Reading [0xBCE25640] returns 0 → CTR=0 → bctrl faults at PC=0 with r3=0xbce25640 (was r3=0x0 in v2 — confirming the install worked, just deeper recursion needed).
Three OFF and three ON runs are deterministic and identical on every metric: swaps=1, draws=0, unique_render_targets=0. Composite progression Δ = 0.

Captured canary ctx state

Canary cold run (90s, --mute=true), with cvars:

--audit_61_branch_probe_pcs=0x825070F0
--audit_68_host_mem_read_probe=0xBCE251C0:8:1000000,0xBCE251C8:8:1000000,
                               0xBCE251D0:8:1000000,0xBCE251D8:8:1000000,
                               0xBCE251E0:8:1000000,0xBCE251E8:8:1000000,
                               0xBCE251F0:8:1000000,0xBCE251F8:8:1000000

AUDIT-061-BR confirmed ctx_ptr=0xBCE251C0 (per AUDIT-068 S3 expectation; no arena drift in this run). Read probe captured the install timeline:

host time	event
9.556 s	Install starts: `[ctx+0]=0x8200A1E8` (vtable), `[ctx+4]=ctx`, `[ctx+8]=ctx`, `[ctx+12]=1` (refcount), `[ctx+16]=0x01000000`, `[ctx+32]=0xFFFFFFFF`
9.571 s	`[ctx+44]=0xBCE25640` written, `[ctx+48]=0xBE568F00` written (looks float-ish)
9.754 s	Transient `[ctx+32]=1` and `[ctx+40]=0x30057018` writes that are cleared next probe tick — likely temporary scratch during a function call
9.755 s	Stable post-install state
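The probe list passed to the canary run covers the 64-byte ctx head as eight consecutive 8-byte windows. A minimal sketch of generating that list is below; `probe_list` is a hypothetical helper, not part of canary, and the third field (`1000000` in the run above) is passed through as an opaque value because its meaning is not stated here.

```rust
/// Build a comma-separated probe list of `addr:window:third_field` entries
/// covering `total_bytes` starting at `base`.
/// Hypothetical helper: only the textual shape is taken from the run above.
fn probe_list(base: u32, total_bytes: u32, window: u32, third_field: u32) -> String {
    (0..total_bytes / window)
        .map(|i| format!("0x{:08X}:{}:{}", base + i * window, window, third_field))
        .collect::<Vec<_>>()
        .join(",")
}
```

Calling `probe_list(0xBCE251C0, 64, 8, 1000000)` yields the eight entries from `0xBCE251C0` through `0xBCE251F8` on a single line, without the line breaks used for display above.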

Final ctx bytes (saved at ctx-canary.bin):

  +  0: 82 00 A1 E8 BC E2 51 C0 BC E2 51 C0 00 00 00 01   <- vptr / self / self / refcount
  + 16: 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  + 32: FF FF FF FF 00 00 00 00 00 00 00 00 BC E2 56 40   <- ...sentinel... / [ctx+44]=0xBCE25640
  + 48: BE 56 8F 00 00 00 00 00 00 00 00 00 00 00 00 00   <- [ctx+48]=0xBE568F00 (-0.21f?)
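The annotations on the dump can be checked mechanically. The sketch below copies the 64 bytes from the dump, reads fields as big-endian 32-bit words (the guest is big-endian), and reinterprets the word at +48 as an IEEE-754 single to test the "float-ish" guess. The names `CTX_HEAD`, `be_u32` and `ctx48_as_f32` are illustrative only.

```rust
/// The captured 64-byte ctx head, copied from the dump above.
const CTX_HEAD: [u8; 64] = [
    0x82, 0x00, 0xA1, 0xE8, 0xBC, 0xE2, 0x51, 0xC0, 0xBC, 0xE2, 0x51, 0xC0, 0x00, 0x00, 0x00, 0x01,
    0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
    0xFF, 0xFF, 0xFF, 0xFF, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xBC, 0xE2, 0x56, 0x40,
    0xBE, 0x56, 0x8F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
];

/// Read a big-endian u32 at `offset` (panics if the slice is too short).
fn be_u32(bytes: &[u8], offset: usize) -> u32 {
    u32::from_be_bytes([
        bytes[offset],
        bytes[offset + 1],
        bytes[offset + 2],
        bytes[offset + 3],
    ])
}

/// Reinterpret the word at +48 as an IEEE-754 single-precision float.
fn ctx48_as_f32(bytes: &[u8]) -> f32 {
    f32::from_bits(be_u32(bytes, 48))
}
```

Read this way, `be_u32(&CTX_HEAD, 44)` is `0xBCE25640` and `ctx48_as_f32(&CTX_HEAD)` is roughly -0.2095, consistent with the -0.21f guess. That does not prove the field is a float: the same bit pattern also falls inside the 0xBC000000..0xC0000000 range, so it could equally be a pointer.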

Install path in ours

v3 adds crowbar_maybe_install_ctx_from_file() (~63 LOC) that reads the binary at $XENIA_CROWBAR_CTX_BIN and writes the bytes via mem.write_u8(ctx_ptr + i, byte) — same pattern as v2's crowbar_maybe_install_vtable_from_file(). Plus ~12 LOC of comments and the call-site addition. ~75 LOC additive over v2.

The 64-byte ctx file overwrites the v2 init at +0/+4/+8/+12 with identical values (verified — they match), and fills +16..+63 with the captured state.

Post-install log confirms exact write:

CROWBAR: installed 64 bytes at ctx_ptr=0x4d1d9000
CROWBAR: post-ctx-install ctx[+  0] (=0x4d1d9000) = 0x8200a1e8
CROWBAR: post-ctx-install ctx[+ 32] (=0x4d1d9020) = 0xffffffff
CROWBAR: post-ctx-install ctx[+ 44] (=0x4d1d902c) = 0xbce25640    <-- secondary obj ptr installed
CROWBAR: post-ctx-install ctx[+ 48] (=0x4d1d9030) = 0xbe568f00
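For readers without the diff at hand, the install pattern described above amounts to a byte-by-byte copy followed by a read-back. This is a sketch under assumptions, not the code in exports.rs: the `GuestMem` trait and `MockMem` are hypothetical stand-ins for the emulator's memory interface, and only the `write_u8(ctx_ptr + i, byte)` pattern and the `XENIA_CROWBAR_CTX_BIN` variable come from this note.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the emulator's guest-memory interface.
trait GuestMem {
    fn write_u8(&mut self, addr: u32, value: u8);
    fn read_u8(&self, addr: u32) -> u8;
}

/// Sparse mock memory; unwritten bytes read back as 0.
#[derive(Default)]
struct MockMem {
    bytes: HashMap<u32, u8>,
}

impl GuestMem for MockMem {
    fn write_u8(&mut self, addr: u32, value: u8) {
        self.bytes.insert(addr, value);
    }
    fn read_u8(&self, addr: u32) -> u8 {
        *self.bytes.get(&addr).unwrap_or(&0)
    }
}

/// Load the ctx image named by $XENIA_CROWBAR_CTX_BIN, if set and readable.
fn ctx_image_from_env() -> Option<Vec<u8>> {
    let path = std::env::var_os("XENIA_CROWBAR_CTX_BIN")?;
    std::fs::read(path).ok()
}

/// Copy `image` verbatim to guest memory starting at `ctx_ptr`.
/// Returns the number of bytes written.
fn install_ctx_bytes<M: GuestMem>(mem: &mut M, ctx_ptr: u32, image: &[u8]) -> usize {
    for (i, &byte) in image.iter().enumerate() {
        mem.write_u8(ctx_ptr.wrapping_add(i as u32), byte);
    }
    image.len()
}

/// Read back a big-endian u32, as the post-install log lines do.
fn read_be_u32<M: GuestMem>(mem: &M, addr: u32) -> u32 {
    u32::from_be_bytes([
        mem.read_u8(addr),
        mem.read_u8(addr.wrapping_add(1)),
        mem.read_u8(addr.wrapping_add(2)),
        mem.read_u8(addr.wrapping_add(3)),
    ])
}
```

The read-back step matters for the argument that follows: it shows the install itself is byte-exact, so a later fault must come from what the installed pointers refer to rather than from the copy.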

The fault (v3)

Identical fault PC, different r3 — that's the smoking gun:

	v1 (no ctx install)	v2 (init +0..+12 only)	v3 (full 64 bytes)
FAULT PC	0	0	0
LR	0x82506e38	0x82506e38	0x82506e38
CTR	0	0	0
r3	(any)	0x0	0xbce25640
r30 (ctx_ptr)	0x4D1D9000	0x4D1D9000	0x4D1D9000
tid	15	15	15

The lwz r11, 0(r3) at PC 0x82506e28 (per v2's disasm) loads from the address held in r3, and r3 holds [ctx+44]. In v2, r3=0, so the load reads [0]. In v3, r3=0xBCE25640, so it reads [0xBCE25640]. Both reads return 0 because:

v2: address 0 is either unmapped or mapped and holding 0; either way the read returns 0.
v3: the page containing 0xBCE25640 is unmapped in ours.

Ours's heap is at 0..0x6FFFFFFF (per KernelState::heap_alloc). The xenon physical-region VAs (0xBC000000..0xC0000000) never appear in ours's allocator namespace — MmAllocatePhysicalMemoryEx just calls heap_alloc() which returns low VAs.
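The namespace mismatch can be stated as a simple range check. The sketch below classifies a guest VA against the two ranges quoted above, treating 0xBC000000..0xC0000000 as half-open; `Region` and `classify` are illustrative names, and falling inside the heap range only means the address is one ours's allocator could return, not that it is currently mapped.

```rust
/// Which address namespace a guest VA falls into, per the ranges quoted above.
#[derive(Debug, PartialEq)]
enum Region {
    /// 0..=0x6FFFFFFF: the range ours's heap_alloc() hands out.
    OursHeap,
    /// 0xBC000000..0xC0000000: canary's xenon physical-region VAs.
    XenonPhysical,
    /// Anything else.
    Other,
}

fn classify(va: u32) -> Region {
    match va {
        0x0000_0000..=0x6FFF_FFFF => Region::OursHeap,
        0xBC00_0000..=0xBFFF_FFFF => Region::XenonPhysical,
        _ => Region::Other,
    }
}
```

With these ranges, ours's ctx_ptr `0x4D1D9000` classifies as `OursHeap`, while both canary's ctx_ptr `0xBCE251C0` and the installed `[ctx+44]` value `0xBCE25640` classify as `XenonPhysical`. Any pointer copied verbatim from a canary capture therefore lands outside what ours can allocate.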

Why this falsifies the v3 hypothesis

The brief's hypothesis: "with the full ctx state pre-installed AND the 4 workers spawned, ours produces swaps≥2 or draws≥1."

Outcome: ctx state IS installed, 4 workers ARE spawned and resumed, but the dispatch on the secondary object fails because the secondary object's VA isn't mappable.

This is exactly case (γ), the "fault at a new structural location" outcome that the brief predicted. The fault PC itself is not new (still 0), but the primary cause is different: in v2 the cause was "ctx+44 not initialized"; in v3 it is "ctx+44 points to an unmapped VA."

Composite progression score

Per brief's option 6 metric (excluding the matched_prefix term, which needs canary cross-comparison not available in check digests):

score = 1*swaps + 10*draws + 100*unique_render_targets

Run	swaps	score	instructions
OFF-1	1	1	25,000,000
OFF-2	1	1	25,000,000
OFF-3	1	1	25,000,000
ON-1	1	1	20,000,167 (faulted)
ON-2	1	1	20,000,167 (faulted)
ON-3	1	1	20,000,167 (faulted)

Δ = 0. The instruction count dropped from 25,000,000 to 20,000,167 in the ON runs because the fault halts the run early at instr=20000167, 167 instructions after the crowbar trigger (threshold=20M). This confirms the workers cannot complete even one meaningful iteration before faulting.
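The score and the Δ are easy to recompute from the run digests. A minimal sketch, assuming a digest exposes the three counters used in the formula; `RunDigest`, `score` and `mean_score` are illustrative names, not the actual digest schema.

```rust
/// The three counters the metric uses (hypothetical struct, not the digest schema).
struct RunDigest {
    swaps: u64,
    draws: u64,
    unique_render_targets: u64,
}

/// score = 1*swaps + 10*draws + 100*unique_render_targets
fn score(d: &RunDigest) -> u64 {
    d.swaps + 10 * d.draws + 100 * d.unique_render_targets
}

/// Mean score over a set of runs; 0.0 for an empty set.
fn mean_score(runs: &[RunDigest]) -> f64 {
    if runs.is_empty() {
        return 0.0;
    }
    runs.iter().map(|r| score(r) as f64).sum::<f64>() / runs.len() as f64
}
```

Every run in the table has swaps=1, draws=0, unique_render_targets=0, so each scores 1 and the difference between the ON and OFF means is 0. The weights make a single draw outweigh up to nine extra swaps, so the flat result is not an artifact of the weighting.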

LOC delta

crates/xenia-kernel/src/exports.rs: +63 LOC (helper) + 13 LOC (call-site comments + wire-up) = +76 LOC over v2.
audit-runs/review-a-step1c-crowbar-v3/: artifacts (ctx-canary.bin, canary-probe-run1.log, off-{1,2,3}.json, on-{1,2,3}.json, this doc, summary.md, re-validation.md, fix.diff).
No tests added: the helper is structurally identical to v2's crowbar_maybe_install_vtable_from_file, which has no test (it's a diagnostic, opt-in via env var).
canary instrumentation: 0 LOC (reused existing audit_68_host_mem_read_probe cvar).

What this confirms

v2's case (C) framing is structurally correct: [ctx+44] IS a secondary-object pointer that vtable[36] dispatches through.
Cross-engine pointer-VA mismatch is real and non-trivial: ours's allocator namespace doesn't include 0xBCxxxxxx VAs.
The wedge is ≥4-deep (vtable + ctx primary + ctx secondary pointer + secondary object's own vtable + fn-pointer slot). Crowbar approach saturates without much deeper state capture.

What this does NOT confirm

That the actual canary VA 0xBCE25640 is the ONLY secondary object. There may be more pointers in deeper ctx slots (we only captured 64 bytes; the full struct may be larger).
That installing the secondary object would suffice. The secondary object likely has its own pointer fields (head node of a linked list — looks like a queue/work-list given the doubly-linked-list pattern at +4/+8).

Recommendation

Stop the crowbar approach. The wedge is structurally too deep for state synthesis to be cheaper than fixing the natural-activation gap. Per Q5 of the boot-state review (methodology-assessment.md): the matched-prefix metric is on the wrong thread, and the wedge is inherently a thread-activation problem, not a state-construction problem.

Pivot recommendations (in order of cost):

AUDIT-069 follow-up — the 25 vs 1 "other producers" gap from Session 5 is more actionable than the worker-spawn gap. The XAudio thread resume at canary 1.726 s is a candidate trigger that produces 8-24 helpers ahead of the wedge.
Recursive ctx-state capture (option β from brief) — write a probe-graph tool that captures canary's pointer-reachable closure from ctx_ptr (BFS via audit_68_host_mem_read_probe, follow each pointer field that's in the BC arena, capture another 64 bytes, repeat). Estimate: 200-400 LOC tooling + needs ours-side memory allocator extension to map BC-arena VAs. High complexity vs gain.
Pointer-translation table (option α) — map canary BC-VAs to ours allocator-VAs on install. Needs canary-vs-ours linked allocator walk; ~300 LOC.
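To make the cost of option β concrete, the core of a probe-graph tool is a breadth-first walk over 64-byte windows. The sketch below is an assumption-laden illustration, not existing tooling: `read_window` is a hypothetical callback standing in for one round of audit_68_host_mem_read_probe capture, and "any big-endian word in 0xBC000000..0xC0000000 is a pointer" is a heuristic.

```rust
use std::collections::{BTreeMap, HashSet, VecDeque};

const WINDOW: usize = 64;
/// BC-arena VA range quoted earlier in this note, treated as half-open.
const ARENA: std::ops::Range<u32> = 0xBC00_0000..0xC000_0000;

/// Capture the pointer-reachable closure from `root`, one 64-byte window per node.
/// `read_window` returns None where no capture is available for an address.
fn capture_closure<F>(
    root: u32,
    max_nodes: usize,
    mut read_window: F,
) -> BTreeMap<u32, [u8; WINDOW]>
where
    F: FnMut(u32) -> Option<[u8; WINDOW]>,
{
    let mut captured = BTreeMap::new();
    let mut seen = HashSet::new();
    let mut queue = VecDeque::new();
    seen.insert(root);
    queue.push_back(root);

    while let Some(addr) = queue.pop_front() {
        if captured.len() >= max_nodes {
            break; // bound the walk; the real closure size is unknown
        }
        let window = match read_window(addr) {
            Some(w) => w,
            None => continue, // candidate was not capturable
        };
        for word in window.chunks_exact(4) {
            let ptr = u32::from_be_bytes([word[0], word[1], word[2], word[3]]);
            // Heuristic: treat any word inside the arena as a pointer to follow.
            if ARENA.contains(&ptr) && seen.insert(ptr) {
                queue.push_back(ptr);
            }
        }
        captured.insert(addr, window);
    }
    captured
}
```

Two limits show up immediately on the captured ctx head. The self-pointers at +4/+8 are deduplicated by the `seen` set, but the +48 word 0xBE568F00 also lies inside the arena range, so the heuristic enqueues it even though it may be a float. And each node needs a fresh canary run or a live probe, which is where most of the estimated tooling cost would go. The walk also only produces the captured set; installing it in ours still needs either BC-arena mappings or the translation table from option α.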

The natural-activation path (Step 2 of the boot-state roadmap) is likely cheaper than any of these crowbar extensions.
