xenia-rs/audit-runs/iterate-2J-cache-wipe-replay/writer-report.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Iterate 2.J — Cache-wipe replay (writer report)

Date: 2026-05-28. LOC delta: engine 0, canary 0. Pure test-harness parity measurement (no code change). Tests: N/A (no source modifications).

Headline

WEDGE-MOVED. Primary gate PASS: 2.J's NtQueryFullAttributesFile cache-probe calls now return 0xc000000f (STATUS_NO_SUCH_FILE) for all 9 cache:\* paths, matching canary's cold-cache baseline (iterate 2.I documented ours returning STATUS_SUCCESS for the same paths in 2.H — the inversion identified there is closed by the env-var fix). Cascade is partial: tid=4 (cache-rebuild worker) explodes from 160 → 2,075 events (~13×, +97% NtCreateFile/NtOpenFile/NtWriteFile to cache:\ and cache:\<bucket>\<x>\<file>.tmp); total event count 118,149 → 121,569 (+3,420, +2.9%); tid=1 wedge geometry changed (last guest_cycle 9,140,200 → 9,169,116, +28,916 cycles). VdSwap count unchanged (1 swap); thread set still 10 entries (no new spawns); sub_824F8398 / sub_825070F0 still 0 fires. Cache-divergence is real and now closed, but it was not the keystone for the AUDIT-068 install chain.

Mode

Pure measurement, ZERO LOC change. Invocation:

XENIA_CACHE_WIPE=1 timeout 600 ./target/release/xenia-rs exec -n 50000000 --quiet \
  --phase-a-event-log audit-runs/iterate-2J-cache-wipe-replay/ours-cold.jsonl \
  "<iso>"

Identical to iterate-2H invocation, with XENIA_CACHE_WIPE=1 prepended. Belt-and-braces: also rm -rf /home/fabi/.local/share/xenia-rs/cache/ before run (backup at /tmp/xenia-rs-cache-pre-2J-backup-*).

Cache wipe mechanism (verified)

From xenia-rs/crates/xenia-kernel/src/state.rs:1837-1893 (resolve_default_cache_root): XENIA_CACHE_WIPE=1 redirects cache_root to a per-process tmpdir at $TMPDIR/xenia-rs-cache-<pid>-<n> AND returns wipe=true, which makes init_cache_root (state.rs:728-758) do the clear-then-recreate dance. This properly isolates ours from any pre-existing XDG cache. No separate binary/JIT cache exists in this codebase (only XDG cache at $HOME/.local/share/xenia-rs/cache/).
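The redirect-and-wipe behaviour described above can be modelled in a few lines. This is a Python sketch of the described logic only, not a translation of the Rust in state.rs. The function names mirror the ones cited, but the `env`, `default_root` and `n` parameters, the exact `== "1"` test and the directory handling are assumptions made for illustration.

```python
import os
import shutil
import tempfile
from pathlib import Path


def resolve_default_cache_root(env, default_root, n=0):
    """Return (cache_root, wipe) following the behaviour described above.

    Assumption: `n` stands in for the counter in the
    `xenia-rs-cache-<pid>-<n>` directory name.
    """
    if env.get("XENIA_CACHE_WIPE") == "1":
        tmp = Path(env.get("TMPDIR", tempfile.gettempdir()))
        return tmp / f"xenia-rs-cache-{os.getpid()}-{n}", True
    return Path(default_root), False


def init_cache_root(cache_root, wipe):
    """Clear-then-recreate when wipe is set; otherwise just ensure it exists."""
    if wipe and cache_root.exists():
        shutil.rmtree(cache_root)
    cache_root.mkdir(parents=True, exist_ok=True)
    return cache_root
```

The point of the model is that a wiped run never touches the persistent XDG directory: the root is redirected before any clearing happens.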

Primary gate result — cache-probe return values

PASS (9/9). Every NtQueryFullAttributesFile call on a cache:\* path in 2.J returns 0xc000000f (STATUS_NO_SUCH_FILE). The first divergence flagged by iterate 2.I (idx 102423, cache:\d4ea4615\e\46ee8ca, ours STATUS_SUCCESS vs canary STATUS_NO_SUCH_FILE) is now bit-aligned with canary's cold-cache return.

Cache-probe paths and 2.J returns:

| tid_event_idx | path | 2.J status | canary baseline status |
|---|---|---|---|
| 102423 | cache:\d4ea4615\e\46ee8ca | 0xc000000f | 0xc000000f |
| 103840 | cache:\69d8e45c\8\3421153 | 0xc000000f | 0xc000000f |
| 103996 | cache:\69d8e45c\9\355f2f8 | 0xc000000f | 0xc000000f |
| 104453 | cache:\69d8e45c\e\534ffea | 0xc000000f | 0xc000000f |
| 105477 | cache:\aab216c3\a\2c8c185 | 0xc000000f | 0xc000000f |
| 105792 | cache:\69d8e45c\9\73a5c0a | 0xc000000f | 0xc000000f |
| 106228 | cache:\69d8e45c\9\39a9dcc | 0xc000000f | 0xc000000f |
| (+others) | cache:\aab216c3\5\ee70e0a | 0xc000000f | 0xc000000f |

cache:\ root open and cache:\access/cache:\ignore/cache:\recent metadata probes also align with canary's cold-cache behavior.
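A primary-gate check of this kind could be scripted roughly as follows. The report does not spell out the event-log schema, so the record shape used here (one JSON object per line with `kind`, `name`, `args_resolved.path` and `status`) is an assumption, and both helper names are hypothetical.

```python
import json

STATUS_NO_SUCH_FILE = 0xC000000F


def cache_probe_statuses(lines):
    """Map each probed cache:\\ path to the status returned for it.

    Assumed record shape (not confirmed by the report): one JSON object
    per line with `kind`, `name`, `args_resolved.path` and `status`.
    """
    result = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ev = json.loads(line)
        if ev.get("kind") != "kernel.return":
            continue
        if ev.get("name") != "NtQueryFullAttributesFile":
            continue
        path = ev.get("args_resolved", {}).get("path", "")
        if path.lower().startswith("cache:\\"):
            status = ev.get("status")
            # Accept either an integer or a hex string such as "0xc000000f".
            result[path] = int(status, 16) if isinstance(status, str) else status
    return result


def primary_gate_passes(statuses):
    """PASS only if at least one path was probed and every one is a cold miss."""
    return bool(statuses) and all(
        s == STATUS_NO_SUCH_FILE for s in statuses.values()
    )
```

Requiring at least one probed path keeps an empty or truncated log from being reported as a PASS.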

Secondary cascade gate results

(a) tid=1 last timestamp

  • 2.H: cycle=9,140,200 / host_ns=792,522,910 (NtWaitForSingleObjectEx return)
  • 2.J: cycle=9,169,116 / host_ns=749,717,731 (NtWaitForSingleObjectEx return)
  • Delta: +28,916 cycles on tid=1 (continued progression). host_ns decrease is mechanical: 2.H spent ~43ms of host wallclock spinning at the wedge during the last few hundred matched events; 2.J consumed fewer host-side spin cycles because it actually consumed instruction budget on cache-rebuild work. Both runs hit the 50M-instr budget, not a wedge.
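
The two deltas quoted in the bullets above follow directly from the logged values. A short arithmetic check, using only the figures already given (the constant names are illustrative):

```python
# Figures copied from the two bullets above.
H_CYCLE, H_HOST_NS = 9_140_200, 792_522_910   # iterate 2.H, tid=1 last event
J_CYCLE, J_HOST_NS = 9_169_116, 749_717_731   # iterate 2.J, tid=1 last event

cycle_delta = J_CYCLE - H_CYCLE                 # guest progress on tid=1
host_delta_ms = (H_HOST_NS - J_HOST_NS) / 1e6   # host wallclock 2.J did not spend

print(f"cycle delta: {cycle_delta:+,}")
print(f"host time difference: {host_delta_ms:.1f} ms")
```

This prints a cycle delta of +28,916 and a host time difference of 42.8 ms, matching the "~43ms" quoted above.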

(b) Wedge PC

Per the prompt, the 2.F+2.I wedge target was tid=1 PC 0x824ac578 (the bl 0x8284E02C NtWaitForSingleObjectEx with timeout=-1 on thread handle 0x1210). 2.J's tail shows tid=1 executing many NtWait... calls past that wedge that return success (return_value=0, status=0x00000000), not timeout. The wait wrapper is no longer parked. The 50M-instr run terminates with all 14 tids in returning NtWait... calls, not in blocked waits. WEDGE-MOVED (or possibly absent within this instruction budget — would need a longer run to distinguish).

(c) sub_824F8398 fires?

0 fires. Grep for 824f8398 across the full ours-cold.jsonl: zero hits. The AUDIT-068 ctx-installer chain (sub_824F8398 → sub_824F7CD0 → sub_824F7800 → sub_824FD240+0x24) is still beyond the boot window ours reaches in 50M instructions. Per canary baseline this fires at host_ns≈9.4s; ours reaches host_ns≈759ms.

(d) sub_825070F0 fires?

0 fires. The post-VdSwap worker fan-out is still absent. Same mechanism as (c) — downstream of an install chain that ours doesn't reach inside the budget.
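
The zero-hit checks in (c) and (d) amount to a case-insensitive substring count over the event log. A minimal sketch, assuming the addresses appear as hex text somewhere in each matching line (the helper name is hypothetical):

```python
def count_hits(lines, needles=("824f8398", "825070f0")):
    """Count lines containing each address, case-insensitively."""
    counts = {n: 0 for n in needles}
    for line in lines:
        low = line.lower()
        for n in needles:
            if n in low:
                counts[n] += 1
    return counts
```

Passing an open file handle for ours-cold.jsonl as `lines` streams the log without loading it into memory; a result of zero for both keys corresponds to the "0 fires" findings.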

(e) Thread set / spawn count

10 thread.create entries (unchanged from 2.H). The new entry_pc list is bit-identical to 2.H:

0x82181830, 0x8245a5d0, 0x82450a28, 0x82457ef0, 0x824cd458,
0x822f1ee0, 0x824d2878, 0x824d2940, 0x82178950, 0x821748f0

Canary tids 15/27/28 worker analogs still absent. ctx_ptr columns bit-stable vs 2.H (vA0000000 bucket fix retained): 0xbe8cbb3c, 0xbd184a40, 0xbc6c5640. Per tripstone #28, comparison is keyed on entry_pc, not integer tid.
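
Keying the comparison on entry_pc rather than tid, as tripstone #28 requires, can be sketched as a multiset diff. The `kind == "thread.create"` and `entry_pc` field names follow the terms used in this report, but the exact record layout is an assumption and the function names are hypothetical.

```python
from collections import Counter


def entry_pc_multiset(events):
    """Collect entry_pc values of thread.create events, ignoring tid."""
    return Counter(
        int(ev["entry_pc"], 16) if isinstance(ev["entry_pc"], str) else ev["entry_pc"]
        for ev in events
        if ev.get("kind") == "thread.create"
    )


def diff_thread_sets(ours, reference):
    """Return (missing_in_ours, extra_in_ours) as sorted hex strings."""
    a, b = entry_pc_multiset(ours), entry_pc_multiset(reference)
    missing = sorted(hex(pc) for pc in (b - a).elements())
    extra = sorted(hex(pc) for pc in (a - b).elements())
    return missing, extra
```

A multiset is used instead of a plain set so that two threads spawned from the same entry point still count as two.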

(f) Total event count

118,149 → 121,569 (+3,420, +2.9%). The increment is concentrated on the cache-rebuild worker (tid=4: 160 → 2,075 events, +1,915 = ~56% of the delta).
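
The share attributed to tid=4 can be re-derived from the totals above. In this sketch the per-tid counters are filled in by hand from the reported figures, with every other thread lumped under a made-up `"other"` key; on a real log, `events_per_tid` would produce the counters instead.

```python
from collections import Counter


def events_per_tid(events):
    """Count events per tid (assumes each record carries a `tid` field)."""
    return Counter(ev["tid"] for ev in events if "tid" in ev)


def delta_share(before, after, tid):
    """Fraction of the total event-count growth attributable to one tid."""
    total_delta = sum(after.values()) - sum(before.values())
    return (after[tid] - before[tid]) / total_delta


# Totals from the report; "other" is a placeholder for all remaining tids.
before = Counter({4: 160, "other": 118_149 - 160})
after = Counter({4: 2_075, "other": 121_569 - 2_075})

share = delta_share(before, after, 4)   # 1,915 / 3,420
growth = after[4] / before[4]           # 2,075 / 160
```

Here `share` comes out at about 0.56 and `growth` at about 13, consistent with the figures quoted above.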

(g) Missing (op, lr) tuples (iterate-2D method)

Not re-measured. Phase-A --phase-a-event-log capture does not feed the 2.D diff pipeline (which consumes --lr-trace of IAT thunks at 0x8284DDDC/E49C/DF5C/E07C). 2.H report noted the same restriction. Expected unchanged at 28/28 — the producer LRs that fire in canary target downstream worker classes (sub_825070F0 fan-out) that ours still doesn't reach. Re-running 2.D requires a separate capture mode.

(h) VdSwap count

1 swap unchanged (3 events = import.call + kernel.call + kernel.return for the same single VdSwap call at cycle=5,577,303 / host_ns=489.2ms). Per tripstone #39: gameplay-level progression (swaps > 1 or draws > 0) NOT achieved. The 2.J run still does not reach a second swap within the 50M-instr budget.

(i) Draw count

0 draws. No *Draw* kernel-call names emitted (consistent with VdSwap=1: pre-gameplay).
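
Gates (h) and (i) together form the tripstone #39 progression test. One way to compute it is sketched below, assuming each record has `kind` and `name` fields; counting only `kernel.call` records is a choice made here so that the three records emitted per call are not triple-counted.

```python
def count_calls(events, predicate):
    """Count kernel.call events whose name satisfies predicate."""
    return sum(
        1
        for ev in events
        if ev.get("kind") == "kernel.call" and predicate(ev.get("name", ""))
    )


def progression(events):
    """Summarise swap/draw counts and the swaps > 1 or draws > 0 test."""
    events = list(events)  # materialise so the log can be scanned twice
    swaps = count_calls(events, lambda n: n == "VdSwap")
    draws = count_calls(events, lambda n: "Draw" in n)
    return {"swaps": swaps, "draws": draws, "gameplay": swaps > 1 or draws > 0}
```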

Cascade roll-up

| gate | description | 2.H | 2.J | result |
|---|---|---|---|---|
| PRIMARY cache-probe | 0xc000000f matches canary | FAIL (returns SUCCESS) | PASS (9/9) | PASS |
| (a) tid=1 last cycle | progression | 9,140,200 | 9,169,116 | +28,916 |
| (b) wedge PC 0x824ac578 | parked wait | timeout=-1 parked | NtWait returns 0 | MOVED |
| (c) sub_824F8398 fires | install chain | 0 | 0 | UNCHANGED |
| (d) sub_825070F0 fires | fan-out | 0 | 0 | UNCHANGED |
| (e) thread set size | spawns | 10 entries | 10 entries | UNCHANGED |
| (f) total event count | volume | 118,149 | 121,569 | +2.9% |
| (g) missing-tuple count | 2.D diff | 28 | n/a (different capture) | NOT-MEASURED |
| (h) VdSwap count | gameplay swaps | 1 | 1 | UNCHANGED |
| (i) draws | gameplay draws | 0 | 0 | UNCHANGED |

Outcome class: WEDGE-MOVED. Primary gate fully passes. tid=1 wedge geometry moved (wait now returns success). Cache-rebuild worker tid=4 springs into life (~13× event growth). But the deeper install chain (sub_824F8398 / sub_825070F0) remains downstream of the 50M-instr budget; gameplay-level progression (VdSwap > 1, draws > 0) NOT achieved.

What changed and why

The 2.I diagnosis was correct in its mechanism but only partially correct in its prediction:

  • Mechanism correct: ours's cache contained 9 files from previous runs (276K total). NtQueryFullAttributesFile returned STATUS_SUCCESS for files that should be missing on a cold boot. Canary's capture protocol wipes both XDG and binary caches; ours's warm-cache state put the engine on a cache-HIT replay branch instead of cache-MISS reconstruction. tid=4 was hardly doing anything in 2.H because the cache already existed. In 2.J it actively rebuilds the cache (36 NtCreateFile, 24 NtOpenFile, 19 NtWriteFile to *.tmp files and bucket directories).

  • Prediction partial: closing the cache-state divergence did unblock one wait wrapper (the previously-parked 0x824ac578 wait now returns success), but did NOT cascade through to the sub_824F8398 install chain or sub_825070F0 worker fan-out. The install epoch on canary fires at host_ns≈9.4s; ours's 50M-instr run ends at host_ns≈760ms. The wedge moved forward in guest cycles (+28,916 on tid=1), but the canary trajectory is still ~12× further along in wallclock when its install chain fires.

Tripstone audit

  • #28 (per-engine tid stability): All cross-engine comparisons are keyed on entry_pc and first-kernel-call signature, never on integer tid. The "tid=1 wedge" / "tid=4 cache rebuild" identities are ours-internal and stable across 2.H ↔ 2.J because both runs are ours-side (deterministic scheduler).
  • #39 (composite progression): The headline does NOT claim "gameplay progression" — VdSwap count unchanged at 1, draws unchanged at 0. The PRIMARY-gate PASS is a structural / state-parity claim (cache state matches canary baseline). Secondary observation tid=1 wedge geometry MOVED is reported with both improving (cycle +28,916) and ambiguous (host_ns shifted backward due to less spin-wait) evidence.
  • #40 (single-keystone framing): The 2.I prompt framing "cache-wipe single test-harness parity fix may unblock the wedge" is partially falsified. Cache-state IS load-bearing (one wedge moved, +3,420 events, tid=4 came alive) but is NOT the keystone for the AUDIT-068 install chain (sub_824F8398 still 0 fires). The iterate 2.E reading-error #40 class ("single-keystone framing falsified") REPEATS here. Recommend explicitly registering reading error #41: state-parity gate PASS does not imply cascade — even bit-identical input state can land on different trajectories when ~12× wallclock separates the install epochs.

Confidence

  • HIGH that primary gate genuinely passes (all 9 cache-probe paths bit-aligned with canary).
  • HIGH that tid=4 cache-rebuild work is the bulk of the +3,420 event delta (cache file I/O directly visible in args_resolved.path).
  • HIGH that the wedge moved (NtWait at 0x824ac578 no longer parked).
  • HIGH that sub_824F8398 / sub_825070F0 still 0 fires (checked with multiple grep patterns).
  • MEDIUM that the next blocker is "longer instruction budget + install chain investigation" vs "additional state-parity divergence upstream of install epoch". Both classes remain candidates.

Next iterate recommendation

Iterate 2.K should be one of:

  1. Longer-budget replay (~0 LOC). Re-run 2.J with -n 500000000 (10× budget, ~60s wallclock estimate) to push past host_ns≈9.4s and see if the AUDIT-068 install chain fires naturally now that the cache-state divergence is closed. If sub_824F8398 fires in the longer run, the cascade IS following just at slower wallclock. If it still doesn't, there's a second state-parity divergence to find.

  2. Replay-then-replay determinism check (~0 LOC). Run 2.J twice back-to-back with XENIA_CACHE_WIPE=1 and verify the second run produces identical (or near-identical) event count + same tid=4 work pattern. Cross-check that the persistent-cache path doesn't contaminate state between runs.

  3. 2.I-style arg-diff at the NEW first-divergence (~50-100 LOC). 2.I's diff harness was keyed on (kind, name, ord) only and missed the return-value divergence. Now that those return values align, re-run the diff to find the NEXT cross-engine first-divergence in args_resolved or side_effects within the 0-1s window. Likely reveals what state-parity divergence (if any) blocks the install chain from firing earlier on ours.

Recommended priority: (1) first (zero LOC, ~5 min, decisive), then (3) if (1) shows no install-chain fire.
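
For option (2), and as a starting point for the first-divergence search in option (3), two captures can be compared after dropping wallclock-dependent fields. This is a sketch under the assumption that `host_ns` is the only volatile field in the log; the `VOLATILE` tuple and both function names are hypothetical.

```python
import json

VOLATILE = ("host_ns",)  # assumed to be the only wallclock-dependent field


def normalise(line):
    """Parse one JSONL record and drop fields expected to vary run to run."""
    ev = json.loads(line)
    for key in VOLATILE:
        ev.pop(key, None)
    return json.dumps(ev, sort_keys=True)


def first_divergence(run_a, run_b):
    """Return the index of the first differing event, or None if identical."""
    a = [normalise(line) for line in run_a if line.strip()]
    b = [normalise(line) for line in run_b if line.strip()]
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    # Equal prefix: the runs diverge where the shorter one ends, if at all.
    return None if len(a) == len(b) else min(len(a), len(b))
```

A `None` result for two back-to-back XENIA_CACHE_WIPE=1 runs would support the determinism claim; any integer result points at the event index to inspect first.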

Artifacts

Under xenia-rs/audit-runs/iterate-2J-cache-wipe-replay/:

  • ours-cold.jsonl (121,569 events, 50M-instr run, cache-wiped boot, ~28MB)
  • ours-cold.stdout.log / ours-cold.stderr.log (empty — quiet mode)
  • writer-report.md (this file)

Backup of pre-wipe XDG cache: /tmp/xenia-rs-cache-pre-2J-backup-<timestamp> (276K, 9 files).