Source changes (dormant parity infra, retained from iterate 2.AI/2.AO): - xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring - xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Iterate 2.J — Cache-wipe replay (writer report)
Date: 2026-05-28. LOC delta: engine 0, canary 0. Pure test-harness parity measurement (no code change). Tests: N/A (no source modifications).
Headline
WEDGE-MOVED. Primary gate PASS: 2.J's NtQueryFullAttributesFile
cache-probe calls now return 0xc000000f (STATUS_NO_SUCH_FILE) for all
9 cache:\* paths, matching canary's cold-cache baseline (iterate 2.I
documented ours returning STATUS_SUCCESS for the same paths in 2.H —
the inversion identified there is closed by the env-var fix). Cascade is
partial: tid=4 (cache-rebuild worker) explodes from 160 → 2,075
events (~13×, +97% NtCreateFile/NtOpenFile/NtWriteFile to cache:\ and
cache:\<bucket>\<x>\<file>.tmp); total event count 118,149 → 121,569
(+3,420, +2.9%); tid=1 wedge geometry changed (last guest_cycle
9,140,200 → 9,169,116, +28,916 cycles). VdSwap count unchanged (1
swap); thread set still 10 entries (no new spawns); sub_824F8398 /
sub_825070F0 still 0 fires. Cache-divergence is real and now closed,
but it was not the keystone for the AUDIT-068 install chain.
Mode
Pure measurement, ZERO LOC change. Invocation:
XENIA_CACHE_WIPE=1 timeout 600 ./target/release/xenia-rs exec -n 50000000 --quiet \
--phase-a-event-log audit-runs/iterate-2J-cache-wipe-replay/ours-cold.jsonl \
"<iso>"
Identical to iterate-2H invocation, with XENIA_CACHE_WIPE=1 prepended.
Belt-and-braces: also rm -rf /home/fabi/.local/share/xenia-rs/cache/
before run (backup at /tmp/xenia-rs-cache-pre-2J-backup-*).
Cache wipe mechanism (verified)
From xenia-rs/crates/xenia-kernel/src/state.rs:1837-1893
(resolve_default_cache_root): XENIA_CACHE_WIPE=1 redirects
cache_root to a per-process tmpdir at
$TMPDIR/xenia-rs-cache-<pid>-<n> AND returns wipe=true, which makes
init_cache_root (state.rs:728-758) do the clear-then-recreate dance.
This properly isolates ours from any pre-existing XDG cache. No
separate binary/JIT cache exists in this codebase
(only XDG cache at $HOME/.local/share/xenia-rs/cache/).
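The redirect-and-wipe behaviour described above can be modelled in a few lines. This is a Python sketch of the described logic, not the Rust code in state.rs: the function names mirror the ones cited, but the counter used for the `<n>` suffix, the fallback to the system temp directory, and the return shapes are assumptions made for illustration.

```python
import itertools
import os
import shutil
import tempfile
from pathlib import Path

# Assumption: a simple in-process counter stands in for the <n> suffix.
_counter = itertools.count()


def resolve_default_cache_root(env=None):
    """Return (cache_root, wipe) following the behaviour described in the report."""
    env = os.environ if env is None else env
    if env.get("XENIA_CACHE_WIPE") == "1":
        tmp = Path(env.get("TMPDIR") or tempfile.gettempdir())
        root = tmp / f"xenia-rs-cache-{os.getpid()}-{next(_counter)}"
        return root, True
    home = Path(env.get("HOME") or str(Path.home()))
    return home / ".local/share/xenia-rs/cache", False


def init_cache_root(root, wipe):
    """Clear-then-recreate when wipe is set; otherwise only ensure the directory exists."""
    root = Path(root)
    if wipe and root.exists():
        shutil.rmtree(root)
    root.mkdir(parents=True, exist_ok=True)
    return root
```

Passing an explicit `env` mapping keeps the sketch testable without touching the real process environment.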
Primary gate result — cache-probe return values
PASS (9/9). Every NtQueryFullAttributesFile call on a cache:\*
path in 2.J returns 0xc000000f (STATUS_NO_SUCH_FILE). The first
divergence flagged by iterate 2.I (idx 102423,
cache:\d4ea4615\e\46ee8ca, ours STATUS_SUCCESS vs canary
STATUS_NO_SUCH_FILE) is now bit-aligned with canary's cold-cache
return.
Cache-probe paths and 2.J returns:
| tid_event_idx | path | 2.J status | canary baseline status |
|---|---|---|---|
| 102423 | cache:\d4ea4615\e\46ee8ca | 0xc000000f | 0xc000000f |
| 103840 | cache:\69d8e45c\8\3421153 | 0xc000000f | 0xc000000f |
| 103996 | cache:\69d8e45c\9\355f2f8 | 0xc000000f | 0xc000000f |
| 104453 | cache:\69d8e45c\e\534ffea | 0xc000000f | 0xc000000f |
| 105477 | cache:\aab216c3\a\2c8c185 | 0xc000000f | 0xc000000f |
| 105792 | cache:\69d8e45c\9\73a5c0a | 0xc000000f | 0xc000000f |
| 106228 | cache:\69d8e45c\9\39a9dcc | 0xc000000f | 0xc000000f |
| (+others) | cache:\aab216c3\5\ee70e0a | 0xc000000f | 0xc000000f |
cache:\ root open and cache:\access/cache:\ignore/cache:\recent
metadata probes also align with canary's cold-cache behavior.
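The primary gate can be re-checked mechanically from the event log. The sketch below assumes each JSONL line is an object with `kind`, `name`, `args_resolved.path` and `status` fields; those field names are inferred from this report rather than taken from the engine's schema, so they may need adjusting.

```python
import json

STATUS_NO_SUCH_FILE = 0xC000000F


def cache_probe_statuses(lines):
    """Yield (path, status) for NtQueryFullAttributesFile returns on cache paths."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ev = json.loads(line)
        if ev.get("name") != "NtQueryFullAttributesFile":
            continue
        if ev.get("kind") != "kernel.return":
            continue
        path = (ev.get("args_resolved") or {}).get("path", "")
        if not path.lower().startswith("cache:\\"):
            continue
        status = ev.get("status")
        if isinstance(status, str):
            status = int(status, 16)  # accepts "0xc000000f" as well as "c000000f"
        yield path, status


def primary_gate(lines):
    """Return (passed, total): how many cache probes returned STATUS_NO_SUCH_FILE."""
    results = list(cache_probe_statuses(lines))
    passed = sum(1 for _, status in results if status == STATUS_NO_SUCH_FILE)
    return passed, len(results)
```

Run against ours-cold.jsonl, a result where both numbers are equal corresponds to the PASS reported here.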
Secondary cascade gate results
(a) tid=1 last timestamp
- 2.H: cycle=9,140,200 / host_ns=792,522,910 (NtWaitForSingleObjectEx return)
- 2.J: cycle=9,169,116 / host_ns=749,717,731 (NtWaitForSingleObjectEx return)
- Delta: +28,916 cycles on tid=1 (continued progression). host_ns decrease is mechanical: 2.H spent ~43ms of host wallclock spinning at the wedge during the last few hundred matched events; 2.J consumed fewer host-side spin cycles because it actually consumed instruction budget on cache-rebuild work. Both runs hit the 50M-instr budget, not a wedge.
(b) Wedge PC
Per the prompt, the 2.F+2.I wedge target was tid=1 PC 0x824ac578 (the
bl 0x8284E02C NtWaitForSingleObjectEx with timeout=-1 on thread
handle 0x1210). 2.J's tail shows tid=1 executing many NtWait...
calls past that wedge that return success (return_value=0,
status=0x00000000), not timeout. The wait wrapper is no longer
parked. The 50M-instr run terminates with all 14 tids in returning
NtWait... calls, not in blocked waits. WEDGE-MOVED (or possibly
absent within this instruction budget — would need a longer run to
distinguish).
(c) sub_824F8398 fires?
0 fires. Grep for 824f8398 across the full ours-cold.jsonl: zero
hits. The AUDIT-068 ctx-installer chain (sub_824F8398 → sub_824F7CD0 → sub_824F7800 → sub_824FD240+0x24) is still beyond
the boot window ours reaches in 50M instructions. Per canary
baseline this fires at host_ns≈9.4s; ours reaches host_ns≈759ms.
(d) sub_825070F0 fires?
0 fires. The post-VdSwap worker fan-out is still absent. Same mechanism as (c) — downstream of an install chain that ours doesn't reach inside the budget.
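The zero-fire checks in (c) and (d) amount to a case-insensitive substring search over the log. A minimal sketch follows; it counts matching lines, not individual occurrences, and the function name is made up for illustration.

```python
def count_fires(lines, needles=("824f8398", "825070f0")):
    """Count log lines that mention each address, ignoring case."""
    needles = [n.lower() for n in needles]
    counts = {n: 0 for n in needles}
    for line in lines:
        low = line.lower()
        for n in needles:
            if n in low:
                counts[n] += 1
    return counts
```

Typical use would be `count_fires(open("ours-cold.jsonl"))`; a dictionary of zeros matches the result reported above.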
(e) Thread set / spawn count
10 thread.create entries (unchanged from 2.H). The new entry_pc list is bit-identical to 2.H:
0x82181830, 0x8245a5d0, 0x82450a28, 0x82457ef0, 0x824cd458,
0x822f1ee0, 0x824d2878, 0x824d2940, 0x82178950, 0x821748f0
Canary tids 15/27/28 worker analogs still absent. ctx_ptr columns
bit-stable vs 2.H (vA0000000 bucket fix retained):
0xbe8cbb3c, 0xbd184a40, 0xbc6c5640. Per tripstone #28, comparison
is keyed on entry_pc, not integer tid.
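Keying the comparison on entry_pc instead of integer tid reduces to a set difference. The sketch below uses the ten entry PCs listed above; the helper name is hypothetical, and the first-kernel-call signature that tripstone #28 also calls for is left out.

```python
# The ten thread.create entry PCs listed above (identical for 2.H and 2.J).
ENTRY_PCS = [
    0x82181830, 0x8245A5D0, 0x82450A28, 0x82457EF0, 0x824CD458,
    0x822F1EE0, 0x824D2878, 0x824D2940, 0x82178950, 0x821748F0,
]


def thread_set_diff(ours, other):
    """Return (only_in_ours, only_in_other) as sorted lists of entry PCs."""
    a, b = set(ours), set(other)
    return sorted(a - b), sorted(b - a)
```

Two empty lists mean the thread sets match regardless of how each engine numbered its tids.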
(f) Total event count
118,149 → 121,569 (+3,420, +2.9%). The increment is concentrated on the cache-rebuild worker (tid=4: 160 → 2,075 events, +1,915 = ~56% of the delta).
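The percentages quoted here follow directly from the raw counts. A short worked check, using only the numbers in this report:

```python
def delta_report(before, after):
    """Return (absolute delta, percentage change relative to `before`)."""
    delta = after - before
    return delta, 100.0 * delta / before


total_delta, total_pct = delta_report(118_149, 121_569)  # all events
tid4_delta, _ = delta_report(160, 2_075)                  # cache-rebuild worker
tid4_share = 100.0 * tid4_delta / total_delta             # share of the total delta
tid4_growth = 2_075 / 160                                 # growth factor for tid=4
```

This gives a total delta of 3,420 events (about 2.9%), a tid=4 delta of 1,915 events (about 56% of the total delta) and a growth factor of roughly 13.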
(g) Missing (op, lr) tuples (iterate-2D method)
Not re-measured. Phase-A --phase-a-event-log capture does not feed
the 2.D diff pipeline (which consumes --lr-trace of IAT thunks at
0x8284DDDC/E49C/DF5C/E07C). 2.H report noted the same restriction.
Expected unchanged at 28/28 — the producer LRs that fire in canary
target downstream worker classes (sub_825070F0 fan-out) that ours
still doesn't reach. Re-running 2.D requires a separate capture mode.
(h) VdSwap count
1 swap unchanged (3 events = import.call + kernel.call + kernel.return for the same single VdSwap call at cycle=5,577,303 / host_ns=489.2ms). Per tripstone #39: gameplay-level progression (swaps > 1 or draws > 0) NOT achieved. The 2.J run still does not reach a second swap within the 50M-instruction budget.
(i) Draw count
0 draws. No *Draw* kernel-call names emitted (consistent with
VdSwap=1: pre-gameplay).
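Gates (h) and (i) together form the tripstone #39 progression test. The sketch below counts only kernel.call events so that the import.call / kernel.call / kernel.return triple for one VdSwap counts once; the `kind` and `name` field names are assumptions about the log schema.

```python
def progression(events):
    """Return (swaps, draws, progressed) from an iterable of event dicts."""
    swaps = 0
    draws = 0
    for ev in events:
        if ev.get("kind") != "kernel.call":
            continue  # import.call and kernel.return belong to the same call
        name = ev.get("name", "")
        if name == "VdSwap":
            swaps += 1
        elif "Draw" in name:
            draws += 1
    return swaps, draws, swaps > 1 or draws > 0
```

For the 2.J log this should yield one swap, zero draws and no progression, matching the figures above.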
Cascade roll-up
| gate | description | 2.H | 2.J | result |
|---|---|---|---|---|
| PRIMARY | cache-probe 0xc000000f matches canary | FAIL (returns SUCCESS) | PASS (9/9) | PASS |
| (a) tid=1 last cycle | progression | 9,140,200 | 9,169,116 | +28,916 |
| (b) wedge PC 0x824ac578 parked | wait timeout=-1 | parked | NtWait returns 0 | MOVED |
| (c) sub_824F8398 fires | install chain | 0 | 0 | UNCHANGED |
| (d) sub_825070F0 fires | fan-out | 0 | 0 | UNCHANGED |
| (e) thread set size | spawns | 10 entries | 10 entries | UNCHANGED |
| (f) total event count | volume | 118,149 | 121,569 | +2.9% |
| (g) missing-tuple count | 2.D diff | 28 | n/a (different capture) | NOT-MEASURED |
| (h) VdSwap count | gameplay swaps | 1 | 1 | UNCHANGED |
| (i) draws | gameplay draws | 0 | 0 | UNCHANGED |
Outcome class: WEDGE-MOVED. Primary gate fully passes. tid=1 wedge
geometry moved (wait now returns success). Cache-rebuild worker tid=4
springs into life (~13× event growth). But the deeper install chain
(sub_824F8398 / sub_825070F0) remains downstream of the 50M-instr
budget; gameplay-level progression (VdSwap > 1, draws > 0) NOT achieved.
What changed and why
The 2.I diagnosis was correct in its mechanism but only partially correct in its prediction:
- Mechanism correct: ours's cache contained 9 files from previous runs (276K total). NtQueryFullAttributesFile returned STATUS_SUCCESS for files that should be missing on a cold boot. Canary's capture protocol wipes both XDG and binary caches; ours's warm-cache state put the engine on a cache-HIT replay branch instead of cache-MISS reconstruction. tid=4 was hardly doing anything in 2.H because the cache already existed. In 2.J it actively rebuilds the cache (36 NtCreateFile, 24 NtOpenFile, 19 NtWriteFile to *.tmp files and bucket directories).
- Prediction partial: closing the cache-state divergence did unblock one wait wrapper (the previously-parked 0x824ac578 wait now returns success), but did NOT cascade through to the sub_824F8398 install chain or sub_825070F0 worker fan-out. The install epoch on canary fires at host_ns≈9.4s; ours's 50M-instr run ends at host_ns≈760ms. The wedge moved, but the canary trajectory is still ~12× further along in wallclock when its install chain fires.
Tripstone audit
- #28 (per-engine tid stability): All cross-engine comparisons are keyed on entry_pc and first-kernel-call signature, never on integer tid. The "tid=1 wedge" / "tid=4 cache rebuild" identities are ours-internal and stable across 2.H ↔ 2.J because both runs are ours-side (deterministic scheduler).
- #39 (composite progression): The headline does NOT claim "gameplay progression": VdSwap count unchanged at 1, draws unchanged at 0. The PRIMARY-gate PASS is a structural / state-parity claim (cache state matches canary baseline). Secondary observation tid=1 wedge geometry MOVED is reported with both improving (cycle +28,916) and ambiguous (host_ns shifted backward due to less spin-wait) evidence.
- #40 (single-keystone framing): The 2.I prompt framing "cache-wipe single test-harness parity fix may unblock the wedge" is partially falsified. Cache-state IS load-bearing (one wedge moved, +3,420 events, tid=4 came alive) but is NOT the keystone for the AUDIT-068 install chain (sub_824F8398 still 0 fires). The iterate 2.E reading-error #40 class ("single-keystone framing falsified") REPEATS here. Recommend explicitly registering reading error #41: state-parity gate PASS does not imply cascade; even bit-identical input state can land on different trajectories when ~12× wallclock separates the install epochs.
Confidence
- HIGH that primary gate genuinely passes (all 9 cache-probe paths bit-aligned with canary).
- HIGH that tid=4 cache-rebuild work is the bulk of the +3,420 event delta (cache file I/O directly visible in args_resolved.path).
- HIGH that the wedge moved (NtWait at 0x824ac578 no longer parked).
- HIGH that sub_824F8398 / sub_825070F0 still 0 fires (instrumented multiple grep paths).
- MEDIUM that the next blocker is "longer instruction budget + install chain investigation" vs "additional state-parity divergence upstream of install epoch". Both classes remain candidates.
Next iterate recommendation
Iterate 2.K should be one of:
1. Longer-budget replay (~0 LOC). Re-run 2.J with -n 500000000 (10× budget, ~60s wallclock estimate) to push past host_ns≈9.4s and see if the AUDIT-068 install chain fires naturally now that the cache-state divergence is closed. If sub_824F8398 fires in the longer run, the cascade IS following, just at slower wallclock. If it still doesn't, there's a second state-parity divergence to find.
2. Replay-then-replay determinism check (~0 LOC). Run 2.J twice back-to-back with XENIA_CACHE_WIPE=1 and verify the second run produces identical (or near-identical) event count + same tid=4 work pattern. Cross-check that the persistent-cache path doesn't contaminate state between runs.
3. 2.I-style arg-diff at the NEW first-divergence (~50-100 LOC). 2.I's diff harness was keyed on (kind, name, ord) only and missed the return-value divergence. Now that those return values align, re-run the diff to find the NEXT cross-engine first-divergence in args_resolved or side_effects within the 0-1s window. Likely reveals what state-parity divergence (if any) blocks the install chain from firing earlier on ours.
Recommended priority: (1) first (zero LOC, ~5 min, decisive), then (3) if (1) shows no install-chain fire.
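Option (3) depends on widening the diff key beyond (kind, name, ord). The sketch below is not the 2.I harness, which is not shown in this report; it only illustrates how the choice of key fields determines where the first divergence is found. The `status` field name and the event-dict shape are assumptions.

```python
def first_divergence(ours, canary, keys=("kind", "name", "ord")):
    """Return the index of the first event pair that differs on `keys`, or None.

    If one sequence is a strict prefix of the other, the index where the
    shorter one ends is returned.
    """
    for idx, (a, b) in enumerate(zip(ours, canary)):
        if any(a.get(k) != b.get(k) for k in keys):
            return idx
    if len(ours) != len(canary):
        return min(len(ours), len(canary))
    return None
```

With the default key a return-value mismatch goes unnoticed; adding the return field, for example `keys=("kind", "name", "ord", "status")`, surfaces it at the first affected event. The same approach extends to args_resolved or side_effects by adding those fields to `keys`.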
Artifacts
Under xenia-rs/audit-runs/iterate-2J-cache-wipe-replay/:
- ours-cold.jsonl (121,569 events, 50M-instr run, cache-wiped boot, ~28MB)
- ours-cold.stdout.log / ours-cold.stderr.log (empty; quiet mode)
- writer-report.md (this file)
Backup of pre-wipe XDG cache:
/tmp/xenia-rs-cache-pre-2J-backup-<timestamp> (276K, 9 files).