# Iterate 2.J: Cache-wipe replay (writer report)

**Date:** 2026-05-28. **LOC delta:** engine **0**, canary **0**. Pure
test-harness parity measurement (no code change).
**Tests:** N/A (no source modifications).

## Headline

**WEDGE-MOVED.** Primary gate **PASS**: 2.J's `NtQueryFullAttributesFile`
cache-probe calls now return `0xc000000f` (`STATUS_NO_SUCH_FILE`) for all
9 `cache:\*` paths, matching canary's cold-cache baseline. (Iterate 2.I
documented ours returning `STATUS_SUCCESS` for the same paths in 2.H;
the inversion identified there is closed by the env-var fix.) Cascade is
**partial**: tid=4 (cache-rebuild worker) explodes from 160 → 2,075
events (~13×, +97% NtCreateFile/NtOpenFile/NtWriteFile to `cache:\` and
`cache:\<bucket>\<x>\<file>.tmp`); total event count 118,149 → 121,569
(+3,420, +2.9%); tid=1 wedge geometry changed (last `guest_cycle`
9,140,200 → 9,169,116, +28,916 cycles). VdSwap count unchanged (1
swap); thread set still 10 entries (no new spawns); `sub_824F8398` /
`sub_825070F0` still 0 fires. Cache-divergence is real and now closed,
but it was not the keystone for the AUDIT-068 install chain.

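
The headline deltas can be re-derived from the raw counts quoted above. A minimal arithmetic sketch; the variable names are illustrative only, the numbers are the ones in this report:

```python
# Figures quoted in the headline of this report.
tid4_before, tid4_after = 160, 2_075
total_before, total_after = 118_149, 121_569
cycle_before, cycle_after = 9_140_200, 9_169_116

tid4_growth = tid4_after / tid4_before        # tid=4 event-count ratio
total_delta = total_after - total_before      # absolute event delta
total_pct = 100 * total_delta / total_before  # relative event delta
cycle_delta = cycle_after - cycle_before      # tid=1 guest_cycle progression
# Share of the total delta that is attributable to tid=4.
tid4_share = 100 * (tid4_after - tid4_before) / total_delta

print(f"tid=4 growth : {tid4_growth:.1f}x")
print(f"event delta  : +{total_delta} (+{total_pct:.1f}%)")
print(f"cycle delta  : +{cycle_delta}")
print(f"tid=4 share  : {tid4_share:.0f}% of the delta")
```

This reproduces the ~13×, +3,420 / +2.9%, +28,916 and ~56% figures used in the headline and in gate (f).
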
## Mode

Pure measurement, ZERO LOC change. Invocation:
```
XENIA_CACHE_WIPE=1 timeout 600 ./target/release/xenia-rs exec -n 50000000 --quiet \
--phase-a-event-log audit-runs/iterate-2J-cache-wipe-replay/ours-cold.jsonl \
"<iso>"
```
Identical to iterate-2H invocation, with `XENIA_CACHE_WIPE=1` prepended.
Belt-and-braces: also `rm -rf /home/fabi/.local/share/xenia-rs/cache/`
before the run (backup at `/tmp/xenia-rs-cache-pre-2J-backup-*`).

## Cache wipe mechanism (verified)

From `xenia-rs/crates/xenia-kernel/src/state.rs:1837-1893`
(`resolve_default_cache_root`): `XENIA_CACHE_WIPE=1` redirects
`cache_root` to a per-process tmpdir at
`$TMPDIR/xenia-rs-cache-<pid>-<n>` AND returns `wipe=true`, which makes
`init_cache_root` (state.rs:728-758) do the clear-then-recreate dance.
This properly isolates ours from any pre-existing XDG cache. No
separate binary/JIT cache exists in this codebase
(only XDG cache at `$HOME/.local/share/xenia-rs/cache/`).

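
The redirect-then-wipe behaviour described above can be modelled in a few lines. This is a hypothetical Python model, not the Rust code in `state.rs`: the function names `resolve_cache_root` and the Python `init_cache_root`, their signatures, and the non-wipe fallback (leave the existing directory untouched) are assumptions made for illustration.

```python
import os
import shutil
import tempfile
from pathlib import Path


def resolve_cache_root(env, pid, n, default_root):
    """Return (cache_root, wipe) for an environment mapping (assumed shape)."""
    if env.get("XENIA_CACHE_WIPE") == "1":
        # Redirect to a per-process directory under $TMPDIR and request a wipe.
        tmpdir = Path(env.get("TMPDIR", tempfile.gettempdir()))
        return tmpdir / f"xenia-rs-cache-{pid}-{n}", True
    # Assumed default: reuse the persistent cache root as-is.
    return Path(default_root), False


def init_cache_root(root, wipe):
    """Clear-then-recreate when wipe is set; otherwise only ensure it exists."""
    root = Path(root)
    if wipe and root.exists():
        shutil.rmtree(root)
    root.mkdir(parents=True, exist_ok=True)
    return root


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        env = {"XENIA_CACHE_WIPE": "1", "TMPDIR": scratch}
        root, wipe = resolve_cache_root(env, os.getpid(), 0, "unused")
        root.mkdir(parents=True)
        (root / "stale.bin").write_text("left over from an earlier run")
        init_cache_root(root, wipe)
        print(wipe, sorted(p.name for p in root.iterdir()))
```

The point of the model is that the wiped directory is never the XDG cache itself, so a cold run cannot observe files from earlier warm runs.
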
## Primary gate result: cache-probe return values

**PASS (9/9).** Every `NtQueryFullAttributesFile` call on a `cache:\*`
path in 2.J returns `0xc000000f` (`STATUS_NO_SUCH_FILE`). The first
divergence flagged by iterate 2.I (idx 102423,
`cache:\d4ea4615\e\46ee8ca`, ours `STATUS_SUCCESS` vs canary
`STATUS_NO_SUCH_FILE`) is now bit-aligned with canary's cold-cache
return.

Cache-probe paths and 2.J returns:

| tid_event_idx | path | 2.J status | canary baseline status |
|---|---|---|---|
| 102423 | `cache:\d4ea4615\e\46ee8ca` | `0xc000000f` | `0xc000000f` |
| 103840 | `cache:\69d8e45c\8\3421153` | `0xc000000f` | `0xc000000f` |
| 103996 | `cache:\69d8e45c\9\355f2f8` | `0xc000000f` | `0xc000000f` |
| 104453 | `cache:\69d8e45c\e\534ffea` | `0xc000000f` | `0xc000000f` |
| 105477 | `cache:\aab216c3\a\2c8c185` | `0xc000000f` | `0xc000000f` |
| 105792 | `cache:\69d8e45c\9\73a5c0a` | `0xc000000f` | `0xc000000f` |
| 106228 | `cache:\69d8e45c\9\39a9dcc` | `0xc000000f` | `0xc000000f` |
| (+others) | `cache:\aab216c3\5\ee70e0a` | `0xc000000f` | `0xc000000f` |

`cache:\` root open and `cache:\access`/`cache:\ignore`/`cache:\recent`
metadata probes also align with canary's cold-cache behavior.

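
The primary gate can be re-checked mechanically against the event log. A minimal sketch, assuming each JSONL record carries `kind`, `name`, `status` (hex string or integer) and `args_resolved.path`; the field names follow the ones mentioned in this report, but the exact schema is an assumption:

```python
import json

STATUS_NO_SUCH_FILE = 0xC000000F
CACHE_PREFIX = "cache:\\"


def cache_probe_mismatches(lines):
    """Return (path, status) for cache probes that did not report a miss."""
    mismatches = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        ev = json.loads(line)
        # Only the return side of the call carries the status (assumed).
        if ev.get("kind") != "kernel.return":
            continue
        if ev.get("name") != "NtQueryFullAttributesFile":
            continue
        path = ev.get("args_resolved", {}).get("path", "")
        if not path.startswith(CACHE_PREFIX):
            continue
        raw = ev["status"]
        status = int(raw, 16) if isinstance(raw, str) else raw
        if status != STATUS_NO_SUCH_FILE:
            mismatches.append((path, status))
    return mismatches


if __name__ == "__main__":
    # Synthetic records: one cold-cache miss, one warm-cache hit.
    sample = [
        json.dumps({"kind": "kernel.return", "name": "NtQueryFullAttributesFile",
                    "status": "0xc000000f",
                    "args_resolved": {"path": "cache:\\d4ea4615\\e\\46ee8ca"}}),
        json.dumps({"kind": "kernel.return", "name": "NtQueryFullAttributesFile",
                    "status": "0x00000000",
                    "args_resolved": {"path": "cache:\\69d8e45c\\8\\3421153"}}),
    ]
    for path, status in cache_probe_mismatches(sample):
        print(f"{path}: {status:#010x}")
```

An empty result on `ours-cold.jsonl` corresponds to the 9/9 PASS above; any entry is a probe that still sees a warm cache.
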
## Secondary cascade gate results

### (a) tid=1 last timestamp
- **2.H**: cycle=9,140,200 / host_ns=792,522,910 (NtWaitForSingleObjectEx return)
- **2.J**: cycle=9,169,116 / host_ns=749,717,731 (NtWaitForSingleObjectEx return)
- Delta: **+28,916 cycles** on tid=1 (continued progression). host_ns
  decrease is mechanical: 2.H spent ~43ms of host wallclock spinning at
  the wedge during the last few hundred matched events; 2.J consumed
  fewer host-side spin cycles because it actually consumed instruction
  budget on cache-rebuild work. Both runs hit the 50M-instr budget,
  not a wedge.

### (b) Wedge PC
Per the prompt, the 2.F+2.I wedge target was tid=1 PC `0x824ac578` (the
`bl 0x8284E02C` NtWaitForSingleObjectEx with timeout=-1 on thread
handle `0x1210`). 2.J's tail shows tid=1 executing many `NtWait...`
calls past that wedge that **return success** (`return_value=0`,
`status=0x00000000`), not timeout. The wait wrapper is no longer
parked. The 50M-instr run terminates with all 14 tids in returning
`NtWait...` calls, not in blocked waits. **WEDGE-MOVED** (or possibly
absent within this instruction budget; a longer run would be needed to
distinguish).

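
The "returning vs parked" distinction in (b) can be checked per thread from the tail of the log. A minimal sketch, assuming a wait is parked when a tid's last kernel event is a `kernel.call` to an `NtWait*` function with no later `kernel.return`; the `tid`, `kind` and `name` fields are assumed to exist in each record:

```python
import json


def parked_tids(lines):
    """Return tids whose final kernel event is an unreturned NtWait* call."""
    last_event = {}
    for line in lines:
        if not line.strip():
            continue
        ev = json.loads(line)
        if ev.get("kind") in ("kernel.call", "kernel.return"):
            # Later events overwrite earlier ones, leaving the tail per tid.
            last_event[ev["tid"]] = ev
    return sorted(
        tid
        for tid, ev in last_event.items()
        if ev["kind"] == "kernel.call" and ev.get("name", "").startswith("NtWait")
    )


if __name__ == "__main__":
    sample = [
        json.dumps({"tid": 1, "kind": "kernel.call", "name": "NtWaitForSingleObjectEx"}),
        json.dumps({"tid": 1, "kind": "kernel.return", "name": "NtWaitForSingleObjectEx"}),
        json.dumps({"tid": 2, "kind": "kernel.call", "name": "NtWaitForSingleObjectEx"}),
    ]
    print(parked_tids(sample))
```

An empty list at the end of a budget-limited run matches the observation that every tid was in a returning wait; it cannot by itself rule out a wedge beyond the budget.
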
### (c) `sub_824F8398` fires?
**0 fires.** Grep for `824f8398` across the full `ours-cold.jsonl`: zero
hits. The AUDIT-068 ctx-installer chain (`sub_824F8398 →
sub_824F7CD0 → sub_824F7800 → sub_824FD240+0x24`) is **still upstream
of the boot window** ours reaches in 50M instructions. Per canary
baseline this fires at host_ns≈9.4s; ours reaches host_ns≈759ms.

### (d) `sub_825070F0` fires?
**0 fires.** The post-VdSwap worker fan-out is still absent. Same
mechanism as (c): downstream of an install chain that ours doesn't
reach inside the budget.

### (e) Thread set / spawn count
**10 thread.create entries (unchanged from 2.H).** The new
entry_pc list is bit-identical to 2.H:
```
0x82181830, 0x8245a5d0, 0x82450a28, 0x82457ef0, 0x824cd458,
0x822f1ee0, 0x824d2878, 0x824d2940, 0x82178950, 0x821748f0
```
Canary tids 15/27/28 worker analogs still **absent**. ctx_ptr columns
bit-stable vs 2.H (vA0000000 bucket fix retained):
`0xbe8cbb3c`, `0xbd184a40`, `0xbc6c5640`. Per tripstone #28, comparison
is keyed on entry_pc, not integer tid.

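
Keying the thread-set comparison on entry_pc rather than tid amounts to a multiset difference. A minimal sketch; `entry_pc_diff` is a hypothetical helper, and the extra PC `0xDEADBEEF` in the usage is a placeholder, not a real canary value:

```python
from collections import Counter

# entry_pc list reported for 2.J (bit-identical to 2.H per this report).
RUN_2J = [
    0x82181830, 0x8245A5D0, 0x82450A28, 0x82457EF0, 0x824CD458,
    0x822F1EE0, 0x824D2878, 0x824D2940, 0x82178950, 0x821748F0,
]


def entry_pc_diff(ours, reference):
    """Return (missing_in_ours, extra_in_ours), ignoring spawn order and tid."""
    a, b = Counter(ours), Counter(reference)
    missing = sorted((b - a).elements())
    extra = sorted((a - b).elements())
    return missing, extra


if __name__ == "__main__":
    # Same set in a different spawn order compares equal.
    print(entry_pc_diff(RUN_2J, list(reversed(RUN_2J))))
    # A reference with one additional worker shows up as "missing in ours".
    missing, extra = entry_pc_diff(RUN_2J, RUN_2J + [0xDEADBEEF])
    print([hex(pc) for pc in missing], extra)
```

Using a `Counter` rather than a `set` keeps the comparison correct if the same entry point is spawned more than once.
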
### (f) Total event count
**118,149 → 121,569 (+3,420, +2.9%).** The increment is concentrated on
the cache-rebuild worker (tid=4: 160 → 2,075 events, +1,915 = ~56% of
the delta).

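
The per-thread concentration of the delta can be computed by counting events per tid in both logs. A minimal sketch, assuming every JSONL record has a `tid` field; comparing by tid is only valid here because both logs are ours-side runs:

```python
import json
from collections import Counter


def events_per_tid(lines):
    """Count events per tid in an iterable of JSONL lines."""
    counts = Counter()
    for line in lines:
        if line.strip():
            counts[json.loads(line)["tid"]] += 1
    return counts


def tid_deltas(before, after):
    """Return {tid: after - before} for every tid whose count changed."""
    tids = sorted(set(before) | set(after))
    # Counter returns 0 for tids that are absent from one of the runs.
    return {t: after[t] - before[t] for t in tids if after[t] != before[t]}


if __name__ == "__main__":
    run_h = [json.dumps({"tid": 1}), json.dumps({"tid": 4})]
    run_j = [json.dumps({"tid": 1}), json.dumps({"tid": 4}),
             json.dumps({"tid": 4}), json.dumps({"tid": 4})]
    print(tid_deltas(events_per_tid(run_h), events_per_tid(run_j)))
```

On the real logs the two inputs would be the 2.H and 2.J event files opened as line iterators.
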
### (g) Missing (op, lr) tuples (iterate-2D method)
**Not re-measured.** Phase-A `--phase-a-event-log` capture does not feed
the 2.D diff pipeline (which consumes `--lr-trace` of IAT thunks at
`0x8284DDDC/E49C/DF5C/E07C`). 2.H report noted the same restriction.
Expected unchanged at 28/28: the producer LRs that fire in canary
target downstream worker classes (`sub_825070F0` fan-out) that ours
still doesn't reach. Re-running 2.D requires a separate capture mode.

### (h) VdSwap count
**1 swap unchanged** (3 events = import.call + kernel.call + kernel.return
for the same single VdSwap call at cycle=5,577,303 / host_ns=489.2ms).
Per tripstone #39: gameplay-level progression (swaps > 1 or draws > 0)
NOT achieved. The 2.J run still wedges before the second swap.

### (i) Draw count
**0 draws.** No `*Draw*` kernel-call names emitted (consistent with
VdSwap=1: pre-gameplay).

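
Gates (h) and (i) reduce to counting one event kind per call, so that the import.call / kernel.call / kernel.return triple for a single VdSwap is not counted three times. A minimal sketch, assuming `kind` and `name` fields and treating any kernel-call name containing `Draw` as a draw:

```python
import json


def swap_and_draw_counts(lines):
    """Return (swaps, draws), counting each call once via its kernel.call event."""
    swaps = draws = 0
    for line in lines:
        if not line.strip():
            continue
        ev = json.loads(line)
        if ev.get("kind") != "kernel.call":
            continue
        name = ev.get("name", "")
        if name == "VdSwap":
            swaps += 1
        elif "Draw" in name:
            draws += 1
    return swaps, draws


if __name__ == "__main__":
    one_swap = [
        json.dumps({"kind": "import.call", "name": "VdSwap"}),
        json.dumps({"kind": "kernel.call", "name": "VdSwap"}),
        json.dumps({"kind": "kernel.return", "name": "VdSwap"}),
    ]
    swaps, draws = swap_and_draw_counts(one_swap)
    print(f"swaps={swaps} draws={draws}")
```

Per tripstone #39, gameplay-level progression would require this to report more than one swap or at least one draw.
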
## Cascade roll-up

| gate | description | 2.H | 2.J | result |
|------|-------------|-----|-----|--------|
| PRIMARY | cache-probe `0xc000000f` matches canary | FAIL (returns SUCCESS) | PASS (9/9) | **PASS** |
| (a) tid=1 last cycle | progression | 9,140,200 | 9,169,116 | +28,916 |
| (b) wedge PC `0x824ac578` parked | wait timeout=-1 | parked | NtWait returns 0 | **MOVED** |
| (c) `sub_824F8398` fires | install chain | 0 | 0 | UNCHANGED |
| (d) `sub_825070F0` fires | fan-out | 0 | 0 | UNCHANGED |
| (e) thread set size | spawns | 10 entries | 10 entries | UNCHANGED |
| (f) total event count | volume | 118,149 | 121,569 | +2.9% |
| (g) missing-tuple count | 2.D diff | 28 | n/a (different capture) | NOT-MEASURED |
| (h) VdSwap count | gameplay swaps | 1 | 1 | UNCHANGED |
| (i) draws | gameplay draws | 0 | 0 | UNCHANGED |

**Outcome class: WEDGE-MOVED.** Primary gate fully passes. tid=1 wedge
geometry moved (wait now returns success). Cache-rebuild worker tid=4
springs into life (~13× event growth). But the deeper install chain
(`sub_824F8398` / `sub_825070F0`) remains downstream of the 50M-instr
budget; gameplay-level progression (VdSwap > 1, draws > 0) NOT achieved.

## What changed and why

The 2.I diagnosis was correct in its mechanism but only partially
correct in its prediction:

- **Mechanism correct**: ours' cache contained 9 files from previous
  runs (276K total). `NtQueryFullAttributesFile` returned
  `STATUS_SUCCESS` for files that should be missing on a cold boot.
  Canary's capture protocol wipes both XDG and binary caches; ours'
  warm-cache state put the engine on a cache-HIT replay branch instead
  of cache-MISS reconstruction. tid=4 was hardly doing anything in 2.H
  because the cache already existed. In 2.J it actively rebuilds the
  cache (36 NtCreateFile, 24 NtOpenFile, 19 NtWriteFile to `*.tmp`
  files and bucket directories).

- **Prediction partial**: closing the cache-state divergence did unblock
  one wait wrapper (the previously-parked `0x824ac578` wait now returns
  success), but did NOT cascade through to the
  `sub_824F8398` install chain or `sub_825070F0` worker fan-out. The
  install epoch on canary fires at host_ns≈9.4s; ours' 50M-instr run
  ends at host_ns≈760ms. The wedge moved, but the canary
  trajectory is still ~12× further along in wallclock when its install
  chain fires.

## Tripstone audit

- **#28** (per-engine tid stability): All cross-engine comparisons are
  keyed on `entry_pc` and first-kernel-call signature, never on integer
  tid. The "tid=1 wedge" / "tid=4 cache rebuild" identities are
  ours-internal and stable across 2.H ↔ 2.J because both runs are
  ours-side (deterministic scheduler).
- **#39** (composite progression): The headline does NOT claim "gameplay
  progression": VdSwap count unchanged at 1, draws unchanged at 0. The
  PRIMARY-gate PASS is a **structural / state-parity** claim (cache
  state matches canary baseline). Secondary observation tid=1 wedge
  geometry MOVED is reported with both improving (cycle +28,916) and
  ambiguous (host_ns shifted backward due to less spin-wait) evidence.
- **#40** (single-keystone framing): The 2.I prompt framing
  "cache-wipe single test-harness parity fix may unblock the wedge"
  is **partially falsified**. Cache-state IS load-bearing (one wedge
  moved, +3,420 events, tid=4 came alive) but is NOT the keystone for
  the AUDIT-068 install chain (`sub_824F8398` still 0 fires). The
  iterate 2.E reading-error #40 class ("single-keystone framing
  falsified") REPEATS here. Recommend explicitly registering reading
  error #41: **state-parity gate PASS does not imply cascade; even
  bit-identical input state can land on different trajectories when
  ~12× wallclock separates the install epochs**.

## Confidence

- **HIGH** that primary gate genuinely passes (all 9 cache-probe paths
  bit-aligned with canary).
- **HIGH** that tid=4 cache-rebuild work is the bulk of the +3,420
  event delta (cache file I/O directly visible in `args_resolved.path`).
- **HIGH** that the wedge moved (NtWait at `0x824ac578` no longer
  parked).
- **HIGH** that `sub_824F8398` / `sub_825070F0` still 0 fires
  (checked via multiple grep paths).
- **MEDIUM** that the next blocker is "longer instruction budget +
  install chain investigation" vs "additional state-parity divergence
  upstream of install epoch". Both classes remain candidates.

## Next iterate recommendation

**Iterate 2.K should be one of:**

1. **Longer-budget replay (~0 LOC).** Re-run 2.J with `-n 500000000`
   (10× budget, ~60s wallclock estimate) to push past host_ns≈9.4s and
   see if the AUDIT-068 install chain fires naturally now that the
   cache-state divergence is closed. If `sub_824F8398` fires in the
   longer run, the cascade IS following just at slower wallclock. If it
   still doesn't, there's a second state-parity divergence to find.
   Caveat: if host_ns scales roughly linearly from ≈760ms at 50M
   instructions, 10× reaches only ≈7.6s, so a larger budget may be
   needed to actually cross 9.4s.

2. **Replay-then-replay determinism check (~0 LOC).** Run 2.J twice
   back-to-back with `XENIA_CACHE_WIPE=1` and verify the second run
   produces identical (or near-identical) event count + same tid=4
   work pattern. Cross-check that the persistent-cache path doesn't
   contaminate state between runs.

3. **2.I-style arg-diff at the NEW first-divergence (~50-100 LOC).**
   2.I's diff harness was keyed on (kind, name, ord) only and missed
   the return-value divergence. Now that those return values align,
   re-run the diff to find the NEXT cross-engine first-divergence in
   args_resolved or side_effects within the 0-1s window. Likely
   reveals what state-parity divergence (if any) blocks the install
   chain from firing earlier on ours.

Recommended priority: **(1) first** (zero LOC, ~5 min, decisive),
then **(3)** if (1) shows no install-chain fire.

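
Option 3 hinges on widening the diff key beyond (kind, name, ord). A minimal sketch of a first-divergence search with a configurable key; `first_divergence` is a hypothetical helper, and it assumes the two event streams are already aligned per thread (keyed on entry_pc per tripstone #28) and loaded as dicts:

```python
from itertools import zip_longest


def first_divergence(ours, canary, fields=("kind", "name", "ord", "status")):
    """Return (index, field, ours_value, canary_value) or None if aligned."""
    for idx, (a, b) in enumerate(zip_longest(ours, canary)):
        if a is None or b is None:
            # One stream ended before the other.
            return idx, "length", a, b
        for field in fields:
            if a.get(field) != b.get(field):
                return idx, field, a.get(field), b.get(field)
    return None


if __name__ == "__main__":
    ours = [
        {"kind": "kernel.return", "name": "NtQueryFullAttributesFile",
         "ord": 0, "status": "0x00000000"},
    ]
    canary = [
        {"kind": "kernel.return", "name": "NtQueryFullAttributesFile",
         "ord": 0, "status": "0xc000000f"},
    ]
    # Keyed on (kind, name, ord) only, the streams look identical.
    print(first_divergence(ours, canary, fields=("kind", "name", "ord")))
    # Adding status exposes the return-value divergence.
    print(first_divergence(ours, canary))
```

Extending `fields` to cover `args_resolved` or `side_effects` follows the same pattern, provided both engines serialize those values comparably.
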
## Artifacts

Under `xenia-rs/audit-runs/iterate-2J-cache-wipe-replay/`:

- `ours-cold.jsonl` (121,569 events, 50M-instr run, cache-wiped boot,
  ~28MB)
- `ours-cold.stdout.log` / `ours-cold.stderr.log` (empty: quiet mode)
- `writer-report.md` (this file)

Backup of pre-wipe XDG cache:
`/tmp/xenia-rs-cache-pre-2J-backup-<timestamp>` (276K, 9 files).