xenia-rs/audit-runs/cache-subsystem-plan/plan.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Plan — cache:\ subsystem fix for Phase C+11 main-chain advance

Context

Phase C+10 (2026-05-14) escalated the cache:\ divergence at Phase A idx=102404:

canary[6][102403] NtQueryFullAttributesFile path="cache:\d4ea4615\e\46ee8ca"
ours  [1][102403] NtQueryFullAttributesFile path="cache:\d4ea4615\e\46ee8ca"
canary[6][102404] return=0           (file resolved in persistent cache)
ours  [1][102404] return=0xC0000034  (file missing from per-process tmpdir)

Both engines query the same path byte-for-byte (C+10 emitter extension confirms). Canary's cache mount ~/.local/share/Xenia/cache/ is pre-populated with 23 files / 4.8 MB across 16 hash buckets, accumulated over prior boots. Ours's cache mount is a per-process tmpdir at /tmp/xenia-rs-cache-PID-N, wiped per AUDIT-038 lockstep discipline (or, since AUDIT-054, $HOME/.local/share/xenia-rs/cache when XENIA_CACHE_PERSIST=1).

The escalation criteria flagged "cache-population infrastructure" as out of scope for the C+10 session and deferred it to this planning session.

Headline finding

The cache divergence is not "missing files" — it is two specific engine bugs in ours that prevent Sylpheed from building its own cache correctly:

  1. NtSetInformationFile class 10 (XFileRenameInformation) is a no-op stub in ours. Canary implements it via file->Rename(target_path) (xboxkrnl_io_info.cc:226-243). Ours falls through to a catch-all arm that returns STATUS_SUCCESS without renaming (exports.rs:1820-1905: line 1820 lists class 10 in the min_length table, but the match info_class body at 1847-1905 has no arm for it, so the _ => (STATUS_SUCCESS, min_length) catch-all handles the call).

  2. cache:\access, cache:\ignore, cache:\recent are created as directories in ours when they should be files. After running ours with XENIA_CACHE_PERSIST=1, these top-level cache entries appear in the host filesystem as empty directories (4096 B each), whereas canary's cache has them as files (access = 240 B host file; recent = 160 B). The bug is in exports.rs::open_cache_file's is_dir_open discriminator (lines 1041-1051) misclassifying these create requests. Suspected cause: want_dir = (create_options & FILE_DIRECTORY_FILE) != 0 is true on Sylpheed's first NtCreateFile cache:\access call. Either Sylpheed actually sets bit 0x1 (which canary tolerates without creating a directory because its HostPathDevice respects the disposition differently), or ours's create_options arg-position read is wrong for the calls in question. Needs instrumentation to confirm.

Together these bugs produce the observed asymmetry:

  • Canary's cache (warm, populated from prior boots) holds 23 files in total: hierarchical leaf files (<H1>/<X>/<H2> form) plus the top-level access (240 B) and recent (160 B) manifests. It contains zero .tmp files.
  • Ours's persistent cache after one 50M boot has 7 flat .tmp journals at the cache root (<H1><H2>.tmp form, total 1.4 MB), 7 empty hash subdirectories, and access/ignore/recent as directories instead of files.
  • Persistence experiment confirms: even with XENIA_CACHE_PERSIST=1 and a warm boot (the .tmp files already present from a prior cold run), main matched-prefix is still 102404 (unchanged from C+10's default-tmpdir result). Persistence alone does not advance the matched-prefix because the hierarchical leaf file cache:\d4ea4615\e\46ee8ca never materializes — the .tmp rename to leaf path is silently dropped by ours's stubbed XFileRenameInformation.

These findings reframe AUDIT-038/052/053/054's debate. The cache-population problem is not "ours needs canary's cache content" or "ours needs Sylpheed's cache-build logic implemented from scratch" — it is "ours has bugs in two existing kernel exports that block Sylpheed's own cache-build logic from completing". Sylpheed's cache-build path already fires in ours (visible as .tmp writes, directory creates, NtSetInformationFile calls); it just cannot promote .tmp to leaf because of bug #1, and writes garbage state for the top-level manifests because of bug #2.

Investigation summary (verified facts)

Canary's cache (from disk enumeration of ~/.local/share/Xenia/cache/)

| top-level | type | size | notes |
|---|---|---|---|
| access | file | 240 B | 20 × 12-byte records: (hash1, hash2, refcount) manifest |
| recent | file | 160 B | 20 × 8-byte records: (hash1, hash2) recently-used list |
| d4ea4615/ | dir | | 1 leaf (e/46ee8ca, 400 B Shift-JIS Japanese localization text with [SYSTEM]/[LANGUAGE]/XC_LANGUAGE_* table) |
| 69d8e45c/ | dir | | 9 leaves across 7 sub-letters (40 B to 114 KB; IPFB-magic binary blobs) |
| 87719002/ | dir | | 7 leaves across 4 sub-letters (38 KB to 2.7 MB; largest blob is a 2.7 MB asset) |
| aab216c3/ | dir | | 3 leaves across 2 sub-letters (2 KB to 102 KB) |

Total: 23 files / 4.8 MB. Zero .tmp files.

Cache content is game-asset cache, not shader/PSO cache: localization text, font/asset binary blobs (IPFB magic suggests Japanese game-asset format), and the two manifest files (access enumerates known hashes; recent tracks recently used).
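The manifest sizes in the table are consistent with packed fixed-width records (240 = 20 × 12, 160 = 20 × 8). The following is a minimal parsing sketch for inspecting both files. It assumes big-endian u32 fields (the Xbox 360 is big-endian) and no header. Neither assumption has been confirmed against Sylpheed's writers, and the type and function names are hypothetical.

```rust
/// One cache:\access record; field order (hash1, hash2, refcount) as described above.
#[derive(Debug, PartialEq)]
struct AccessRecord {
    hash1: u32,
    hash2: u32,
    refcount: u32,
}

/// Big-endian u32 at `off`; callers pass in-bounds offsets.
fn be_u32(buf: &[u8], off: usize) -> u32 {
    u32::from_be_bytes([buf[off], buf[off + 1], buf[off + 2], buf[off + 3]])
}

/// Splits cache:\access into 12-byte records; None if the length is not a multiple of 12.
fn parse_access(buf: &[u8]) -> Option<Vec<AccessRecord>> {
    if buf.len() % 12 != 0 {
        return None;
    }
    let recs: Vec<AccessRecord> = buf
        .chunks_exact(12)
        .map(|r| AccessRecord { hash1: be_u32(r, 0), hash2: be_u32(r, 4), refcount: be_u32(r, 8) })
        .collect();
    Some(recs)
}

/// Splits cache:\recent into 8-byte (hash1, hash2) pairs; None if the length is not a multiple of 8.
fn parse_recent(buf: &[u8]) -> Option<Vec<(u32, u32)>> {
    if buf.len() % 8 != 0 {
        return None;
    }
    Some(buf.chunks_exact(8).map(|r| (be_u32(r, 0), be_u32(r, 4))).collect())
}
```

Decoding both engines' manifests this way could also help with open question 3, which asks whether the records carry host-specific state.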

Canary's cache code (from canary source read)

  • Mount registered in xenia-canary/src/xenia/app/xenia_main.cc:612-652: three HostPathDevice mounts (\\CACHE0, \\CACHE1, \\CACHE) with symbolic-link aliases cache0:, cache1:, cache: — registered in that order because VirtualFileSystem::ResolvePath does starts_with matching.
  • Cache root = storage_root / "cache". storage_root defaults to $XDG_DATA_HOME/Xenia or $HOME/.local/share/Xenia on POSIX (filesystem_posix.cc:76-97).
  • Cache is persistent: no wipe logic exists anywhere in canary source. Directories created on-demand by HostPathDevice::Initialize if missing (host_path_device.cc:31-48).
  • NtQueryFullAttributesFile (xboxkrnl_io.cc:474-513) returns X_STATUS_SUCCESS when file_system()->ResolvePath() returns an entry; X_STATUS_NO_SUCH_FILE otherwise. (Note: canary uses NO_SUCH_FILE = 0xC000000F; ours returns OBJECT_NAME_NOT_FOUND = 0xC0000034. Both are negative NTSTATUS values; both treated equivalently by Sylpheed.)
  • NtCreateFile (xboxkrnl_io.cc:39-111) routes through FileSystem::OpenFile → HostPathEntry::CreateEntryInternal, which calls std::filesystem::create_directories for the parent + OpenFile("wb") for the file (host_path_entry.cc:78-98).
  • All file IO is synchronous; canary's XFile::Write calls WriteSync unconditionally (xfile.cc:262-293).
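The registration-order constraint follows from starts_with matching. \\CACHE is a prefix of both \\CACHE0 and \\CACHE1, so if it were tried first it would capture their paths. The sketch below reproduces that failure mode. It assumes first-match resolution in registration order, which is what the ordering requirement implies. It also models the mount table as an ordered slice of (device, host root) pairs rather than canary's actual VirtualFileSystem types.

```rust
/// First-match resolution over an ordered mount list, returning (host root, remainder of path).
fn resolve<'a>(mounts: &[(&str, &'a str)], path: &'a str) -> Option<(&'a str, &'a str)> {
    for &(device, root) in mounts {
        // The first device whose name prefixes the path wins, so order matters.
        if let Some(rest) = path.strip_prefix(device) {
            return Some((root, rest));
        }
    }
    None
}
```

With \\CACHE listed first, \\CACHE0\x resolves to the \\CACHE root with remainder 0\x. Longest-prefix matching would remove the ordering dependency; canary relies on registration order instead.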

Ours's cache code (from current tree read)

  • KernelState::resolve_default_cache_root() at state.rs:1235-1273: defaults to per-process tmpdir + wipe; honors XENIA_CACHE_ROOT=<path> (no wipe) and XENIA_CACHE_PERSIST=1 ($XDG_DATA_HOME/xenia-rs/cache or $HOME/.local/share/xenia-rs/cache, no wipe). Called from KernelState::new_with_gpu at state.rs:418-425, before any guest code runs.
  • init_cache_root at state.rs:499-510: when wipe=true, calls remove_dir_all then create_dir_all; when wipe=false, only create_dir_all.
  • open_cache_file at exports.rs:1023-1196: AUDIT-054's FILE_DIRECTORY_FILE-bit handling lives here. is_dir_open logic (lines 1041-1051) decides file-vs-directory based on FILE_DIRECTORY_FILE bit (0x1) and host_path.is_dir(). Has a suspicious fallback host_path == state.cache_root.as_deref().unwrap_or(host_path) that is a tautology when cache_root is None.
  • nt_set_information_file at exports.rs:1809-1909: validates min_length for class 10 (correctly 16 bytes) but has no match-arm for class 10; falls through to _ => (STATUS_SUCCESS, min_length) catch-all at line 1904. This is the rename bug.
  • C+10 emitter extension at call_export state.rs:657-687: wired for NtQueryFullAttributesFile, NtOpenSymbolicLinkObject, NtCreateFile, NtOpenFile. Not wired for NtSetInformationFile (the rename target path is in the info buffer, not in OBJECT_ATTRIBUTES, so this is the right design — but it means the rename target won't show up in args_resolved.path; a separate emitter hook would be needed if we want diff visibility on rename targets).
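The tautology is easy to reproduce in isolation. With cache_root = None, unwrap_or(host_path) yields host_path itself, so the comparison is always true. The sketch below does two things:

- It contrasts that pattern with a non-tautological comparison. This is one possible intended meaning, and it is an assumption.
- It mirrors the described init_cache_root semantics, treating a missing root as harmless when wiping.

Function names other than init_cache_root are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// The flagged pattern: when `cache_root` is None this compares `host_path` with itself.
fn is_cache_root_suspicious(host_path: &Path, cache_root: Option<&Path>) -> bool {
    host_path == cache_root.unwrap_or(host_path)
}

/// Non-tautological form (assumed intent): only a configured root can match.
fn is_cache_root(host_path: &Path, cache_root: Option<&Path>) -> bool {
    cache_root == Some(host_path)
}

/// wipe=true: remove then recreate; wipe=false: create only. A missing root is not an error.
fn init_cache_root(root: &Path, wipe: bool) -> io::Result<()> {
    if wipe {
        match fs::remove_dir_all(root) {
            Ok(()) => {}
            Err(e) if e.kind() == io::ErrorKind::NotFound => {}
            Err(e) => return Err(e),
        }
    }
    fs::create_dir_all(root)
}
```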

Sylpheed's cache-build flow (from disassembly + event logs)

  • Dispatcher sub_82452DC0 at PC 0x82452DEC tries primary data first (sub_82452068, sub_82452200). If primary returns 0 (not found), falls back to cache via sub_8245B000 at PC 0x82452E1C. (The "cache is fallback" framing reverses the AUDIT-052 framing slightly — cache is the fallback, not the primary path.)
  • Cache gate sub_8245B000 validates the hash-key struct, then calls sub_8245AD00 which formats the path via sub_82459130 (using sprintf to render cache:\<HASH1>\<X>\<HASH2>) and queries via sub_82612A78 (NtQueryFullAttributesFile wrapper). On miss (r3 == -1), branches to failure path PC 0x8245ADFC; on hit, enters critical section, calls sub_8245B1F8 (cache file processor), and returns 1.
  • Cache-write path is NOT in sub_82452DC0. The agent that disassembled the dispatcher did not find any NtCreateFile calls in the cache-miss branch. So the cache-build is in a different code path — likely fired by sub_82452068/sub_82452200 (the "primary data" handlers) which, on first-time access, both compute the data AND write it to cache. The Sylpheed binary references the strings cache:\access (0x820B5794), cache:\recent (0x820B5774), %s%08x%08x.tmp (0x820B57AC), cache:\ignore (0x820B5784), cache:\*.tmp (0x820B5764), and cache:\ (0x820B57A4) — confirming the game DOES manage these files itself.
  • Event-log evidence confirms cache-build fires in ours: ours.jsonl tid=4 events at idx 28-484 show the full sequence: NtCreateFile cache:\access → NtCreateFile cache:\ignore → NtCreateFile cache:\recent → NtCreateFile cache:\d4ea4615e46ee8ca.tmp → NtCreateFile cache:\d4ea4615 (dir, AUDIT-054 path) → NtCreateFile cache:\d4ea4615\e (subdir) → NtOpenFile cache:\d4ea4615e46ee8ca.tmp → ... → 111 total NtSetInformationFile calls. Canary's same trace has 0 NtSetInformationFile events in the 50M window because canary's cache is warm and doesn't fire the build path.
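The observed names suggest a mechanical mapping between a (hash1, hash2) key, its flat journal, and its leaf. For example, cache:\d4ea4615e46ee8ca.tmp corresponds to cache:\d4ea4615\e\46ee8ca, with the first hex digit of hash2 becoming the <X> sub-directory. The journal format follows the %s%08x%08x.tmp string, assuming %s is the cache:\ prefix. The leaf format is inferred from the observed paths, not read from sub_82459130's actual format string.

```rust
/// Flat journal name per the %s%08x%08x.tmp string (prefix assumed to be "cache:\").
fn tmp_path(hash1: u32, hash2: u32) -> String {
    format!("cache:\\{:08x}{:08x}.tmp", hash1, hash2)
}

/// Leaf name inferred from observed paths: cache:\<H1>\<first hex digit of H2>\<rest of H2>.
fn leaf_path(hash1: u32, hash2: u32) -> String {
    let h2 = format!("{:08x}", hash2);
    let (x, rest) = h2.split_at(1);
    format!("cache:\\{:08x}\\{}\\{}", hash1, x, rest)
}
```

If the mapping holds, the rename target for each .tmp journal is predictable. That would give the Stage 1 unit test a realistic fixture.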

Persistence experiment (cold + warm boot, 50M each)

  • Boot 1 (cold, XENIA_CACHE_PERSIST=1): digest instructions=50000003, imports=40485, swaps=1, draws=0. Differs from C+10 default-tmpdir baseline (50000002, 40465) by +1 instruction / +20 imports — the persistence path takes slightly more guest cycles. Resulting on-disk cache: 7 .tmp flat journals (1.4 MB total), 7 empty hash subdirectories, 3 empty directories named access/ignore/recent.
  • Boot 2 (warm): digest unchanged from boot 1 (instructions=50000003, imports=40485). No cxx_throw regression at 50M (AUDIT-053's regression was at 500M+; not reproduced in this window). .tmp files grew (e.g. d4ea4615e46ee8ca.tmp: 2400 B → 2800 B; aab216c3a2c8c185.tmp: 614 KB → 717 KB) — confirming AUDIT-053's "journal appends per boot" finding.
  • Boot 2 diff vs C+10 canary baseline: canary_tid=6 → ours_tid=1 matched=102404 (unchanged); divergence at the same NtQueryFullAttributesFile return-value (canary=0 SUCCESS, ours=0xC0000034 NOT_FOUND). Persistence alone does not advance matched-prefix.

This experiment validates: enabling persistence is necessary but not sufficient. The .tmp files are produced but the rename-to-leaf step is broken, so the next boot's NtQuery for the leaf still returns NOT_FOUND.

Approaches considered

I considered five approaches, scored on lockstep digest impact, AUDIT-038 oracle-state risk, LOC, first-boot vs subsequent-boot behavior, and risk of regressing matched-prefix.

(a) Flip default to XENIA_CACHE_PERSIST=1 only

  • What: Change resolve_default_cache_root so persistence is on by default.
  • Won't work alone: experiment proves matched-prefix stays at 102404 because the .tmp-to-leaf promotion is broken (bug #1). Necessary but not sufficient.

(b) Implement Sylpheed's cache-generation logic in the engine

  • What: Write engine-side code that mirrors what Sylpheed's primary-data path does (build cache from XGD assets).
  • Don't need it: Sylpheed's binary already does this — the cache-build path fires in ours; it just doesn't finish because of bug #1 (rename). Reverse-engineering Sylpheed's asset extractor would be hundreds of LOC and is not necessary. The game does the work; ours just needs to honor the rename so the leaf file appears.

(c) Seed-from-canary at startup

  • What: Copy canary's ~/.local/share/Xenia/cache/* to ours's cache root at boot.
  • Disqualified per user direction: AUDIT-038 oracle-state violation. The user's task explicitly says "Disqualify this option unless there's a strong-enough caveat". No such caveat applies here, because fixing the two engine bugs lets Sylpheed build its own cache. Keep this option only as a last-resort fallback.

(d) Synthesize on-demand

  • What: Intercept NtQueryFullAttributesFile for cache:\ paths and lie SUCCESS even when the file is missing.
  • Doesn't work: canary follows the query with NtCreateFile at idx 102481 (78 events later) to actually open and read the file. A SUCCESS lie without backing bytes only postpones the divergence by 78 events.

(e) Fix the two engine bugs and flip the persistence default (recommended)

  • What:
    1. Implement NtSetInformationFile class 10 (XFileRenameInformation) properly — mirror canary's file->Rename(target_path) for cache:-backed handles.
    2. Fix open_cache_file's file-vs-directory misclassification for top-level cache files (access, ignore, recent).
    3. Flip default to persistent cache so the cache survives across boots and the build path can complete over N iterations. Add XENIA_CACHE_WIPE=1 as an opt-out.
    4. Extend Phase A emitter to capture NtSetInformationFile class-10 rename target paths (~60 LOC across both engines) so future rename divergences are diff-visible.
  • Why it's right:
    • No oracle state — ours builds its own cache from the same primary game data.
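    • Cache convergence should be deterministic because cache content is derived from XEX assets, not engine-specific behavior. After N boots ours's cache should converge to canary's layout. Byte-identity still depends on whether the manifests embed host-specific state (open question 3).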
    • Two engine bugs are documented + reproducible; both have direct canary mirrors to copy semantics from.
    • AUDIT-053 warm-start cxx_throw regression was at 500M and is NOT reproduced at 50M; the Phase A diff harness window is 50M, so the regression is not blocking for the diff-harness use-case. (Document the regression as a separate known-issue for 500M+ runs.)
  • LOC estimate: ~150-200 across 4-5 files. Breakdown below.
  • Lockstep digest impact: NEW baseline. Both engines should be re-baselined together with XENIA_CACHE_PERSIST=1 enabled and a deterministic cache-warmup procedure.
  • Risk of matched-prefix regression (reading-error #23): LOW. The fix only adds behavior on previously-no-op kernel paths; it doesn't change existing successful paths. Determinism gate validates.

Recommendation

Implement the two engine-side bug fixes and flip the persistence default. Let Sylpheed build its own cache over N boots. No oracle state, no .tmp-to-leaf magic, no cache seeding.

Implementation stages

Each stage is independently landable and verifiable.

Stage 1 — Implement NtSetInformationFile class 10 (XFileRenameInformation) + extend emitter to surface rename target

  • Files:
    • Ours: exports.rs (~40 LOC body); path.rs (~10 LOC info-buffer parser); state.rs call_export dispatch (~15 LOC); event_log.rs (re-use emit_kernel_call_with_path — 0 LOC).
    • Canary: xboxkrnl_io_info.cc is already correct (no change needed for body); event_log.cc's EmitImportAndCallWithCtx dispatch (~30 LOC) — extend to dispatch on name == "NtSetInformationFile" and read the rename target ANSI_STRING from the info buffer when info_class==10.
    • Total: ~95 LOC additive across both engines.
  • Scope (body fix, ours only):
    • Add a case 10 arm in nt_set_information_file's match (around line 1847).
    • Parse the X_FILE_RENAME_INFORMATION struct at info_ptr: skip replace_if_exists/root_directory (per canary, ignored on Xbox); read the trailing ANSI_STRING name.
    • Translate the new name via the same cache:\-aware path resolver used by open_cache_file.
    • If the source handle has host_path = Some(_), call std::fs::rename(src, dst) and update the handle's stored path + host_path + size fields.
    • If the source handle is VFS-backed (not cache:), return STATUS_INVALID_PARAMETER or STATUS_NOT_IMPLEMENTED; Sylpheed only renames cache: files.
    • Create parent directories for dst as needed (create_dir_all(dst.parent())).
    • Honor the source handle's open-mode (close + re-open if necessary for write-renames).
  • Scope (emitter extension, both engines):
    • Add a new helper info_buffer_rename_target_raw(mem, info_ptr, info_length) in path.rs (ours) and an equivalent ReadFileRenameInformationTarget in canary's event_log.cc. Both return the raw trimmed target path without normalization, mirroring the C+10 design for object_attributes_raw_name.
    • In call_export's dispatch (state.rs:657-687 ours; phase_a_bridge::EmitImportAndCallWithCtx in canary), add: when name == "NtSetInformationFile" and gpr[7] == 10 (info_class) and gpr[6] >= 16 (info_length), resolve target via the helper and call emit_kernel_call_with_path. Otherwise legacy form.
    • No schema version bump — args_resolved.path is already declared free-form.
  • Validation:
    • New unit test in exports.rs: create cache:\foo.tmp, write some bytes, call NtSetInformationFile class 10 with target cache:\bar, verify host filesystem has <root>/bar with the correct bytes and no <root>/foo.tmp.
    • Determinism gate (3× --stable-digest 50M): with cvar OFF (no Phase A emitter), digest unchanged from baseline b8fa0e0460359a4f660adb7605e053de. With cvar ON, Phase A emitter det-fields stable across 2 runs but differ from C+10's 7489e90e… (because rename-target paths are now in det signature).
    • Re-run persistence experiment: after Stage 1, ours's cache after 50M boot should produce hierarchical leaf files (<H1>/<X>/<H2>) instead of flat .tmp files.
    • Phase A diff: re-run tools/diff-events/diff_events.py with new ours run vs new canary run; expected matched-prefix advance.
  • Rollback criterion: if cvar-OFF determinism digest changes from baseline, or if any of the 165 existing unit tests fail, revert.
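Both halves of Stage 1 can be sketched together. This is a sketch under several assumptions:

- The parser assumes a 16-byte big-endian X_FILE_RENAME_INFORMATION layout: two skipped u32 fields, then an ANSI_STRING of u16 length, u16 maximum length, and u32 buffer pointer. This matches the 16-byte min_length but should be checked against canary's struct definition.
- Guest memory is modeled as a flat byte slice.
- CacheHandle is a hypothetical stand-in for ours's handle type.
- The guest-to-host translation through the cache:\ resolver is omitted.
- Dropping trailing NULs is an assumption about what "raw trimmed" means.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_INVALID_PARAMETER: u32 = 0xC000_000D;

/// Big-endian reads from a flat guest-memory model (guest address == slice index).
fn rd_u16(mem: &[u8], addr: usize) -> Option<u16> {
    let b = mem.get(addr..addr + 2)?;
    Some(u16::from_be_bytes([b[0], b[1]]))
}

fn rd_u32(mem: &[u8], addr: usize) -> Option<u32> {
    let b = mem.get(addr..addr + 4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical helper: raw rename target from X_FILE_RENAME_INFORMATION at `info_ptr`.
/// replace_existing (+0) and root_directory (+4) are skipped; ANSI_STRING starts at +8.
fn info_buffer_rename_target_raw(mem: &[u8], info_ptr: u32, info_length: u32) -> Option<String> {
    if info_length < 16 {
        return None;
    }
    let base = info_ptr as usize;
    let len = rd_u16(mem, base + 8)? as usize;
    let buf = rd_u32(mem, base + 12)? as usize;
    let bytes = mem.get(buf..buf + len)?;
    // No path normalization; only trailing NULs are trimmed.
    let end = bytes.iter().rposition(|&c| c != 0).map_or(0, |i| i + 1);
    Some(String::from_utf8_lossy(&bytes[..end]).into_owned())
}

/// Emitter gate from the dispatch rule: r7 = info_class, r6 = info_length.
fn wants_rename_path(name: &str, gpr: &[u64; 32]) -> bool {
    name == "NtSetInformationFile" && gpr[7] == 10 && gpr[6] >= 16
}

/// Hypothetical stand-in for a cache:-backed file handle.
struct CacheHandle {
    path: String,               // guest path, e.g. cache:\d4ea4615e46ee8ca.tmp
    host_path: Option<PathBuf>, // None for VFS-backed (non-cache:) handles
    size: u64,
}

/// Class-10 body: create destination parents, rename on the host, refresh handle fields.
fn rename_cache_handle(h: &mut CacheHandle, dst_guest: &str, dst_host: &Path) -> io::Result<u32> {
    let Some(src) = h.host_path.clone() else {
        return Ok(STATUS_INVALID_PARAMETER); // only cache: handles are renamed
    };
    if let Some(parent) = dst_host.parent() {
        fs::create_dir_all(parent)?; // e.g. <root>/d4ea4615/e
    }
    fs::rename(&src, dst_host)?;
    h.size = fs::metadata(dst_host)?.len();
    h.host_path = Some(dst_host.to_path_buf());
    h.path = dst_guest.to_string();
    Ok(STATUS_SUCCESS)
}
```

Mapping io::Error values to NTSTATUS codes is left to the caller. On POSIX hosts, rename(2) does not require the source to be closed. The close-and-reopen step may therefore matter mainly for other host platforms.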

Stage 2 — Fix top-level cache file misclassification

  • Files: exports.rs open_cache_file (~10-20 LOC at lines 1041-1051).
  • Scope:
    • Instrument first: add a one-shot tracing log at top of open_cache_file printing path, create_options, create_disposition, want_dir, host_path.is_dir(), and the final is_dir_open value. Run ours with persistence + check the log for the cache:\access call.
    • Two likely fixes depending on what instrumentation shows:
      • Option 2a (canary parity): if Sylpheed passes FILE_DIRECTORY_FILE (bit 0x1) for these files, canary may tolerate it because its disposition or non-directory bit takes precedence. In that case, (create_options & FILE_DIRECTORY_FILE) != 0 would only be authoritative when FILE_NON_DIRECTORY_FILE (bit 0x40) is not also set. Cross-check this hypothesis against canary's NtCreateFile_entry.
      • Option 2b (arg-reading fix): if ours is reading create_options from the wrong slot (similar to AUDIT-053's r7→r8 mistake), correct it.
    • Add explicit unit test: NtCreateFile cache:\access with the bit-pattern Sylpheed uses must result in a host file, not a directory.
  • Validation:
    • After Stage 2, persistent run of ours should produce <root>/access, <root>/ignore, <root>/recent as files (matching canary), not directories.
    • Phase A diff: should not regress matched-prefix.
  • Rollback criterion: same as Stage 1.
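The sketch below covers the Stage 2 instrumentation and Option 2a's decision rule. It uses the standard NT values FILE_DIRECTORY_FILE = 0x1 and FILE_NON_DIRECTORY_FILE = 0x40. Whether canary actually applies this precedence is the open cross-check above. The function names are hypothetical, and the real open_cache_file takes more inputs.

```rust
use std::sync::Once;

const FILE_DIRECTORY_FILE: u32 = 0x0000_0001;
const FILE_NON_DIRECTORY_FILE: u32 = 0x0000_0040;

static FIRST_OPEN: Once = Once::new();

/// One-shot trace of the decision inputs; only the first call prints.
fn trace_first_open(path: &str, create_options: u32, create_disposition: u32, host_is_dir: bool, is_dir_open: bool) {
    FIRST_OPEN.call_once(|| {
        let want_dir = (create_options & FILE_DIRECTORY_FILE) != 0;
        eprintln!(
            "open_cache_file path={path} create_options={create_options:#x} \
             disposition={create_disposition} want_dir={want_dir} \
             host_is_dir={host_is_dir} is_dir_open={is_dir_open}"
        );
    });
}

/// Option 2a sketch: FILE_DIRECTORY_FILE only counts when FILE_NON_DIRECTORY_FILE is clear.
fn is_dir_open(create_options: u32, host_is_dir: bool) -> bool {
    let want_dir = (create_options & FILE_DIRECTORY_FILE) != 0
        && (create_options & FILE_NON_DIRECTORY_FILE) == 0;
    want_dir || host_is_dir
}
```

Under this rule, an existing host directory still opens as a directory. A persistent cache written before the fix therefore presumably needs its access, ignore and recent directories removed before the Stage 2 validation run.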

Stage 3 — Flip default to persistent cache + re-baseline

  • Files: state.rs resolve_default_cache_root (~10 LOC); related unit test cache_root_cleared_on_init may need updating.
  • Scope:
    • Change default: (default_persistent_path(), false) instead of (tmpdir_path(), true). Persistent cache becomes the new default for both cargo run and CI Phase A runs.
    • Add XENIA_CACHE_WIPE=1 opt-out (re-enables AUDIT-038 tmpdir-wipe behavior). Document in state.rs:1235's docstring as "preserved for emergency lockstep-state-reset scenarios; not recommended for diff-harness runs because the C+10 path emitter now makes cache divergences diff-visible regardless".
    • Confirm both XENIA_CACHE_ROOT=<path> and XENIA_CACHE_PERSIST=1 retain their prior semantics (the latter becomes a no-op when default is already persistent, but keep it for backwards compat).
    • Re-baseline both engines' Phase A digests under the new default. Run a "cache warmup" of e.g. 5 sequential 50M boots so the cache stabilizes, then capture the new C+11 baseline.
    • Update existing test cache_root_cleared_on_init to use XENIA_CACHE_WIPE=1 explicitly (its determinism-gate purpose is preserved).
  • Validation:
    • Determinism: 3× 50M runs with default settings must produce the same --stable-digest (post-warmup).
    • Phase A: re-run diff. Expected behavior: matched-prefix advances dramatically past 102404 (canary's cache:\d4ea4615\e\46ee8ca query returns SUCCESS in both engines on a warm cache; the next ~16 cache-hash queries also resolve; matched-prefix advances by hundreds-to-thousands of events until a non-cache divergence appears).
    • Phase B image_loaded_sha256: unchanged (ea8d160e…) — cache state doesn't affect image hash.
    • Unit tests: all 165 pass.
  • Rollback criterion: if the new baseline is non-deterministic (3 runs produce different digests) or if matched-prefix REGRESSES below 102404, revert and investigate.
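The Stage 3 resolution order can be sketched as a pure function over an environment lookup. This keeps it unit-testable without mutating the process environment. Three details are assumptions, not decisions from this plan:

- the relative priority of XENIA_CACHE_ROOT and XENIA_CACHE_WIPE;
- the tmpdir naming (pid only, without the -N suffix);
- the "." fallback when neither XDG_DATA_HOME nor HOME is set.

```rust
use std::path::PathBuf;

/// Proposed Stage 3 order; `var` stands in for std::env::var. Returns (cache_root, wipe).
fn resolve_default_cache_root(var: impl Fn(&str) -> Option<String>, pid: u32) -> (PathBuf, bool) {
    // Explicit root: used as-is, never wiped.
    if let Some(root) = var("XENIA_CACHE_ROOT") {
        return (PathBuf::from(root), false);
    }
    // Opt-out restoring the AUDIT-038 per-process tmpdir + wipe.
    if var("XENIA_CACHE_WIPE").as_deref() == Some("1") {
        return (std::env::temp_dir().join(format!("xenia-rs-cache-{pid}")), true);
    }
    // New default (XENIA_CACHE_PERSIST=1 becomes a no-op): persistent per-user cache.
    // An empty XDG_DATA_HOME is treated as unset, per the XDG Base Directory spec.
    let data_home = var("XDG_DATA_HOME")
        .filter(|s| !s.is_empty())
        .map(PathBuf::from)
        .or_else(|| var("HOME").map(|h| PathBuf::from(h).join(".local/share")))
        .unwrap_or_else(|| PathBuf::from("."));
    (data_home.join("xenia-rs/cache"), false)
}
```

In the real function, the lookup would be |k| std::env::var(k).ok() and the pid would be std::process::id().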

Stage 4 (optional, deferred) — Re-test AUDIT-053 warm-start regression at 500M

  • Scope: Run ours XENIA_CACHE_PERSIST=1 for 500M instructions across 5 successive boots; check for cxx_throw events from version-header mismatch (the AUDIT-053 / AUDIT-054 regression). If reproduced, investigate .tmp journal truncation logic. If not reproduced (AUDIT-054's FILE_DIRECTORY_FILE fix + Stage 1's rename fix together resolve it), update memory entries accordingly.
  • Validation: 5×500M sequential boots with no cxx_throw regression; cache content stabilizes (no unbounded .tmp growth).
  • Why deferred: Stage 1-3 unblock the 50M Phase A diff window which is the immediate goal. 500M warm-start is a separate property to validate but not on the critical path for Phase C+11.

Out of scope / deferred

  • STFS / SVOD content packages — separate VFS subsystem; not touched.
  • XAM content packages (DLC, themes, gamerpics) — handled by separate content_root, not by cache:.
  • Save games — separate content: mount, not by cache:.
  • GPU shader cache — handled by cache_root cvar for graphics_system_ in canary; ours does not yet implement this (and Sylpheed at 50M doesn't fire the shader-cache path). Deferred.
  • Sylpheed binary writers for access/recent manifests — investigation found string refs but did not locate the writers in 50M event window. Bug fixes in this plan should be sufficient because the writers will fire eventually when ours's cache hierarchy supports them.
  • cache0: and cache1: aliases — canary mounts three; ours currently funnels all three to one cache root via resolve_cache_path prefix-strip (state.rs:534-543). If Sylpheed uses cache0/cache1 distinctly, a follow-up may need to separate them. Not yet known whether Sylpheed does.
  • Phase A emitter for NtSetInformationFile rename target path: included in Stage 1 but optional and not blocking for the body fix. Schema v1 already supports args_resolved.path; the emitter only needs to dispatch on info_class==10 and read the X_FILE_RENAME_INFORMATION name.

Validation strategy ("done enough" for iteration to resume)

The cache subsystem is "done enough" when:

  1. Phase A diff matched-prefix advances past 102,404 by at least several hundred events on the main chain (canary tid=6 ↔ ours tid=1). Cascading cache-hash resolutions are expected to advance the matched-prefix by roughly hundreds to thousands of events each, with the next non-cache divergence expected somewhere past idx ~110K.
  2. All 6 sister chains hold or advance (no regression on tid=4↔11, tid=7↔2, tid=12↔7, tid=14↔9, tid=15↔10).
  3. 165 existing unit tests pass; ~3 new tests land for cache rename + cache top-level files.
  4. Phase A determinism digest reproducible: 3× --stable-digest runs at 50M produce identical digest. New C+11 baseline captured.
  5. Phase B image_loaded_sha256 unchanged: ea8d160e… still matches.
  6. Both engines build clean (cargo build --release for ours, xenia-canary MSVC Debug for canary).
  7. On-disk cache content (post Stage 3) approximately matches canary's: same 16 top-level hash buckets, same hierarchical leaf structure, same access/recent manifests as files (byte-identical content not required because game-data-derived).

If matched-prefix advances past 102,404 but stops at a NEW cache-related divergence (e.g. a 17th hash bucket that wasn't in the original 16), this counts as in-scope continuation. If matched-prefix stops at a non-cache divergence (a different kernel export, a thread-scheduling difference), the cache subsystem is complete and the next session inherits the new divergence.

Critical files to read before implementation

Reading-error class

No new class. Existing classes re-affirmed:

  • Class #28 (oracle source supersedes spec): verified canary's NtSetInformationFile implementation by reading xboxkrnl_io_info.cc:226-243; not assumed.
  • Class #15 / ζ (VFS layout aliasing per AUDIT-053): the AUDIT-054 fix was correct but didn't catch this sibling bug (rename) or the top-level-file-as-directory bug. Both are now identified.

A possible future class would be "stub-by-min-length-validation": ours's nt_set_information_file validated min_length for class 10 in its lookup table but had no actual implementation, so calls returned STATUS_SUCCESS without performing the operation. This is a candidate for reading-error class #29 ("validation table claims support that the body doesn't deliver"); defer the formal naming until a second instance is found.
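One way to catch this class mechanically is to derive the set of validated-but-unimplemented classes and assert in a unit test that it is empty. The sketch uses an illustrative two-entry table rather than ours's actual table: class 10 (rename, 16 bytes) and class 14 (position, 8 bytes). For a class with no body, it returns STATUS_NOT_IMPLEMENTED instead of STATUS_SUCCESS. Because that changes what the guest sees, logging the gap may be the safer first step in the lockstep build.

```rust
const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_NOT_IMPLEMENTED: u32 = 0xC000_0002;
const STATUS_INVALID_INFO_CLASS: u32 = 0xC000_0003;
const STATUS_INFO_LENGTH_MISMATCH: u32 = 0xC000_0004;

/// Validation table: (info_class, min_length). Illustrative subset only.
const MIN_LENGTHS: &[(u32, u32)] = &[(10, 16), (14, 8)];

/// Classes with a real handler body; class 10 is deliberately missing to model the bug.
fn has_body(info_class: u32) -> bool {
    matches!(info_class, 14)
}

/// Classes that pass validation but have no body: the pattern described above.
fn unimplemented_classes() -> Vec<u32> {
    MIN_LENGTHS.iter().map(|&(c, _)| c).filter(|&c| !has_body(c)).collect()
}

/// Strict dispatch: a validated class without a body fails loudly instead of reporting success.
fn set_information(info_class: u32, info_length: u32) -> u32 {
    let Some(&(_, min)) = MIN_LENGTHS.iter().find(|&&(c, _)| c == info_class) else {
        return STATUS_INVALID_INFO_CLASS;
    };
    if info_length < min {
        return STATUS_INFO_LENGTH_MISMATCH;
    }
    if !has_body(info_class) {
        return STATUS_NOT_IMPLEMENTED; // the stub returned STATUS_SUCCESS here
    }
    STATUS_SUCCESS // class-specific work elided
}
```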

Open questions (for next implementation session, NOT this plan)

  1. Does Sylpheed actually call NtSetInformationFile class 10, or does it use NtDeleteFile + NtCreateFile to "rename"? Stage 1 instrumentation should confirm class 10 is hit; if not, the bug is elsewhere. (Strong indirect evidence says class 10: canary properly implements it, Sylpheed binary references rename-style cache:\ patterns, ours has 111 NtSetInformationFile calls per boot but 0 in canary.)
  2. Does Sylpheed write cache:\access and cache:\recent from the same 50M window, or does that fire later (e.g. after cache-build cycle completes)? If later, those files only appear after Stage 3's multi-boot warmup.
  3. Are cache:\access and cache:\recent size-deterministic byte-for-byte across engines, or do they include host-allocator addresses / timestamps / RNG state? If non-deterministic, matching ours's cache to canary's content would require canonicalization in the diff tool (similar to AUDIT-043's ALLOCATOR_RETURN_FNS).
  4. Should Stage 3 introduce a "cache warmup harness" (run N boots automatically) or leave warmup to the developer? Probably the latter — keep tests simple, document the procedure.

Deliverables expected after this plan is approved

  • xenia-rs/audit-runs/cache-subsystem-plan/plan.md — this plan (copied from /home/fabi/.claude/plans/you-are-starting-a-inherited-pizza.md)
  • xenia-rs/audit-runs/cache-subsystem-plan/investigation.md — investigation notes captured here (canary cache enumeration, Sylpheed disassembly summary, persistence experiment result)
  • xenia-rs/audit-runs/cache-subsystem-plan/canary-cache-listing.csv — already collected (23 files / 4.8 MB enumerated)
  • xenia-rs/audit-runs/cache-subsystem-plan/persistent-experiment.md — already collected (cold-vs-warm 50M digest table, .tmp growth observation, matched-prefix unchanged result)
  • xenia-rs/audit-runs/cache-subsystem-plan/persist-warm-events.jsonl — already collected (121,450 events from XENIA_CACHE_PERSIST=1 warm boot)
  • Memory entry: project_cache_subsystem_plan_2026_05_14.md — summary + recommendation + sized roadmap
  • MEMORY.md index update — one line