Source changes (dormant parity infra, retained from iterate 2.AI/2.AO): - xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring - xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts — see memory + HANDOFF for the running findings. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Plan — cache:\ subsystem fix for Phase C+11 main-chain advance
Context
Phase C+10 (2026-05-14) escalated the cache:\ divergence at Phase A idx=102404:
canary[6][102403] NtQueryFullAttributesFile path="cache:\d4ea4615\e\46ee8ca"
ours [1][102403] NtQueryFullAttributesFile path="cache:\d4ea4615\e\46ee8ca"
canary[6][102404] return=0 (file resolved in persistent cache)
ours [1][102404] return=0xC0000034 (file missing from per-process tmpdir)
Both engines query the same path byte-for-byte (C+10 emitter extension confirms). Canary's cache mount ~/.local/share/Xenia/cache/ is pre-populated with 23 files / 4.8 MB across 16 hash buckets, accumulated over prior boots. Ours's cache mount is per-process tmpdir at /tmp/xenia-rs-cache-PID-N, wiped per AUDIT-038 lockstep discipline (or — since AUDIT-054 — $HOME/.local/share/xenia-rs/cache when XENIA_CACHE_PERSIST=1).
The escalation criteria flagged "cache-population infrastructure" as out-of-scope for the C+10 session and deferred to this planning session.
Headline finding
The cache divergence is not "missing files" — it is two specific engine bugs in ours that prevent Sylpheed from building its own cache correctly:
- Bug #1: NtSetInformationFile class 10 (XFileRenameInformation) is a no-op stub in ours. Canary properly implements it via file->Rename(target_path) (xboxkrnl_io_info.cc:226-243). Ours falls through to the catch-all arm that returns STATUS_SUCCESS without renaming (exports.rs:1820-1905; specifically, line 1820 lists class 10 in min_length, but the match info_class body at 1847-1905 has no arm for it, so the _ => (STATUS_SUCCESS, min_length) arm catches it).
- Bug #2: cache:\access, cache:\ignore, and cache:\recent are created as directories in ours when they should be files. After running ours with XENIA_CACHE_PERSIST=1, these top-level cache entries appear in the host filesystem as empty directories (4096 B each), whereas canary's cache has them as files (access = 240 B host file; recent = 160 B). The bug is in exports.rs::open_cache_file's is_dir_open discriminator (lines 1041-1051), which misclassifies these create requests. Suspected cause: want_dir = (create_options & FILE_DIRECTORY_FILE) != 0 is true on Sylpheed's first NtCreateFile cache:\access call. Either Sylpheed actually sets bit 0x1 (which canary tolerates without creating a directory because its HostPathDevice respects the disposition differently), or ours's create_options arg-position read is wrong for the calls in question. Needs instrumentation to confirm.
Together these bugs produce the observed asymmetry:
- Canary's cache (warm, populated from prior boots) has 23 hierarchical leaf files (<H1>/<X>/<H2> form), top-level access (240 B) and recent (160 B) manifests, and zero .tmp files.
- Ours's persistent cache after one 50M boot has 7 flat .tmp journals at the cache root (<H1><H2>.tmp form, total 1.4 MB), 7 empty hash subdirectories, and access/ignore/recent as directories instead of files.
- Persistence experiment confirms: even with XENIA_CACHE_PERSIST=1 and a warm boot (the .tmp files already present from a prior cold run), main matched-prefix is still 102404 (unchanged from C+10's default-tmpdir result). Persistence alone does not advance the matched-prefix because the hierarchical leaf file cache:\d4ea4615\e\46ee8ca never materializes: the .tmp rename to leaf path is silently dropped by ours's stubbed XFileRenameInformation.
These findings reframe AUDIT-038/052/053/054's debate. The cache-population problem is not "ours needs canary's cache content" or "ours needs Sylpheed's cache-build logic implemented from scratch" — it is "ours has bugs in two existing kernel exports that block Sylpheed's own cache-build logic from completing". Sylpheed's cache-build path already fires in ours (visible as .tmp writes, directory creates, NtSetInformationFile calls); it just cannot promote .tmp to leaf because of bug #1, and writes garbage state for the top-level manifests because of bug #2.
Investigation summary (verified facts)
Canary's cache (from disk enumeration of ~/.local/share/Xenia/cache/)
| top-level | type | size | notes |
|---|---|---|---|
| access | file | 240 B | 20 × 12-byte records: (hash1, hash2, refcount) manifest |
| recent | file | 160 B | 20 × 8-byte records: (hash1, hash2) recently-used list |
| d4ea4615/ | dir | - | 1 leaf (e/46ee8ca, 400 B Shift-JIS Japanese localization text with [SYSTEM]/[LANGUAGE]/XC_LANGUAGE_* table) |
| 69d8e45c/ | dir | - | 9 leaves across 7 sub-letters (40 B–114 KB; IPFB-magic binary blobs) |
| 87719002/ | dir | - | 7 leaves across 4 sub-letters (38 KB–2.7 MB; largest blob is 2.7 MB asset) |
| aab216c3/ | dir | - | 3 leaves across 2 sub-letters (2 KB–102 KB) |
Total: 23 files / 4.8 MB. Zero .tmp files.
Cache content is game-asset cache, not shader/PSO cache: localization text, font/asset binary blobs (IPFB magic suggests Japanese game-asset format), and the two manifest files (access enumerates known hashes; recent tracks recently used).
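The manifest sizes line up with the record counts (20 × 12 = 240 B, 20 × 8 = 160 B). A minimal sketch of decoding the access manifest follows. It assumes the three fields are big-endian 32-bit values, which the notes do not confirm; AccessRecord, be_u32, and parse_access are hypothetical names, not part of either engine.

```rust
/// One record of the `access` manifest: (hash1, hash2, refcount), 12 bytes.
#[derive(Debug, PartialEq, Eq)]
pub struct AccessRecord {
    pub hash1: u32,
    pub hash2: u32,
    pub refcount: u32,
}

/// Read a u32 at `off`. ASSUMPTION: the on-disk byte order is big-endian.
fn be_u32(bytes: &[u8], off: usize) -> u32 {
    u32::from_be_bytes([bytes[off], bytes[off + 1], bytes[off + 2], bytes[off + 3]])
}

/// Decode an `access`-style manifest.
/// Returns None when the length is not a whole number of 12-byte records.
pub fn parse_access(bytes: &[u8]) -> Option<Vec<AccessRecord>> {
    if bytes.len() % 12 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(12)
            .map(|c| AccessRecord {
                hash1: be_u32(c, 0),
                hash2: be_u32(c, 4),
                refcount: be_u32(c, 8),
            })
            .collect(),
    )
}
```

A 240-byte input yields 20 records; the same approach with 8-byte chunks would cover recent.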
Canary's cache code (from canary source read)
- Mount registered in xenia-canary/src/xenia/app/xenia_main.cc:612-652: three HostPathDevice mounts (\\CACHE0, \\CACHE1, \\CACHE) with symbolic-link aliases cache0:, cache1:, cache:, registered in that order because VirtualFileSystem::ResolvePath does starts_with matching.
- Cache root = storage_root / "cache". storage_root defaults to $XDG_DATA_HOME/Xenia or $HOME/.local/share/Xenia on POSIX (filesystem_posix.cc:76-97).
- Cache is persistent: no wipe logic exists anywhere in canary source. Directories are created on demand by HostPathDevice::Initialize if missing (host_path_device.cc:31-48).
- NtQueryFullAttributesFile (xboxkrnl_io.cc:474-513) returns X_STATUS_SUCCESS when file_system()->ResolvePath() returns an entry; X_STATUS_NO_SUCH_FILE otherwise. (Note: canary uses NO_SUCH_FILE = 0xC000000F; ours returns OBJECT_NAME_NOT_FOUND = 0xC0000034. Both are negative NTSTATUS values; both treated equivalently by Sylpheed.)
- NtCreateFile (xboxkrnl_io.cc:39-111) routes through FileSystem::OpenFile → HostPathEntry::CreateEntryInternal, which calls std::filesystem::create_directories for the parent + OpenFile("wb") for the file (host_path_entry.cc:78-98).
- All file IO is synchronous; canary's XFile::Write calls WriteSync unconditionally (xfile.cc:262-293).
Ours's cache code (from current tree read)
- KernelState::resolve_default_cache_root() at state.rs:1235-1273: defaults to per-process tmpdir + wipe; honors XENIA_CACHE_ROOT=<path> (no wipe) and XENIA_CACHE_PERSIST=1 ($XDG_DATA_HOME/xenia-rs/cache or $HOME/.local/share/xenia-rs/cache, no wipe). Called from KernelState::new_with_gpu at state.rs:418-425, before any guest code runs.
- init_cache_root at state.rs:499-510: when wipe=true, calls remove_dir_all then create_dir_all; when wipe=false, only create_dir_all.
- open_cache_file at exports.rs:1023-1196: AUDIT-054's FILE_DIRECTORY_FILE-bit handling lives here. is_dir_open logic (lines 1041-1051) decides file-vs-directory based on FILE_DIRECTORY_FILE bit (0x1) and host_path.is_dir(). Has a suspicious fallback host_path == state.cache_root.as_deref().unwrap_or(host_path) that is a tautology when cache_root is None.
- nt_set_information_file at exports.rs:1809-1909: validates min_length for class 10 (correctly 16 bytes) but has no match-arm for class 10; falls through to _ => (STATUS_SUCCESS, min_length) catch-all at line 1904. This is the rename bug.
- C+10 emitter extension at call_export state.rs:657-687: wired for NtQueryFullAttributesFile, NtOpenSymbolicLinkObject, NtCreateFile, NtOpenFile. Not wired for NtSetInformationFile (the rename target path is in the info buffer, not in OBJECT_ATTRIBUTES, so this is the right design; but it means the rename target won't show up in args_resolved.path, and a separate emitter hook would be needed if we want diff visibility on rename targets).
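The tautology in the is_dir_open fallback can be reproduced in isolation. The sketch below copies the comparison as quoted in the notes into a free function and contrasts it with one possible stricter form; both function names are hypothetical, and the stricter form is only a suggestion, not what the tree currently does.

```rust
use std::path::{Path, PathBuf};

/// The comparison as quoted in the notes. With `cache_root == None`,
/// `unwrap_or(host_path)` substitutes `host_path` itself, so the result is always true.
pub fn is_root_tautological(host_path: &Path, cache_root: &Option<PathBuf>) -> bool {
    host_path == cache_root.as_deref().unwrap_or(host_path)
}

/// Hypothetical stricter form: true only when a root is configured and equals `host_path`.
pub fn is_root_strict(host_path: &Path, cache_root: &Option<PathBuf>) -> bool {
    cache_root.as_deref() == Some(host_path)
}
```

With no cache root configured, the first function reports every path as the root, while the second reports none.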
Sylpheed's cache-build flow (from disassembly + event logs)
- Dispatcher sub_82452DC0 at PC 0x82452DEC tries primary data first (sub_82452068, sub_82452200). If primary returns 0 (not found), falls back to cache via sub_8245B000 at PC 0x82452E1C. (The "cache is fallback" framing reverses the AUDIT-052 framing slightly: cache is the fallback, not the primary path.)
- Cache gate sub_8245B000 validates the hash-key struct, then calls sub_8245AD00, which formats the path via sub_82459130 (using sprintf to render cache:\<HASH1>\<X>\<HASH2>) and queries via sub_82612A78 (NtQueryFullAttributesFile wrapper). On miss (r3 == -1), branches to failure path PC 0x8245ADFC; on hit, enters critical section, calls sub_8245B1F8 (cache file processor), and returns 1.
- Cache-write path is NOT in sub_82452DC0. The agent that disassembled the dispatcher did not find any NtCreateFile calls in the cache-miss branch. So the cache-build is in a different code path, likely fired by sub_82452068/sub_82452200 (the "primary data" handlers) which, on first-time access, both compute the data AND write it to cache. The Sylpheed binary references the strings cache:\access (0x820B5794), cache:\recent (0x820B5774), %s%08x%08x.tmp (0x820B57AC), cache:\ignore (0x820B5784), cache:\*.tmp (0x820B5764), and cache:\ (0x820B57A4), confirming the game DOES manage these files itself.
- Event-log evidence confirms cache-build fires in ours: ours.jsonl tid=4 events at idx 28-484 show the full sequence: NtCreateFile cache:\access → NtCreateFile cache:\ignore → NtCreateFile cache:\recent → NtCreateFile cache:\d4ea4615e46ee8ca.tmp → NtCreateFile cache:\d4ea4615 (dir, AUDIT-054 path) → NtCreateFile cache:\d4ea4615\e (subdir) → NtOpenFile cache:\d4ea4615e46ee8ca.tmp → ... → 111 total NtSetInformationFile calls. Canary's same trace has 0 NtSetInformationFile events in the 50M window because canary's cache is warm and doesn't fire the build path.
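The two path shapes seen in the trace can be related with a small sketch. The journal name follows the %s%08x%08x.tmp format string found in the binary. The leaf split is an inference from the single observed pair (d4ea4615e46ee8ca.tmp and d4ea4615\e\46ee8ca): it assumes <X> is the first hex digit of the second hash and the leaf name is the remaining seven digits. Both function names are hypothetical.

```rust
/// Flat journal path, following the `%s%08x%08x.tmp` format string with a `cache:\` prefix.
pub fn tmp_journal_path(hash1: u32, hash2: u32) -> String {
    format!("cache:\\{:08x}{:08x}.tmp", hash1, hash2)
}

/// Hierarchical leaf path `cache:\<HASH1>\<X>\<HASH2>`.
/// ASSUMPTION: <X> is the first hex digit of hash2 and the leaf name is the
/// remaining seven digits; inferred only from the observed example pair.
pub fn leaf_path(hash1: u32, hash2: u32) -> String {
    let h2 = format!("{:08x}", hash2);
    format!("cache:\\{:08x}\\{}\\{}", hash1, &h2[..1], &h2[1..])
}
```

For the hash pair (0xd4ea4615, 0xe46ee8ca) this reproduces both the journal name and the leaf path that appear in the event logs.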
Persistence experiment (cold + warm boot, 50M each)
- Boot 1 (cold, XENIA_CACHE_PERSIST=1): digest instructions=50000003, imports=40485, swaps=1, draws=0. Differs from C+10 default-tmpdir baseline (50000002, 40465) by +1 instruction / +20 imports; the persistence path takes slightly more guest cycles. Resulting on-disk cache: 7 .tmp flat journals (1.4 MB total), 7 empty hash subdirectories, 3 empty directories named access/ignore/recent.
- Boot 2 (warm): digest unchanged from boot 1 (instructions=50000003, imports=40485). No cxx_throw regression at 50M (AUDIT-053's regression was at 500M+; not reproduced in this window). .tmp files grew (e.g. d4ea4615e46ee8ca.tmp: 2400 B → 2800 B; aab216c3a2c8c185.tmp: 614 KB → 717 KB), confirming AUDIT-053's "journal appends per boot" finding.
- Boot 2 diff vs C+10 canary baseline: canary_tid=6 → ours_tid=1 matched=102404 (unchanged); divergence at the same NtQueryFullAttributesFile return-value (canary=0 SUCCESS, ours=0xC0000034 NOT_FOUND). Persistence alone does not advance matched-prefix.
This experiment validates: enabling persistence is necessary but not sufficient. The .tmp files are produced but the rename-to-leaf step is broken, so the next boot's NtQuery for the leaf still returns NOT_FOUND.
Approaches considered
I considered five approaches, scored on lockstep digest impact, AUDIT-038 oracle-state risk, LOC, first-boot vs subsequent-boot behavior, and risk of regressing matched-prefix.
(a) Flip default to XENIA_CACHE_PERSIST=1 only
- What: Change resolve_default_cache_root so persistence is on by default.
- Won't work alone: experiment proves matched-prefix stays at 102404 because the .tmp-to-leaf promotion is broken (bug #1). Necessary but not sufficient.
(b) Implement Sylpheed's cache-generation logic in the engine
- What: Write engine-side code that mirrors what Sylpheed's primary-data path does (build cache from XGD assets).
- Don't need it: Sylpheed's binary already does this — the cache-build path fires in ours; it just doesn't finish because of bug #1 (rename). Reverse-engineering Sylpheed's asset extractor would be hundreds of LOC and is not necessary. The game does the work; ours just needs to honor the rename so the leaf file appears.
(c) Seed-from-canary at startup
- What: Copy canary's ~/.local/share/Xenia/cache/* to ours's cache root at boot.
- Disqualified per user direction: AUDIT-038 oracle-state violation. The user's task explicitly says "Disqualify this option unless there's a strong-enough caveat". The strong caveat doesn't apply here because (b)-via-engine-bug-fix is feasible. Save this option as last-resort fallback.
(d) Synthesize on-demand
- What: Intercept NtQueryFullAttributesFile for cache:\ paths and lie SUCCESS even when the file is missing.
- Doesn't work: canary follows the query with NtCreateFile at idx 102481 (78 events later) to actually open and read the file. A SUCCESS lie without backing bytes only postpones the divergence by 78 events.
(e) Fix the two engine bugs that block Sylpheed's own cache-build (RECOMMENDED)
- What:
  - Implement NtSetInformationFile class 10 (XFileRenameInformation) properly: mirror canary's file->Rename(target_path) for cache:-backed handles.
  - Fix open_cache_file's file-vs-directory misclassification for top-level cache files (access, ignore, recent).
  - Flip default to persistent cache so the cache survives across boots and the build path can complete over N iterations. Keep XENIA_CACHE_WIPE=1 as opt-out.
  - Extend Phase A emitter to capture NtSetInformationFile class-10 rename target paths (~60 LOC across both engines) so future rename divergences are diff-visible.
- Why it's right:
  - No oracle state: ours builds its own cache from the same primary game data.
  - Cache convergence is deterministic because cache content is derived from XEX assets, not engine-specific behavior. After N boots ours's cache should be byte-identical to canary's.
  - Two engine bugs are documented + reproducible; both have direct canary mirrors to copy semantics from.
  - AUDIT-053 warm-start cxx_throw regression was at 500M and is NOT reproduced at 50M; the Phase A diff harness window is 50M, so the regression is not blocking for the diff-harness use-case. (Document the regression as a separate known-issue for 500M+ runs.)
- LOC estimate: ~150-200 across 4-5 files. Breakdown below.
- Lockstep digest impact: NEW baseline. Both engines should be re-baselined together with XENIA_CACHE_PERSIST=1 enabled and a deterministic cache-warmup procedure.
- Risk of matched-prefix regression (reading-error #23): LOW. The fix only adds behavior on previously-no-op kernel paths; it doesn't change existing successful paths. Determinism gate validates.
Recommended approach: (e)
Implement the two engine-side bug fixes and flip the persistence default. Let Sylpheed build its own cache over N boots. No oracle state, no .tmp-to-leaf magic, no cache seeding.
Implementation stages
Each stage is independently landable and verifiable.
Stage 1 — Implement NtSetInformationFile class 10 (XFileRenameInformation) + extend emitter to surface rename target
- Files:
  - Ours: exports.rs (~40 LOC body); path.rs (~10 LOC info-buffer parser); state.rs call_export dispatch (~15 LOC); event_log.rs (re-use emit_kernel_call_with_path, 0 LOC).
  - Canary: xboxkrnl_io_info.cc is already correct (no change needed for body); event_log.cc's EmitImportAndCallWithCtx dispatch (~30 LOC): extend to dispatch on name == "NtSetInformationFile" and read the rename target ANSI_STRING from the info buffer when info_class==10.
  - Total: ~95 LOC additive across both engines.
- Scope (body fix, ours only):
  - Add a case 10 arm in nt_set_information_file's match (around line 1847).
  - Parse the X_FILE_RENAME_INFORMATION struct at info_ptr: skip replace_if_exists / root_directory (per canary, ignored on Xbox); read the trailing ANSI_STRING name.
  - Translate the new name via the same cache:\-aware path resolver used by open_cache_file.
  - If the source handle has host_path = Some(_), call std::fs::rename(src, dst) and update the handle's stored path + host_path + size fields.
  - If the source handle is VFS-backed (not cache:), return STATUS_INVALID_PARAMETER or NOT_IMPLEMENTED; Sylpheed only renames cache: files.
  - Create parent directories for dst as needed (create_dir_all(dst.parent())).
  - Honor the source handle's open-mode (close + re-open if necessary for write-renames).
- Scope (emitter extension, both engines):
  - Add a new helper info_buffer_rename_target_raw(mem, info_ptr, info_length) in path.rs (ours) and an equivalent ReadFileRenameInformationTarget in canary's event_log.cc. Both return the raw trimmed target path without normalization, mirroring the C+10 design for object_attributes_raw_name.
  - In call_export's dispatch (state.rs:657-687 ours; phase_a_bridge::EmitImportAndCallWithCtx in canary), add: when name == "NtSetInformationFile" and gpr[7] == 10 (info_class) and gpr[6] >= 16 (info_length), resolve target via the helper and call emit_kernel_call_with_path. Otherwise legacy form.
  - No schema version bump: args_resolved.path is already declared free-form.
- Validation:
  - New unit test in exports.rs: create cache:\foo.tmp, write some bytes, call NtSetInformationFile class 10 with target cache:\bar, verify host filesystem has <root>/bar with the correct bytes and no <root>/foo.tmp.
  - Determinism gate (3× --stable-digest 50M): with cvar OFF (no Phase A emitter), digest unchanged from baseline b8fa0e0460359a4f660adb7605e053de. With cvar ON, Phase A emitter det-fields stable across 2 runs but differ from C+10's 7489e90e… (because rename-target paths are now in det signature).
  - Re-run persistence experiment: after Stage 1, ours's cache after 50M boot should produce hierarchical leaf files (<H1>/<X>/<H2>) instead of flat .tmp files.
  - Phase A diff: re-run tools/diff-events/diff_events.py with new ours run vs new canary run; expected matched-prefix advance.
- Rollback criterion: if cvar-OFF determinism digest changes from baseline, or if any of the 165 existing unit tests fail, revert.
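The host-side part of the Stage 1 body fix can be sketched in isolation as follows. CacheHandle and rename_cache_file are hypothetical stand-ins for whatever the handle table in exports.rs actually stores; NTSTATUS mapping, guest path translation, and the open-mode handling are left out. Only std::fs calls are used.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical stand-in for the fields a cache-backed file handle tracks.
pub struct CacheHandle {
    pub host_path: Option<PathBuf>,
    pub size: u64,
}

/// Rename the host file behind `handle` to `dst`, creating parent directories first.
/// Handles without a host path (VFS-backed in the plan's terms) are rejected.
pub fn rename_cache_file(handle: &mut CacheHandle, dst: &Path) -> io::Result<()> {
    let src = handle
        .host_path
        .clone()
        .ok_or_else(|| io::Error::new(io::ErrorKind::InvalidInput, "not a cache-backed handle"))?;
    if let Some(parent) = dst.parent() {
        fs::create_dir_all(parent)?;
    }
    fs::rename(&src, dst)?;
    // Refresh the cached metadata so later queries see the new location.
    handle.size = fs::metadata(dst)?.len();
    handle.host_path = Some(dst.to_path_buf());
    Ok(())
}
```

This mirrors the shape of the proposed unit test: write a flat .tmp file, rename it to a nested leaf path, then check that the source is gone and the bytes arrived intact.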
Stage 2 — Fix top-level cache file misclassification
- Files: exports.rs open_cache_file (~10-20 LOC at lines 1041-1051).
- Scope:
  - Instrument first: add a one-shot tracing log at top of open_cache_file printing path, create_options, create_disposition, want_dir, host_path.is_dir(), and the final is_dir_open value. Run ours with persistence + check the log for the cache:\access call.
  - Two likely fixes depending on what instrumentation shows:
    - Option 2a (canary parity): if Sylpheed passes FILE_DIRECTORY_FILE bit 0x1 for these files, canary tolerates it because its disposition / non-directory bit takes precedence ((create_options & FILE_DIRECTORY_FILE) != 0 is only treated as authoritative when bit 0x40, FILE_NON_DIRECTORY_FILE, is not also set). Cross-check the bit in canary's NtCreateFile_entry.
    - Option 2b (arg-reading fix): if ours is reading create_options from the wrong slot (similar to AUDIT-053's r7→r8 mistake), correct it.
  - Add explicit unit test: NtCreateFile cache:\access with the bit-pattern Sylpheed uses must result in a host file, not a directory.
- Validation:
  - After Stage 2, persistent run of ours should produce <root>/access, <root>/ignore, <root>/recent as files (matching canary), not directories.
  - Phase A diff: should not regress matched-prefix.
- Rollback criterion: same as Stage 1.
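One way Option 2a could look is sketched below. The flag values are the standard NT CreateOptions constants (FILE_DIRECTORY_FILE = 0x1, FILE_NON_DIRECTORY_FILE = 0x40). The precedence rule is an assumption to be checked against canary and the instrumentation output, and this is_dir_open is a hypothetical free function rather than the logic currently in open_cache_file.

```rust
pub const FILE_DIRECTORY_FILE: u32 = 0x0000_0001;
pub const FILE_NON_DIRECTORY_FILE: u32 = 0x0000_0040;

/// Decide whether an open should be treated as a directory open.
/// ASSUMPTION (Option 2a): FILE_NON_DIRECTORY_FILE overrides FILE_DIRECTORY_FILE,
/// and an existing host directory is only honored when the caller did not ask for a file.
pub fn is_dir_open(create_options: u32, host_is_dir: bool) -> bool {
    let want_file = (create_options & FILE_NON_DIRECTORY_FILE) != 0;
    let want_dir = (create_options & FILE_DIRECTORY_FILE) != 0;
    if want_file {
        return false;
    }
    want_dir || host_is_dir
}
```

If the instrumentation shows that Sylpheed passes neither bit for cache:\access, this sketch does not apply and Option 2b is the more likely explanation.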
Stage 3 — Flip default to persistent cache + re-baseline
- Files: state.rs resolve_default_cache_root (~10 LOC); related unit test cache_root_cleared_on_init may need updating.
- Scope:
  - Change default: (default_persistent_path(), false) instead of (tmpdir_path(), true). Persistent cache becomes the new default for both cargo run and CI Phase A runs.
  - Add XENIA_CACHE_WIPE=1 opt-out (re-enables AUDIT-038 tmpdir-wipe behavior). Document in state.rs:1235's docstring as "preserved for emergency lockstep-state-reset scenarios; not recommended for diff-harness runs because the C+10 path emitter now makes cache divergences diff-visible regardless".
  - Confirm both XENIA_CACHE_ROOT=<path> and XENIA_CACHE_PERSIST=1 retain their prior semantics (the latter becomes a no-op when default is already persistent, but keep it for backwards compat).
  - Re-baseline both engines' Phase A digests under the new default. Run a "cache warmup" of e.g. 5 sequential 50M boots so the cache stabilizes, then capture the new C+11 baseline.
  - Update existing test cache_root_cleared_on_init to use XENIA_CACHE_WIPE=1 explicitly (its determinism-gate purpose is preserved).
- Validation:
  - Determinism: 3× 50M runs with default settings must produce the same --stable-digest (post-warmup).
  - Phase A: re-run diff. Expected behavior: matched-prefix advances dramatically past 102404 (canary's cache:\d4ea4615\e\46ee8ca query returns SUCCESS in both engines on a warm cache; the next ~16 cache-hash queries also resolve; matched-prefix advances by hundreds-to-thousands of events until a non-cache divergence appears).
  - Phase B image_loaded_sha256: unchanged (ea8d160e…); cache state doesn't affect image hash.
  - Unit tests: all 165 pass.
- Rollback criterion: if the new baseline is non-deterministic (3 runs produce different digests) or if matched-prefix REGRESSES below 102404, revert and investigate.
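The proposed Stage 3 precedence can be sketched as a pure function over an environment lookup, which keeps it testable without mutating the process environment. This is not the body of resolve_default_cache_root. The ordering between XENIA_CACHE_ROOT and XENIA_CACHE_WIPE, the tmpdir name (the real one also carries an -N suffix), and the fallback when neither XDG_DATA_HOME nor HOME is set are all assumptions.

```rust
use std::path::PathBuf;

/// Resolve (cache_root, wipe) from an environment lookup.
/// `get` abstracts std::env::var so the precedence can be exercised in tests.
pub fn resolve_cache_root(get: impl Fn(&str) -> Option<String>, pid: u32) -> (PathBuf, bool) {
    // An explicit root wins and is never wiped (ASSUMED to outrank XENIA_CACHE_WIPE).
    if let Some(root) = get("XENIA_CACHE_ROOT") {
        return (PathBuf::from(root), false);
    }
    // Proposed opt-out: per-process tmpdir, wiped at init.
    if get("XENIA_CACHE_WIPE").as_deref() == Some("1") {
        let dir = std::env::temp_dir().join(format!("xenia-rs-cache-{}", pid));
        return (dir, true);
    }
    // New default: persistent, no wipe. XENIA_CACHE_PERSIST=1 needs no branch of its own.
    let base = get("XDG_DATA_HOME")
        .map(PathBuf::from)
        .or_else(|| get("HOME").map(|h| PathBuf::from(h).join(".local/share")))
        .unwrap_or_else(std::env::temp_dir); // ASSUMED fallback when neither is set
    (base.join("xenia-rs").join("cache"), false)
}
```

In production the lookup would be something like `|k| std::env::var(k).ok()`; tests can pass a closure over a fixed table instead.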
Stage 4 (optional, deferred) — Re-test AUDIT-053 warm-start regression at 500M
- Scope: Run ours with XENIA_CACHE_PERSIST=1 for 500M instructions across 5 successive boots; check for cxx_throw events from version-header mismatch (the AUDIT-053 / AUDIT-054 regression). If reproduced, investigate .tmp journal truncation logic. If not reproduced (AUDIT-054's FILE_DIRECTORY_FILE fix + Stage 1's rename fix together resolve it), update memory entries accordingly.
- Validation: 5×500M sequential boots with no cxx_throw regression; cache content stabilizes (no unbounded .tmp growth).
- Why deferred: Stage 1-3 unblock the 50M Phase A diff window which is the immediate goal. 500M warm-start is a separate property to validate but not on the critical path for Phase C+11.
Out of scope / deferred
- STFS / SVOD content packages: separate VFS subsystem; not touched.
- XAM content packages (DLC, themes, gamerpics): handled by separate content_root, not by cache:.
- Save games: separate content: mount, not by cache:.
- GPU shader cache: handled by cache_root cvar for graphics_system_ in canary; ours does not yet implement this (and Sylpheed at 50M doesn't fire the shader-cache path). Deferred.
- Sylpheed binary writers for access/recent manifests: investigation found string refs but did not locate the writers in 50M event window. Bug fixes in this plan should be sufficient because the writers will fire eventually when ours's cache hierarchy supports them.
- cache0: and cache1: aliases: canary mounts three; ours currently funnels all three to one cache root via resolve_cache_path prefix-strip (state.rs:534-543). If Sylpheed uses cache0/cache1 distinctly, a follow-up may need to separate them. Not yet known whether Sylpheed does.
- Phase A emitter for NtSetInformationFile rename target path: schema-v1 supports args_resolved.path already; emitter would need extending to dispatch on info_class==10 and read the X_FILE_RENAME_INFORMATION name. Optional, not blocking.
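Reading the rename target out of the info buffer could look roughly like the sketch below. The 16-byte layout is an assumption consistent with the min_length of 16 noted earlier: two big-endian u32 fields (replace flag, root-directory handle) followed by an ANSI_STRING of u16 length, u16 maximum length, and a u32 guest pointer. It should be checked against canary's struct before use. rename_target_raw and the read_guest accessor are hypothetical names.

```rust
/// Extract the raw rename target from an X_FILE_RENAME_INFORMATION buffer.
/// ASSUMED layout (big-endian, 16 bytes): u32 replace flag, u32 root-directory handle,
/// then ANSI_STRING { u16 length, u16 maximum_length, u32 guest pointer }.
/// `read_guest(addr, len)` is a hypothetical accessor for guest memory.
pub fn rename_target_raw(
    info: &[u8],
    read_guest: impl Fn(u32, usize) -> Option<Vec<u8>>,
) -> Option<String> {
    if info.len() < 16 {
        return None;
    }
    let length = u16::from_be_bytes([info[8], info[9]]) as usize;
    let pointer = u32::from_be_bytes([info[12], info[13], info[14], info[15]]);
    let bytes = read_guest(pointer, length)?;
    // Trim trailing NULs only; no path normalization, so the value stays diff-comparable.
    let end = bytes.iter().rposition(|&b| b != 0).map_or(0, |i| i + 1);
    Some(String::from_utf8_lossy(&bytes[..end]).into_owned())
}
```

Returning the raw string, rather than a resolved host path, keeps the emitted value comparable between the two engines.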
Validation strategy ("done enough" for iteration to resume)
The cache subsystem is "done enough" when:
- Phase A diff matched-prefix advances past 102,404 by at least several hundred events on the main chain (canary tid=6 ↔ ours tid=1). Cascading cache-hash resolutions should advance the matched-prefix by ~100s to ~1000s of events each; the next non-cache divergence appears past idx ~110K.
- All 6 sister chains hold or advance (no regression on tid=4↔11, tid=7↔2, tid=12↔7, tid=14↔9, tid=15↔10).
- 165 existing unit tests pass; ~3 new tests land for cache rename + cache top-level files.
- Phase A determinism digest reproducible: 3× --stable-digest runs at 50M produce identical digest. New C+11 baseline captured.
- Phase B image_loaded_sha256 unchanged: ea8d160e… still matches.
- Both engines build clean (cargo build --release for ours, xenia-canary MSVC Debug for canary).
- On-disk cache content (post Stage 3) approximately matches canary's: same 16 top-level hash buckets, same hierarchical leaf structure, same access/recent manifests as files (byte-identical content not required because game-data-derived).
If matched-prefix advances past 102,404 but stops at a NEW cache-related divergence (e.g. a 17th hash bucket that wasn't in the original 16), this counts as in-scope continuation. If matched-prefix stops at a non-cache divergence (a different kernel export, a thread-scheduling difference), the cache subsystem is complete and the next session inherits the new divergence.
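For reference, the matched-prefix metric used throughout these criteria is the length of the longest common prefix of two per-thread event streams. The sketch below is only an illustration of that definition; the real comparison lives in tools/diff-events/diff_events.py and its event key is not reproduced here. The function name and the (name, return value) key are assumptions.

```rust
/// Number of leading events on which two per-thread streams agree.
/// Events are compared with `==`; the caller chooses what goes into each event key.
pub fn matched_prefix<T: PartialEq>(canary: &[T], ours: &[T]) -> usize {
    canary
        .iter()
        .zip(ours.iter())
        .take_while(|(a, b)| a == b)
        .count()
}
```

A divergence in a return value, such as 0 versus 0xC0000034, stops the count at that event even when every later event would match again.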
Critical files to read before implementation
- exports.rs:1023-1196: open_cache_file (Stage 2 target)
- exports.rs:1809-1909: nt_set_information_file (Stage 1 target)
- exports.rs:6830-6980: cache test suite (Stage 1/2 add tests here)
- state.rs:1235-1273: resolve_default_cache_root (Stage 3 target)
- xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io_info.cc:226-243: canary's XFileRenameInformation impl (mirror semantics)
Reading-error class
No new class. Existing classes re-affirmed:
- Class #28 (oracle source supersedes spec): verified canary's NtSetInformationFile implementation by reading xboxkrnl_io_info.cc:226-243; not assumed.
- Class #15 / ζ (VFS layout aliasing per AUDIT-053): the AUDIT-054 fix was correct but didn't catch this sibling bug (rename) or the top-level-file-as-directory bug. Both are now identified.
A possible future class would be: "stub-by-min-length-validation": ours's nt_set_information_file validated min_length for class 10 in its lookup table but had no actual implementation, so calls returned STATUS_SUCCESS without performing the operation. This is reading-error class #29 candidate ("validation table claims support that the body doesn't deliver") — defer the formal naming until a second instance is found.
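One possible guard against this class of bug is a test that cross-checks the length table against the dispatch body. The sketch below is illustrative only: min_length and set_information are toy stand-ins, not the functions in exports.rs, class 13 is included purely as a second example, and replacing the catch-all with STATUS_NOT_IMPLEMENTED (0xC0000002) is a suggestion rather than something the plan commits to.

```rust
pub const STATUS_SUCCESS: u32 = 0;
pub const STATUS_NOT_IMPLEMENTED: u32 = 0xC000_0002;

/// Toy length table: the classes the validator claims to understand.
pub fn min_length(info_class: u32) -> Option<u32> {
    match info_class {
        10 => Some(16), // XFileRenameInformation
        13 => Some(1),  // illustrative second class
        _ => None,
    }
}

/// Toy body with an explicit arm per class; unknown classes fail loudly
/// instead of returning STATUS_SUCCESS from a catch-all.
pub fn set_information(info_class: u32) -> u32 {
    match info_class {
        10 => STATUS_SUCCESS, // a real arm would perform the rename here
        13 => STATUS_SUCCESS,
        _ => STATUS_NOT_IMPLEMENTED,
    }
}

/// True when every class accepted by the table has a real arm in the body.
pub fn table_matches_body(classes: impl IntoIterator<Item = u32>) -> bool {
    classes
        .into_iter()
        .filter(|c| min_length(*c).is_some())
        .all(|c| set_information(c) != STATUS_NOT_IMPLEMENTED)
}
```

Run over the full range of class numbers, such a check would fail as soon as the table gained an entry that the body did not handle.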
Open questions (for next implementation session, NOT this plan)
- Does Sylpheed actually call NtSetInformationFile class 10, or does it use NtDeleteFile + NtCreateFile to "rename"? Stage 1 instrumentation should confirm class 10 is hit; if not, the bug is elsewhere. (Strong indirect evidence says class 10: canary properly implements it, Sylpheed binary references rename-style cache:\ patterns, ours has 111 NtSetInformationFile calls per boot but 0 in canary.)
- Does Sylpheed write cache:\access and cache:\recent from the same 50M window, or does that fire later (e.g. after cache-build cycle completes)? If later, those files only appear after Stage 3's multi-boot warmup.
- Are cache:\access and cache:\recent size-deterministic byte-for-byte across engines, or do they include host-allocator addresses / timestamps / RNG state? If non-deterministic, matching ours's cache to canary's content would require canonicalization in the diff tool (similar to AUDIT-043's ALLOCATOR_RETURN_FNS).
- Should Stage 3 introduce a "cache warmup harness" (run N boots automatically) or leave warmup to the developer? Probably the latter: keep tests simple, document the procedure.
Deliverables expected after this plan is approved
- xenia-rs/audit-runs/cache-subsystem-plan/plan.md: this plan (copied from /home/fabi/.claude/plans/you-are-starting-a-inherited-pizza.md)
- xenia-rs/audit-runs/cache-subsystem-plan/investigation.md: investigation notes captured here (canary cache enumeration, Sylpheed disassembly summary, persistence experiment result)
- xenia-rs/audit-runs/cache-subsystem-plan/canary-cache-listing.csv: already collected (23 files / 4.8 MB enumerated)
- xenia-rs/audit-runs/cache-subsystem-plan/persistent-experiment.md: already collected (cold-vs-warm 50M digest table, .tmp growth observation, matched-prefix unchanged result)
- xenia-rs/audit-runs/cache-subsystem-plan/persist-warm-events.jsonl: already collected (121,450 events from XENIA_CACHE_PERSIST=1 warm boot)
- Memory entry: project_cache_subsystem_plan_2026_05_14.md (summary + recommendation + sized roadmap)
- MEMORY.md index update: one line