xenia-rs/audit-runs/cache-subsystem-plan/investigation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Cache subsystem investigation — Phase C+11 planning (2026-05-14)

Scope

This investigation informs the plan at plan.md. It was run as a dedicated planning session after Phase C+10 escalated the cache divergence at idx 102404. Findings are READ-ONLY observations; no source modified.

1. Canary's cache enumeration

Canary's mount: ~/.local/share/Xenia/cache/ (the POSIX storage_root / "cache" convention). xenia-canary/src/xenia/app/xenia_main.cc:612-652 registers three HostPathDevice mounts at \\CACHE0, \\CACHE1 and \\CACHE, exposed through the cache0:, cache1: and cache: symbolic links respectively.

State at session start: 23 files / 4.8 MB across 16 hash buckets. Pre-populated across many prior canary boots. Full enumeration in canary-cache-listing.csv.

Notable properties:

  • Zero .tmp files — canary's cache holds only resolved hierarchical leaves (<H1>/<X>/<H2> form) plus two top-level manifests (access, recent). The .tmp flat-journal files Sylpheed uses for staging are renamed/removed before they persist.
  • Top-level access and recent are files, not directories. Layouts:
    • access: 20×12-byte records (hash1 u32 BE, hash2 u32 BE, refcount u32). The 240 B file enumerates the 20 known cache entries. Note: 23 files are on disk but the manifest has only 20 entries. Two of the three unindexed files are the access and recent manifests themselves, so only one leaf file is not indexed (possibly recent-only or an orphan).
    • recent: 20×8-byte records (hash1 u32 BE, hash2 u32 BE). Recently-used ordering of the same hash pairs.
  • Cache content is game-asset cache: Shift-JIS Japanese localization text (d4ea4615/e/46ee8ca[SYSTEM]/[LANGUAGE]/XC_LANGUAGE_* table); IPFB-magic binary blobs (game-asset format, likely font/sprite/level data); large blobs up to 2.7 MB. This is NOT shader cache or PSO cache.
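A minimal sketch of a reader for the two manifest layouts above, which could help when dumping canary's access and recent files for comparison. It assumes refcount is big-endian like the hash fields (the layout above only says "u32"); AccessRecord, RecentRecord and the parse functions are hypothetical names.

```rust
/// One 12-byte record of the `access` manifest.
/// Assumption: refcount is big-endian, like hash1/hash2.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AccessRecord {
    pub hash1: u32,
    pub hash2: u32,
    pub refcount: u32,
}

/// One 8-byte record of the `recent` manifest.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RecentRecord {
    pub hash1: u32,
    pub hash2: u32,
}

fn be_u32(b: &[u8]) -> u32 {
    u32::from_be_bytes([b[0], b[1], b[2], b[3]])
}

/// Parses `access`; returns None if the length is not a whole number of records.
pub fn parse_access(bytes: &[u8]) -> Option<Vec<AccessRecord>> {
    if bytes.len() % 12 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(12)
            .map(|r| AccessRecord {
                hash1: be_u32(&r[0..4]),
                hash2: be_u32(&r[4..8]),
                refcount: be_u32(&r[8..12]),
            })
            .collect(),
    )
}

/// Parses `recent`; returns None if the length is not a whole number of records.
pub fn parse_recent(bytes: &[u8]) -> Option<Vec<RecentRecord>> {
    if bytes.len() % 8 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(8)
            .map(|r| RecentRecord { hash1: be_u32(&r[0..4]), hash2: be_u32(&r[4..8]) })
            .collect(),
    )
}
```

With 20 entries, parse_access expects exactly 240 bytes and parse_recent exactly 160.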

2. Canary's cache code (xenia-canary)

Mount/init:

  • xenia-canary/src/xenia/app/xenia_main.cc:612-652 — registers three HostPathDevice mounts.
  • xenia-canary/src/xenia/base/filesystem_posix.cc:76-97 — POSIX path resolution for storage_root via $XDG_DATA_HOME then $HOME/.local/share.
  • xenia-canary/src/xenia/vfs/devices/host_path_device.cc:31-48 — creates the host directory if missing (std::filesystem::create_directories). No wipe logic anywhere in canary source. Cache survives across boots.
  • xenia-canary/src/xenia/vfs/devices/host_path_entry.cc:78-98: CreateEntryInternal calls create_directories(parent) + OpenFile("wb").
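The storage-root fallback described above can be sketched as a pure function. This is not canary's code: the environment values are passed in instead of read from the process, treating an empty $XDG_DATA_HOME as unset is an assumption borrowed from the XDG Base Directory convention, and the Xenia/cache suffix comes from the mount path observed in section 1. canary_cache_dir is a hypothetical name.

```rust
use std::path::PathBuf;

/// Sketch of the POSIX fallback order: $XDG_DATA_HOME first, then
/// $HOME/.local/share, with the observed "Xenia/cache" suffix appended.
/// Returns None when neither variable is available.
pub fn canary_cache_dir(xdg_data_home: Option<&str>, home: Option<&str>) -> Option<PathBuf> {
    let base = match xdg_data_home.filter(|s| !s.is_empty()) {
        Some(xdg) => PathBuf::from(xdg),
        None => PathBuf::from(home?).join(".local").join("share"),
    };
    Some(base.join("Xenia").join("cache"))
}
```

Since no wipe step exists anywhere in this path, whatever the directory holds from earlier boots is what the game sees.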

NT IO handlers:

  • xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io.cc:39-111: NtCreateFile routes through FileSystem::OpenFile with is_directory = (create_options & FILE_DIRECTORY_FILE) != 0 and is_non_directory = (create_options & FILE_NON_DIRECTORY_FILE) != 0.

  • xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io.cc:474-513 (NtQueryFullAttributesFile): returns X_STATUS_SUCCESS (0) on a ResolvePath hit and X_STATUS_NO_SUCH_FILE (0xC000000F) on a miss.

  • xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io_info.cc:226-243: NtSetInformationFile class 10 (XFileRenameInformation) is correctly implemented:

    case XFileRenameInformation: {
      auto info = info_ptr.as<X_FILE_RENAME_INFORMATION*>();
      std::filesystem::path target_path =
          util::TranslateAnsiPath(kernel_memory(), &info->ansi_string);
      if (!IsValidPath(target_path.string(), false)) {
        return X_STATUS_OBJECT_NAME_INVALID;
      }
      if (!target_path.has_filename()) {
        return X_STATUS_INVALID_PARAMETER;
      }
      file->Rename(target_path);
      out_length = sizeof(*info);
      break;
    }
    

All file IO is synchronous on the host (XFile::Write → WriteSync → std::fwrite).

3. Our cache code (xenia-rs current HEAD)

Mount/init:

  • xenia-rs/crates/xenia-kernel/src/state.rs:1235-1273 (resolve_default_cache_root):
    • Default: per-process tmpdir std::env::temp_dir()/xenia-rs-cache-{pid}-{counter} with wipe=true (AUDIT-038).
    • XENIA_CACHE_ROOT=<path> env: explicit path, no wipe.
    • XENIA_CACHE_PERSIST=1 (or "true" case-insensitive): $XDG_DATA_HOME/xenia-rs/cache or $HOME/.local/share/xenia-rs/cache, no wipe.
  • xenia-rs/crates/xenia-kernel/src/state.rs:499-510: init_cache_root conditionally wipes and recreates the root.
  • xenia-rs/crates/xenia-kernel/src/state.rs:519-554: resolve_cache_path does a case-insensitive prefix match on cache:\, cache:/, cache0:\, cache0:/, cache1:\, cache1:/; normalizes backslashes to forward slashes; and drops .., . and empty components for traversal safety. It funnels all three devices (cache, cache0, cache1) into a single backing root, unlike canary, which has three separate HostPathDevice mounts.
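A hypothetical re-sketch of the resolve_cache_path rules listed above (not the actual state.rs code). It shows how all three device prefixes funnel into one root and how traversal components are silently dropped rather than rejected:

```rust
use std::path::{Path, PathBuf};

/// Sketch only: maps a guest cache path onto `root`, or None if the
/// prefix is not one of the six accepted cache device spellings.
pub fn resolve_cache_path(root: &Path, guest: &str) -> Option<PathBuf> {
    const PREFIXES: [&str; 6] = ["cache:\\", "cache:/", "cache0:\\", "cache0:/", "cache1:\\", "cache1:/"];
    // ASCII lowercasing keeps byte offsets identical, so the prefix length
    // measured on `lower` is valid for slicing `guest`.
    let lower = guest.to_ascii_lowercase();
    let prefix = PREFIXES.iter().find(|p| lower.starts_with(**p))?;
    let rest = &guest[prefix.len()..];
    let mut out = root.to_path_buf();
    for comp in rest.replace('\\', "/").split('/') {
        // Traversal-safety filter: skip "", "." and ".." instead of erroring.
        if comp.is_empty() || comp == "." || comp == ".." {
            continue;
        }
        out.push(comp);
    }
    Some(out)
}
```

For example, "cache1:/a/../b" resolves to <root>/a/b, the same host directory tree that "cache:\a\b" reaches.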

NT IO handlers:

  • xenia-rs/crates/xenia-kernel/src/exports.rs:1023-1196: open_cache_file. AUDIT-054 FILE_DIRECTORY_FILE-bit handling at lines 1041-1051. The is_dir_open decision uses (create_options & FILE_DIRECTORY_FILE) != 0 || host_path.is_dir() || host_path == state.cache_root.unwrap_or(host_path). When cache_root is None, unwrap_or returns host_path, so the last term reduces to host_path == host_path and is always true (every open would become a directory open); when cache_root is Some(root), it only matches the mount root itself and is harmless.
  • xenia-rs/crates/xenia-kernel/src/exports.rs:1354-1373: nt_create_file reads create_options from sp + 0x54 (per AUDIT-054's shim_utils.h:49-50 citation); r5 = obj_attrs, r10 = create_disposition.
  • xenia-rs/crates/xenia-kernel/src/exports.rs:1375-1405: nt_open_file reads open_options from r7 (AUDIT-053's r8→r7 fix, Phase C+5).
  • xenia-rs/crates/xenia-kernel/src/exports.rs:1809-1909: nt_set_information_file validates min_length for class 10 at line 1822 (10 => 16), but the match body at 1847-1905 has no arm for class 10. The _ => (STATUS_SUCCESS, min_length) catch-all at line 1904 fires for class 10 and returns success without performing the rename. This is bug #1, the plan's headline finding.
  • xenia-rs/crates/xenia-kernel/src/exports.rs:1913-1990: nt_query_full_attributes_file. The cache short-circuit at lines 1930-1957 calls std::fs::metadata(&hp) directly and returns STATUS_OBJECT_NAME_NOT_FOUND (0xC0000034) on a miss. This differs from canary's 0xC000000F, but Sylpheed treats both the same.
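A minimal sketch of the missing class-10 arm, modeled on canary's handler quoted in section 2. It assumes X_FILE_RENAME_INFORMATION is 16 bytes: replace_existing (u32 BE), root_dir_handle (u32 BE), then an ANSI string (u16 BE length, u16 BE maximum length, u32 BE buffer pointer), which agrees with the 10 => 16 min_length check. Guest memory is modeled as a flat byte slice indexed by guest address, and the path resolver is passed in as a closure. Like canary, it ignores replace_existing. rename_information and its helpers are hypothetical names, and returning STATUS_UNSUCCESSFUL on a host IO error is a placeholder choice.

```rust
use std::path::{Path, PathBuf};

const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_UNSUCCESSFUL: u32 = 0xC000_0001; // placeholder for host IO failure
const STATUS_INVALID_PARAMETER: u32 = 0xC000_000D;
const STATUS_OBJECT_NAME_INVALID: u32 = 0xC000_0033;

fn be_u16(mem: &[u8], addr: usize) -> Option<u16> {
    Some(u16::from_be_bytes(mem.get(addr..addr + 2)?.try_into().ok()?))
}

fn be_u32(mem: &[u8], addr: usize) -> Option<u32> {
    Some(u32::from_be_bytes(mem.get(addr..addr + 4)?.try_into().ok()?))
}

/// Reads the target path from an X_FILE_RENAME_INFORMATION at `info_ptr`
/// (assumed layout: +8 length, +10 maximum_length, +12 buffer pointer).
fn read_rename_target(mem: &[u8], info_ptr: u32) -> Option<String> {
    let base = info_ptr as usize;
    let len = be_u16(mem, base + 8)? as usize;
    let buf = be_u32(mem, base + 12)? as usize;
    let bytes = mem.get(buf..buf + len)?;
    // ANSI bytes mapped as Latin-1; cache paths are plain ASCII.
    Some(bytes.iter().map(|&b| b as char).collect())
}

/// Hypothetical class-10 (XFileRenameInformation) arm; returns an NTSTATUS.
pub fn rename_information(
    mem: &[u8],
    info_ptr: u32,
    src: &Path,
    resolve: impl Fn(&str) -> Option<PathBuf>,
) -> u32 {
    let Some(target) = read_rename_target(mem, info_ptr) else {
        return STATUS_INVALID_PARAMETER;
    };
    // Canary returns OBJECT_NAME_INVALID for a path it cannot accept...
    let Some(dst) = resolve(target.as_str()) else {
        return STATUS_OBJECT_NAME_INVALID;
    };
    // ...and INVALID_PARAMETER when the target has no file name.
    if target.is_empty() || target.ends_with('\\') || target.ends_with('/') {
        return STATUS_INVALID_PARAMETER;
    }
    // Unlike the current catch-all, actually move the file on the host.
    match std::fs::rename(src, &dst) {
        Ok(()) => STATUS_SUCCESS,
        Err(_) => STATUS_UNSUCCESSFUL,
    }
}
```

The point of the sketch is the last step: success is only reported after the host file has moved, so a later NtQueryFullAttributesFile on the leaf path can hit.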

C+10 emitter extension:

  • xenia-rs/crates/xenia-kernel/src/state.rs:657-687: call_export dispatches by name to object_attributes_raw_name (path.rs:109-115) for the 4 OBJECT_ATTRIBUTES*-taking exports: NtQueryFullAttributesFile (r3), NtOpenSymbolicLinkObject (r4), NtCreateFile (r5), NtOpenFile (r5), and calls emit_kernel_call_with_path (event_log.rs:202-229). It is not wired for NtSetInformationFile, whose path lives in the info buffer rather than in an OBJECT_ATTRIBUTES. Stage 1 of the plan extends this dispatch to class-10 rename targets.
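The dispatch described above, plus the Stage 1 extension, can be sketched as a lookup from export name to where the path lives. The registers for the four existing exports come from the bullet above; for NtSetInformationFile the sketch assumes the NT argument order (FileInformation in r5, FileInformationClass in r7) and only reports a path for class 10. PathSource and path_source are hypothetical names, and the GPR array stands in for the real register file.

```rust
/// Where the emitter should look for a guest path.
#[derive(Debug, PartialEq, Eq)]
pub enum PathSource {
    /// An OBJECT_ATTRIBUTES* passed in this GPR.
    ObjectAttributes(u8),
    /// The rename-info buffer (assumed to be FileInformation in this GPR).
    RenameInfo { info_reg: u8 },
}

/// Sketch of the name-based dispatch, extended for class-10 renames.
pub fn path_source(export: &str, gpr: &[u64; 32]) -> Option<PathSource> {
    match export {
        "NtQueryFullAttributesFile" => Some(PathSource::ObjectAttributes(3)),
        "NtOpenSymbolicLinkObject" => Some(PathSource::ObjectAttributes(4)),
        "NtCreateFile" | "NtOpenFile" => Some(PathSource::ObjectAttributes(5)),
        // Hypothetical Stage 1 arm: only class 10 carries a path.
        "NtSetInformationFile" if (gpr[7] as u32) == 10 => Some(PathSource::RenameInfo { info_reg: 5 }),
        _ => None,
    }
}
```

Gating on the class keeps the other 110-odd NtSetInformationFile calls out of the path-annotated events.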

Tests:

  • xenia-rs/crates/xenia-kernel/src/exports.rs:6830-6980 — 5 cache-specific tests: cache_create_write_read_roundtrip, cache_file_create_collision, cache_file_open_missing, cache_root_cleared_on_init, cache_resolve_strips_path_traversal. Plus 3 async/sync file tests.
  • No tests cover NtSetInformationFile class 10. Stage 1 of the plan adds this test.

4. Sylpheed's cache code (guest PPC binary)

Disassembly of the cache-fallback dispatcher chain (via xenia-rs disasm + sylpheed.db):

  • sub_82452DC0 (PC 0x82452DC0 to 0x82453024): high-level dispatcher.
    • 0x82452DEC: tries primary data via sub_82452068 + sub_82452200.
    • 0x82452E08: checks r3 == 0. On not-found, branches to cache fallback at 0x82452E1C.
    • 0x82452E1C: calls cache gate sub_8245B000.
    • 0x82452E28: if cache returns 0 (miss), branches to 0x82452E88 (skip cache).
    • 0x82452E30: cache hit → call callback sub_8245B078.
  • sub_8245B000 (cache gate): validates hash key, calls sub_8245AD00.
  • sub_8245AD00 (cache query): formats path via sub_82459130 (sprintf cache:\<H1>\<X>\<H2>); queries via sub_82612A78 (NtQueryFullAttributesFile wrapper). On miss (r3 == -1 at 0x8245AD90), branches to failure 0x8245ADFC. On hit, enters critical section + calls sub_8245B1F8 (cache reader).
  • sub_82459130 (path formatter): pure sprintf, no cache write.
  • sub_82612A78 (NtQueryFullAttributesFile wrapper): wraps the kernel import; converts STATUS to -1 on error.
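This is why the two engines' different miss statuses do not matter to the game. Below is a model of the wrapper's described behavior: NT_SUCCESS is the standard sign-bit test on an NTSTATUS, and every error status collapses to -1. Returning 0 on success is an assumption, since only the -1 error path is established above; query_wrapper is a hypothetical name.

```rust
const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_NO_SUCH_FILE: u32 = 0xC000_000F; // canary's miss value
const STATUS_OBJECT_NAME_NOT_FOUND: u32 = 0xC000_0034; // our miss value

/// NT_SUCCESS: success/informational codes have the sign bit clear.
fn nt_success(status: u32) -> bool {
    (status as i32) >= 0
}

/// Model of sub_82612A78's result: any error becomes -1 (0 on success is assumed).
fn query_wrapper(status: u32) -> i32 {
    if nt_success(status) { 0 } else { -1 }
}
```

Both miss values have the sign bit set, so both reach the r3 == -1 check at 0x8245AD90 as the same value.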

Cache-write path was NOT located in sub_82452DC0's disassembly. The dispatcher agent reported no NtCreateFile in the miss branch. Likely the cache build fires from a different code path (probably inside sub_82452068/sub_82452200, the "primary data" handlers, which on first-time access compute the data AND write it to cache).

Sylpheed binary string references (all confirmed via .pe text-search):

  • cache:\access at 0x820B5794
  • cache:\recent at 0x820B5774
  • cache:\ignore at 0x820B5784
  • cache:\*.tmp at 0x820B5764
  • cache:\ at 0x820B57A4
  • %s%08x%08x.tmp at 0x820B57AC (format string for cache:\<H1><H2>.tmp flat journal)
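The %s%08x%08x.tmp format string pins down the flat-journal name. The hierarchical leaf layout is inferred rather than disassembled: canary's d4ea4615/e/46ee8ca leaf (for d4ea4615e46ee8ca.tmp), and the cache:\69d8e45c\e directory created alongside 69d8e45ce534ffea.tmp, are both consistent with <X> being the first hex digit of H2 and the leaf name being its remaining seven digits. A sketch of both names under that assumption (function names are hypothetical):

```rust
/// Flat-journal path: `%s%08x%08x.tmp` with `cache:\` as the %s argument.
pub fn tmp_journal_path(h1: u32, h2: u32) -> String {
    format!("cache:\\{:08x}{:08x}.tmp", h1, h2)
}

/// Hierarchical leaf path, assuming <X> = first hex digit of H2 and the
/// leaf name = the remaining seven digits (inferred from observed paths).
pub fn leaf_path(h1: u32, h2: u32) -> String {
    let h2_hex = format!("{:08x}", h2);
    format!("cache:\\{:08x}\\{}\\{}", h1, &h2_hex[..1], &h2_hex[1..])
}
```

If the inference holds, the class-10 rename target for each journal is fully determined by the same hash pair.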

Conclusion: Sylpheed manages its own cache content. The game has both the read path (sub_82452DC0 dispatcher) and the write path (currently unlocated, likely in primary-data handlers). The write path is what creates .tmp files and (we infer) calls NtSetInformationFile class 10 to rename them to hierarchical leaves.

5. Event-log evidence (Phase A jsonl)

From xenia-rs/audit-runs/phase-c10-NtQueryFullAttributesFile/ours.jsonl, tid=4's cache-build sequence on COLD cache:

| idx | event | path | note |
|---|---|---|---|
| 13 | NtOpenFile | cache:\ | probe mount root |
| 19 | NtClose | | close root probe |
| 28 | NtCreateFile | cache:\access | returns 0xC0000034 NOT_FOUND on cold |
| 37 | NtCreateFile | cache:\ignore | returns 0xC0000034 |
| 46 | NtCreateFile | cache:\recent | returns 0xC0000034 |
| 64 | NtCreateFile | cache:\d4ea4615e46ee8ca.tmp | flat journal, FILE_CREATE |
| 69 | NtSetInformationFile | | class TBD; ours emitter doesn't capture info_class |
| 196 | NtCreateFile | cache:\d4ea4615 | DIR, post-AUDIT-054 |
| 205 | NtCreateFile | cache:\d4ea4615\e | subdir |
| 214 | NtOpenFile | cache:\d4ea4615e46ee8ca.tmp | reopen flat journal |
| 286 | NtCreateFile | cache:\69d8e45ce534ffea.tmp | next flat journal |
| 325 | NtOpenFile | cache:\ | |
| 409 | NtCreateFile | cache:\access | retry |
| 466 | NtCreateFile | cache:\69d8e45c | DIR |
| 475 | NtCreateFile | cache:\69d8e45c\e | subdir |

Statistics across the 50M window:

  • Ours emits 69 cache: events on tid=4, plus the main-chain divergent events on tid=1.
  • Ours emits 111 NtSetInformationFile calls; canary emits 0. Canary's cache is warm, so it skips cache-build entirely.

6. Persistence experiment

See persistent-experiment.md for the full table and per-boot cache-content delta. Headline result:

  • XENIA_CACHE_PERSIST=1 + 50M boot 1 (cold): digest instructions=50000003 imports=40485 swaps=1 draws=0. Differs from C+10 default-tmpdir baseline (50000002, 40465) by +1 instruction / +20 imports. Persistent path is slightly different from tmpdir.
  • XENIA_CACHE_PERSIST=1 + 50M boot 2 (warm): same digest. No cxx_throw regression at 50M.
  • On-disk cache after boot 2: 7 .tmp flat journals (each grew on every boot, by between +400 B and +114 KB per file); access, ignore, recent as DIRECTORIES (bug #2); zero hierarchical leaf files (bug #1 prevents promotion).
  • Phase A diff vs canary baseline: matched-prefix on canary_tid=6 → ours_tid=1 main chain = 102404 (unchanged from C+10's default-tmpdir result). Divergence at the same NtQueryFullAttributesFile return-value (canary=0 SUCCESS, ours=0xC0000034 NOT_FOUND).

Persistence alone does not advance the matched-prefix. The .tmp files exist but the hierarchical leaf doesn't, so the leaf NtQuery still misses.

7. Discipline / methodology checks

  • --mute=true: not used in this session because no canary runs were required (the C+10 canary.jsonl was reused as-is for the matched-prefix comparison). Future re-baselines under the plan must use --mute=true.
  • Binary rename for stop hook: ours run via xrs-c10 (pre-existing from C+10). No background long-run; the experiments completed in <3 s wall-clock on the test host.
  • Reading-error #28 (oracle source supersedes spec): verified canary's NtSetInformationFile class-10 implementation by reading xboxkrnl_io_info.cc:226-243; did not assume from docs.
  • No source touched: this session was read-only-by-design. Plan-mode kept the tree clean; the only file-system side effects were Phase A event log output to audit-runs/cache-subsystem-plan/persist-warm-events.jsonl and this directory's deliverables.

8. Confidence ratings

| claim | source | confidence |
|---|---|---|
| Bug #1: nt_set_information_file class 10 is a no-op stub | direct source read of exports.rs:1809-1909 | HIGH |
| Bug #1 prevents .tmp-to-leaf promotion | indirect: ours has .tmp + no leaf; canary has leaf + no .tmp; canary properly implements class 10 | HIGH (3 independent confirmations) |
| Bug #2: top-level cache files mis-created as directories | direct on-disk observation post-experiment | HIGH |
| Bug #2 root cause: is_dir_open discriminator misclassification | source-read inference; not yet instrumented | MEDIUM (Stage 2 instrumentation required) |
| Persistence alone doesn't advance matched-prefix | experimentally verified via diff_events.py | HIGH |
| AUDIT-053 cxx_throw regression not reproduced at 50M | experimentally verified (2 sequential boots, same digest) | MEDIUM (AUDIT-053's regression was at 500M; this window is too short to fully rule it out) |
| Sylpheed has its own cache-build path that already fires in ours | event-log evidence (69 cache: events on tid=4) | HIGH |
| The two engine bugs are the ONLY blockers | inferred from the above; additional bugs could surface post-Stage 1 | MEDIUM (Stages are independently rollback-able; if a Stage doesn't advance matched-prefix, investigate further) |

9. Open questions

See plan §"Open questions". Critical ones to resolve during implementation:

  1. Confirm via instrumentation that Sylpheed actually calls NtSetInformationFile class 10 for the .tmp→leaf rename. If it uses a different path (NtDeleteFile + NtCreateFile, or some custom flow), Stage 1's fix won't fully solve the problem.
  2. Confirm via instrumentation whether cache:\access/ignore/recent creates have FILE_DIRECTORY_FILE set in create_options, or whether our arg-position read is wrong.
  3. Validate whether access and recent manifest contents are deterministic byte-for-byte across engines, or whether they include host-allocator addresses / timestamps that need diff-tool canonicalization.
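For question 2, a sketch of the is_dir_open expression quoted in section 3 next to a one-line trace of the create_options bits that could be logged for the cache:\access/ignore/recent creates. FILE_DIRECTORY_FILE (0x1) and FILE_NON_DIRECTORY_FILE (0x40) are the standard NT create-option values; is_dir_on_host stands in for host_path.is_dir(), and is_dir_open/trace_create are hypothetical names rather than the real exports.rs functions.

```rust
use std::path::Path;

const FILE_DIRECTORY_FILE: u32 = 0x0000_0001;
const FILE_NON_DIRECTORY_FILE: u32 = 0x0000_0040;

/// Same shape as the described is_dir_open expression (sketch only).
fn is_dir_open(create_options: u32, is_dir_on_host: bool, host_path: &Path, cache_root: Option<&Path>) -> bool {
    (create_options & FILE_DIRECTORY_FILE) != 0
        || is_dir_on_host
        // With cache_root == None, unwrap_or hands back host_path: always true.
        || host_path == cache_root.unwrap_or(host_path)
}

/// One trace line per create, to see which option bits the guest really passed.
fn trace_create(guest_path: &str, create_options: u32) -> String {
    format!(
        "{} create_options={:#010x} dir={} non_dir={}",
        guest_path,
        create_options,
        (create_options & FILE_DIRECTORY_FILE) != 0,
        (create_options & FILE_NON_DIRECTORY_FILE) != 0,
    )
}
```

If the trace shows non_dir=true while the manifest still lands as a directory, the misclassification is in the discriminator; if it shows dir=true, the arg-position read is the suspect.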

See plan §"Recommended approach" and §"Implementation stages". Three landable stages, ~150-200 LOC total; the expected matched-prefix advance after Stage 3 is hundreds to thousands of events.