# Cache subsystem investigation — Phase C+11 planning (2026-05-14)

## Scope

This investigation informs the plan at [plan.md](plan.md). It was run as a
dedicated planning session after Phase C+10 escalated the cache divergence at
idx 102404. Findings are READ-ONLY observations; no source was modified.

## 1. Canary's cache enumeration

Canary's mount: `~/.local/share/Xenia/cache/` (the POSIX `storage_root / "cache"`
convention; canary's `xenia-canary/src/xenia/app/xenia_main.cc:612-652` registers
three `HostPathDevice` mounts at `\\CACHE0`, `\\CACHE1`, `\\CACHE` aliased to
`cache0:`, `cache1:`, `cache:` symbolic links).

State at session start: 23 files / 4.8 MB across 16 hash buckets. Pre-populated
across many prior canary boots. Full enumeration in
[canary-cache-listing.csv](canary-cache-listing.csv).

Notable properties:

* **Zero `.tmp` files** — canary's cache holds only resolved hierarchical leaves
  (`<H1>/<X>/<H2>` form) plus two top-level manifests (`access`, `recent`). The
  `.tmp` flat-journal files Sylpheed uses for staging are renamed/removed before
  they persist.
* Top-level `access` and `recent` are **files**, not directories. Layouts:
  * `access`: 20×12-byte records `(hash1 u32 BE, hash2 u32 BE, refcount u32)`.
    The 240 B file enumerates the 20 known cache entries (note: 23 files total
    on disk but only 20 manifest entries — three of the on-disk files are not
    indexed; possibly `recent`-only or orphans).
  * `recent`: 20×8-byte records `(hash1 u32 BE, hash2 u32 BE)`. Recently-used
    ordering of the same hash pairs.
* Cache content is **game-asset cache**: Shift-JIS Japanese localization text
  (`d4ea4615/e/46ee8ca` — `[SYSTEM]/[LANGUAGE]/XC_LANGUAGE_*` table); `IPFB`-magic
  binary blobs (game-asset format, likely font/sprite/level data); large blobs
  up to 2.7 MB. This is NOT shader cache or PSO cache.

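The manifest layouts above are simple enough to decode in a few lines. The following is a minimal sketch, not code from either engine: `AccessRecord`, `parse_access`, and `parse_recent` are hypothetical names, and the endianness of `refcount` is an assumption (the observed layout only states big-endian for the two hashes).

```rust
/// One 12-byte `access` record, following the layout described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AccessRecord {
    pub hash1: u32,
    pub hash2: u32,
    pub refcount: u32,
}

/// Read a big-endian u32 from the first four bytes of `b`.
fn be_u32(b: &[u8]) -> u32 {
    u32::from_be_bytes([b[0], b[1], b[2], b[3]])
}

/// Decode an `access` manifest image.
/// Returns `None` when the length is not a whole number of 12-byte records.
pub fn parse_access(bytes: &[u8]) -> Option<Vec<AccessRecord>> {
    if bytes.len() % 12 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(12)
            .map(|rec| AccessRecord {
                hash1: be_u32(&rec[0..4]),
                hash2: be_u32(&rec[4..8]),
                // ASSUMPTION: refcount is big-endian like the hashes.
                refcount: be_u32(&rec[8..12]),
            })
            .collect(),
    )
}

/// Decode a `recent` manifest image (8-byte `(hash1, hash2)` records).
/// Returns `None` when the length is not a whole number of 8-byte records.
pub fn parse_recent(bytes: &[u8]) -> Option<Vec<(u32, u32)>> {
    if bytes.len() % 8 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(8)
            .map(|rec| (be_u32(&rec[0..4]), be_u32(&rec[4..8])))
            .collect(),
    )
}
```

A decoder like this would make it straightforward to list which of the 23 on-disk files are missing from the 20 `access` entries, by comparing the decoded hash pairs against the directory listing.
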
## 2. Canary's cache code (xenia-canary)

Mount/init:

* `xenia-canary/src/xenia/app/xenia_main.cc:612-652` — registers three
  `HostPathDevice` mounts.
* `xenia-canary/src/xenia/base/filesystem_posix.cc:76-97` — POSIX path
  resolution for `storage_root` via `$XDG_DATA_HOME` then `$HOME/.local/share`.
* `xenia-canary/src/xenia/vfs/devices/host_path_device.cc:31-48` — creates the
  host directory if missing (`std::filesystem::create_directories`). **No wipe
  logic anywhere in canary source.** Cache survives across boots.
* `xenia-canary/src/xenia/vfs/devices/host_path_entry.cc:78-98` —
  `CreateEntryInternal` calls `create_directories(parent)` + `OpenFile("wb")`.

NT IO handlers:

* `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io.cc:39-111` — `NtCreateFile`
  routes through `FileSystem::OpenFile` with `is_directory =
  (create_options & FILE_DIRECTORY_FILE) != 0` and
  `is_non_directory = (create_options & FILE_NON_DIRECTORY_FILE) != 0`.
* `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io.cc:474-513` —
  `NtQueryFullAttributesFile`: returns `X_STATUS_SUCCESS` (0) on
  `ResolvePath` hit; `X_STATUS_NO_SUCH_FILE` (0xC000000F) on miss.
* `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io_info.cc:226-243` —
  **`NtSetInformationFile` class 10 (XFileRenameInformation)** correctly
  implemented:

```cpp
case XFileRenameInformation: {
  auto info = info_ptr.as<X_FILE_RENAME_INFORMATION*>();
  std::filesystem::path target_path =
      util::TranslateAnsiPath(kernel_memory(), &info->ansi_string);
  if (!IsValidPath(target_path.string(), false)) {
    return X_STATUS_OBJECT_NAME_INVALID;
  }
  if (!target_path.has_filename()) {
    return X_STATUS_INVALID_PARAMETER;
  }
  file->Rename(target_path);
  out_length = sizeof(*info);
  break;
}
```

All file IO is synchronous on the host (`XFile::Write` → `WriteSync` →
`std::fwrite`).

## 3. Ours's cache code (xenia-rs current HEAD)

Mount/init:

* `xenia-rs/crates/xenia-kernel/src/state.rs:1235-1273` — `resolve_default_cache_root`:
  * Default: per-process tmpdir `std::env::temp_dir()/xenia-rs-cache-{pid}-{counter}`
    with `wipe=true` (AUDIT-038).
  * `XENIA_CACHE_ROOT=<path>` env: explicit path, no wipe.
  * `XENIA_CACHE_PERSIST=1` (or "true" case-insensitive): `$XDG_DATA_HOME/xenia-rs/cache`
    or `$HOME/.local/share/xenia-rs/cache`, no wipe.
* `xenia-rs/crates/xenia-kernel/src/state.rs:499-510` — `init_cache_root`:
  conditionally wipes and recreates.
* `xenia-rs/crates/xenia-kernel/src/state.rs:519-554` — `resolve_cache_path`:
  case-insensitive prefix-match on `cache:\`, `cache:/`, `cache0:\`, `cache0:/`,
  `cache1:\`, `cache1:/`; backslash → forward slash normalization; `..`/`.` /
  empty filtered for traversal safety. **Funnels all three (cache, cache0,
  cache1) to a single backing root** — different from canary which has three
  separate `HostPathDevice` mounts.

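The resolution rules listed for `resolve_cache_path` can be sketched as follows. This is an illustration written from the description above and is not the code at `state.rs:519-554`; the function signature, the `CACHE_PREFIXES` constant, and the choice to preserve the case of the remaining path components are all assumptions.

```rust
use std::path::{Path, PathBuf};

/// Guest device names that all funnel to one backing root in this sketch.
const CACHE_PREFIXES: [&str; 3] = ["cache0:", "cache1:", "cache:"];

/// Map a guest path such as `cache:\d4ea4615\e\46ee8ca` onto `root`.
/// Returns `None` when the path does not start with a cache device prefix
/// followed by `\` or `/`.
pub fn resolve_cache_path(root: &Path, guest: &str) -> Option<PathBuf> {
    // ASCII lowercasing keeps byte offsets identical to `guest`.
    let lower = guest.to_ascii_lowercase();
    let rest = CACHE_PREFIXES.iter().find_map(|&p| {
        let tail = lower.strip_prefix(p)?;
        if tail.starts_with('\\') || tail.starts_with('/') {
            // Skip the prefix and the single separator after it.
            Some(&guest[p.len() + 1..])
        } else {
            None
        }
    })?;

    let mut out = root.to_path_buf();
    for part in rest.split(|c: char| c == '\\' || c == '/') {
        // Drop empty, `.` and `..` components so the result stays under `root`.
        if part.is_empty() || part == "." || part == ".." {
            continue;
        }
        out.push(part);
    }
    Some(out)
}
```

Because every prefix maps to the same `root`, a file written through `cache0:` is visible through `cache:` in this model, which is the behavioral difference from canary's three separate mounts noted above.
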
NT IO handlers:

* `xenia-rs/crates/xenia-kernel/src/exports.rs:1023-1196` — `open_cache_file`.
  AUDIT-054 `FILE_DIRECTORY_FILE`-bit handling at lines 1041-1051. The
  `is_dir_open` decision uses `(create_options & FILE_DIRECTORY_FILE) != 0 ||
  host_path.is_dir() || host_path == state.cache_root.unwrap_or(host_path)`. The
  last term is a tautology when `cache_root` is `None` (returns `host_path ==
  host_path` = true), but harmless when `cache_root` is `Some(_)`.
* `xenia-rs/crates/xenia-kernel/src/exports.rs:1354-1373` — `nt_create_file`:
  reads `create_options` from `sp + 0x54` (per AUDIT-054's `shim_utils.h:49-50`
  citation). r5=obj_attrs, r10=create_disposition.
* `xenia-rs/crates/xenia-kernel/src/exports.rs:1375-1405` — `nt_open_file`:
  reads `open_options` from r7 (AUDIT-053's r8→r7 fix, Phase C+5).
* `xenia-rs/crates/xenia-kernel/src/exports.rs:1809-1909` — `nt_set_information_file`:
  validates `min_length` for class 10 at line 1822 (`10 => 16`), but **the match
  body at 1847-1905 has no case-arm for class 10**. The `_ =>
  (STATUS_SUCCESS, min_length)` catch-all at line 1904 fires for class 10,
  returning success without performing the rename. **This is bug #1 in the
  plan's headline finding.**
* `xenia-rs/crates/xenia-kernel/src/exports.rs:1913-1990` —
  `nt_query_full_attributes_file`. Cache short-circuit at lines 1930-1957
  uses `std::fs::metadata(&hp)` directly; returns
  `STATUS_OBJECT_NAME_NOT_FOUND` (0xC0000034) on miss. Different value than
  canary's 0xC000000F but treated equivalently by Sylpheed.

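The `unwrap_or` tautology in the `is_dir_open` expression is easy to reproduce in isolation. The sketch below mirrors the three-term expression quoted above using free-standing parameters instead of the kernel state; `is_dir_open_strict` is a hypothetical alternative shown only for contrast and is not a proposed patch.

```rust
use std::path::Path;

/// NT create-option bit requesting a directory open/create.
pub const FILE_DIRECTORY_FILE: u32 = 0x0000_0001;

/// Same shape as the discriminator quoted above.
pub fn is_dir_open(create_options: u32, host_path: &Path, cache_root: Option<&Path>) -> bool {
    (create_options & FILE_DIRECTORY_FILE) != 0
        || host_path.is_dir()
        // With `cache_root == None` this compares `host_path` to itself.
        || host_path == cache_root.unwrap_or(host_path)
}

/// Hypothetical variant: the root comparison only fires when a root exists.
pub fn is_dir_open_strict(create_options: u32, host_path: &Path, cache_root: Option<&Path>) -> bool {
    (create_options & FILE_DIRECTORY_FILE) != 0
        || host_path.is_dir()
        || cache_root.map_or(false, |root| root == host_path)
}
```

With a root configured, both variants agree, which matches the observation that the tautology is harmless in the `Some(_)` case. The sketch does not explain bug #2 by itself; it only isolates the one term that can classify any path as a directory.
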
C+10 emitter extension:

* `xenia-rs/crates/xenia-kernel/src/state.rs:657-687` — `call_export`
  dispatches by name to `object_attributes_raw_name` (path.rs:109-115) for the 4
  OBJECT_ATTRIBUTES*-taking exports: NtQueryFullAttributesFile (r3),
  NtOpenSymbolicLinkObject (r4), NtCreateFile (r5), NtOpenFile (r5). Calls
  `emit_kernel_call_with_path` (event_log.rs:202-229). Not wired for
  NtSetInformationFile (info buffer has the path, not OBJECT_ATTRIBUTES).
  **Stage 1 of the plan extends this dispatch to class-10 rename targets.**

Tests:

* `xenia-rs/crates/xenia-kernel/src/exports.rs:6830-6980` — 5 cache-specific
  tests: `cache_create_write_read_roundtrip`, `cache_file_create_collision`,
  `cache_file_open_missing`, `cache_root_cleared_on_init`,
  `cache_resolve_strips_path_traversal`. Plus 3 async/sync file tests.
* No tests cover `NtSetInformationFile` class 10. **Stage 1 of the plan adds
  this test.**

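For orientation, the host-side effect that a class-10 handler needs to produce can be sketched with `std::fs::rename`. This is a minimal sketch under stated assumptions, not the Stage 1 implementation: `rename_cache_entry` is a hypothetical helper, it assumes the target has already been decoded from the guest info buffer and made relative to the cache root, and it assumes the guest has already created the destination directories (as the `NtCreateFile` directory events in the event log suggest).

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Move `from` to `target_rel` under `root`.
/// `target_rel` uses guest separators, e.g. `d4ea4615\e\46ee8ca`.
pub fn rename_cache_entry(root: &Path, from: &Path, target_rel: &str) -> io::Result<PathBuf> {
    let mut dest = root.to_path_buf();
    let mut has_name = false;
    for part in target_rel.split(|c: char| c == '\\' || c == '/') {
        // Same traversal filtering as path resolution: skip empty, `.`, `..`.
        if part.is_empty() || part == "." || part == ".." {
            continue;
        }
        dest.push(part);
        has_name = true;
    }
    if !has_name {
        // Loosely corresponds to canary's `has_filename()` check.
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "rename target has no file name",
        ));
    }
    // Parent directories are NOT created here; a missing parent is an error.
    fs::rename(from, &dest)?;
    Ok(dest)
}
```

A test for the real handler could follow the same shape: create a `.tmp` journal, issue the rename, then assert that the leaf exists and the journal is gone.
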
## 4. Sylpheed's cache code (guest PPC binary)

Disassembly of the cache-fallback dispatcher chain (via xenia-rs disasm +
sylpheed.db):

* **`sub_82452DC0`** (PC 0x82452DC0–0x82453024): high-level dispatcher.
  * 0x82452DEC: tries primary data via `sub_82452068` + `sub_82452200`.
  * 0x82452E08: checks `r3 == 0`. On not-found, branches to cache fallback at
    0x82452E1C.
  * 0x82452E1C: calls cache gate `sub_8245B000`.
  * 0x82452E28: if cache returns 0 (miss), branches to 0x82452E88 (skip cache).
  * 0x82452E30: cache hit → call callback `sub_8245B078`.
* **`sub_8245B000`** (cache gate): validates hash key, calls `sub_8245AD00`.
* **`sub_8245AD00`** (cache query): formats path via `sub_82459130`
  (sprintf `cache:\<H1>\<X>\<H2>`); queries via `sub_82612A78` (NtQueryFullAttributesFile
  wrapper). On miss (`r3 == -1` at 0x8245AD90), branches to failure 0x8245ADFC.
  On hit, enters critical section + calls `sub_8245B1F8` (cache reader).
* **`sub_82459130`** (path formatter): pure sprintf, no cache write.
* **`sub_82612A78`** (NtQueryFullAttributesFile wrapper): wraps the kernel
  import; converts STATUS to -1 on error.

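The wrapper's STATUS-to-`-1` conversion is why the two engines' different miss codes look the same to the guest. The sketch below models only that one step; `wrapper_reports_miss` is a hypothetical name, and treating "error" as "NTSTATUS with the sign bit set" is an assumption, since the disassembly notes above only say that errors become `-1`.

```rust
pub const X_STATUS_SUCCESS: u32 = 0x0000_0000;
/// Canary's miss code for NtQueryFullAttributesFile.
pub const X_STATUS_NO_SUCH_FILE: u32 = 0xC000_000F;
/// Ours's miss code for the same call.
pub const STATUS_OBJECT_NAME_NOT_FOUND: u32 = 0xC000_0034;

/// Model of the guest wrapper: true when the caller would see `r3 == -1`.
/// ASSUMPTION: any status that is negative as a signed 32-bit value counts
/// as an error.
pub fn wrapper_reports_miss(status: u32) -> bool {
    (status as i32) < 0
}
```

Under this model both `0xC000000F` and `0xC0000034` collapse to the same miss result, so the status-code difference by itself should not change which branch `sub_8245AD00` takes.
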
**Cache-write path was NOT located in sub_82452DC0's disassembly.** The dispatcher
agent reported no NtCreateFile in the miss branch. Likely the cache build fires
from a different code path (probably inside `sub_82452068`/`sub_82452200`, the
"primary data" handlers, which on first-time access compute the data AND write
it to cache).

Sylpheed binary string references (all confirmed via .pe text-search):

* `cache:\access` at 0x820B5794
* `cache:\recent` at 0x820B5774
* `cache:\ignore` at 0x820B5784
* `cache:\*.tmp` at 0x820B5764
* `cache:\` at 0x820B57A4
* `%s%08x%08x.tmp` at 0x820B57AC (format string for `cache:\<H1><H2>.tmp` flat
  journal)

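The two path shapes seen in this investigation can be written out from a hash pair. The journal form follows the `%s%08x%08x.tmp` format string directly. The leaf form is an inference from the observed names (`d4ea4615e46ee8ca.tmp` alongside `d4ea4615/e/46ee8ca`): the sketch assumes `<X>` is the first hex digit of the second hash and the leaf name is the remaining seven digits. Both observed buckets are `e`, so a constant `<X>` cannot be ruled out from this data. The function names are hypothetical.

```rust
/// Flat journal name, following the `%s%08x%08x.tmp` format string.
pub fn tmp_journal_path(prefix: &str, h1: u32, h2: u32) -> String {
    format!("{}{:08x}{:08x}.tmp", prefix, h1, h2)
}

/// Hierarchical leaf name.
/// ASSUMPTION: `<X>` is the first hex digit of `h2`, inferred from two
/// observed paths only.
pub fn leaf_path(prefix: &str, h1: u32, h2: u32) -> String {
    let h2_hex = format!("{:08x}", h2);
    let (bucket, rest) = h2_hex.split_at(1);
    format!("{}{:08x}\\{}\\{}", prefix, h1, bucket, rest)
}
```

If the inference holds, promoting a journal to a leaf is a rename from `tmp_journal_path(..)` to `leaf_path(..)` for the same hash pair.
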
**Conclusion**: Sylpheed manages its own cache content. The game has both the
read path (sub_82452DC0 dispatcher) and the write path (currently unlocated,
likely in primary-data handlers). The write path is what creates `.tmp` files
and (we infer) calls `NtSetInformationFile` class 10 to rename them to
hierarchical leaves.

## 5. Event-log evidence (Phase A jsonl)

From `xenia-rs/audit-runs/phase-c10-NtQueryFullAttributesFile/ours.jsonl`,
tid=4's cache-build sequence on COLD cache:

| idx | event | path |
|---|---|---|
| 13 | NtOpenFile | `cache:\` (probe mount root) |
| 19 | NtClose | (close root probe) |
| 28 | NtCreateFile | `cache:\access` → returns 0xC0000034 NOT_FOUND on cold |
| 37 | NtCreateFile | `cache:\ignore` → returns 0xC0000034 |
| 46 | NtCreateFile | `cache:\recent` → returns 0xC0000034 |
| 64 | NtCreateFile | `cache:\d4ea4615e46ee8ca.tmp` (flat journal, FILE_CREATE) |
| 69 | NtSetInformationFile | (class TBD; ours emitter doesn't capture info_class) |
| 196 | NtCreateFile | `cache:\d4ea4615` (DIR, post-AUDIT-054) |
| 205 | NtCreateFile | `cache:\d4ea4615\e` (subdir) |
| 214 | NtOpenFile | `cache:\d4ea4615e46ee8ca.tmp` (reopen flat journal) |
| 286 | NtCreateFile | `cache:\69d8e45ce534ffea.tmp` (next flat journal) |
| 325 | NtOpenFile | `cache:\` |
| 409 | NtCreateFile | `cache:\access` (retry) |
| 466 | NtCreateFile | `cache:\69d8e45c` (DIR) |
| 475 | NtCreateFile | `cache:\69d8e45c\e` (subdir) |

Statistics across the 50M window:

* Ours emits 69 `cache:` events on tid=4, plus the main-chain divergent
  events on tid=1.
* Ours emits **111** `NtSetInformationFile` calls; canary emits **0**.
  Canary's cache is warm, so it skips cache-build entirely.

## 6. Persistence experiment

See [persistent-experiment.md](persistent-experiment.md) for the full table
and per-boot cache-content delta. Headline result:

* `XENIA_CACHE_PERSIST=1` + 50M boot 1 (cold): digest
  `instructions=50000003 imports=40485 swaps=1 draws=0`. Differs from C+10
  default-tmpdir baseline (`50000002`, `40465`) by +1 instruction / +20
  imports. Persistent path is slightly different from tmpdir.
* `XENIA_CACHE_PERSIST=1` + 50M boot 2 (warm): same digest. No cxx_throw
  regression at 50M.
* On-disk cache after boot 2: 7 `.tmp` flat journals (grew on each boot from
  +400 B to +114 KB per file); `access`, `ignore`, `recent` as DIRECTORIES (bug
  #2); zero hierarchical leaf files (bug #1 prevents promotion).
* Phase A diff vs canary baseline: matched-prefix on `canary_tid=6 → ours_tid=1`
  main chain = **102404** (unchanged from C+10's default-tmpdir result). Divergence
  at the same `NtQueryFullAttributesFile` return-value (canary=0 SUCCESS,
  ours=0xC0000034 NOT_FOUND).

**Persistence alone does not advance the matched-prefix.** The `.tmp` files
exist but the hierarchical leaf doesn't, so the leaf NtQuery still misses.

## 7. Discipline / methodology checks

* **`--mute=true`**: not used in this session because no canary runs were
  required (the C+10 canary.jsonl was reused as-is for the matched-prefix
  comparison). Future re-baselines under the plan must use `--mute=true`.
* **Binary rename for stop hook**: ours run via `xrs-c10` (pre-existing from
  C+10). No background long-run; the experiments completed in <3 s wall-clock
  on the test host.
* **Reading-error #28** (oracle source supersedes spec): verified canary's
  `NtSetInformationFile` class-10 implementation by reading
  `xboxkrnl_io_info.cc:226-243`; did not assume from docs.
* **No source touched**: this session was read-only-by-design. Plan-mode kept
  the tree clean; the only file-system side effects were Phase A event log
  output to `audit-runs/cache-subsystem-plan/persist-warm-events.jsonl` and
  this directory's deliverables.

## 8. Confidence ratings

| claim | source | confidence |
|---|---|---|
| Bug #1: `nt_set_information_file` class 10 is a no-op stub | direct source read of [exports.rs:1809-1909](xenia-rs/crates/xenia-kernel/src/exports.rs#L1809-L1909) | HIGH |
| Bug #1 prevents .tmp-to-leaf promotion | indirect: ours's cache has .tmp + no leaf; canary's has leaf + no .tmp; canary properly implements class 10 | HIGH (3 independent confirmations) |
| Bug #2: top-level cache files mis-created as directories | direct on-disk observation post-experiment | HIGH |
| Bug #2 root cause: `is_dir_open` discriminator misclassification | source-read inference; not yet instrumented | MEDIUM (Stage 2 instrumentation required) |
| Persistence alone doesn't advance matched-prefix | experimentally verified via diff_events.py | HIGH |
| AUDIT-053 cxx_throw regression not reproduced at 50M | experimentally verified (2 sequential boots, same digest) | MEDIUM (AUDIT-053's regression was at 500M; this window is too short to fully rule it out) |
| Sylpheed has its own cache-build path that already fires in ours | event-log evidence (69 cache: events on tid=4) | HIGH |
| The two engine bugs are the ONLY blockers | inferred from the above; could be additional bugs uncovered post-Stage 1 | MEDIUM (Stages are independently rollback-able; if a Stage doesn't advance matched-prefix, investigate further) |

## 9. Open questions

See plan §"Open questions". Critical ones to resolve during implementation:

1. Confirm via instrumentation that Sylpheed actually calls
   `NtSetInformationFile` class 10 for the .tmp→leaf rename. If it uses a
   different path (NtDeleteFile + NtCreateFile, or some custom flow),
   Stage 1's fix won't fully solve the problem.
2. Confirm via instrumentation whether `cache:\access`/`ignore`/`recent`
   creates have `FILE_DIRECTORY_FILE` set in `create_options`, or whether
   ours's arg-position read is wrong.
3. Validate whether `access` and `recent` manifest contents are deterministic
   byte-for-byte across engines, or whether they include host-allocator
   addresses / timestamps that need diff-tool canonicalization.

## 10. Recommended next session

See plan §"Recommended approach" and §"Implementation stages". Three landable
stages, ~150-200 LOC total, with an expected matched-prefix advance of
hundreds to thousands of events after Stage 3.