handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
audit-runs/cache-subsystem-plan/investigation.md
# Cache subsystem investigation: Phase C+11 planning (2026-05-14)

## Scope

This investigation informs the plan at [plan.md](plan.md). It was run as a
dedicated planning session after Phase C+10 escalated the cache divergence at
idx 102404. Findings are READ-ONLY observations; no source was modified.

## 1. Canary's cache enumeration

Canary's mount is `~/.local/share/Xenia/cache/` (the POSIX `storage_root / "cache"`
convention). Canary's `xenia-canary/src/xenia/app/xenia_main.cc:612-652` registers
three `HostPathDevice` mounts at `\\CACHE0`, `\\CACHE1`, and `\\CACHE`, aliased to
the `cache0:`, `cache1:`, and `cache:` symbolic links.

State at session start: 23 files / 4.8 MB across 16 hash buckets, pre-populated
by many prior canary boots. The full enumeration is in
[canary-cache-listing.csv](canary-cache-listing.csv).

Notable properties:

* **Zero `.tmp` files**: canary's cache holds only resolved hierarchical leaves
  (`<H1>/<X>/<H2>` form) plus two top-level manifests (`access`, `recent`). The
  `.tmp` flat-journal files that Sylpheed uses for staging are renamed or removed
  before they persist.
* Top-level `access` and `recent` are **files**, not directories. Layouts:
  * `access`: 20×12-byte records `(hash1 u32 BE, hash2 u32 BE, refcount u32)`.
    The 240 B file enumerates the 20 known cache entries. There are 23 files on
    disk but only 20 manifest entries, so some on-disk files are not indexed
    (possibly `recent`-only entries or orphans). If the 23-file count includes
    the `access` and `recent` manifests themselves, the gap is one leaf rather
    than three.
  * `recent`: 20×8-byte records `(hash1 u32 BE, hash2 u32 BE)` (160 B), giving
    the recently-used ordering of the same hash pairs.
* Cache content is **game-asset cache**: Shift-JIS Japanese localization text
  (`d4ea4615/e/46ee8ca`, a `[SYSTEM]/[LANGUAGE]/XC_LANGUAGE_*` table);
  `IPFB`-magic binary blobs (a game-asset format, likely font/sprite/level
  data); and large blobs of up to 2.7 MB. This is NOT a shader cache or PSO
  cache.
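
The following is a minimal Rust sketch of a decoder for the two manifests. A decoder like this could help when diffing the manifests across engines (open question 3 in §9). It assumes the record layouts above. These notes do not record the byte order of `refcount`, so big-endian is an assumption here. `AccessRecord`, `parse_access`, and `parse_recent` are hypothetical names, not code from either engine.

```rust
/// One 12-byte record from the top-level `access` manifest.
/// ASSUMPTION: `refcount` is big-endian, like the two hash fields.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct AccessRecord {
    pub hash1: u32,
    pub hash2: u32,
    pub refcount: u32,
}

/// Read a big-endian u32 from the first four bytes of `chunk`.
fn be_u32(chunk: &[u8]) -> u32 {
    u32::from_be_bytes([chunk[0], chunk[1], chunk[2], chunk[3]])
}

/// Decode `access` as a flat array of 12-byte records.
/// Returns `None` if the length is not a whole number of records.
pub fn parse_access(bytes: &[u8]) -> Option<Vec<AccessRecord>> {
    if bytes.len() % 12 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(12)
            .map(|r| AccessRecord {
                hash1: be_u32(&r[0..4]),
                hash2: be_u32(&r[4..8]),
                refcount: be_u32(&r[8..12]),
            })
            .collect(),
    )
}

/// Decode `recent` as a flat array of 8-byte `(hash1, hash2)` pairs,
/// in recently-used order.
pub fn parse_recent(bytes: &[u8]) -> Option<Vec<(u32, u32)>> {
    if bytes.len() % 8 != 0 {
        return None;
    }
    Some(
        bytes
            .chunks_exact(8)
            .map(|r| (be_u32(&r[0..4]), be_u32(&r[4..8])))
            .collect(),
    )
}
```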

## 2. Canary's cache code (xenia-canary)

Mount/init:

* `xenia-canary/src/xenia/app/xenia_main.cc:612-652`: registers the three
  `HostPathDevice` mounts.
* `xenia-canary/src/xenia/base/filesystem_posix.cc:76-97`: POSIX path
  resolution for `storage_root`, via `$XDG_DATA_HOME` and then
  `$HOME/.local/share`.
* `xenia-canary/src/xenia/vfs/devices/host_path_device.cc:31-48`: creates the
  host directory if missing (`std::filesystem::create_directories`). **There is
  no wipe logic anywhere in canary source**, so the cache survives across boots.
* `xenia-canary/src/xenia/vfs/devices/host_path_entry.cc:78-98`:
  `CreateEntryInternal` calls `create_directories(parent)` followed by
  `OpenFile("wb")`.
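
To contrast with our engine's directory handling, here is a rough Rust analogue of the two calls cited for `CreateEntryInternal`. It first creates the parent directory chain, then opens the leaf with `"wb"` semantics. It paraphrases only those two calls and is not a port of canary's code. `create_file_entry` is a hypothetical name. The point it illustrates is that only the ancestors become directories; the leaf is always a regular file.

```rust
use std::fs::{self, File};
use std::io;
use std::path::Path;

/// Rough analogue of `create_directories(parent)` + `OpenFile("wb")`:
/// ensure every ancestor directory exists, then create (or truncate) the
/// leaf itself as a regular file.
pub fn create_file_entry(host_path: &Path) -> io::Result<File> {
    if let Some(parent) = host_path.parent() {
        fs::create_dir_all(parent)?;
    }
    // File::create == fopen(path, "wb"): create if missing, truncate if present.
    File::create(host_path)
}
```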

NT IO handlers:

* `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io.cc:39-111`: `NtCreateFile`
  routes through `FileSystem::OpenFile` with
  `is_directory = (create_options & FILE_DIRECTORY_FILE) != 0` and
  `is_non_directory = (create_options & FILE_NON_DIRECTORY_FILE) != 0`.
* `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io.cc:474-513`:
  `NtQueryFullAttributesFile` returns `X_STATUS_SUCCESS` (0) on a
  `ResolvePath` hit and `X_STATUS_NO_SUCH_FILE` (0xC000000F) on a miss.
* `xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io_info.cc:226-243`:
  **`NtSetInformationFile` class 10 (XFileRenameInformation)** is correctly
  implemented:

```cpp
case XFileRenameInformation: {
  auto info = info_ptr.as<X_FILE_RENAME_INFORMATION*>();
  std::filesystem::path target_path =
      util::TranslateAnsiPath(kernel_memory(), &info->ansi_string);
  if (!IsValidPath(target_path.string(), false)) {
    return X_STATUS_OBJECT_NAME_INVALID;
  }
  if (!target_path.has_filename()) {
    return X_STATUS_INVALID_PARAMETER;
  }
  file->Rename(target_path);
  out_length = sizeof(*info);
  break;
}
```

All file IO is synchronous on the host (`XFile::Write` → `WriteSync` →
`std::fwrite`).

## 3. Our cache code (xenia-rs, current HEAD)

Mount/init:

* `xenia-rs/crates/xenia-kernel/src/state.rs:1235-1273`: `resolve_default_cache_root`
  chooses the root:
  * Default: a per-process tmpdir,
    `std::env::temp_dir()/xenia-rs-cache-{pid}-{counter}`, with `wipe=true`
    (AUDIT-038).
  * `XENIA_CACHE_ROOT=<path>` env: explicit path, no wipe.
  * `XENIA_CACHE_PERSIST=1` (or `true`, case-insensitive):
    `$XDG_DATA_HOME/xenia-rs/cache` or `$HOME/.local/share/xenia-rs/cache`,
    no wipe.
* `xenia-rs/crates/xenia-kernel/src/state.rs:499-510`: `init_cache_root`
  conditionally wipes and recreates the root.
* `xenia-rs/crates/xenia-kernel/src/state.rs:519-554`: `resolve_cache_path`
  does a case-insensitive prefix match on `cache:\`, `cache:/`, `cache0:\`,
  `cache0:/`, `cache1:\`, and `cache1:/`. It normalizes backslashes to forward
  slashes and filters out `..`, `.`, and empty components for traversal safety.
  **It funnels all three devices (cache, cache0, cache1) to a single backing
  root**, unlike canary, which has three separate `HostPathDevice` mounts.
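
The resolution rules above can be restated as a Rust sketch for reasoning about behavior. It is hypothetical and is not the code at `state.rs:519-554` or `state.rs:1235-1273`. The function signatures are invented, and the environment is passed in as a closure so the logic can be tested. Two behaviors are assumptions, because these notes do not state them:

* `XENIA_CACHE_ROOT` takes precedence over `XENIA_CACHE_PERSIST`.
* If neither `$XDG_DATA_HOME` nor `$HOME` is set, resolution falls back to the tmpdir default.

```rust
use std::path::{Path, PathBuf};

/// Device prefixes accepted by the resolver; all map to ONE backing root.
const PREFIXES: [&str; 6] = ["cache:\\", "cache:/", "cache0:\\", "cache0:/", "cache1:\\", "cache1:/"];

/// Map a guest path such as `cache:\d4ea4615\e\46ee8ca` to a host path under
/// `root`. Returns `None` for paths on any other device.
pub fn resolve_cache_path(root: &Path, guest: &str) -> Option<PathBuf> {
    let rest = PREFIXES.iter().find_map(|p| {
        let head = guest.get(..p.len())?;
        // Case-insensitive comparison of the device prefix only.
        head.eq_ignore_ascii_case(p).then(|| &guest[p.len()..])
    })?;
    let mut out = root.to_path_buf();
    // Normalize `\` to `/`, then drop empty, `.` and `..` components so a
    // guest path can never escape `root`.
    for comp in rest.replace('\\', "/").split('/') {
        if !comp.is_empty() && comp != "." && comp != ".." {
            out.push(comp);
        }
    }
    Some(out)
}

/// Pick the cache root and whether it is wiped at init.
/// ASSUMPTION: XENIA_CACHE_ROOT is checked before XENIA_CACHE_PERSIST.
pub fn resolve_default_cache_root(
    env: impl Fn(&str) -> Option<String>,
    pid: u32,
    counter: u64,
) -> (PathBuf, bool) {
    if let Some(p) = env("XENIA_CACHE_ROOT") {
        return (PathBuf::from(p), false);
    }
    let persist = env("XENIA_CACHE_PERSIST")
        .map_or(false, |v| v == "1" || v.eq_ignore_ascii_case("true"));
    if persist {
        let base = env("XDG_DATA_HOME")
            .map(PathBuf::from)
            .or_else(|| env("HOME").map(|h| PathBuf::from(h).join(".local/share")));
        // ASSUMPTION: with neither variable set, fall through to the default.
        if let Some(base) = base {
            return (base.join("xenia-rs/cache"), false);
        }
    }
    // Default: throwaway per-process directory, wiped on init (AUDIT-038).
    (std::env::temp_dir().join(format!("xenia-rs-cache-{pid}-{counter}")), true)
}
```

Because every prefix lands on the same `root`, `cache0:\x` and `cache:\x` resolve to the same host file in this model.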

NT IO handlers:

* `xenia-rs/crates/xenia-kernel/src/exports.rs:1023-1196`: `open_cache_file`.
  AUDIT-054 `FILE_DIRECTORY_FILE`-bit handling is at lines 1041-1051. The
  `is_dir_open` decision is `(create_options & FILE_DIRECTORY_FILE) != 0 ||
  host_path.is_dir() || host_path == state.cache_root.unwrap_or(host_path)`.
  When `cache_root` is `None`, the last term is a tautology: `host_path ==
  host_path` is always true, so every open would be treated as a directory
  open. When `cache_root` is `Some(_)`, the term only matches the mount root
  and is harmless.
* `xenia-rs/crates/xenia-kernel/src/exports.rs:1354-1373`: `nt_create_file`
  reads `create_options` from `sp + 0x54` (per AUDIT-054's `shim_utils.h:49-50`
  citation), with r5 = obj_attrs and r10 = create_disposition.
* `xenia-rs/crates/xenia-kernel/src/exports.rs:1375-1405`: `nt_open_file`
  reads `open_options` from r7 (AUDIT-053's r8→r7 fix, Phase C+5).
* `xenia-rs/crates/xenia-kernel/src/exports.rs:1809-1909`: `nt_set_information_file`
  validates `min_length` for class 10 at line 1822 (`10 => 16`), but **the match
  body at 1847-1905 has no arm for class 10**. The `_ => (STATUS_SUCCESS,
  min_length)` catch-all at line 1904 fires for class 10 and returns success
  without performing the rename. **This is bug #1 in the plan's headline
  finding.**
* `xenia-rs/crates/xenia-kernel/src/exports.rs:1913-1990`:
  `nt_query_full_attributes_file`. The cache short-circuit at lines 1930-1957
  calls `std::fs::metadata(&hp)` directly and returns
  `STATUS_OBJECT_NAME_NOT_FOUND` (0xC0000034) on a miss. This value differs
  from canary's 0xC000000F, but Sylpheed treats the two equivalently.
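
The following Rust sketch shows one possible shape for the missing class-10 arm. It is not the planned Stage 1 patch. It rests on several assumptions:

* The rename target has already been copied out of the info buffer as a string.
* The target goes through a cache-path resolver supplied by the caller.
* The validation order mirrors the canary excerpt in §2.

`rename_cache_entry` and its signature are hypothetical. The status constants use the standard NTSTATUS values.

```rust
use std::fs;
use std::path::{Path, PathBuf};

pub const STATUS_SUCCESS: u32 = 0x0000_0000;
pub const STATUS_UNSUCCESSFUL: u32 = 0xC000_0001;
pub const STATUS_INVALID_PARAMETER: u32 = 0xC000_000D;
pub const STATUS_OBJECT_NAME_INVALID: u32 = 0xC000_0033;

/// Hypothetical body for `NtSetInformationFile` class 10 on a cache handle.
/// `src_host` is the host path behind the open handle; `target_guest` is the
/// ANSI target name, ASSUMED to be already read from the info buffer.
pub fn rename_cache_entry(
    src_host: &Path,
    target_guest: &str,
    resolve: impl Fn(&str) -> Option<PathBuf>,
) -> u32 {
    // Stand-in for canary's `IsValidPath`: here "valid" only means the
    // target resolves inside the cache root (ASSUMPTION).
    let target_host = match resolve(target_guest) {
        Some(p) => p,
        None => return STATUS_OBJECT_NAME_INVALID,
    };
    // Mirrors canary's `has_filename()`: a trailing separator is rejected.
    let last = target_guest
        .rsplit(|c: char| c == '\\' || c == '/')
        .next()
        .unwrap_or("");
    if last.is_empty() {
        return STATUS_INVALID_PARAMETER;
    }
    // ASSUMPTION (from the idx 196/205 directory creates in §5): the guest
    // creates the `<H1>\<X>` directories itself, so none are created here.
    match fs::rename(src_host, &target_host) {
        Ok(()) => STATUS_SUCCESS,
        // Canary ignores the result of `file->Rename`; reporting host errors
        // as STATUS_UNSUCCESSFUL is a choice made for this sketch only.
        Err(_) => STATUS_UNSUCCESSFUL,
    }
}
```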

C+10 emitter extension:

* `xenia-rs/crates/xenia-kernel/src/state.rs:657-687`: `call_export`
  dispatches by name to `object_attributes_raw_name` (path.rs:109-115) for the
  four exports that take an `OBJECT_ATTRIBUTES*`: NtQueryFullAttributesFile (r3),
  NtOpenSymbolicLinkObject (r4), NtCreateFile (r5), and NtOpenFile (r5). It then
  calls `emit_kernel_call_with_path` (event_log.rs:202-229). This path is not
  wired for NtSetInformationFile, whose info buffer carries the path rather than
  an `OBJECT_ATTRIBUTES`.
  **Stage 1 of the plan extends this dispatch to class-10 rename targets.**
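
The name-based dispatch above, together with the Stage 1 extension, amounts to a small lookup table. The Rust sketch below is illustrative only. `PathSource` and `path_source` are hypothetical names. How Stage 1 will obtain the information class from the call is not specified in these notes, so the sketch takes the class as a parameter.

```rust
/// Where the emitter would find the guest path for a given export.
#[derive(Debug, PartialEq, Eq)]
pub enum PathSource {
    /// An `OBJECT_ATTRIBUTES*` held in this PPC argument register.
    ObjectAttributes(u8),
    /// The class-10 rename info buffer (hypothetical Stage 1 addition).
    RenameInfo,
}

/// `info_class` is only meaningful for NtSetInformationFile.
pub fn path_source(export: &str, info_class: Option<u32>) -> Option<PathSource> {
    match (export, info_class) {
        ("NtQueryFullAttributesFile", _) => Some(PathSource::ObjectAttributes(3)),
        ("NtOpenSymbolicLinkObject", _) => Some(PathSource::ObjectAttributes(4)),
        ("NtCreateFile", _) | ("NtOpenFile", _) => Some(PathSource::ObjectAttributes(5)),
        // Only the rename class carries a path worth logging.
        ("NtSetInformationFile", Some(10)) => Some(PathSource::RenameInfo),
        _ => None,
    }
}
```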

Tests:

* `xenia-rs/crates/xenia-kernel/src/exports.rs:6830-6980`: five cache-specific
  tests (`cache_create_write_read_roundtrip`, `cache_file_create_collision`,
  `cache_file_open_missing`, `cache_root_cleared_on_init`,
  `cache_resolve_strips_path_traversal`), plus three async/sync file tests.
* No test covers `NtSetInformationFile` class 10. **Stage 1 of the plan adds
  this test.**

## 4. Sylpheed's cache code (guest PPC binary)

Disassembly of the cache-fallback dispatcher chain (via xenia-rs disasm +
sylpheed.db):

* **`sub_82452DC0`** (PC 0x82452DC0–0x82453024): high-level dispatcher.
  * 0x82452DEC: tries primary data via `sub_82452068` + `sub_82452200`.
  * 0x82452E08: checks `r3 == 0`. On not-found, branches to the cache fallback
    at 0x82452E1C.
  * 0x82452E1C: calls the cache gate `sub_8245B000`.
  * 0x82452E28: if the cache returns 0 (miss), branches to 0x82452E88 (skip cache).
  * 0x82452E30: on a cache hit, calls the callback `sub_8245B078`.
* **`sub_8245B000`** (cache gate): validates the hash key, then calls `sub_8245AD00`.
* **`sub_8245AD00`** (cache query): formats the path via `sub_82459130`
  (sprintf `cache:\<H1>\<X>\<H2>`), then queries it via `sub_82612A78` (the
  NtQueryFullAttributesFile wrapper). On a miss (`r3 == -1` at 0x8245AD90) it
  branches to the failure path at 0x8245ADFC. On a hit it enters a critical
  section and calls `sub_8245B1F8` (the cache reader).
* **`sub_82459130`** (path formatter): pure sprintf, no cache write.
* **`sub_82612A78`** (NtQueryFullAttributesFile wrapper): wraps the kernel
  import and converts an error STATUS to -1.
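
The control flow above can be paraphrased in Rust to make the miss handling explicit. This is a reading aid reconstructed from the disassembly notes, not decompiled code. The closures stand in for `sub_82452068`/`sub_82452200`, `sub_8245B000`, and `sub_8245B078`. The wrapper's error test is an assumption about how `sub_82612A78` decides what counts as an error: it treats any status that is negative as an `i32` as a failure, in the style of `NT_SUCCESS`. Under that assumption, canary's 0xC000000F and our 0xC0000034 both collapse to -1. That outcome is consistent with the §3 observation that Sylpheed treats the two codes the same.

```rust
/// Paraphrase of `sub_82612A78`: kernel STATUS -> 0 (ok) or -1 (error).
/// ASSUMPTION: any status with the high (severity) bit set is an error.
pub fn query_wrapper(status: u32) -> i32 {
    if (status as i32) < 0 { -1 } else { 0 }
}

/// Paraphrase of `sub_82452DC0`: primary data first, cache as fallback.
pub fn dispatch<T>(
    primary: impl FnOnce() -> Option<T>,    // sub_82452068 + sub_82452200
    cache_gate: impl FnOnce() -> Option<T>, // sub_8245B000 -> sub_8245AD00
    on_cache_hit: impl FnOnce(&T),          // sub_8245B078
) -> Option<T> {
    if let Some(v) = primary() {
        return Some(v);
    }
    // Not found in primary data: try the cache (0x82452E1C).
    match cache_gate() {
        Some(v) => {
            on_cache_hit(&v); // 0x82452E30
            Some(v)
        }
        // 0x82452E28 -> 0x82452E88; what the game does next is not covered
        // by these notes, so the sketch just reports "nothing found".
        None => None,
    }
}
```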

**The cache-write path was NOT located in `sub_82452DC0`'s disassembly.** The
dispatcher agent reported no NtCreateFile in the miss branch. The cache build
most likely fires from a different code path, probably inside
`sub_82452068`/`sub_82452200` (the "primary data" handlers), which on first
access compute the data AND write it to the cache.

Sylpheed binary string references (all confirmed via a text search of the .pe):

* `cache:\access` at 0x820B5794
* `cache:\recent` at 0x820B5774
* `cache:\ignore` at 0x820B5784
* `cache:\*.tmp` at 0x820B5764
* `cache:\` at 0x820B57A4
* `%s%08x%08x.tmp` at 0x820B57AC (the format string for the `cache:\<H1><H2>.tmp`
  flat journal)
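
A short Rust sketch can relate the two naming schemes. `journal_path` follows the `%s%08x%08x.tmp` format string directly, assuming `cache:\` is the `%s` argument.

`leaf_path` is inferred from observed data rather than from disassembly. Two pairs support the inference:

* `d4ea4615e46ee8ca.tmp` pairs with the leaf `d4ea4615/e/46ee8ca` (§1) and with the `cache:\d4ea4615\e` directory create (§5).
* `69d8e45ce534ffea.tmp` pairs with `cache:\69d8e45c\e`.

Both pairs fit `<X>` being the first hex digit of H2, with the leaf name being H2's remaining seven digits. Both function names are hypothetical.

```rust
/// Flat staging journal name, per the `%s%08x%08x.tmp` format string.
/// ASSUMPTION: the `%s` argument is the `cache:\` device root.
pub fn journal_path(h1: u32, h2: u32) -> String {
    format!("cache:\\{:08x}{:08x}.tmp", h1, h2)
}

/// Hierarchical leaf `cache:\<H1>\<X>\<rest of H2>`.
/// INFERRED from two observed pairs: <X> is H2's first hex digit and the
/// leaf name is H2's remaining seven hex digits.
pub fn leaf_path(h1: u32, h2: u32) -> String {
    let h2_hex = format!("{:08x}", h2);
    let (x, rest) = h2_hex.split_at(1);
    format!("cache:\\{:08x}\\{}\\{}", h1, x, rest)
}
```

If the inference holds, a class-10 rename from `journal_path(h1, h2)` to `leaf_path(h1, h2)` is exactly the `.tmp`-to-leaf promotion that bug #1 currently drops.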

**Conclusion**: Sylpheed manages its own cache content. The game has both the
read path (the `sub_82452DC0` dispatcher) and the write path (currently
unlocated, likely in the primary-data handlers). The write path is what
creates the `.tmp` files and (we infer) calls `NtSetInformationFile` class 10
to rename them to hierarchical leaves.

## 5. Event-log evidence (Phase A jsonl)

From `xenia-rs/audit-runs/phase-c10-NtQueryFullAttributesFile/ours.jsonl`,
tid=4's cache-build sequence on a COLD cache:

| idx | event | path |
|---|---|---|
| 13 | NtOpenFile | `cache:\` (probe mount root) |
| 19 | NtClose | (close root probe) |
| 28 | NtCreateFile | `cache:\access` → returns 0xC0000034 NOT_FOUND on cold |
| 37 | NtCreateFile | `cache:\ignore` → returns 0xC0000034 |
| 46 | NtCreateFile | `cache:\recent` → returns 0xC0000034 |
| 64 | NtCreateFile | `cache:\d4ea4615e46ee8ca.tmp` (flat journal, FILE_CREATE) |
| 69 | NtSetInformationFile | (class TBD; our emitter doesn't capture info_class) |
| 196 | NtCreateFile | `cache:\d4ea4615` (DIR, post-AUDIT-054) |
| 205 | NtCreateFile | `cache:\d4ea4615\e` (subdir) |
| 214 | NtOpenFile | `cache:\d4ea4615e46ee8ca.tmp` (reopen flat journal) |
| 286 | NtCreateFile | `cache:\69d8e45ce534ffea.tmp` (next flat journal) |
| 325 | NtOpenFile | `cache:\` |
| 409 | NtCreateFile | `cache:\access` (retry) |
| 466 | NtCreateFile | `cache:\69d8e45c` (DIR) |
| 475 | NtCreateFile | `cache:\69d8e45c\e` (subdir) |

Statistics across the 50M window:

* Ours emits 69 `cache:` events on tid=4, plus the main-chain divergent
  events on tid=1.
* Ours emits **111** `NtSetInformationFile` calls; canary emits **0**.
  Canary's cache is warm, so it skips the cache build entirely.

## 6. Persistence experiment

See [persistent-experiment.md](persistent-experiment.md) for the full table
and the per-boot cache-content delta. Headline results:

* `XENIA_CACHE_PERSIST=1` + 50M boot 1 (cold): digest
  `instructions=50000003 imports=40485 swaps=1 draws=0`. This differs from the C+10
  default-tmpdir baseline (`50000002`, `40465`) by +1 instruction and +20
  imports, so the persistent path behaves slightly differently from tmpdir.
* `XENIA_CACHE_PERSIST=1` + 50M boot 2 (warm): same digest. No cxx_throw
  regression at 50M.
* On-disk cache after boot 2: 7 `.tmp` flat journals, which grew on each boot
  (per-file growth ranged from +400 B to +114 KB); `access`, `ignore`, and
  `recent` exist as DIRECTORIES (bug #2); and there are zero hierarchical leaf
  files (bug #1 prevents promotion).
* Phase A diff vs the canary baseline: the matched prefix on the
  `canary_tid=6 → ours_tid=1` main chain is **102404**, unchanged from C+10's
  default-tmpdir result. The divergence is at the same
  `NtQueryFullAttributesFile` return value (canary=0 SUCCESS,
  ours=0xC0000034 NOT_FOUND).

**Persistence alone does not advance the matched prefix.** The `.tmp` files
exist but the hierarchical leaf doesn't, so the leaf NtQuery still misses.
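
The matched-prefix figure is, in essence, the length of the longest common prefix of two per-thread event sequences. The Rust sketch below assumes events are compared on export name, guest path, and return value. The `Ev` struct and its fields are hypothetical. They do not reflect the real jsonl schema or how `diff_events.py` canonicalizes events.

```rust
/// Hypothetical comparison key for one kernel-call event.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Ev {
    pub name: String,
    pub path: Option<String>,
    pub ret: u32,
}

/// Number of leading events on which the two chains agree. The first
/// divergence, if any, is at the returned index.
pub fn matched_prefix(canary: &[Ev], ours: &[Ev]) -> usize {
    canary.iter().zip(ours).take_while(|(a, b)| a == b).count()
}
```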

## 7. Discipline / methodology checks

* **`--mute=true`**: not used in this session because no canary runs were
  required (the C+10 canary.jsonl was reused as-is for the matched-prefix
  comparison). Future re-baselines under the plan must use `--mute=true`.
* **Binary rename for stop hook**: ours was run via `xrs-c10` (pre-existing
  from C+10). There was no background long run; the experiments completed in
  <3 s wall-clock on the test host.
* **Reading-error #28** (oracle source supersedes spec): canary's
  `NtSetInformationFile` class-10 implementation was verified by reading
  `xboxkrnl_io_info.cc:226-243`, not assumed from docs.
* **No source touched**: this session was read-only by design. Plan mode kept
  the tree clean; the only filesystem side effects were the Phase A event-log
  output to `audit-runs/cache-subsystem-plan/persist-warm-events.jsonl` and
  this directory's deliverables.

## 8. Confidence ratings

| claim | source | confidence |
|---|---|---|
| Bug #1: `nt_set_information_file` class 10 is a no-op stub | direct source read of [exports.rs:1809-1909](xenia-rs/crates/xenia-kernel/src/exports.rs#L1809-L1909) | HIGH |
| Bug #1 prevents .tmp-to-leaf promotion | indirect: our cache has .tmp files and no leaf; canary's has leaves and no .tmp; canary implements class 10 properly | HIGH (3 independent confirmations) |
| Bug #2: top-level cache files mis-created as directories | direct on-disk observation post-experiment | HIGH |
| Bug #2 root cause: `is_dir_open` discriminator misclassification | source-read inference; not yet instrumented | MEDIUM (Stage 2 instrumentation required) |
| Persistence alone doesn't advance the matched prefix | experimentally verified via diff_events.py | HIGH |
| AUDIT-053 cxx_throw regression not reproduced at 50M | experimentally verified (2 sequential boots, same digest) | MEDIUM (AUDIT-053's regression was at 500M; this window is too short to fully rule it out) |
| Sylpheed has its own cache-build path that already fires in ours | event-log evidence (69 `cache:` events on tid=4) | HIGH |
| The two engine bugs are the ONLY blockers | inferred from the above; additional bugs could surface after Stage 1 | MEDIUM (Stages are independently rollback-able; if a Stage doesn't advance the matched prefix, investigate further) |

## 9. Open questions

See plan §"Open questions". Critical ones to resolve during implementation:

1. Confirm via instrumentation that Sylpheed actually calls
   `NtSetInformationFile` class 10 for the .tmp→leaf rename. If it uses a
   different path (NtDeleteFile + NtCreateFile, or some custom flow),
   Stage 1's fix won't fully solve the problem.
2. Confirm via instrumentation whether the `cache:\access`/`ignore`/`recent`
   creates have `FILE_DIRECTORY_FILE` set in `create_options`, or whether
   our arg-position read is wrong.
3. Validate whether the `access` and `recent` manifest contents are
   byte-for-byte deterministic across engines, or whether they include
   host-allocator addresses / timestamps that need diff-tool canonicalization.
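
For question 2, it helps to spell out the two decision rules side by side. The Rust sketch below contrasts canary's flag-only rule (§2) with the three-term `is_dir_open` expression quoted in §3. The flag values are the standard NT `FILE_DIRECTORY_FILE` (0x1) and `FILE_NON_DIRECTORY_FILE` (0x40). `canary_is_directory` and `ours_is_dir_open` are hypothetical names. The root term is written with `Option::map_or` to expose the `None` tautology, not copied from `exports.rs`.

```rust
use std::path::Path;

pub const FILE_DIRECTORY_FILE: u32 = 0x0000_0001;
pub const FILE_NON_DIRECTORY_FILE: u32 = 0x0000_0040;

/// Canary (`xboxkrnl_io.cc:39-111`): decided only by the guest's bits.
pub fn canary_is_directory(create_options: u32) -> bool {
    (create_options & FILE_DIRECTORY_FILE) != 0
}

/// Ours, per the quoted `is_dir_open` expression. `host_is_dir` stands in
/// for `host_path.is_dir()` so the rule can be evaluated without a disk.
pub fn ours_is_dir_open(
    create_options: u32,
    host_is_dir: bool,
    host_path: &Path,
    cache_root: Option<&Path>,
) -> bool {
    (create_options & FILE_DIRECTORY_FILE) != 0
        || host_is_dir
        // Equivalent to `host_path == cache_root.unwrap_or(host_path)`:
        // always true when cache_root is None.
        || cache_root.map_or(true, |root| root == host_path)
}
```

Consider a cold cache, where `host_is_dir` is false. In this model, a create of `cache:\access` becomes a directory open only in two cases: the `FILE_DIRECTORY_FILE` bit reads as set (the question above), or the root term fires because `cache_root` is `None`.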

## 10. Recommended next session

See plan §"Recommended approach" and §"Implementation stages": three landable
stages, ~150-200 LOC in total, with an expected matched-prefix advance of
hundreds to thousands of events after Stage 3.