Phase C+10 — NtQueryFullAttributesFile — Investigation
Phase 1: Emitter extension (LANDED)
Problem
C+9 left the divergence with no resolved path string:
canary[6][102404] kernel.return NtQueryFullAttributesFile return_value=0
ours [1][102404] kernel.return NtQueryFullAttributesFile return_value=0xC0000034
payload.args and payload.args_resolved were both empty objects.
We had no way to identify WHICH file the engine was querying.
Shape of the fix
Schema v1 already declares args_resolved as a free-form object
attached to kernel.call (schema-v1.md:108-117), and the existing
example explicitly shows {"path":"..."}. The emitter just wasn't
populating it. Extension is pure schema-v1 compliance, no version
bump.
Ours-side (event_log.rs / path.rs / state.rs)
- Added event_log::emit_kernel_call_with_path(tid, cycle, name, Option<&str>): same byte format as emit_kernel_call, but when path is Some(non_empty) it emits args_resolved:{"path":"..."}. When None or empty, it degrades to the existing args_resolved:{} form so unrelated exports' output is byte-identical to pre-extension.
- Added path::object_attributes_raw_name(mem, ptr) -> Option<String>: returns the RAW path string (trimmed of whitespace, NO prefix-strip, no case-fold) so the diff surfaces upstream prefix-form differences instead of masking them via normalization. Pre-existing object_attributes_to_vfs_path (which DOES normalize) is kept as-is for VFS lookup callers; the emitter uses the new raw helper.
- state.rs::call_export, inside the phase_a_on guarded block: a new match on name resolves OBJECT_ATTRIBUTES* from the right GPR position. Argument positions verified against canary's xboxkrnl/xboxkrnl_io.cc signatures:
  - NtQueryFullAttributesFile → r3 = obj_attrs
  - NtOpenSymbolicLinkObject → r4 = obj_attrs
  - NtCreateFile, NtOpenFile → r5 = obj_attrs

  It then calls emit_kernel_call_with_path(..., resolved.as_deref()) instead of emit_kernel_call(...). All other exports fall through to None and the legacy form.
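The register selection and the degrade-to-legacy rule described above can be sketched as follows. This is a minimal illustration, not the in-tree code: obj_attrs_gpr and args_resolved_fragment are hypothetical names, and the JSON escaping shown is an assumption about what a well-formed emitter has to do with the backslashes in guest paths.

```rust
/// Hypothetical helper: which GPR carries the OBJECT_ATTRIBUTES pointer for a
/// given export, using the positions listed above. Exports that are not
/// listed return None and keep the legacy (empty) args_resolved form.
fn obj_attrs_gpr(export: &str) -> Option<usize> {
    match export {
        "NtQueryFullAttributesFile" => Some(3),
        "NtOpenSymbolicLinkObject" => Some(4),
        "NtCreateFile" | "NtOpenFile" => Some(5),
        _ => None,
    }
}

/// Hypothetical helper: render only the args_resolved fragment of a
/// kernel.call event. None or an empty path yields the legacy `{}` form.
fn args_resolved_fragment(path: Option<&str>) -> String {
    match path {
        Some(p) if !p.is_empty() => {
            // Assumption: minimal JSON string escaping (backslash, quote,
            // control characters) so a raw guest path stays valid JSON.
            let mut escaped = String::with_capacity(p.len() + 8);
            for c in p.chars() {
                match c {
                    '\\' => escaped.push_str("\\\\"),
                    '"' => escaped.push_str("\\\""),
                    c if (c as u32) < 0x20 => {
                        escaped.push_str(&format!("\\u{:04x}", c as u32))
                    }
                    c => escaped.push(c),
                }
            }
            format!("\"args_resolved\":{{\"path\":\"{}\"}}", escaped)
        }
        _ => "\"args_resolved\":{}".to_string(),
    }
}
```

Keeping the legacy branch as a fixed string is what makes the extension byte-identical for every export that does not resolve a path.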
Canary-side (event_log.h / event_log.cc / util/shim_utils.h)
- event_log.h: declared EmitKernelCallWithPath(name, path).
- event_log.cc: implemented the same as ours (degrades to the legacy form for an empty path).
- event_log.cc::phase_a_bridge::EmitImportAndCallWithCtx(module, ord, name, ppc_context): new bridge function. PPCContext is passed as void* to keep the header's transitive include footprint small (the bridge .cc reinterprets it to PPCContext* internally). Inside the bridge, the helper ReadObjectAttributesRawName(ptr) reads X_OBJECT_ATTRIBUTES.name_ptr, then the X_ANSI_STRING bytes directly out of guest memory (no util::TranslateAnsiPath normalization). It trims whitespace and trailing NULs to match ours's semantics byte-for-byte.
- util/shim_utils.h: both export trampolines (X::Trampoline / Y::Trampoline) switched the phase_a_bridge::EmitImportAndCall call to phase_a_bridge::EmitImportAndCallWithCtx, passing the existing ppc_context argument that is already in scope. The legacy EmitImportAndCall stays declared and defined for any future callers that don't have a PPCContext.
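Because both sides must agree byte-for-byte, the raw-name read is worth spelling out. The sketch below expresses it in Rust over a flat byte slice standing in for guest memory. The function names, the flat-slice memory model, and the struct offsets (X_OBJECT_ATTRIBUTES with name_ptr at +4; X_ANSI_STRING with a u16 length at +0 and a buffer pointer at +4, all big-endian) are assumptions made for illustration and should be checked against the real headers.

```rust
/// Read a big-endian u16 from a flat guest-memory slice (assumed model).
fn be_u16(mem: &[u8], addr: usize) -> Option<u16> {
    let b = mem.get(addr..addr.checked_add(2)?)?;
    Some(u16::from_be_bytes([b[0], b[1]]))
}

/// Read a big-endian u32 from a flat guest-memory slice (assumed model).
fn be_u32(mem: &[u8], addr: usize) -> Option<u32> {
    let b = mem.get(addr..addr.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical raw-name reader. Assumed layouts:
///   X_OBJECT_ATTRIBUTES: +0 root_directory, +4 name_ptr, +8 attributes
///   X_ANSI_STRING:       +0 length (u16), +2 maximum_length (u16), +4 pointer
/// No prefix stripping and no case folding: only trailing NULs and
/// surrounding whitespace are trimmed.
fn read_object_attributes_raw_name(mem: &[u8], obj_attrs: usize) -> Option<String> {
    if obj_attrs == 0 {
        return None;
    }
    let name_ptr = be_u32(mem, obj_attrs.checked_add(4)?)? as usize;
    if name_ptr == 0 {
        return None;
    }
    let len = be_u16(mem, name_ptr)? as usize;
    let buf = be_u32(mem, name_ptr.checked_add(4)?)? as usize;
    let bytes = mem.get(buf..buf.checked_add(len)?)?;
    // Map each byte to the code point of the same value so no byte is lost
    // to UTF-8 validation.
    let raw: String = bytes.iter().map(|&b| b as char).collect();
    let trimmed = raw
        .trim_end_matches(|c: char| c == '\0' || c.is_whitespace())
        .trim_start();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}
```

Returning None for a null pointer, an out-of-range read, or an empty string keeps the caller on the legacy args_resolved:{} path rather than emitting a partial value.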
Verification
- Build both engines clean.
- Determinism 3x: digest md5 = b8fa0e0460359a4f660adb7605e053de (identical to C+9 baseline; the extension is cvar-OFF zero-cost).
- Phase A emitter determinism 2x: det-fields md5 = 7489e90e… byte identical. (Different from C+9's 0b299c37… because the path field IS in the deterministic signature, but stable across runs.)
Phase 2: Re-run + capture path string
After the extension, both engines emit the path at
kernel.call.args_resolved.path:
canary[6][102403] NtQueryFullAttributesFile path = "cache:\d4ea4615\e\46ee8ca"
ours [1][102403] NtQueryFullAttributesFile path = "cache:\d4ea4615\e\46ee8ca"
Both engines query the same path. No upstream divergence — the ANSI_STRING content matches byte-for-byte.
Phase 3: Why does ours say NOT_FOUND?
Trace through ours's nt_query_full_attributes_file (exports.rs:1913-1990):
- Read OBJECT_ATTRIBUTES → path = "cache:/d4ea4615/e/46ee8ca" (after normalize_path).
- state.resolve_cache_path(&path) returns Some(<temp_dir>/xenia-rs-cache-<pid>-0/d4ea4615/e/46ee8ca).
- std::fs::metadata(host_path) returns Err(NotFound).
- Return STATUS_OBJECT_NAME_NOT_FOUND (0xC0000034).
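The four steps above reduce to a small lookup. The sketch below is a simplified stand-in, assuming a hypothetical resolve_cache_path that only swaps separators and strips the cache: prefix; the real normalize_path and resolve_cache_path may do more. It shows why an empty cache root always produces 0xC0000034.

```rust
use std::path::{Path, PathBuf};

const STATUS_SUCCESS: u32 = 0;
const STATUS_OBJECT_NAME_NOT_FOUND: u32 = 0xC000_0034;

/// Hypothetical stand-in for normalize_path + resolve_cache_path: map
/// "cache:\a\b\c" onto <cache_root>/a/b/c. Non-cache paths yield None.
fn resolve_cache_path(cache_root: &Path, guest_path: &str) -> Option<PathBuf> {
    let normalized = guest_path.replace('\\', "/");
    let rest = normalized.strip_prefix("cache:")?;
    let mut host = cache_root.to_path_buf();
    for part in rest.split('/').filter(|p| !p.is_empty()) {
        // Refuse path traversal rather than escaping the cache root.
        if part == "." || part == ".." {
            return None;
        }
        host.push(part);
    }
    Some(host)
}

/// Sketch of the status decision: the result depends only on whether the
/// host file exists under the cache root at call time.
fn query_full_attributes_status(cache_root: &Path, guest_path: &str) -> u32 {
    let Some(host) = resolve_cache_path(cache_root, guest_path) else {
        return STATUS_OBJECT_NAME_NOT_FOUND;
    };
    match std::fs::metadata(&host) {
        Ok(_) => STATUS_SUCCESS,
        // Assumption: every metadata failure is reported as not-found here.
        Err(_) => STATUS_OBJECT_NAME_NOT_FOUND,
    }
}
```

With this shape, the same guest path flips from 0xC0000034 to 0 as soon as the host file exists, which is exactly the cold-versus-warm difference between the two engines.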
The host path doesn't exist because ours's init_cache_root
(state.rs:499-510) clears the cache directory on every boot
(AUDIT-038 line: per-process tmpdir + full wipe so two consecutive
runs see byte-identical initial state).
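A minimal sketch of the wipe-on-boot policy, assuming hypothetical signatures for cache_root_for and init_cache_root (the in-tree init_cache_root may differ). The directory name pattern is copied from the host path seen in the trace above, and the persist flag stands in for the XENIA_CACHE_PERSIST=1 opt-in.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical: per-process cache root, following the
/// "xenia-rs-cache-<pid>-0" pattern seen in the resolved host path.
fn cache_root_for(base: &Path, pid: u32) -> PathBuf {
    base.join(format!("xenia-rs-cache-{}-0", pid))
}

/// Hypothetical: prepare the cache root. With persist == false the directory
/// is wiped first, so every boot starts from an identical empty state.
fn init_cache_root(root: &Path, persist: bool) -> io::Result<()> {
    if !persist && root.exists() {
        fs::remove_dir_all(root)?;
    }
    fs::create_dir_all(root)
}
```

A caller would typically pass std::env::temp_dir() and std::process::id() to cache_root_for; the wipe is what guarantees determinism and also what guarantees a cold cache.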
Why does canary's NOT fail?
xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io.cc:474-513:
- Read OBJECT_ATTRIBUTES → target_path via TranslateAnsiPath.
- kernel_state()->file_system()->ResolvePath(target_path).
- If entry found, populate file_info, return X_STATUS_SUCCESS.
- Else return X_STATUS_NO_SUCH_FILE (0xC000000F).
Canary returns 0 → entry was found. Canary's cache mount is at
/home/fabi/.local/share/Xenia/cache/ (a persistent host directory
populated over prior boots).
Verification of canary's cache state
$ ls /home/fabi/.local/share/Xenia/cache/d4ea4615/e/
-rw-rw-r-- 1 fabi fabi 400 May 11 21:01 46ee8ca
Single 400-byte file. Total cache: 23 files, ~5 MB across 16 distinct top-level hash directories.
Sibling-cache observations
ours.jsonl shows the SAME NtQueryFullAttributesFile fires for
multiple cache paths within the 50M window — all returning
0xC0000034. Example: idx 103810 queries
cache:\69d8e45c\8\3421153. So the divergence is not a single
missing file but a class of 16+ missing hashes.
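To size the class of missing entries, the failing paths can be grouped by their top-level hash directory. The sketch below assumes the (path, return value) pairs have already been extracted from ours.jsonl by some other means; both function names are hypothetical.

```rust
use std::collections::BTreeSet;

/// Hypothetical: "cache:\69d8e45c\8\3421153" -> Some("69d8e45c").
/// Accepts either separator; non-cache paths yield None.
fn top_level_hash(path: &str) -> Option<&str> {
    let rest = path.strip_prefix("cache:")?;
    rest.split(|c: char| c == '\\' || c == '/')
        .find(|s| !s.is_empty())
}

/// Hypothetical: collect the distinct top-level hashes among queries that
/// returned STATUS_OBJECT_NAME_NOT_FOUND (0xC0000034).
fn distinct_missing_hashes<'a>(
    events: impl IntoIterator<Item = (&'a str, u32)>,
) -> BTreeSet<&'a str> {
    events
        .into_iter()
        .filter(|&(_, status)| status == 0xC000_0034)
        .filter_map(|(path, _)| top_level_hash(path))
        .collect()
}
```

The size of the returned set is the number that matters for the scope decision: one entry would suggest a single missing file, many entries point at cache state as a whole.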
Phase 4: Classification + scope decision
Per the plan, the classes are:
- (A) Missing file — a single plant fixes it (small).
- (B) Path-normalization bug — string operation (small).
- (C) VFS mount missing — add the mount (small-medium).
- (D) Subsystem-required — STFS or similar — ESCALATE.
- (E) Upstream divergence — walk back.
This is NOT (B) — both engines normalize identically (verified by matching args_resolved.path).
This is NOT (E) — upstream is bit-identical for 102,403 events.
This is NOT (A) for any single file — the game queries 16+ distinct cache hashes; planting one only postpones the divergence.
This is closest to a hybrid (C+D):
- (C)-ish: canary's cache MOUNT resolves to a populated host dir; ours's mount resolves to a wiped tmp dir.
- (D)-ish: canary's cache is populated because it ran the game
before and the game built the cache. To match canary's state
on a fresh boot, we either:
- implement the game's cache-build logic (subsystem),
- copy canary's pre-built cache (oracle state — AUDIT-038 violation),
- or accept that ours runs cold and the divergence is a fundamental cold-vs-warm asymmetry.
AUDIT-053 cross-check (warm-start regression risk)
Per AUDIT-053 memo:
Phase 2 permanent fix REVERTED — warm-start regression from VFS layout aliasing:
open_cache_file treats all NtCreateFile as files, but cache:\d4ea4615 disp=CREATE is meant as a DIRECTORY.
AUDIT-054 fixed that specific aliasing (FILE_DIRECTORY_FILE bit
threading). But there's still the AUDIT-053 secondary concern:
Sylpheed's cache:\<hash>.tmp journal-style writes append on each
boot — making naive persistence self-inconsistent across boots.
Whether AUDIT-054's fix fully unblocks persistence is NOT RE-VERIFIED in this session. Re-testing the AUDIT-053 regression under AUDIT-054's fix-in-tree is itself a follow-up.
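For context, the file-versus-directory aliasing that AUDIT-054 addressed can be illustrated with a small sketch. This is not the AUDIT-054 patch: entry_kind and create_cache_entry are hypothetical names, and the only external fact relied on is that the NT CreateOptions flag FILE_DIRECTORY_FILE has the value 0x00000001.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// NT CreateOptions flag requesting that the object be a directory.
const FILE_DIRECTORY_FILE: u32 = 0x0000_0001;

#[derive(Debug, PartialEq, Eq)]
enum CacheEntryKind {
    File,
    Directory,
}

/// Hypothetical: decide what a create request means from its CreateOptions.
fn entry_kind(create_options: u32) -> CacheEntryKind {
    if create_options & FILE_DIRECTORY_FILE != 0 {
        CacheEntryKind::Directory
    } else {
        CacheEntryKind::File
    }
}

/// Hypothetical: materialize the entry on the host. Creating a hash
/// directory as a plain file would make every later child path fail.
fn create_cache_entry(host: &Path, create_options: u32) -> io::Result<CacheEntryKind> {
    match entry_kind(create_options) {
        CacheEntryKind::Directory => {
            fs::create_dir_all(host)?;
            Ok(CacheEntryKind::Directory)
        }
        CacheEntryKind::File => {
            if let Some(parent) = host.parent() {
                fs::create_dir_all(parent)?;
            }
            fs::File::create(host)?;
            Ok(CacheEntryKind::File)
        }
    }
}
```

The sketch covers only the aliasing concern; the journal-style .tmp append behavior is a separate problem that this distinction does not solve.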
Scope per user direction
User said:
If the fix requires major VFS work, STFS subsystem implementation, or cache-population infrastructure: ESCALATE.
Choices 2-4 from escalation.md all qualify as "cache-population
infrastructure":
- Choice 1 (single file plant) won't solve the problem (16+ hashes).
- Choice 2 (seed from canary) is oracle state + warm-start regression risk per AUDIT-053.
- Choice 3 (synthesize cache reads) is multi-export semantic-change.
- Choice 4 (build cache from scratch) is a full subsystem.
ESCALATION declared. Phase 1 emitter extension landed as the session's permanent infrastructure contribution.
Discipline check
- Reading-error #28 (canary source-of-truth): verified canary's actual NtQueryFullAttributesFile_entry body (xboxkrnl_io.cc:474-513), did not assume.
- Reading-error #23 (downstream regression): no fix landed, so no regression risk. Emitter extension is cvar-OFF zero-cost.
- Escalation discipline: triggered cleanly; explicit memo; contributing infrastructure (emitter path resolution) kept.
- Path encoding: ANSI_STRING raw bytes captured; both engines agree byte-for-byte; no Unicode issues for the queried path.
- AUDIT-054 deferred-item: not re-touched. Cache persistence remains opt-in via XENIA_CACHE_PERSIST=1. Default keeps the AUDIT-038 wipe behavior.
- --mute=true: every canary run.
- Renamed binaries: xrs-c10 / xc-c10.exe.
Confidence
- Phase 1 emitter extension: HIGH — schema-compliant, additive, cvar-OFF zero-cost verified via determinism.
- Phase 4 classification: HIGH — three independent observations agree (canary cache populated, ours cache wiped, multiple hashes).
- Cascade prediction at 102,404: cache fix lands only the FIRST in a series — next cache hash will be the next divergence. Likely net delta of several hundred to a few thousand matched events per cache slot resolved, until a non-cache divergence appears.