handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps (.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

236
audit-runs/phase-c10-NtQueryFullAttributesFile/investigation.md
Normal file
@@ -0,0 +1,236 @@
# Phase C+10: NtQueryFullAttributesFile investigation

## Phase 1: Emitter extension (LANDED)

### Problem

C+9 left the divergence with no resolved path string:

```
canary[6][102404] kernel.return NtQueryFullAttributesFile return_value=0
ours [1][102404] kernel.return NtQueryFullAttributesFile return_value=0xC0000034
```

`payload.args` and `payload.args_resolved` were both empty objects.
We had no way to identify WHICH file the engine was querying.

### Shape of the fix

Schema v1 already declares `args_resolved` as a free-form object
attached to `kernel.call` (schema-v1.md:108-117), and the existing
example explicitly shows `{"path":"..."}`. The emitter just wasn't
populating it. The extension is pure schema-v1 compliance, with no
version bump.

#### Ours-side (event_log.rs / path.rs / state.rs)

- Added `event_log::emit_kernel_call_with_path(tid, cycle, name,
  Option<&str>)`: same byte format as `emit_kernel_call`, but when
  `path` is `Some(non_empty)` it emits `args_resolved:{"path":"..."}`.
  When `path` is `None` or empty, it degrades to the existing
  `args_resolved:{}` form, so unrelated exports' output is
  byte-identical to pre-extension.

- Added `path::object_attributes_raw_name(mem, ptr) -> Option<String>`:
  returns the RAW path string (trimmed of whitespace, NO
  prefix-strip, NO case-fold) so the diff surfaces upstream
  prefix-form differences instead of masking them via normalization.
  The pre-existing `object_attributes_to_vfs_path` (which DOES normalize)
  is kept as-is for VFS lookup callers; the emitter uses the new raw
  helper.

- `state.rs::call_export`, inside the `phase_a_on` guarded block:
  a new `match name` resolves OBJECT_ATTRIBUTES* from the right GPR
  position. Argument positions verified against canary's
  `xboxkrnl/xboxkrnl_io.cc` signatures:
  - `NtQueryFullAttributesFile` → r3 = obj_attrs
  - `NtOpenSymbolicLinkObject` → r4 = obj_attrs
  - `NtCreateFile`, `NtOpenFile` → r5 = obj_attrs

  It then calls `emit_kernel_call_with_path(..., resolved.as_deref())`
  instead of `emit_kernel_call(...)`. All other exports fall through
  to `None` and the legacy form.

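The ours-side behaviour described above can be sketched roughly as follows. This is a minimal illustration, not the code in `event_log.rs` or `state.rs`: the function names `obj_attrs_gpr` and `args_resolved_fragment` are hypothetical, and the exact byte layout of the emitted fragment (quoting of the key, escaping rules) is an assumption.

```rust
/// Hypothetical helper: which GPR holds the OBJECT_ATTRIBUTES pointer for
/// the exports listed in the note. Every other export yields `None`,
/// which corresponds to the legacy `args_resolved:{}` form.
pub fn obj_attrs_gpr(export_name: &str) -> Option<usize> {
    match export_name {
        "NtQueryFullAttributesFile" => Some(3),
        "NtOpenSymbolicLinkObject" => Some(4),
        "NtCreateFile" | "NtOpenFile" => Some(5),
        _ => None,
    }
}

/// Hypothetical formatter for the `args_resolved` fragment.
/// Assumption: the key is emitted as a quoted JSON member and the path
/// is escaped with the minimum JSON string escaping.
pub fn args_resolved_fragment(path: Option<&str>) -> String {
    match path {
        Some(p) if !p.is_empty() => {
            let mut out = String::from("\"args_resolved\":{\"path\":\"");
            for c in p.chars() {
                match c {
                    '"' => out.push_str("\\\""),
                    '\\' => out.push_str("\\\\"),
                    // Control characters become \u00XX escapes.
                    c if (c as u32) < 0x20 => {
                        out.push_str(&format!("\\u{:04x}", c as u32))
                    }
                    c => out.push(c),
                }
            }
            out.push_str("\"}");
            out
        }
        // `None` and the empty string both degrade to the legacy form.
        _ => String::from("\"args_resolved\":{}"),
    }
}
```

Keeping the `None`/empty branch as a single fixed string is what makes the output for unrelated exports byte-identical to the pre-extension form.
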
#### Canary-side (event_log.h / event_log.cc / util/shim_utils.h)

- `event_log.h`: declared `EmitKernelCallWithPath(name, path)`.
- `event_log.cc`: implemented the same as ours (degrades to the legacy
  form for an empty path).
- `event_log.cc::phase_a_bridge::EmitImportAndCallWithCtx(module,
  ord, name, ppc_context)`: new bridge function. PPCContext is
  passed as `void*` to keep the header's transitive include footprint
  small (the bridge cc reinterprets it to PPCContext* internally).
  Inside the bridge, helper `ReadObjectAttributesRawName(ptr)` reads
  the X_OBJECT_ATTRIBUTES.name_ptr, then the X_ANSI_STRING bytes
  directly out of guest memory (no util::TranslateAnsiPath
  normalization). It trims whitespace + trailing NULs to match ours's
  semantics byte-for-byte.
- `util/shim_utils.h`: both export trampolines (X::Trampoline /
  Y::Trampoline) switched the `phase_a_bridge::EmitImportAndCall`
  call to `phase_a_bridge::EmitImportAndCallWithCtx`, passing the
  existing `ppc_context` argument that's already in scope. The
  legacy `EmitImportAndCall` stays declared and defined for any
  future callers that don't have a PPCContext.

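The raw-name semantics that both sides are meant to share (no prefix strip, no case fold, only whitespace and trailing NULs removed) can be illustrated with a short Rust sketch. The function name `raw_name_from_bytes` is hypothetical, the order of the two trimming steps is an assumption, and the lossy UTF-8 decode is a simplification that only matters for non-ASCII bytes.

```rust
/// Hypothetical sketch: turn the raw X_ANSI_STRING bytes read from guest
/// memory into the string the emitter would log.
/// Assumptions: trailing NULs are dropped before whitespace is trimmed,
/// and non-UTF-8 bytes are decoded lossily.
pub fn raw_name_from_bytes(bytes: &[u8]) -> Option<String> {
    // Find the end of the payload, ignoring any trailing NUL padding.
    let end = bytes.iter().rposition(|&b| b != 0).map_or(0, |i| i + 1);
    let text = String::from_utf8_lossy(&bytes[..end]);
    // Trim surrounding whitespace only; case and prefix are left untouched.
    let trimmed = text.trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}
```

Returning `None` for an all-NUL or all-whitespace buffer lines up with the emitter degrading to the legacy `args_resolved:{}` form.
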
### Verification

- Both engines build clean.
- Determinism 3x: digest md5 = `b8fa0e0460359a4f660adb7605e053de`
  (identical to C+9 baseline; the extension is cvar-OFF zero-cost).
- Phase A emitter determinism 2x: det-fields md5 = `7489e90e…`,
  byte-identical. (Different from C+9's `0b299c37…` because the path
  field IS in the deterministic signature, but stable across runs.)

## Phase 2: Re-run + capture path string

After the extension, both engines emit the path at
`kernel.call.args_resolved.path`:

```
canary[6][102403] NtQueryFullAttributesFile path = "cache:\d4ea4615\e\46ee8ca"
ours [1][102403] NtQueryFullAttributesFile path = "cache:\d4ea4615\e\46ee8ca"
```

Both engines query the **same path**. No upstream divergence: the
ANSI_STRING content matches byte-for-byte.

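For context, locating the index at which two event streams stop agreeing can be sketched generically. This is an illustration only: `first_divergence` is a hypothetical name, and the real diff tooling compares the deterministic fields of each JSONL event rather than plain slices.

```rust
/// Hypothetical sketch: index of the first position where two event
/// streams differ. If one stream is a strict prefix of the other, the
/// divergence is reported at the end of the shorter stream.
pub fn first_divergence<T: PartialEq>(a: &[T], b: &[T]) -> Option<usize> {
    let common = a.len().min(b.len());
    (0..common)
        .find(|&i| a[i] != b[i])
        .or_else(|| if a.len() != b.len() { Some(common) } else { None })
}
```

With events modelled as `(name, return_value)` pairs, two streams that agree on the `kernel.call` but disagree on the following `kernel.return` diverge at the index of the return event, which mirrors the 102403/102404 split seen here.
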
## Phase 3: Why does ours say NOT_FOUND?

### Trace through ours's `nt_query_full_attributes_file`

`exports.rs:1913-1990`:

1. Read OBJECT_ATTRIBUTES → path =
   `"cache:/d4ea4615/e/46ee8ca"` (after `normalize_path`).
2. `state.resolve_cache_path(&path)` returns
   `Some(<temp_dir>/xenia-rs-cache-<pid>-0/d4ea4615/e/46ee8ca)`.
3. `std::fs::metadata(host_path)` returns `Err(NotFound)`.
4. Return `STATUS_OBJECT_NAME_NOT_FOUND` (`0xC0000034`).

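Steps 1, 3 and 4 of the trace can be condensed into a small sketch. It is not the body of `nt_query_full_attributes_file`: `normalize_separators` is a hypothetical stand-in that only flips separators (the real `normalize_path` may do more), and mapping every `metadata` error to NOT_FOUND is a simplification, since the note only documents the `NotFound` case.

```rust
use std::path::Path;

pub const STATUS_SUCCESS: u32 = 0;
pub const STATUS_OBJECT_NAME_NOT_FOUND: u32 = 0xC000_0034;

/// Hypothetical stand-in for `normalize_path`: converts guest backslashes
/// to forward slashes and nothing else.
pub fn normalize_separators(raw: &str) -> String {
    raw.replace('\\', "/")
}

/// Steps 3-4 of the trace: stat the resolved host path and translate a
/// miss into the guest-visible status code.
/// Simplification: any error, not just NotFound, maps to NOT_FOUND here.
pub fn query_status(host_path: &Path) -> u32 {
    match std::fs::metadata(host_path) {
        Ok(_) => STATUS_SUCCESS,
        Err(_) => STATUS_OBJECT_NAME_NOT_FOUND,
    }
}
```

The point of the sketch is that the status depends only on host filesystem state at the resolved path, so an empty cache root is enough to produce `0xC0000034` for every cache query.
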
The host path doesn't exist because ours's `init_cache_root`
(`state.rs:499-510`) **clears** the cache directory on every boot
(AUDIT-038 line: per-process tmpdir + full wipe so two consecutive
runs see byte-identical initial state).

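A minimal sketch of the wipe-on-boot behaviour, under stated assumptions: the signature below is hypothetical (the real `init_cache_root` lives in `state.rs` and may take different arguments), and the directory name simply follows the `xenia-rs-cache-<pid>-0` pattern seen in the trace.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical sketch of a per-process cache root that is fully wiped
/// each time it is initialised, so every boot starts from an empty tree.
pub fn init_cache_root(base: &Path, pid: u32) -> io::Result<PathBuf> {
    // Directory name mirrors the pattern in the trace; the trailing "-0"
    // is copied as-is and its meaning is not assumed here.
    let root = base.join(format!("xenia-rs-cache-{}-0", pid));
    match fs::remove_dir_all(&root) {
        Ok(()) => {}
        // A missing directory just means there is nothing to wipe.
        Err(e) if e.kind() == io::ErrorKind::NotFound => {}
        Err(e) => return Err(e),
    }
    fs::create_dir_all(&root)?;
    Ok(root)
}
```

Anything the guest wrote under the root during a previous boot is gone after the second call, which is exactly why a later `metadata` lookup on a cache file reports `NotFound`.
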
### Why does canary's call NOT fail?

`xenia-canary/src/xenia/kernel/xboxkrnl/xboxkrnl_io.cc:474-513`:

1. Read OBJECT_ATTRIBUTES → target_path via TranslateAnsiPath.
2. `kernel_state()->file_system()->ResolvePath(target_path)`.
3. If `entry` found, populate file_info, return `X_STATUS_SUCCESS`.
4. Else return `X_STATUS_NO_SUCH_FILE` (`0xC000000F`).

Canary returns 0 → entry was found. Canary's cache mount is at
`/home/fabi/.local/share/Xenia/cache/` (a persistent host directory
populated over prior boots).

### Verification of canary's cache state

```
$ ls /home/fabi/.local/share/Xenia/cache/d4ea4615/e/
-rw-rw-r-- 1 fabi fabi 400 May 11 21:01 46ee8ca
```

Single 400-byte file. Total cache: 23 files, ~5 MB across 16
distinct top-level hash directories.

### Sibling-cache observations

ours.jsonl shows the SAME `NtQueryFullAttributesFile` call firing for
multiple cache paths within the 50M window, all returning
`0xC0000034`. Example: idx 103810 queries
`cache:\69d8e45c\8\3421153`. So the divergence is not a single
missing file but a class of 16+ missing hashes.

## Phase 4: Classification + scope decision

Per the plan, the classes are:

* **(A) Missing file**: a single plant fixes it (small).
* **(B) Path-normalization bug**: string operation (small).
* **(C) VFS mount missing**: add the mount (small-medium).
* **(D) Subsystem-required**: STFS or similar. **ESCALATE**.
* **(E) Upstream divergence**: walk back.

This is **NOT (B)**: both engines normalize identically (verified
by matching args_resolved.path).

This is **NOT (E)**: upstream is bit-identical for 102,403 events.

This is **NOT (A)** for any single file: the game queries 16+
distinct cache hashes; planting one only postpones the divergence.

This is **closest to a hybrid (C+D)**:

* **(C)-ish**: canary's cache MOUNT resolves to a populated host dir;
  ours's mount resolves to a wiped tmp dir.
* **(D)-ish**: canary's cache is populated because it ran the game
  before and the game **built** the cache. To match canary's state
  on a fresh boot, we either:
  - implement the game's cache-build logic (subsystem),
  - copy canary's pre-built cache (oracle state, an AUDIT-038
    violation),
  - or accept that ours runs cold and the divergence is a
    fundamental cold-vs-warm asymmetry.

### AUDIT-053 cross-check (warm-start regression risk)

Per AUDIT-053 memo:
> Phase 2 permanent fix REVERTED — warm-start regression from VFS
> layout aliasing: `open_cache_file` treats all `NtCreateFile` as
> files, but `cache:\d4ea4615 disp=CREATE` is meant as a DIRECTORY.

AUDIT-054 fixed that specific aliasing (FILE_DIRECTORY_FILE bit
threading). But there's still the AUDIT-053 secondary concern:
Sylpheed's `cache:\<hash>.tmp` journal-style writes append on each
boot, making naive persistence self-inconsistent across boots.

Whether AUDIT-054's fix fully unblocks persistence is **NOT
RE-VERIFIED** in this session. Re-testing the AUDIT-053 regression
with AUDIT-054's fix in-tree is itself a follow-up.

### Scope per user direction

User said:
> If the fix requires major VFS work, STFS subsystem
> implementation, or cache-population infrastructure: ESCALATE.

Of the choices in `escalation.md`, choice 1 does not solve the problem
and choices 2-4 all qualify as "cache-population infrastructure":

* Choice 1 (single file plant) won't solve the problem (16+ hashes).
* Choice 2 (seed from canary) is oracle state + warm-start regression
  risk per AUDIT-053.
* Choice 3 (synthesize cache reads) is a multi-export semantic change.
* Choice 4 (build cache from scratch) is a full subsystem.

**ESCALATION declared.** Phase 1 emitter extension landed as the
session's permanent infrastructure contribution.

## Discipline check

* **Reading-error #28** (canary source-of-truth): verified canary's
  actual `NtQueryFullAttributesFile_entry` body
  (`xboxkrnl_io.cc:474-513`), did not assume.
* **Reading-error #23** (downstream regression): no fix landed, so
  no regression risk. Emitter extension is cvar-OFF zero-cost.
* **Escalation discipline**: triggered cleanly; explicit memo;
  contributing infrastructure (emitter path resolution) kept.
* **Path encoding**: ANSI_STRING raw bytes captured; both engines
  agree byte-for-byte; no Unicode issues for the queried path.
* **AUDIT-054 deferred-item**: not re-touched. Cache persistence
  remains opt-in via `XENIA_CACHE_PERSIST=1`. Default keeps the
  AUDIT-038 wipe behavior.
* **`--mute=true`**: every canary run.
* **Renamed binaries**: `xrs-c10` / `xc-c10.exe`.

## Confidence

* **Phase 1 emitter extension**: HIGH. Schema-compliant, additive,
  cvar-OFF zero-cost verified via determinism.
* **Phase 4 classification**: HIGH. Three independent observations
  agree (canary cache populated, ours cache wiped, multiple hashes).
* **Cascade prediction at 102,404**: a cache fix resolves only the
  FIRST divergence in a series; the next cache hash will be the next
  divergence. The likely net delta is several hundred to a few
  thousand matched events per cache slot resolved, until a non-cache
  divergence appears.