handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity + related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts; see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
212
audit-runs/phase-c10-NtQueryFullAttributesFile/escalation.md
Normal file
@@ -0,0 +1,212 @@
# Phase C+10 — NtQueryFullAttributesFile — ESCALATION

## Outcome

**Phase 1 (emitter extension): LANDED**.
**Phase 4 fix (cache-state seeding): ESCALATED**, deferred to a
dedicated cache-subsystem session.

The Phase A emitter now resolves OBJECT_ATTRIBUTES path arguments on
both engines (cvar-gated, default-off, behaviorally inert when off).
This permanent infrastructure surfaces the resolved path string for
this and every future file-IO divergence.

The actual cache-seeding fix needed to advance the main matched-prefix
past 102,404 is out of scope per the user's escalation criteria.

## Captured framing (post-extension)

Both engines now log the resolved path at `kernel.call.args_resolved`:

```
canary[6][102403]: NtQueryFullAttributesFile args_resolved.path = "cache:\\d4ea4615\\e\\46ee8ca"
ours [1][102403]: NtQueryFullAttributesFile args_resolved.path = "cache:\\d4ea4615\\e\\46ee8ca"

canary[6][102404]: kernel.return return_value = 0 (STATUS_SUCCESS)
ours [1][102404]: kernel.return return_value = 0xC0000034 (STATUS_OBJECT_NAME_NOT_FOUND)
```

Both engines query the **same path**. Canary returns SUCCESS because
its cache directory (`/home/fabi/.local/share/Xenia/cache/`) is
**pre-populated** with 23 files (~5 MB) accumulated over prior
Sylpheed boots. Ours's cache directory is fresh-wiped per AUDIT-038.

After this query, canary follows up with `NtCreateFile` for the same
path (idx 102481), so it actually reads the cached data. Returning a
false SUCCESS without backing bytes would therefore only push the
divergence ~78 events forward.
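
The matched-prefix figure quoted here can be read as "the number of leading events that are identical in both logs". The sketch below is a minimal illustration of that idea, not the project's actual diff tool: the `Event` struct, its three fields, and the `matched_prefix` name are hypothetical and chosen only to mirror the excerpt above.

```rust
/// One comparable event, reduced to fields expected to be deterministic
/// across engines. The field choice is an assumption for illustration.
#[derive(Debug, Clone, PartialEq, Eq)]
struct Event {
    kind: String,    // e.g. "kernel.call" or "kernel.return"
    name: String,    // export name, e.g. "NtQueryFullAttributesFile"
    payload: String, // resolved path or return value, as text
}

/// Returns how many leading events are equal in both streams.
/// If the streams diverge, the first differing event sits at the
/// returned index in each stream.
fn matched_prefix(a: &[Event], b: &[Event]) -> usize {
    a.iter()
        .zip(b.iter())
        .take_while(|(x, y)| x == y)
        .count()
}
```

With the two call events equal and the two return events carrying `0` versus `0xC0000034`, `matched_prefix` stops at the return event, which is the shape of the divergence shown above.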

## Classification (per plan Phase 4)

**(A) Missing file: narrowly true (this single cache entry), but**
**(D) Subsystem-required: actual scope**.

Choices considered:

1. **Plant a single file**: would only push the divergence to the
   next cache-existence query (16+ distinct hashes in
   `cache:\<HASH1>\<X>\<HASH2>` form). Canary's cache holds 23 files,
   most of which follow this pattern. After each plant the next
   query still misses.

2. **Seed ours's cache from canary's**: 23 files, ~5 MB. Mechanically
   easy (~30 LOC `copy_dir_all`) but violates AUDIT-038's
   no-oracle-state line AND runs into AUDIT-053's documented
   warm-start regression (Sylpheed's `cache:\*.tmp` journal-style
   writes append per boot, making a naive persistent seed
   self-inconsistent after the second boot; `runtime_error` throws
   from version-check on reload).

3. **Lie SUCCESS on cache: existence + lie SUCCESS on subsequent
   NtCreateFile + return zero-byte file**: changes Nt semantics
   game-wide, likely breaks any read that expects valid content.

4. **Implement the game's cache-generation logic**: that's the
   shader/PSO/material cache build subsystem, a multi-hundred-LOC
   generative subsystem, not in scope.

The user's escalation criteria explicitly call out
"cache-population infrastructure" as ESCALATION. Choices 2-4 fit
that. Choice 1 doesn't solve the problem.
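
For reference, the "~30 LOC `copy_dir_all`" mentioned under choice 2 could look roughly like the sketch below. It uses only `std::fs` and is an assumption about shape, not code from the tree; it also does nothing about the AUDIT-038 and AUDIT-053 objections, which are the reason this choice was escalated.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copies the contents of `src` into `dst`, creating `dst`
/// (and any missing parents) first. Returns the number of regular files
/// copied. Hypothetical helper; not taken from the repository.
fn copy_dir_all(src: &Path, dst: &Path) -> io::Result<u64> {
    fs::create_dir_all(dst)?;
    let mut copied = 0u64;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        let file_type = entry.file_type()?;
        if file_type.is_dir() {
            // Descend into subdirectories such as <HASH1>/<X>/.
            copied += copy_dir_all(&entry.path(), &target)?;
        } else if file_type.is_file() {
            fs::copy(entry.path(), &target)?;
            copied += 1;
        }
        // Symlinks and other special entries are skipped on purpose.
    }
    Ok(copied)
}
```

Returning the file count makes it easy to assert that the expected 23 files were seeded, if this route is ever taken behind a cvar.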

## What was landed (Phase 1 only)

Permanent emitter extension on both engines, schema-v1-compatible
(`args_resolved` was already part of v1; this just populates it for
`OBJECT_ATTRIBUTES*`-taking exports).

### Ours side (~50 LOC additive)

- `xenia-rs/crates/xenia-kernel/src/event_log.rs`:
  - New `emit_kernel_call_with_path(tid, cycle, name, Option<&str>)`
    that mirrors `emit_kernel_call` but adds
    `args_resolved:{"path":"..."}` when the path is non-empty.
    Degrades to the existing empty-object form otherwise, so output
    is byte-identical to pre-extension when the path is null or empty.

- `xenia-rs/crates/xenia-kernel/src/path.rs`:
  - New `object_attributes_raw_name(mem, ptr) -> Option<String>`
    that returns the **raw** trimmed path (no prefix-strip, no
    case-fold). The emitter uses the raw form so the diff surfaces
    upstream differences (e.g. one engine calling with one prefix
    and the other with a different prefix), not just post-normalize
    differences.

- `xenia-rs/crates/xenia-kernel/src/state.rs`:
  - In `call_export`, when `phase_a_on` and `name` matches one of
    `{NtCreateFile, NtOpenFile, NtQueryFullAttributesFile,
    NtOpenSymbolicLinkObject}`, resolve `OBJECT_ATTRIBUTES*` from the
    appropriate gpr position (verified against canary's
    xboxkrnl_io.cc signatures) and call
    `emit_kernel_call_with_path`. Otherwise call the legacy
    `emit_kernel_call`.
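
The "adds `args_resolved.path` when non-empty, otherwise degrades to the empty object" behavior can be sketched as a pure formatting function. This is an illustration under assumptions: only `args_resolved` and `path` come from the notes above, while the function name `format_kernel_call`, the other JSON keys (`kind`, `tid`, `cycle`, `name`), and the hand-rolled escaper are hypothetical and may not match the real emitter.

```rust
/// Escapes a string for use inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Formats one kernel-call event line. A missing or empty path yields
/// the same `args_resolved:{}` form, so both cases produce identical bytes.
fn format_kernel_call(tid: u32, cycle: u64, name: &str, path: Option<&str>) -> String {
    let args = match path {
        Some(p) if !p.is_empty() => format!("{{\"path\":\"{}\"}}", json_escape(p)),
        _ => String::from("{}"),
    };
    format!(
        "{{\"kind\":\"kernel.call\",\"tid\":{},\"cycle\":{},\"name\":\"{}\",\"args_resolved\":{}}}",
        tid,
        cycle,
        json_escape(name),
        args
    )
}
```

Escaping the path is what produces the doubled backslashes seen in the captured log lines (`cache:\\d4ea4615\\e\\46ee8ca`).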

### Canary side (~80 LOC additive)

- `xenia-canary/src/xenia/kernel/event_log.h`:
  - New `EmitKernelCallWithPath(name, path)` mirroring ours.

- `xenia-canary/src/xenia/kernel/event_log.cc`:
  - Implementation of `EmitKernelCallWithPath`.
  - New `phase_a_bridge::EmitImportAndCallWithCtx(module, ord, name,
    ppc_context)` that dispatches by `name` to read OBJECT_ATTRIBUTES
    from the PPCContext gpr and call the path-bearing form. Falls
    back to the legacy form when `name` doesn't match.
  - Helper `ReadObjectAttributesRawName(obj_attrs_ptr)` that mirrors
    ours's `object_attributes_raw_name` semantically (raw trimmed,
    no normalization).

- `xenia-canary/src/xenia/kernel/util/shim_utils.h`:
  - Both trampolines (X::Trampoline / Y::Trampoline) switched from
    `EmitImportAndCall(...)` to `EmitImportAndCallWithCtx(...,
    ppc_context)`. PPCContext is already in scope at that call site
    (it's the first argument the trampoline receives).

Total: ~50 LOC on ours, ~80 LOC on canary. Both behaviorally inert when cvar OFF.
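
Both sides follow the same two steps: pick the register that holds `OBJECT_ATTRIBUTES*` based on the export name, then walk the guest structures to the raw name bytes. The Rust sketch below illustrates that flow over a flat byte slice standing in for guest memory. Everything in it is an assumption to be checked against the real code: the register numbers (taking r3 as the first argument and following the argument order as I understand it from xboxkrnl_io.cc), the big-endian struct layouts in the comments, and the helper names other than `object_attributes_raw_name`.

```rust
/// Register assumed to carry OBJECT_ATTRIBUTES* for each traced export.
/// Assumption: r3 is the first argument; verify against xboxkrnl_io.cc.
fn object_attributes_gpr(name: &str) -> Option<usize> {
    match name {
        "NtQueryFullAttributesFile" => Some(3),
        "NtOpenSymbolicLinkObject" => Some(4),
        "NtCreateFile" | "NtOpenFile" => Some(5),
        _ => None,
    }
}

fn read_be_u16(mem: &[u8], addr: usize) -> Option<u16> {
    let b = mem.get(addr..addr.checked_add(2)?)?;
    Some(u16::from_be_bytes([b[0], b[1]]))
}

fn read_be_u32(mem: &[u8], addr: usize) -> Option<u32> {
    let b = mem.get(addr..addr.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Returns the raw, trimmed name with no prefix-strip and no case-fold.
/// Assumed big-endian layouts:
///   OBJECT_ATTRIBUTES { root_directory: u32, name_ptr: u32, attributes: u32 }
///   ANSI_STRING       { length: u16, maximum_length: u16, buffer: u32 }
fn object_attributes_raw_name(mem: &[u8], ptr: u32) -> Option<String> {
    if ptr == 0 {
        return None;
    }
    let name_ptr = read_be_u32(mem, (ptr as usize).checked_add(4)?)? as usize;
    if name_ptr == 0 {
        return None;
    }
    let len = read_be_u16(mem, name_ptr)? as usize;
    let buf = read_be_u32(mem, name_ptr.checked_add(4)?)? as usize;
    let bytes = mem.get(buf..buf.checked_add(len)?)?;
    let raw = String::from_utf8_lossy(bytes);
    let trimmed = raw.trim_matches(|c: char| c == '\0' || c.is_whitespace());
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}
```

Every out-of-range read returns `None` instead of panicking, which keeps a tracing-only code path from affecting execution when a guest pointer is bad.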

## Gates (Phase 1 extension only; all pass)

| # | gate | result |
|---|---|---|
| 1 | cvar-OFF determinism 50M (3 runs) | PASS: all 3 = `b8fa0e0460359a4f660adb7605e053de` (matches C+9 baseline, unchanged) |
| 2 | Phase B `image_loaded_sha256` | PASS: `ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18` (matches baseline) |
| 3 | Phase A main matched-prefix | UNCHANGED: 102404 (extension was framing-only; no fix landed; no advance expected) |
| 4 | Both engines build clean | PASS |
| 5 | Phase A emitter det fields (2 runs) | PASS: both = `7489e90ef4c9be629af8c9fabb1cbdd7` (new; replaces C+9's `0b299c37…` because the new args_resolved.path field is part of the det signature) |
| 6 | Unit tests | PASS: 165 → 165 (no new, no regressions) |
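
Gates 1 and 5 share one rule: every run must produce the same digest, and where a baseline exists the digest must equal it. A minimal sketch of that check follows. The function name and error strings are hypothetical, and how the digests are computed is outside its scope.

```rust
/// Passes when all run digests are identical and, if a baseline is
/// given, equal to that baseline. Hypothetical helper for illustration.
fn determinism_gate(digests: &[&str], baseline: Option<&str>) -> Result<(), String> {
    let first = match digests.first() {
        Some(d) => *d,
        None => return Err("no digests supplied".to_string()),
    };
    for (run, d) in digests.iter().enumerate() {
        if *d != first {
            return Err(format!(
                "run {} digest {} differs from run 0 digest {}",
                run, d, first
            ));
        }
    }
    match baseline {
        Some(b) if b != first => Err(format!("digest {} differs from baseline {}", first, b)),
        _ => Ok(()),
    }
}
```

Gate 1 corresponds to calling it with three digests and the C+9 baseline. Gate 5 corresponds to calling it with two digests and no baseline, since the det signature changed on purpose.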

## Schema status

The `args_resolved` field is already part of schema-v1; this phase only
**populates** it for a subset of exports. No schema version bump.

The schema-v1 example (`schema-v1.md:112`) shows exactly the form we
emit. We are now compliant with the documented schema for path-bearing
exports rather than emitting an empty stub.

## Cascade prediction (resolution / next steps)

| stage | predicted | outcome |
|---|---|---|
| A=extend emitter cleanly | ~80% | LANDED |
| B=capture path string both engines | ~85% | LANDED: `cache:\d4ea4615\e\46ee8ca` matched on both engines |
| C=classify root cause | ~75% | DONE: Class D (subsystem-required) |
| D=land fix in scope | ~55% | **ESCALATED**: fix is choice 2-4 above |
| E=main chain advances past 102404 | ~50% | NOT THIS SESSION |

## Reading-error class

NO new class. Existing classes #15 / ζ (VFS layout aliasing,
AUDIT-053) and AUDIT-038 (no oracle state) are re-affirmed:

* Class #15 ζ (AUDIT-053): persistent cache + journal `.tmp` writes
  create a warm-start regression.
* AUDIT-038 line: oracle state is forbidden in default boot.

Both rules together make the cache-seeding fix subsystem-tier, not
single-fix-tier.

## Handoff to dedicated cache-subsystem session

The next session targeting this divergence should:

1. **Decide cache-state strategy**:
   - (a) Implement Sylpheed's cache-generation logic so ours builds
     its own cache from scratch (matches canary's own bootstrap
     experience, but multi-hundred-LOC).
   - (b) Seed-once-then-persist: copy canary's cache into ours's
     cache_root behind a new cvar `--cache-seed-from=<path>`, then
     enable persistence. AUDIT-053's warm-start regression must be
     re-tested with AUDIT-054's FILE_DIRECTORY_FILE fix in tree
     (it landed AFTER 053's regression was observed).
   - (c) Hybrid: synthesize a stub success at NtQueryFullAttributesFile
     for known-good cache hashes, then synthesize NtCreateFile/Read
     responses with bytes captured from canary's cache files. Closest
     to a "single missing file plant" but for 23 files.

2. **Re-validate after the fix** that the warm-start regression
   identified in AUDIT-053 doesn't recur (AUDIT-054 may have fixed
   it; needs explicit re-test).

3. **Expect cascading Phase A divergences**, one for each cache hash
   the game looks up in turn: the divergence at 102,404 is only the
   FIRST. After `cache:\d4ea4615` is resolved, the game queries
   `cache:\69d8e45c` (idx 103810 already visible in ours.jsonl) and
   so on through 16+ distinct hashes per AUDIT-052.
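
To make strategy (c) from step 1 concrete, the sketch below shows one possible shape for a lookup-backed stub: known paths answer the existence query with STATUS_SUCCESS and serve captured bytes, and everything else keeps returning STATUS_OBJECT_NAME_NOT_FOUND. The `CacheStub` type, its methods, and the lower-casing of keys are all hypothetical design choices, not existing code. The two status values are the ones shown in the captured log.

```rust
use std::collections::HashMap;

const STATUS_SUCCESS: u32 = 0x0000_0000;
const STATUS_OBJECT_NAME_NOT_FOUND: u32 = 0xC000_0034;

/// Hypothetical in-memory table of captured cache files, keyed by path.
struct CacheStub {
    files: HashMap<String, Vec<u8>>,
}

impl CacheStub {
    fn new() -> Self {
        CacheStub { files: HashMap::new() }
    }

    /// Keys are lower-cased; case-insensitive lookup is an assumption.
    fn key(raw_path: &str) -> String {
        raw_path.to_ascii_lowercase()
    }

    fn insert(&mut self, raw_path: &str, bytes: Vec<u8>) {
        self.files.insert(Self::key(raw_path), bytes);
    }

    /// Existence query: returns (status, file size if present).
    fn query_full_attributes(&self, raw_path: &str) -> (u32, Option<u64>) {
        match self.files.get(&Self::key(raw_path)) {
            Some(data) => (STATUS_SUCCESS, Some(data.len() as u64)),
            None => (STATUS_OBJECT_NAME_NOT_FOUND, None),
        }
    }

    /// Read of up to `len` bytes at `offset`, clamped to the file size.
    /// Returns None when the path is unknown.
    fn read(&self, raw_path: &str, offset: usize, len: usize) -> Option<&[u8]> {
        let data = self.files.get(&Self::key(raw_path))?;
        let start = offset.min(data.len());
        let end = start.saturating_add(len).min(data.len());
        Some(&data[start..end])
    }
}
```

Because the bytes would still come from canary's cache, this shape remains subject to the AUDIT-038 no-oracle-state line and would have to sit behind a cvar like the other options.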

## Files in this audit run

| file | content |
|---|---|
| `escalation.md` | this file |
| `investigation.md` | Phase 1-4 walkthrough |
| `re-validation.md` | gate results (Phase 1 extension only) |
| `ours.jsonl`, `ours-determ.jsonl`, `canary.jsonl` | Phase A logs with new args_resolved field |
| `diff-report.md` | re-run with path field populated |
| `snap/ours/` | Phase B snapshot (unchanged from C+9) |
| `digest-cvaroff-{1,2,3}.json` | 3× determinism (all = C+9 baseline) |

## Next target

**Same idx 102,404 NtQueryFullAttributesFile**, but in a dedicated
cache-subsystem session. Path framing is now captured for the next
investigator's first read.