xenia-rs/audit-runs/phase-c10-NtQueryFullAttributesFile/escalation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


Phase C+10 — NtQueryFullAttributesFile — ESCALATION

Outcome

Phase 1 (emitter extension) — LANDED. Phase 4 fix (cache-state seeding) — ESCALATED, deferred to a dedicated cache-subsystem session.

The Phase A emitter now resolves OBJECT_ATTRIBUTES path arguments on both engines (cvar-gated, default-off, behaviorally inert when off). This permanent infrastructure surfaces the path string behind this divergence and behind every future file-IO divergence.

The actual cache-seeding fix needed to advance the main matched prefix past 102,404 is out of scope under the user's escalation criteria.
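"Matched prefix" can be made concrete with a minimal sketch. It assumes the diff compares the two Phase A streams record by record, using 0-based indices, and stops at the first mismatch. The Event type and its two fields are hypothetical stand-ins for the schema-v1 records, which carry more fields; the real diff may also compare only the det fields.

Under these assumptions the prefix length equals the index of the first divergent event. That is why the matching path query at idx 102403 sits inside the prefix, and the differing return at idx 102404 is where the prefix stops.

```rust
/// Hypothetical minimal Phase A record: only what this sketch compares.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Event {
    /// Event kind or export name, e.g. "NtQueryFullAttributesFile" or "kernel.return".
    pub name: String,
    /// Return value for kernel.return events; None for call events.
    pub return_value: Option<u32>,
}

/// Number of leading events on which both streams agree. With 0-based event
/// indices this equals the index of the first divergent event (or the length
/// of the shorter stream when one is a prefix of the other).
pub fn matched_prefix(canary: &[Event], ours: &[Event]) -> usize {
    canary
        .iter()
        .zip(ours.iter())
        .take_while(|(c, o)| c == o)
        .count()
}
```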

Captured framing (post-extension)

Both engines now log the resolved path at kernel.call.args_resolved:

canary[6][102403]: NtQueryFullAttributesFile  args_resolved.path = "cache:\\d4ea4615\\e\\46ee8ca"
ours  [1][102403]: NtQueryFullAttributesFile  args_resolved.path = "cache:\\d4ea4615\\e\\46ee8ca"

canary[6][102404]: kernel.return  return_value = 0           (STATUS_SUCCESS)
ours  [1][102404]: kernel.return  return_value = 0xC0000034  (STATUS_OBJECT_NAME_NOT_FOUND)

Both engines query the same path. Canary returns SUCCESS because its cache directory (/home/fabi/.local/share/Xenia/cache/) is pre-populated with 23 files (~5 MB) accumulated over prior Sylpheed boots. Ours's cache directory, by contrast, is wiped fresh per AUDIT-038.
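The SUCCESS/NOT_FOUND split comes down to a host-side existence check, which can be sketched as follows. This is an illustration, not either engine's VFS code:

- cache_host_path and query_cache_exists are hypothetical names.
- The mapping simply splits the guest path on backslashes under a host cache root.
- Case folding is ignored.
- The two constants are the standard NTSTATUS values that appear in the kernel.return records.

With a pre-populated root (canary's situation) the check succeeds. With a freshly wiped root (ours's situation) the same guest path yields 0xC0000034.

```rust
use std::path::{Path, PathBuf};

/// Standard NTSTATUS values matching the kernel.return records.
pub const STATUS_SUCCESS: u32 = 0x0000_0000;
pub const STATUS_OBJECT_NAME_NOT_FOUND: u32 = 0xC000_0034;

/// Hypothetical mapping of a guest `cache:\a\b\c` path onto a host cache root.
/// Returns None for non-cache paths and for `..` components.
pub fn cache_host_path(cache_root: &Path, guest: &str) -> Option<PathBuf> {
    let rest = guest.strip_prefix("cache:")?;
    let mut host = cache_root.to_path_buf();
    for component in rest.split('\\').filter(|c| !c.is_empty()) {
        if component == ".." {
            return None;
        }
        host.push(component);
    }
    Some(host)
}

/// Existence-only query: SUCCESS if the mapped host file exists, else
/// OBJECT_NAME_NOT_FOUND (no attributes are returned in this sketch).
pub fn query_cache_exists(cache_root: &Path, guest: &str) -> u32 {
    match cache_host_path(cache_root, guest) {
        Some(host) if host.exists() => STATUS_SUCCESS,
        _ => STATUS_OBJECT_NAME_NOT_FOUND,
    }
}
```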

After this query, canary follows up with NtCreateFile on the same path (idx 102481) and actually reads the cached data. Returning SUCCESS without backing bytes would therefore only push the divergence ~78 events forward, to that NtCreateFile.

Classification (per plan Phase 4)

(A) Missing file: narrowly true (this single cache entry). (D) Subsystem-required: the actual scope.

Choices considered:

  1. Plant a single file: this only pushes the divergence to the next cache-existence query (16+ distinct hashes of the form cache:\<HASH1>\<X>\<HASH2>). Most of the 23 files in canary's cache follow this pattern, so after each plant the next query still misses.

  2. Seed ours's cache from canary's: 23 files, ~5 MB. This is mechanically easy (~30 LOC copy_dir_all), but it has two problems:
     • It crosses AUDIT-038's no-oracle-state line.
     • It runs into AUDIT-053's documented warm-start regression. Sylpheed's cache:\*.tmp journal-style writes append on every boot, so a naive persistent seed becomes self-inconsistent after the second boot, and the version check throws runtime_error on reload.

  3. Fake the cache: path end to end: return SUCCESS for the existence query, return SUCCESS for the subsequent NtCreateFile, and serve a zero-byte file. This changes Nt semantics game-wide and likely breaks any read that expects valid content.

  4. Implement the game's cache-generation logic: that is the shader/PSO/material cache-build subsystem, a multi-hundred-LOC generative subsystem, and not in scope.
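The copy_dir_all mentioned in choice 2 gives a sense of scale. A std-only recursive copy of that kind fits in about 20 lines, so the cost of choice 2 lies in the AUDIT-038/AUDIT-053 policy problem, not in the code.

This is a generic sketch, not the helper from the investigation, and it does not special-case symlinks.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Recursively copy the contents of `src` into `dst`, creating `dst` and any
/// nested directories as needed. Regular files are copied with `fs::copy`.
pub fn copy_dir_all(src: &Path, dst: &Path) -> io::Result<()> {
    fs::create_dir_all(dst)?;
    for entry in fs::read_dir(src)? {
        let entry = entry?;
        let target = dst.join(entry.file_name());
        if entry.file_type()?.is_dir() {
            copy_dir_all(&entry.path(), &target)?;
        } else {
            fs::copy(entry.path(), &target)?;
        }
    }
    Ok(())
}
```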

The user's escalation criteria explicitly list "cache-population infrastructure" as an ESCALATION trigger. Choices 2-4 all fall under it, and choice 1 doesn't solve the problem.

What was landed (Phase 1 only)

Permanent emitter extension on both engines, schema-v1-compatible (args_resolved was already part of v1; this change only populates it for exports that take an OBJECT_ATTRIBUTES*).

Ours side (~50 LOC additive)

  • xenia-rs/crates/xenia-kernel/src/event_log.rs:

    • New emit_kernel_call_with_path(tid, cycle, name, Option<&str>) that mirrors emit_kernel_call but adds args_resolved:{"path":"..."} when the path is non-empty. Otherwise it degrades to the existing empty-object form, so output is byte-identical to the pre-extension output when the path is null.
  • xenia-rs/crates/xenia-kernel/src/path.rs:

    • New object_attributes_raw_name(mem, ptr) -> Option<String> that returns the raw trimmed path (no prefix strip, no case fold). The emitter uses the raw form so the diff surfaces upstream differences (e.g. one engine passing one prefix and the other a different prefix), not just post-normalization differences.
  • xenia-rs/crates/xenia-kernel/src/state.rs:

    • In call_export, when phase_a_on and name matches one of {NtCreateFile, NtOpenFile, NtQueryFullAttributesFile, NtOpenSymbolicLinkObject}, resolve OBJECT_ATTRIBUTES* from the appropriate gpr position (verified against canary's xboxkrnl_io.cc signatures) and call emit_kernel_call_with_path. Otherwise call the legacy emit_kernel_call.
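A raw name read can be sketched as below. The sketch rests on several assumptions:

- Guest memory is a flat big-endian byte slice indexed by guest address.
- The 32-bit layouts are OBJECT_ATTRIBUTES { root_directory, name_ptr, attributes } and ANSI_STRING { length, maximum_length, buffer }. These are modeled on xenia's X_OBJECT_ATTRIBUTES and X_ANSI_STRING; check the real definitions before relying on the offsets.
- "Trimmed" means dropping trailing NULs and surrounding whitespace.

No prefix strip or case fold is applied, in keeping with the raw-form rationale for the emitter.

```rust
/// Read a big-endian u16 at a guest address (hypothetical flat memory view).
fn read_u16_be(mem: &[u8], addr: usize) -> Option<u16> {
    let b = mem.get(addr..addr + 2)?;
    Some(u16::from_be_bytes([b[0], b[1]]))
}

/// Read a big-endian u32 at a guest address (hypothetical flat memory view).
fn read_u32_be(mem: &[u8], addr: usize) -> Option<u32> {
    let b = mem.get(addr..addr + 4)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Sketch of a raw OBJECT_ATTRIBUTES name read, assuming (all big-endian):
///   OBJECT_ATTRIBUTES { u32 root_directory; u32 name_ptr; u32 attributes; }
///   ANSI_STRING       { u16 length; u16 maximum_length; u32 buffer; }
/// Null pointers, out-of-range reads and empty names all yield None.
pub fn object_attributes_raw_name(mem: &[u8], obj_attrs_ptr: u32) -> Option<String> {
    if obj_attrs_ptr == 0 {
        return None;
    }
    let name_ptr = read_u32_be(mem, obj_attrs_ptr as usize + 4)?;
    if name_ptr == 0 {
        return None;
    }
    let len = read_u16_be(mem, name_ptr as usize)? as usize;
    let buf = read_u32_be(mem, name_ptr as usize + 4)? as usize;
    let bytes = mem.get(buf..buf + len)?;
    let raw = String::from_utf8_lossy(bytes);
    // Raw form: only trim, never strip prefixes or fold case.
    let trimmed = raw.trim_end_matches('\0').trim();
    if trimmed.is_empty() {
        None
    } else {
        Some(trimmed.to_string())
    }
}
```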

Canary side (~80 LOC additive)

  • xenia-canary/src/xenia/kernel/event_log.h:

    • New EmitKernelCallWithPath(name, path) mirroring ours.
  • xenia-canary/src/xenia/kernel/event_log.cc:

    • Implementation of EmitKernelCallWithPath.
    • New phase_a_bridge::EmitImportAndCallWithCtx(module, ord, name, ppc_context) that dispatches by name to read OBJECT_ATTRIBUTES from the PPCContext gpr and call the path-bearing form. Falls back to the legacy form when name doesn't match.
    • Helper ReadObjectAttributesRawName(obj_attrs_ptr) that mirrors ours's object_attributes_raw_name semantically (raw trimmed, no normalization).
  • xenia-canary/src/xenia/kernel/util/shim_utils.h:

    • Both trampolines (X::Trampoline / Y::Trampoline) switched from EmitImportAndCall(...) to EmitImportAndCallWithCtx(..., ppc_context). PPCContext is already in scope at that call site (it's the first argument the trampoline receives).

Total: ~50 LOC on ours and ~80 LOC on canary, all additive. Both sides are behaviorally inert when the cvar is OFF.
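The gating in call_export and the degrade rule of emit_kernel_call_with_path can be sketched together. The GPR slots below are assumptions based on the usual NT argument order, with arguments starting at r3; the notes only say that the real positions were verified against canary's signatures. args_resolved_json hand-escapes backslashes, quotes and control characters to keep the sketch std-only, and it is not the emitter's actual serializer.

```rust
/// Hypothetical GPR slot of the OBJECT_ATTRIBUTES* argument for each
/// path-bearing export (assumed NT argument order, first argument in r3).
pub fn object_attributes_gpr(name: &str) -> Option<usize> {
    match name {
        "NtQueryFullAttributesFile" => Some(3), // (ObjectAttributes, FileInformation)
        "NtOpenSymbolicLinkObject" => Some(4),  // (LinkHandle, ObjectAttributes)
        "NtCreateFile" | "NtOpenFile" => Some(5), // (Handle, DesiredAccess, ObjectAttributes, ...)
        _ => None,
    }
}

/// args_resolved fragment: `{"path":"..."}` for a non-empty path, otherwise
/// the legacy empty object `{}` so output stays byte-identical.
pub fn args_resolved_json(path: Option<&str>) -> String {
    match path {
        Some(p) if !p.is_empty() => {
            let mut out = String::from("{\"path\":\"");
            for ch in p.chars() {
                match ch {
                    '\\' => out.push_str("\\\\"),
                    '"' => out.push_str("\\\""),
                    c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
                    c => out.push(c),
                }
            }
            out.push_str("\"}");
            out
        }
        _ => String::from("{}"),
    }
}

/// Path-bearing form only when Phase A is on AND the export takes an
/// OBJECT_ATTRIBUTES*; `read_name` stands in for object_attributes_raw_name.
pub fn resolve_args(
    phase_a_on: bool,
    name: &str,
    gpr: &[u64; 32],
    read_name: impl Fn(u32) -> Option<String>,
) -> String {
    let path = if phase_a_on {
        // Guest pointers are 32-bit; the GPR holds them zero-extended.
        object_attributes_gpr(name).and_then(|r| read_name(gpr[r] as u32))
    } else {
        None
    };
    args_resolved_json(path.as_deref())
}
```

With Phase A off, or for any other export, the result is the legacy {} form. That is consistent with the unchanged cvar-OFF digest reported for this phase.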

Gates (Phase 1 extension only — all pass)

| # | Gate | Result |
|---|------|--------|
| 1 | cvar-OFF determinism, 50M (3 runs) | PASS: all 3 = b8fa0e0460359a4f660adb7605e053de (matches C+9 baseline, unchanged) |
| 2 | Phase B image_loaded_sha256 | PASS: ea8d160e9369328a5b922258a92113efb8d7ce3e1a5c12cc521e375985c91c18 (matches baseline) |
| 3 | Phase A main matched-prefix | UNCHANGED: 102404 (extension was framing-only; no fix landed, so no advance expected) |
| 4 | Both engines build clean | PASS |
| 5 | Phase A emitter det fields (2 runs) | PASS: both = 7489e90ef4c9be629af8c9fabb1cbdd7 (new; replaces C+9's 0b299c37… because the new args_resolved.path field is part of the det signature) |
| 6 | Unit tests | PASS: 165 → 165 (no new tests, no regressions) |

Schema status

The args_resolved field is part of schema-v1 already; this Phase only populates it for a subset of exports. No schema version bump.

The schema-v1 example (schema-v1.md:112) shows exactly the form we emit. We are now compliant with the documented schema for path-bearing exports rather than emitting an empty stub.

Cascade prediction (resolution / next steps)

| Stage | Predicted | Outcome |
|-------|-----------|---------|
| A = extend emitter cleanly | ~80% | LANDED |
| B = capture path string on both engines | ~85% | LANDED: cache:\d4ea4615\e\46ee8ca matched on both engines |
| C = classify root cause | ~75% | DONE: Class D (subsystem-required) |
| D = land fix in scope | ~55% | ESCALATED: fix is choice 2-4 above |
| E = main chain advances past 102404 | ~50% | NOT THIS SESSION |

Reading-error class

No new class. The existing class #15 / ζ (VFS layout aliasing, AUDIT-053) and the AUDIT-038 rule (no oracle state) are re-affirmed:

  • Class #15 ζ (AUDIT-053): persistent cache + journal .tmp writes create a warm-start regression.
  • AUDIT-038 line: oracle state is forbidden in default boot.

Both rules together make the cache-seeding fix subsystem-tier, not single-fix-tier.

Handoff to dedicated cache-subsystem session

The next session targeting this divergence should:

  1. Decide cache-state strategy:

    • (a) Implement Sylpheed's cache-generation logic so ours builds its own cache from scratch. This matches how canary bootstrapped its own cache, but is multi-hundred-LOC.
    • (b) Seed-once-then-persist: copy canary's cache into ours's cache_root behind a new cvar --cache-seed-from=<path>, then enable persistence. AUDIT-053's warm-start regression must be re-tested with AUDIT-054's FILE_DIRECTORY_FILE fix in tree (it landed AFTER 053's regression was observed).
    • (c) Hybrid: synthesize a stub success at NtQueryFullAttributesFile for known-good cache hashes, then synthesize NtCreateFile/Read responses with bytes captured from canary's cache files. Closest to a "single missing file plant" but for 23 files.
  2. After the fix, re-validate that the warm-start regression identified in AUDIT-053 does not recur (AUDIT-054 may have fixed it; this needs an explicit re-test).

  3. Expect cascading Phase A divergences, one per cache hash the game looks up in turn; the divergence at 102,404 is only the FIRST. After cache:\d4ea4615 is resolved, the game queries cache:\69d8e45c (idx 103810, already visible in ours.jsonl), and so on through 16+ distinct hashes per AUDIT-052.
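One way to size the cascade up front is to parse every cache: path the game queries and collect the distinct top-level hashes. The sketch below assumes the cache:\<HASH1>\<X>\<HASH2> shape with hex hashes. CacheKey, parse_cache_key and distinct_top_hashes are hypothetical names. Grouping by HASH1 is a choice made for this sketch; these notes do not say whether AUDIT-052's count is per HASH1 or per full path.

```rust
use std::collections::BTreeSet;

/// Parsed `cache:\<HASH1>\<X>\<HASH2>` path (field names are illustrative).
#[derive(Debug, PartialEq, Eq)]
pub struct CacheKey<'a> {
    pub hash1: &'a str,
    pub bucket: &'a str,
    pub hash2: &'a str,
}

/// Parse a guest path of exactly three components under `cache:\`, with
/// hex-digit hashes; anything else yields None.
pub fn parse_cache_key(path: &str) -> Option<CacheKey<'_>> {
    let rest = path.strip_prefix("cache:\\")?;
    let mut parts = rest.split('\\');
    let (hash1, bucket, hash2) = (parts.next()?, parts.next()?, parts.next()?);
    let is_hex = |s: &str| !s.is_empty() && s.chars().all(|c| c.is_ascii_hexdigit());
    if parts.next().is_some() || !is_hex(hash1) || !is_hex(hash2) || bucket.is_empty() {
        return None;
    }
    Some(CacheKey { hash1, bucket, hash2 })
}

/// Distinct top-level hashes among the queried paths, sorted.
pub fn distinct_top_hashes<'a>(paths: impl IntoIterator<Item = &'a str>) -> BTreeSet<&'a str> {
    paths
        .into_iter()
        .filter_map(parse_cache_key)
        .map(|key| key.hash1)
        .collect()
}
```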

Files in this audit run

| File | Content |
|------|---------|
| escalation.md | this file |
| investigation.md | Phase 1-4 walkthrough |
| re-validation.md | gate results (Phase 1 extension only) |
| ours.jsonl, ours-determ.jsonl, canary.jsonl | Phase A logs with the new args_resolved field |
| diff-report.md | re-run with the path field populated |
| snap/ours/ | Phase B snapshot (unchanged from C+9) |
| digest-cvaroff-{1,2,3}.json | 3× determinism (all = C+9 baseline) |

Next target

Same idx 102,404 NtQueryFullAttributesFile, but in a dedicated cache-subsystem session. Path framing is now captured for the next investigator's first read.