xenia-rs/audit-runs/audit-068-host-mem-watch/writer-report-v2.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00


AUDIT-068 Session 2 — writer report (extended coverage)

Date: 2026-05-19

Summary

Session 2 extends Session 1's host-side write watch from xe::store_and_swap<T> + xe::store<T> + Memory::Zero/Fill/Copy to ALSO cover:

  1. xe::endian_store<T,E>::set() (the underlying impl of xe::be<T>/xe::le<T>), gated on Memory::Memory() having registered the host→guest thunk so static-init order doesn't race the cvar.
  2. Memory::Copy full byte-scan over every 4-byte-aligned source offset (gated on g_active & 0x1).
  3. XEX loader memcpy/lzx_decompress pre-scan at 4 sites in xenia/cpu/xex_module.cc (patch-memcpy, uncompressed-image memcpy, basic-block memcpy, LZX-decompress output).
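The aligned value scan from item 2 can be pictured roughly as follows. This is a minimal sketch, not the canary patch: `load_be32` and `scan_aligned_for_values` are hypothetical names, and it assumes that "4-byte-aligned" means offsets relative to the start of the source buffer and that watched values are compared as big-endian u32 words.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Read a big-endian u32 from raw bytes (guest memory is big-endian).
inline uint32_t load_be32(const uint8_t* p) {
  return (uint32_t{p[0]} << 24) | (uint32_t{p[1]} << 16) |
         (uint32_t{p[2]} << 8) | uint32_t{p[3]};
}

// Return every 4-byte-aligned offset in [src, src + size) whose big-endian
// u32 equals one of the watched values. A tail shorter than 4 bytes is skipped.
inline std::vector<size_t> scan_aligned_for_values(
    const uint8_t* src, size_t size, const std::vector<uint32_t>& watched) {
  assert(src != nullptr || size == 0);
  std::vector<size_t> hits;
  for (size_t off = 0; off + 4 <= size; off += 4) {
    const uint32_t word = load_be32(src + off);
    for (uint32_t v : watched) {
      if (word == v) {
        hits.push_back(off);
        break;  // one hit per offset is enough
      }
    }
  }
  return hits;
}
```

A value that only appears at an unaligned offset is not reported by this sketch, which is one reason a zero-hit result from such a scan is suggestive rather than conclusive.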

The static-init gate proved load-bearing: my initial Run 5 (XEX section sanity) produced 0 hits because endian_store::set() fired during static init, before the cvars::audit_68_host_mem_watch_* objects were constructed; parse_locked() ran with empty strings and permanently latched g_active=0. Fix: defer the parse until g_host_to_guest_thunk is non-null (it is set inside Memory::Memory()).
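The latch failure and the gate can be modelled in a few lines. This is a single-threaded sketch under stated assumptions: `WatchConfig`, `active_eager`, and `active_gated` are hypothetical names, `thunk_registered` stands in for `g_host_to_guest_thunk != nullptr`, and `values_csv` stands in for the cvar string. The real code presumably also takes a lock, which is omitted here.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct WatchConfig {
  bool thunk_registered = false;  // models g_host_to_guest_thunk != nullptr
  std::string values_csv;         // models the cvar string (empty until cvars exist)
  bool parsed = false;            // one-shot latch
  uint32_t active = 0;            // bit 0: value watch enabled

  // One-shot parse: whatever the cvar holds right now becomes permanent.
  void parse_locked() {
    assert(!parsed && "parse must run at most once");
    active = values_csv.empty() ? 0u : 0x1u;
    parsed = true;
  }

  // Buggy variant: parses on first use, even during static init.
  uint32_t active_eager() {
    if (!parsed) parse_locked();
    return active;
  }

  // Gated variant: refuses to latch until memory init has registered the thunk.
  uint32_t active_gated() {
    if (!thunk_registered) return 0;  // too early: report "off" without latching
    if (!parsed) parse_locked();
    return active;
  }
};
```

With the eager variant, one early call before the cvar is populated leaves `active` at 0 for the rest of the run; the gated variant returns 0 for early calls but still picks up the cvar once the thunk exists.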

LOC added (canary only)

| File | LOC delta | Purpose |
|---|---|---|
| src/xenia/base/byte_order.h | +27 | endian_store::set() hook (gated on g_host_to_guest_thunk != nullptr) + #include <type_traits> + #include "audit_68_host_mem_watch_fwd.h" |
| src/xenia/memory.cc | +35 / -17 | Memory::Copy byte-scan over 4-byte-aligned source positions; preserves addr-only coarse event |
| src/xenia/cpu/xex_module.cc | +35 | Inline helper audit68_prescan_memcpy() + wraps at sites 427 (patch image), 592 (uncompressed exe load), 668 (basic-block memcpy), 840 (post-lzx_decompress scan of guest-image bytes) |
| src/xenia/base/audit_68_host_mem_watch_base.cc | +12 | Static-init gate in check_host_write_slowpath and check_guest_va_slowpath |
| Total | ~110 LOC | Additive (cvar-gated; zero cost when off, modest cost when on) |

xenia-rs HEAD e6d43a23ac393004d2e5adf2f0395fd0b5e6448b UNCHANGED.

Captures

All runs cold-boot (cache wipe before each), --mute=true, against the Sylpheed ISO.

Run 5 — XEX .text region sanity (validates Step 3)

Cmdline: --audit_68_host_mem_watch_addrs=0x82000000-0x82010000 --mute=true. 70 s wallclock.

Result: 1 HOST-WRITE hit (INIT lines present; active=0x2). This is the Step 3 validation: the write to the XEX .text region, whose absence from Session 1's captures was the smoking gun for a coverage gap, is now caught.

i> 00000114 AUDIT-068-INIT values_csv="" addrs_csv="0x82000000-0x82010000" values_parsed=0 addr_ranges_parsed=1 active=0x2
i> 00000114 AUDIT-068-INIT addr_range[0] = 0x82000000-0x82010000
i> 00000114 AUDIT-068-HOST-WRITE guest_va=0x82000000 host_ptr=0x0000000000000000 val=0x000000004D5A9000 sz=8 fn=xex_lzx_decompress_output host_ns=300 tid=276

The value 0x4D5A9000 is the BE-encoded first 4 bytes of the XEX image: "MZ\x90\x00" = PE/EXE magic. Exactly as expected — lzx_decompress writes the decoded image starting at base_address_=0x82000000. Session 1's reading-error class #35 is now mitigated.

Note: only ONE hit appears (the coarse addr-only event for the start of the lzx output region) because the addr-range 0x82000000-0x82010000 intersects only the head of the ~2 MB decompress span. The per-4-byte value loop is skipped (no values configured, active & 0x1 == 0).
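The coarse addr-only event described in the note reduces to an interval-overlap test between the written span and the watched range. A minimal sketch, assuming ranges are inclusive on both ends (as the INIT lines appear to print them); `AddrRange` and `write_intersects` are hypothetical names, not canary identifiers.

```cpp
#include <cassert>
#include <cstdint>

// Inclusive guest-address range (assumption: both ends are included).
struct AddrRange {
  uint32_t lo;
  uint32_t hi;
};

// True if the write [va, va + size) touches at least one byte of `watch`.
inline bool write_intersects(uint32_t va, uint64_t size, const AddrRange& watch) {
  assert(watch.lo <= watch.hi);
  if (size == 0) return false;
  const uint64_t first = va;
  const uint64_t last = first + size - 1;  // 64-bit math avoids u32 wraparound
  return first <= watch.hi && last >= watch.lo;
}
```

Under this model a roughly 2 MB write starting at 0x82000000 yields a single event for the range 0x82000000-0x82010000, because the test is evaluated once per write, not once per word.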

Run 3 — vtable 0x8200A208 / 0x8200A928 writers (extended)

Cmdline: --audit_68_host_mem_watch_values=0x8200A208,0x8200A928,0x080082A2,0x2829820 --audit_68_host_mem_watch_addrs=0xBCE25340 --mute=true. 90 s wallclock.

Result: 0 HOST-WRITE hits (INIT lines present; active=0x3). Boot reaches tid=29 spawn (post-Phase-NonMatch trigger window).

i> 00000114 AUDIT-068-INIT values_csv="0x8200A208,0x8200A928,0x080082A2,0x2829820" addrs_csv="0xBCE25340" values_parsed=4 addr_ranges_parsed=1 active=0x3
i> 00000114 AUDIT-068-INIT value[0] = 0x8200A208
i> 00000114 AUDIT-068-INIT value[1] = 0x8200A928
i> 00000114 AUDIT-068-INIT value[2] = 0x080082A2
i> 00000114 AUDIT-068-INIT value[3] = 0x02829820
i> 00000114 AUDIT-068-INIT addr_range[0] = 0xBCE25340-0xBCE25347
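The INIT lines above suggest how the two cvar strings become a watch spec. The sketch below reproduces that mapping under assumptions inferred from the logs only, not from the canary source: bit 0x1 of `active` means values are configured, bit 0x2 means address ranges are configured, and a single address is widened to an inclusive 8-byte range. `WatchSpec`, `parse_hex32`, and `parse_watch_spec` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

struct WatchSpec {
  std::vector<uint32_t> values;
  std::vector<std::pair<uint32_t, uint32_t>> ranges;  // inclusive lo-hi
  uint32_t active = 0;  // assumed: bit 0 = values present, bit 1 = ranges present
};

// Parse a hex token; std::stoul with base 16 accepts an optional "0x" prefix.
inline uint32_t parse_hex32(const std::string& s) {
  const unsigned long v = std::stoul(s, nullptr, 16);
  assert(v <= 0xFFFFFFFFul);
  return static_cast<uint32_t>(v);
}

inline WatchSpec parse_watch_spec(const std::string& values_csv,
                                  const std::string& addrs_csv) {
  WatchSpec spec;
  std::string tok;
  std::istringstream vs(values_csv);
  while (std::getline(vs, tok, ',')) {
    if (!tok.empty()) spec.values.push_back(parse_hex32(tok));
  }
  std::istringstream as(addrs_csv);
  while (std::getline(as, tok, ',')) {
    if (tok.empty()) continue;
    const size_t dash = tok.find('-');
    if (dash == std::string::npos) {
      // Single address: widened to 8 bytes, as the Run 3 INIT line shows.
      const uint32_t lo = parse_hex32(tok);
      spec.ranges.emplace_back(lo, lo + 7);
    } else {
      spec.ranges.emplace_back(parse_hex32(tok.substr(0, dash)),
                               parse_hex32(tok.substr(dash + 1)));
    }
  }
  if (!spec.values.empty()) spec.active |= 0x1;
  if (!spec.ranges.empty()) spec.active |= 0x2;
  return spec;
}
```

Fed the Run 3 command line, this sketch yields four values (the last one zero-extended to 0x02829820), one range 0xBCE25340-0xBCE25347, and active=0x3, matching the log.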

Critical implication: with Session 2's extended coverage, NONE of the following surfaces ever wrote a target value, or wrote to the target VA, in canary's full boot:

  • xe::store_and_swap<T> (T = u8/u16/u32/u64/i8/i16/i32/i64)
  • xe::store<T> (host-endian sibling)
  • Memory::Zero/Fill/Copy (incl. full byte-scan in Memory::Copy)
  • xe::endian_store<T,E>::set() (the underlying be<T>/le<T> write path)
  • XEX loader memcpy at 4 sites + lzx_decompress output

AUDIT-067 already ruled out all 16 PPC JIT'd store opcodes (stw/stwu/stwx/stwux/stwbrx/stwcx./stmw/std/stdu/stdux/stdx/stdbrx/stdcx./stvx/stvxl/stvewx). Combined verdict: 0xBCE25340 is never explicitly written via any known canonical write surface. Yet sub_825070F0 reads [0xBCE25340]=0x8200A208 per AUDIT-058/063/067 trigger fire. New search candidates listed below.

Run 4 — voice-struct field clear extended

Cmdline: --audit_68_host_mem_watch_addrs=0x42500000-0x42600000 --mute=true. 60 s wallclock.

Result: 0 HOST-WRITE hits (INIT lines present; active=0x2).

Per Session 1 plan, the addr range 0x42500000-0x42600000 was a guess. With Session 2's extended coverage it remains a guess — voice struct base is unknown. Next step (Session 3+): instrument canary's XAudio2AudioDriver::CreateVoice (or equivalent) to log the heap region holding the voice array, then re-run with that range.

Sanity (value=0) — confirms full-surface coverage

Cmdline: --audit_68_host_mem_watch_values=0x00000000 --mute=true. 20 s wallclock.

Result: 78,738 hits across all hooked surfaces:

| Surface | Hits | Notes |
|---|---|---|
| xex_lzx_decompress_output | 78,655 | Every 4-byte-zero u32 in the LZX-decompressed Sylpheed image (.bss/.padding) |
| Memory::Zero | 39 | Heap-page zero on Memory::Initialize + stack zeros |
| be<T>::set | 35 | NEW hook; proves Step 1 works. Header writes from kernel_state.cc / xboxkrnl_threading.cc etc. |
| store_and_swap<u32> | 5 | TIB/kernel-pointer init (same as Session 1) |
| Memory::Fill | 4 | RtlFillMemory equivalents |

Session 1 sanity was 1,639 hits; Session 2 records ~48× more hits (78,738 / 1,639 ≈ 48), almost all of them from the new lzx_decompress output scan. This validates that the new hooks fire correctly during boot.

Headline finding

Session 2 expanded the host-write watch from ~5 surfaces (store_and_swap, store, Memory::Zero/Fill/Copy) to ~9 surfaces (+ be::set, + xex_module memcpy at 4 sites, + lzx_decompress output). Sanity went from 1,639 → 78,738 hits, validating the new hooks.

Despite this expansion, the vtable install at [0xBCE25340] = 0x8200A208 STILL produces 0 hits across canary's full boot. Combined with AUDIT-067's 16 PPC JIT store hooks producing 0 hits, the install path is officially OUTSIDE the known canonical write surfaces. Possible remaining paths (Session 3+ search space):

  1. Direct *reinterpret_cast<T*>(host_ptr) = value in kernel-import handlers (raw pointer assignment, bypassing xe::be<T>::set(), xe::store_and_swap, and Memory::*). Audit needs ripgrep on kernel/xboxkrnl/*.cc for patterns matching the above.
  2. Allocator-side initial-state writes: MmAllocatePhysicalMemoryEx returning a block that already contains the value from a prior committed-but-deallocated page (cross-page artifact). Memory protection routines (MmSetAllocationProtect etc.) may also mutate.
  3. GPU/HostMemory mmio mappings — D3D12 backbuffer / texture upload may write to guest VA ranges directly via mapped allocations.
  4. VFS file readback into guest VA: NtReadFile writes the file contents into guest memory via Memory::Copy (now scanned) OR via a direct memcpy(host_ptr, src, n) in xfile.cc/host_path_file.cc. Those call sites need auditing.
  5. Kernel-import handler using a typed POD struct copy — e.g. *reinterpret_cast<X_FOO*>(host_ptr) = X_FOO{...} where memberwise assignment runs through neither be<T>::set() (because POD struct copy uses memcpy semantics) nor store_and_swap.

Path 5 is the most likely candidate. The implicit copy-assignment of a struct containing be<T> members would NOT route through set() — only through bytewise memcpy. This is a hook-surface gap that Session 3 should target.
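The gap is easy to reproduce in isolation. The sketch below uses a simplified stand-in for xe::be<uint32_t> (`be_u32`) and a hypothetical guest struct (`X_FOO` with made-up fields); it is not canary's actual class layout, and the unconditional byte swap assumes a little-endian host. It shows that a field assignment reaches the hooked `set()`, while an implicit whole-struct copy moves the same bytes without calling it.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Counts how many times the hooked setter ran.
inline int& set_calls() {
  static int n = 0;
  return n;
}

inline uint32_t byte_swap32(uint32_t v) {
  return (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
         (v << 24);
}

// Simplified stand-in for xe::be<uint32_t>: explicit assignments are
// funnelled through set(), which is where a write hook would live.
struct be_u32 {
  uint32_t raw = 0;
  void set(uint32_t v) {
    ++set_calls();
    raw = byte_swap32(v);
  }
  uint32_t get() const { return byte_swap32(raw); }
  be_u32& operator=(uint32_t v) {
    set(v);
    return *this;
  }
  // Copy assignment is implicitly defined: it copies `raw`, never calls set().
};

// Hypothetical guest-visible struct holding a vtable pointer.
struct X_FOO {
  be_u32 vtable;
  be_u32 refcount;
};

// The compiler is free to implement the struct copy as a plain byte copy.
static_assert(std::is_trivially_copy_assignable<X_FOO>::value,
              "struct copy bypasses be_u32::set()");

inline void install_by_field(X_FOO* dst, uint32_t vtable) {
  assert(dst != nullptr);
  dst->vtable = vtable;  // goes through set(): hook fires
}

inline void install_by_struct_copy(X_FOO* dst, const X_FOO& src) {
  assert(dst != nullptr);
  *dst = src;  // implicit copy assignment: hook does not fire
}
```

After `install_by_field` the counter is 1; after a following `install_by_struct_copy` into a second struct it is still 1, even though the destination now holds 0x8200A208.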

Cross-reference each captured writer in ours

xex_lzx_decompress_output (Run 5 — 1 hit)

Captures the LZX decompress of the XEX image into guest VA base_address_=0x82000000. In canary: xenia/cpu/xex_module.cc:840 calls lzx_decompress(compress_buffer, ..., buffer, uncompressed_size, ...) where buffer = memory()->TranslateVirtual(base_address_).

Ours-side analog: xenia-rs/crates/xenia-xex/src/lzx.rs + xenia-rs/crates/xenia-xex/src/loader.rs. Per Phase B image_loaded_sha256 ea8d160e… matching across cold runs, ours's LZX decoder produces byte-identical output to canary's. No fix needed. GAP CLASS: NONE.

be<T>::set (sanity-v2 — 35 hits in 20 s)

Per sanity capture, these are likely kernel-state header writes (kernel_state.cc:create_dispatch_table etc.). Ours's analog: xenia-rs/crates/xenia-kernel/src/state.rs + exports.rs (each kernel handler that writes a be<T> field). Without enabling per-event tagging in the canary log we can't enumerate which handler produced which hit; full cross-reference deferred to Session 3.

GAP CLASS: UNKNOWN — needs per-tid stack-trace enrichment in canary instrumentation.

Memory::Zero, Memory::Fill, store_and_swap<u32> (sanity-v2 — 48 hits combined)

Already covered by Session 1 cross-reference. No new gaps surfaced.

Predicted vs actual outcomes

| Cascade rung | Prediction | Actual |
|---|---|---|
| A=catch vtable installer | ~75% | FAIL: 0 hits despite ~9-surface coverage. Hook-surface still incomplete OR install is via path-5-style POD struct copy. |
| B=catch voice-struct clearer | ~50% | FAIL: 0 hits. Addr range was a guess; needs guest-side voice-base probe first. |
| C=identify ours's gap if A succeeds | ~70% (cond. on A) | N/A (A failed). |
| D=Session 3 progression-metric move | ~40-50% (cond. on A+C) | N/A (A failed). |

Validated rungs:

| Rung | Actual |
|---|---|
| E=Step 3 validation (XEX section caught) | PASS: Run 5 caught xex_lzx_decompress_output at 0x82000000 with MZ\x90\x00 magic. Session 1 reading-error #35 resolved at the hook level. |
| F=be::set() hook fires correctly | PASS: sanity-v2 saw 35 be::set hits in 20 s without crashing static init. |

Session 3 recommendation

Three concrete next steps in priority order:

Step 1: Hook raw pointer assignments inside kernel/util/shim_utils.h. Per shim_utils.h, kernel-import handlers receive typed pointers (X_HANDLE*, etc.) and assign through them with a raw *ptr = value. When the pointee is a POD struct, the whole-struct copy does NOT go through set() on its be<T> fields, because the implicit copy is bytewise and never calls the member setters. Add an XAUDIT_68_WRITE_FIELD(host_ptr, value) macro to be invoked at known write sites OR (more invasive) instrument each *ptr = ... pattern. ~50-100 LOC additive.
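One possible shape for the Step 1 macro is sketched below. Only the macro name comes from this report; `Audit68Write`, `audit68_log`, and `audit68_note_write` are hypothetical helpers, and a real hook would presumably translate the host pointer back to a guest VA and apply the value/address filters instead of appending to a vector.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One recorded raw write: where it landed and how many bytes it covered.
struct Audit68Write {
  const void* host_ptr;
  size_t size;
};

// Hypothetical sink standing in for the real audit log.
inline std::vector<Audit68Write>& audit68_log() {
  static std::vector<Audit68Write> log;
  return log;
}

inline void audit68_note_write(const void* host_ptr, size_t size) {
  assert(host_ptr != nullptr);
  audit68_log().push_back({host_ptr, size});
}

// Record the destination, then perform the raw assignment unchanged.
// The pointer expression is evaluated exactly once.
#define XAUDIT_68_WRITE_FIELD(host_ptr, value)              \
  do {                                                      \
    auto* audit68_p_ = (host_ptr);                          \
    audit68_note_write(audit68_p_, sizeof(*audit68_p_));    \
    *audit68_p_ = (value);                                  \
  } while (0)
```

Because the size is taken from the pointee type, wrapping a whole-struct assignment in the macro records the full struct span, which is the case the existing set() hook cannot see.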

Step 2: Add a memory-protection trap on guest VA 0xBCE25340 (4 bytes). Use a guard page (Memory::Protect to read-only) and have the host fault handler log the writer's RIP/x86 instruction. This is the nuclear option: it bypasses ALL emulation-layer hooks and catches the actual host store instruction. Requires platform-specific SIGSEGV/SEH handler integration. ~150-200 LOC platform-gated.

Step 3: Read-mode probe instead of write-mode. Place a RtlReadGuestU32(0xBCE25340) probe at the FIRST iteration of canary's main loop AFTER memory init and log the VALUE at that address. If the value is 0 early and 0x8200A208 later, it was written between those two moments. Combine this with --audit_61_branch_probe_pcs=0x825070F0 (which AUDIT-067 confirmed fires) and a binary bisect over the boot trajectory.

Step 3 is cheapest (~20 LOC) and may pinpoint the install epoch without finding the writer; pair with bisection across the audit-068 event log.
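The bisection half of Step 3 can be sketched as a search over probe readings. This assumes readings are ordered by event index and that the value, once installed, stays installed (so "not yet installed" is a monotone predicate); if the address is overwritten again later, a linear scan would be needed instead. `ProbeSample` and `first_installed` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One probe reading: the value seen at the watched address at a given
// position in the event log.
struct ProbeSample {
  uint64_t event_index;
  uint32_t value;
};

// Index of the first sample whose value equals `target`, or samples.size()
// if the target never appears. Binary search via std::partition_point.
inline size_t first_installed(const std::vector<ProbeSample>& samples,
                              uint32_t target) {
  assert(std::is_sorted(samples.begin(), samples.end(),
                        [](const ProbeSample& a, const ProbeSample& b) {
                          return a.event_index < b.event_index;
                        }));
  const auto it = std::partition_point(
      samples.begin(), samples.end(),
      [target](const ProbeSample& s) { return s.value != target; });
  return static_cast<size_t>(it - samples.begin());
}
```

The install epoch is then bracketed by the event indices of the sample at the returned position and the one before it, which narrows the window without identifying the writer.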

Cascade outcome

  • A (vtable installer caught): FAIL — surfaces still incomplete, but space narrowed.
  • B (voice-struct clearer caught): FAIL — addr range remains a guess.
  • C (ours gap identified): N/A (A failed).
  • D (Session 3 progression move): N/A.
  • E (Step 3 XEX-section validation): PASS — proves Session 1's #35 surface gap is at least partially closed.
  • F (be::set hook works): PASS.

Net: 2 cascade wins (E, F) for "instrumentation is sound and now covers ~9 surfaces"; 2 cascade losses (A, B) for "the actual writer is in a path that's STILL un-hooked or doesn't exist as a canonical write at all".

Artifacts (this dir)

  • instrumentation-design.md (Session 1)
  • fix-canary.diff (Session 1 — 5-file diff)
  • fix-canary-v2.diff (Session 2 — extends with 4 more sites)
  • run1-vtable-writers.log (Session 1 — 0 hits)
  • run2-voice-struct-writers.log (Session 1 — 0 hits)
  • run3-vtable-extended.log (Session 2 — 0 HOST-WRITE hits, INIT confirmed)
  • run4-voice-struct-extended.log (Session 2 — 0 hits)
  • run5-xex-section-sanity.log (Session 2 — 1 hit validating Step 3)
  • sanity-value0.log (Session 1 — 1,639 hits)
  • sanity-v2-value0.log (Session 2 — 78,738 hits incl. 35 from be::set)
  • writer-report.md (Session 1)
  • writer-report-v2.md (this file)
  • session-2-plan.md