handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,178 @@
# AUDIT-068 Session 2 — writer report (extended coverage)
Date: 2026-05-19
## Summary
Session 2 extends Session 1's host-side write watch from `xe::store_and_swap<T>` + `xe::store<T>` + `Memory::Zero/Fill/Copy` to ALSO cover:
1. **`xe::endian_store<T,E>::set()`** (the underlying impl of `xe::be<T>`/`xe::le<T>`), gated on `Memory::Memory()` having registered the host→guest thunk so static-init order doesn't race the cvar.
2. **`Memory::Copy` full byte-scan** over every 4-byte-aligned source offset (gated on `g_active & 0x1`).
3. **XEX loader memcpy/lzx_decompress pre-scan** at 4 sites in `xenia/cpu/xex_module.cc` (patch-memcpy, uncompressed-image memcpy, basic-block memcpy, LZX-decompress output).
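The byte-scan in item 2 can be sketched roughly as below. This is an illustrative stand-in, not canary's code: the helper name `ScanAlignedU32BE` is hypothetical, alignment is taken relative to the start of the source buffer, and words are compared in guest (big-endian) order. Both of those choices are assumptions.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not canary's): returns the offsets, relative to `src`,
// of every 4-byte-aligned big-endian u32 that equals one of `targets`.
inline std::vector<size_t> ScanAlignedU32BE(
    const uint8_t* src, size_t len, const std::vector<uint32_t>& targets) {
  std::vector<size_t> hits;
  for (size_t off = 0; off + 4 <= len; off += 4) {
    // Assemble the word the way the guest would read it (big-endian).
    const uint32_t word = (uint32_t(src[off]) << 24) |
                          (uint32_t(src[off + 1]) << 16) |
                          (uint32_t(src[off + 2]) << 8) |
                          uint32_t(src[off + 3]);
    for (uint32_t t : targets) {
      if (word == t) {
        hits.push_back(off);
        break;  // one hit per offset is enough
      }
    }
  }
  return hits;
}
```

A trailing partial word (fewer than 4 bytes) is ignored, and a value that straddles two aligned words is not detected by this sketch.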
The static-init gate proved load-bearing: my initial Run 5 (XEX section sanity) produced 0 hits because `endian_store::set()` fired during static-init, before the `cvars::audit_68_host_mem_watch_*` objects were constructed; `parse_locked()` ran with empty strings and permanently latched `g_active=0`. Fix: defer the parse until `g_host_to_guest_thunk` is non-null (set inside `Memory::Memory()`).
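A minimal sketch of the deferred-parse gate, assuming a single-threaded caller. Every name in the `sketch` namespace is a hypothetical stand-in: `g_values_csv` plays the role of the cvar string, `g_thunk` the role of `g_host_to_guest_thunk`, and `ParseOnce` the role of `parse_locked()`.

```cpp
#include <string>

namespace sketch {

// Stand-ins (assumptions), not canary's actual globals.
static std::string g_values_csv;        // empty until the cvars are constructed
static const void* g_thunk = nullptr;   // non-null once Memory has been constructed
static int g_state = 0;                 // 0 = not parsed yet, 1 = parsed (latched)
static unsigned g_active = 0;           // bit 0x1 = value watch enabled

// Parses the configuration once and latches the result.
static void ParseOnce() {
  g_active = g_values_csv.empty() ? 0u : 0x1u;
  g_state = 1;
}

// Returns true when the watch should inspect this write.
static bool WatchActive() {
  // Gate: a call that arrives before Memory exists must not parse, otherwise
  // the empty pre-init strings would be latched for the rest of the run.
  if (g_thunk == nullptr) return false;
  if (g_state == 0) ParseOnce();
  return g_active != 0;
}

}  // namespace sketch
```

Without the `g_thunk` check, the first early call would run `ParseOnce()` against an empty string and leave `g_active` at 0, which is the failure mode described above.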
## LOC added (canary only)
| File | LOC delta | Purpose |
|---|---:|---|
| `src/xenia/base/byte_order.h` | +27 | `endian_store::set()` hook (gated on `g_host_to_guest_thunk != nullptr`) + `#include <type_traits>` + `#include "audit_68_host_mem_watch_fwd.h"` |
| `src/xenia/memory.cc` | +35 / -17 | `Memory::Copy` byte-scan over 4-byte-aligned source positions; preserves addr-only coarse event |
| `src/xenia/cpu/xex_module.cc` | +35 | Inline helper `audit68_prescan_memcpy()` + wraps at sites 427 (patch image), 592 (uncompressed exe load), 668 (basic-block memcpy), 840 (post-`lzx_decompress` scan of guest-image bytes) |
| `src/xenia/base/audit_68_host_mem_watch_base.cc` | +12 | Static-init gate in `check_host_write_slowpath` and `check_guest_va_slowpath` |
| **Total** | **~110 LOC additive** (cvar-gated; zero cost when off, modest cost when on) | |
xenia-rs HEAD `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` UNCHANGED.
## Captures
All runs cold-boot (cache wipe before each), `--mute=true`, against the Sylpheed ISO.
### Run 5 — XEX .text region sanity (validates Step 3)
Cmdline: `--audit_68_host_mem_watch_addrs=0x82000000-0x82010000 --mute=true`. 70 s wallclock.
**Result: 1 HOST-WRITE hit (plus the INIT lines).** This is the Step 3 validation: writes to the XEX `.text` region, whose absence from the log was Session 1's smoking gun, are now caught.
```
i> 00000114 AUDIT-068-INIT values_csv="" addrs_csv="0x82000000-0x82010000" values_parsed=0 addr_ranges_parsed=1 active=0x2
i> 00000114 AUDIT-068-INIT addr_range[0] = 0x82000000-0x82010000
i> 00000114 AUDIT-068-HOST-WRITE guest_va=0x82000000 host_ptr=0x0000000000000000 val=0x000000004D5A9000 sz=8 fn=xex_lzx_decompress_output host_ns=300 tid=276
```
The value `0x4D5A9000` is the BE-encoded first 4 bytes of the XEX image: `"MZ\x90\x00"` = PE/EXE magic. Exactly as expected — `lzx_decompress` writes the decoded image starting at `base_address_=0x82000000`. **Session 1's reading-error class #35 is now mitigated**.
Note: only ONE hit appears (the coarse addr-only event for the start of the LZX output region) because the addr range `0x82000000-0x82010000` intersects only the head of the ~2 MB decompress span. The per-4-byte value loop is skipped because no values are configured, i.e. `(active & 0x1) == 0`.
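The coarse addr-only check described in the note amounts to an interval-overlap test between the written span and the watched range. A sketch, assuming inclusive range bounds (suggested by, but not confirmed by, the INIT lines); `Range` and `Intersects` are hypothetical names:

```cpp
#include <cstdint>

// Hypothetical watch range with inclusive bounds (an assumption).
struct Range {
  uint32_t lo;
  uint32_t hi;
};

// True when the write [va, va + size) touches any byte of `r`.
// 64-bit arithmetic avoids wrap-around near the top of the 32-bit guest space.
constexpr bool Intersects(uint32_t va, uint64_t size, Range r) {
  return size != 0 && va <= r.hi && uint64_t(va) + size - 1 >= r.lo;
}
```

Under this test a single ~2 MB write starting at `0x82000000` overlaps the Run 5 range once, which is consistent with one coarse event being logged.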
### Run 3 — vtable `0x8200A208 / 0x8200A928` writers (extended)
Cmdline: `--audit_68_host_mem_watch_values=0x8200A208,0x8200A928,0x080082A2,0x2829820 --audit_68_host_mem_watch_addrs=0xBCE25340 --mute=true`. 90 s wallclock.
**Result: 0 HOST-WRITE hits** (INIT lines present; `active=0x3`). Boot reaches tid=29 spawn (post-Phase-NonMatch trigger window).
```
i> 00000114 AUDIT-068-INIT values_csv="0x8200A208,0x8200A928,0x080082A2,0x2829820" addrs_csv="0xBCE25340" values_parsed=4 addr_ranges_parsed=1 active=0x3
i> 00000114 AUDIT-068-INIT value[0] = 0x8200A208
i> 00000114 AUDIT-068-INIT value[1] = 0x8200A928
i> 00000114 AUDIT-068-INIT value[2] = 0x080082A2
i> 00000114 AUDIT-068-INIT value[3] = 0x02829820
i> 00000114 AUDIT-068-INIT addr_range[0] = 0xBCE25340-0xBCE25347
```
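The `active` field in the INIT lines is consistent with a two-bit mask: `0x1` when at least one value parsed, `0x2` when at least one address range parsed. The sketch below encodes that reading; it is inferred from the logged `active=0x2` and `active=0x3` values rather than taken from the instrumentation source, and `ActiveMask` is a hypothetical name.

```cpp
#include <cstddef>

// Inferred encoding (an assumption): bit 0x1 = value watch, bit 0x2 = address watch.
constexpr unsigned ActiveMask(size_t values_parsed, size_t addr_ranges_parsed) {
  return (values_parsed != 0 ? 0x1u : 0u) | (addr_ranges_parsed != 0 ? 0x2u : 0u);
}
```

With this encoding, Run 5 (no values, one range) yields `0x2` and Run 3 (four values, one range) yields `0x3`, matching the captured INIT lines.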
**Critical implication**: with Session 2's extended coverage, NONE of the following surfaces ever wrote the target value or to the target VA in canary's full boot:
- `xe::store_and_swap<T>` (T = u8/u16/u32/u64/i8/i16/i32/i64)
- `xe::store<T>` (host-endian sibling)
- `Memory::Zero/Fill/Copy` (incl. full byte-scan in `Memory::Copy`)
- `xe::endian_store<T,E>::set()` (the underlying `be<T>`/`le<T>` write path)
- XEX loader pre-scan at 4 sites (3 memcpy sites + `lzx_decompress` output)
AUDIT-067 already ruled out all 16 PPC JIT'd store opcodes (stw/stwu/stwx/stwux/stwbrx/stwcx./stmw/std/stdu/stdux/stdx/stdbrx/stdcx./stvx/stvxl/stvewx). Combined verdict: **`0xBCE25340` is never explicitly written via any known canonical write surface**. Yet `sub_825070F0` reads `[0xBCE25340]=0x8200A208` per AUDIT-058/063/067 trigger fire. New search candidates listed below.
### Run 4 — voice-struct field clear extended
Cmdline: `--audit_68_host_mem_watch_addrs=0x42500000-0x42600000 --mute=true`. 60 s wallclock.
**Result: 0 HOST-WRITE hits** (INIT lines present; `active=0x2`).
Per Session 1 plan, the addr range `0x42500000-0x42600000` was a guess. With Session 2's extended coverage it remains a guess — voice struct base is unknown. Next step (Session 3+): instrument canary's `XAudio2AudioDriver::CreateVoice` (or equivalent) to log the heap region holding the voice array, then re-run with that range.
### Sanity (value=0) — confirms full-surface coverage
Cmdline: `--audit_68_host_mem_watch_values=0x00000000 --mute=true`. 20 s wallclock.
**Result: 78,738 hits** across all hooked surfaces:
| Surface | Hits | Notes |
|---|---:|---|
| `xex_lzx_decompress_output` | 78,655 | Every 4-byte-zero u32 in the LZX-decompressed Sylpheed image (.bss/.padding) |
| `Memory::Zero` | 39 | Heap-page zero on Memory::Initialize + stack zeros |
| `be<T>::set` | 35 | **NEW hook — proves Step 1 works.** Header writes from `kernel_state.cc` / `xboxkrnl_threading.cc` etc. |
| `store_and_swap<u32>` | 5 | TIB/kernel-pointer init (same as Session 1) |
| `Memory::Fill` | 4 | RtlFillMemory equivalents |
Session 1's sanity run produced 1,639 hits; Session 2 produces ~48× as many (78,738), almost all from the LZX output scan, which validates that the new hooks fire correctly during boot.
## Headline finding
Session 2 expanded the host-write watch from **~5 surfaces** (store_and_swap, store, Memory::Zero/Fill/Copy) to **~9 surfaces** (+ `be<T>::set`, + the xex_module pre-scan at 4 sites, one of which is the lzx_decompress output). Sanity went from 1,639 → 78,738 hits, validating the new hooks.
**Despite this expansion**, the vtable install at `[0xBCE25340] = 0x8200A208` STILL produces 0 hits across canary's full boot. Combined with AUDIT-067's 16 PPC JIT store hooks producing 0 hits, the install path is officially OUTSIDE the known canonical write surfaces. Possible remaining paths (Session 3+ search space):
1. **Direct `*reinterpret_cast<T*>(host_ptr) = value`** in kernel-import handlers (raw pointer assignment, bypassing `xe::be<T>::set()`, `xe::store_and_swap`, and `Memory::*`). Auditing this needs a ripgrep pass over `kernel/xboxkrnl/*.cc` for patterns matching the above.
2. **Allocator-side initial-state writes**: `MmAllocatePhysicalMemoryEx` returning a block that already contains the value from a prior committed-but-deallocated page (cross-page artifact). Memory protection routines (`MmSetAllocationProtect` etc.) may also mutate.
3. **GPU/HostMemory mmio mappings**: D3D12 backbuffer / texture upload may write to guest VA ranges directly via mapped allocations.
4. **VFS file readback into guest VA**: `NtReadFile` writes the file contents into guest memory via `Memory::Copy` (now scanned) OR via a direct `memcpy(host_ptr, src, n)` in `xfile.cc`/`host_path_file.cc`. Need to audit those.
5. **Kernel-import handler using a typed POD struct copy**: e.g. `*reinterpret_cast<X_FOO*>(host_ptr) = X_FOO{...}`, where the struct assignment runs through neither `be<T>::set()` (because POD struct copy uses memcpy semantics) nor `store_and_swap`.
Path 5 is the most likely candidate. The implicit copy-assignment of a struct containing `be<T>` members would NOT route through `set()` — only through bytewise memcpy. This is a hook-surface gap that Session 3 should target.
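The gap can be reproduced in isolation. In the sketch below, `BE32` is a simplified stand-in for `xe::be<uint32_t>` (not the real class), `X_FOO` is a hypothetical guest struct, and `g_set_calls` stands in for the AUDIT-068 hook firing. The byte swap is unconditional, i.e. it assumes a little-endian host.

```cpp
#include <cstdint>
#include <type_traits>

static int g_set_calls = 0;  // stands in for the hook inside set()

// Simplified stand-in for xe::be<uint32_t>; stores the value byte-swapped.
struct BE32 {
  uint32_t raw;
  BE32() = default;
  BE32(uint32_t v) { set(v); }
  void set(uint32_t v) {
    ++g_set_calls;  // the only point where a set()-based hook can observe a write
    raw = (v >> 24) | ((v >> 8) & 0x0000FF00u) | ((v << 8) & 0x00FF0000u) |
          (v << 24);
  }
};

// Hypothetical guest-visible struct.
struct X_FOO {
  BE32 vtable;
  BE32 flags;
};

static_assert(std::is_trivially_copyable<X_FOO>::value,
              "the implicit copy of X_FOO is a plain byte copy");

// Copies a fully built struct to `dst` (which would be a translated guest pointer).
inline void InstallStruct(X_FOO* dst, const X_FOO& src) {
  *dst = src;  // implicit copy-assignment copies `raw` directly; set() is not called
}
```

Here `set()` runs only while the source struct is being built, on a host-side temporary. The store that lands at the destination address never passes through it, so a hook that keys on the destination pointer inside `set()` sees nothing.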
## Cross-reference each captured writer in ours
### `xex_lzx_decompress_output` (Run 5 — 1 hit)
Captures the LZX decompress of the XEX image into guest VA `base_address_=0x82000000`. In canary: `xenia/cpu/xex_module.cc:840` calls `lzx_decompress(compress_buffer, ..., buffer, uncompressed_size, ...)` where `buffer = memory()->TranslateVirtual(base_address_)`.
**Ours-side analog**: `xenia-rs/crates/xenia-xex/src/lzx.rs` + `xenia-rs/crates/xenia-xex/src/loader.rs`. Per Phase B `image_loaded_sha256 ea8d160e…` matching across cold runs, ours's LZX decoder produces byte-identical output to canary's. No fix needed. **GAP CLASS: NONE.**
### `be<T>::set` (sanity-v2 — 35 hits in 20 s)
Per sanity capture, these are likely kernel-state header writes (`kernel_state.cc:create_dispatch_table` etc.). Ours's analog: `xenia-rs/crates/xenia-kernel/src/state.rs` + `exports.rs` (each kernel handler that writes a `be<T>` field). Without enabling per-event tagging in the canary log we can't enumerate which handler produced which hit; full cross-reference deferred to Session 3.
**GAP CLASS: UNKNOWN — needs per-tid stack-trace enrichment in canary instrumentation.**
### `Memory::Zero`, `Memory::Fill`, `store_and_swap<u32>` (sanity-v2 — 48 hits combined)
Already covered by Session 1 cross-reference. No new gaps surfaced.
## Predicted vs actual outcomes
| Cascade rung | Prediction | Actual |
|---|---|---|
| A=catch vtable installer | ~75% | **FAIL** — 0 hits despite ~9-surface coverage. Hook-surface still incomplete OR install is via path-5-style POD struct copy. |
| B=catch voice-struct clearer | ~50% | **FAIL** — 0 hits. Addr range was a guess; needs guest-side voice-base probe first. |
| C=identify ours's gap if A succeeds | ~70% (cond. on A) | **N/A** (A failed). |
| D=Session 3 progression-metric move | ~40-50% (cond. on A+C) | **N/A** (A failed). |
Validated rungs:
| Rung | Actual |
|---|---|
| **E=Step 3 validation (XEX section caught)** | **PASS** — Run 5 caught `xex_lzx_decompress_output` at `0x82000000` with `MZ\x90\x00` magic. Session 1 reading-error #35 resolved at the hook level. |
| **F=be<T>::set() hook fires correctly** | **PASS** — sanity-v2 saw 35 be<T>::set hits in 20 s without crashing static init. |
## Session 3 recommendation
Three concrete next steps in priority order:
**Step 1: Hook raw pointer assignments inside `kernel/util/shim_utils.h`.** Per shim_utils.h, kernel-import handlers receive typed pointers (`X_HANDLE*`, etc.) and assign via raw `*ptr = value`. Copying a POD struct that contains `be<T>` fields does NOT go through `set()`, because the implicit struct copy moves the stored bytes directly. Add a `XAUDIT_68_WRITE_FIELD(host_ptr, value)` macro to be invoked at known write sites OR (more invasive) instrument each `*ptr = ...` pattern. ~50-100 LOC additive.
**Step 2: Add a memory-protection trap on guest VA `0xBCE25340` (4 bytes).** Use a guard page (`Memory::Protect` to read-only) and, in the host fault handler, log the writer's RIP/x86 instruction. This is the nuclear option: it bypasses ALL emulation-layer hooks and catches the actual host store instruction. Requires platform-specific SIGSEGV/SEH handler integration. ~150-200 LOC platform-gated.
**Step 3: Read-mode probe instead of write-mode.** Place a `RtlReadGuestU32(0xBCE25340)` probe at the FIRST iteration of canary's main loop AFTER memory init and log the VALUE at that address. If the value is `0` early and `0x8200A208` later, we know it is written between those moments. Combine this with `--audit_61_branch_probe_pcs=0x825070F0` (which AUDIT-067 confirmed fires) and a binary bisect over the boot trajectory.
Step 3 is cheapest (~20 LOC) and may pinpoint the install epoch without finding the writer; pair with bisection across the audit-068 event log.
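Once read-probe samples exist, locating the install epoch is a binary search over them. The sketch below works on an already recorded sample list; `Sample` and `FirstInstalled` are hypothetical names. It assumes the samples are ordered by event sequence and that the value, once installed, stays installed, so the "not yet installed" samples form a prefix.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// Hypothetical read-probe record: event-log sequence number and value observed.
struct Sample {
  uint64_t event_seq;
  uint32_t value;
};

// Returns the index of the first sample holding `target`, or -1 if none does.
// Precondition: every sample before the install differs from `target` and
// every sample after it equals `target`.
inline long FirstInstalled(const std::vector<Sample>& samples, uint32_t target) {
  auto it = std::partition_point(
      samples.begin(), samples.end(),
      [target](const Sample& s) { return s.value != target; });
  return it == samples.end() ? -1 : static_cast<long>(it - samples.begin());
}
```

The install then lies between the `event_seq` of the returned sample and that of its predecessor. If the value can be overwritten and restored, the precondition fails and a linear scan is the safer choice.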
## Cascade outcome
- A (vtable installer caught): **FAIL** — surfaces still incomplete, but space narrowed.
- B (voice-struct clearer caught): **FAIL** — addr range remains a guess.
- C (ours gap identified): **N/A** (A failed).
- D (Session 3 progression move): **N/A**.
- **E (Step 3 XEX-section validation)**: **PASS** — proves Session 1's #35 surface gap is at least partially closed.
- **F (be<T>::set hook works)**: **PASS**.
Net: 2 cascade wins (E, F) for "instrumentation is sound and now covers ~9 surfaces"; 2 cascade losses (A, B) for "the actual writer is in a path that's STILL un-hooked or doesn't exist as a canonical write at all".
## Artifacts (this dir)
- `instrumentation-design.md` (Session 1)
- `fix-canary.diff` (Session 1 — 5-file diff)
- `fix-canary-v2.diff` (Session 2 — extends with 4 more sites)
- `run1-vtable-writers.log` (Session 1 — 0 hits)
- `run2-voice-struct-writers.log` (Session 1 — 0 hits)
- `run3-vtable-extended.log` (Session 2 — 0 HOST-WRITE hits, INIT confirmed)
- `run4-voice-struct-extended.log` (Session 2 — 0 hits)
- `run5-xex-section-sanity.log` (Session 2 — **1 hit** validating Step 3)
- `sanity-value0.log` (Session 1 — 1,639 hits)
- `sanity-v2-value0.log` (Session 2 — 78,738 hits incl. 35 from be<T>::set)
- `writer-report.md` (Session 1)
- `writer-report-v2.md` (this file)
- `session-2-plan.md`