handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,344 @@
# AUDIT-068 Session 3 — read-mode probe writer report
Date: 2026-05-20
## Summary
Session 3 adds a **read-mode probe** to the AUDIT-068 instrumentation. Instead
of hooking host-side write surfaces (Session 1+2's approach, which produced 0
hits across ~9 surfaces despite the install being real), the probe spawns a
dedicated low-priority polling thread that samples configured guest VAs every
`PERIOD_NS` and emits `AUDIT-068-READ-CHANGE` events on transition.
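
Transition-only reporting can be sketched as a per-address tracker that keeps the last sample and emits an event only when the value differs. This is a minimal sketch, not the canary `ReadProbe` code: `ProbeEvent` and `ChangeTracker` are hypothetical names, and the sampling step (address translation, page-protect guard) is left out.

```cpp
#include <cstdint>
#include <optional>
#include <string>

// Hypothetical event record mirroring the INITIAL/CHANGE lines in the logs.
struct ProbeEvent {
  std::string kind;    // "INITIAL" or "CHANGE"
  uint64_t host_ns;    // time of the poll that saw the value
  uint32_t old_value;  // 0 for INITIAL
  uint32_t new_value;
};

// Remembers the last value sampled at one guest VA and reports transitions only.
class ChangeTracker {
 public:
  std::optional<ProbeEvent> observe(uint64_t host_ns, uint32_t value) {
    if (!last_) {  // first sample ever: report the starting value once
      last_ = value;
      return ProbeEvent{"INITIAL", host_ns, 0, value};
    }
    if (*last_ == value) return std::nullopt;  // unchanged: stay silent
    ProbeEvent ev{"CHANGE", host_ns, *last_, value};
    last_ = value;
    return ev;
  }

 private:
  std::optional<uint32_t> last_;
};
```

Each poll calls `observe()` once per configured VA, so identical samples cost nothing in the log.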
The probe bounded the install epoch for the `ANON_Class_713383D7` vptr to
**host_ns ≈ 9.412–9.612 s** (a spread of ~200 ms across cold runs) and provided
the first direct evidence that the install is a **bulk POD struct copy** of a
12-byte `{vptr, self_ptr, self_ptr}` record into the instance's first three
u32 slots — written simultaneously within the same 1 ms poll interval.
**Reading-error class #36 (POD-struct copy-assignment bypass) is now
confirmed directly**: Run 10 enabled BOTH the read
probe AND the full ~9-surface host-write watch simultaneously with the
CORRECT target value `0x8200A1E8`, and observed the read probe catch the
install while host-write surfaces produced **0 hits**.
A secondary finding overturns part of the AUDIT-067 framing: the actual vptr
value installed is **`0x8200A1E8`**, not `0x8200A208`. The number `0x8200A208`
is the address of the slot-1 fn pointer WITHIN the vtable (32 bytes into the
vtable). The value stored at `[ctx_ptr]` is the vtable BASE = `0x8200A1E8`.
AUDIT-067 hooked all 16 PPC store opcodes for `0x8200A208` — it should have
also (or instead) watched `0x8200A1E8`. This may explain part of why AUDIT-067
also produced 0 hits.
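
A compile-time check of the 32-byte claim. The 4-byte entry size is an assumption (32-bit guest pointers in a flat array starting at the base); under it, the watched address is entry index 8 from the base, so the "slot-1" label above is not a raw 0-based array index.

```cpp
#include <cstdint>

// Values quoted above: the vtable base actually stored at [ctx_ptr], and the
// address AUDIT-067 watched for.
constexpr uint32_t kVtableBase = 0x8200A1E8;
constexpr uint32_t kWatchedByAudit067 = 0x8200A208;

// Byte offset of the watched address from the vtable base.
constexpr uint32_t kOffsetBytes = kWatchedByAudit067 - kVtableBase;
static_assert(kOffsetBytes == 0x20, "32 bytes into the vtable");

// Assumption: entries are 4-byte guest (PPC32) pointers laid out from the base.
constexpr uint32_t kGuestPtrSize = 4;
constexpr uint32_t kEntryIndex = kOffsetBytes / kGuestPtrSize;
```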
## LOC added (Session 3 delta, canary only)
| File | LOC delta | Purpose |
|---|---:|---|
| `src/xenia/cpu/cpu_flags.h` | +7 | New cvar `audit_68_host_mem_read_probe` declaration. |
| `src/xenia/cpu/cpu_flags.cc` | +6 | Cvar definition. |
| `src/xenia/memory.cc` | +18 | Register `g_guest_to_host_thunk` (wraps `Memory::TranslateVirtual`) and `g_query_protect_thunk` (wraps `LookupHeap`+`QueryProtect`) inside `Memory::Memory()`; reset to nullptr in `~Memory()`. |
| `src/xenia/base/audit_68_host_mem_watch_fwd.h` | +17 | `GuestToHostThunk` + `QueryProtectThunk` extern decls. |
| `src/xenia/base/audit_68_host_mem_watch_base.cc` | +~170 | `ReadProbe` struct + parser (`VA:SIZE:PERIOD_NS` CSV form) + `sample_at()` w/ page-protect guard + `read_probe_thread_main()` polling loop + `start_read_probe_thread_if_configured()` lazy-start (called from `check_host_write_slowpath`). |
| **Total** | **~218 LOC additive** | All cvar-gated default-off (empty CSV = thread never spawned). |
Cumulative across Sessions 1+2+3: ~520 LOC.
xenia-rs HEAD `e6d43a23ac393004d2e5adf2f0395fd0b5e6448b` **UNCHANGED**.
## Cvar format
```
--audit_68_host_mem_read_probe=VA1:SIZE1:PERIOD1,VA2:SIZE2:PERIOD2,...
```
Each tuple is `VA:SIZE:PERIOD_NS`. SIZE ∈ {1, 2, 4, 8} bytes. PERIOD_NS is floored at
1 µs (1000 ns). Max 8 tuples. Default empty (off).
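
A parser for this format might look like the following. It is a sketch under stated assumptions: `ReadProbeSpec`, `ParseU64` and `ParseReadProbeCsv` are hypothetical names, and rejecting the whole string when a tuple is malformed or a ninth tuple appears is a design choice of this sketch, not documented canary behavior.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Illustrative mirror of one parsed tuple (field names are assumptions).
struct ReadProbeSpec {
  uint32_t va;         // guest virtual address to sample
  uint32_t size;       // 1, 2, 4 or 8 bytes
  uint64_t period_ns;  // poll period, floored at 1000 ns
};

constexpr std::size_t kMaxProbes = 8;
constexpr uint64_t kMinPeriodNs = 1000;

// Parses one unsigned field. Base 0 accepts 0x-prefixed hex and decimal
// (a leading 0 means octal). Trailing junk is rejected.
std::optional<uint64_t> ParseU64(const std::string& s) {
  try {
    std::size_t pos = 0;
    const unsigned long long v = std::stoull(s, &pos, 0);
    if (pos != s.size()) return std::nullopt;
    return static_cast<uint64_t>(v);
  } catch (const std::logic_error&) {  // invalid_argument and out_of_range
    return std::nullopt;
  }
}

// "VA:SIZE:PERIOD_NS[,VA:SIZE:PERIOD_NS...]" -> specs. Empty input means
// "probe off" (empty vector); any malformed tuple rejects the whole string.
std::optional<std::vector<ReadProbeSpec>> ParseReadProbeCsv(const std::string& csv) {
  std::vector<ReadProbeSpec> specs;
  std::stringstream tuples(csv);
  std::string tuple;
  while (std::getline(tuples, tuple, ',')) {
    if (tuple.empty()) continue;  // tolerate stray commas
    std::stringstream fields(tuple);
    std::string va_s, size_s, period_s;
    if (!std::getline(fields, va_s, ':') || !std::getline(fields, size_s, ':') ||
        !std::getline(fields, period_s)) {
      return std::nullopt;
    }
    const auto va = ParseU64(va_s);
    const auto size = ParseU64(size_s);
    const auto period = ParseU64(period_s);
    if (!va || !size || !period || *va > 0xFFFFFFFFull) return std::nullopt;
    if (*size != 1 && *size != 2 && *size != 4 && *size != 8) return std::nullopt;
    if (specs.size() == kMaxProbes) return std::nullopt;  // a ninth tuple
    specs.push_back({static_cast<uint32_t>(*va), static_cast<uint32_t>(*size),
                     *period < kMinPeriodNs ? kMinPeriodNs : *period});
  }
  return specs;
}
```

With the Run 6 argument `0xBCE25340:4:1000000`, this yields one 4-byte probe polled every 1 ms.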
Lazy-start: the poll thread spawns only on the first call to
`check_host_write_slowpath()` after `Memory::Memory()` has registered the
thunks. This reuses the Session 2 static-init gate. The thread is detached
(daemon-style) and polls until process exit.
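
The lazy-start gate can be sketched as follows. The function and global names come from this report; their signatures, the `configured` parameter (standing in for "the CSV cvar is non-empty") and `g_read_probe_started` are assumptions of this sketch, and the per-VA sampling body is omitted.

```cpp
#include <atomic>
#include <chrono>
#include <cstdint>
#include <thread>

// Assumed signatures for the two thunks Memory::Memory() registers.
using GuestToHostThunk = void* (*)(uint32_t guest_va);
using QueryProtectThunk = uint32_t (*)(uint32_t guest_va);

std::atomic<GuestToHostThunk> g_guest_to_host_thunk{nullptr};
std::atomic<QueryProtectThunk> g_query_protect_thunk{nullptr};
std::atomic<bool> g_read_probe_started{false};

void read_probe_thread_main() {
  for (;;) {  // daemon-style: runs until process exit
    // Per configured VA: protect-check, translate, sample, and emit
    // AUDIT-068-READ-CHANGE on transition (omitted here).
    std::this_thread::sleep_for(std::chrono::milliseconds(1));
  }
}

// Called from check_host_write_slowpath(). Returns true only on the call
// that actually spawns the thread.
bool start_read_probe_thread_if_configured(bool configured) {
  if (!configured) return false;  // empty CSV: never spawn
  // Static-init gate: Memory::Memory() may not have registered the thunks yet.
  if (g_guest_to_host_thunk.load() == nullptr ||
      g_query_protect_thunk.load() == nullptr) {
    return false;
  }
  bool expected = false;
  if (!g_read_probe_started.compare_exchange_strong(expected, true)) {
    return false;  // an earlier slow-path call already spawned it
  }
  std::thread(read_probe_thread_main).detach();
  return true;
}
```

The compare-exchange keeps the spawn single-shot even if several threads hit the slow path at once.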
## Captures
All runs cold-boot (cache wipe before each), `--mute=true`, against the
Sylpheed ISO. 90 s wallclock each.
### Run 6 — primary read-probe on `0xBCE25340`
Cmdline: `--audit_68_host_mem_read_probe=0xBCE25340:4:1000000 --mute=true`.
Observations:
```
host_ns=729615200 INITIAL 0x00000000
host_ns=738072700 CHANGE 0x00000000 → 0xBCE254C0 (arena-local pointer)
host_ns=1537758000 CHANGE 0xBCE254C0 → 0xBCE25640
host_ns=1591760600 CHANGE 0xBCE25640 → 0xBCE25350
host_ns=1592827100 CHANGE 0xBCE25350 → 0xBCE257C0
host_ns=1601443500 CHANGE 0xBCE257C0 → 0x82061050 (looks like XEX vtable)
host_ns=1602506700 CHANGE 0x82061050 → 0x820610E0 (final, stable through 90 s)
```
**Boot reached worker spawn (thid=27/28/29 visible in log tail)**, so the
probe was alive for the whole 90 s wallclock; yet only six transitions (seven
events including INITIAL) occurred at `0xBCE25340` in this run, and the value
never became `0x8200A208`.
This indicated the address `0xBCE25340` cited in AUDIT-058/067 is NOT
deterministic across runs — there's "arena drift" in the `0xBCE25xxx` region.
The Phase-NonMatch investigation memo (2026-05-19) already documented this:
canary cold sample saw `ctx_ptr=0xBCE251C0` while AUDIT-058 saw `0xBCE25340`.
### Run 7 — neighbor bisect on `0xBCE25340 ± 4/8`
Cmdline: `--audit_68_host_mem_read_probe=0xBCE2533C:4:1000000,0xBCE25340:4:1000000,0xBCE25344:4:1000000,0xBCE25348:4:1000000`.
```
host_ns=655976500 INITIAL all four = 0
host_ns=664462100 CHANGE 0xBCE25340: 0 → 0xBCE254C0
host_ns=1374604200 CHANGE 0xBCE25340: 0xBCE254C0 → 0x07C65ADA (3 SIMULTANEOUS)
host_ns=1374604200 CHANGE 0xBCE25344: 0 → 0x001EE000
host_ns=1374604200 CHANGE 0xBCE25348: 0 → 0x0003A313
```
**Key signal**: at host_ns=1.374 s, three adjacent u32 slots changed within
the same 1 ms poll interval but the neighbor at `0xBCE2533C` did NOT. This is
a clear bulk struct-copy / memcpy footprint: the writer wrote a record of at
least 12 bytes starting at `0xBCE25340` (`0xBCE2534C` was not probed, so the
upper edge is open). The three values `{0x07C65ADA, 0x001EE000, 0x0003A313}`
are NOT the vtable (they match neither `0x8200A208` nor `0x8200A1E8`); they
look like arbitrary data (an FNV-style hash, an allocation size, a refcount?).
This particular write targets a DIFFERENT object instance that reuses the
`0xBCE25340` slot, not the ANON_Class instance.
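
The bounding argument can be made mechanical: find the contiguous run of slots that changed in one poll interval and check whether an unchanged probed neighbor sits on each side. A sketch; `SlotSample`, `WriteExtent` and `BoundWrite` are hypothetical, and it assumes the probed slots are sorted, 4 bytes apart, with a single changed run.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

struct SlotSample {
  uint32_t va;   // probed u32 slot
  bool changed;  // changed within the poll interval of interest
};

struct WriteExtent {
  uint32_t start_va;        // first changed slot
  uint32_t min_len_bytes;   // contiguous changed bytes actually observed
  bool low_edge_bounded;    // an unchanged probed slot sits just below
  bool high_edge_bounded;   // an unchanged probed slot sits just above
};

// Assumes `slots` are sorted, spaced 4 bytes apart, and contain a single
// contiguous run of changed slots.
std::optional<WriteExtent> BoundWrite(const std::vector<SlotSample>& slots) {
  std::size_t first = slots.size(), last = 0;
  for (std::size_t i = 0; i < slots.size(); ++i) {
    if (!slots[i].changed) continue;
    if (first == slots.size()) first = i;
    last = i;
  }
  if (first == slots.size()) return std::nullopt;  // nothing changed
  WriteExtent e{};
  e.start_va = slots[first].va;
  e.min_len_bytes = static_cast<uint32_t>((last - first + 1) * 4);
  e.low_edge_bounded = first > 0;                 // slots[first - 1] unchanged
  e.high_edge_bounded = last + 1 < slots.size();  // slots[last + 1] unchanged
  return e;
}
```

For the four Run 7 slots (`0xBCE2533C` unchanged, `+0/+4/+8` changed) it reports a 12-byte minimum starting at `0xBCE25340`, with the low edge bounded and the high edge open because no slot above `+8` was probed.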
### Run 8 — locate the actual ctx_ptr via AUDIT-061 fire
Cmdline: `--audit_61_branch_probe_pcs=0x825070F0 --audit_68_host_mem_read_probe=0xBCE25340:4:1000000`.
`AUDIT-061-BR pc=825070F0 ... r3=BCE251C0 ...` fired late in the run. So in
THIS cold trajectory the ANON_Class instance is at `0xBCE251C0`, not
`0xBCE25340`. The probe at `0xBCE25340` was watching the wrong address.
### Run 9 — neighbor bisect on the correct ctx_ptr `0xBCE251C0`
Cmdline: `--audit_61_branch_probe_pcs=0x825070F0 --audit_68_host_mem_read_probe=0xBCE251BC:4:1000000,0xBCE251C0:4:1000000,0xBCE251C4:4:1000000,0xBCE251C8:4:1000000`.
```
host_ns=633560300 INITIAL all four = 0
host_ns=642041900 CHANGE 0xBCE251C0: 0 → 0xBCE25340 (arena ptr)
host_ns=1387443500 CHANGE 0xBCE251C0: 0xBCE25340 → 0xBCE254C0 (2 SIMULTANEOUS)
host_ns=1387443500 CHANGE 0xBCE251C8: 0 → 0x00000148
host_ns=1412116800 CHANGE 0xBCE251C0: 0xBCE254C0 → 0 (2 SIMULTANEOUS clear)
host_ns=1412116800 CHANGE 0xBCE251C8: 0x148 → 0
host_ns=1457544600 CHANGE 0xBCE251C0: 0 → 0xBF80199A (2 SIMULTANEOUS — floats)
host_ns=1457544600 CHANGE 0xBCE251C4: 0 → 0x3F802D83 (= -1.0008, 1.0014)
host_ns=5710239000 CHANGE 0xBCE251C0: 0xBF80199A → 0xBCE25640 (arena ptr)
host_ns=9416025400 CHANGE 0xBCE251C0: 0xBCE25640 → 0x8200A1E8 (3 SIMULTANEOUS — THE INSTALL)
host_ns=9416025400 CHANGE 0xBCE251C4: 0x3F802D83 → 0xBCE251C0 (self-ptr)
host_ns=9416025400 CHANGE 0xBCE251C8: 0 → 0xBCE251C0 (self-ptr)
AUDIT-061-BR pc=825070F0 r3=BCE251C0 (fire ~25 s wallclock)
```
**The install epoch is host_ns = 9.416025400 s.** Three slots were written
simultaneously to `{vptr=0x8200A1E8, self=0xBCE251C0, self=0xBCE251C0}`:
the classic struct-construction or `*ptr = X_FOO{...}` POD-copy pattern. The
slot at `0xBCE251BC` (4 bytes before `ctx_ptr`) did NOT change, bounding the
start of the write at `0xBCE251C0`; `0xBCE251CC` was not probed, so 12 bytes
is the observed minimum length.
The install is ~966 ms BEFORE the `sub_825070F0` fire (~10.4 s host_ns,
matches Phase-NonMatch documented thread.create burst at 10.382 s) and well
within the 60-90 s capture window.
### Run 10 — cross-validation: read-probe + host-write watch with correct value
Cmdline: `--audit_68_host_mem_watch_values=0x8200A1E8,0x8200A208,0xE8A10082,0x82A10082 --audit_68_host_mem_watch_addrs=0xBCE251C0 --audit_68_host_mem_read_probe=0xBCE251C0:4:1000000 --audit_61_branch_probe_pcs=0x825070F0`.
```
host_ns=9612147300 CHANGE 0xBCE251C0: 0xBCE25640 → 0x8200A1E8 (read probe catches)
AUDIT-061-BR pc=825070F0 r3=BCE251C0 (sub_825070F0 fires)
AUDIT-068-HOST-WRITE: 0 hits (write surfaces miss)
```
This is the definitive proof:
1. The install IS captured by the read probe at host_ns ≈ 9.6 s.
2. The corrected value `0x8200A1E8` (not `0x8200A208`) is the actual vptr.
3. None of the ~9 host-write surfaces hooked in Session 1+2 catches it.
**Reading-error class #36 confirmed**: the writer uses a path that bypasses
all of `xe::store_and_swap<T>`, `xe::store<T>`, `Memory::Zero/Fill/Copy`,
`xe::endian_store::set()`, and `Memory::Copy` byte-scan — most likely a
`*reinterpret_cast<X_FOO*>(host_ptr) = X_FOO{...}` raw POD struct
copy-assignment OR a direct `memcpy(host_ptr_from_TranslateVirtual,
&local_struct, sizeof(X_FOO))`.
## Headline finding
**Install epoch**: host_ns ≈ 9.4–9.6 s (a spread of ~200 ms across cold runs).
This is ~966 ms before sub_825070F0 fires (~10.4 s host_ns).
**Neighbor pattern**: **3 simultaneous writes** at `0xBCE251C0`, `+4`, `+8`
within the same 1 ms poll interval: `{vptr=0x8200A1E8, self=0xBCE251C0,
self=0xBCE251C0}`. `0xBCE251BC` (`-4`) does NOT change. This is a POD struct
copy of at least 12 bytes.
**Implications**:
- The write is invisible to all currently-hooked host-write surfaces.
- The value bytes `{0x82, 0x00, 0xA1, 0xE8, 0xBC, 0xE2, 0x51, 0xC0, 0xBC,
0xE2, 0x51, 0xC0}` (big-endian guest order; `{0xE8, 0xA1, 0x00, 0x82, 0xC0,
0x51, 0xE2, 0xBC, ...}` if held host-native little-endian before a swap) must
appear together in some source: either as a constant pre-baked vtable instance
pattern that's memcpy'd, or as fields computed by host code and bulk-written.
- The fact that the second and third slots are self-pointers (`= ctx_ptr`)
suggests a doubly-linked-list head node initialization: `head.vptr = vtbl;
head.next = &head; head.prev = &head;`. This is a textbook intrusive list
/ queue head pattern.
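
The list-head reading and the byte pattern a bulk-copy source would need can be checked together. A sketch: `GuestListHead`, `MakeEmptyHead` and `ToGuestBytes` are hypothetical, and the field order `{vptr, next, prev}` is the interpretation proposed above, not a confirmed layout.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical layout of the 12-byte record as an intrusive list head.
struct GuestListHead {
  uint32_t vptr;
  uint32_t next;
  uint32_t prev;
};

// An empty intrusive list: head.next = head.prev = &head.
GuestListHead MakeEmptyHead(uint32_t self_va, uint32_t vtable) {
  return GuestListHead{vtable, self_va, self_va};
}

// Serializes the record as it must sit in guest memory (each u32 big-endian).
std::array<uint8_t, 12> ToGuestBytes(const GuestListHead& h) {
  std::array<uint8_t, 12> out{};
  const uint32_t words[3] = {h.vptr, h.next, h.prev};
  for (std::size_t w = 0; w < 3; ++w) {
    for (std::size_t b = 0; b < 4; ++b) {
      // Most significant byte first.
      out[w * 4 + b] = static_cast<uint8_t>(words[w] >> (24 - 8 * b));
    }
  }
  return out;
}
```

For `self_va = 0xBCE251C0` and `vtable = 0x8200A1E8` the serialized record is `82 00 A1 E8 BC E2 51 C0 BC E2 51 C0`, the sequence to search for in a constant source under this assumption.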
## Wallclock relation to AUDIT-067's sub_825070F0 fire
| Event | host_ns | Wallclock (≈) |
|---|---:|---:|
| Probe init (first slowpath call) | ~640 ms | ~1.6 s |
| Various pre-install arena reuse of slot | 0.6–5.7 s | 1.6–6.5 s |
| **Vptr install at `0xBCE251C0`** | **9.412–9.612 s** | **~10.4–10.6 s** |
| Phase-NonMatch documented thread.create burst | 10.382–10.384 s | ~11.3 s |
| sub_825070F0 fire (AUDIT-061-BR captured) | ~10.5 s | **~25 s wallclock** (AUDIT-067 quoted) |
The "host_ns ~10.5 s when sub_825070F0 fires" vs "~25 s wallclock" gap is
because `host_ns` starts when the first AUDIT-068 slowpath call lands (i.e.
when canary's static-init plus Wine startup are done) — Wine's
JIT-warmup/early-boot takes ~15 s before guest PPC code starts. The
ANON_Class install happens ~960 ms before sub_825070F0 dispatch, within the
same "post-DiscImageDevice resolve" boot phase that AUDIT-058 framed.
## Session 4 recommendation
Three paths to identifying the writer, ranked by feasibility:
### Path 1 (RECOMMENDED) — POD struct-copy hook with NEW ε-constraint
The install epoch (host_ns ≈ 9.4–9.6 s) and the 12-byte simultaneous-write
signature (3 u32 slots) narrow the candidate hooks dramatically. Two
surgical instrumentation strategies:
(a) **Pre-instrument all `*reinterpret_cast<X*>(host_ptr) = X{...}` sites in
canary**. Ripgrep finds them: pattern
`\*reinterpret_cast<[A-Z]\w*\*>\([^)]*\)\s*=` in `src/xenia/kernel/**.cc`. A
quick scan of Session 1 inventory listed ~30 such sites, but most are in
kernel-import handlers that fire repeatedly — the ε-constraint of "fires
exactly once at host_ns 9.4–9.6 s on tid=6" lets us bisect.
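
The ripgrep pattern can be sanity-checked against sample lines before running it over the tree. A sketch using `std::regex` (ECMAScript grammar, which treats `\w`, `\s` and the bracket expressions here the same way ripgrep does for ASCII input); `LooksLikePodAssign` is a hypothetical helper.

```cpp
#include <regex>
#include <string>

// The Path 1(a) pattern, unchanged, as an ECMAScript std::regex.
const std::regex kPodAssignPattern(
    R"(\*reinterpret_cast<[A-Z]\w*\*>\([^)]*\)\s*=)");

// True if the line contains a raw typed-pointer assignment of the target shape.
bool LooksLikePodAssign(const std::string& line) {
  return std::regex_search(line, kPodAssignPattern);
}
```

Two consequences of the pattern as written: namespace-qualified types such as `xe::X_FOO` and lowercase primitive types are not matched (`[A-Z]` must follow `<` directly), and `==` comparisons match as false positives.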
(b) **Wrap `xe::SetField()` / pointer-typed assignment helpers** if any
exist. Otherwise instrument `memcpy(host_ptr_from_TranslateVirtual, ...)`
patterns directly — there are ~40 such sites across kernel/util/cpu code per
Session 1+2 surveys. The ones NOT already wrapped by Session 2 (xex_module.cc
got 4 sites) are candidates.
LOC budget: ~50-100 additive in canary; default-off cvar
`audit_68_pod_copy_watch_addrs` (CSV of VA ranges; emits on every memcpy/raw
assign within range).
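
A sketch of the range-gated wrapper idea for Path 1(b). `VaRange`, `AuditedGuestMemcpy`, the `AUDIT-068-POD-COPY` log tag and the hit counter are hypothetical; parsing `audit_68_pod_copy_watch_addrs` into ranges is omitted. It only sees sites rewritten to call it, which is the coverage limit Path 1 accepts.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstring>
#include <vector>

// Hypothetical parsed form of the cvar: half-open guest VA ranges.
struct VaRange {
  uint32_t begin;
  uint32_t end;  // exclusive
};

std::vector<VaRange> g_pod_copy_watch;
std::size_t g_pod_copy_hits = 0;

// True if the write [dst_va, dst_va + len) intersects r.
bool Overlaps(uint32_t dst_va, std::size_t len, const VaRange& r) {
  const uint64_t lo = dst_va;
  const uint64_t hi = lo + len;
  return len != 0 && lo < r.end && r.begin < hi;
}

// Drop-in for memcpy(TranslateVirtual(dst_va), src, len) at instrumented
// sites: `host_dst` is the translated pointer, `dst_va` is kept for matching.
void AuditedGuestMemcpy(void* host_dst, uint32_t dst_va, const void* src,
                        std::size_t len, const char* site) {
  for (const VaRange& r : g_pod_copy_watch) {
    if (Overlaps(dst_va, len, r)) {
      ++g_pod_copy_hits;
      std::fprintf(stderr, "AUDIT-068-POD-COPY site=%s va=%08X len=%zu\n", site,
                   static_cast<unsigned>(dst_va), len);
      break;  // one log line per copy is enough
    }
  }
  std::memcpy(host_dst, src, len);
}
```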
### Path 2 — Guard-page SIGSEGV trap
Use the existing canary `ExceptionHandler` infrastructure
(src/xenia/base/exception_handler*.cc — already cross-platform, has Win SEH
and POSIX SIGSEGV handlers wired). Mark the 4K page containing `0xBCE251C0`
as read-only at host_ns = 9.4 s (just before the install epoch); the writer's
host store instruction then triggers a page fault, whose handler logs RIP and
the host stack, unprotects the page, and resumes.
Pros: catches the writer with instruction-level precision regardless of how it
writes (memcpy, raw assign, vector store, etc.).
Cons: ~150–200 LOC platform-gated; needs accurate epoch timing (can't trap
the whole boot or it crashes). Use host_ns ≥ 9.0 s as the gate.
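
A minimal Linux sketch of the trap mechanics, using raw `sigaction` and `mprotect` rather than canary's `ExceptionHandler` (a swapped-in technique for illustration): write-protect one page, record the faulting address on the first store, unprotect, and return so the store is retried. `ArmGuardPage` and `OnGuardFault` are hypothetical, the host_ns gate is omitted, and some platforms (e.g. macOS) may report such faults as `SIGBUS` rather than `SIGSEGV`.

```cpp
#include <csignal>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// State shared with the handler.
static volatile uintptr_t g_guard_begin = 0;
static volatile uintptr_t g_guard_end = 0;  // exclusive
static volatile uintptr_t g_fault_addr = 0;
static volatile std::sig_atomic_t g_fault_count = 0;

static void OnGuardFault(int, siginfo_t* info, void*) {
  const uintptr_t addr = reinterpret_cast<uintptr_t>(info->si_addr);
  if (addr < g_guard_begin || addr >= g_guard_end) {
    std::signal(SIGSEGV, SIG_DFL);  // not our page: refault and die normally
    return;
  }
  g_fault_addr = addr;
  g_fault_count = g_fault_count + 1;
  // A real trap would read RIP from the ucontext (third argument) and log the
  // host stack here. mprotect is not on POSIX's async-signal-safe list, but
  // unprotect-in-handler is common practice on Linux.
  mprotect(reinterpret_cast<void*>(g_guard_begin), g_guard_end - g_guard_begin,
           PROT_READ | PROT_WRITE);
  // Returning retries the faulting store, which now succeeds.
}

// Makes [page, page + len) read-only and installs the handler.
// `page` must be page-aligned.
bool ArmGuardPage(void* page, std::size_t len) {
  struct sigaction sa {};
  sa.sa_sigaction = OnGuardFault;
  sa.sa_flags = SA_SIGINFO;
  sigemptyset(&sa.sa_mask);
  if (sigaction(SIGSEGV, &sa, nullptr) != 0) return false;
  g_guard_begin = reinterpret_cast<uintptr_t>(page);
  g_guard_end = g_guard_begin + len;
  return mprotect(page, len, PROT_READ) == 0;
}
```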
### Path 3 — Kernel-handler grep with new ε-constraint
Now that the install epoch is known (9.4–9.6 s host_ns; just AFTER
`DiscImageDevice::ResolvePath(\\dat\\movie)` per AUDIT-058 narrative), grep
all kernel handlers for ones that fire in that window AND write to the
heap. The probe log already shows this is right around the time
`HostPathDevice::ResolvePath(\\dat\\movie)` runs and various worker file IO
starts. Cross-reference with canary's existing kernel-call trace
(`--log_level=4`) to enumerate handlers called in the 9.0–9.7 s window.
LOC: 0 (purely investigative).
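
Once the trace is parsed, the window cut itself is simple; the subtlety is that the kernel-call trace and AUDIT-068's `host_ns` may not share an epoch, so this sketch takes an explicit offset. `KernelCall` and `CallsInWindow` are hypothetical, and parsing the `--log_level=4` output is not shown.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, already-parsed kernel-call trace entry.
struct KernelCall {
  int64_t trace_ns;  // timestamp in the trace's own clock
  uint32_t tid;
  std::string name;
};

// Keeps calls whose time, shifted into AUDIT-068 host_ns by `offset_ns`,
// falls in [begin_ns, end_ns].
std::vector<KernelCall> CallsInWindow(const std::vector<KernelCall>& calls,
                                      int64_t offset_ns, int64_t begin_ns,
                                      int64_t end_ns) {
  std::vector<KernelCall> out;
  for (const KernelCall& c : calls) {
    const int64_t host_ns = c.trace_ns + offset_ns;
    if (host_ns >= begin_ns && host_ns <= end_ns) out.push_back(c);
  }
  return out;
}
```

Calling it with the 9.0–9.7 s window and a calibrated offset gives the candidate handler list; a wrong offset silently shifts the window, so calibrating it against a shared event (such as the AUDIT-061-BR fire) is worth doing first.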
**Recommended Session 4 priority: Path 1 first** (concrete instrumentation
extends what we have, leverages the epoch constraint). Path 2 as backstop.
Path 3 alongside as a cheap parallel investigation.
## Cascade outcome (Session 3)
- **A**: identify install epoch — **PASS** (9.4–9.6 s host_ns; ~966 ms before
sub_825070F0).
- **B**: identify neighbor pattern — **PASS** (3-slot simultaneous write,
POD struct signature confirmed).
- **C**: confirm reading-error #36 — **PASS** (Run 10 demonstrates host-write
surfaces miss the install even with the CORRECT target value
`0x8200A1E8`).
- **D**: identify the host-side writer — **N/A** (Session 4 work, with epoch
and signature constraints to narrow the search).
- **E**: secondary discovery: actual vptr is `0x8200A1E8` not `0x8200A208`
— **PASS** (AUDIT-067's target value was off by 32 bytes; may have
contributed to that audit's 0-hit JIT store result).
Net 4/5 wins. Session 4 has concrete constraints (epoch, signature, value
correction) to land the writer identification.
## Reading-error class #36 reinforcement
Session 3 directly demonstrates reading-error #36 (POD-struct
copy-assignment bypass for typed BE/LE field watch). The corrective rule is
now formalized as:
> When hooking host-side writes to guest memory, member-level set() hooks
> (e.g. `xe::endian_store::set()`) catch ONLY explicit assignments like
> `*reinterpret_cast<be<T>*>(host_ptr) = value`. They DO NOT catch:
> 1. POD struct copy-assignment (`*reinterpret_cast<X*>(host_ptr) = X{...}`).
> 2. memcpy into the host pointer (`memcpy(host_ptr_from_TranslateVirtual,
> &local_struct, sizeof(X))`).
> 3. Vector-typed bulk store intrinsics that target guest memory.
>
> Mitigation: pair host-write hooks with **read-mode probes** at the
> target VA — the read probe captures the install regardless of the writer's
> mechanism, and provides epoch + neighbor-pattern constraints for the
> follow-up targeted instrumentation.
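
A self-contained illustration of items 1 and 2 of the rule: a field type whose only hook is `set()` sees explicit per-field writes but not a struct copy-assignment or a memcpy of the same 12 bytes. `WatchedBE32` and `X_FOO` here are hypothetical stand-ins, not canary's `xe::be<T>` or `endian_store`; item 3 (vector stores) is not shown.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical hooked BE field type: only set() is instrumented.
static int g_set_hook_hits = 0;

struct WatchedBE32 {
  uint8_t bytes[4];  // guest big-endian storage

  void set(uint32_t v) {
    ++g_set_hook_hits;  // the member-level "host-write surface"
    bytes[0] = static_cast<uint8_t>(v >> 24);
    bytes[1] = static_cast<uint8_t>(v >> 16);
    bytes[2] = static_cast<uint8_t>(v >> 8);
    bytes[3] = static_cast<uint8_t>(v);
  }
  uint32_t get() const {
    return uint32_t{bytes[0]} << 24 | uint32_t{bytes[1]} << 16 |
           uint32_t{bytes[2]} << 8 | uint32_t{bytes[3]};
  }
};

// Hypothetical 12-byte guest record; implicitly trivially copyable.
struct X_FOO {
  WatchedBE32 vptr, next, prev;
};
static_assert(sizeof(X_FOO) == 12, "three packed u32 fields");

// Seen by the hook: explicit per-field assignment through set().
void WriteViaSet(X_FOO* dst, uint32_t vtable, uint32_t self) {
  dst->vptr.set(vtable);
  dst->next.set(self);
  dst->prev.set(self);
}

// Rule item 1: POD copy-assignment uses the implicit operator=, never set().
void WriteViaPodCopy(void* host_ptr, const X_FOO& tmpl) {
  *reinterpret_cast<X_FOO*>(host_ptr) = tmpl;
}

// Rule item 2: memcpy into the translated pointer also never calls set().
void WriteViaMemcpy(void* host_ptr, const X_FOO& tmpl) {
  std::memcpy(host_ptr, &tmpl, sizeof(X_FOO));
}
```

Both bypass paths land identical bytes while the hook counter stays at zero, which is the Run 10 observation in miniature.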
This rule is now reflected in the AUDIT-068 Session 3 read-probe machinery,
which is preserved in the canary tree for all future audits.
## Discipline observed
- `--mute=true` on every run ✓
- Cold-protocol: cache wipe before each cold run; cache restored from
`/tmp/canary-cache-bak-audit-068` at session end ✓ (current cache was
backed up at session start since prior backup was missing).
- xenia-rs HEAD `e6d43a23…` UNCHANGED ✓ (verified by sha256 of `git diff
HEAD` at session start vs end; uncommitted modifications from prior
sessions are unchanged from session start, no new modifications made by
this session).
- Canary instrumentation purely additive + cvar-gated default-off ✓
- No destructive shortcuts ✓
- Static-init gate pattern preserved + extended (Session 3's read probe
thread is also gated on `g_guest_to_host_thunk + g_query_protect_thunk`
being non-null — same discipline as Session 2's thunk gate).
## Artifacts (this dir)
- `fix-canary-v3.diff` — cumulative Session 3 instrumentation (this run).
- `run6-read-probe-bisect.log` — primary probe on `0xBCE25340` (90 s; 6
changes after INITIAL, ended at `0x820610E0`, never `0x8200A208`).
- `run7-read-probe-neighbors.log` — bisect probe on `0xBCE25340 ± 4/8`; 3
simultaneous writes at `+0/+4/+8` confirming POD signature.
- `run9-read-probe-251C0-neighbors.log` — neighbor probe on the actual
ctx_ptr `0xBCE251C0`; **captures the install** at host_ns=9.416 s.
- `run10-cross-validation.log` — read probe + host-write watch with CORRECT
value `0x8200A1E8`; demonstrates 0 HOST-WRITE hits while read probe sees
the install at host_ns=9.612 s.
- `writer-report-v3.md` — this file.
(Run 8 was an intermediate diagnostic; data is included in Run 9/10 logs.)
## Phase B / progression
- `image_loaded_sha256 ea8d160e…` UNCHANGED (instrumentation does not touch
XEX image processing).
- xenia-rs HEAD UNCHANGED.
- No progression-metric movement (Session 3 is instrumentation-only). Session
4 has concrete leads.