Files
xenia-rs/audit-runs/iterate-2D-peer-producer-trace/investigation.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

286 lines
14 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
# Iterate 2.D — Peer-producer LR trace (investigation log)
**Date:** 2026-05-21
**Mode:** WRITE (investigation only; no engine source changes).
**Binaries:** `xenia-canary/.../xenia_canary_i2d.exe`, `xenia-rs/target/release/xrs-i2d`.
**Reuses:** canary `--audit_69_log_all_sets=true --audit_70_log_all_releases=true` (existing),
ours `--lr-trace` (existing); no new cvars added.
**LOC delta:** engine 0, canary 0.
## Step 0 — Existing artifact triage
`audit-runs/audit-069-wait-signal-producer/s5/canary-release-trace.log` (S5) holds only
NtReleaseSemaphore fires (414 events, single-handle restricted by configured handle list);
**NtSetEvent dimension was never captured**. Audit-069 S6 bridge wrote a 414-vs-99 first-N
release diff but did not extend coverage to NtSetEvent or to canary's wider tid family.
Therefore a fresh capture covering BOTH operations was required.
## Step 1 — Capture both engines
### Canary cold run
```
wine xenia_canary_i2d.exe "<iso>" --mute=true \
--audit_69_log_all_sets=true --audit_70_log_all_releases=true \
--log_file=canary-i2d.log
```
Wallclock 90 s (timeout). Result: **79,014 peer-producer events** (61,659 NtSetEvent/XEvent::Set
+ 17,355 NtReleaseSemaphore/xeKeReleaseSemaphore). 21 distinct guest tids.
### Ours cold run
```
xrs-i2d exec "<iso>" --quiet \
--lr-trace=0x8284DDDC,0x8284E49C,0x8284DF5C,0x8284E07C \
--lr-trace-out=ours-i2d-iat-trace.jsonl
```
The probe PCs are the IAT thunks for KeSetEvent / KeReleaseSemaphore / NtSetEvent /
NtReleaseSemaphore respectively — covering BOTH wrapper paths and direct callers (matches
canary's `audit_69`/`audit_70` C++ hook coverage). Wallclock 60 s (timeout). Result:
**153 peer-producer events** across 7 distinct guest tids (1, 2, 5, 6, 8, 9, 11, 13).
A first pass with `--lr-trace=0x824AA2F0,0x824AB158` (game's NtSetEvent/NtReleaseSemaphore
wrappers, sub_824AA2F0/sub_824AB158) yielded 150 events but **missed all KeReleaseSemaphore
direct calls** (which bypass the game wrappers via the audio subsystem's `sub_824D21F0` and
others). That trace is kept as `ours-i2d-lr-trace.jsonl` for cross-check; the IAT-thunk trace
`ours-i2d-iat-trace.jsonl` is the authoritative one used for alignment.
## Step 2 — Cross-engine alignment
### Handle namespaces
Canary handles `0xF8000XXX` (8-bit XAM-style); ours handles `0x10XX` (slot ids). Not directly
comparable by raw value — must use (op, lr-containing-fn) tuple.
### Documented tid map (per AUDIT-068/S6 bridge)
- canary tid=6 ↔ ours tid=1 (main)
- canary tid=10 ↔ ours tid=5 (worker)
- canary tid=17 ↔ ours tid=13 (cache thread)
Other tid pairs (audio threads tid=14 / tid=2 / tid=4 etc.) require entry-PC matching since
their IDs are not stably mapped.
### LR alignment (operation + lr-containing-fn)
Cross-engine matching key: `(op_category, lr_value)`. Since BOTH engines route through
identical game-side dispatch code, the same call site (lr) means the same source-level
producer.
| LR | Op | Function | Canary fires | Ours fires | Status |
|----|----|----------|---:|---:|--------|
| `0x824D229C` | release | sub_824D21F0 (audio dispatch) | 16,452 | **1** | UNDER (×16,452) |
| `0x824D2A44` | set | sub_824D29F0 (audio worker entry) | 16,452 | **0** | MISSING |
| `0x824D292C` | set | sub_824D2878 (audio worker entry-2) | 16,452 | **0** | MISSING |
| `0x824AA304` | set | sub_824AA2F0 (NtSetEvent wrapper, generic) | 15,765 | (multi) | MIXED |
| `0x82506C90` | set | sub_82506B08 (worker dispatch +0x188) | 2,378 | **0** | MISSING |
| `0x82508510` | set | sub_82508400 (+0x110) | 2,373 | **0** | MISSING |
| `0x82508524` | set | sub_82508400 (+0x124) | 2,373 | **0** | MISSING |
| `0x82506F9C` | set | sub_82506DE8 (worker dispatch) | 2,355 | **0** | MISSING |
| `0x82508358` | set | sub_825078D8 (worker dispatch +0xa80) | 2,350 | **0** | MISSING |
| `0x824AAFC8` | set | sub_824AAF50 (KeSetEvent wrapper) | 1,113 | **0** | MISSING |
| `0x824AB168` | release | sub_824AB158 (NtReleaseSemaphore wrapper internal) | 903 | 90 | UNDER (×10) |
| `0x82450D2C` | release | sub_82450B68 (worker self-release-2) | (multi via 0x824AB168) | 75 | match-class |
| `0x82450CE0` | release | sub_82450B68 (worker self-release-1) | (multi via 0x824AB168) | 7 | match-class |
| `0x82450314` | release | sub_82450218 | (multi via 0x824AB168) | 8 | match-class |
Total LRs missing in ours: **31 distinct call sites**, **61,659 canary fires** with **0 analogs
in ours**. Plus the matched-class but-under-firing audio release (1 vs 16,452, ratio 1/16,452).
## Step 3 — First divergent producer (chronological)
Time-ordered scan of canary's events (host_ns) against ours's events (per-tid cycle):
- **Canary events 014** (host_ns 4.8 µs → 162.6 ms): all from tids 6/10/0 — these have
exact ours analogs (tids 1/5/bootstrap), counts match approximately.
- **Canary event 15** at **host_ns=277,967,100** (278 ms): tid=14 fires
`xeKeReleaseSemaphore` on handle `0xF800006C` from lr=`0x824D229C` (sub_824D21F0+0xAC, audio
dispatch). This is the **FIRST canary peer-producer ours doesn't match in volume.**
### Attribution
`sub_824D21F0` is the audio subsystem's "post-process-then-release-semaphore" leaf, called from
3 sites:
- `sub_824D2878+0xA0` (audio worker entry A)
- `sub_824D2940+0x64` (audio worker entry B)
- `sub_824D29F0+0xC4` (audio worker dispatch loop body)
`sub_824D29F0` is the main audio worker fn: enters CS, fires `KeSetEvent`, calls
`KeWaitForMultipleObjects`, then either `sub_824D2108` or `sub_824D21F0` (the semaphore
release). The fn pointers for both audio worker entries (`sub_824D2878`/`sub_824D2940`) are
loaded by `sub_824D2C08` (audio subsystem init), which then calls `ExCreateThread` ×2 to
spawn them, followed by `KeSetBasePriorityThread` and `KeResumeThread`.
### Ours's actual audio thread behavior
Phase-A event log (`/tmp/ours-i2d-events.jsonl`, 118,149 events) shows:
- **host_ns=1,586,993,047** (1.587 s): `thread.create` entry_pc=`0x824d2878` (audio thread A), suspended
- **host_ns=1,587,001,117**: `ObReferenceObjectByHandle` (audio init handshake)
- **host_ns=1,587,011,797**: `KeSetBasePriorityThread`
- **host_ns=1,587,018,827**: `KeResumeThread` (audio thread A resumed)
- **host_ns=1,587,049,878**: `ExCreateThread` (audio thread B, entry_pc=`0x824d2940`)
- **host_ns=1,587,088,839**: `KeResumeThread` (audio thread B resumed)
- **host_ns=1,587,097,519**: ours **tid=10** (audio thread A) starts: `KeWaitForSingleObject`,
`KeRaiseIrqlToDpcLevel`, `KeAcquireSpinLockAtRaisedIrql`, `KeReleaseSpinLockFromRaisedIrql`,
`KfLowerIrql` (17 events total). Then **silent.**
- **host_ns=1,659,028,012**: ours **tid=11** (audio thread B) starts: `RtlEnterCriticalSection`,
`KeSetEvent`, `KeWaitForMultipleObjects` (11 events total). Then **silent.**
- **ours tid=9** (also audio family, ctx_ptr=0x828a3230 = audio static dispatcher) fires
`KeReleaseSemaphore` ONCE at cycle=631 from lr=`0x824d229c`, then **silent.**
**Conclusion**: ours audio threads are spawned, resumed, execute *one* iteration of their
work loop, then wedge in `KeWaitForMultipleObjects` (sub_824D29F0+0xA0) — the same wedge
shape as tid=13 (cache thread). Canary's audio thread iterates ~16,452× over the same
window; ours iterates ~1×. **The wedge is not localized to one thread — it is the same wedge
pattern recurring in every peer-producer thread family.**
### Timing skew
Canary boot timeline (audio subsystem):
- host_ns=4.8 µs: first NtReleaseSemaphore (tid=6 main bootstrap)
- host_ns=277.97 ms: audio worker (tid=14) first fires
Ours boot timeline:
- host_ns=5,4 ms: first NtReleaseSemaphore (tid=1 main, matched)
- host_ns=**1,586.99 ms** = 1.587 s: audio thread spawned (5.7× later than canary's first audio fire)
This 1.3-second delay implies **upstream init phase divergence** — ours's main thread is taking
significantly longer to reach the `sub_824D2C08` init call than canary does. The cause of this
upstream delay is likely the cumulative effect of every prior subsystem wedge: each
producer-consumer pair that wedges in ours costs time before the next subsystem can init.
## Step 4 — Outcome class
The plan defined three outcomes:
- **(A) Single missing producer, thread-spawn-dependent**: NO — ours DOES spawn the audio
threads (tid=9, tid=10, tid=11) successfully. The threads execute briefly then wedge.
- **(B) Single missing producer, state-divergence in an existing-but-divergent thread**: NO —
the divergence is not in *one* thread but in *all* peer-producer threads.
- **(C) Many distributed producers missing**: **YES.** 31 distinct LRs missing across at least
4 thread families:
- Audio workers (sub_824D2878, sub_824D2940, sub_824D29F0, sub_824D21F0 chain): ~50k fires
canary, 1 fire ours.
- Worker dispatch (sub_82506B08, sub_82508400, sub_82506DE8, sub_825078D8 family — the
AUDIT-049 "sub_825070F0" caller universe): ~10k fires canary, 0 ours.
- NtSetEvent wrapper-internal (lr=0x824AA304): 15,765 canary, 0 directly observed in ours
via this LR (ours uses the IAT path differently).
- Misc (sub_827E*, sub_82178*, sub_824D0*): smaller counts, mostly init paths.
Outcome class = **(C) Many distributed producers missing.**
### Structural interpretation
Every peer-producer thread family in ours executes its FIRST iteration normally (the
bootstrap), then stalls in its wait primitive. The wait primitives are different across
families (different events, different semaphores), but the **pattern is identical**:
```
ThreadFamily X enters loop
→ does bootstrap-once setup (1 release/set fires)
→ enters KeWaitForMultipleObjects or KeWaitForSingleObject
→ blocks forever because the producer for ITS wait event is itself another wedged thread
```
This is a **multi-producer ladder collapse**: each thread depends on a peer (in the same OR
different family) for its wake-up signal; that peer is also wedged on a dependency; etc. The
graph is not strictly circular (each thread's specific wait may differ) but the topology means
**no thread can advance because every thread's wake-source is also blocked.**
This subsumes the AUDIT-049 / AUDIT-069 framings into a unified picture: the wedge family
includes at minimum {tid=10 audio-A, tid=11 audio-B, tid=9 audio-aux, tid=13 cache, all four
sub_825070F0 worker spawnees}, all wedged on different wait sites, all unable to wake each
other.
## Step 5 — Recommended next iterate
Given outcome class (C), single-keystone iterates (2.B branch-probe, 2.C arg/return capture,
single-thread wedge investigation) will not unlock the whole ladder — each one would unblock
ONE thread, only to find it blocks again on the next un-signaled event.
The plan's recommended pivot from outcome (C) is: **"may need a fundamentally different
methodology"**.
### Option (1): Critical-path sweep (~400-600 LOC over multiple sessions)
Identify which thread families' first-iteration produces signals consumed by another thread
family, build a dependency DAG, then trace each edge's first-divergence in ours. Many of these
edges may converge on a small number of "root cause" missed signals further upstream
(e.g., a single missing signal in init code that cascades).
### Option (2): Boot-time delta replay (~100 LOC investigation)
Compare canary's 0-278 ms event sequence (1,221 events before first audio fire) against ours's
0-278 ms event sequence. There's an upstream timing skew (canary boots audio at 278 ms, ours
at 1.587 s — 5.7× slower). The CAUSE of the slow init is upstream-of-audio and may be a
single fixable wedge in the init path.
**Recommended: Option (2)** — cheaper, more focused. Diff the first ~1200 phase-a events of
each engine to find the FIRST kernel-import-call divergence in early bootstrap. This may
identify a single missing wake-signal in the early init flow that cascades to delay every
subsequent subsystem.
### Option (3): Audio-specific micro-investigation (~50 LOC, narrower)
The single canary audio fire (lr=`0x824d229c` count=1 in ours) shows ours's audio thread DOES
reach the release site once. Find what event/semaphore canary signals between iterations 1
and 2 that ours doesn't. This is a narrower (B-shaped) sub-investigation that doesn't unblock
the full ladder but adds disambiguation between "thread didn't spawn", "first iter wedged",
and "subsequent iter wedged".
## Tripstones honored
- **#28 (per-engine tid stability)**: explicitly used (op, lr-containing-fn) tuple, not raw
tid. Confirmed via `entry_pc` matching in phase-a thread.create events.
- **#32 (canary jitter)**: no relevant variance — both engines bit-equivalent on first 15
events (host_ns and counts match). Divergence starts at canary event 15 (audio fire).
- **#37 (vtable base vs slot-N)**: not encountered.
- **#39 (composite progression metric)**: not moved this iterate; investigation-only.
- **#40 (single-keystone framing)**: explicitly broken by outcome (C). The "find THE missing
producer" framing of S5/S6/2.A is **falsified** at the structural level — there are 31, not
1.
## Cascade
- A (acquire both engines' producer traces): **PASS HIGH** (canary 79,014 events, ours 153)
- B (align sequences): **PASS HIGH** (LR-keyed alignment; clear bit-equivalence on first 15
canary events).
- C (identify first divergent producer): **PASS HIGH** — canary event 15 at host_ns=278ms,
tid=14, lr=0x824D229C, sub_824D21F0 (audio dispatch).
- D (attribute cause): **PASS MEDIUM** — distributed wedge ladder, not single-thread blockage.
- E (outcome class named): **PASS HIGH** — Class (C), 31 missing LRs, 4+ thread families.
5 PASS / 0 FAIL.
## Artifacts
All under `xenia-rs/audit-runs/iterate-2D-peer-producer-trace/`:
- `canary-i2d.log` (10.7 MB, 79,014 peer-producer events from canary's audit_69+audit_70).
- `canary-i2d.stdout` / `.stderr`: canary run logs.
- `canary-peer-producers.jsonl`: parsed structured form of canary events (79,014 records).
- `ours-i2d-lr-trace.jsonl`: ours first pass at game wrapper PCs (150 events; missing
KeReleaseSemaphore direct path — kept for cross-check only).
- `ours-i2d-iat-trace.jsonl`: ours authoritative trace at IAT thunks (153 events,
comprehensive Ke+Nt coverage).
- `ours-i2d.stdout` / `.stderr`: ours run logs.
- `aligned-sequence.csv`: chronological per-engine sequence (first 200 canary + all ours).
- `investigation.md`: this document.
- `report.md`: short outcome summary.
## Discipline
- xenia-rs HEAD UNCHANGED. canary HEAD UNCHANGED.
- Binaries `xenia_canary_i2d.exe` and `xrs-i2d` are renamed copies; originals untouched.
- Canary cache backed up to `/tmp/canary-cache-bak-iter2D` at session start; verified unchanged
at session end (`diff -rq` returns empty).
- `--mute=true` honored on canary run.
- Investigation-only; no engine source changes.
LOC delta: 0 engine, 0 canary.