handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes

Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
MechaCat02
2026-06-05 07:19:08 +02:00
parent acd1656753
commit ef93a4fa14
620 changed files with 108303 additions and 1 deletions


@@ -0,0 +1,284 @@
# Iterate 2.A — Branch-probe of sub_821CB030 → sub_821CBA08 → sub_821CC3F8 → sub_821C4EB0
**Date:** 2026-05-21
**Mode:** WRITE (investigation + branch-probe configuration; no engine LOC change).
**Sources:** `xenia-canary/.../xenia_canary_i2a.exe`, `xenia-rs/target/release/xrs-i2a`,
`xenia-rs/sylpheed.db`, prior Step 2 report at `audit-runs/review-a-step2-natural-trigger/step2-report.md`.
## TL;DR
The plan's framing — "find the conditional branch inside `sub_821CB030 → … → sub_821C4EB0`
where canary takes the `bl NtSetEvent` arm and ours takes the `bl NtReleaseSemaphore` arm" — **does
not match the actual control-flow lattice**. Across the entire call-graph reachable from those
four functions (depth ≤ 6, 736 functions scanned, each scanned for conditional branches whose
two arms reach the two wrappers within ≤ 4 BBs), **exactly one candidate branch exists**:
PC=`0x82452E0C` inside `sub_82452DC0`, with
- taken-eq (r3==0) arm → reaches `bl 0x824AB158` (NtReleaseSemaphore) via `sub_8245B000` /
`sub_82450218`;
- not-taken (r3!=0) arm → reaches `bl 0x824AA2F0` (NtSetEvent) via `sub_82452AB8` /
`sub_8245FEB8`.
The branch-probe (canary + ours) was run at that PC plus six context PCs. **Canary's branch at
0x82452E0C ALWAYS takes the eq arm** (r3=0, 44 fires across tids 6/17/18/26, never the
NtSetEvent arm). Ours's JIT only emits BB-entry PCs in the probe, so 0x82452E0C did not fire
directly, but `sub_82452DC0` recursion arrived via `lr=0x82452E64` (the return address of the
recursive call site at 0x82452E60 inside the eq-taken arm) on ours tid=13 once and on tid=1
multiple times, confirming that ours, like canary, takes the eq arm at 0x82452E0C.
**The candidate branch is NOT divergent at runtime.** The Step 2 framing of "NtSetEvent vs
NtReleaseSemaphore is an if/else inside this chain" is **falsified at the source level**: those
two operations live on disjoint call-graph paths, NOT on alternate arms of the same branch.
The actual divergence is **loop iteration count**, not branch direction:
- canary tid=17 calls `sub_82452DC0` (and thus NtReleaseSemaphore via `sub_82450218`) multiple
times across its 154 ms lifetime, then upstream `sub_821CBA08` calls `sub_82453910` AFTER
`sub_821CB030` returns — that's where NtSetEvent at canary idx=347 originates.
- ours tid=13 calls `sub_82452DC0` ONCE (fires once at cycle=7963 from lr=0x821CB1D0), executes
the eq-arm path, fires NtReleaseSemaphore at 0x82452F8C, then wedges in the NtWait at
`sub_821CB030+0x1AC` (0x821CB1DC) before `sub_821CB030` can return to `sub_821CBA08` and call
`sub_82453910`.
Recommended next iterate: **2.B (NtQueryFullAttributesFile arg/return capture)** or
**2.C (ctx-field read-probe)** to identify the upstream state that gates whether the wedge wait
ever gets signaled. The wedge itself was already correctly identified in AUDIT-069 S5: ours has
1 "other producer" vs canary's 25; the missing 24 producers are not present because their guest
state is downstream of the same tid=13 wedge (circular). The fix path traces to *what signals
event `d5e23609d3948568`* in canary that doesn't in ours.
## Step 1 — Candidate branch enumeration
### Initial pass (target fns only, depth 4)
Conditional branches inside `sub_821CB030`, `sub_821CBA08`, `sub_821CC3F8`, `sub_821C4EB0`
where taken-arm first call reaches NtSetEvent wrapper (`0x824AA2F0`) AND not-taken-arm reaches
NtReleaseSemaphore wrapper (`0x824AB158`), or vice versa, within ≤ 4 BBs per arm.
**Result: 0 candidate branches.**
This is the Step 1 pivot trigger from the plan — broaden search.
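The candidate criterion used in this step can be sketched as a depth-bounded reachability test. This is a minimal sketch, not the scanner that produced the results: the `CallGraph` type and both function names are hypothetical, and depth is counted in call edges here, whereas the report bounds arm reach in basic blocks.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical call graph: function address -> addresses it calls directly.
type CallGraph = HashMap<u32, Vec<u32>>;

const NT_SET_EVENT: u32 = 0x824A_A2F0; // wrapper address from the report
const NT_RELEASE_SEMAPHORE: u32 = 0x824A_B158; // wrapper address from the report

/// True if `target` is reachable from `start` in at most `max_depth` call edges.
fn reaches(graph: &CallGraph, start: u32, target: u32, max_depth: usize) -> bool {
    let mut seen = HashSet::from([start]);
    let mut queue = VecDeque::from([(start, 0usize)]);
    while let Some((addr, depth)) = queue.pop_front() {
        if addr == target {
            return true;
        }
        if depth == max_depth {
            continue; // depth budget exhausted on this path
        }
        for &callee in graph.get(&addr).into_iter().flatten() {
            if seen.insert(callee) {
                queue.push_back((callee, depth + 1));
            }
        }
    }
    false
}

/// A branch is a candidate when one arm reaches NtSetEvent and the other
/// reaches NtReleaseSemaphore, in either orientation.
fn is_candidate(graph: &CallGraph, taken_call: u32, not_taken_call: u32, depth: usize) -> bool {
    let set_then_release = reaches(graph, taken_call, NT_SET_EVENT, depth)
        && reaches(graph, not_taken_call, NT_RELEASE_SEMAPHORE, depth);
    let release_then_set = reaches(graph, taken_call, NT_RELEASE_SEMAPHORE, depth)
        && reaches(graph, not_taken_call, NT_SET_EVENT, depth);
    set_then_release || release_then_set
}
```

Raising the depth bound only ever grows the reachable set, so broadening the search can add candidates but never remove them.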
### Broadened pass (call-graph depth 6, arm reach depth 4)
Reachable function set: 736 functions.
**Result: 1 candidate branch.**
| PC | Function | Branch type | Taken arm first call | Not-taken arm first call | Set-event reach | Release-sem reach |
|---|---|---|---|---|---|---|
| `0x82452E0C` | `sub_82452DC0` | bc 12,4*cr6+eq | `0x8245B000` (eq) | `0x82452AB8` (ne) | not-taken | taken |
Disassembly context:
```
0x82452DF8 bl sub_82452200 ; r3 = sub_82452200(...)
0x82452DFC addis r11, r0, 0x41FF
0x82452E00 addi r28, r0, 0
0x82452E04 ori r23, r11, 0xFFFD
0x82452E08 cmpli cr6, 0, r3, 0x0
0x82452E0C bc 12, 4*cr6+eq, 0x82452E1C ; if r3==0 → eq-arm to 0x82452E1C
0x82452E10 or r24, r28, r28
0x82452E14 or r29, r3, r3
0x82452E18 b 0x82452E88 ; not-eq → NtSetEvent path
0x82452E1C ... ; eq → NtReleaseSemaphore path
```
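For readers less familiar with the PowerPC encoding, the compare-and-branch pair above can be modeled as follows. This is a sketch under stated assumptions: the function names are illustrative, and the SO bit (copied from XER[SO] on real hardware) is assumed to be 0. BO=12 means "branch if the selected CR bit is 1", and `4*cr6+eq` evaluates to BI=26.

```rust
/// Bit positions inside a 4-bit CR field: 0=LT, 1=GT, 2=EQ, 3=SO (PowerPC ISA).
const EQ_BIT: u32 = 2;

/// BI operand for the assembler notation `4*crN+bit`.
const fn bi(cr_field: u32, bit: u32) -> u32 {
    4 * cr_field + bit
}

/// CR field written by `cmpli crN, 0, ra, imm` (32-bit unsigned compare).
/// Layout is LT|GT|EQ|SO from most to least significant bit.
/// Simplification: SO is assumed to be 0.
fn cmpli(ra: u32, imm: u32) -> u8 {
    if ra < imm {
        0b1000
    } else if ra > imm {
        0b0100
    } else {
        0b0010
    }
}

/// Test the CR-field bit selected by `bi` (its low two bits pick the flag).
fn cr_bit_set(field: u8, bi: u32) -> bool {
    (field >> (3 - bi % 4)) & 1 == 1
}

#[derive(Debug, PartialEq, Eq)]
enum Arm {
    Eq,    // branch taken, continues at 0x82452E1C
    NotEq, // falls through to 0x82452E10
}

/// Models `cmpli cr6, 0, r3, 0x0` followed by `bc 12, 4*cr6+eq, 0x82452E1C`.
fn arm_at_82452e0c(r3: u32) -> Arm {
    let cr6 = cmpli(r3, 0);
    if cr_bit_set(cr6, bi(6, EQ_BIT)) {
        Arm::Eq
    } else {
        Arm::NotEq
    }
}
```

Because the compare is unsigned against zero, the LT outcome is impossible, so the branch reduces to a plain `r3 == 0` test on the return value of `sub_82452200`.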
CSV saved at `candidate-branches.csv`.
## Step 2 — Branch-probe both engines (cold boot)
Probe PCs: `0x82452E0C, 0x821CB1DC, 0x82452F10, 0x82452F8C, 0x82453910, 0x824539A4, 0x82452DC0`
Canary command (cold, 120 s wallclock):
```
wine xenia_canary_i2a.exe ".../Project Sylpheed (...).iso" \
--mute=true \
--audit_61_branch_probe_pcs="0x82452E0C,0x821CB1DC,0x82452F10,0x82452F8C,0x82453910,0x824539A4,0x82452DC0"
```
Ours command (cold, n=50M instructions):
```
xrs-i2a exec ".../Project Sylpheed (...).iso" -n 50000000 \
--branch-probe="0x82452E0C,0x821CB1DC,0x82452F10,0x82452F8C,0x82453910,0x824539A4,0x82452DC0"
```
### Canary fires (109 total)
Per-PC per-tid:
| PC | tid=6 | tid=17 | tid=18 | tid=26 |
|---|---|---|---|---|
| 0x82452DC0 (fn entry) | 11 | 16 | 15 | 2 |
| 0x82452E0C (branch) | 11 | 16 | 15 | 2 |
| 0x82452F10 (NtReleaseSem #1) | 1 | 8 | 5 | 1 |
| 0x82452F8C (NtReleaseSem #2) | 4 | 0 | 0 | 0 |
| 0x821CB1DC (wedge NtWait) | 0 | 1 | 0 | 1 |
| 0x82453910 (NtSetEvent helper entry) | 0 | 0 | 0 | 0 |
| 0x824539A4 (the branch INSIDE sub_82453910) | 0 | 0 | 0 | 0 |
Branch decision at 0x82452E0C (44 fires across all tids): **r3=0x00000000, cr6=..E (equal) 100%**.
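The per-PC per-tid table can be rebuilt from probe output with a small tally. The sketch below assumes a line shape of `pc=0x... tid=N r3=0x...`, which is a hypothetical format chosen for illustration; the real BRANCH-PROBE line layout is not reproduced in this report, so the parser would need adapting.

```rust
use std::collections::BTreeMap;

/// Parse one probe line of the ASSUMED form `pc=0x82452E0C tid=17 r3=0x00000000`.
/// Returns (pc, tid, r3), or None if any field is missing or malformed.
fn parse_line(line: &str) -> Option<(u32, u32, u32)> {
    let mut pc: Option<u32> = None;
    let mut tid: Option<u32> = None;
    let mut r3: Option<u32> = None;
    for tok in line.split_whitespace() {
        if let Some(v) = tok.strip_prefix("pc=0x") {
            pc = u32::from_str_radix(v, 16).ok();
        } else if let Some(v) = tok.strip_prefix("tid=") {
            tid = v.parse().ok();
        } else if let Some(v) = tok.strip_prefix("r3=0x") {
            r3 = u32::from_str_radix(v, 16).ok();
        }
    }
    Some((pc?, tid?, r3?))
}

/// Count fires per (pc, tid); lines that do not parse are skipped.
fn tally(lines: &[&str]) -> BTreeMap<(u32, u32), usize> {
    let mut counts = BTreeMap::new();
    for (pc, tid, _r3) in lines.iter().filter_map(|l| parse_line(l)) {
        *counts.entry((pc, tid)).or_insert(0) += 1;
    }
    counts
}
```

Summing the map's values for one PC across all tids gives the row total, for example the 44 fires reported at 0x82452E0C.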
Notable absence in canary's probe: the `sub_82453910` entry never fires. The NtSetEvent at canary
tid=17 idx=347 in Step 2's timeline therefore cannot have entered through the probed
`sub_82453910` entry, so it is NOT from this iterate's upstream chain at all; it likely comes
from a DIFFERENT call site under a different parent function.
### Ours fires (13 total BRANCH-PROBE lines)
| PC | tid=1 | tid=13 |
|---|---|---|
| 0x82452DC0 (fn entry) | 11 | 2 |
| 0x82452E0C | 0 | 0 |
| 0x82452F10 | 0 | 0 |
| 0x82452F8C | 0 | 0 |
| 0x821CB1DC (wedge NtWait) | 0 | 0 |
| 0x82453910 | 0 | 0 |
| 0x824539A4 | 0 | 0 |
**Ours's JIT only updates `ctx.pc` at BB boundaries, so interior PCs never match in
`fire_branch_probe_if_match` even when the instruction at that PC executes.** Only the
function-entry PC 0x82452DC0 (a JIT lookup target) fires.
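The effect can be illustrated with a toy model of the two matching strategies. This is a sketch only: `BasicBlock` and both functions are hypothetical and do not mirror the actual `fire_branch_probe_if_match` code, and any block boundaries passed in are assumptions.

```rust
/// A basic block as a half-open guest address range [start, end).
#[derive(Clone, Copy)]
struct BasicBlock {
    start: u32,
    end: u32,
}

/// Entry-only matching: a probe fires only when its PC equals the first
/// address of an executed block. Interior PCs are invisible.
fn fires_entry_only(executed: &[BasicBlock], probe_pc: u32) -> bool {
    executed.iter().any(|bb| bb.start == probe_pc)
}

/// Range matching: a probe fires when its PC lies anywhere inside an
/// executed block, which also covers interior instructions.
fn fires_in_range(executed: &[BasicBlock], probe_pc: u32) -> bool {
    executed.iter().any(|bb| (bb.start..bb.end).contains(&probe_pc))
}
```

If the block entered at 0x82452DC0 is assumed to extend past the branch at 0x82452E0C, entry-only matching reports the function entry but not the branch, while range matching reports both. Range matching is sound for straight-line blocks because every instruction in an entered block executes unless an exception intervenes.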
However, **one of ours tid=13's two `sub_82452DC0` entries has `lr=0x82452E64`** (the return
address of the recursive `bl 0x82452DC0` at 0x82452E60), which is INSIDE the eq-taken arm of
0x82452E0C. Ours's tid=13 entered sub_82452DC0 from sub_821CB030 (cycle=7963,
lr=0x821CB1D0) and then recursively re-entered it (cycle=20030, lr=0x82452E64), meaning ours
took the **same eq-arm direction** at 0x82452E0C that canary took.
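The inference from link register to branch direction can be written down explicitly. In this sketch the eq arm is assumed to span from the branch target 0x82452E1C up to 0x82452E88 (where the not-eq path resumes); that upper bound is an assumption read off the disassembly context, not a verified block boundary. The only hard fact used is that PowerPC `bl` stores the address of the following instruction in LR.

```rust
const RECURSIVE_CALL_SITE: u32 = 0x8245_2E60; // `bl 0x82452DC0` per the report
const EQ_ARM_START: u32 = 0x8245_2E1C; // target of the `bc` at 0x82452E0C
const EQ_ARM_END: u32 = 0x8245_2E88; // ASSUMPTION: eq arm ends where the not-eq path resumes

/// `bl` writes the address of the next instruction (call site + 4) to LR.
const fn return_address(call_site: u32) -> u32 {
    call_site + 4
}

#[derive(Debug, PartialEq, Eq)]
enum Inference {
    EqArm,   // LR points into the eq-taken arm, so the branch was taken
    Unknown, // LR says nothing about the branch at 0x82452E0C
}

/// Classify a function-entry probe fire by the LR it arrived with.
fn infer_arm_from_lr(lr: u32) -> Inference {
    if (EQ_ARM_START..EQ_ARM_END).contains(&lr) {
        Inference::EqArm
    } else {
        Inference::Unknown
    }
}
```

An entry with `lr=0x821CB1D0` classifies as `Unknown`, which matches the report: the first entry from `sub_821CB030` carries no direction information, and only the recursive re-entry does.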
## Step 3 — First divergent branch — NOT FOUND
The single candidate branch is **not divergent** at runtime. Both engines select the eq-arm
(r3=0 returned by `sub_82452200`) at 0x82452E0C in their first traversal.
This is the "broaden the search" pivot from the plan (`If 0, ... OR call-resolution heuristic
missed it`). The broadened search to depth 6 found 1 candidate, but that candidate is not
runtime-divergent.
## Step 4 / 5 — Re-attributing the divergence
The Step 2 report framed canary's `NtSetEvent` (idx=347) vs ours's `NtReleaseSemaphore`
(idx=429) as alternate arms of the same branch inside `sub_821CB030`'s chain. Re-analysis of
the source disasm + branch-probe data shows this is **incorrect**:
1. The only function on the chain reaching NtReleaseSemaphore is `sub_82450218` (called from
`sub_82452DC0` at 0x82452F10 and 0x82452F8C). Ours fires this once on tid=13 in iteration 1.
2. The only fns on the chain reaching NtSetEvent are `sub_8245FEB8`, `sub_82453910`,
`sub_82458A70`, `sub_8245D9D8` (from the reach analysis). Of these, `sub_82453910` is
directly called from `sub_821CBA08` at 0x821CBBF0 — but AFTER `sub_821CB030` returns. Ours
never reaches that line because `sub_821CB030` wedges on its NtWait at +0x1AC.
3. The canary tid=17 NtSetEvent at idx=347 is NOT from `sub_82453910` (whose entry probe at
0x82453910 fired 0× in our canary probe). It must be from one of the other 3 NtSetEvent
callers in the reach set, or from a `sub_82453910` *not on tid=17 at the moment of idx=347*
(idx is global; tid attribution requires a per-event check, handled in the Step 2 csv).
The real divergence is **loop iteration count** of `sub_82452DC0` and its upstream caller
`sub_821CB030` / `sub_821CBA08`. Each iteration of canary tid=17's body calls
`bl 0x82452F8C → bl 0x824AB158` (NtReleaseSemaphore) and then waits at sub_821CB030+0x1AC,
which RETURNS quickly because a peer thread has already signaled. Ours's iteration-1 wait at
that PC never returns because the corresponding signaler never fires.
## Cause attribution
Per the plan's Step 5 framework, attribute to one of 3 candidate causes (items 1-3); item 4 is new in this iterate:
1. **NtQueryFullAttributesFile**: NOT directly evidenced by this iterate. The probe didn't
capture file-attribute returns.
2. **Shared CS-protected ctx field set by another tid**: STILL UNTESTED. ours's tid=13 wait
on event `d5e23609d3948568` depends on another tid signaling it. AUDIT-069 S5's
"25 producers vs 1" finding confirms ours has 24 missing peer-producers — meaning peer
tids in ours aren't reaching the signal call sites.
3. **Vtable**: NOT directly evidenced.
4. **Loop-count circular wedge (NEW)**: ours tid=13 wedges on first wait because peer
producers (themselves blocked downstream of tid=13's blocked work) never fire. The
originating peer producer is on canary tid=4/10/14 (per AUDIT-069 / step2 report), all of
which are alive in ours but doing different work (per AUDIT-069 S2: ours's tid=5 fires
γ-signalers 81× vs canary tid=10's 492× — ours is **under-producing** signals by ~84%).
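The ~84% figure in item 4 is a relative shortfall of ours's signal count against canary's. A minimal sketch of that arithmetic, with a hypothetical function name:

```rust
/// Percentage by which `ours` falls short of `canary`.
/// Returns None when `canary` is zero, since the ratio is undefined.
fn shortfall_percent(ours: u32, canary: u32) -> Option<f64> {
    if canary == 0 {
        return None;
    }
    Some((1.0 - f64::from(ours) / f64::from(canary)) * 100.0)
}
```

With the AUDIT-069 S2 counts (81 vs 492) this evaluates to roughly 83.5%, which the report rounds to ~84%. The same formula applied to the S5 producer counts (1 vs 25) gives 96%.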
This iterate's negative result on the branch-arm hypothesis sharpens the picture: the
divergence is NOT a single-branch lattice mismatch inside sub_821CB030's chain. **It's a
distributed multi-thread producer underrun**, with the wedge a downstream symptom of upstream
under-signaling on peer tids that ARE running in ours but executing a different (shorter)
trajectory.
## Tripstones honored
- **#28 (per-engine tid is not stable cross-engine identity)**: confirmed by Step 2 finding
that ours tid=13 ≡ canary tid=17 (same entry sub_821748F0). Branch-probe data uses
per-engine tid; cross-engine comparison done by (entry_pc, lr) tuple, not raw tid.
- **#32 (canary jitter in contention regions)**: not relevant here — both engines select eq
arm at 0x82452E0C 100% across all observed fires. No jitter.
- **#37 (vtable base vs slot-N)**: not encountered (no vtable read at 0x82452E0C).
- **#39 (composite progression vs matched-prefix)**: this iterate produces neither; an
informative null at the source-control-flow lens.
- **#40 (single-keystone hypothesis falsified before)**: Step 2's "single branch arm
divergence" framing was itself a candidate keystone. Falsified here.
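Tripstone #28's rule (never compare raw tids across engines) can be sketched as a join on guest entry PC. The `ThreadInfo` struct and `match_threads` function are hypothetical; the sketch also assumes each entry PC is used by at most one thread per engine, which would need checking against real thread tables.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct ThreadInfo {
    tid: u32,      // per-engine thread id, not comparable across engines
    entry_pc: u32, // guest entry point, stable across engines
}

/// Pair threads across engines by guest entry PC.
/// Returns (ours_tid, canary_tid) for every entry PC present in both.
/// ASSUMPTION: entry PCs are unique per engine; duplicates keep the last canary tid.
fn match_threads(ours: &[ThreadInfo], canary: &[ThreadInfo]) -> Vec<(u32, u32)> {
    let by_entry: HashMap<u32, u32> = canary.iter().map(|t| (t.entry_pc, t.tid)).collect();
    ours.iter()
        .filter_map(|t| by_entry.get(&t.entry_pc).map(|&c| (t.tid, c)))
        .collect()
}
```

Fed the entry PC from the Step 2 finding (`sub_821748F0`), this pairs ours tid=13 with canary tid=17.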
## Cascade
- A (identify candidate branch PCs in DB): **PASS** with caveat. 1 candidate at depth 6.
Initial depth-4 scan returned 0 — pivot trigger fired.
- B (run both engines with branch-probe): **PASS**. 109 fires canary, 13 fires ours.
- C (find FIRST divergent branch in candidates): **NEGATIVE / informative null**. The single
candidate is not divergent (both engines take eq arm).
- D (attribute to one of 3 candidate causes): **MEDIUM**. Reframed as "loop-count
circular wedge with under-producing peer tids", which subsumes candidate 2 (shared
CS-protected ctx).
- E (recommend specific next iterate with LOC estimate): **PASS** (see below).
## Recommended next iterate
### Option 2.B — Args/return-value capture for NtQueryFullAttributesFile and key kernel APIs (~30–50 LOC canary)
Extend canary's Phase A event log to populate `args_resolved` and `return_value` for:
- `NtQueryFullAttributesFile`
- `NtCreateFile` (cache:\\<hash> paths)
- `NtReadFile`, `NtWriteFile`
Compare canary tid=17's 9 NtQueryFullAttributesFile invocations against ours tid=13's 1 to find
the first divergence in cache-state. Cheap, high signal.
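Once both logs carry `args_resolved` and `return_value`, finding the first divergence is a positional comparison. A minimal sketch follows; the `CallRecord` struct is hypothetical (only the two field names come from the plan above), and it assumes both sequences are already filtered to one API and one matched thread.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
struct CallRecord {
    args_resolved: String,
    return_value: u32,
}

/// Index of the first call where the two engines disagree.
/// If one sequence is a strict prefix of the other, the divergence is the
/// index where the shorter one ends. Returns None when they are identical.
fn first_divergence(canary: &[CallRecord], ours: &[CallRecord]) -> Option<usize> {
    let mismatch = canary.iter().zip(ours).position(|(c, o)| c != o);
    mismatch.or_else(|| {
        (canary.len() != ours.len()).then(|| canary.len().min(ours.len()))
    })
}
```

With canary's 9 invocations against ours's 1, the result is either index 0 (the single shared call already differs) or index 1 (ours stops after an identical first call); the two outcomes point at different causes.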
### Option 2.C — read-probe on ours tid=13 wait event memory (~20 LOC reusing AUDIT-068 S3)
Use `audit_68_host_mem_read_probe` to sample event handle `d5e23609d3948568`'s underlying
KEVENT struct in ours, at 100 µs cadence over the 3 ms wedge window. Capture the moment
(if any) when its `Header.SignalState` would transition. Validates whether the kernel
plumbing is correct vs the producer is simply absent.
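The sampling plan amounts to a fixed number of reads followed by a scan for the first change. The sketch below is illustrative only: it operates on already-captured values and does not call `audit_68_host_mem_read_probe`, whose interface is not shown in this report. `SignalState` is modeled as `i32`, matching the signed 32-bit field in the dispatcher header.

```rust
/// Number of whole samples that fit in the window at the given cadence.
/// Returns None for a zero cadence.
fn sample_count(window_us: u32, cadence_us: u32) -> Option<u32> {
    window_us.checked_div(cadence_us)
}

/// Index of the first sample whose value differs from its predecessor,
/// or None if the value never changes.
fn first_transition(samples: &[i32]) -> Option<usize> {
    samples.windows(2).position(|w| w[0] != w[1]).map(|i| i + 1)
}
```

A 3 ms window at 100 µs cadence yields 30 samples. A `None` from `first_transition` over that window would support the "producer is simply absent" reading; a transition with no wake-up would instead point at the kernel plumbing.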
### Option 2.D — peer-producer LR trace (~0 LOC; reuses existing `--lr-trace` infra)
Per AUDIT-069 S5, ours has 1 producer where canary has 25. Use existing `--lr-trace` at the
NtReleaseSemaphore call site `0x82450310` + NtSetEvent wrappers on ALL tids in ours, capture
which guest LRs fire during the 03 ms window. Diff vs canary's audit_69_event_signal_watch
JSONL → find which peer-tid call site is MISSING in ours.
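The diff in 2.D reduces to a set difference over guest LRs. A minimal sketch, assuming the LRs have already been extracted from each engine's trace into plain slices (the extraction step and the function name are hypothetical):

```rust
use std::collections::BTreeSet;

/// Guest LRs that signal in canary but never in ours, in ascending order.
/// Duplicate fires are collapsed; only presence matters here.
fn missing_call_sites(canary_lrs: &[u32], ours_lrs: &[u32]) -> Vec<u32> {
    let canary: BTreeSet<u32> = canary_lrs.iter().copied().collect();
    let ours: BTreeSet<u32> = ours_lrs.iter().copied().collect();
    canary.difference(&ours).copied().collect()
}
```

Using LRs rather than tids as the key keeps the comparison valid across engines, in line with tripstone #28.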
**Best minimum-LOC next step: 2.D** (zero LOC, existing instrumentation; capture peer-producer
absence directly).
**Best disambiguating step: 2.B** (~30–50 LOC) to pin upstream cache-state divergence.
## Honest assessment
- The 2-hour timebox was respected.
- Step 1 returned 0 candidates at initial depth; broadened to find 1; that 1 is non-divergent.
- The Step 2 report's branch framing **does not survive contact with the call-graph**: the
control-flow divergence is at a higher level (loop count, not branch arm).
- The wedge at `sub_821CB030+0x1AC` remains the symptom; the cause is the **absence of a
signaler on a peer tid in the 0–3 ms window**. That peer-tid absence is what 2.D would
directly identify.
- Confidence in pivoting to 2.D/2.B: **HIGH**.
## Artifacts produced
All under `xenia-rs/audit-runs/iterate-2A-branch-probe/`:
- `candidate-branches.csv` — Step 1 broadened search result (1 row).
- `canary-probe.stdout` / `.stderr` / `.lines` — canary 120 s cold run with branch probe.
- `ours-probe.stdout` / `.stderr` — ours `-n 50M` cold run with branch probe.
- `run-commands.txt` — exact CLIs used.
- `iterate2A-report.md` — this report.
LOC delta: 0 to engine code, 0 to canary code. Read-only investigation.
xenia-rs HEAD UNCHANGED. canary HEAD UNCHANGED. Both binaries (`xenia_canary_i2a.exe`,
`xrs-i2a`) are renamed copies; original binaries untouched.