xenia-rs/audit-runs/iterate-2A-branch-probe/iterate2A-report.md
MechaCat02 ef93a4fa14 handoff: VSync/event-wedge fixes + iterate 2.A–2.BC research notes
Source changes (dormant parity infra, retained from iterate 2.AI/2.AO):
- xenia-kernel/exports.rs: nt_create_event manual_reset polarity +
  related event wiring
- xenia-gpu/mmio_region.rs: D1MODE_VBLANK_VLINE_STATUS hardcode parity

Also lands the audit-runs/ analysis notes (.md/.txt/.json digests) for the
iterate 2.x VSync/0x10e8/0x1004 wedge investigation. Raw trace dumps
(.jsonl/.gz/.csv/.stdout) and agent worktrees (.claude/) are gitignored as
regenerable local artifacts — see memory + HANDOFF for the running findings.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-05 07:19:08 +02:00

# Iterate 2.A — Branch-probe of sub_821CB030 → sub_821CBA08 → sub_821CC3F8 → sub_821C4EB0
**Date:** 2026-05-21
**Mode:** WRITE (investigation + branch-probe configuration; no engine LOC change).
**Sources:** `xenia-canary/.../xenia_canary_i2a.exe`, `xenia-rs/target/release/xrs-i2a`,
`xenia-rs/sylpheed.db`, prior Step 2 report at `audit-runs/review-a-step2-natural-trigger/step2-report.md`.
## TL;DR
The plan's framing — "find the conditional branch inside `sub_821CB030 → … → sub_821C4EB0`
where canary takes the `bl NtSetEvent` arm and ours takes the `bl NtReleaseSemaphore` arm" — **does
not match the actual control-flow lattice**. Across the entire call-graph reachable from those
four functions (depth ≤ 6, 736 functions scanned, each scanned for conditional branches whose
two arms reach the two wrappers within ≤ 4 BBs), **exactly one candidate branch exists**:
PC=`0x82452E0C` inside `sub_82452DC0`, with
- taken-eq (r3==0) arm → reaches `bl 0x824AB158` (NtReleaseSemaphore) via `sub_8245B000` /
`sub_82450218`;
- not-taken (r3!=0) arm → reaches `bl 0x824AA2F0` (NtSetEvent) via `sub_82452AB8` /
`sub_8245FEB8`.
The branch-probe (canary + ours) was run at that PC plus six context PCs. **Canary's branch at
0x82452E0C ALWAYS takes the eq arm** (r3=0, 44 fires across tids 6/17/18/26, never the
NtSetEvent arm). Ours's JIT only emits BB-entry PCs in the probe, so 0x82452E0C did not fire
directly, but `sub_82452DC0` recursion arrived via `lr=0x82452E64` (the return address of the
recursive call at 0x82452E60 inside the eq-taken arm) on ours tid=13 once and on tid=1 multiple
times, confirming that ours takes the same eq arm as canary at 0x82452E0C.
**The candidate branch is NOT divergent at runtime.** The Step 2 framing of "NtSetEvent vs
NtReleaseSemaphore is an if/else inside this chain" is **falsified at the source level**: those
two operations live on disjoint call-graph paths, NOT alternate arms of the same branch.
The actual divergence is **loop iteration count**, not branch direction:
- canary tid=17 calls `sub_82452DC0` (and thus NtReleaseSemaphore via `sub_82450218`) multiple
times across its 154 ms lifetime, then upstream `sub_821CBA08` calls `sub_82453910` AFTER
`sub_821CB030` returns — that's where NtSetEvent at canary idx=347 originates.
- ours tid=13 calls `sub_82452DC0` ONCE (fires once at cycle=7963 from lr=0x821CB1D0), executes
the eq-arm path, fires NtReleaseSemaphore at 0x82452F8C, then wedges in the NtWait at
`sub_821CB030+0x1AC` (0x821CB1DC) before `sub_821CB030` can return to `sub_821CBA08` and call
`sub_82453910`.
Recommended next iterate: **2.B (NtQueryFullAttributesFile arg/return capture)** or
**2.C (ctx-field read-probe)** to identify the upstream state that gates whether the wedge wait
ever gets signaled. The wedge itself was already correctly identified in AUDIT-069 S5: ours has
1 "other producer" vs canary's 25; the missing 24 producers are not present because their guest
state is downstream of the same tid=13 wedge (circular). The fix path traces to *what signals
event `d5e23609d3948568`* in canary but not in ours.
## Step 1 — Candidate branch enumeration
### Initial pass (target fns only, depth 4)
Conditional branches inside `sub_821CB030`, `sub_821CBA08`, `sub_821CC3F8`, `sub_821C4EB0`
where taken-arm first call reaches NtSetEvent wrapper (`0x824AA2F0`) AND not-taken-arm reaches
NtReleaseSemaphore wrapper (`0x824AB158`), or vice versa, within ≤ 4 BBs per arm.
**Result: 0 candidate branches.**
This is the Step 1 pivot trigger from the plan — broaden search.
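The search criterion above can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool that produced the result: the `cfg` mapping, its `calls`/`succs` keys, and the function names are hypothetical, and each basic block's `calls` list is assumed to already hold the kernel wrapper that a call eventually resolves to.

```python
from collections import deque

# Wrapper addresses quoted in this report.
NT_SET_EVENT = 0x824AA2F0
NT_RELEASE_SEM = 0x824AB158
WRAPPERS = (NT_SET_EVENT, NT_RELEASE_SEM)

def arm_reaches(cfg, start, max_bbs=4):
    """Return the wrappers reachable from basic block `start` within `max_bbs` blocks.

    `cfg` is a hypothetical mapping: bb_addr -> {"calls": [...], "succs": [...]}.
    """
    found, seen = set(), {start}
    queue = deque([(start, 1)])
    while queue:
        bb, depth = queue.popleft()
        node = cfg.get(bb)
        if node is None:
            continue
        found.update(t for t in node["calls"] if t in WRAPPERS)
        if depth < max_bbs:
            for succ in node["succs"]:
                if succ not in seen:
                    seen.add(succ)
                    queue.append((succ, depth + 1))
    return found

def candidate_branches(cfg, cond_branches, max_bbs=4):
    """Return (pc, taken, not_taken) tuples whose arms reach opposite wrappers."""
    out = []
    for pc, taken, not_taken in cond_branches:
        t = arm_reaches(cfg, taken, max_bbs)
        n = arm_reaches(cfg, not_taken, max_bbs)
        if (NT_SET_EVENT in t and NT_RELEASE_SEM in n) or \
           (NT_RELEASE_SEM in t and NT_SET_EVENT in n):
            out.append((pc, taken, not_taken))
    return out
```

Bounding the walk per arm (rather than per function) keeps a branch from qualifying merely because both wrappers are reachable somewhere far downstream of both arms.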
### Broadened pass (call-graph depth 6, arm reach depth 4)
Reachable function set: 736 functions.
**Result: 1 candidate branch.**
| PC | Function | Branch type | Taken arm first call | Not-taken arm first call | Set-event reach | Release-sem reach |
|---|---|---|---|---|---|---|
| `0x82452E0C` | `sub_82452DC0` | bc 12,4*cr6+eq | `0x8245B000` (eq) | `0x82452AB8` (ne) | not-taken | taken |
Disassembly context:
```
0x82452DF8 bl sub_82452200 ; r3 = sub_82452200(...)
0x82452DFC addis r11, r0, 0x41FF
0x82452E00 addi r28, r0, 0
0x82452E04 ori r23, r11, 0xFFFD
0x82452E08 cmpli cr6, 0, r3, 0x0
0x82452E0C bc 12, 4*cr6+eq, 0x82452E1C ; if r3==0 → eq-arm to 0x82452E1C
0x82452E10 or r24, r28, r28
0x82452E14 or r29, r3, r3
0x82452E18 b 0x82452E88 ; not-eq → NtSetEvent path
0x82452E1C ... ; eq → NtReleaseSemaphore path
```
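As a reading aid for the `cmpli` / `bc` pair in the listing above, here is a minimal model of the branch decision. The helper names are hypothetical, and only the two BO encodings that test a single CR bit are modelled; it is a sketch, not either engine's decoder.

```python
# Sketch of the compare-and-branch pair at 0x82452E08 / 0x82452E0C.
EQ_ARM = 0x82452E1C        # branch target from the listing (eq arm)
FALLTHROUGH = 0x82452E10   # next sequential instruction (ne arm)
BI = 4 * 6 + 2             # "4*cr6+eq": CR field 6, EQ is bit 2 within the field

def cmpli_cr(ra, uimm):
    """Unsigned 32-bit compare; returns the LT/GT/EQ bits of one CR field (SO omitted)."""
    a = ra & 0xFFFFFFFF
    return {"lt": a < uimm, "gt": a > uimm, "eq": a == uimm}

def bc_taken(bo, cr_bit):
    """BO=12 branches when the selected CR bit is set, BO=4 when it is clear."""
    if bo == 12:
        return cr_bit
    if bo == 4:
        return not cr_bit
    raise ValueError("only BO=12 and BO=4 are modelled here")

def next_pc(r3):
    """PC after 0x82452E0C for a given return value of sub_82452200."""
    cr6 = cmpli_cr(r3, 0)
    return EQ_ARM if bc_taken(12, cr6["eq"]) else FALLTHROUGH
```

With `r3 == 0` the model lands on 0x82452E1C (the NtReleaseSemaphore side); any non-zero `r3` falls through to 0x82452E10 and on to the `b 0x82452E88`.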
CSV saved at `candidate-branches.csv`.
## Step 2 — Branch-probe both engines (cold boot)
Probe PCs: `0x82452E0C, 0x821CB1DC, 0x82452F10, 0x82452F8C, 0x82453910, 0x824539A4, 0x82452DC0`
Canary command (cold, 120 s wallclock):
```
wine xenia_canary_i2a.exe ".../Project Sylpheed (...).iso" \
--mute=true \
--audit_61_branch_probe_pcs="0x82452E0C,0x821CB1DC,0x82452F10,0x82452F8C,0x82453910,0x824539A4,0x82452DC0"
```
Ours command (cold, n=50M instructions):
```
xrs-i2a exec ".../Project Sylpheed (...).iso" -n 50000000 \
--branch-probe="0x82452E0C,0x821CB1DC,0x82452F10,0x82452F8C,0x82453910,0x824539A4,0x82452DC0"
```
### Canary fires (109 total)
Per-PC per-tid:
| PC | tid=6 | tid=17 | tid=18 | tid=26 |
|---|---|---|---|---|
| 0x82452DC0 (fn entry) | 11 | 16 | 15 | 2 |
| 0x82452E0C (branch) | 11 | 16 | 15 | 2 |
| 0x82452F10 (NtReleaseSem #1) | 1 | 8 | 5 | 1 |
| 0x82452F8C (NtReleaseSem #2) | 4 | 0 | 0 | 0 |
| 0x821CB1DC (wedge NtWait) | 0 | 1 | 0 | 1 |
| 0x82453910 (NtSetEvent helper entry) | 0 | 0 | 0 | 0 |
| 0x824539A4 (the branch INSIDE sub_82453910) | 0 | 0 | 0 | 0 |
Branch decision at 0x82452E0C (44 fires across all tids): **r3=0x00000000, cr6=..E (equal) 100%**.
Notable absences in canary's probe: `sub_82453910` entry never fires. The NtSetEvent at canary
tid=17 idx=347 in Step 2's timeline therefore cannot pass through `sub_82453910`; it must reach
the NtSetEvent wrapper via a path NOT in our probe set. In other words, the canary tid=17
NtSetEvent at idx=347 is NOT from this iteration's upstream chain at all; it's likely from a
DIFFERENT call site under a different parent function.
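The totals quoted in this subsection can be re-derived from the per-PC per-tid table. The snippet below simply transcribes that table into a dictionary (the variable names are illustrative) and sums it.

```python
# Canary fires, transcribed from the table above: pc -> {tid: count}.
canary_fires = {
    0x82452DC0: {6: 11, 17: 16, 18: 15, 26: 2},   # fn entry
    0x82452E0C: {6: 11, 17: 16, 18: 15, 26: 2},   # candidate branch
    0x82452F10: {6: 1,  17: 8,  18: 5,  26: 1},   # NtReleaseSem #1
    0x82452F8C: {6: 4,  17: 0,  18: 0,  26: 0},   # NtReleaseSem #2
    0x821CB1DC: {6: 0,  17: 1,  18: 0,  26: 1},   # wedge NtWait
    0x82453910: {6: 0,  17: 0,  18: 0,  26: 0},   # NtSetEvent helper entry
    0x824539A4: {6: 0,  17: 0,  18: 0,  26: 0},   # branch inside sub_82453910
}

per_pc = {pc: sum(by_tid.values()) for pc, by_tid in canary_fires.items()}
total = sum(per_pc.values())

# Every function entry is followed by exactly one fire of the branch at 0x82452E0C.
entry_matches_branch = canary_fires[0x82452DC0] == canary_fires[0x82452E0C]
```

The equal per-tid counts for 0x82452DC0 and 0x82452E0C are consistent with every entry into `sub_82452DC0` reaching the candidate branch exactly once.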
### Ours fires (13 total BRANCH-PROBE lines)
| PC | tid=1 | tid=13 |
|---|---|---|
| 0x82452DC0 (fn entry) | 11 | 2 |
| 0x82452E0C | 0 | 0 |
| 0x82452F10 | 0 | 0 |
| 0x82452F8C | 0 | 0 |
| 0x821CB1DC (wedge NtWait) | 0 | 0 |
| 0x82453910 | 0 | 0 |
| 0x824539A4 | 0 | 0 |
**Ours's JIT only updates `ctx.pc` at BB boundaries, so interior PCs do not fire in
`fire_branch_probe_if_match` even when execution passes through them.** Only the function-entry
PC 0x82452DC0 (a JIT lookup target) fires.
However, **one of ours tid=13's two `sub_82452DC0` entries has `lr=0x82452E64`** (return address
from the recursive `bl 0x82452DC0` at 0x82452E60), which is INSIDE the eq-taken arm of
0x82452E0C. This confirms that ours's tid=13 entered sub_82452DC0 from sub_821CB030
(cycle=7963, lr=0x821CB1D0) and then recursively re-entered it (cycle=20030, lr=0x82452E64),
meaning ours took the **same eq-arm direction** at 0x82452E0C that canary took.
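The inference from `lr` to branch direction relies only on the fact that `bl` stores the address of the following instruction in LR, so the call site is `lr - 4`. A small sketch of that bookkeeping, with hypothetical helper names and the two tid=13 entries quoted above:

```python
SUB_821CB030 = 0x821CB030
RECURSIVE_CALL_SITE = 0x82452E60   # `bl 0x82452DC0` inside the eq-taken arm

def call_site_from_lr(lr):
    """PowerPC instructions are 4 bytes, so the `bl` sits 4 bytes before LR."""
    return (lr - 4) & 0xFFFFFFFF

def classify_entry(lr):
    """Label a sub_82452DC0 entry by where it was called from."""
    if call_site_from_lr(lr) == RECURSIVE_CALL_SITE:
        return "recursive (eq arm)"
    return "external caller"

# (cycle, lr) pairs for ours tid=13, as reported in this section.
ours_tid13_entries = [(7963, 0x821CB1D0), (20030, 0x82452E64)]
labels = [classify_entry(lr) for _, lr in ours_tid13_entries]

# The wedge wait address follows from the function base plus its offset.
wait_pc = SUB_821CB030 + 0x1AC
```

Because the recursive call site is only reachable through the eq arm, a single entry with `lr=0x82452E64` is enough to establish the direction taken, even though the interior PC itself never fires in ours.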
## Step 3 — First divergent branch — NOT FOUND
The single candidate branch is **not divergent** at runtime. Both engines select the eq-arm
(r3=0 returned by `sub_82452200`) at 0x82452E0C in their first traversal.
This is the "broaden the search" pivot from the plan (`If 0, ... OR call-resolution heuristic
missed it`). The broadened search to depth 6 found 1 candidate, but that candidate is not
runtime-divergent.
## Step 4 / 5 — Re-attributing the divergence
The Step 2 report framed canary's `NtSetEvent` (idx=347) vs ours's `NtReleaseSemaphore`
(idx=429) as alternate arms of the same branch inside `sub_821CB030`'s chain. Re-analysis of
the source disasm + branch-probe data shows this is **incorrect**:
1. The only function on the chain reaching NtReleaseSemaphore is `sub_82450218` (called from
`sub_82452DC0` at 0x82452F10 and 0x82452F8C). Ours fires this once on tid=13 in iteration 1.
2. The only fns on the chain reaching NtSetEvent are `sub_8245FEB8`, `sub_82453910`,
`sub_82458A70`, `sub_8245D9D8` (from the reach analysis). Of these, `sub_82453910` is
directly called from `sub_821CBA08` at 0x821CBBF0 — but AFTER `sub_821CB030` returns. Ours
never reaches that line because `sub_821CB030` wedges on its NtWait at +0x1AC.
3. The canary tid=17 NtSetEvent at idx=347 is NOT from `sub_82453910` (whose entry probe at
0x82453910 fired 0× in our canary probe). It must be from one of the other 3 NtSetEvent
callers in the reach set, or from a `sub_82453910` *not on tid=17 at the moment of idx=347*
(idx is global; tid attribution requires a per-event check, handled in the Step 2 csv).
The real divergence is **loop iteration count** of `sub_82452DC0` and its upstream callers
`sub_821CB030` / `sub_821CBA08`. Each iteration of canary tid=17's body passes through a
`sub_82450218` call site in `sub_82452DC0` (0x82452F10 for tid=17 in the canary table above) on
its way to `bl 0x824AB158` (NtReleaseSemaphore) and then waits at sub_821CB030+0x1AC,
which RETURNS quickly because a peer thread has already signaled. Ours's iteration-1 wait at
that PC never returns because the corresponding signaler never fires.
## Cause attribution
Per the plan's Step 5 framework, attribute to one of its 3 candidate causes (items 1 to 3 below); item 4 is new in this iterate:
1. **NtQueryFullAttributesFile**: NOT directly evidenced by this iterate. The probe didn't
capture file-attribute returns.
2. **Shared CS-protected ctx field set by another tid**: STILL UNTESTED. ours's tid=13 wait
on event `d5e23609d3948568` depends on another tid signaling it. AUDIT-069 S5's
"25 producers vs 1" finding confirms ours has 24 missing peer-producers — meaning peer
tids in ours aren't reaching the signal call sites.
3. **Vtable**: NOT directly evidenced.
4. **Loop-count circular wedge (NEW)**: ours tid=13 wedges on first wait because peer
producers (themselves blocked downstream of tid=13's blocked work) never fire. The
originating peer producer is on canary tid=4/10/14 (per AUDIT-069 / step2 report), all of
which are alive in ours but doing different work (per AUDIT-069 S2: ours's tid=5 fires
γ-signalers 81× vs canary tid=10's 492× — ours is **under-producing** signals by ~84%).
This iterate's negative result on the branch-arm hypothesis sharpens the picture: the
divergence is NOT a single-branch lattice mismatch inside sub_821CB030's chain. **It's a
distributed multi-thread producer underrun**, with the wedge a downstream symptom of upstream
under-signaling on peer tids that ARE running in ours but executing a different (shorter)
trajectory.
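The two headline figures in this section are plain arithmetic on the AUDIT-069 counts quoted above; the variable names below are illustrative.

```python
# Counts quoted from AUDIT-069 S2 / S5 in this section.
ours_gamma_signals = 81       # ours tid=5
canary_gamma_signals = 492    # canary tid=10
ours_producers = 1
canary_producers = 25

# Fraction of canary's signal volume that ours fails to produce.
shortfall = 1 - ours_gamma_signals / canary_gamma_signals
shortfall_pct = round(shortfall * 100)

missing_producers = canary_producers - ours_producers
```

The shortfall works out to roughly 0.835, which rounds to the ~84% quoted above, and the producer gap is 24.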
## Tripstones honored
- **#28 (per-engine tid is not stable cross-engine identity)**: confirmed by Step 2 finding
that ours tid=13 ≡ canary tid=17 (same entry sub_821748F0). Branch-probe data uses
per-engine tid; cross-engine comparison done by (entry_pc, lr) tuple, not raw tid.
- **#32 (canary jitter in contention regions)**: not relevant here — both engines select eq
arm at 0x82452E0C 100% across all observed fires. No jitter.
- **#37 (vtable base vs slot-N)**: not encountered (no vtable read at 0x82452E0C).
- **#39 (composite progression vs matched-prefix)**: this iterate produces neither; an
informative null at the source-control-flow lens.
- **#40 (single-keystone hypothesis falsified before)**: Step 2's "single branch arm
divergence" framing was itself a candidate keystone. Falsified here.
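For tripstone #28, the cross-engine comparison can be sketched as re-keying each engine's fires from `(tid, lr)` to `(entry_pc, lr)` before intersecting them. The entry address for ours tid=13 / canary tid=17 comes from `sub_821748F0`; the function name and the canary `lr` sample are hypothetical placeholders for illustration only.

```python
# Thread entry PCs per engine; tids are engine-local and must not be compared directly.
ours_entry_pc = {13: 0x821748F0}
canary_entry_pc = {17: 0x821748F0}

def rekey(fires, entry_pc_by_tid):
    """Convert [(tid, lr), ...] into a set of (entry_pc, lr), dropping the engine-local tid."""
    return {(entry_pc_by_tid[tid], lr) for tid, lr in fires}

# Ours tid=13 entries are from this report; the canary sample is a hypothetical placeholder.
ours_fires = [(13, 0x821CB1D0), (13, 0x82452E64)]
canary_fires = [(17, 0x821CB1D0)]

common = rekey(ours_fires, ours_entry_pc) & rekey(canary_fires, canary_entry_pc)
only_ours = rekey(ours_fires, ours_entry_pc) - rekey(canary_fires, canary_entry_pc)
```

Keying on the thread's entry PC makes the comparison insensitive to thread-creation order, which is what differs between the two engines.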
## Cascade
- A (identify candidate branch PCs in DB): **PASS** with caveat. 1 candidate at depth 6.
Initial depth-4 scan returned 0 — pivot trigger fired.
- B (run both engines with branch-probe): **PASS**. 109 fires canary, 13 fires ours.
- C (find FIRST divergent branch in candidates): **NEGATIVE / informative null**. The single
candidate is not divergent (both engines take eq arm).
- D (attribute to one of 3 candidate causes): **MEDIUM**. Reframed as "loop-count
circular wedge with under-producing peer tids", which subsumes candidate 2 (shared
CS-protected ctx).
- E (recommend specific next iterate with LOC estimate): **PASS** (see below).
## Recommended next iterate
### Option 2.B — Args/return-value capture for NtQueryFullAttributesFile and key kernel APIs (~30-50 LOC canary)
Extend canary's Phase A event log to populate `args_resolved` and `return_value` for:
- `NtQueryFullAttributesFile`
- `NtCreateFile` (cache:\\<hash> paths)
- `NtReadFile`, `NtWriteFile`
Compare canary tid=17's 9 NtQueryFullAttributesFile invocations against ours tid=13's 1 to find
the first divergence in cache-state. Cheap, high signal.
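Once both engines log arguments and return values, the comparison reduces to finding the first index at which the two call sequences differ. A minimal sketch, assuming each call has been reduced to a comparable record such as `(api, path, status)`; that record layout and the sample values in the usage note are hypothetical, since the capture does not exist yet.

```python
from itertools import zip_longest

def first_divergence(ours, canary):
    """Return (index, ours_record, canary_record) at the first mismatch, or None.

    A sequence that ends early is reported with None on the shorter side.
    """
    for i, (a, b) in enumerate(zip_longest(ours, canary)):
        if a != b:
            return i, a, b
    return None
```

For example, `first_divergence([("NtQueryFullAttributesFile", "cache:\\x", 0)], [])` reports index 0 with `None` on the canary side, which is how a length mismatch such as 1 vs 9 invocations would surface if the shared prefix is identical.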
### Option 2.C — read-probe on ours tid=13 wait event memory (~20 LOC reusing AUDIT-068 S3)
Use `audit_68_host_mem_read_probe` to sample event handle `d5e23609d3948568`'s underlying
KEVENT struct in ours, at 100 µs cadence over the 3 ms wedge window. Capture the moment
(if any) when its `Header.SignalState` would transition. Validates whether the kernel
plumbing is correct vs the producer is simply absent.
### Option 2.D — peer-producer LR trace (~0 LOC; reuses existing `--lr-trace` infra)
Per AUDIT-069 S5, ours has 1 producer where canary has 25. Use existing `--lr-trace` at the
NtReleaseSemaphore call site `0x82450310` + NtSetEvent wrappers on ALL tids in ours, capture
which guest LRs fire during the 0-3 ms window. Diff vs canary's audit_69_event_signal_watch
JSONL → find which peer-tid call site is MISSING in ours.
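The diff itself is a set difference over guest LRs. The sketch below assumes each JSONL record carries the LR under an `lr` key, either as a hex string or an integer; that field name and both function names are assumptions, since the actual schemas of `--lr-trace` and `audit_69_event_signal_watch` are not shown in this report.

```python
import json

def lrs_from_jsonl(lines, field="lr"):
    """Collect guest LR values from JSONL lines; the `lr` field name is an assumption."""
    out = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        value = json.loads(line)[field]
        # Accept either "0x8245...." strings or plain integers.
        out.add(int(value, 16) if isinstance(value, str) else value)
    return out

def missing_in_ours(canary_lrs, ours_lrs):
    """Call sites that signal in canary but never fire in ours, in ascending order."""
    return sorted(canary_lrs - ours_lrs)
```

Restricting both inputs to the same time window before diffing keeps later, unrelated producers in canary from inflating the missing set.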
**Best minimum-LOC next step: 2.D** (zero LOC, existing instrumentation; capture peer-producer
absence directly).
**Best disambiguating step: 2.B** (~30-50 LOC) to pin upstream cache-state divergence.
## Honest assessment
- The 2-hour timebox was respected.
- Step 1 returned 0 candidates at initial depth; broadened to find 1; that 1 is non-divergent.
- The Step 2 report's source-level branch framing **does not survive contact with the
call-graph**. The control-flow divergence is at a higher level (loop count,
not branch arm).
- The wedge at `sub_821CB030+0x1AC` remains the symptom; the cause is the **absence of a
signaler on a peer tid in the 0-3 ms window**. That peer-tid absence is what 2.D would
directly identify.
- Confidence in pivoting to 2.D/2.B: **HIGH**.
## Artifacts produced
All under `xenia-rs/audit-runs/iterate-2A-branch-probe/`:
- `candidate-branches.csv` — Step 1 broadened search result (1 row).
- `canary-probe.stdout` / `.stderr` / `.lines` — canary 120 s cold run with branch probe.
- `ours-probe.stdout` / `.stderr` — ours `-n 50M` cold run with branch probe.
- `run-commands.txt` — exact CLIs used.
- `iterate2A-report.md` — this report.
LOC delta: 0 to engine code, 0 to canary code. Read-only investigation.
xenia-rs HEAD UNCHANGED. canary HEAD UNCHANGED. Both binaries (`xenia_canary_i2a.exe`,
`xrs-i2a`) are renamed copies; original binaries untouched.