# Iterate 2.A: Branch-probe of sub_821CB030 → sub_821CBA08 → sub_821CC3F8 → sub_821C4EB0

**Date:** 2026-05-21
**Mode:** WRITE (investigation + branch-probe configuration; no engine LOC change).
**Sources:** `xenia-canary/.../xenia_canary_i2a.exe`, `xenia-rs/target/release/xrs-i2a`, `xenia-rs/sylpheed.db`, prior Step 2 report at `audit-runs/review-a-step2-natural-trigger/step2-report.md`.

## TL;DR

The plan's framing ("find the conditional branch inside `sub_821CB030 → … → sub_821C4EB0` where canary takes the `bl NtSetEvent` arm and ours takes the `bl NtReleaseSemaphore` arm") **does not match the actual control-flow lattice**. Across the entire call graph reachable from those four functions (depth ≤ 6, 736 functions, each scanned for conditional branches whose two arms reach the two wrappers within ≤ 4 BBs), **exactly one candidate branch exists**: PC=`0x82452E0C` inside `sub_82452DC0`, with

- taken (eq, r3==0) arm → reaches `bl 0x824AB158` (NtReleaseSemaphore) via `sub_8245B000` / `sub_82450218`;
- not-taken (ne, r3!=0) arm → reaches `bl 0x824AA2F0` (NtSetEvent) via `sub_82452AB8` / `sub_8245FEB8`.

The branch-probe (canary + ours) was run at that PC plus six context PCs. **Canary's branch at 0x82452E0C ALWAYS takes the eq arm** (r3=0, 44 fires across tids 6/17/18/26, never the NtSetEvent arm). Ours's JIT only reports BB-entry PCs to the probe, so 0x82452E0C did not fire directly, but `sub_82452DC0` was re-entered recursively with `lr=0x82452E64` (the return address of the recursive call at 0x82452E60, inside the eq-taken arm) once on ours tid=13 and multiple times on tid=1, confirming that ours also takes the eq arm at 0x82452E0C. **The candidate branch is NOT divergent at runtime.**

The Step 2 framing of "NtSetEvent vs NtReleaseSemaphore is an if/else inside this chain" is **falsified at the source level**: those two operations live on disjoint call-graph paths, NOT on alternate arms of the same branch.

The actual divergence is **loop iteration count**, not branch direction:

- canary tid=17 calls `sub_82452DC0` (and thus NtReleaseSemaphore via `sub_82450218`) multiple times across its 154 ms lifetime. Statically, upstream `sub_821CBA08` then calls `sub_82453910` AFTER `sub_821CB030` returns, which was the presumed origin of the NtSetEvent at canary idx=347. The probe at 0x82453910 never fired in canary, however, so that origin is unconfirmed (see Step 4 / 5).
- ours tid=13 enters `sub_82452DC0` from `sub_821CB030` ONCE (cycle=7963, lr=0x821CB1D0), executes the eq-arm path, fires NtReleaseSemaphore at 0x82452F8C, then wedges in the NtWait at `sub_821CB030+0x1AC` (0x821CB1DC) before `sub_821CB030` can return to `sub_821CBA08` and call `sub_82453910`.

Recommended next iterate: **2.B (NtQueryFullAttributesFile arg/return capture)** or **2.C (ctx-field read-probe)** to identify the upstream state that gates whether the wedge wait ever gets signaled. The wedge itself was already correctly identified in AUDIT-069 S5: ours has 1 "other producer" vs canary's 25; the 24 missing producers are absent because their guest state is downstream of the same tid=13 wedge (circular). The fix path traces to *what signals event `d5e23609d3948568`* in canary but not in ours.

## Step 1: Candidate branch enumeration

### Initial pass (target fns only, depth 4)

Conditional branches inside `sub_821CB030`, `sub_821CBA08`, `sub_821CC3F8`, `sub_821C4EB0` where the taken arm's first call reaches the NtSetEvent wrapper (`0x824AA2F0`) AND the not-taken arm reaches the NtReleaseSemaphore wrapper (`0x824AB158`), or vice versa, within ≤ 4 BBs per arm.

**Result: 0 candidate branches.** This is the Step 1 pivot trigger from the plan: broaden the search.

### Broadened pass (call-graph depth 6, arm reach depth 4)

Reachable function set: 736 functions.

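The broadened pass is, in effect, a depth-bounded breadth-first walk of the call graph followed by a per-branch arm check. A minimal sketch, assuming the call graph has already been extracted as a dict of caller to callees; the names `reachable_within` and `is_candidate` and the toy graph are illustrative and are not the tooling actually used for this scan. Only the two wrapper addresses come from this report.

```python
from collections import deque

# Wrapper addresses taken from this report.
NT_SET_EVENT = 0x824AA2F0
NT_RELEASE_SEM = 0x824AB158


def reachable_within(call_graph, roots, max_depth):
    """Return {function: depth} for everything reachable from `roots`
    in at most `max_depth` call edges (breadth-first, so depths are minimal)."""
    depth = {root: 0 for root in roots}
    queue = deque(depth)
    while queue:
        fn = queue.popleft()
        if depth[fn] == max_depth:
            continue  # do not expand past the depth bound
        for callee in call_graph.get(fn, ()):
            if callee not in depth:
                depth[callee] = depth[fn] + 1
                queue.append(callee)
    return depth


def is_candidate(taken_reach, not_taken_reach,
                 set_event=NT_SET_EVENT, release_sem=NT_RELEASE_SEM):
    """A branch qualifies when one arm reaches the NtSetEvent wrapper and the
    other arm reaches the NtReleaseSemaphore wrapper, in either order."""
    return ((set_event in taken_reach and release_sem in not_taken_reach)
            or (release_sem in taken_reach and set_event in not_taken_reach))


# Hypothetical toy graph, only to show the depth bound in action.
toy_graph = {"root": ["a", "b"], "a": ["c"], "c": ["d"]}
print(reachable_within(toy_graph, ["root"], max_depth=2))

# Shape of the one candidate reported here: taken arm reaches
# NtReleaseSemaphore, not-taken arm reaches NtSetEvent.
print(is_candidate({NT_RELEASE_SEM}, {NT_SET_EVENT}))
```

The arm reach sets are passed in precomputed because the ≤ 4 BB walk per arm depends on basic-block data that this sketch does not model.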
**Result: 1 candidate branch.**

| PC | Function | Branch type | Taken arm first call | Not-taken arm first call | Set-event reach | Release-sem reach |
|---|---|---|---|---|---|---|
| `0x82452E0C` | `sub_82452DC0` | `bc 12,4*cr6+eq` | `0x8245B000` (eq) | `0x82452AB8` (ne) | not-taken | taken |

Disassembly context:

```
0x82452DF8  bl     sub_82452200              ; r3 = sub_82452200(...)
0x82452DFC  addis  r11, r0, 0x41FF
0x82452E00  addi   r28, r0, 0
0x82452E04  ori    r23, r11, 0xFFFD
0x82452E08  cmpli  cr6, 0, r3, 0x0
0x82452E0C  bc     12, 4*cr6+eq, 0x82452E1C  ; if r3==0 → eq-arm to 0x82452E1C
0x82452E10  or     r24, r28, r28
0x82452E14  or     r29, r3, r3
0x82452E18  b      0x82452E88                ; not-eq → NtSetEvent path
0x82452E1C  ...                              ; eq → NtReleaseSemaphore path
```

CSV saved at `candidate-branches.csv`.

## Step 2: Branch-probe both engines (cold boot)

Probe PCs: `0x82452E0C, 0x821CB1DC, 0x82452F10, 0x82452F8C, 0x82453910, 0x824539A4, 0x82452DC0`

Canary command (cold, 120 s wallclock):

```
wine xenia_canary_i2a.exe ".../Project Sylpheed (...).iso" \
  --mute=true \
  --audit_61_branch_probe_pcs="0x82452E0C,0x821CB1DC,0x82452F10,0x82452F8C,0x82453910,0x824539A4,0x82452DC0"
```

Ours command (cold, n=50M instructions):

```
xrs-i2a exec ".../Project Sylpheed (...).iso" -n 50000000 \
  --branch-probe="0x82452E0C,0x821CB1DC,0x82452F10,0x82452F8C,0x82453910,0x824539A4,0x82452DC0"
```

### Canary fires (109 total)

Per-PC per-tid:

| PC | tid=6 | tid=17 | tid=18 | tid=26 |
|---|---|---|---|---|
| 0x82452DC0 (fn entry) | 11 | 16 | 15 | 2 |
| 0x82452E0C (branch) | 11 | 16 | 15 | 2 |
| 0x82452F10 (NtReleaseSem #1) | 1 | 8 | 5 | 1 |
| 0x82452F8C (NtReleaseSem #2) | 4 | 0 | 0 | 0 |
| 0x821CB1DC (wedge NtWait) | 0 | 1 | 0 | 1 |
| 0x82453910 (NtSetEvent helper entry) | 0 | 0 | 0 | 0 |
| 0x824539A4 (the branch INSIDE sub_82453910) | 0 | 0 | 0 | 0 |

Branch decision at 0x82452E0C (44 fires across all tids): **r3=0x00000000, cr6=..E (equal) 100%**.

Notable absences in canary's probe: `sub_82453910` entry never fires.
The NtSetEvent at canary tid=17 idx=347 in Step 2's timeline therefore cannot pass through the `sub_82453910` entry. It is NOT from this iteration's upstream chain at all and most likely comes from a DIFFERENT call site under a different parent function.

### Ours fires (13 total BRANCH-PROBE lines)

| PC | tid=1 | tid=13 |
|---|---|---|
| 0x82452DC0 (fn entry) | 11 | 2 |
| 0x82452E0C | 0 | 0 |
| 0x82452F10 | 0 | 0 |
| 0x82452F8C | 0 | 0 |
| 0x821CB1DC (wedge NtWait) | 0 | 0 |
| 0x82453910 | 0 | 0 |
| 0x824539A4 | 0 | 0 |

**Ours's JIT only updates `ctx.pc` at BB boundaries, so interior PCs do not fire in `fire_branch_probe_if_match` even though the guest executes them.** Only the function-entry PC 0x82452DC0 (a JIT lookup target) fires. However, **one of ours tid=13's two `sub_82452DC0` entries has `lr=0x82452E64`** (the return address of the recursive `bl 0x82452DC0` at 0x82452E60), which is INSIDE the eq-taken arm of 0x82452E0C. Ours tid=13 thus entered `sub_82452DC0` from `sub_821CB030` (cycle=7963, lr=0x821CB1D0) and then re-entered it recursively (cycle=20030, lr=0x82452E64), meaning ours took the **same eq-arm direction** at 0x82452E0C that canary took.

## Step 3: First divergent branch (NOT FOUND)

The single candidate branch is **not divergent** at runtime. Both engines select the eq-arm (r3=0 returned by `sub_82452200`) at 0x82452E0C in their first traversal.

This is the "broaden the search" pivot from the plan (`If 0, ... OR call-resolution heuristic missed it`). The broadened search to depth 6 found 1 candidate, but that candidate is not runtime-divergent.

## Step 4 / 5: Re-attributing the divergence

The Step 2 report framed canary's `NtSetEvent` (idx=347) vs ours's `NtReleaseSemaphore` (idx=429) as alternate arms of the same branch inside `sub_821CB030`'s chain. Re-analysis of the source disasm + branch-probe data shows this is **incorrect**:

1. The only function on the chain reaching NtReleaseSemaphore is `sub_82450218` (called from `sub_82452DC0` at 0x82452F10 and 0x82452F8C). Ours fires this once on tid=13 in iteration 1.
2. The only fns on the chain reaching NtSetEvent are `sub_8245FEB8`, `sub_82453910`, `sub_82458A70`, `sub_8245D9D8` (from the reach analysis). Of these, `sub_82453910` is directly called from `sub_821CBA08` at 0x821CBBF0, but only AFTER `sub_821CB030` returns. Ours never reaches that line because `sub_821CB030` wedges on its NtWait at +0x1AC.
3. The canary tid=17 NtSetEvent at idx=347 is NOT from `sub_82453910` (whose entry probe at 0x82453910 fired 0× in the canary run). It must be from one of the other 3 NtSetEvent-reaching functions in the reach set, or from a `sub_82453910` call *not on tid=17 at the moment of idx=347* (idx is global; tid attribution requires a per-event check, handled in the Step 2 csv).

The real divergence is **loop iteration count** of `sub_82452DC0` and its upstream callers `sub_821CB030` / `sub_821CBA08`. Each iteration of canary tid=17's body reaches NtReleaseSemaphore (`bl 0x824AB158`) through one of the `sub_82450218` call sites in `sub_82452DC0` (0x82452F10 or 0x82452F8C; on tid=17 the probe table shows 0x82452F10) and then waits at `sub_821CB030+0x1AC`, which RETURNS quickly because a peer thread has already signaled. Ours's iteration-1 wait at that PC never returns because the corresponding signaler never fires.

## Cause attribution

Per the plan's Step 5 framework, attribute to one of its 3 candidate causes; a fourth (NEW) is added here:

1. **NtQueryFullAttributesFile**: NOT directly evidenced by this iterate. The probe didn't capture file-attribute returns.
2. **Shared CS-protected ctx field set by another tid**: STILL UNTESTED. Ours's tid=13 wait on event `d5e23609d3948568` depends on another tid signaling it. AUDIT-069 S5's "25 producers vs 1" finding confirms ours has 24 missing peer-producers, meaning peer tids in ours aren't reaching the signal call sites.
3. **Vtable**: NOT directly evidenced.
4. **Loop-count circular wedge (NEW)**: ours tid=13 wedges on first wait because peer producers (themselves blocked downstream of tid=13's blocked work) never fire. The originating peer producer is on canary tid=4/10/14 (per AUDIT-069 / step2 report), all of which are alive in ours but doing different work (per AUDIT-069 S2: ours's tid=5 fires γ-signalers 81× vs canary tid=10's 492×, so ours is **under-producing** signals by ~84%).

This iterate's negative result on the branch-arm hypothesis sharpens the picture: the divergence is NOT a single-branch lattice mismatch inside sub_821CB030's chain. **It's a distributed multi-thread producer underrun**, with the wedge a downstream symptom of upstream under-signaling on peer tids that ARE running in ours but executing a different (shorter) trajectory.

## Tripstones honored

- **#28 (per-engine tid is not stable cross-engine identity)**: confirmed by Step 2 finding that ours tid=13 ≡ canary tid=17 (same entry sub_821748F0). Branch-probe data uses per-engine tid; cross-engine comparison done by (entry_pc, lr) tuple, not raw tid.
- **#32 (canary jitter in contention regions)**: not relevant here, since both engines select the eq arm at 0x82452E0C 100% across all observed fires. No jitter.
- **#37 (vtable base vs slot-N)**: not encountered (no vtable read at 0x82452E0C).
- **#39 (composite progression vs matched-prefix)**: this iterate produces neither; an informative null at the source-control-flow lens.
- **#40 (single-keystone hypothesis falsified before)**: Step 2's "single branch arm divergence" framing was itself a candidate keystone. Falsified here.

## Cascade

- A (identify candidate branch PCs in DB): **PASS** with caveat. 1 candidate at depth 6. Initial depth-4 scan returned 0, so the pivot trigger fired.
- B (run both engines with branch-probe): **PASS**. 109 fires canary, 13 fires ours.
- C (find FIRST divergent branch in candidates): **NEGATIVE / informative null**. The single candidate is not divergent (both engines take eq arm).
- D (attribute to one of 3 candidate causes): **MEDIUM**. Reframed as "loop-count circular wedge with under-producing peer tids", which subsumes candidate 2 (shared CS-protected ctx).
- E (recommend specific next iterate with LOC estimate): **PASS** (see below).

## Recommended next iterate

### Option 2.B: Args/return-value capture for NtQueryFullAttributesFile and key kernel APIs (~30–50 LOC canary)

Extend canary's Phase A event log to populate `args_resolved` and `return_value` for:

- `NtQueryFullAttributesFile`
- `NtCreateFile` (cache:\\ paths)
- `NtReadFile`, `NtWriteFile`

Compare canary tid=17's 9 NtQueryFullAttributesFile invocations against ours tid=13's 1 to find the first divergence in cache-state. Cheap, high signal.

### Option 2.C: read-probe on ours tid=13 wait event memory (~20 LOC reusing AUDIT-068 S3)

Use `audit_68_host_mem_read_probe` to sample event handle `d5e23609d3948568`'s underlying KEVENT struct in ours, at 100 µs cadence over the 3 ms wedge window. Capture the moment (if any) when its `Header.SignalState` would transition. This validates whether the kernel plumbing is correct or the producer is simply absent.

### Option 2.D: peer-producer LR trace (~0 LOC; reuses existing `--lr-trace` infra)

Per AUDIT-069 S5, ours has 1 producer where canary has 25. Use existing `--lr-trace` at the NtReleaseSemaphore call site `0x82450310` + NtSetEvent wrappers on ALL tids in ours, and capture which guest LRs fire during the 0–3 ms window. Diff vs canary's audit_69_event_signal_watch JSONL → find which peer-tid call site is MISSING in ours.

**Best minimum-LOC next step: 2.D** (zero LOC, existing instrumentation; captures peer-producer absence directly).
**Best disambiguating step: 2.B** (~30–50 LOC) to pin upstream cache-state divergence.

## Honest assessment

- The 2-hour timebox was respected.
- Step 1 returned 0 candidates at initial depth; broadened to find 1; that 1 is non-divergent.
- The Step 2 report's source-level branch framing **does not survive contact with the call graph**. The control-flow divergence is at a higher level (loop count, not branch arm).
- The wedge at `sub_821CB030+0x1AC` remains the symptom; the cause is the **absence of a signaler on a peer tid in the 0–3 ms window**. That peer-tid absence is what 2.D would directly identify.
- Confidence in pivoting to 2.D/2.B: **HIGH**.

## Artifacts produced

All under `xenia-rs/audit-runs/iterate-2A-branch-probe/`:

- `candidate-branches.csv`: Step 1 broadened search result (1 row).
- `canary-probe.stdout` / `.stderr` / `.lines`: canary 120 s cold run with branch probe.
- `ours-probe.stdout` / `.stderr`: ours `-n 50M` cold run with branch probe.
- `run-commands.txt`: exact CLIs used.
- `iterate2A-report.md`: this report.

LOC delta: 0 to engine code, 0 to canary code. Read-only investigation. xenia-rs HEAD UNCHANGED. canary HEAD UNCHANGED. Both binaries (`xenia_canary_i2a.exe`, `xrs-i2a`) are renamed copies; original binaries untouched.
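
The peer-producer diff proposed as Option 2.D reduces to a set difference over signal call sites seen in each engine inside the wedge window. A minimal sketch, assuming both traces have been normalised to JSONL records with the fields `t_ms`, `entry_pc` and `lr`; these field names, the helper names and the sample records are hypothetical, since this report does not show the actual schema of the `--lr-trace` output or of the audit_69_event_signal_watch JSONL. Sites are keyed by (entry_pc, lr) rather than raw tid, in line with tripstone #28.

```python
import json


def producer_sites(jsonl_text, window_ms=3.0):
    """Collect the (entry_pc, lr) pairs of signal calls that fire within
    the first `window_ms` milliseconds of a trace."""
    sites = set()
    for line in jsonl_text.splitlines():
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the trace
        record = json.loads(line)
        if record["t_ms"] <= window_ms:
            sites.add((record["entry_pc"], record["lr"]))
    return sites


def missing_in_ours(canary_jsonl, ours_jsonl, window_ms=3.0):
    """Call sites that signal in canary but never in ours inside the window."""
    canary = producer_sites(canary_jsonl, window_ms)
    ours = producer_sites(ours_jsonl, window_ms)
    return sorted(canary - ours)


# Hypothetical sample records, for illustration only.
canary_trace = "\n".join([
    '{"t_ms": 0.4, "entry_pc": "0xAAAA0000", "lr": "0xAAAA0010"}',
    '{"t_ms": 1.2, "entry_pc": "0xBBBB0000", "lr": "0xBBBB0020"}',
    '{"t_ms": 9.0, "entry_pc": "0xCCCC0000", "lr": "0xCCCC0030"}',
])
ours_trace = '{"t_ms": 0.5, "entry_pc": "0xAAAA0000", "lr": "0xAAAA0010"}'

for site in missing_in_ours(canary_trace, ours_trace):
    print("missing in ours:", site)
```

The third sample record falls outside the 3 ms window and is ignored, so only sites that canary signals before the wedge would resolve are reported.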